Algorithmic Accountability: The Big Problems | SAP

Tom Slee (SAP); Algorithmic Accountability: The Big Problems; Their Blog; 2017-10.

tl;dr → You have problems, SAP has expertise in this practice area. Call now.

Original Sources

Yvonne Baur, Brenda Reid, Steve Hunt, Fawn Fitter (SAP); How AI Can End Bias; In Their Other Blog, entitled The D!gitalist; 2017-01-16.
Teaser: Harmful human bias—both intentional and unconscious—can be avoided with the help of artificial intelligence, but only if we teach it to play fair and constantly question the results.

Mentions

  • The Canon is rehearsed.
  • General Data Protection Regulation (GDPR)
    • European
    • “in effect in” 2018 (2018-05-25).

Indictment
Anti-patterns, Negative (Worst) Practices

  • Bad statistics
  • Ill-defined scales
  • Bad Incentives
  • Lack of transparency

Five Axes of Unfairness
Unfairness ↔ Disparate Impact

  1. Target variables
  2. Training data
  3. Feature selection
  4. Proxies
  5. Masking
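The unfairness ↔ disparate-impact link above is commonly screened with the four-fifths rule from U.S. employment law. A minimal sketch of that screen — the group data and the 0.8 threshold are illustrative of the rule, not taken from the article:

```python
# Four-fifths (80%) rule screen for disparate impact.
# Decisions are 0/1 lists; data below is made up for illustration.

def selection_rate(outcomes):
    """Fraction of positive outcomes (1s) among the decisions."""
    return sum(outcomes) / len(outcomes)

def disparate_impact_ratio(protected, reference):
    """Ratio of the protected group's selection rate to the reference group's."""
    return selection_rate(protected) / selection_rate(reference)

protected = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]   # selection rate 0.3
reference = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]   # selection rate 0.6
ratio = disparate_impact_ratio(protected, reference)
print(round(ratio, 2))   # 0.5
print(ratio < 0.8)       # True: below the four-fifths threshold
```

Any of the five axes — a biased target variable, skewed training data, a proxy feature — can push this ratio down without any explicit use of a protected attribute.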

Remediation

  • Explanation
  • Transparency
  • Audits
  • Fairness

Who

  • Solon Barocas, self [Princeton]
    Trade: theorist.
  • Cynthia Dwork, self [Microsoft]
    Trade: pioneer [theorist].
  • Seth Flaxman, staff, Oxford University.
    Trade: expert.
  • Bryce Goodman, staff, Oxford University.
    Trade: expert.
  • Cathy O’Neil, self.
    Trade: data scientist, statistician.
  • Frank Pasquale, professor, law [Maryland]
    Trade: educator.
  • Andrew Selbst, self [U.S. Court of Appeals]
    Trade: theorist.

Referenced

We’ll see you, anon | The Economist

We’ll see you, anon; staff; In The Economist; 2015-08-15.
Teaser: Can big databases be kept both anonymous and useful?

Mentions

  • Hook: the story
    • Hustler, New York City
    • Anthony Tockar
    • Taxi Rides
  • De-identification
  • Re-identification
  • homomorphic encryption
  • The U.S. Census Bureau has used differential privacy
  • <quote>Google is employing it at the moment as part of a project in which a browser plug-in gathers lots of data about a user’s software, all the while guaranteeing anonymity.</quote>
  • Cynthia Dwork, Microsoft
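The Google browser plug-in the quote alludes to is widely reported to be RAPPOR, which builds on randomized response — the classic local-privacy technique. A minimal sketch of randomized response (an illustration, not Google's code):

```python
import random

def randomized_response(truth: bool, p: float = 0.5) -> bool:
    # With probability p report the truth; otherwise report a fair coin flip,
    # so any single report is plausibly deniable.
    if random.random() < p:
        return truth
    return random.random() < 0.5

def estimate_true_rate(reports, p: float = 0.5) -> float:
    # E[report] = p * rate + (1 - p) * 0.5; invert to recover the population rate.
    observed = sum(reports) / len(reports)
    return (observed - (1 - p) * 0.5) / p

random.seed(0)
reports = [randomized_response(random.random() < 0.3) for _ in range(100_000)]
print(round(estimate_true_rate(reports), 2))  # close to the true rate of 0.3
```

No individual report reveals that user's true bit, yet the aggregate statistic is recoverable — anonymous and useful at once, which is the article's question.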

Quoted

for color, background & verisimilitude

  • Viktor Mayer-Schönberger of the Oxford Internet Institute,
  • Paul Ohm of Georgetown University, in Washington, DC.
  • Tim Althoff, staff? (postdoc?), Stanford University.
  • Daniel Barth-Jones, an epidemiologist at Columbia University, in New York.
  • Salil Vadhan, Center for Research on Computation and Society at Harvard.
  • Jane Naylor, Office for National Statistics (ONS), United Kingdom

Differential Privacy: A Survey of Results | Cynthia Dwork

Cynthia Dwork (Microsoft); Differential Privacy: A Survey of Results; Manindra Agrawal, Angsheng Li (editors); In Theory and Applications of Models of Computation (TAMC); 2008; 19 pages.

Abstract

Over the past five years a new approach to privacy-preserving data analysis has born fruit [13, 18, 7, 19, 5, 37, 35, 8, 32]. This approach differs from much (but not all!) of the related literature in the statistics, databases, theory, and cryptography communities, in that a formal and ad omnia privacy guarantee is defined, and the data analysis techniques presented are rigorously proved to satisfy the guarantee. The key privacy guarantee that has emerged is differential privacy. Roughly speaking, this ensures that (almost, and quantifiably) no risk is incurred by joining a statistical database. In this survey, we recall the definition of differential privacy and two basic techniques for achieving it. We then show some interesting applications of these techniques, presenting algorithms for three specific tasks and three general results on differentially private learning.
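One of the basic techniques the survey recalls is the Laplace mechanism of “Calibrating Noise to Sensitivity” (ref. 19 below): perturb the true answer with noise scaled to the query's sensitivity. A hedged sketch — names and data are illustrative, not from the paper:

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two iid exponentials with mean `scale` is Laplace(0, scale).
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(data, predicate, epsilon: float) -> float:
    # A counting query changes by at most 1 when one row is added or removed
    # (sensitivity 1), so Laplace noise of scale 1/epsilon suffices.
    true_count = sum(1 for row in data if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(1)
ages = [23, 35, 41, 29, 52, 61, 33, 47]
# One noisy draw; the true count of rows with age >= 40 is 4.
print(private_count(ages, lambda a: a >= 40, epsilon=0.5))
```

Smaller epsilon means more noise and a stronger guarantee — the quantified version of “(almost, and quantifiably) no risk is incurred by joining.”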

Highlighted

  1. B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, K. Talwar. Privacy, Accuracy, and Consistency Too. A Holistic Solution to Contingency Table Release. In Proceedings of the 26th Symposium on Principles of Database Systems, pages 273–282 (2007)
  2. A. Blum, C. Dwork, F. McSherry, K. Nissim. Practical Privacy. The SuLQ framework. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (2005-06)
  3. A. Blum, K. Ligett, A. Roth. A Learning Theory Approach to Non-Interactive Database Privacy. In Proceedings of the 40th ACM SIGACT Symposium on Theory of Computing (2008)
  4. I. Dinur, K. Nissim. Revealing Information While Preserving Privacy. In Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 202–210 (2003)
  5. C. Dwork, K. Nissim. Privacy-Preserving Datamining on Vertically Partitioned Databases. In M. Franklin (editor), Proceedings of CRYPTO, 2004. LNCS, vol. 3152, pp. 528–544. Springer, Heidelberg (2004)
  6. C. Dwork, F. McSherry, K. Nissim, A. Smith. Calibrating Noise to Sensitivity in Private Data Analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pp. 265–284 (2006)
  7. S. Kasiviswanathan, H. Lee, K. Nissim, S. Raskhodnikova, S. Smith. What Can We Learn Privately? (manuscript, 2007)
  8. F. McSherry, K. Talwar. Mechanism Design via Differential Privacy. In Proceedings of the 48th Annual Symposium on Foundations of Computer Science (FOCS) (2007)
  9. K. Nissim, S. Raskhodnikova, A. Smith. Smooth Sensitivity and Sampling in Private Data Analysis. In Proceedings of the 39th ACM Symposium on Theory of Computing, pages 75–84 (2007)

References

  1. J.O. Achugbue, F.Y. Chin. The Effectiveness of Output Modification by Rounding for Protection of Statistical Databases. In INFOR 17(3), 209–218 (1979)
  2. N.R. Adam, J.C. Wortmann: Security-Control Methods for Statistical Databases. A Comparative Study. In Computing Surveys, ACM, 21(4), 515–556 (1989)
  3. D. Agrawal, C. Aggarwal: On the Design and Quantification of Privacy Preserving Data Mining Algorithms. In Proceedings of the 20th Symposium on Principles of Database Systems (2001)
  4. R. Agrawal, R. Srikant. Privacy-Preserving Data Mining. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 439–450 (2000)
  5. B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, K. Talwar. Privacy, Accuracy, and Consistency Too. A Holistic Solution to Contingency Table Release. In Proceedings of the 26th Symposium on Principles of Database Systems, pages 273–282 (2007)
  6. L.L. Beck. A Security Mechanism for Statistical Databases. In Transactions on Database Systems, ACM, 5(3), 316–338 (1980)
  7. A. Blum, C. Dwork, F. McSherry, K. Nissim. Practical Privacy. The SuLQ framework. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (2005-06)
  8. A. Blum, K. Ligett, A. Roth. A Learning Theory Approach to Non-Interactive Database Privacy. In Proceedings of the 40th ACM SIGACT Symposium on Theory of Computing (2008)
  9. S. Chawla, C. Dwork, F. McSherry, A. Smith, H. Wee. Toward Privacy in Public Databases. In Proceedings of the 2nd Theory of Cryptography Conference (2005)
  10. T. Dalenius. Towards a methodology for statistical disclosure control. In Statistik Tidskrift, 15, 429–444 (1977)
  11. D.E. Denning. Secure Statistical Databases with Random Sample Queries. In Transactions on Database Systems, ACM, 5(3), 291–315 (1980)
  12. D. Denning, P. Denning, M. Schwartz. The Tracker. A Threat to Statistical Database Security. In Transactions on Database Systems, ACM, 4(1), 76–96 (1979)
  13. I. Dinur, K. Nissim. Revealing Information While Preserving Privacy. In Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 202–210 (2003)
  14. G. Duncan. Confidentiality and statistical disclosure limitation. In N. Smelser, P. Baltes (editors) International Encyclopedia of the Social and Behavioral Sciences, Elsevier, New York (2001)
  15. C. Dwork. Differential Privacy. In, M. Bugliesi, B. Preneel, V. Sassone, I. Wegener (editors). Proceedings of the International Colloquium on Languages, Automata and Programming (ICALP). 2006. LNCS Vol. 4052, pp. 1–12. Springer, Heidelberg (2006)
  16. C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, M. Naor. Our Data, Ourselves. Privacy Via Distributed Noise Generation. In S. Vaudenay (editor), Proceedings of EUROCRYPT, 2006. LNCS, vol. 4004, pp. 486–503. Springer, Heidelberg (2006)
  17. C. Dwork, F. McSherry, K. Talwar. The Price of Privacy and the Limits of LP Decoding. In Proceedings of the 39th ACM Symposium on Theory of Computing, pp. 85–94 (2007)
  18. C. Dwork, K. Nissim. Privacy-Preserving Datamining on Vertically Partitioned Databases. In M. Franklin (editor), Proceedings of CRYPTO, 2004. LNCS, vol. 3152, pp. 528–544. Springer, Heidelberg (2004)
  19. C. Dwork, F. McSherry, K. Nissim, A. Smith. Calibrating Noise to Sensitivity in Private Data Analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pp. 265–284 (2006)
  20. C. Dwork, S. Yekhanin. New Efficient Attacks on Statistical Disclosure Control Mechanisms (manuscript, 2008)
  21. A.V. Evfimievski, J. Gehrke, R. Srikant. Limiting Privacy Breaches in Privacy Preserving Data Mining. In Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 211–222 (2003)
  22. D. Agrawal, C.C. Aggarwal. On the Design and Quantification of Privacy Preserving Data Mining Algorithms. In Proceedings of the 20th Symposium on Principles of Database Systems, pages 247–255 (2001)
  23. R. Agrawal, R. Srikant. Privacy-Preserving Data Mining. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 439–450 (2000)
  24. F.Y. Chin, G. Ozsoyoglu. Auditing and inference control in statistical databases. In Transactions on Software Engineering, IEEE, SE-8(6), 113–139 (1982)
  25. D. Dobkin, A. Jones, R. Lipton. Secure Databases. Protection Against User Influence. In Transactions on Databases, ACM, S 4(1), 97–106 (1979)
  26. I. Fellegi. On the question of statistical confidentiality. In Journal of the American Statistical Association 67, 7–18 (1972)
  27. S. Fienberg. Confidentiality and Data Protection Through Disclosure Limitation. Evolving Principles and Technical Advances. In IAOS Conference on Statistics, Development and Human Rights (2000-09)
  28. S. Fienberg, U. Makov, R. Steele. Disclosure Limitation and Related Methods for Categorical Data. In Journal of Official Statistics, 14, 485–502 (1998)
  29. L. Franconi, G. Merola. Implementing Statistical Disclosure Control for Aggregated Data Released Via Remote Access. In United Nations Statistical Commission and European Commission, joint ECE/EUROSTAT work session on statistical data confidentiality, Working Paper No. 30 (2003-04)
  30. S. Goldwasser, S. Micali. Probabilistic Encryption. J. Comput. Syst. Sci. 28(2), 270–299 (1984)
  31. D. Gusfield. A Graph Theoretic Approach to Statistical Data Security. In Journal of Computing, SIAM, 17(3), 552–571 (1988)
  32. S. Kasiviswanathan, H. Lee, K. Nissim, S. Raskhodnikova, S. Smith. What Can We Learn Privately? (manuscript, 2007)
  33. E. Lefons, A. Silvestri, F. Tangorra. An analytic approach to statistical databases. In Proceedings of the 9th International Conference on Very Large Data Bases (VLDB), 1983-10/1983-11, pages 260–274. Morgan Kaufmann, San Francisco (1983)
  34. A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam. l-Diversity. Privacy Beyond k-Anonymity. In Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006), page 24 (2006)
  35. F. McSherry, K. Talwar. Mechanism Design via Differential Privacy. In Proceedings of the 48th Annual Symposium on Foundations of Computer Science (FOCS) (2007)
  36. A. Narayanan, V. Shmatikov. How to Break Anonymity of the Netflix Prize Dataset
  37. K. Nissim, S. Raskhodnikova, A. Smith. Smooth Sensitivity and Sampling in Private Data Analysis. In Proceedings of the 39th ACM Symposium on Theory of Computing, pages 75–84 (2007)
  38. T.E. Raghunathan, J.P. Reiter, D.B. Rubin. Multiple Imputation for Statistical Disclosure Limitation. Journal of Official Statistics, 19(1), 1–16 (2003)
  39. S. Reiss. Practical Data Swapping. The First Steps. In Transactions on Database Systems, ACM, 9(1), 20–37 (1984)
  40. D.B. Rubin. Discussion. Statistical Disclosure Limitation. In Journal of Official Statistics, 9(2), 461–469 (1993)
  41. A. Shoshani. Statistical databases. Characteristics, problems and some solutions. In Proceedings of the 8th International Conference on Very Large Data Bases (VLDB 1982), pages 208–222 (1982)
  42. P. Samarati, L. Sweeney. Protecting Privacy when Disclosing Information. k-Anonymity and its Enforcement Through Generalization and Specialization, Technical Report SRI-CSL-98-04, SRI International. (1998)
  43. P. Samarati, L. Sweeney. Generalizing Data to Provide Anonymity when Disclosing Information (Abstract). In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 188 (1998)
  44. L. Sweeney. Weaving Technology and Policy Together to Maintain Confidentiality. in Journal of Law Medicine & Ethics, 25(2-3), 98–110 (1997)
  45. L. Sweeney. k-anonymity. A Model for Protecting Privacy. In International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), 557–570 (2002)
  46. L. Sweeney. Achieving k-Anonymity Privacy Protection Using Generalization and Suppression. In International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), pages 571–588 (2002)
  47. L.G. Valiant. A Theory of the Learnable. In Proceedings of the 16th Annual ACM SIGACT Symposium on Theory of Computing, pages 436–445 (1984)
  48. X. Xiao, Y. Tao. M-invariance. Towards privacy preserving re-publication of dynamic datasets. In Proceedings of ACM Special Interest Group on the Management of Data (SIGMOD) 2007, pages 689–700 (2007)

Differential Privacy | Dwork

  • Cynthia Dwork; “Differential privacy”. In Automata, Languages and Programming; Springer; 2008; pages 1-12; paywall.
  • Cynthia Dwork, Differential Privacy; In Proceedings of the International Colloquium on Automata, Languages and Programming (ICALP); 2006; pages 1–12; CiteSeer

Abstract

In 1977 Dalenius articulated a desideratum for statistical databases: nothing about an individual should be learnable from the database that cannot be learned without access to the database. We give a general impossibility result showing that a formalization of Dalenius’ goal along the lines of semantic security cannot be achieved. Contrary to intuition, a variant of the result threatens the privacy even of someone not in the database. This state of affairs suggests a new measure, differential privacy, which, intuitively, captures the increased risk to one’s privacy incurred by participating in a database. The techniques developed in a sequence of papers [8, 13, 3], culminating in those described in [12], can achieve any desired level of privacy under this measure. In many cases, extremely accurate information about the database can be provided while simultaneously ensuring very high levels of privacy.
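The measure itself, in its standard form (notation follows Dwork's papers):

```latex
% A randomized mechanism K gives epsilon-differential privacy if, for all
% pairs of datasets D_1, D_2 differing in at most one row, and for all
% S \subseteq \mathrm{Range}(\mathcal{K}):
\Pr[\mathcal{K}(D_1) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{K}(D_2) \in S]
```

Because the bound compares the mechanism's behavior on neighboring databases rather than an adversary's knowledge, it sidesteps the Dalenius-style impossibility: the guarantee holds whatever auxiliary information the adversary has.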

Highlighted

  1. A. Blum, C. Dwork, F. McSherry, K. Nissim. Practical privacy: The SuLQ framework. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 128–138, 2005-06.
  2. I. Dinur, K. Nissim. Revealing information while preserving privacy. In Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 202–210, 2003.
  3. C. Dwork, F. McSherry, K. Nissim, A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pages 265–284, 2006.
  4. C. Dwork, K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In Advances in Cryptology: Proceedings of Crypto, pages 528–544, 2004.

References

  1. N. R. Adam, J. C. Wortmann, Security-Control Methods for Statistical Databases: A Comparative Study, In ACM Computing Surveys 21(4): 515-556 (1989).
  2. R. Agrawal, R. Srikant. Privacy-preserving data mining. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 439–450, 2000.
  3. A. Blum, C. Dwork, F. McSherry, K. Nissim. Practical privacy: The SuLQ framework. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 128–138, 2005-06.
  4. S. Chawla, C. Dwork, F. McSherry, A. Smith, H. Wee. Toward privacy in public databases. In Proceedings of the 2nd Theory of Cryptography Conference, pages 363–385, 2005.
  5. S. Chawla, C. Dwork, F. McSherry, K. Talwar. On the utility of privacy-preserving histograms. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, 2005.
  6. T. Dalenius, Towards a methodology for statistical disclosure control.
    In Statistik Tidskrift, Volume 15, pp. 429–444, 1977.
  7. D. E. Denning, Secure statistical databases with random sample queries, In Transactions on Database Systems, ACM, 5(3):291–315, 1980-09.
  8. I. Dinur, K. Nissim. Revealing information while preserving privacy. In Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 202–210, 2003.
  9. D. Dobkin, A.K. Jones, R.J. Lipton, Secure databases: Protection against user influence. In Transactions on Database Systems, ACM. 4(1), pp. 97–106, 1979.
  10. Y. Dodis, L. Reyzin, A. Smith, Fuzzy extractors: How to generate strong keys from biometrics, other noisy data. In Proceedings of EUROCRYPT, 2004, pages 523–540.
  11. Y. Dodis, A. Smith, Correcting Errors Without Leaking Partial Information, In Proceedings of the 37th ACM Symposium on Theory of Computing, pp. 654–663, 2005.
  12. C. Dwork, F. McSherry, K. Nissim, A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pages 265–284, 2006.
  13. C. Dwork, K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In Advances in Cryptology: Proceedings of Crypto, pages 528–544, 2004.
  14. A. Evfimievski, J. Gehrke, R. Srikant. Limiting privacy breaches in privacy-preserving data mining. In Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 211–222, 2003-06.
  15. S. Goldwasser, S. Micali, Probabilistic encryption. Journal of Computer and System Sciences, 28, pp. 270–299, 1984; preliminary version appeared in Proceedings 14th Annual ACM Symposium on Theory of Computing, 1982.
  16. N. Nisan, D. Zuckerman. Randomness is linear in space. In Journal of Computing Systems Science, 52(1):43–52, 1996.
  17. Ronen Shaltiel. Recent developments in explicit constructions of extractors. In Bulletin of the EATCS, 77:67–95, 2002.
  18. Sweeney, L., Weaving technology, policy together to maintain confidentiality. In Journal of Law Medicine & Ethics, 1997. 25(2-3): pages 98-110.
  19. L. Sweeney, Achieving k-anonymity privacy protection using generalization and suppression. In International Journal on Uncertainty, Fuzziness, Knowledge-based Systems, 10(5), 2002; 571-588.

Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography | Backstrom, Dwork, Kleinberg

Lars Backstrom, Cynthia Dwork, Jon Michael Kleinberg; Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography, In Proceedings of the 16th International Conference on World Wide Web (WWW’07); 2007; pages 181-190 (10 pages); landing, ACM.

Abstract

In a social network, nodes correspond to people or other social entities, and edges correspond to social links between them. In an effort to preserve privacy, the practice of anonymization replaces names with meaningless unique identifiers. We describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes.

Trial

Daniel Jackoway; Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography by Backstrom, Dwork & Kleinberg; draft, perhaps in Some Venue; 2014-05-21; 13 pages.
A trial of the method … seems like a term paper.

Privacy, Big Data, and the Public Good: Frameworks for Engagement | Lane, Stodden, Bender

Julia Lane, Victoria Stodden, Stefan Bender (editors); Privacy, Big Data, and the Public Good: Frameworks for Engagement; Cambridge University Press; 2014-06-30; 340 pages; kindle: no, paper: $23+SHT; microsite.

Table of Contents

Introduction

I. Conceptual Framework

  1. Monitoring, Datafication, and Consent: Legal Approaches to Privacy in the Big Data Context; Katherine J. Strandburg (NYU).
  2. Big Data’s End Run around Anonymity and Consent; Solon Barocas, Helen Nissenbaum (NYU).
  3. The Economics and Behavioral Economics of Privacy; Alessandro Acquisti (CMU); Privacy is not about having something negative to hide ~15:00
  4. Changing the Rules: General Principles for Data Use and Analysis; Paul Ohm (Colorado).
  5. Enabling Reproducibility in Big Data Research: Balancing Confidentiality and Scientific Transparency; Victoria Stodden (Columbia).

II. Practical Framework

  1. The Value of Big Data for Urban Science; Steven E. Koonin, Michael J. Holland (CUSP).
  2. Data for the Public Good: Challenges and Barriers in the Context of Cities; Robert M. Goerge (Chicago).
  3. A European Perspective on Research and Big Data Access; Peter Elias (Warwick)
  4. The New Deal on Data: A Framework for Institutional Controls; Daniel Greenwood (MIT), Arkadiusz Stopczynski (DTU), Brian Sweatt (MIT), Thomas Hardjono (MIT),
    Alex Pentland (MIT).
  5. Engineered Controls for Dealing with Big Data; Carl Landwehr (George Washington).
  6. Portable Approaches to Informed Consent and Open Data; John Wilbanks (Sage Bionetworks & Kauffman Foundation).

III. Statistical Framework

  1. Extracting Information from Big Data: A Privacy and Confidentiality Perspective; Frauke Kreuter (Maryland), Roger Peng (Johns Hopkins).
  2. Using Statistics to Protect Privacy; Alan F. Karr (NISS), Jerome P. Reiter (Duke).
  3. Differential Privacy: A Cryptographic Approach to Private Data Analysis; Cynthia Dwork (Microsoft)

It’s Not Privacy, and It’s Not Fair | Dwork, Mulligan

Cynthia Dwork, Deirdre K. Mulligan; It’s Not Privacy, and It’s Not Fair; In Stanford Law Review; Print 66, Online 35; 2013-09-03; PDF; (a series or discussion venue) Privacy and Big Data.
  • Cynthia Dwork is Distinguished Scientist, Microsoft Research.
  • Deirdre K. Mulligan is Assistant Professor, School of Information, UC Berkeley; Co-Director, Berkeley Center for Law and Technology.