AI and ‘Enormous Data’ Could Make Tech Giants Harder to Topple | Wired

AI and ‘Enormous Data’ Could Make Tech Giants Harder to Topple; ; In Wired; 2017-07-13.

tl;dr → <quote>such releases don’t usually offer much of value to potential competitors. </quote> They are promotional and self-serving.



  • TensorFlow
  • Common Visual Data Foundation
    • open image data sets
    • A “nonprofit”
    • Sponsors
      • Facebook
      • Microsoft
  • Other data sets
    • from YouTube, by Google
    • from Wikipedia, by Salesforce


  • Google
  • Microsoft
  • and others!
    • Salesforce
    • Uber
  • Manifold, a boutique


  • Luke de Oliveira
    • partner, Manifold
    • temp staff, visitor, Lawrence Berkeley National Lab
  • Abhinav Gupta, Carnegie Mellon University (CMU)
  • Rachel Thomas, cofounder,


Enormous Data
  • “Are you kidding me? Do you even use computers?”
  • incumbents’ usual data advantage
  • innovative and un-monopolistic by disruption
  • Appears in the 1st paragraph


The Wikitext Long Term Dependency Language Modeling Dataset; On Some Site

  • an announcement, but WHEN?


In Wired


The Princeton Web Transparency And Accountability Project | Narayanan, Reisman

Arvind Narayanan, Dillon Reisman; The Princeton Web Transparency and Accountability Project; In Tania Cerquitelli, Daniele Quercia, Frank Pasquale (editors); Transparent Data Mining for Big and Small Data; Springer; 2017.

tl;dr → There be dragons. Princeton is there. Tell it! Testify!


When you browse the web, hidden “third parties” collect a large amount of data about your behavior. This data feeds algorithms to target ads to you, tailor your news recommendations, and sometimes vary prices of online products. The network of trackers comprises hundreds of entities, but consumers have little awareness of its pervasiveness and sophistication. This chapter discusses the findings and experiences of the Princeton Web Transparency Project, which continually monitors the web to uncover what user data companies collect, how they collect it, and what they do with it. We do this via a largely automated monthly “census” of the top 1 million websites, in effect “tracking the trackers”. Our tools and findings have proven useful to regulators and investigative journalists, and have led to greater public awareness, the cessation of some privacy-infringing practices, and the creation of new consumer privacy tools. But the work raises many new questions. For example, should we hold websites accountable for the privacy breaches caused by third parties? The chapter concludes with a discussion of such tricky issues and makes recommendations for public policy and regulation of privacy.
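
The census methodology reduces, at its core, to counting which third-party domains recur across first-party sites. A minimal sketch of that aggregation step, with an invented `base_domain` heuristic and sample visits (OpenWPM’s actual pipeline instruments a real browser and is far richer):

```python
from urllib.parse import urlparse
from collections import Counter

def base_domain(url):
    """Crude eTLD+1 approximation: keep the last two host labels."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host

def census(visits):
    """visits: iterable of (site_url, [resource_urls]).
    Counts, for each third-party domain, how many distinct
    first-party sites embed it -- i.e. 'tracking the trackers'."""
    prevalence = Counter()
    for site, resources in visits:
        first = base_domain(site)
        third_parties = {base_domain(r) for r in resources} - {first}
        prevalence.update(third_parties)
    return prevalence

# Invented example: two sites both embedding the same hypothetical tracker.
visits = [
    ("https://news.example.com/", ["https://cdn.tracker.com/t.js",
                                   "https://news.example.com/app.js"]),
    ("https://shop.example.org/", ["https://cdn.tracker.com/t.js"]),
]
print(census(visits).most_common(1))  # [('tracker.com', 2)]
```

Note the one subtlety the sketch preserves: a resource served from the site’s own domain is not counted as a third party.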


  • Marvin Minsky
  • expert systems
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Netflix
  • Self-Driving Cars
  • collect data first, ask questions later
  • surveillance infrastructure
  • Kafkaesque
  • data and algorithmic transparency
  • Workshop on Data and Algorithmic Transparency
  • Princeton Web Transparency and Accountability Project (WebTAP)
    Princeton Web Census
  • Privacy scholar
  • Ryan Calo
  • The Party System
  • first party
  • third party
  • Twitter
  • Facebook
  • Facebook Like Button
  • The Beauty and the Beast Project
  • Panopticlick
  • Anonymous
  • Pseudonymous
  • biases
  • discrimination
  • targeted political messaging
  • price discrimination
  • market manipulation
  • AdChoices
  • ad blockers
  • Federal Trade Commission (FTC)
  • Optimizely
  • A/B Testing
  • OpenWPM (Open Web Privacy Measurement)
  • FourthParty
  • FPDetective
  • PhantomJS
  • Firefox
  • Tor
  • Facebook Connect
  • Google Single Sign-On (SSO)
  • longitudinal studies
  • HTML5, Canvas API
  • canvas fingerprinting
  • AddThis
  • AudioContext API
  • WebRTC API
  • Battery Status API
  • NSA (National Security Agency)
  • Snowden
  • Cookies
  • transitive cookie linking
  • cookie syncing
  • Google
  • Facebook
  • Federal Trade Commission (FTC)
  • Cross-Device Tracking
  • header enrichment (by ISPs)
  • Ghostery
  • AdBlock Plus
  • uBlock Origin
  • machine learning classifier (for tracking behavior)
  • Big Data (they used Big Data and Machine Learning Classifiers)
  • Nudge (a book)
  • Choice Architecture
  • 3rd Party Cookies, blocking 3rd party cookies
  • Do Not Track
  • Battery API
  • Internet Explorer
  • zero sum game
  • power user interfaces
  • PGP (Pretty Good Privacy)
  • Cookie Blocking
  • <buzz>long tail (of innovation)</buzz>
  • Children’s Online Privacy Protection Act (COPPA)
  • child-directed websites.
  • American Civil Liberties Union (ACLU)
  • Computer Fraud and Abuse Act
  • Personally-Identifiable Information (PII)
  • shift of power, from 3rd parties to publishers
  • Columbia University
  • Carnegie Mellon University
  • Internet of Things (IoT)
  • WiFi
  • cross-device tracking
  • smartphone app
  • Fairness, Accountability and Transparency in Machine Learning (FAT-ML)
  • Princeton
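
Most of the fingerprinting APIs in the list above (Canvas, AudioContext, Battery Status) feed the same basic mechanism: hash many quasi-stable browser attributes into one identifier. A toy sketch with invented attribute values; real libraries such as fingerprintjs2 combine dozens of signals, including rendered canvas pixels:

```python
import hashlib

def fingerprint(attributes):
    """Hash a dict of browser attributes into a short, stable ID."""
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Invented attribute sets for two browsers differing in one value.
browser_a = {"user_agent": "Mozilla/5.0 ...", "timezone": "UTC-8",
             "fonts": "Arial,Helvetica", "canvas_hash": "af31"}
browser_b = dict(browser_a, timezone="UTC-5")

assert fingerprint(browser_a) == fingerprint(browser_a)  # stable across visits
assert fingerprint(browser_a) != fingerprint(browser_b)  # distinguishes browsers
```

The privacy problem is exactly this pair of properties: stability (the same browser always hashes the same) plus distinguishability (few browsers collide), with no cookie to delete.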


  • “The best minds of our generation are thinking about how to make people click on ads” attributed to Jeff Hammerbacher


  • Crevier D (1993) AI: The tumultuous history of the search for artificial intelligence. Basic Books, Inc.
  • Engle Jr RL, Flehinger BJ (1987) Why expert systems for medical diagnosis are not being generally used: a valedictory opinion. Bulletin of the New York Academy of Medicine 63(2):193
  • Vance A (2011) This tech bubble is different. Bloomberg
  • Angwin J (2016) Machine bias: Risk assessments in criminal sentencing. ProPublica
  • Levin S (2016) A beauty contest was judged by AI and the robots didn’t like dark skin. The Guardian
  • Solove DJ (2001) Privacy and power: Computer databases and metaphors for information privacy. Stanford Law Review pp 1393–1462
  • Marthews A, Tucker C (2015) Government surveillance and internet search behavior. ssrn:2412564
  • Hannak A, Soeller G, Lazer D, Mislove A, Wilson C (2014) Measuring price discrimination and steering on e-commerce web sites. In: Proceedings of the 2014 Conference on Internet Measurement Conference, ACM, pp 305–318
  • Calo R (2013) Digital market manipulation. University of Washington School of Law Research Paper 2013-27 DOI 10.2139/ssrn.2309703 ssrn:2309703
  • Mayer JR, Mitchell JC (2012) Third-party web tracking: Policy and technology. In: Proceedings of the 2012 IEEE Symposium on Security and Privacy, IEEE, pp 413–427
  • Angwin J (2010) The web’s new gold mine: Your secrets. The Wall Street Journal
  • Lerner A, Simpson AK, Kohno T, Roesner F (2016) Internet jones and the raiders of the lost trackers: An archaeological study of web tracking from 1996 to 2016. In: Proceedings of the 25th USENIX Security Symposium (USENIX Security 16)
  • Laperdrix P, Rudametkin W, Baudry B (2016) Beauty and the beast: Diverting modern web browsers to build unique browser fingerprints. In: Proceedings of the 37th IEEE Symposium on Security and Privacy (S&P 2016)
  • Eckersley P (2010) How unique is your web browser? In: International Symposium on Privacy Enhancing Technologies Symposium, Springer, pp 1–18
  • Acar G, Van Alsenoy B, Piessens F, Diaz C, Preneel B (2015) Facebook tracking through social plug-ins. Technical report prepared for the Belgian Privacy Commission
  • Starov O, Gill P, Nikiforakis N (2016) Are you sure you want to contact us? quantifying the leakage of pii via website contact forms. In: Proceedings on Privacy Enhancing Technologies 2016(1):20–33
  • Krishnamurthy B, Naryshkin K, Wills C (2011) Privacy leakage vs. protection measures: the growing disconnect. In: Proceedings of the Web, vol 2, pp 1–10
  • Su J, Shukla A, Goel S, Narayanan A (2017) De-anonymizing web browsing data with social networks, manuscript
  • Barocas S, Nissenbaum H (2014) Big data’s end run around procedural privacy protections. In Communications of the ACM 57-11:31-33
  • Shilton K, Greene D (2016) Because privacy: defining and legitimating privacy in ios development. In IConference 2016 Proceedings
  • Storey G, Reisman D, Mayer J, Narayanan A (2016) The future of ad blocking: Analytical framework and new techniques, manuscript
  • Narayanan A (2016) Can Facebook really make ads unblockable? In Freedom to Tinker
  • Storey G (2016) Facebook ad highlighter.
  • Reisman D (2016) A peek at A/B testing in the wild. In Freedom to Tinker
  • Acar G, Juarez M, Nikiforakis N, Diaz C, Gürses S, Piessens F, Preneel B (2013) Fpdetective: dusting the web for fingerprinters. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, ACM, pp 1129–1140
  • Englehardt S, Narayanan A (2016) Online tracking: A 1-million-site measurement and analysis. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer & Communications Security
  • Selenium HQ (2016) Selenium browser automation faq.
  • Acar G, Eubank C, Englehardt S, Juarez M, Narayanan A, Diaz C (2014) The web never forgets. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS ’14) DOI 10.1145/2660267.2660347
  • Mowery K, Shacham H (2012) Pixel perfect: Fingerprinting canvas in html5. In Proceedings of W2SP
  • Vasilyev V (2016) Fingerprintjs2 — modern & flexible browser fingerprinting library, a successor to the original fingerprintjs.
  • Olejnik Ł, Acar G, Castelluccia C, Diaz C (2015) The leaking battery. In: International Workshop on Data Privacy Management, Springer, pp 254–263
  • Englehardt S, Narayanan A (2016) Online tracking: A 1-million-site measurement and analysis. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ’16)
  • Doty N (2016) Mitigating browser fingerprinting in web specifications.
  • Soltani A, Peterson A, Gellman B (2013) NSA uses Google cookies to pinpoint targets for hacking. In The Washington Post.
  • Englehardt S, Reisman D, Eubank C, Zimmerman P, Mayer J, Narayanan A, Felten EW (2015) Cookies that give you away. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15) DOI 10.1145/2736277.2741679
  • Angwin J (2016) Google has quietly dropped ban on personally identifiable web tracking. ProPublica
  • Reitman R (2012) What actually changed in Google’s privacy policy. Electronic Frontier Foundation
  • Simonite T (2015) Facebook’s like buttons will soon track your web browsing to target ads. MIT Technology Review
  • Federal Trade Commission (2015) Cross-device tracking.
  • Maggi F, Mavroudis V (2016) Talking behind your back attacks & countermeasures of ultrasonic cross-device tracking, In Proceedings of Blackhat
  • Angwin J (2014) Why online tracking is getting creepier. ProPublica
  • Vallina-Rodriguez N, Sundaresan S, Kreibich C, Paxson V (2015) Header enrichment or ISP enrichment? In Proceedings of the 2015 ACM SIGCOMM Workshop on Hot Topics in Middleboxes and Network Function Virtualization (HotMiddlebox ’15) DOI 10.1145/2785989.2786002
  • Disconnect (2016) Disconnect blocks new tracking device that makes your computer draw a unique image.
  • Electronic Frontier Foundation (2016) Privacy Badger.
  • Thaler RH, Sunstein CR (2008) Nudge: improving decisions about health, wealth, and happiness. Yale University Press
  • Fleishman G (2015) Hands-on with content blocking safari extensions in ios 9. Macworld.
  • Blink, Chromium (2016) Owp storage team sync.
  • Lynch B (2012) Do not track in the windows 8 setup experience – microsoft on the issues. Microsoft on the Issues
  • Hern A (2016) Firefox disables loophole that allows sites to track users via battery status. The Guardian
  • Mozilla (2015) Tracking protection in private browsing.
  • Mozilla (2016) Security/contextual identity project/containers.
  • Federal Trade Commission (2012) Google will pay $22.5 million to settle FTC charges it misrepresented privacy assurances to users of apple’s safari internet browser.
  • Federal Trade Commission (2016) Children’s online privacy protection rule (“COPPA”).
  • New York State Office of the Attorney General (2016) A.G. schneiderman announces results of “operation child tracker,” ending illegal online tracking of children at some of nation’s most popular kids’ websites.
  • American Civil Liberties Union (2016) Sandvig v. Lynch.
  • Eubank C, Melara M, Perez-Botero D, Narayanan A (2013) Shining the floodlights on mobile web tracking a privacy survey.
  • CMU CHIMPS Lab (2015) Privacy grade: Grading the privacy of smartphone apps.
  • Vanrykel E, Acar G, Herrmann M, Diaz C (2016) Leaky birds: Exploiting mobile application traffic for surveillance. In Proceedings of Financial Cryptography and Data Security 2016
  • Lécuyer M, Ducoffe G, Lan F, Papancea A, Petsios T, Spahn R, Chaintreau A, Geambasu R (2014) Xray: Enhancing the webs transparency with differential correlation. In: Proceedings of the 23rd USENIX Security Symposium (USENIX Security 14), pp 49–64
  • Lecuyer M, Spahn R, Spiliopolous Y, Chaintreau A, Geambasu R, Hsu D (2015) Sunlight: Fine-grained targeting detection at scale with statistical confidence. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ACM, pp 554–566
  • Tschantz MC, Datta A, Datta A, Wing JM (2015) A methodology for information flow experiments. In: Proceedings of the 2015 IEEE 28th Computer Security Foundations Symposium, IEEE, pp 554–568
  • Datta A, Sen S, Zick Y (2016) Algorithmic transparency via quantitative input influence. In: Proceedings of 37th IEEE Symposium on Security and Privacy
  • Chen L, Mislove A, Wilson C (2015) Peeking beneath the hood of uber. In: Proceedings of the 2015 ACM Conference on Internet Measurement Conference, ACM, pp 495–508
  • Valentino-Devries J, Singer-Vine J, Soltani A (2012) Websites vary prices, deals based on users information. In The Wall Street Journal 10:60–68
  • Android Studio User Guide (2016) UI/Application Exerciser Monkey.
  • Rastogi V, Chen Y, Enck W (2013) Appsplayground: automatic security analysis of smartphone applications. In: Proceedings of the third ACM Conference on Data and Application Security and Privacy, ACM, pp 209–220
  • Enck W, Gilbert P, Han S, Tendulkar V, Chun BG, Cox LP, Jung J, McDaniel P, Sheth AN (2014) Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. In ACM Transactions on Computer Systems (TOCS) 32(2):5
  • Ren J, Rao A, Lindorfer M, Legout A, Choffnes D (2015) Recon: Revealing and controlling privacy leaks in mobile network traffic. arXiv:1507.00255.
  • Razaghpanah A, Vallina-Rodriguez N, Sundaresan S, Kreibich C, Gill P, Allman M, Paxson V (2015) Haystack: in situ mobile traffic analysis in user space. arXiv:1510.01419.
  • Sweeney L (2013) Discrimination in online ad delivery. Queue 11(3):10
  • Caliskan-Islam A, Bryson J, Narayanan A (2016) Semantics derived automatically from language corpora necessarily contain human biases. arXiv:1608.07187.



How to Call B.S. on Big Data: A Practical Guide | New Yorker

How to Call B.S. on Big Data: A Practical Guide; ; In The New Yorker; 2017-06-03.


INFO 198/BIOL 106B, “Calling Bullshit in the Age of Big Data,” a course, University of Washington (Washington State; located in Seattle, WA). Instructors: Jevin West (information), Carl Bergstrom (biology)




In The New Yorker

The Death of Rules and Standards | Casey, Niblett

Anthony J. Casey, Anthony Niblett; The Death of Rules and Standards; Coase-Sandor Working Paper Series in Law and Economics No. 738; Law School, University of Chicago; 2015; 58 pages; landing, copy, ssrn:2693826, draft.


Scholars have examined the lawmakers’ choice between rules and standards for decades. This paper, however, explores the possibility of a new form of law that renders that choice unnecessary. Advances in technology (such as big data and artificial intelligence) will give rise to this new form – the micro-directive – which will provide the benefits of both rules and standards without the costs of either.

Lawmakers will be able to use predictive and communication technologies to enact complex legislative goals that are translated by machines into a vast catalog of simple commands for all possible scenarios. When an individual citizen faces a legal choice, the machine will select from the catalog and communicate to that individual the precise context-specific command (the micro-directive) necessary for compliance. In this way, law will be able to adapt to a wide array of situations and direct precise citizen behavior without further legislative or judicial action. A micro-directive, like a rule, provides a clear instruction to a citizen on how to comply with the law. But, like a standard, a micro-directive is tailored to and adapts to each and every context.

While predictive technologies such as big data have already introduced a trend toward personalized default rules, in this paper we suggest that this is only a small part of a larger trend toward context-specific laws that can adapt to any situation. As that trend continues, the fundamental cost trade-off between rules and standards will disappear, changing the way society structures and thinks about law.
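
The micro-directive idea can be caricatured in a few lines: a machine maps a context onto the precise command that implements a vague standard (“drive safely”). The contexts and thresholds below are invented purely for illustration:

```python
def micro_directive(context):
    """Translate the standard 'drive safely' into a precise,
    context-specific speed command (values are invented)."""
    limit = 65
    if context.get("weather") == "rain":
        limit -= 15
    if context.get("school_zone"):
        limit = min(limit, 25)
    if context.get("night"):
        limit -= 5
    return f"Drive at or below {limit} mph"

print(micro_directive({"weather": "rain", "night": True}))
# Drive at or below 45 mph
```

Like a rule, the output is a clear instruction; like a standard, it adapts to every context, which is exactly the combination the paper claims.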

Separately noted.

Does Television Viewership Predict Presidential Election Outcomes? | Barfar, Padmanabhan

Barfar Arash, Padmanabhan Balaji (University of South Florida, Tampa); Does Television Viewership Predict Presidential Election Outcomes?; In Big Data, 3(3); 2015-09-16; pages 138-147 (10 pages). doi:10.1089/big.2015.0008; landing; slides (19 slides).

tl;dr → Sort of. Given viewership data from the 24 hrs prior to the election, the model is 80% accurate in swing states. It took a year (post-election) to identify the existence of such a model for 2012.


The days of surprise about actual election outcomes in the big data world are likely to be fewer in the years ahead, at least to those who may have access to such data. In this paper we highlight the potential for forecasting the United States presidential election outcomes at the state and county levels based solely on the data about viewership of television programs. A key consideration for relevance is that given the infrequent nature of elections, such models are useful only if they can be trained using recent data on viewership. However, the target variable (election outcome) is usually not known until the election is over. Related to this, we show here that such models may be trained with the television viewership data in the “safe” states (the ones where the outcome can be assumed even in the days preceding elections) to potentially forecast the outcomes in the swing states. In addition to their potential to forecast, these models could also help campaigns target programs for advertisements. Nearly two billion dollars were spent on television advertising in the 2012 presidential race, suggesting potential for big data–driven optimization of campaign spending.
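
The paper’s central trick is to train where the label is already known (the “safe” states) and predict where it is not (the swing states). A nearest-centroid sketch over invented viewership features; the paper’s actual models and data are, of course, richer:

```python
def centroid(rows):
    """Componentwise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(safe_states):
    """safe_states: {label: [viewership feature vectors]} -> centroids."""
    return {label: centroid(rows) for label, rows in safe_states.items()}

def predict(model, x):
    """Assign a swing state to the label with the nearest centroid."""
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: dist(model[label]))

# Invented features: shares of viewers watching (news, reality TV) per state.
safe = {"D": [[0.60, 0.20], [0.55, 0.25]],
        "R": [[0.30, 0.50], [0.35, 0.45]]}
model = train(safe)
print(predict(model, [0.5, 0.3]))  # a swing state resembling the D centroid
```

The point of the sketch is the data flow, not the classifier: labels come only from safe states, yet the prediction is made for a state whose outcome is unknown.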



  1. Ridout TN, Franz M, Goldstein KM, Feltus WJ. Separation by television program: Understanding the targeting of political advertising in presidential elections. In Political Communication 2012; 29:1–23.
  2. Overby, P.; National Public Radio. A review of 2012 Confirms a ‘pulverizing’ level of political ads. Available through 2015-03-01.
  3. Gordon BR, Hartmann WR. Advertising competition in presidential elections. Research Papers 2131. Stanford, CA: Stanford University Graduate School of Business, 2014.
  4. Graefe A. Issue and leader voting in US presidential elections. In Electoral Studies 2013; 32:644–657.
  5. Lovett M, Peress M. Targeting political advertising on television. Rochester, NY: University of Rochester, 2010, 2014.
  6. Facebook. Politics and culture on Facebook in the 2014 midterm elections. Available through 2015-03-01.
  7. Mendenhall W, Sincich T. A second course in statistics: Regression analysis. New Jersey: Pearson, 2012.
  8. Gamboa LF, García-Suaza A, Otero J. Statistical inference for testing Gini coefficients: An application for Colombia. In Ensayos sobre Politica Economica 2010; 28:226–241.
  9. Taylor, C. 2012-11-07. Mashable; Triumph of the nerds: Nate Silver wins in 50 states.

Some lenders are judging you on much more than finances | LAT

Some lenders are judging you on much more than finances; James Rufus Koren; In The Los Angeles Times (LAT); 2015-12-10.

tl;dr → alternative scoring products, propensity scoring, (not-)credit reports.



For color, background & verisimilitude

  • Douglas Merrill, founder and chief executive, ZestFinance
  • Asim Khwaja, professor of international finance and development, Kennedy School, Harvard University.
  • Chi Chi Wu, attorney, National Consumer Law Center.
  • Teresa Jackson, vice president of credit, Social Finance (SoFi).
  • Alfonso Brigham, exemplar; customer of Social Finance (SoFi).
  • Phil Marleau, CEO, IOU Financial.
  • Eric Haller, executive vice president, Experian Data Labs.


Basix, ZestFinance

  • Basix, a lender
  • ZestFinance
    • a holding company
    • Hollywood, CA
    • owns & operates Basix
    • Douglas Merrill, founder and chief executive
  • “All data is credit data”
  • Customers
    •, CN
  • Scheme
    • <quote>ZestFinance collects thousands of pieces of consumer information — some submitted in an online application, some obtained from data brokers — and runs them through algorithms that judge how likely it is a borrower will repay.</quote>
  • Douglas Merrill
    • founder and chief executive, ZestFinance
    • ex-Google, role unspecified.
    • ex-Rand Corp, a research role.
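
“All data is credit data” in miniature: a logistic score over many weak, heterogeneous signals. The features and weights below are invented; ZestFinance’s actual models are proprietary:

```python
import math

def repay_probability(features, weights, bias=0.0):
    """Logistic score over arbitrary applicant features.
    Features without a known weight simply contribute nothing."""
    z = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
    return 1 / (1 + math.exp(-z))

# Invented weak signals, including a behavioral one of the kind
# alternative scorers reportedly use.
weights = {"income_k": 0.02, "months_at_address": 0.01,
           "typed_application_in_caps": -0.8}
applicant = {"income_k": 50, "months_at_address": 24,
             "typed_application_in_caps": 1}
p = repay_probability(applicant, weights)
print(f"estimated repayment probability: {p:.2f}")
```

The design point is that “thousands of pieces of consumer information” slot in as extra keys in `features`; the model’s shape does not change as the data broadens.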

Social Finance (SoFi)

  • Social Finance (SoFi)
  • San Francisco, CA
  • a lender (a loan broker?)
  • founded 2011
  • 4 co-founders
    backgrounds in finance, software and business consulting.
  • Teresa Jackson, vice president of credit, SoFi.
  • Funding
    • $1B (with a ‘b’)
    • “including” SoftBank
  • Scheme
    • does not monitor social media
  • Exemplars
    • Alfonso Brigham
      • bachelor’s degree, business administration, USC 2005.
      • has a job
      • applied for a mortgage
        • $711,000 loan
        • one-bedroom condo in Nob Hill, San Francisco

IOU Financial

  • IOU Financial
  • Montreal, Canada
  • publicly traded (where?)
  • online (only?)
  • B2B
  • a lender (a loan broker?)
  • Scheme
    • monitor social media
    • count & correlate bad reviews
  • Phil Marleau, CEO


  • Experian Data Labs, “a research unit”
  • San Diego, CA
  • Eric Haller, executive vice president, Experian Data Labs
  • Scheme
    • monitor social media
  • <quote>The firm’s data scientists took business credit information and combined it with information from Twitter, Facebook, Yelp and others. Based on that analysis, the firm is working on a credit-scoring system that could be based solely on social media information.</quote>

Big Data’s Disparate Impact | Barocas, Selbst

Solon Barocas (Princeton University), Andrew D. Selbst (U.S. Court of Appeals); Big Data’s Disparate Impact; California Law Review, Vol. 104, 2016 (to appear); 62 pages; ssrn:2477899; 2015-08-14.


Big data claims to be neutral. It isn’t.

Advocates of algorithmic techniques like data mining argue that they eliminate human biases from the decision-making process. But an algorithm is only as good as the data it works with. Data mining can inherit the prejudices of prior decision-makers or reflect the widespread biases that persist in society at large. Often, the “patterns” it discovers are simply preexisting societal patterns of inequality and exclusion. Unthinking reliance on data mining can deny members of vulnerable groups full participation in society. Worse still, because the resulting discrimination is almost always an unintentional emergent property of the algorithm’s use rather than a conscious choice by its programmers, it can be unusually hard to identify the source of the problem or to explain it to a court.

This Article examines these concerns through the lens of American anti-discrimination law — more particularly, through Title VII’s prohibition on discrimination in employment. In the absence of a demonstrable intent to discriminate, the best doctrinal hope for data mining’s victims would seem to lie in disparate impact doctrine. Case law and the EEOC’s Uniform Guidelines, though, hold that a practice can be justified as a business necessity where its outcomes are predictive of future employment outcomes, and data mining is specifically designed to find such statistical correlations. As a result, Title VII would appear to bless its use, even though the correlations it discovers will often reflect historic patterns of prejudice, others’ discrimination against members of vulnerable groups, or flaws in the underlying data.

Addressing the sources of this unintentional discrimination and remedying the corresponding deficiencies in the law will be difficult technically, difficult legally, and difficult politically. There are a number of practical limits to what can be accomplished computationally. For example, where the discrimination occurs because the data being mined is itself a result of past intentional discrimination, there is frequently no obvious method to adjust historical data to rid it of this taint. Corrective measures that alter the results of the data mining after it is complete would tread on legally and politically disputed terrain. These challenges for reform throw into stark relief the tension between the two major theories underlying anti-discrimination law: nondiscrimination and anti-subordination. Finding a solution to big data’s disparate impact will require more than best efforts to stamp out prejudice and bias; it will require wholesale reexamination of the meanings of “discrimination” and “fairness.”
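
The EEOC’s Uniform Guidelines, cited in the abstract, operationalize disparate impact via the “four-fifths rule”: a group’s selection rate below 80% of the highest group’s rate is evidence of adverse impact. A minimal check over invented hiring counts:

```python
def selection_rates(outcomes):
    """outcomes: {group: (selected, total)} -> {group: rate}"""
    return {g: s / t for g, (s, t) in outcomes.items()}

def four_fifths_violation(outcomes, threshold=0.8):
    """Flag groups whose selection rate falls below `threshold`
    times the highest group's rate (the EEOC four-fifths rule)."""
    rates = selection_rates(outcomes)
    top = max(rates.values())
    return {g: r / top < threshold for g, r in rates.items()}

# Invented data: group B is selected at half group A's rate.
outcomes = {"A": (50, 100), "B": (25, 100)}
print(four_fifths_violation(outcomes))  # {'A': False, 'B': True}
```

Note what the check cannot see, which is the article’s point: it flags the disparity but says nothing about whether a facially neutral feature (a proxy) produced it.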

Table of Contents

    1. Defining the “Target Variable” and “Class Labels”
    2. Training Data
      1. Labeling Examples
      2. Data Collection
    3. Feature Selection
    4. Proxies
    5. Masking
    1. Disparate Treatment
    2. Disparate Impact
    3. Masking and Problems of Proof
    1. Internal Difficulties
      1. Defining the Target Variable
      2. Training Data
      3. Feature Selection
      4. Proxies
    2. External Difficulties

Reality Mining: Using Big Data to Engineer a Better World | Eagle, Greene

Nathan Eagle, Kate Greene; Reality Mining: Using Big Data to Engineer a Better World; The MIT Press; 1st edition; 2014-08-01; 208 pages; kindle: $20, paper: $19+SHT.

From the abstract on A….

  • Eagle, a recognized expert in the field,
  • Greene, an experienced technology journalist.

Why big data evangelists should be sent to re-education camps | ZDNet

Why big data evangelists should be sent to re-education camps; In ZDNet; 2014-09-19.
Summary: Big data is a dangerous, faith-based ideology. It’s fuelled by hubris, it’s ignorant of history, and it’s trashing decades of progress in social justice.