How to Call B.S. on Big Data: A Practical Guide | New Yorker

How to Call B.S. on Big Data: A Practical Guide; In The New Yorker; 2017-06-03.


INFO 198/BIOL 106B – “Calling Bullshit in the Age of Big Data,” a course at the University of Washington, Seattle, WA. Instructors: Jevin West (information science), Carl Bergstrom (biology).





The Death of Rules and Standards | Casey, Niblett

Anthony J. Casey, Anthony Niblett; The Death of Rules and Standards; Coase-Sandor Working Paper Series in Law and Economics No. 738; Law School, University of Chicago; 2015; 58 pages; landing, copy, ssrn:2693826, draft.


Scholars have examined the lawmakers’ choice between rules and standards for decades. This paper, however, explores the possibility of a new form of law that renders that choice unnecessary. Advances in technology (such as big data and artificial intelligence) will give rise to this new form – the micro-directive – which will provide the benefits of both rules and standards without the costs of either.

Lawmakers will be able to use predictive and communication technologies to enact complex legislative goals that are translated by machines into a vast catalog of simple commands for all possible scenarios. When an individual citizen faces a legal choice, the machine will select from the catalog and communicate to that individual the precise context-specific command (the micro-directive) necessary for compliance. In this way, law will be able to adapt to a wide array of situations and direct precise citizen behavior without further legislative or judicial action. A micro-directive, like a rule, provides a clear instruction to a citizen on how to comply with the law. But, like a standard, a micro-directive is tailored to and adapts to each and every context.
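The mechanism can be caricatured in a few lines: a machine evaluates the legislative goal against the observed context and emits a single bright-line command. The sketch below is an invented illustration of the idea, not anything from the paper; the contexts, thresholds, and the `micro_directive` function are all hypothetical.

```python
# Toy illustration (not from the paper): a legislative standard
# ("drive reasonably for conditions") compiled in advance into a
# catalog of context-specific commands -- the "micro-directives".

def micro_directive(weather: str, traffic: str) -> str:
    """Map an observed driving context to one precise command.

    The catalog is a stand-in for the 'vast catalog of simple
    commands' the paper imagines a predictive system generating.
    """
    catalog = {
        ("clear", "light"): "Speed limit: 75 mph",
        ("clear", "heavy"): "Speed limit: 55 mph",
        ("rain",  "light"): "Speed limit: 55 mph",
        ("rain",  "heavy"): "Speed limit: 40 mph",
        ("ice",   "light"): "Speed limit: 30 mph",
        ("ice",   "heavy"): "Speed limit: 20 mph",
    }
    # Like a rule: the citizen receives one clear instruction.
    # Like a standard: the instruction is tailored to the context.
    return catalog[(weather, traffic)]

print(micro_directive("rain", "heavy"))
```

The citizen never consults the underlying standard; the tailoring happens upstream, which is exactly why the rules-versus-standards trade-off dissolves.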

While predictive technologies such as big data have already introduced a trend toward personalized default rules, in this paper we suggest that this is only a small part of a larger trend toward context-specific laws that can adapt to any situation. As that trend continues, the fundamental cost trade-off between rules and standards will disappear, changing the way society structures and thinks about law.

Separately noted.

Does Television Viewership Predict Presidential Election Outcomes? | Barfar, Padmanabhan

Arash Barfar, Balaji Padmanabhan (University of South Florida, Tampa); Does Television Viewership Predict Presidential Election Outcomes?; In Big Data, 3(3); 2015-09-16; pages 138-147 (10 pages). doi:10.1089/big.2015.0008; landing; slides (19 slides).

tl;dr → Sort of. Given viewership data from the 24 hours prior to the election, in swing states, the model is 80% accurate. It took about a year (post-election) to identify that such a model exists for 2012.


The days of surprise about actual election outcomes in the big data world are likely to be fewer in the years ahead, at least to those who may have access to such data. In this paper we highlight the potential for forecasting the United States presidential election outcomes at the state and county levels based solely on the data about viewership of television programs. A key consideration for relevance is that given the infrequent nature of elections, such models are useful only if they can be trained using recent data on viewership. However, the target variable (election outcome) is usually not known until the election is over. Related to this, we show here that such models may be trained with the television viewership data in the “safe” states (the ones where the outcome can be assumed even in the days preceding elections) to potentially forecast the outcomes in the swing states. In addition to their potential to forecast, these models could also help campaigns target programs for advertisements. Nearly two billion dollars were spent on television advertising in the 2012 presidential race, suggesting potential for big data–driven optimization of campaign spending.
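The train-on-safe-states trick can be sketched with a toy nearest-centroid classifier. Every number and feature below is invented for illustration; the paper's actual viewership data and models are far richer.

```python
# Toy sketch (synthetic numbers, not the paper's data or model):
# train on "safe" states whose outcome is assumed before election
# day, then forecast "swing" states from viewership features alone.

# Invented feature vector: [share of cable-news viewing, share of
# reality-show viewing]; label: 1 = party A wins, 0 = party B wins.
safe_states = {
    "A1": ([0.70, 0.30], 1), "A2": ([0.65, 0.35], 1), "A3": ([0.75, 0.25], 1),
    "B1": ([0.30, 0.70], 0), "B2": ([0.35, 0.65], 0), "B3": ([0.25, 0.75], 0),
}
swing_states = {"S1": [0.62, 0.38], "S2": [0.41, 0.59]}

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_centroid_fit(labeled):
    # Group feature vectors by known (safe-state) outcome.
    by_label = {0: [], 1: []}
    for features, label in labeled.values():
        by_label[label].append(features)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model, features):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: dist(model[label], features))

model = nearest_centroid_fit(safe_states)
forecasts = {state: predict(model, f) for state, f in swing_states.items()}
print(forecasts)  # each swing state gets the label of the nearer centroid
```

The point of the sketch is the workflow, not the classifier: the target variable is unavailable for swing states before election day, so the model borrows labels from states whose outcome is already a foregone conclusion.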




Some lenders are judging you on much more than finances | LAT

Some lenders are judging you on much more than finances; James Rufus Koren; In The Los Angeles Times (LAT); 2015-12-10.

tl;dr → alternative scoring products, propensity scoring, (not-)credit reports.



For color, background & verisimilitude

  • Douglas Merrill, founder and chief executive, ZestFinance
  • Asim Khwaja, professor of international finance and development, Kennedy School, Harvard University.
  • Chi Chi Wu, attorney, National Consumer Law Center.
  • Teresa Jackson, vice president of credit, Social Finance (SoFi).
  • Alfonso Brigham, exemplar; customer of Social Finance (SoFi).
  • Phil Marleau, CEO, IOU Financial.
  • Eric Haller, executive vice president, Experian Data Labs.


Basix, ZestFinance

  • Basix, a lender
  • ZestFinance
    • a holding company
    • Hollywood, CA
    • owns & operates Basix
    • Douglas Merrill, founder and chief executive
    • “All data is credit data”
  • Customers
    • CN
  • Scheme
    • <quote>ZestFinance collects thousands of pieces of consumer information — some submitted in an online application, some obtained from data brokers — and runs them through algorithms that judge how likely it is a borrower will repay.</quote>
  • Douglas Merrill
    • founder and chief executive, ZestFinance
    • ex-Google, role unspecified.
    • ex-Rand Corp, a research role.

Social Finance (SoFi)

  • Social Finance (SoFi)
  • San Francisco, CA
  • a lender (a loan broker?)
  • founded 2011
  • 4 co-founders
    backgrounds in finance, software and business consulting.
  • Teresa Jackson, vice president of credit, SoFi.
  • Funding
    • $1B (with a ‘b’)
    • “including” SoftBank
  • Scheme
    • does not monitor social media
  • Exemplars
    • Alfonso Brigham
      • bachelor’s degree, business administration, USC 2005.
      • has a job
      • applied for a mortgage
        • $711,000 loan
        • one-bedroom condo in Nob Hill, San Francisco

IOU Financial

  • IOU Financial
  • Montreal, Canada
  • publicly traded (where?)
  • online (only?)
  • B2B
  • a lender (a loan broker?)
  • Scheme
    • monitor social media
    • count & correlate bad reviews
  • Phil Marleau, CEO
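The article says only that IOU Financial counts and correlates bad reviews; the threshold mechanism below is an invented sketch of how such a signal might be turned into a risk flag.

```python
# Hypothetical sketch (mechanism invented for illustration): flag a
# business borrower whose share of negative recent reviews crosses
# a threshold.

def review_risk_flag(reviews, threshold=0.4):
    """reviews: list of star ratings (1-5). Return True if the share
    of negative reviews (<= 2 stars) exceeds the threshold."""
    if not reviews:
        return False
    negative = sum(1 for stars in reviews if stars <= 2)
    return negative / len(reviews) > threshold

print(review_risk_flag([5, 1, 2, 4, 1]))  # 3 of 5 negative -> True
```
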


Experian Data Labs

  • Experian Data Labs, “a research unit”
  • San Diego, CA
  • Eric Haller, executive vice president, Experian Data Labs
  • Scheme
    • monitor social media
  • <quote>The firm’s data scientists took business credit information and combined it with information from Twitter, Facebook, Yelp and others. Based on that analysis, the firm is working on a credit-scoring system that could be based solely on social media information.</quote>

Big Data’s Disparate Impact | Barocas, Selbst

Solon Barocas (Princeton University), Andrew D. Selbst (U.S. Court of Appeals); Big Data’s Disparate Impact; California Law Review, Vol. 104, 2016 (to appear); 62 pages; ssrn:2477899; 2015-08-14.


Big data claims to be neutral. It isn’t.

Advocates of algorithmic techniques like data mining argue that they eliminate human biases from the decision-making process. But an algorithm is only as good as the data it works with. Data mining can inherit the prejudices of prior decision-makers or reflect the widespread biases that persist in society at large. Often, the “patterns” it discovers are simply preexisting societal patterns of inequality and exclusion. Unthinking reliance on data mining can deny members of vulnerable groups full participation in society. Worse still, because the resulting discrimination is almost always an unintentional emergent property of the algorithm’s use rather than a conscious choice by its programmers, it can be unusually hard to identify the source of the problem or to explain it to a court.

This Article examines these concerns through the lens of American anti-discrimination law — more particularly, through Title VII’s prohibition on discrimination in employment. In the absence of a demonstrable intent to discriminate, the best doctrinal hope for data mining’s victims would seem to lie in disparate impact doctrine. Case law and the EEOC’s Uniform Guidelines, though, hold that a practice can be justified as a business necessity where its outcomes are predictive of future employment outcomes, and data mining is specifically designed to find such statistical correlations. As a result, Title VII would appear to bless its use, even though the correlations it discovers will often reflect historic patterns of prejudice, others’ discrimination against members of vulnerable groups, or flaws in the underlying data.

Addressing the sources of this unintentional discrimination and remedying the corresponding deficiencies in the law will be difficult technically, difficult legally, and difficult politically. There are a number of practical limits to what can be accomplished computationally. For example, where the discrimination occurs because the data being mined is itself a result of past intentional discrimination, there is frequently no obvious method to adjust historical data to rid it of this taint. Corrective measures that alter the results of the data mining after it is complete would tread on legally and politically disputed terrain. These challenges for reform throw into stark relief the tension between the two major theories underlying anti-discrimination law: nondiscrimination and anti-subordination. Finding a solution to big data’s disparate impact will require more than best efforts to stamp out prejudice and bias; it will require wholesale reexamination of the meanings of “discrimination” and “fairness.”
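The EEOC Uniform Guidelines the abstract refers to include the "four-fifths rule" (29 CFR 1607.4(D)): a selection rate for any group below 80% of the most-favored group's rate is generally regarded as evidence of adverse impact. The rule itself is real; the applicant numbers below are invented to show the arithmetic.

```python
# The EEOC Uniform Guidelines' "four-fifths rule": a selection rate
# below 80% of the most-favored group's rate is generally regarded
# as evidence of adverse (disparate) impact. Numbers are invented.

def adverse_impact(selected: dict, applicants: dict, ratio: float = 0.8):
    """Return {group: (selection_rate, flagged)} under the rule."""
    rates = {g: selected[g] / applicants[g] for g in applicants}
    best = max(rates.values())
    return {g: (rate, rate / best < ratio) for g, rate in rates.items()}

result = adverse_impact(
    selected={"group_a": 48, "group_b": 12},
    applicants={"group_a": 80, "group_b": 40},
)
print(result)  # group_b's rate (0.30) is half of group_a's (0.60) -> flagged
```

This is exactly the tension the article identifies: a data-mined model can pass the business-necessity test (its scores predict outcomes) while still producing selection rates that fail this ratio.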

Table of Contents

    I. How Data Mining Discriminates
      1. Defining the “Target Variable” and “Class Labels”
      2. Training Data
        1. Labeling Examples
        2. Data Collection
      3. Feature Selection
      4. Proxies
      5. Masking
    II. Title VII Liability for Discriminatory Data Mining
      1. Disparate Treatment
      2. Disparate Impact
      3. Masking and Problems of Proof
    III. The Difficulty of Reform
      1. Internal Difficulties
        1. Defining the Target Variable
        2. Training Data
        3. Feature Selection
        4. Proxies
      2. External Difficulties

Reality Mining: Using Big Data to Engineer a Better World | Eagle, Greene

Nathan Eagle, Kate Greene; Reality Mining: Using Big Data to Engineer a Better World; The MIT Press; 1st edition; 2014-08-01; 208 pages; kindle: $20, paper: $19+SHT.

From the abstract on A….

  • Eagle, a recognized expert in the field,
  • Greene, an experienced technology journalist.

Why big data evangelists should be sent to re-education camps | ZDNet

Why big data evangelists should be sent to re-education camps; In ZDNet; 2014-09-19.
Summary: Big data is a dangerous, faith-based ideology. It’s fuelled by hubris, it’s ignorant of history, and it’s trashing decades of progress in social justice.


Hospitals Are Mining Patients’ Credit Card Data to Predict Who Will Get Sick | Businessweek

Hospitals Are Mining Patients’ Credit Card Data to Predict Who Will Get Sick; Shannon Pettypiece, Jordan Robertson; In Bloomberg Businessweek; 2014-07-03.