INFO 198/BIOL 106B (callingbullshit.org) – “Calling Bullshit in the Age of Big Data,” a course at the University of Washington (the state, not D.C.), Seattle, WA. Instructors: Jevin West (Information School), Carl Bergstrom (Biology)
Anthony J. Casey, Anthony Niblett; The Death of Rules and Standards; Coase-Sandor Working Paper Series in Law and Economics No. 738; Law School, University of Chicago; 2015; 58 pages; landing, copy, ssrn:2693826, draft.
Scholars have examined the lawmakers’ choice between rules and standards for decades. This paper, however, explores the possibility of a new form of law that renders that choice unnecessary. Advances in technology (such as big data and artificial intelligence) will give rise to this new form – the micro-directive – which will provide the benefits of both rules and standards without the costs of either.
Lawmakers will be able to use predictive and communication technologies to enact complex legislative goals that are translated by machines into a vast catalog of simple commands for all possible scenarios. When an individual citizen faces a legal choice, the machine will select from the catalog and communicate to that individual the precise context-specific command (the micro-directive) necessary for compliance. In this way, law will be able to adapt to a wide array of situations and direct precise citizen behavior without further legislative or judicial action. A micro-directive, like a rule, provides a clear instruction to a citizen on how to comply with the law. But, like a standard, a micro-directive is tailored to and adapts to each and every context.
While predictive technologies such as big data have already introduced a trend toward personalized default rules, in this paper we suggest that this is only a small part of a larger trend toward context-specific laws that can adapt to any situation. As that trend continues, the fundamental cost trade-off between rules and standards will disappear, changing the way society structures and thinks about law.
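The mechanism the abstract describes (a machine translating a broad legislative goal into a precise, context-specific command) can be caricatured as a catalog lookup. A toy sketch, entirely my own invention and not from the paper; the contexts and commands are made up:

```python
# A "micro-directive" as a lookup: the legislative goal ("drive safely")
# is pre-compiled into precise commands, one per anticipated context.
CATALOG = {
    ("school_zone", "children_present"): "Speed limit: 20 mph",
    ("school_zone", "no_children"):      "Speed limit: 35 mph",
    ("highway", "rain"):                 "Speed limit: 55 mph",
    ("highway", "clear"):                "Speed limit: 70 mph",
}

def micro_directive(location, condition):
    """Select the precise command for this citizen's current situation."""
    return CATALOG[(location, condition)]

print(micro_directive("school_zone", "children_present"))  # Speed limit: 20 mph
```

The point of the caricature: like a rule, the output is a clear instruction; like a standard, which instruction you get depends on context.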
tl;dr → Sort of. Given viewership data from the 24 hours prior to the election, in swing states, the model is ~80% accurate. It took a year (post-election) to identify the existence of such a model for 2012.
The days of surprise about actual election outcomes in the big data world are likely to be fewer in the years ahead, at least to those who may have access to such data. In this paper we highlight the potential for forecasting the United States presidential election outcomes at the state and county levels based solely on the data about viewership of television programs. A key consideration for relevance is that given the infrequent nature of elections, such models are useful only if they can be trained using recent data on viewership. However, the target variable (election outcome) is usually not known until the election is over. Related to this, we show here that such models may be trained with the television viewership data in the “safe” states (the ones where the outcome can be assumed even in the days preceding elections) to potentially forecast the outcomes in the swing states. In addition to their potential to forecast, these models could also help campaigns target programs for advertisements. Nearly two billion dollars were spent on television advertising in the 2012 presidential race, suggesting potential for big data–driven optimization of campaign spending.
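The train-on-safe-states, predict-swing-states setup can be sketched with a toy nearest-centroid classifier. This is not the paper's model; the states, programs, and viewership shares below are invented purely to show the shape of the approach:

```python
# Toy sketch: each state is a vector of viewership shares for a few TV
# programs. "Train" on safe states (whose outcome is assumed known before
# election day), then label a swing state by the nearer outcome centroid.

def centroid(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical viewership shares for three programs, per safe state.
safe_states = {
    "SafeBlue1": ([0.30, 0.10, 0.25], "D"),
    "SafeBlue2": ([0.28, 0.12, 0.22], "D"),
    "SafeRed1":  ([0.10, 0.35, 0.05], "R"),
    "SafeRed2":  ([0.12, 0.30, 0.08], "R"),
}

# One centroid per assumed outcome.
by_label = {}
for features, label in safe_states.values():
    by_label.setdefault(label, []).append(features)
centroids = {label: centroid(rows) for label, rows in by_label.items()}

def predict(features):
    """Label a swing state by its nearest outcome centroid."""
    return min(centroids, key=lambda label: dist2(features, centroids[label]))

print(predict([0.27, 0.14, 0.20]))  # viewership profile near the "D" centroid → D
```

The design point mirrors the abstract: the target variable is unavailable for swing states before the election, so the safe states stand in as labeled training data.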
tl;dr → alternative scoring products, propensity scoring, (not-)credit reports.
For color, background & verisimilitude
Douglas Merrill, founder and chief executive, ZestFinance
Asim Khwaja, professor of international finance and development, Kennedy School, Harvard University.
Chi Chi Wu, attorney, National Consumer Law Center.
Teresa Jackson, vice president of credit, Social Finance (SoFi).
Alfonso Brigham, exemplar; customer of Social Finance (SoFi).
Phil Marleau, CEO, IOU Financial.
Eric Haller, executive vice president, Experian Data Labs.
Basix, a lender
a holding company
owns & operates Basix
Douglas Merrill, founder and chief executive
“All data is credit data”
<quote>ZestFinance collects thousands of pieces of consumer information — some submitted in an online application, some obtained from data brokers — and runs them through algorithms that judge how likely it is a borrower will repay.</quote>
founder and chief executive, ZestFinance
ex-Google, role unspecified.
ex-Rand Corp, a research role.
Social Finance (SoFi)
Social Finance (SoFi)
San Francisco, CA
a lender (a loan broker?)
backgrounds in finance, software and business consulting.
Teresa Jackson, vice president of credit, SoFi.
$1B (with a ‘b’)
does not monitor social media
bachelor’s degree, business administration, USC 2005.
has a job
acquired for a mortgage
one-bedroom condo in Nob Hill, San Francisco
publicly traded (where?)
a lender (a loan broker?)
monitor social media
count & correlate bad reviews
Phil Marleau, CEO
Experian Data Labs, “a research unit”
San Diego, CA
Eric Haller, executive vice president, Experian Data Labs
monitor social media
<quote>The firm’s data scientists took business credit information and combined it with information from Twitter, Facebook, Yelp and others. Based on that analysis, the firm is working on a credit-scoring system that could be based solely on social media information.</quote>
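The blending described in the quote (traditional business-credit features plus social-media signals feeding one score) can be sketched as a weighted sum. Everything here is hypothetical; Experian's actual features, weights, and scale are not public:

```python
# Hypothetical blended credit score: one traditional feature (payment
# history) plus two social-media signals (average review, rate of
# negative mentions). All weights are invented for illustration.

def blended_score(credit, social):
    """credit["payment_history"] in [0, 1]; social["avg_review"] in [1, 5];
    social["negative_mention_rate"] in [0, 1]. Returns a 300-850-style score."""
    credit_part = 400 * credit["payment_history"]
    social_part = 30 * social["avg_review"] - 100 * social["negative_mention_rate"]
    return round(300 + credit_part + social_part)

score = blended_score(
    {"payment_history": 0.9},
    {"avg_review": 4.2, "negative_mention_rate": 0.05},
)
print(score)  # 300 + 360 + (126 - 5) = 781
```

Note how easily a signal like "negative mentions" slides from measuring the business to measuring whoever reviews it, which is the concern the next paper takes up.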
Solon Barocas (Princeton University), Andrew D. Selbst (U.S. Court of Appeals); Big Data’s Disparate Impact; California Law Review, Vol. 104, 2016 (to appear); 62 pages; ssrn:2477899; 2015-08-14.
Big data claims to be neutral. It isn’t.
Advocates of algorithmic techniques like data mining argue that they eliminate human biases from the decision-making process. But an algorithm is only as good as the data it works with. Data mining can inherit the prejudices of prior decision-makers or reflect the widespread biases that persist in society at large. Often, the “patterns” it discovers are simply preexisting societal patterns of inequality and exclusion. Unthinking reliance on data mining can deny members of vulnerable groups full participation in society. Worse still, because the resulting discrimination is almost always an unintentional emergent property of the algorithm’s use rather than a conscious choice by its programmers, it can be unusually hard to identify the source of the problem or to explain it to a court.
This Article examines these concerns through the lens of American anti-discrimination law — more particularly, through Title VII’s prohibition on discrimination in employment. In the absence of a demonstrable intent to discriminate, the best doctrinal hope for data mining’s victims would seem to lie in disparate impact doctrine. Case law and the EEOC’s Uniform Guidelines, though, hold that a practice can be justified as a business necessity where its outcomes are predictive of future employment outcomes, and data mining is specifically designed to find such statistical correlations. As a result, Title VII would appear to bless its use, even though the correlations it discovers will often reflect historic patterns of prejudice, others’ discrimination against members of vulnerable groups, or flaws in the underlying data.
Addressing the sources of this unintentional discrimination and remedying the corresponding deficiencies in the law will be difficult technically, difficult legally, and difficult politically. There are a number of practical limits to what can be accomplished computationally. For example, where the discrimination occurs because the data being mined is itself a result of past intentional discrimination, there is frequently no obvious method to adjust historical data to rid it of this taint. Corrective measures that alter the results of the data mining after it is complete would tread on legally and politically disputed terrain. These challenges for reform throw into stark relief the tension between the two major theories underlying anti-discrimination law: nondiscrimination and anti-subordination. Finding a solution to big data’s disparate impact will require more than best efforts to stamp out prejudice and bias; it will require wholesale reexamination of the meanings of “discrimination” and “fairness.”
Table of Contents
HOW DATA MINING DISCRIMINATES
Defining the “Target Variable” and “Class Labels”
TITLE VII LIABILITY FOR DISCRIMINATORY DATA MINING