I never signed up for this! Privacy implications of email tracking | Englehardt, Han, Narayanan

Steven Englehardt, Jeffrey Han, Arvind Narayanan; I never signed up for this! Privacy implications of email tracking; In Proceedings on Privacy Enhancing Technologies (PETS); 2018; 18 pages.

tl;dr → use Thunderbird with all plugins enabled (Ad Block, Cookie Block, etc.) or use Google Mail on the web.


We show that the simple act of viewing emails contains privacy pitfalls for the unwary. We assembled a corpus of commercial mailing-list emails, and find a network of hundreds of third parties that track email recipients via methods such as embedded pixels. About 30% of emails leak the recipient’s email address to one or more of these third parties when they are viewed. In the majority of cases, these leaks are intentional on the part of email senders, and further leaks occur if the recipient clicks links in emails. Mail servers and clients may employ a variety of defenses, but we analyze 16 servers and clients and find that they are far from comprehensive. We propose, prototype, and evaluate a new defense, namely stripping tracking tags from emails based on enhanced versions of existing web tracking protection lists.
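The leak mechanism described above is worth seeing concretely. A minimal sketch of how a sender-side tracking pixel can carry the recipient's identity (the domain `tracker.example`, parameter names, and use of an MD5 hash are illustrative assumptions, not the paper's specific findings):

```python
import hashlib

def tracking_pixel(recipient_email: str, campaign_id: str) -> str:
    """Build the HTML for a 1x1 tracking pixel as an email sender might.

    A hash of the recipient's address rides along in the image URL, so the
    mere act of rendering the email tells the (hypothetical) tracker at
    tracker.example exactly who opened it, and when, via server logs.
    """
    email_hash = hashlib.md5(recipient_email.lower().encode()).hexdigest()
    return (f'<img src="https://tracker.example/open'
            f'?c={campaign_id}&e={email_hash}" width="1" height="1">')

print(tracking_pixel("alice@example.com", "promo42"))
```

Note the defense the paper evaluates follows directly from this shape: strip or proxy remote images whose URLs match tracking-protection lists before the mail client ever fetches them.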




The Princeton Web Transparency and Accountability Project | Narayanan, Reisman

Arvind Narayanan, Dillon Reisman; The Princeton Web Transparency and Accountability Project; In Tania Cerquitelli, Daniele Quercia, Frank Pasquale (editors); Transparent Data Mining for Big and Small Data; Springer; 2017.

tl;dr → There be dragons. Princeton is there. Tell it! Testify!


When you browse the web, hidden “third parties” collect a large amount of data about your behavior. This data feeds algorithms to target ads to you, tailor your news recommendations, and sometimes vary prices of online products. The network of trackers comprises hundreds of entities, but consumers have little awareness of its pervasiveness and sophistication. This chapter discusses the findings and experiences of the Princeton Web Transparency Project, which continually monitors the web to uncover what user data companies collect, how they collect it, and what they do with it. We do this via a largely automated monthly “census” of the top 1 million websites, in effect “tracking the trackers”. Our tools and findings have proven useful to regulators and investigatory journalists, and have led to greater public awareness, the cessation of some privacy-infringing practices, and the creation of new consumer privacy tools. But the work raises many new questions. For example, should we hold websites accountable for the privacy breaches caused by third parties? The chapter concludes with a discussion of such tricky issues and makes recommendations for public policy and regulation of privacy.
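A census of "tracking the trackers" ultimately reduces to logging every request a page makes and deciding which ones go to third parties. A stdlib-only sketch of that classification step (the two-label base-domain heuristic is a deliberate simplification; real measurement tools consult the Public Suffix List, and all domains below are made up):

```python
from urllib.parse import urlparse

def base_domain(url: str) -> str:
    """Crude eTLD+1: keep the last two host labels. Real crawlers use the
    Public Suffix List, since e.g. 'example.co.uk' needs three labels."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def third_party_requests(page_url: str, request_urls: list[str]) -> list[str]:
    """Return requests whose base domain differs from the page's own,
    i.e. the candidate trackers a web census would record."""
    first_party = base_domain(page_url)
    return [u for u in request_urls if base_domain(u) != first_party]

reqs = [
    "https://cdn.news-site.example/app.js",       # same site: first party
    "https://tracker.example/pixel.gif",          # different site: third party
    "https://social-widget.example/like.js",      # third party
]
print(third_party_requests("https://www.news-site.example/story", reqs))
```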


  • Marvin Minsky
  • expert systems
  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Netflix
  • Self-Driving Cars
  • collect data first, ask questions later
  • surveillance infrastructure
  • Kafkaesque
  • data and algorithmic transparency
  • Workshop on Data and Algorithmic Transparency
  • Princeton Web Transparency and Accountability Project (WebTAP)
    Princeton Web Census
  • Privacy scholar
  • Ryan Calo
  • The Party System
  • first party
  • third party
  • Twitter
  • Facebook
  • Facebook Like Button
  • The Beauty and the Beast Project
  • Panopticlick
  • Anonymous
  • Pseudonymous
  • biases
  • discrimination
  • targeted political messaging
  • price discrimination
  • market manipulation
  • AdChoices
  • ad blockers
  • Federal Trade Commission (FTC)
  • Optimizely
  • A/B Testing
  • OpenWPM (Open Web Privacy Measurement)
  • FourthParty
  • FPDetective
  • PhantomJS
  • Firefox
  • Tor
  • Facebook Connect
  • Google Single Sign-On (SSO)
  • longitudinal studies
  • HTML5, Canvas API
  • canvas fingerprinting
  • AddThis
  • AudioContext API
  • WebRTC API
  • Battery Status API
  • NSA (National Security Agency)
  • Snowden
  • Cookies
  • transitive cookie linking
  • cookie syncing
  • Google
  • Facebook
  • yahoo.com
  • Cross-Device Tracking
  • header enrichment (by ISPs)
  • Ghostery
  • AdBlock Plus
  • uBlock Origin
  • machine learning classifier (for tracking behavior)
  • Big Data (they used Big Data and Machine Learning Classifiers)
  • Nudge (a book)
  • Choice Architecture
  • 3rd Party Cookies, blocking 3rd party cookies
  • Do Not Track
  • Battery API
  • Internet Explorer
  • zero sum game
  • power user interfaces
  • PGP (Pretty Good Privacy)
  • Cookie Blocking
  • <buzz>long tail (of innovation)</buzz>
  • Children’s Online Privacy Protection Act (COPPA)
  • child-directed websites
  • American Civil Liberties Union (ACLU)
  • Computer Fraud and Abuse Act
  • Personally-Identifiable Information (PII)
  • shift of power, from 3rd parties to publishers
  • Columbia University
  • Carnegie Mellon University
  • Internet of Things (IoT)
  • WiFi
  • smartphone app
  • Fairness, Accountability and Transparency in Machine Learning (FAT-ML)
  • Princeton


  • “The best minds of our generation are thinking about how to make people click on ads” attributed to Jeff Hammerbacher


  • Crevier D (1993) AI: The tumultuous history of the search for artificial intelligence. Basic Books, Inc.
  • Engle Jr RL, Flehinger BJ (1987) Why expert systems for medical diagnosis are not being generally used: a valedictory opinion. Bulletin of the New York Academy of Medicine 63(2):193
  • Vance A (2011) This tech bubble is different. Bloomberg
  • Angwin J (2016) Machine bias: Risk assessments in criminal sentencing. ProPublica
  • Levin S (2016) A beauty contest was judged by AI and the robots didn’t like dark skin. The Guardian
  • Solove DJ (2001) Privacy and power: Computer databases and metaphors for information privacy. Stanford Law Review pp 1393–1462
  • Marthews A, Tucker C (2015) Government surveillance and internet search behavior. ssrn:2412564
  • Hannak A, Soeller G, Lazer D, Mislove A, Wilson C (2014) Measuring price discrimination and steering on e-commerce web sites. In: Proceedings of the 2014 Conference on Internet Measurement Conference, ACM, pp 305–318
  • Calo R (2013) Digital market manipulation. University of Washington School of Law Research Paper 2013-27 DOI 10.2139/ssrn.2309703 ssrn:2309703
  • Mayer JR, Mitchell JC (2012) Third-party web tracking: Policy and technology. In: Proceedings of the 2012 IEEE Symposium on Security and Privacy, IEEE, pp 413–427
  • Angwin J (2010) The web’s new gold mine: Your secrets. The Wall Street Journal
  • Lerner A, Simpson AK, Kohno T, Roesner F (2016) Internet jones and the raiders of the lost trackers: An archaeological study of web tracking from 1996 to 2016. In: Proceedings of the 25th USENIX Security Symposium (USENIX Security 16)
  • Laperdrix P, Rudametkin W, Baudry B (2016) Beauty and the beast: Diverting modern web browsers to build unique browser fingerprints. In: Proceedings of the 37th IEEE Symposium on Security and Privacy (S&P 2016)
  • Eckersley P (2010) How unique is your web browser? In: International Symposium on Privacy Enhancing Technologies Symposium, Springer, pp 1–18
  • Acar G, Van Alsenoy B, Piessens F, Diaz C, Preneel B (2015) Facebook tracking through social plug-ins. Technical report prepared for the Belgian Privacy Commission
  • Starov O, Gill P, Nikiforakis N (2016) Are you sure you want to contact us? quantifying the leakage of pii via website contact forms. In: Proceedings on Privacy Enhancing Technologies 2016(1):20–33
  • Krishnamurthy B, Naryshkin K, Wills C (2011) Privacy leakage vs. protection measures: the growing disconnect. In: Proceedings of the Web, vol 2, pp 1–10
  • Su J, Shukla A, Goel S, Narayanan A (2017) De-anonymizing web browsing data with social networks, manuscript
  • Barocas S, Nissenbaum H (2014) Big data’s end run around procedural privacy protections. In Communications of the ACM 57-11:31-33
  • Shilton K, Greene D (2016) Because privacy: defining and legitimating privacy in ios development. In IConference 2016 Proceedings
  • Storey G, Reisman D, Mayer J, Narayanan A (2016) The future of ad blocking: Analytical framework and new techniques, manuscript
  • Narayanan A (2016) Can Facebook really make ads unblockable? In Freedom to Tinker
  • Storey G (2016) Facebook ad highlighter.
  • Reisman D (2016) A peek at A/B testing in the wild. In Freedom to Tinker
  • Acar G, Juarez M, Nikiforakis N, Diaz C, Gürses S, Piessens F, Preneel B (2013) Fpdetective: dusting the web for fingerprinters. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, ACM, pp 1129–1140
  • Englehardt S, Narayanan A (2016) Online tracking: A 1-million-site measurement and analysis. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer & Communications Security
  • Selenium HQ (2016) Selenium browser automation faq.
  • Acar G, Eubank C, Englehardt S, Juarez M, Narayanan A, Diaz C (2014) The web never forgets. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS ’14) DOI 10.1145/2660267.2660347
  • Mowery K, Shacham H (2012) Pixel perfect: Fingerprinting canvas in html5. In Proceedings of W2SP
  • Vasilyev V (2016) Fingerprintjs2 — modern & flexible browser fingerprinting library, a successor to the original fingerprintjs.
  • Olejnik Ł, Acar G, Castelluccia C, Diaz C (2015) The leaking battery. In: International Workshop on Data Privacy Management, Springer, pp 254–263
  • Doty N (2016) Mitigating browser fingerprinting in web specifications.
  • Soltani A, Peterson A, Gellman B (2013) NSA uses Google cookies to pinpoint targets for hacking. In The Washington Post.
  • Englehardt S, Reisman D, Eubank C, Zimmerman P, Mayer J, Narayanan A, Felten EW (2015) Cookies that give you away. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15) DOI 10.1145/2736277.2741679
  • Angwin J (2016) Google has quietly dropped ban on personally identifiable web tracking. ProPublica
  • Reitman R (2012) What actually changed in Google's privacy policy. Electronic Frontier Foundation
  • Simonite T (2015) Facebook's Like buttons will soon track your web browsing to target ads. MIT Technology Review
  • Federal Trade Commission (2015) Cross-device tracking.
  • Maggi F, Mavroudis V (2016) Talking behind your back attacks & countermeasures of ultrasonic cross-device tracking, In Proceedings of Blackhat
  • Angwin J (2014) Why online tracking is getting creepier. ProPublica
  • Vallina-Rodriguez N, Sundaresan S, Kreibich C, Paxson V (2015) Header enrichment or ISP enrichment? In Proceedings of the 2015 ACM SIGCOMM Workshop on Hot Topics in Middleboxes and Network Function Virtualization (HotMiddlebox ’15) DOI 10.1145/2785989.2786002
  • Disconnect (2016) Disconnect blocks new tracking device that makes your computer draw a unique image.
  • Foundation EF (2016) Privacy badger.
  • Thaler RH, Sunstein CR (2008) Nudge: improving decisions about health, wealth, and happiness. Yale University Press
  • Fleishman G (2015) Hands-on with content blocking safari extensions in ios 9. Macworld.
  • Blink, Chromium (2016) Owp storage team sync.
  • Lynch B (2012) Do not track in the windows 8 setup experience – microsoft on the issues. Microsoft on the Issues
  • Hern A (2016) Firefox disables loophole that allows sites to track users via battery status. The Guardian
  • Mozilla (2015) Tracking protection in private browsing.
  • Mozilla (2016) Security/contextual identity project/containers.
  • Federal Trade Commission (2012) Google will pay $22.5 million to settle FTC charges it misrepresented privacy assurances to users of apple’s safari internet browser.
  • Federal Trade Commission (2016) Children’s online privacy protection rule (“COPPA”).
  • New York State Office of the Attorney General (2016) A.G. schneiderman announces results of “operation child tracker,” ending illegal online tracking of children at some of nation’s most popular kids’ websites.
  • American Civil Liberties Union (2016) Sandvig v. Lynch.
  • Eubank C, Melara M, Perez-Botero D, Narayanan A (2013) Shining the floodlights on mobile web tracking a privacy survey.
  • CMU CHIMPS Lab (2015) Privacy grade: Grading the privacy of smartphone apps.
  • Vanrykel E, Acar G, Herrmann M, Diaz C (2016) Leaky birds: Exploiting mobile application traffic for surveillance. In Proceedings of Financial Cryptography and Data Security 2016
  • Lécuyer M, Ducoffe G, Lan F, Papancea A, Petsios T, Spahn R, Chaintreau A, Geambasu R (2014) XRay: Enhancing the web's transparency with differential correlation. In: Proceedings of the 23rd USENIX Security Symposium (USENIX Security 14), pp 49–64
  • Lecuyer M, Spahn R, Spiliopolous Y, Chaintreau A, Geambasu R, Hsu D (2015) Sunlight: Fine-grained targeting detection at scale with statistical confidence. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ACM, pp 554–566
  • Tschantz MC, Datta A, Datta A, Wing JM (2015) A methodology for information flow experiments. In: Proceedings of the 2015 IEEE 28th Computer Security Foundations Symposium, IEEE, pp 554–568
  • Datta A, Sen S, Zick Y (2016) Algorithmic transparency via quantitative input influence. In: Proceedings of 37th IEEE Symposium on Security and Privacy
  • Chen L, Mislove A, Wilson C (2015) Peeking beneath the hood of uber. In: Proceedings of the 2015 ACM Conference on Internet Measurement Conference, ACM, pp 495–508
  • Valentino-Devries J, Singer-Vine J, Soltani A (2012) Websites vary prices, deals based on users information. In The Wall Street Journal 10:60–68
  • Android Studio User Guide (2016) UI/Application Exerciser Monkey.
  • Rastogi V, Chen Y, Enck W (2013) Appsplayground: automatic security analysis of smartphone applications. In: Proceedings of the third ACM Conference on Data and Application Security and Privacy, ACM, pp 209–220
  • Enck W, Gilbert P, Han S, Tendulkar V, Chun BG, Cox LP, Jung J, McDaniel P, Sheth AN (2014) Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. In ACM Transactions on Computer Systems (TOCS) 32(2):5
  • Ren J, Rao A, Lindorfer M, Legout A, Choffnes D (2015) Recon: Revealing and controlling privacy leaks in mobile network traffic. arXiv:1507.00255.
  • Razaghpanah A, Vallina-Rodriguez N, Sundaresan S, Kreibich C, Gill P, Allman M, Paxson V (2015) Haystack: in situ mobile traffic analysis in user space. arXiv:1510.01419.
  • Sweeney L (2013) Discrimination in online ad delivery. Queue 11(3):10
  • Caliskan-Islam A, Bryson J, Narayanan A (2016) Semantics derived automatically from language corpora necessarily contain human biases. arXiv:1608.07187.



De-Anonymizing Web Browsing Data with Social Networks | Su, Shukla, Goel, Narayanan

Jessica Su, Ansh Shukla, Sharad Goel, Arvind Narayanan; De-Anonymizing Web Browsing Data with Social Networks; draft; In Some Venue Surely (they will publish this somewhere, it is so very nicely formatted); 2017-05; 9 pages.


Can online trackers and network adversaries de-anonymize web browsing data readily available to them? We show—theoretically, via simulation, and through experiments on real user data—that de-identified web browsing histories can be linked to social media profiles using only publicly available data. Our approach is based on a simple observation: each person has a distinctive social network, and thus the set of links appearing in one’s feed is unique. Assuming users visit links in their feed with higher probability than a random user, browsing histories contain tell-tale marks of identity. We formalize this intuition by specifying a model of web browsing behavior and then deriving the maximum likelihood estimate of a user’s social profile. We evaluate this strategy on simulated browsing histories, and show that given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50% of the time. To gauge the real-world effectiveness of this approach, we recruited nearly 400 people to donate their web browsing histories, and we were able to correctly identify more than 70% of them. We further show that several online trackers are embedded on sufficiently many websites to carry out this attack with high accuracy. Our theoretical contribution applies to any type of transactional data and is robust to noisy observations, generalizing a wide range of previous de-anonymization attacks. Finally, since our attack attempts to find the correct Twitter profile out of over 300 million candidates, it is—to our knowledge—the largest-scale demonstrated de-anonymization to date.
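The maximum-likelihood idea in the abstract can be caricatured in a few lines: score each candidate profile by how much better its feed explains the observed history than chance would. This is a hedged toy model, not the paper's actual estimator; the boost factor, universe size, handles, and link names are all invented for illustration:

```python
import math

def log_likelihood(history: set[str], feed: set[str],
                   universe_size: int, boost: float = 50.0) -> float:
    """Score how well a candidate's feed explains a browsing history.
    Toy model: a link appearing in the user's own feed is `boost` times
    likelier to be visited than a uniformly random link."""
    base = 1.0 / universe_size
    score = 0.0
    for link in history:
        p = boost * base if link in feed else base
        score += math.log(p)
    return score

def best_candidate(history, candidates, universe_size=1_000_000):
    """Maximum-likelihood guess: the profile whose feed best explains
    the observed history."""
    return max(candidates, key=lambda name: log_likelihood(
        history, candidates[name], universe_size))

candidates = {
    "@alice": {"l1", "l2", "l3", "l4"},
    "@bob":   {"l3", "l5", "l6"},
}
print(best_candidate({"l1", "l2", "l3"}, candidates))  # → @alice
```

The paper's real contribution is doing this at scale (over 300 million candidate Twitter profiles) and proving robustness to noisy observations; the toy version only shows why feed overlap is such a strong identity signal.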


  • Ad Networks Can Personally Identify Web Users; Wendy Davis; In MediaPost; 2017-01-20.
    <quote> The authors tested their theory by recruiting 400 people who allowed their Web browsing histories to be tracked, and then comparing the sites they visited to sites mentioned in Twitter accounts they followed. The researchers say they were able to use that method to identify more than 70% of the volunteers.</quote>

Innovation and the Value of Privacy | Columbia University

Innovation and the Value of Privacy; At The Sanford C. Bernstein & Co. Center for Leadership and Ethics, Graduate School of Business, Columbia University; 2016-02-05.

tl;dr → (4 hours) a conference, a gathering, a happening, a celebration, a happy hour.


OpenWPM: An automated platform for web privacy measurement | Englehardt, Eubank, Zimmerman, Reisman, Narayanan

Steven Englehardt, Chris Eubank, Peter Zimmerman, Dillon Reisman, Arvind Narayanan; OpenWPM: An automated platform for web privacy measurement; draft; 2016-03-15; 12 pages.

tl;dr → yet another crawl-and-report framework; like AdFisher, FourthParty, XRay, but different. A survey of the previous work.


Web measurement techniques have been highly influential in online privacy debates and have brought transparency to the online tracking ecosystem. Due to its complexity, however, web privacy measurement remains a specialized research field. Our aim in this work is to transform it into a widely available tool.

First, we analyze over 30 web privacy measurement studies, identify several methodological challenges for the experimenter, and discuss how to address them. Next, we present the design and implementation of OpenWPM, a flexible, modular web privacy measurement platform that can handle any experiment that maps to a general framework. It supports parallelism for speed and scale, automatic recovery from failures of the underlying browser, and realistic simulation of users. OpenWPM is open-source and has already been used as the basis of several published studies on web privacy and security. We show how our generic platform provides a common foundation for these diverse experiments, including a new study on the “filter bubble” which we present here.
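The "automatic recovery from failures of the underlying browser" is the unglamorous part that makes million-site crawls feasible. A generic sketch of that retry discipline (this is an illustration of the idea, not OpenWPM's actual API; `flaky_get` stands in for any browser command that may crash):

```python
def with_recovery(command, max_restarts: int = 3):
    """Run a crawl command, restarting on failure, in the spirit of the
    automatic browser-crash recovery the platform abstract describes.
    `command` is any zero-argument callable that may raise."""
    for attempt in range(1, max_restarts + 1):
        try:
            return command()
        except RuntimeError as err:
            print(f"attempt {attempt} failed ({err}); restarting browser")
    raise RuntimeError("command failed after all restarts")

# Stand-in for a browser visit that crashes twice, then succeeds.
calls = {"n": 0}
def flaky_get():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("browser crashed")
    return "page loaded"

print(with_recovery(flaky_get))  # → page loaded, after two restarts
```

At census scale the same wrapper sits around every site visit, so one wedged Firefox instance costs a retry instead of the whole monthly crawl.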





63 references; the previous work.

Leaving Your Fingerprint In Your Code | Simplicity 2.0

Leaving Your Fingerprint In Your Code; In Simplicity 2.0; 2015.

Original Sources


How cookies can be used for global surveillance | Freedom to Tinker

How cookies can be used for global surveillance; In Freedom to Tinker; 2014-12-18.

Original Sources

Web Privacy and Transparency Conference

Web Privacy and Transparency Conference; Center for Information Technology Policy at Princeton; 2014-10-24.

tl;dr → a 1-day session


The Web never forgets: Persistent tracking mechanisms in the wild | Acar, Eubank, Englehardt, Juarez, Narayanan, Diaz

Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juarez, Arvind Narayanan, Claudia Diaz; The Web never forgets: Persistent tracking mechanisms in the wild; In Proceedings of the Conference on Computer & Communication Security (CCS); 2014-11, draft of 2014-07-24; landing.

Separately noted.