De-Anonymizing Web Browsing Data with Social Networks | Su, Shukla, Goel, Narayanan

Jessica Su, Ansh Shukla, Sharad Goel, Arvind Narayanan; De-Anonymizing Web Browsing Data with Social Networks; draft; In Some Venue Surely (they will publish this somewhere, it is so very nicely formatted); 2017-05; 9 pages.


Can online trackers and network adversaries de-anonymize web browsing data readily available to them? We show—theoretically, via simulation, and through experiments on real user data—that de-identified web browsing histories can be linked to social media profiles using only publicly available data. Our approach is based on a simple observation: each person has a distinctive social network, and thus the set of links appearing in one’s feed is unique. Assuming users visit links in their feed with higher probability than a random user, browsing histories contain tell-tale marks of identity. We formalize this intuition by specifying a model of web browsing behavior and then deriving the maximum likelihood estimate of a user’s social profile. We evaluate this strategy on simulated browsing histories, and show that given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50% of the time. To gauge the real-world effectiveness of this approach, we recruited nearly 400 people to donate their web browsing histories, and we were able to correctly identify more than 70% of them. We further show that several online trackers are embedded on sufficiently many websites to carry out this attack with high accuracy. Our theoretical contribution applies to any type of transactional data and is robust to noisy observations, generalizing a wide range of previous de-anonymization attacks. Finally, since our attack attempts to find the correct Twitter profile out of over 300 million candidates, it is—to our knowledge—the largest-scale demonstrated de-anonymization to date.
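
The maximum-likelihood idea in the abstract above can be illustrated with a deliberately simplified model; this is a sketch under stated assumptions, not the authors' estimator. It assumes each visit is drawn independently, that a link's baseline probability is its `popularity`, and that links in a candidate's feed are `boost` times more likely than baseline. The names `score_candidate`, `most_likely_profile`, `popularity`, and `boost` are all hypothetical.

```python
import math


def score_candidate(history, feed, popularity, boost=10.0):
    """Log-likelihood of `history` under one candidate, up to a constant.

    Assumed model (not from the paper): a link l is visited with probability
    proportional to boost * popularity[l] if l is in the candidate's feed,
    and popularity[l] otherwise. The term sum(log popularity[l]) is the same
    for every candidate, so it is dropped.
    """
    n = len(history)
    hits = sum(1 for link in history if link in feed)
    # Baseline probability mass covered by the candidate's feed.
    feed_mass = sum(popularity.get(link, 0.0) for link in feed)
    # Normalizer of the boosted distribution: 1 + (boost - 1) * feed_mass.
    return hits * math.log(boost) - n * math.log(1.0 + (boost - 1.0) * feed_mass)


def most_likely_profile(history, feeds, popularity, boost=10.0):
    """Return the candidate whose feed best explains the history."""
    return max(
        feeds,
        key=lambda user: score_candidate(history, feeds[user], popularity, boost),
    )


# Toy data: four links, two candidate profiles.
popularity = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
feeds = {"alice": {"c", "d"}, "bob": {"a", "b"}}
history = ["c", "d", "c", "a"]
print(most_likely_profile(history, feeds, popularity))
```

Note the normalizer: a candidate with a very large or very popular feed is penalized, so matching many history links is not enough on its own. The matches have to be surprising relative to baseline popularity.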


  • Ad Networks Can Personally Identify Web Users; Wendy Davis; In MediaPost; 2017-01-20.
    <quote> The authors tested their theory by recruiting 400 people who allowed their Web browsing histories to be tracked, and then comparing the sites they visited to sites mentioned in Twitter accounts they followed. The researchers say they were able to use that method to identify more than 70% of the volunteers.</quote>

Innovation and the Value of Privacy | Columbia University

Innovation and the Value of Privacy; At The Sanford C. Bernstein & Co. Center for Leadership and Ethics, Graduate School of Business, Columbia University; 2016-02-05.

tl;dr → (4 hours) a conference, a gathering, a happening, a celebration, a happy hour.


OpenWPM: An automated platform for web privacy measurement | Englehardt, Eubank, Zimmerman, Reisman, Narayanan

Steven Englehardt, Chris Eubank, Peter Zimmerman, Dillon Reisman, Arvind Narayanan; OpenWPM: An automated platform for web privacy measurement; draft; 2016-03-15; 12 pages.

tl;dr → yet another crawl-and-report framework; like AdFisher, FourthParty, XRay, but different. A survey of the previous work.


Web measurement techniques have been highly influential in online privacy debates and have brought transparency to the online tracking ecosystem. Due to its complexity, however, web privacy measurement remains a specialized research field. Our aim in this work is to transform it into a widely available tool.

First, we analyze over 30 web privacy measurement studies, identify several methodological challenges for the experimenter, and discuss how to address them. Next, we present the design and implementation of OpenWPM, a flexible, modular web privacy measurement platform that can handle any experiment that maps to a general framework. It supports parallelism for speed and scale, automatic recovery from failures of the underlying browser, and realistic simulation of users. OpenWPM is open-source and has already been used as the basis of several published studies on web privacy and security. We show how our generic platform provides a common foundation for these diverse experiments, including a new study on the “filter bubble” which we present here.





63 references; the previous work.

Leaving Your Fingerprint In Your Code | Simplicity 2.0

Leaving Your Fingerprint In Your Code; In Simplicity 2.0; 2015.

Original Sources


How cookies can be used for global surveillance | Freedom to Tinker

How cookies can be used for global surveillance; In Freedom to Tinker; 2014-12-18.

Original Sources

Web Privacy and Transparency Conference

Web Privacy and Transparency Conference; Center for Information Technology Policy at Princeton; 2014-10-24.

tl;dr → a 1-day session


The Web never forgets: Persistent tracking mechanisms in the wild | Acar, Eubank, Englehardt, Juarez, Narayanan, Diaz

Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juarez, Arvind Narayanan, Claudia Diaz; The Web never forgets: Persistent tracking mechanisms in the wild; In Proceedings of the Conference on Computer & Communication Security (CCS); 2014-11, draft of 2014-07-24; landing.

Separately noted.

Cookies that give you away: Evaluating the surveillance implications of web tracking | Reisman, Englehardt, Eubank, Zimmerman, Narayanan

Dillon Reisman, Steven Englehardt, Christian Eubank, Peter Zimmerman, Arvind Narayanan; Cookies That Give You Away: Evaluating The Surveillance Implications Of Web Tracking; draft; 2014-04-02; 22 pages.


We investigate the ability of a passive network observer to leverage third-party HTTP tracking cookies for mass surveillance. If two web pages embed the same tracker which emits a unique pseudonymous identifier, then the adversary can link visits to those pages from the same user (browser instance) even if the user’s IP address varies. Using simulated browsing profiles, we cluster network traffic by transitively linking shared unique cookies and estimate that for typical users over 90% of web sites with embedded trackers are located in a single connected component. Furthermore, almost half of the most popular web pages will leak a logged-in user’s real-world identity to an eavesdropper in unencrypted traffic. Together, these provide a novel method to link an identified individual to a large fraction of her entire web history. We discuss the privacy consequences of this attack and suggest mitigation strategies.
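
The transitive cookie linking described in the abstract above can be sketched with a union-find structure. This is a minimal illustration and not the authors' implementation; the function name `cluster_visits`, the visit labels, and the cookie strings are hypothetical, and each visit is assumed to be already reduced to the set of unique tracker identifiers seen in its traffic.

```python
from collections import defaultdict


def cluster_visits(visits):
    """Group visits that share a cookie identifier, directly or transitively.

    `visits` maps a visit label to the set of cookie identifier strings
    observed with it. Returns a list of sets of visit labels.
    """
    parent = {visit: visit for visit in visits}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first visit seen for each cookie; later visits carrying
    # the same cookie are merged into that visit's component.
    first_seen = {}
    for visit, cookies in visits.items():
        for cookie in cookies:
            if cookie in first_seen:
                union(first_seen[cookie], visit)
            else:
                first_seen[cookie] = visit

    clusters = defaultdict(set)
    for visit in visits:
        clusters[find(visit)].add(visit)
    return list(clusters.values())


# Toy traffic: news and blog share no cookie, but both share one with shop.
visits = {
    "news.example": {"trk1=abc"},
    "shop.example": {"trk1=abc", "trk2=xyz"},
    "blog.example": {"trk2=xyz"},
    "other.example": {"trk3=q"},
}
clusters = cluster_visits(visits)
```

Here `news.example` and `blog.example` end up in the same component even though they share no cookie, which is the transitive step that lets a large share of visits collapse into one connected component.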



Shining the Floodlights on Mobile Web Tracking — A Privacy Survey | Eubank, Melara, Perez-Botero, Narayanan

Christian Eubank, Marcela Melara, Diego Perez-Botero, Arvind Narayanan; Shining the Floodlights on Mobile Web Tracking — A Privacy Survey; In Proceedings of Web 2.0 Security & Privacy (W2SP); 2013-05-24; 9 pages.


We present the first published large-scale study of mobile web tracking. We compare tracking across five physical and emulated mobile devices with one desktop device as a benchmark. Our crawler is based on FourthParty; however, our architecture avoids clearing state, which has the benefit of continual observation of (and by) third-parties. We confirm many intuitive predictions and report a few surprises. The lists of top third-party domains across different categories of devices are substantially similar; we found surprisingly few mobile-specific ad networks. The use of JavaScript by tracking domains increases gradually as we consider more powerful devices. We also analyze cookie longevity by device. Finally, we analyze a curious phenomenon of cookies that are used to store information about the user’s browsing history on the client. Mobile tracking appears to be an under-researched area, and this paper is only a first step. We have made our code and data available for others to build on.