tl;dr → use Thunderbird with all plugins enabled (Ad Block, Cookie Block, etc.) or use Google Mail on the web.
We show that the simple act of viewing emails contains privacy pitfalls for the unwary. We assembled a corpus of commercial mailing-list emails, and find a network of hundreds of third parties that track email recipients via methods such as embedded pixels. About 30% of emails leak the recipient’s email address to one or more of these third parties when they are viewed. In the majority of cases, these leaks are intentional on the part of email senders, and further leaks occur if the recipient clicks links in emails. Mail servers and clients may employ a variety of defenses, but we analyze 16 servers and clients and find that they are far from comprehensive. We propose, prototype, and evaluate a new defense, namely stripping tracking tags from emails based on enhanced versions of existing web tracking protection lists.
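The defense named in the abstract, stripping tracking tags from email bodies using block lists, can be sketched roughly as follows. This is a minimal illustration and not the authors' prototype: the domains in `BLOCKED_DOMAINS` and the names `is_blocked`, `TrackerStripper`, and `strip_trackers` are hypothetical, and a real implementation would load rules from actual tracking-protection lists rather than a hard-coded set.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical block list; real filter lists are far larger and rule-based.
BLOCKED_DOMAINS = {"tracker.example", "pixel.example"}


def is_blocked(url: str) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


class TrackerStripper(HTMLParser):
    """Re-emits the HTML it is fed, dropping <img> tags that load from blocked hosts."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _is_tracking_img(self, tag, attrs):
        return tag == "img" and any(
            name == "src" and value and is_blocked(value) for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if not self._is_tracking_img(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <img ... />
        if not self._is_tracking_img(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_trackers(html: str) -> str:
    parser = TrackerStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


email_body = (
    "<p>Hi</p>"
    '<img src="https://pixel.example/o.gif?e=alice%40example.com" width="1" height="1">'
    '<img src="https://cdn.shop.example/logo.png">'
)
cleaned = strip_trackers(email_body)
```

In this example the first image carries the recipient's address in its query string, which is the kind of leak the abstract describes; the sketch drops that tag and keeps the ordinary logo image. It only looks at `<img>` tags, so it does not cover other remote content or link decoration.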
tl;dr → There be dragons. Princeton was there. Tell it! Testify!
When you browse the web, hidden “third parties” collect a large amount of data about your behavior. This data feeds algorithms to target ads to you, tailor your news recommendations, and sometimes vary prices of online products. The network of trackers comprises hundreds of entities, but consumers have little awareness of its pervasiveness and sophistication. This chapter discusses the findings and experiences of the Princeton Web Transparency Project, which continually monitors the web to uncover what user data companies collect, how they collect it, and what they do with it. We do this via a largely automated monthly “census” of the top 1 million websites, in effect “tracking the trackers”. Our tools and findings have proven useful to regulators and investigatory journalists, and have led to greater public awareness, the cessation of some privacy-infringing practices, and the creation of new consumer privacy tools. But the work raises many new questions. For example, should we hold websites accountable for the privacy breaches caused by third parties? The chapter concludes with a discussion of such tricky issues and makes recommendations for public policy and regulation of privacy.
Levin S (2016) A beauty contest was judged by AI and the robots didn’t like dark skin. The Guardian
Solove DJ (2001) Privacy and power: Computer databases and metaphors for information privacy. Stanford Law Review pp 1393–1462
Marthews A, Tucker C (2015) Government surveillance and internet search behavior. ssrn:2412564
Hannak A, Soeller G, Lazer D, Mislove A, Wilson C (2014) Measuring price discrimination and steering on e-commerce web sites. In: Proceedings of the 2014 Conference on Internet Measurement Conference, ACM, pp 305–318
Calo R (2013) Digital market manipulation. University of Washington School of Law Research Paper 2013-27 DOI 10.2139/ssrn.2309703 ssrn:2309703
Mayer JR, Mitchell JC (2012) Third-party web tracking: Policy and technology. In: Proceedings of the 2012 IEEE Symposium on Security and Privacy, IEEE, pp 413–427
Lerner A, Simpson AK, Kohno T, Roesner F (2016) Internet Jones and the raiders of the lost trackers: An archaeological study of web tracking from 1996 to 2016. In: Proceedings of the 25th USENIX Security Symposium (USENIX Security 16)
Laperdrix P, Rudametkin W, Baudry B (2016) Beauty and the beast: Diverting modern web browsers to build unique browser fingerprints. In: Proceedings of the 37th IEEE Symposium on Security and Privacy (S&P 2016)
Eckersley P (2010) How unique is your web browser? In: International Symposium on Privacy Enhancing Technologies Symposium, Springer, pp 1–18
Acar G, Juarez M, Nikiforakis N, Diaz C, Gürses S, Piessens F, Preneel B (2013) FPDetective: dusting the web for fingerprinters. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, ACM, pp 1129–1140
Englehardt S, Narayanan A (2016) Online tracking: A 1-million-site measurement and analysis. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer & Communications Security
Acar G, Eubank C, Englehardt S, Juarez M, Narayanan A, Diaz C (2014) The web never forgets. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS ’14) DOI 10.1145/2660267.2660347
Mowery K, Shacham H (2012) Pixel perfect: Fingerprinting canvas in HTML5. In Proceedings of W2SP
(Valve) VV (2016) Fingerprintjs2 — modern & flexible browser fingerprinting library, a successor to the original fingerprintjs.
Olejnik Ł, Acar G, Castelluccia C, Diaz C (2015) The leaking battery. In: International Workshop on Data Privacy Management, Springer, pp 254–263
Englehardt S, Narayanan A (2016) Online tracking: A 1-million-site measurement and analysis. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ’16)
Soltani A, Peterson A, Gellman B (2013) NSA uses Google cookies to pinpoint targets for hacking. In Washington Post.
Englehardt S, Reisman D, Eubank C, Zimmerman P, Mayer J, Narayanan A, Felten EW (2015) Cookies that give you away. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15) DOI 10.1145/2736277.2741679
Angwin J (2016) Google has quietly dropped ban on personally identifiable web tracking. ProPublica
Angwin J (2014) Why online tracking is getting creepier. ProPublica
Vallina-Rodriguez N, Sundaresan S, Kreibich C, Paxson V (2015) Header enrichment or ISP enrichment? In Proceedings of the 2015 ACM SIGCOMM Workshop on Hot Topics in Middleboxes and Network Function Virtualization (HotMiddlebox ’15) DOI 10.1145/2785989.2786002
Vanrykel E, Acar G, Herrmann M, Diaz C (2016) Leaky birds: Exploiting mobile application traffic for surveillance. In Proceedings of Financial Cryptography and Data Security 2016
Lécuyer M, Ducoffe G, Lan F, Papancea A, Petsios T, Spahn R, Chaintreau A, Geambasu R (2014) XRay: Enhancing the web’s transparency with differential correlation. In: Proceedings of the 23rd USENIX Security Symposium (USENIX Security 14), pp 49–64
Lécuyer M, Spahn R, Spiliopoulos Y, Chaintreau A, Geambasu R, Hsu D (2015) Sunlight: Fine-grained targeting detection at scale with statistical confidence. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ACM, pp 554–566
Tschantz MC, Datta A, Datta A, Wing JM (2015) A methodology for information flow experiments. In: Proceedings of the 2015 IEEE 28th Computer Security Foundations Symposium, IEEE, pp 554–568
Datta A, Sen S, Zick Y (2016) Algorithmic transparency via quantitative input influence. In: Proceedings of 37th IEEE Symposium on Security and Privacy
Chen L, Mislove A, Wilson C (2015) Peeking beneath the hood of Uber. In: Proceedings of the 2015 ACM Conference on Internet Measurement Conference, ACM, pp 495–508
Valentino-Devries J, Singer-Vine J, Soltani A (2012) Websites vary prices, deals based on users’ information. In The Wall Street Journal 10:60–68
Rastogi V, Chen Y, Enck W (2013) AppsPlayground: automatic security analysis of smartphone applications. In: Proceedings of the third ACM Conference on Data and Application Security and Privacy, ACM, pp 209–220
Enck W, Gilbert P, Han S, Tendulkar V, Chun BG, Cox LP, Jung J, McDaniel P, Sheth AN (2014) TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. In ACM Transactions on Computer Systems (TOCS) 32(2):5
Ren J, Rao A, Lindorfer M, Legout A, Choffnes D (2015) ReCon: Revealing and controlling privacy leaks in mobile network traffic. arXiv:1507.00255.
Razaghpanah A, Vallina-Rodriguez N, Sundaresan S, Kreibich C, Gill P, Allman M, Paxson V (2015) Haystack: in situ mobile traffic analysis in user space. arXiv:1510.01419.
Sweeney L (2013) Discrimination in online ad delivery. Queue 11(3):10
Caliskan-Islam A, Bryson J, Narayanan A (2016) Semantics derived automatically from language corpora necessarily contain human biases. arXiv:1608.07187.
Can online trackers and network adversaries de-anonymize web browsing data readily available to them? We show—theoretically, via simulation, and through experiments on real user data—that de-identified web browsing histories can be linked to social media profiles using only publicly available data. Our approach is based on a simple observation: each person has a distinctive social network, and thus the set of links appearing in one’s feed is unique. Assuming users visit links in their feed with higher probability than a random user, browsing histories contain tell-tale marks of identity. We formalize this intuition by specifying a model of web browsing behavior and then deriving the maximum likelihood estimate of a user’s social profile. We evaluate this strategy on simulated browsing histories, and show that given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50% of the time. To gauge the real-world effectiveness of this approach, we recruited nearly 400 people to donate their web browsing histories, and we were able to correctly identify more than 70% of them. We further show that several online trackers are embedded on sufficiently many websites to carry out this attack with high accuracy. Our theoretical contribution applies to any type of transactional data and is robust to noisy observations, generalizing a wide range of previous de-anonymization attacks. Finally, since our attack attempts to find the correct Twitter profile out of over 300 million candidates, it is—to our knowledge—the largest-scale demonstrated de-anonymization to date.
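The intuition in the abstract (a browsing history is most likely to belong to the profile whose feed best explains it) can be illustrated with a toy likelihood score. This is a deliberately simplified mixture model, not the estimator derived in the paper: the function names, the mixing weight `r`, the `universe_size` parameter, and the sample feeds are all assumptions made for illustration.

```python
import math


def log_likelihood(history, feed, universe_size, r=0.9):
    """Log-likelihood of a history under a toy model.

    Assumed model: each visited link comes from the candidate's feed with
    probability r (uniform over the feed), and otherwise from a uniform
    background over `universe_size` possible links.
    """
    total = 0.0
    for link in history:
        p = (1 - r) / universe_size
        if link in feed:
            p += r / len(feed)
        total += math.log(p)
    return total


def most_likely_profile(history, feeds, universe_size, r=0.9):
    """Return the candidate whose feed gives the history the highest likelihood."""
    return max(
        feeds,
        key=lambda user: log_likelihood(history, feeds[user], universe_size, r),
    )


# Hypothetical candidates and the sets of links appearing in their feeds.
feeds = {
    "alice": {"a", "b", "c"},
    "bob": {"c", "d", "e"},
    "carol": {"x", "y"},
}
history = ["a", "c", "b", "z"]  # "z" is noise that appears in nobody's feed
guess = most_likely_profile(history, feeds, universe_size=1000)
```

Here `guess` is `"alice"`, because three of the four visited links appear in her feed while the background term absorbs the one noisy visit. Scaling this idea to hundreds of millions of candidates, as the abstract describes, needs a far more careful model and search strategy than this brute-force maximum.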
Ad Networks Can Personally Identify Web Users; Wendy Davis; In MediaPost; 2017-01-20.
<quote> The authors tested their theory by recruiting 400 people who allowed their Web browsing histories to be tracked, and then comparing the sites they visited to sites mentioned in Twitter accounts they followed. The researchers say they were able to use that method to identify more than 70% of the volunteers.</quote>
tl;dr → yet another crawl-and-report framework; like AdFisher, FourthParty, XRay, but different. A survey of the previous work.
Web measurement techniques have been highly influential in online privacy debates and have brought transparency to the online tracking ecosystem. Due to its complexity, however, web privacy measurement remains a specialized research field. Our aim in this work is to transform it into a widely available tool.
First, we analyze over 30 web privacy measurement studies, identify several methodological challenges for the experimenter, and discuss how to address them. Next, we present the design and implementation of OpenWPM, a flexible, modular web privacy measurement platform that can handle any experiment that maps to a general framework. It supports parallelism for speed and scale, automatic recovery from failures of the underlying browser, and realistic simulation of users. OpenWPM is open-source and has already been used as the basis of several published studies on web privacy and security. We show how our generic platform provides a common foundation for these diverse experiments, including a new study on the “filter bubble” which we present here.
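Two of the properties listed above, parallelism and automatic recovery from browser failures, can be pictured with a generic sketch. This does not use or reproduce OpenWPM's actual API: `visit_with_recovery`, `crawl`, and the injected `visit` callable are hypothetical stand-ins, and a retry here merely simulates what restarting a crashed browser would accomplish.

```python
from concurrent.futures import ThreadPoolExecutor


def visit_with_recovery(visit, site, max_attempts=3):
    """Call visit(site); on an exception, retry as if with a fresh browser."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            data = visit(site)
            return {"site": site, "ok": True, "attempts": attempt, "data": data}
        except Exception as exc:  # a crash of the (simulated) browser
            last_error = repr(exc)
    return {"site": site, "ok": False, "attempts": max_attempts, "error": last_error}


def crawl(visit, sites, workers=4):
    """Visit all sites in parallel, returning one result record per site, in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: visit_with_recovery(visit, s), sites))


# Simulated browser: crashes on its first visit to one site, never loads another.
seen = {}


def fake_visit(site):
    seen[site] = seen.get(site, 0) + 1
    if site == "flaky.example" and seen[site] == 1:
        raise RuntimeError("browser crashed")
    if site == "down.example":
        raise RuntimeError("page never loaded")
    return f"visited {site}"


results = crawl(fake_visit, ["ok.example", "flaky.example", "down.example"])
```

Injecting `visit` as a plain callable keeps the sketch testable without a real browser; the per-site result records make it easy to see which visits needed a retry and which ultimately failed.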