tl;dr → Display advertising is in peril, and with it the sites that depend on it most: mid-sized linkbaitists and newspapers. Paywalls are indicated.
Two recent disruptions to the online advertising market are the widespread use of ad-blocking software and proposed restrictions on third-party tracking, trends that are driven largely by consumer concerns over privacy. Both primarily impact display advertising (as opposed to search and native social ads), and affect how retailers reach customers and how content producers earn revenue. It is, however, unclear what the consequences of these trends are. We investigate using anonymized web browsing histories of 14 million individuals, focusing on “retail sessions” in which users visit online sites that sell goods and services. We find that only 3% of retail sessions are initiated by display ads, a figure that is robust to permissive attribution rules and consistent across widely varying market segments. We further estimate the full distribution of how retail sessions are initiated, and find that search advertising is three times more important than display advertising to retailers, and search advertising is itself roughly three times less important than organic web search. Moving to content providers, we find that display ads are shown by 12% of websites, accounting for 32% of their page views; this reliance is concentrated in online publishing (e.g., news outlets) where the rate is 91%. While most consumption is either in the long-tail of websites that do not show ads, or sites like Facebook that show native, first-party ads, moderately sized web publishers account for a substantial fraction of consumption, and we argue that they will be most affected by changes in the display advertising market. Finally, we use estimates of ad rates to judge the feasibility of replacing lost ad revenue with a freemium or donation-based model.
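As a rough reading of the ratios in the abstract above, the "three times" statements can be turned into approximate shares of retail sessions. This is back-of-the-envelope arithmetic, not a figure reported in the abstract: it assumes "three times more important" refers directly to the share of sessions initiated, and the variable names are illustrative.

```python
# Back-of-the-envelope shares of retail sessions by initiating channel.
# Only the 3% display figure is stated in the abstract; the other values are
# implied by its "three times" ratios and are therefore approximate.
display_pct = 3                          # stated: sessions initiated by display ads
search_ad_pct = 3 * display_pct          # implied: roughly 9%
organic_search_pct = 3 * search_ad_pct   # implied: roughly 27%

# Everything else (direct navigation, email, social, and so on), by subtraction.
other_pct = 100 - (display_pct + search_ad_pct + organic_search_pct)

print(display_pct, search_ad_pct, organic_search_pct, other_pct)
```

Under that reading, display ads are the smallest of the three channels by a wide margin, which is the point the tl;dr leans on.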
Can online trackers and network adversaries de-anonymize web browsing data readily available to them? We show—theoretically, via simulation, and through experiments on real user data—that de-identified web browsing histories can be linked to social media profiles using only publicly available data. Our approach is based on a simple observation: each person has a distinctive social network, and thus the set of links appearing in one’s feed is unique. Assuming users visit links in their feed with higher probability than a random user, browsing histories contain tell-tale marks of identity. We formalize this intuition by specifying a model of web browsing behavior and then deriving the maximum likelihood estimate of a user’s social profile. We evaluate this strategy on simulated browsing histories, and show that given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50% of the time. To gauge the real-world effectiveness of this approach, we recruited nearly 400 people to donate their web browsing histories, and we were able to correctly identify more than 70% of them. We further show that several online trackers are embedded on sufficiently many websites to carry out this attack with high accuracy. Our theoretical contribution applies to any type of transactional data and is robust to noisy observations, generalizing a wide range of previous de-anonymization attacks. Finally, since our attack attempts to find the correct Twitter profile out of over 300 million candidates, it is—to our knowledge—the largest-scale demonstrated de-anonymization to date.
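The maximum-likelihood idea in the abstract above can be illustrated with a toy model. This is a minimal sketch under loud assumptions, not the authors' model: every link in a universe of `universe_size` links is taken to be equally popular, and a user is assumed to be `boost` times more likely to visit a link that appears in their own feed. The function names, the `boost` value, and the example feeds are all hypothetical.

```python
import math

def log_likelihood(history, feed, universe_size, boost):
    """Log-likelihood of a browsing history given one candidate's feed.

    Toy model (an assumption, not the paper's): each visit picks a link with
    weight `boost` if it is in the feed and weight 1 otherwise.
    """
    hits = sum(1 for link in history if link in feed)
    # Normalizing constant: total weight over every link in the universe.
    normalizer = boost * len(feed) + (universe_size - len(feed))
    return hits * math.log(boost) - len(history) * math.log(normalizer)

def most_likely_profile(history, feeds, universe_size, boost=50.0):
    """Return the candidate whose feed maximizes the likelihood of the history."""
    return max(
        feeds,
        key=lambda name: log_likelihood(history, feeds[name], universe_size, boost),
    )

# Hypothetical candidates. "carol" also contains alice's links but has a much
# larger feed, so the normalizer penalizes her.
feeds = {
    "alice": {"a.com/1", "b.com/2", "c.com/3"},
    "bob": {"c.com/3", "d.com/4", "e.com/5"},
    "carol": {f"site{i}.com" for i in range(200)} | {"a.com/1", "b.com/2", "c.com/3"},
}
history = ["a.com/1", "b.com/2", "c.com/3", "z.com/9"]

best = most_likely_profile(history, feeds, universe_size=10_000)
print(best)
```

The normalizer is what keeps the estimate from simply favoring accounts that follow everyone: a larger feed explains more visits but spreads its probability mass thinner.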
Ad Networks Can Personally Identify Web Users; Wendy Davis; In MediaPost; 2017-01-20.
<quote>The authors tested their theory by recruiting 400 people who allowed their Web browsing histories to be tracked, and then comparing the sites they visited to sites mentioned in Twitter accounts they followed. The researchers say they were able to use that method to identify more than 70% of the volunteers.</quote>
Retailers regularly target users with online ads based on their web browsing activity, benefiting both the retailers, who can better reach potential customers, and content providers, who can increase ad revenue by displaying more effective ads. The effectiveness of such ads relies on third-party brokers that maintain detailed user information, prompting privacy legislation such as “do-not-track” that would limit or ban the practice. We gauge the economic costs of such privacy policies by analyzing the anonymized web browsing histories of 14 million individuals. We find that 3% of retail sessions are currently initiated by ads capable of incorporating third-party information. Turning to content providers, we find that one-third of traffic is supported by third-party capable advertising, and the rate is particularly high (91%) for online news sites. Finally, we show that for many of the most popular content providers, modest subscription fees of $1-$3 per month charged to loyal site users would be sufficient to replace ad revenue. We conclude that do-not-track legislation would impact, but not fundamentally fracture, the Internet economy.
<quote>browsing patterns reveal that ad revenue can generally be replaced by a small fraction of loyal visitors paying a modest subscription fee, on the order of $1–$2 per month.</quote>, page 31.
<quote>the economic benefits, though ostensibly amounting to billions of dollars, are substantially smaller than generally acknowledged.</quote>, page 35.
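The subscription-fee claim quoted above comes down to simple break-even arithmetic. The sketch below shows the shape of that calculation only; the page views, ads per page, CPM, and subscriber count are made-up illustrative inputs, not estimates from the paper, and `break_even_fee` is a hypothetical helper.

```python
def break_even_fee(monthly_page_views, ads_per_page, cpm_dollars, paying_users):
    """Monthly fee per paying user that would replace display-ad revenue.

    cpm_dollars is revenue per 1,000 ad impressions.
    """
    impressions = monthly_page_views * ads_per_page
    ad_revenue = impressions * cpm_dollars / 1000.0
    return ad_revenue / paying_users

# Illustrative inputs only (assumptions, not data from the paper).
fee = break_even_fee(
    monthly_page_views=10_000_000,
    ads_per_page=2,
    cpm_dollars=1.00,
    paying_users=10_000,
)
print(f"${fee:.2f} per paying user per month")
```

The result is sensitive to the CPM and to how many visitors count as loyal enough to pay, which are the two inputs the paper has to estimate.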
The Party Model
zero party → a nonstandard term, defined as <quote>instances in which data on a user’s past actions are not directly involved in prompting the shopping session. Direct navigation falls into this category, as does clicking on an organic website link, or a link displayed on a coupon site.</quote>
first party → the publisher
third party → someone else.
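The three definitions above can be read as a single rule keyed on whose data about the user's past actions prompted the session. The sketch below encodes that reading; the `Party` enum, the `classify_session` function, and the example hostnames are hypothetical, and the notes do not say which ad channels the paper assigns to each category.

```python
from enum import Enum
from typing import Optional

class Party(Enum):
    ZERO = "zero party"
    FIRST = "first party"
    THIRD = "third party"

def classify_session(data_holder: Optional[str], publisher: str) -> Party:
    """Classify a session by who held the user data that prompted it.

    data_holder is None when no data on past actions was involved
    (for example, direct navigation or an organic link).
    """
    if data_holder is None:
        return Party.ZERO
    if data_holder == publisher:
        return Party.FIRST
    return Party.THIRD

# Hypothetical examples.
print(classify_session(None, "news.example"))              # direct navigation
print(classify_session("news.example", "news.example"))    # the publisher's own data
print(classify_session("broker.example", "news.example"))  # someone else's data
```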
Open-loop theorizing on why click-through rates (CTR) are low for third-party ads, pages 31-33.