“I wish I had that power,” ... while talking about the hack of Democratic National Committee emails. “Man, that would be power.”
and that Snowden's ACLU lawyer, Ben Wizner, said:
“I think many Americans are waking up to the fact we have created a presidency that is too powerful.”
Below the fold, some thoughts on online surveillance and how it relates to the Open Access movement.
Governments Are Surveilling You Online
Glenn Greenwald's "Three New Scandals Show How Pervasive and Dangerous Mass Surveillance is in the West, Vindicating Snowden" underlines the dangers of government surveillance of everything everyone does online by pointing out that:
Earlier this month, a special British court that rules on secret spying activities issued an emphatic denunciation of the nation's domestic mass surveillance programs. The court found that "British security agencies have secretly and unlawfully collected massive volumes of confidential personal data, including financial information, on citizens for more than a decade." Those agencies, the court found, "operated an illegal regime to collect vast amounts of communications data, tracking individual phone and web use and other confidential personal information, without adequate safeguards or supervision for 17 years."
and that:
On Thursday, an even more scathing condemnation of mass surveillance was issued by the Federal Court of Canada. The ruling "faulted Canada's domestic spy agency for unlawfully retaining data and for not being truthful with judges who authorize its intelligence programs." Most remarkable was that these domestic, mass surveillance activities were not only illegal, but completely unknown to virtually the entire population in Canadian democracy, even though their scope has indescribable implications for core liberties.
and that:
law enforcement officials in Montreal are now defending "a highly controversial decision to spy on a La Presse columnist [Patrick Lagacé] by tracking his cellphone calls and texts and monitoring his whereabouts as part of a necessary internal police investigation." The targeted journalist, Lagacé, had enraged police officials by investigating their abusive conduct, and they then used surveillance technology to track his calls and movements to unearth the identity of his sources. Just as that scandal was exploding, it went, in the words of the Montreal Gazette, "from bad to worse" as the ensuing scrutiny revealed that police had actually "tracked the calls and movements of six journalists that year after news reports based on leaks revealed Michel Arsenault, then president of Quebec's largest labour federation, had his phone tapped."
In the wake of Snowden's revelations everyone should assume that, whether or not they have legal authority to do so, governments (not just their own) are tracking everything they do online.
Companies Are Surveilling You Online
Of course, everyone should also assume that, whether or not they gave permission, corporations (not just the ones to whose end-user license agreement they consented) are also tracking everything they do online. As usual, Maciej Cegłowski describes the situation aptly:
We're used to talking about the private and public sector in the real economy, but in the surveillance economy this boundary doesn't exist. Much of the day-to-day work of surveillance is done by telecommunications firms, which have a close relationship with government. The techniques and software of surveillance are freely shared between practitioners on both sides. All of the major players in the surveillance economy cooperate with their own country's intelligence agencies, and are spied on (very effectively) by all the others.
and:
Just like industrialized manufacturing changed the relationship between labor and capital, surveillance capitalism is changing the relationship between private citizens and the entities doing the tracking. Our old ideas about individual privacy and consent no longer hold in a world where personal data is harvested on an industrial scale.
Steven Englehardt and Arvind Narayanan's "Online tracking: A 1-million-site measurement and analysis" presents:
the largest and most detailed measurement of online tracking conducted to date, based on a crawl of the top 1 million websites. We make 15 types of measurements on each site, including stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and the exchange of tracking data between different sites ("cookie syncing"). Our findings include multiple sophisticated fingerprinting techniques never before measured in the wild.
Englehardt and Narayanan's goal is to:
transform web privacy measurement into a widespread practice by creating a tool that is useful not just to our colleagues but also to regulators, self-regulators, the press, activists, and website operators, who are often in the dark about third-party tracking on their own domains. We also seek to lessen the burden of continual oversight of web tracking and privacy, by developing a robust and modular platform for repeated studies.
Although the ecosystem is very complex, it is subject to very strong increasing returns to scale. The benefit to a government or an advertiser of a panopticon tracking everyone is vastly greater than one tracking 1 in 10. The result, as Englehardt and Narayanan found, is consolidation:
Overall, our results show cause for concern, but also encouraging signs. In particular, several of our results suggest that while online tracking presents few barriers to entry, trackers in the tail of the distribution are found on very few sites and are far less likely to be encountered by the average user. Those at the head of the distribution, on the other hand, are owned by relatively few companies and are responsive to the scrutiny resulting from privacy studies.
In fact, the consolidation they found is astonishing:
Our large scale allows us to answer a rather basic question: how many third parties are there? In short, a lot: the total number of third parties present on at least two first parties is over 81,000.
What is more surprising is that the prevalence of third parties quickly drops off: only 123 of these 81,000 are present on more than 1% of sites. This suggests that the number of third parties that a regular user will encounter on a daily basis is relatively small. The effect is accentuated when we consider that different third parties may be owned by the same entity. All of the top 5 third parties, as well as 12 of the top 20, are Google-owned domains. In fact, Google, Facebook, Twitter and AdNexus are the only third-party entities present on more than 10% of sites.
So there are really only four primary commercial panopticons, but by cooperating with them many other, smaller panopticons can track effectively.
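The prevalence figures quoted above come down to counting, for each third party, the share of crawled sites that embed it. The following is a minimal sketch of that calculation, not the authors' measurement platform; the function names and the toy crawl data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def third_party_prevalence(site_to_third_parties):
    """Return {third_party: fraction of crawled sites that embed it}."""
    total_sites = len(site_to_third_parties)
    counts = Counter()
    for third_parties in site_to_third_parties.values():
        # Count each third party at most once per first-party site.
        counts.update(set(third_parties))
    return {tp: n / total_sites for tp, n in counts.items()}


def parties_above(prevalence, threshold):
    """Third parties present on more than `threshold` (a fraction) of sites."""
    return sorted(tp for tp, share in prevalence.items() if share > threshold)


# Hypothetical toy crawl: four first-party sites and the third parties each loads.
crawl = {
    "news.example": ["tracker-a.example", "tracker-b.example"],
    "shop.example": ["tracker-a.example"],
    "blog.example": ["tracker-a.example", "tracker-c.example"],
    "wiki.example": [],
}

prevalence = third_party_prevalence(crawl)
common = parties_above(prevalence, 0.5)
```

In this toy data only one third party clears the 50% threshold while the others sit in the tail, which is the same shape, at a vastly smaller scale, as the distribution the paper reports.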
How Are You Being Surveilled?
The four primary panopticons, and the smaller ones, use two classes of instrumentation to track you: cookies and fingerprinting.
Cookies
The Same-Origin Policy for cookies is intended to prevent cookies from being read by domains that didn't set them:
A page can set a cookie for its own domain or any parent domain, as long as the parent domain is not a public suffix. ... The browser will make a cookie available to the given domain including any sub-domains, no matter which protocol (http/https) or port is used. ... When you read a cookie, you cannot see from where it was set.
The intent was that cookies not be shared between domains. Why would the panopticons want to share them? The goal of a tracker is to have the browser tell it the identity of the reader, whatever page is being read. If cookies could be shared across domains, the tracker would set a cookie on the first page visited and read it back from every other page. But the Same-Origin Policy means other pages won't return the cookie set by the first page, hence the panopticons' need to work around the policy and provide a page-independent ID for the user.
The result is cookie syncing:
Cookie syncing, a workaround to the Same-Origin Policy, allows different trackers to share user identifiers with each other. Besides being hard to detect, cookie syncing enables back-end server-to-server data merges hidden from public view, which makes it a privacy concern.
Cookie syncing works in this way:
If tracker A wants to share its ID for a user with tracker B, it can do so in one of two ways: embedding the ID in the request URL to tracker B, or in the referer URL.
Cookie syncing can be very effective at enabling surveillance:
From the Snowden leaks, we learnt that that NSA "piggybacks" on advertising cookies for surveillance and exploitation of targets. How effective can this technique be? We present one answer to this question. We consider a threat model where a surveillance agency has identified a target by a third-party cookie ... The adversary uses this identifier to coerce or compromise a third party into enabling surveillance or targeted exploitation.
We find that some cookies get synced over and over again to dozens of third parties; we call these promiscuous cookies. ... This means that if the adversary has identified a user by such a cookie, their ability to surveil or target malware to that user will be especially good. The most promiscuous cookie that we found belongs to the domain adverticum.net; it is synced or leaked to 82 other parties which are collectively present on 752 of the top 1,000 websites! In fact, each of the top 10 most promiscuous cookies is shared with enough third parties to cover 60% or more of the top 1,000 sites.
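To make the first of the two mechanisms concrete (embedding the ID in the request URL to tracker B), here is a minimal sketch. It is an illustration under stated assumptions, not any real tracker's protocol: the endpoint `https://tracker-b.example/sync`, the query parameter names `partner` and `partner_uid`, and both function names are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_sync_url(partner_endpoint, own_user_id):
    """Tracker A: build the URL the browser is sent to, carrying A's ID for the user."""
    query = urlencode({"partner": "tracker-a", "partner_uid": own_user_id})
    return partner_endpoint + "?" + query


def handle_sync_request(url, own_cookie_id, id_map):
    """Tracker B: link A's ID (read from the URL) to B's own cookie ID for this browser."""
    params = parse_qs(urlparse(url).query)
    partner = params["partner"][0]
    partner_uid = params["partner_uid"][0]
    # After this, B can join its records with A's on the server side.
    id_map[(partner, partner_uid)] = own_cookie_id
    return id_map


# Hypothetical IDs: A knows the user as "A-12345", B's own cookie says "B-98765".
sync_url = build_sync_url("https://tracker-b.example/sync", "A-12345")
id_map = handle_sync_request(sync_url, "B-98765", {})
```

Neither tracker reads the other's cookie, so the Same-Origin Policy is never violated. The browser simply carries A's identifier to B in a URL, and B receives its own cookie on the same request.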
Fingerprinting
The other way to provide a page-independent ID for the reader is fingerprinting. Narayanan, interviewed at fivethirtyeight.com, reports:
Who Gets Tracking Data?
Once an organization has tracking information about a user, it becomes an asset to be monetized. Google and Facebook use it to target ads, but they and the legions of less powerful trackers also sell the information to others, probably many times over. Cegłowski writes:
Surveillance capitalism has some of the features of a zero-sum game. The actual value of the data collected is not clear, but it is definitely an advantage to collect more than your rivals do. Because human beings develop an immune response to new forms of tracking and manipulation, the only way to stay successful is to keep finding novel ways to peer into people's private lives. And because much of the surveillance economy is funded by speculators, there is an incentive to try flashy things that will capture the speculators' imagination, and attract their money.
This creates a ratcheting effect where the behavior of ever more people is tracked ever more closely, and the collected information retained, in the hopes that further dollars can be squeezed out of it.
The scale of dissemination is revealed by Englehardt and Narayanan's results about cookie syncing:
The most prolific cookie-syncing third party is [Google's] doubleclick.net - it shares 108 different cookies with 118 other third parties ... More interestingly, we find that the vast majority of top third parties sync cookies with at least one other party: 45 of the top 50, 85 of the top 100, 157 of the top 200, and 460 of the top 1,000. This adds further evidence that cookie syncing is an underappreciated and under-researched privacy concern.
We also find that third parties are highly connected by synced cookies. Specifically, of the top 50 third parties that are involved in cookie syncing, the probability that a random pair will have at least one cookie in common is 85%. The corresponding probability for the top 100 is 66%.
Information about your online behavior is very widely disseminated.
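The "random pair" statistic quoted above can be read as the fraction of unordered pairs of third parties that hold at least one identifier in common. The sketch below computes that fraction for a toy data set. It is an assumption about how such a figure could be derived, not the authors' code, and the tracker names and ID values are hypothetical.

```python
from itertools import combinations


def pair_share_probability(cookies_by_party):
    """Fraction of unordered pairs of parties that hold at least one common ID."""
    pairs = list(combinations(cookies_by_party, 2))
    if not pairs:
        return 0.0
    sharing = sum(
        1 for a, b in pairs if cookies_by_party[a] & cookies_by_party[b]
    )
    return sharing / len(pairs)


# Hypothetical observations: the set of synced cookie IDs each third party holds.
observed = {
    "tracker-a.example": {"id1", "id2"},
    "tracker-b.example": {"id2", "id3"},
    "tracker-c.example": {"id3"},
    "tracker-d.example": {"id9"},
}

probability = pair_share_probability(observed)
```

Here two of the six possible pairs share an ID, so the probability is one third. The 85% reported for the top 50 syncing parties means almost any two of them can join their records on some shared identifier.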
What Could Be Done To Limit Tracking?
Georgios Kontaxis and Monica Chew won "Best Paper" at the 2015 Web 2.0 Security and Privacy workshop for "Tracking Protection in Firefox for Privacy and Performance" (PDF). They demonstrated that Tracking Protection provided:
a 67.5% reduction in the number of HTTP cookies set during a crawl of the Alexa top 200 news sites. [and] a 44% median reduction in page load time and 39% reduction in data usage in the Alexa top 200 news site.
Alas, Tracking Protection relies on human-curated lists of tracking domains to block, so it can't be completely effective.
Even if you aren't worried about governments and corporations tracking your every move online, your Web experience is still being impaired by trackers. Typically at least a third of all the data you receive while browsing has no visible effect and doesn't contribute to your user experience (but does track your behavior). Thus limiting tracking would definitely make your life better.
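The idea behind list-based blocking can be sketched in a few lines: before fetching a resource, check whether its host, or any parent domain of that host, appears on a curated list. This is an illustration of the general technique only, not Firefox's implementation; the blocklist contents, URLs, and function name are hypothetical.

```python
from urllib.parse import urlparse


def is_blocked(url, blocklist):
    """True if the URL's host, or any parent domain of it, is on the blocklist."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # For "cdn.tracker.example" check "cdn.tracker.example", "tracker.example", "example".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


# Hypothetical curated list and the requests one page load would trigger.
BLOCKLIST = {"tracker.example"}
page_requests = [
    "https://news.example/story.html",
    "https://cdn.tracker.example/pixel.gif",
    "https://nottracker.example/script.js",
]

allowed = [u for u in page_requests if not is_blocked(u, BLOCKLIST)]
```

The weakness noted above is visible here: a tracking domain that nobody has yet added to the list passes straight through, so protection is only as complete as the curation.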
What Could Be Done To Limit Dissemination?
Assuming that we can't eliminate tracking, perhaps the best that can be done is to limit the flow of tracking data through the ecosystem. Jack Balkin and Jonathan Zittrain's "A Grand Bargain to Make Tech Companies Trustworthy" suggests a legal framework to address the dissemination problem, which they characterize thus:
As we use these services, they learn more and more about us. They see who we are, but we are unable to see into their operations or understand how they use our data. As a result, we have to trust online services, but we have no real guarantees that they will not abuse our trust. Companies share information about us in any number of unexpected and regrettable ways, and the information and advice they provide can be inconspicuously warped by the companies' own ideologies or by their relationships with those who wish to influence us, whether people with money or governments with agendas.
They use the analogy of fiduciaries such as doctors, lawyers, and accountants:
Like older fiduciaries, these businesses have become virtually indispensable. Like older fiduciaries, these companies collect a lot of personal information that could be used to our detriment. And like older fiduciaries, these businesses enjoy a much greater ability to monitor our activities than we have to monitor theirs. As a result, many people who need these services often shrug their shoulders and decide to trust them. But the important question is whether these businesses, like older fiduciaries, have legal obligations to be trustworthy. The answer is that they should.
They also draw an analogy with the bargain between copyright owners and users that underlies the Digital Millennium Copyright Act (DMCA):
Congress could respond with a "Digital Millennium Privacy Act" that offers a parallel trade-off to that of the DMCA: accept the federal government's rules of fair dealing and gain a safe harbor from uncertain legal liability, or stand pat with the status quo.
The DMPA would provide a predictable level of federal immunity for those companies willing to subscribe to the duties of an information fiduciary and accept a corresponding process to disclose and redress privacy and security violations. As with the DMCA, those companies unwilling to take the leap would be left no worse off than they are today - subject to the tender mercies of state and local governments. But those who accept the deal would gain the consistency and calculability of a single set of nationwide rules. Even without the public giving up on any hard-fought privacy rights recognized by a single state, a company could find that becoming an information fiduciary could be far less burdensome than having to respond to multiple and conflicting state and local obligations.
This might help prevent perhaps the most troubling aspect of this corporate surveillance, the way information collected by these primary panopticons is disseminated through the ecosystem. The concentration of tracking means that even if only the big 4 accepted a fiduciary responsibility, the diffusion of collected information would be greatly reduced.
Why Are The Trackers Tracking You?
Tim Wu has a new book, The Attention Merchants: The Epic Scramble to Get Inside Our Heads. His account of the evolution of the attention economy that is driving this corporate surveillance is well worth reading. He starts his historical survey in September 1833 with the launch of Benjamin Day's New York Sun:
rival papers could not at first fathom out how the Sun was able to charge less, provide more news, reach a larger audience, and still come out ahead. What Day had figured out was that newsstand earnings were trivial; advertising revenue could make it all happen.
The switch is a big problem for society; the newspapers used some of the money to report the actual news, but Google and Facebook feel no such social responsibility. As Joshua Benton writes:
I’m from a small town in south Louisiana. The day before the election, I looked at the Facebook page of the current mayor. Among the items he posted there in the final 48 hours of the campaign: Hillary Clinton Calling for Civil War If Trump Is Elected. Pope Francis Shocks World, Endorses Donald Trump for President. Barack Obama Admits He Was Born in Kenya. FBI Agent Who Was Suspected Of Leaking Hillary’s Corruption Is Dead.
These are not legit anti-Hillary stories. (There were plenty of those, to be sure, both on his page and in this election cycle.) These are imaginary, made up, frauds. And yet Facebook has built a platform for the active dispersal of these lies - in part because these lies travel really, really well. (The pope’s “endorsement” has over 868,000 Facebook shares. The Snopes piece noting the story is fake has but 33,000.)
But What About Academic Journals?
Englehardt and Narayanan found that:
sites on the low end of the [tracking prevalence] spectrum are mostly sites which belong to government organizations, universities, and non-profit entities.
Despite this, and using less rigorous techniques than theirs, Eric Hellman found more than 18 months ago that "16 of the top 20 Research Journals Let Ad Networks Spy on Their Readers". Academic publishers are not immune to the lure of profit from selling their audience to the "attention economy". Why should researchers be concerned about being sold in this way?
There are the obvious privacy issues, and not just personal privacy. For example, Rick Luce at the Los Alamos National Labs set up a system that ingested and re-published academic journals for Federal researchers. The goal was to prevent non-Federal bodies from observing which articles Federal researchers working on classified projects were reading. From that information, clues could be gathered as to what the classified projects were working on. Pharma companies have similar concerns.
- AdGholas, which may have been running since 2013 attacking a million visits a day to:
113 domains, including some big names such as The New York Times, Le Figaro, The Verge, PCMag, IBTimes, ArsTechnica, Daily Mail, Telegraaf, La Gazetta dello Sport, CBS Sports, Top Gear, Urban Dictionary, Playboy, Answers.com, Sky.com, and more.
- The demonstration of HEIST, which showed:
- VirtualDonna, which ran on 3000 top Japanese sites and hit 100K visits/day.
- GooNky, a similar campaign that included abusing a certificate authority to allow encrypted traffic.
But there is at least one more, less obvious issue. I have written before about the priority of the publishing oligopoly to ensure they control the only easily accessible copy even of open access content, mentioning the value in the Web world of page views. But I didn't sufficiently appreciate where the value came from. It isn't just the visible value of being able to show the reader ads. It is the invisible, but probably greater, value of being able to sell the ability to track their readers' visits to the surveillance companies. Readers of academic, especially STEM, journals are high-priority targets for both commercial and governmental reasons.
Thus it is likely that discussions of various open access models, and the role of institutional repositories (IRs) have misunderstood the business model they sought to disrupt. This insight explains, for example, why Elsevier is so determined that IRs contain only metadata, not actual content, and why buying SSRN was a sound investment. Both enhance Elsevier's value to the panopticons. It would be very interesting to know whether the number of trackers on SSRN has increased since it was purchased.
Conclusion
I'll leave the last word to Maciej Cegłowski, who was interviewed last Friday by Russell Brandom at The Verge:
Outside of intelligence services, police typically obtain company data with a court-ordered warrant, and most companies freely admit to filling lawful requests for data in that form. It also gives users some security: Trump or no, a judge has to sign off on probable cause before the warrant can be issued. As the prospect of another encryption battle looms, the warrant process may be the one part of the system that both sides can agree on.
But for Ceglowski, the presumption of the rule of law simply may not apply under Trump. “I don’t think the US is going to turn into a lawless state overnight, but look at the dynamic in places like Russia,” he said. “He’s not going to care enough to prevent such abuses at a lower level and certainly he’s going to protect anyone who’s taken to task for them.”
“I hate to sound fear-mongering but I’m from Poland,” he continued. “This fits a pattern that I recognize. It’s just that it hasn’t happened before in the United States.”
Back in June Cegłowski wrote:
the surveillance economy is way too dangerous. Even if you trust everyone spying on you right now, the data they're collecting will eventually be stolen or bought by people who scare you. We have no ability to secure large data collections over time.
The goal should be not to make the apparatus of surveillance politically accountable (though that is a great goal), but to dismantle it. Just like we don't let countries build reactors that produce plutonium, no matter how sincere their promises not to misuse it, we should not allow people to create and indefinitely store databases of personal information. The risks are too high.
I think a workable compromise will be to allow all kinds of surveillance, but limit what anyone is allowed to store or sell.