Friday, March 17, 2017

The Amnesiac Civilization: Part 4

Part 2 and Part 3 of this series covered the unsatisfactory current state of Web archiving. Part 1 of this series briefly outlined the way the W3C's Encrypted Media Extensions (EME) threaten to make this state far worse. Below the fold I expand on the details of this threat.

The W3C's abstract describes EME thus:
This proposal extends HTMLMediaElement [HTML5] providing APIs to control playback of encrypted content.

The API supports use cases ranging from simple clear key decryption to high value video (given an appropriate user agent implementation). License/key exchange is controlled by the application, facilitating the development of robust playback applications supporting a range of content decryption and protection technologies.
The next paragraph is misleading; EME does not merely enable DRM, it mandates at least an (insecure) baseline implementation of encrypted content:
This specification does not define a content protection or Digital Rights Management system. Rather, it defines a common API that may be used to discover, select and interact with such systems as well as with simpler content encryption systems. Implementation of Digital Rights Management is not required for compliance with this specification: only the Clear Key system is required to be implemented as a common baseline.
The Clear Key system requires that content be encrypted, but the keys to decrypt it are passed in cleartext. I will return to the implications of this requirement.
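To make the cleartext-key point concrete, here is a minimal Python sketch. It assumes the JSON license format described in the Clear Key section of the EME specification: a JSON Web Key Set of symmetric ("oct") keys whose fields are unpadded base64url. The key ID, the key bytes and the helper names are all hypothetical, chosen only for illustration; no browser or CDM is involved.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWK fields require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode unpadded base64url text back to bytes."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def clear_key_license(keys: dict[bytes, bytes]) -> str:
    """Build a Clear Key style license: a JSON Web Key Set of symmetric keys."""
    return json.dumps({
        "keys": [
            {"kty": "oct", "kid": b64url(kid), "k": b64url(key)}
            for kid, key in keys.items()
        ],
        "type": "temporary",
    })


def keys_seen_by_observer(license_json: str) -> dict[bytes, bytes]:
    """Recover every content key from a license, using no secret at all."""
    parsed = json.loads(license_json)
    return {
        b64url_decode(entry["kid"]): b64url_decode(entry["k"])
        for entry in parsed["keys"]
    }


# Hypothetical 16-byte key ID and content key, for illustration only.
KEY_ID = bytes(range(16))
CONTENT_KEY = bytes(range(16, 32))

license_json = clear_key_license({KEY_ID: CONTENT_KEY})
recovered = keys_seen_by_observer(license_json)
```

Anything that can read the license, including the page's own JavaScript, can recover the content key with nothing more than a base64 decoder. Clear Key is encryption in form only.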

EME data flows
The W3C's diagram of the EME stack shows an example of how it works. An application, i.e. a Web page, asks the browser to render some encrypted content. The content is delivered, in this case from a Content Distribution Network (CDN), to the browser. The browser needs a license to decrypt it, which it obtains from the application via the EME API by creating an appropriate session and then using it to request the license. The browser hands the content and the license to a Content Decryption Module (CDM), which can decrypt the content using a key in the license and render it.

What is DRM trying to achieve? Ostensibly, it is trying to ensure that each time DRM-ed content is rendered, specific permission is obtained from the content owner. In order to ensure that, the CDM cannot trust the browser it is running in. For example, it must be sure that the browser can see neither the decrypted content nor the key. If the browser could see either one, and save it for future use, that would defeat the purpose of DRM.

The CDM is running in an environment controlled by the user, so the mechanisms a DRM implementation uses to obscure the decrypted content and the key from the environment are relatively easy to subvert. This is why in practice most DRM technologies are "cracked" fairly quickly after deployment. As Bunnie Huang's amazing book about cracking the DRM of the original Xbox shows, it is very hard to defeat a determined reverse engineer.

Content owners are not stupid. They realized early on that the search for uncrackable DRM was a fool's errand. So, to deter reverse engineering, they arranged for the 1998 Digital Millennium Copyright Act (DMCA) to make any attempt to circumvent protections on digital content a criminal offense. Cory Doctorow explains what this strategy achieves:
So if DRM isn't anti-piracy, what is it? DRM isn't really a technology at all, it's a law. Specifically, it's section 1201 of the US DMCA (and its international equivalents). Under this law, breaking DRM is a crime with serious consequences (5 years in prison and a $500,000 fine for a first offense), even if you're doing something that would otherwise be legal. This lets companies treat their commercial strategies as legal obligations: Netflix doesn't have the legal right to stop you from recording a show to watch later, but they can add DRM that makes it impossible to do so without falling afoul of DMCA.

This is the key: DRM makes it possible for companies to ban all unauthorized conduct, even when we're talking about using your own property in legal ways. This intrudes on your life in three ways:
  1. It lets companies sue and threaten security researchers who find defects in products
  2. It lets companies sue and threaten accessibility workers who adapt technology for use by disabled people
  3. It lets companies sue and threaten competitors who want to let you do more with your property -- get it repaired by independent technicians, buy third-party parts and consumables, or use it in ways that the manufacturer just doesn't like.
Of course, among the "ways that the manufacturer just doesn't like" can be archiving.

IANAL, but I do not believe that it is a defense under the DMCA that the "protections" in question are made of tissue paper. Thus, for example, it is likely that even an attempt to reverse-engineer an implementation of EME's Clear Key system in order to preserve the plaintext of some encrypted content would risk severe criminal penalties. Would an open source implementation of Clear Key be legal?

It is this interaction between even purely nominal DRM mechanisms and the DMCA that has roused opposition to EME. J. M. Porup's A battle rages for the future of the Web is an excellent overview of the opposition and its calls on Tim Berners-Lee to decry EME. Once Berners-Lee had endorsed it, Glyn Moody wrote a blistering takedown of his reasoning in Tim Berners-Lee Endorses DRM In HTML5, Offers Depressingly Weak Defense Of His Decision. Moody points to the most serious problem EME causes:
Also deeply disappointing is Berners-Lee's failure to recognize the seriousness of the threat that EME represents to security researchers. The problem is that once DRM enters the equation, the DMCA comes into play, with heavy penalties for those who dare to reveal flaws, as the EFF explained two years ago.
How do we know that this is the most serious problem? Because, like all the other code running in your browser, the DRM implementations have flaws and vulnerabilities. For example:
Google's CDM is Widevine, a technology it acquired in 2010. David Livshits, a security researcher at Ben-Gurion University and Alexandra Mikityuk from Berlin's Telekom Innovation Laboratories, discovered a vulnerability in the path from the CDM to the browser, which allows them to capture and save videos after they've been decrypted. They've reported this bug to Google, and have revealed some proof-of-concept materials now showing how it worked (they've withheld some information while they wait for Google to issue a fix).

Widevine is also used by Opera and Firefox (Firefox also uses a CDM from Adobe).

Under German law -- derived from Article 6 of the EUCD -- Mikityuk could face criminal and civil liability for revealing this defect, as it gives assistance to people wishing to circumvent Widevine. Livshits has less risk, as Israel is one of the few major US trading partners that has not implemented an "anti-circumvention" law, modelled on the US DMCA and spread by the US Trade Representative to most of the world.
Note that we (and Google) only know about this flaw because one researcher was foolhardy and another was from Israel. Many other flaws remain unrevealed:
The researchers who revealed the Widevine/Chrome defect say that it was likely present in the browser for more than five years, but are nevertheless the first people to come forward with information about its flaws. As many esteemed security researchers from industry and academe told the Copyright Office last summer, they routinely discover bugs like this, but don't come forward, because of the potential liability from anti-circumvention law.
Glyn Moody again:
The EFF came up with a simple solution that would at least have limited the damage the DMCA inflicts here:
a binding promise that W3C members would have to sign as a condition of continuing the DRM work at the W3C, and once they do, they would not be able to use the DMCA or laws like it to threaten security researchers.
Alas, Cory Doctorow again:
How do we know that companies only want DRM because they want to abuse this law, and not because they want to fight piracy? Because they told us so. At the W3C, we proposed a compromise: companies who participate at W3C would be allowed to use it to make DRM, but would have to promise not to invoke the DMCA in these ways that have nothing to do with piracy. So far, nearly 50 W3C members -- everyone from Ethereum to Brave to the Royal National Institute for Blind People to Lawrence Berkeley National Labs -- have endorsed this, and all the DRM-supporting members have rejected it.

In effect, these members are saying, "We understand that DRM isn't very useful for stopping piracy, but that law that lets us sue people who aren't breaking copyright law? Don't take that away!"
It's not as though, as an educated Web user, you can decide that you don't want to take the risks inherent in using a browser that doesn't trust you, or the security researchers you depend upon. In theory Web DRM is optional, but in practice it isn't. Lucian Armasu at Tom's Hardware explains:
The next stable version of Chrome (Chrome 57) will not allow users to disable the Widevine DRM plugin anymore, therefore making it an always-on, permanent feature of Chrome. The new version of Chrome will also eliminate the “chrome://plugins” internal URL, which means if you want to disable Flash, you’ll have to do it from the Settings page.
You definitely want to disable Flash. To further "optimize the user experience":
So far only the Flash plugin can be disabled in the Chrome Settings page, but there is no setting to disable the Widevine DRM plugin, nor the PDF viewer and the Native Client plugins. PDF readers, including the ones that are built into browsers, are major targets for malicious hackers. PDF is a “powerful” file format that’s used by many, and it allows hackers to do all sorts of things given the right vulnerability.

People who prefer to open their PDF files in a better sandboxed environment or with a more secure PDF reader, rather than in Chrome, will not be able to do that anymore. All PDF files will always open in Chrome’s PDF viewer, starting with Chrome 57.
But that's not what I came to tell you about. Came to talk about archiving.

I fully appreciate the seriousness of the security threat posed by EME, but it tends to overwhelm discussion of EME's other impacts. I have long been concerned about the impact of Digital Rights Management on archiving. I first wrote about the way HTML5 theoretically enabled DRM for the Web in 2011's Moonalice plays Palo Alto:
Another way of expressing the same thought is that HTML5 allows content owners to implement a semi-effective form of DRM for the Web.
That was then, but now theory is practice. Once again, Glyn Moody is right on target:
One of the biggest problems with the defense of his position is that Berners-Lee acknowledges only in passing one of the most serious threats that DRM in HTML5 represents to the open Web. Talking about concerns that DRM for videos could spread to text, he writes:
For books, yes this could be a problem, because there have been a large number of closed non-web devices which people are used to, and for which the publishers are used to using DRM. For many the physical devices have been replaced by apps, including DRM, on general purpose devices like closed phones or open computers. We can hope that the industry, in moving to a web model, will also give up DRM, but it isn't clear.
So he admits that EME may well be used for locking down e-book texts online. But there is no difference between an e-book text and a Web page, so Berners-Lee is tacitly admitting that DRM could be applied to basic Web pages. An EFF post spelt out what that would mean in practice:
A Web where you cannot cut and paste text; where your browser can't "Save As..." an image; where the "allowed" uses of saved files are monitored beyond the browser; where JavaScript is sealed away in opaque tombs; and maybe even where we can no longer effectively "View Source" on some sites, is a very different Web from the one we have today.
It's also totally different from the Web that Berners-Lee invented in 1989, and then generously gave away for the world to enjoy and develop. It's truly sad to see him acquiescing in a move that could destroy the very thing that made the Web such a wonderfully rich and universal medium -- its openness.
The EFF's post (from 2013) had several examples of EME "mission creep" beyond satisfying Netflix:
Just five years ago, font companies tried to demand DRM-like standards for embedded Web fonts. These Web typography wars fizzled out without the adoption of these restrictions, but now that such technical restrictions are clearly "in scope," why wouldn't typographers come back with an argument for new limits on what browsers can do?

Indeed, within a few weeks of EME hitting the headlines, a community group within W3C formed around the idea of locking away Web code, so that Web applications could only be executed but not examined online. Static image creators such as photographers are eager for the W3C to help lock down embedded images. Shortly after our Tokyo discussions, another group proposed their new W3C use-case: "protecting" content that had been saved locally from a Web page from being accessed without further restrictions. Meanwhile, publishers have advocated that HTML textual content should have DRM features for many years.
Web archiving consists of:
content ... saved locally from a Web page ... being accessed without further restrictions.
It appears that the W3C's EME will become, in effect, a mandatory feature of the Web. Obviously, the first effect is that much Web video will be DRM-ed, making it impossible to collect in replayable form and thus preserve. Google's making Chrome's video DRM impossible to disable suggests that YouTube video will be DRM-ed. Even a decade ago, to study US elections you needed YouTube video.

But that's not the big impact that EME will have on society's memory. It will spread to other forms of content. The business models for Web content are of two kinds, and both are struggling:
  • Paywalled content. It turns out that, apart from movies and academic publishing, only a very few premium brands such as The Economist, the Wall Street Journal and the New York Times have viable subscription business models based on (mostly) paywalled content. Even excellent journalism such as The Guardian is reduced to free access, advertising and voluntary donations. Part of the reason is that Googling the headline of paywalled news stories often finds open access versions of the content. Clearly, newspapers and academic publishers would love to use Web DRM to ensure that their content could be accessed only from their site, not via Google or Sci-Hub.
  • Advertising-supported content. The market for Web advertising is so competitive and fraud-ridden that Web sites have been forced to let advertisers run ads so obnoxious, and indeed so riddled with malware, and to load their sites with so many trackers, that many users have rebelled and use ad-blockers. These days it is pretty much essential to do so, to keep yourself safe and to reduce bandwidth consumption. Sites are very worried about the loss of income from blocked ads. Some, such as Forbes, refuse to supply content to browsers that block ads (which, in Forbes' case, turned out to be a public service; the ads carried malware). DRM-ing a site's content will prevent ads being blocked. Thus ad space on DRM-ed sites will be more profitable, and sell for higher prices, than space on sites where ads can be blocked. The pressure on advertising-supported sites, which include both free and subscription news sites, to DRM their content will be intense.
Thus the advertising-supported bulk of what we think of as the Web, and the paywalled resources such as news sites that future scholars will need, will become un-archivable. Kalev Leetaru will need to add a fourth, even more outraged, item to his list of complaints about Web archives.

The prospect for academic journals is somewhat less dire. Because the profit margins of the big publishers are so outrageous, and because charging extortionate subscriptions for access to the fruits of publicly and charitably-funded research is so hard to justify, they are willing to acquiesce in the archiving of their content provided it doesn't threaten their bottom line. The big publishers typically supply archives such as Portico and CLOCKSS with content through non-Web channels. CLOCKSS is a dark archive, so is no threat to the bottom line. Portico's post-cancellation and audit facilities can potentially leak content, so Portico will come under pressure to DRM content supplied to its subscribers.

Almost all the world's Web archiving technology is based on Linux or other Open Source operating systems. There is a good reason for this, as I wrote back in 2014:
One thing it should be easy to agree on about digital preservation is that you have to do it with open-source software; closed-source preservation has the same fatal "just trust me" aspect that closed-source encryption (and cloud storage) suffer from.
Lucian Armasu at Tom's Hardware understands the issue:
there may also be an oligopoly issue, because the content market will depend on four, and perhaps soon only three, major DRM services players: Google, Microsoft, and Apple. All of these companies have their own operating systems, so there is also less incentive for them to support other platforms in their DRM solutions.

What that means in practice is that if you choose to use a certain Linux distribution or some completely new operating system, you may not be able to play protected content, unless Google, Microsoft, or Apple decide to make their DRM work on that platform, too.
So it may not even be possible for Web archives to render the content even if the owner wished to give them permission.

Comments:

David. said...

Peter Bright at Ars Technica reports that DRM in HTML5 takes its next step toward standardization:

"The World Wide Web Consortium (W3C), the standards body that oversees most Web-related specifications, has moved the EME specification to the Proposed Recommendation stage.

The next and final stage is for the W3C's Advisory Committee to review the proposal. If it passes review, the proposal will be blessed as a full W3C Recommendation."

David. said...

Cory Doctorow's comment on this "next step" is here.

David. said...

Firefox is now "Netflix approved" i.e. it has HTML5 DRM support. Once I've upgraded I'll see if I can turn it off.

David. said...

UNESCO has joined in the chorus of calls to W3C not to cave in to the copyright industries:

"Caution has been expressed by Frank La Rue, Assistant Director General for Communication-Information, in a letter sent to the W3C, a standards-setting body that is considering a change to internet browsing with potentially far-reaching consequences.

The technical change is known as “Encrypted Media Extensions” (EME), which would become part of the HTML 5 code for the World Wide Web, and therefore standardize how web browsers deal with encrypted video content.

Encryption of video content is something that largely serves the interests of the copyright industry, but it also has significance for network security and content integrity.

If agreed, the new EME standard could mean that internet browsers might increasingly “act as a framed gateway rather than serving as intrinsically open portals”, said La Rue."

David. said...

At least in Ubuntu, the "Play DRM Content" preference appears to be off by default.

The explanation for DRM in Firefox is here. It includes the statement:

"Similar opt-out capabilities will be offered on all new platforms where Firefox supports DRM."

David. said...

Cory Doctorow reports that:

"German Member of the European Parliament Julia Reda ... has published an open-letter signed by UK MEP Lucy Anderson, raising alarm at the fact that the W3C is on the brink of finalising a DRM standard for web video, which -- thanks to crazy laws protecting DRM -- will leave users at risk of unreported security vulnerabilities, and also prevent third parties from adapting browsers for the needs of disabled people, archivists, and the wider public."

David. said...

Yay, Portugal! Glyn Moody points to a TorrentFreak report that:

"Portugal's parliament has approved a bill that will restrict how Digital Rights Management is applied to some creative works, including those in the public domain or funded by public entities. Even when DRM is present, citizens will be able to circumvent the protection for education and private copying purposes."

David. said...

Cory Doctorow points to a letter from Tim Wu to Tim Berners-Lee drawing the parallel between the anti-competitive effects of DRM for the Web, and those of the removal of net neutrality requirements.

David. said...

DRM has doomed all the digirabbits in Second Life:

"Every Ozimal digirabbit in the venerable virtual world Second Life will starve to death (well, permanent hibernation) this week because a legal threat has shut down their food-server, and the virtual pets are designed so that they can only eat DRM-locked food, so the official food server's shutdown has doomed them all."

Perhaps the awful fate of all the cuddly bunnies will soften hearts at W3C.

David. said...

In discussions at the recent Web Archiving Conference, Andy Jackson pointed out that the British Library has legal authority to ask publishers for DRM-free versions of their content for archival purposes.

This is good, but it doesn't affect my argument for two reasons. First, the DRM-free content acquired by the BL would be accessible only to scholars physically at the BL, so it isn't useful in the way public Web archives are. Second, the cost of negotiating with individual publishers and setting up individual ingest pipelines for their DRM-free content would be prohibitive at scale; it would only be feasible for very important content such as newspapers.

David. said...

At The Register, Thomas Claburn's Web inventor Sir Tim sizes up handcuffs for his creation – and world has 2 weeks to appeal reports:

"Speaking on behalf of Berners-Lee in a note posted to the W3C mailing list, project management lead Philippe Le Hégaret said, "After consideration of the issues, the Director reached a decision that the EME specification should move to W3C Recommendation."
...
EMEs will be published as a W3C Recommendation unless at least five per cent of the 475 members of the W3C Advisory Committee – composed of companies, non-profits, and educational organizations – support an appeal within 14 days. If an appeal is considered, members will have the opportunity to vote on whether to accept or reject the technology."

David. said...

See also Mike Masnick and Cory Doctorow.

David. said...

The EFF has appealed the W3C's decision.

And Cory Doctorow points out the connection between the Net Neutrality and Web DRM fights.

David. said...

The legal threat is not just the DMCA. At Ars Technica, Timothy B. Lee's LinkedIn: It’s illegal to scrape our website without permission reports on the threat the Computer Fraud and Abuse Act poses for Web archiving:

"HiQ scrapes data about thousands of employees from public LinkedIn profiles, then packages the data for sale to employers worried about their employees quitting. LinkedIn, which was acquired by Microsoft last year, sent hiQ a cease-and-desist letter warning that this scraping violated the Computer Fraud and Abuse Act, the controversial 1986 law that makes computer hacking a crime. HiQ sued, asking courts to rule that its activities did not, in fact, violate the CFAA."

LinkedIn could have used robots.txt to ban the crawl; crawling would then violate the DMCA. But the CFAA's ban on "unauthorized access" is very vague and troubling. Orrin Kerr:

"argues sites wanting to limit access to their site should be required to use a technical mechanism like a password to signal that the website is not, in fact, available to the public.

"It's hugely problematic to let the subjective wishes of the website owner and not their objective action" determine what's legal, Kerr told Ars."

David. said...

IFLA, the International Federation of Library Associations and Institutions, has asked W3C to reconsider EME:

"Technological protection measures (TPMs) play a useful role in tackling copyright infringement, complementing legal provisions. However, they do not always stop at preventing illicit activities, and can often serve to stop libraries and their users from making fair uses of works. This can affect activities such as preservation, or inter-library document supply. To make it easier to apply TPMs, regardless of the nature of activities they are preventing, is to risk unbalancing copyright itself."

David. said...

EME, DRM for the Web, is now an official W3C Recommendation:

"Final approval came after the W3C's members voted 58.4 percent to approve the spec, 30.8 percent to oppose, with 10.8 percent abstaining."

David. said...

Who could possibly have predicted this? In After years of insisting that DRM in HTML wouldn't block open source implementations, Google says it won't support open source implementations, Cory Doctorow reports as follows:

"The bitter, yearslong debate at the World Wide Web Consortium over a proposal to standardize DRM for web browsers included frequent assurances by the pro-DRM side (notably Google, whose Widevine DRM was in line to be the principal beneficiary) that this wouldn't affect the ability of free/open source authors to implement the standard.

The absurd figleaf used to justify this was a reference implementation of EME in open source that only worked on video that didn't have the DRM turned on. The only people this impressed were people who weren't paying attention or lacked the technical depth to understand that a tool that only works under conditions that are never seen in the real world was irrelevant to real-world conditions.

Now the real world has arrived, and it was just as predicted."

Specifically:

"Maddock wanted to allow his users to do this with the videos they pay to watch on Widevine-restricted services like Hulu and Netflix, so he applied to Google for a license to implement Widevine in his browser. Four months later, Google sent him a one-sentence reply: "I'm sorry but we're not supporting an open source solution like this" (apparently four months' delay wasn't enough time to hunt up a comma or a period).

The connection to the Article 13 debate should be obvious: for years, advocates for the Directive insisted that it could be implemented without filters, but of course it requires filters. Likewise, for years, EME's backers insisted that it wouldn't prevent us from having open, auditable, free-as-in-speech browsers that anyone could inspect, improve and distribute. But of course it does."

David. said...

Cory Doctorow's How DRM has permitted Google to have an "open source" browser that is still under its exclusive control is effectively an update on this post:

"A year ago, Benjamin "Mako" Hill gave a groundbreaking lecture explaining how Big Tech companies had managed to monopolize all the benefits of free software licenses, using a combination of dirty tricks to ensure that the tools that were nominally owned by no one and licensed under free and open terms nevertheless remained under their control, so that the contributions that software developers made to "open" projects ended up benefiting big companies without big companies having to return the favor.

Mako was focused on the ways that "software as a service" subverted free/open software licenses, but just as pernicious is "digital rights management" (DRM), which is afforded a special kind of legal protection under Section 1201 of the Digital Millennium Copyright Act: under this rule, it's illegal to reverse-engineer and re-implement code that has some connection with restricting access to copyrighted works. That means that once a product or service has a skin of DRM around it, the company that controls that DRM also controls who can make an interoperable product.

That's where Google's web-dominating Chrome browser (and its nominally free/open cousin, Chromium) come in: these have become the de facto standard for web browsing, serving as the core for browsers like Microsoft Edge and Opera.

And while you can use or adapt Chromium to your heart's content, your new browser won't work with most internet video unless you license a proprietary DRM component called Widevine from Google."