Friday, March 17, 2017

The Amnesiac Civilization: Part 4

Part 2 and Part 3 of this series covered the unsatisfactory current state of Web archiving. Part 1 of this series briefly outlined the way the W3C's Encrypted Media Extensions (EME) threaten to make this state far worse. Below the fold I expand on the details of this threat.

The W3C's abstract describes EME thus:
This proposal extends HTMLMediaElement [HTML5] providing APIs to control playback of encrypted content.

The API supports use cases ranging from simple clear key decryption to high value video (given an appropriate user agent implementation). License/key exchange is controlled by the application, facilitating the development of robust playback applications supporting a range of content decryption and protection technologies.
The next paragraph is misleading; EME does not merely enable DRM, it mandates at least an (insecure) baseline implementation of encrypted content:
This specification does not define a content protection or Digital Rights Management system. Rather, it defines a common API that may be used to discover, select and interact with such systems as well as with simpler content encryption systems. Implementation of Digital Rights Management is not required for compliance with this specification: only the Clear Key system is required to be implemented as a common baseline.
The Clear Key system requires that content be encrypted, but the keys to decrypt it are passed in cleartext. I will return to the implications of this requirement.
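
As a rough illustration of what "passed in cleartext" means, the Python sketch below models the JSON messages of a Clear Key license exchange, assuming the format described in the EME specification: the request lists key IDs, and the response is a JSON Web Key set in which each key is merely base64url-encoded. The function names and the sample key ID and key are hypothetical, chosen only for this example.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the Clear Key JSON format expects."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode unpadded base64url text back to bytes."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def build_license(request_json: str, key_store: dict) -> str:
    """Answer a license request with a JWK set for the requested key IDs."""
    request = json.loads(request_json)
    keys = []
    for kid_text in request["kids"]:
        kid = b64url_decode(kid_text)
        # The key is only encoded, not encrypted.
        keys.append({"kty": "oct", "kid": kid_text, "k": b64url(key_store[kid])})
    return json.dumps({"keys": keys, "type": request.get("type", "temporary")})


# Hypothetical 16-byte key ID and key, for illustration only.
KEY_ID = bytes(range(16))
KEY = bytes(range(16, 32))

license_request = json.dumps({"kids": [b64url(KEY_ID)], "type": "temporary"})
license_response = build_license(license_request, {KEY_ID: KEY})

# Anyone who can read the response recovers the key with a single decode call.
recovered_key = b64url_decode(json.loads(license_response)["keys"][0]["k"])
```

Under these assumptions, nothing in the exchange hides the key from the page, the browser, or anyone observing the messages; the content is encrypted, but the "protection" is nominal.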

EME data flows
The W3C's diagram of the EME stack shows an example of how it works. An application, i.e. a Web page, requests the browser to render some encrypted content. It is delivered, in this case from a Content Distribution Network (CDN), to the browser. The browser needs a license to decrypt it, which it obtains from the application via the EME API by creating an appropriate session then using it to request the license. It hands the content and the license to a Content Decryption Module (CDM), which can decrypt the content using a key in the license and render it.
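
The flow just described can be sketched as a toy model in Python. This is not the EME API, which is a JavaScript interface inside the browser; every class and method name here is hypothetical, and a trivial XOR function stands in for a real cipher purely so that the example runs.

```python
from dataclasses import dataclass
from itertools import cycle


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR stand-in for a real cipher; applying it twice restores the input. NOT secure."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


@dataclass
class License:
    key_id: str
    key: bytes


class Application:
    """The Web page: it controls the license exchange."""

    def __init__(self, licenses: dict):
        self._licenses = licenses

    def request_license(self, key_id: str) -> License:
        return self._licenses[key_id]


class ContentDecryptionModule:
    """Decrypts content with the key in the license and 'renders' it."""

    def render(self, encrypted: bytes, content_license: License) -> bytes:
        return toy_cipher(encrypted, content_license.key)


class Browser:
    """Mediates between the application, the CDN and the CDM."""

    def __init__(self, app: Application, cdm: ContentDecryptionModule, cdn: dict):
        self.app = app
        self.cdm = cdm
        self.cdn = cdn

    def play(self, content_name: str, key_id: str) -> bytes:
        encrypted = self.cdn[content_name]                  # content arrives from the CDN
        content_license = self.app.request_license(key_id)  # session and license request
        return self.cdm.render(encrypted, content_license)  # CDM decrypts and renders


# Hypothetical key and content, for illustration only.
KEY = b"not-a-real-key"
cdn = {"movie": toy_cipher(b"video frames", KEY)}
app = Application({"key-1": License("key-1", KEY)})
browser = Browser(app, ContentDecryptionModule(), cdn)
rendered = browser.play("movie", "key-1")
```

In this toy the browser object handles both the license and the decrypted output in the clear. A real DRM CDM is designed to prevent exactly that, by keeping the key and the plaintext opaque to the browser it runs in.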

What is DRM trying to achieve? Ostensibly, it is trying to ensure that each time DRM-ed content is rendered, specific permission is obtained from the content owner. In order to ensure that, the CDM cannot trust the browser it is running in. For example, it must be sure that the browser can see neither the decrypted content nor the key. If the browser could see either one, and save it for future use, that would defeat the purpose of DRM.

The CDM is running in an environment controlled by the user, so the mechanisms a DRM implementation uses to obscure the decrypted content and the key from the environment are relatively easy to subvert. This is why in practice most DRM technologies are "cracked" fairly quickly after deployment. As Bunnie Huang's amazing book about cracking the DRM of the original Xbox shows, it is very hard to defeat a determined reverse engineer.

Content owners are not stupid. They realized early on that the search for uncrackable DRM was a fool's errand. So, to deter reverse engineering, they arranged for the 1998 Digital Millennium Copyright Act (DMCA) to make any attempt to circumvent protections on digital content a criminal offense. Cory Doctorow explains what this strategy achieves:
So if DRM isn't anti-piracy, what is it? DRM isn't really a technology at all, it's a law. Specifically, it's section 1201 of the US DMCA (and its international equivalents). Under this law, breaking DRM is a crime with serious consequences (5 years in prison and a $500,000 fine for a first offense), even if you're doing something that would otherwise be legal. This lets companies treat their commercial strategies as legal obligations: Netflix doesn't have the legal right to stop you from recording a show to watch later, but they can add DRM that makes it impossible to do so without falling afoul of DMCA.

This is the key: DRM makes it possible for companies to ban all unauthorized conduct, even when we're talking about using your own property in legal ways. This intrudes on your life in three ways:
  1. It lets companies sue and threaten security researchers who find defects in products
  2. It lets companies sue and threaten accessibility workers who adapt technology for use by disabled people
  3. It lets companies sue and threaten competitors who want to let you do more with your property -- get it repaired by independent technicians, buy third-party parts and consumables, or use it in ways that the manufacturer just doesn't like.
Of course, among the "ways that the manufacturer just doesn't like" can be archiving.

IANAL, but I do not believe that it is a defense under the DMCA that the "protections" in question are made of tissue paper. Thus, for example, it is likely that even an attempt to reverse-engineer an implementation of EME's Clear Key system in order to preserve the plaintext of some encrypted content would risk severe criminal penalties. Would an open source implementation of Clear Key be legal?

It is this interaction between even purely nominal DRM mechanisms and the DMCA that has roused opposition to EME. J. M. Porup's A battle rages for the future of the Web is an excellent overview of the opposition and its calls on Tim Berners-Lee to decry EME. Once he had endorsed it, Glyn Moody wrote a blistering takedown of his reasoning in Tim Berners-Lee Endorses DRM In HTML5, Offers Depressingly Weak Defense Of His Decision. He points to the most serious problem EME causes:
Also deeply disappointing is Berners-Lee's failure to recognize the seriousness of the threat that EME represents to security researchers. The problem is that once DRM enters the equation, the DMCA comes into play, with heavy penalties for those who dare to reveal flaws, as the EFF explained two years ago.
How do we know that this is the most serious problem? Because, like all the other code running in your browser, the DRM implementations have flaws and vulnerabilities. For example:
Google's CDM is Widevine, a technology it acquired in 2010. David Livshits, a security researchers at Ben-Gurion University and Alexandra Mikityuk from Berlin's Telekom Innovation Laboratories, discovered a vulnerability in the path from the CDM to the browser, which allows them to capture and save videos after they've been decrypted. They've reported this bug to Google, and have revealed some proof-of-concept materials now showing how it worked (they've withheld some information while they wait for Google to issue a fix).

Widevine is also used by Opera and Firefox (Firefox also uses a CDM from Adobe).

Under German law -- derived from Article 6 of the EUCD -- Mikityuk could face criminal and civil liability for revealing this defect, as it gives assistance to people wishing to circumvent Widevine. Livshits has less risk, as Israel is one of the few major US trading partners that has not implemented an "anti-circumvention" law, modelled on the US DMCA and spread by the US Trade Representative to most of the world.
Note that we (and Google) only know about this flaw because one researcher was foolhardy and another was from Israel. Many other flaws remain unrevealed:
The researchers who revealed the Widevine/Chrome defect say that it was likely present in the browser for more than five years, but are nevertheless the first people to come forward with information about its flaws. As many esteemed security researchers from industry and academe told the Copyright Office last summer, they routinely discover bugs like this, but don't come forward, because of the potential liability from anti-circumvention law.
Glyn Moody again:
The EFF came up with a simple solution that would at least have limited the damage the DMCA inflicts here:
a binding promise that W3C members would have to sign as a condition of continuing the DRM work at the W3C, and once they do, they not be able to use the DMCA or laws like it to threaten security researchers.
Alas, Cory Doctorow again:
How do we know that companies only want DRM because they want to abuse this law, and not because they want to fight piracy? Because they told us so. At the W3C, we proposed a compromise: companies who participate at W3C would be allowed to use it to make DRM, but would have to promise not to invoke the DMCA in these ways that have nothing to do with piracy. So far, nearly 50 W3C members -- everyone from Ethereum to Brave to the Royal National Institute for Bind People to Lawrence Berkeley National Labs -- have endorsed this, and all the DRM-supporting members have rejected it.

In effect, these members are saying, "We understand that DRM isn't very useful for stopping piracy, but that law that lets us sue people who aren't breaking copyright law? Don't take that away!"
It's not as though, as an educated Web user, you can decide that you don't want to take the risks inherent in using a browser that doesn't trust you, or the security researchers you depend upon. In theory Web DRM is optional, but in practice it isn't. Lucian Armasu at Tom's Hardware explains:
The next stable version of Chrome (Chrome 57) will not allow users to disable the Widevine DRM plugin anymore, therefore making it an always-on, permanent feature of Chrome. The new version of Chrome will also eliminate the “chrome://plugins” internal URL, which means if you want to disable Flash, you’ll have to do it from the Settings page.
You definitely want to disable Flash. To further "optimize the user experience":
So far only the Flash plugin can be disabled in the Chrome Settings page, but there is no setting to disable the Widevine DRM plugin, nor the PDF viewer and the Native Client plugins. PDF readers, including the ones that are built into browsers, are major targets for malicious hackers. PDF is a “powerful” file format that’s used by many, and it allows hackers to do all sorts of things given the right vulnerability.

People who prefer to open their PDF files in a better sandboxed environment or with a more secure PDF reader, rather than in Chrome, will not be able to do that anymore. All PDF files will always open in Chrome’s PDF viewer, starting with Chrome 57.
But that's not what I came to tell you about. Came to talk about archiving.

I fully appreciate the seriousness of the security threat posed by EME, but it tends to overwhelm discussion of EME's other impacts. I have long been concerned about the impact of Digital Rights Management on archiving. I first wrote about the way HTML5 theoretically enabled DRM for the Web in 2011's Moonalice plays Palo Alto:
Another way of expressing the same thought is that HTML5 allows content owners to implement a semi-effective form of DRM for the Web.
That was then, but now theory is practice. Once again, Glyn Moody is right on target:
One of the biggest problems with the defense of his position is that Berners-Lee acknowledges only in passing one of the most serious threats that DRM in HTML5 represents to the open Web. Talking about concerns that DRM for videos could spread to text, he writes:
For books, yes this could be a problem, because there have been a large number of closed non-web devices which people are used to, and for which the publishers are used to using DRM. For many the physical devices have been replaced by apps, including DRM, on general purpose devices like closed phones or open computers. We can hope that the industry, in moving to a web model, will also give up DRM, but it isn't clear.
So he admits that EME may well be used for locking down e-book texts online. But there is no difference between an e-book text and a Web page, so Berners-Lee is tacitly admitting that DRM could be applied to basic Web pages. An EFF post spelt out what that would mean in practice:
A Web where you cannot cut and paste text; where your browser can't "Save As..." an image; where the "allowed" uses of saved files are monitored beyond the browser; where JavaScript is sealed away in opaque tombs; and maybe even where we can no longer effectively "View Source" on some sites, is a very different Web from the one we have today.
It's also totally different from the Web that Berners-Lee invented in 1989, and then generously gave away for the world to enjoy and develop. It's truly sad to see him acquiescing in a move that could destroy the very thing that made the Web such a wonderfully rich and universal medium -- its openness.
The EFF's post (from 2013) had several examples of EME "mission creep" beyond satisfying Netflix:
Just five years ago, font companies tried to demand DRM-like standards for embedded Web fonts. These Web typography wars fizzled out without the adoption of these restrictions, but now that such technical restrictions are clearly "in scope," why wouldn't typographers come back with an argument for new limits on what browsers can do?

Indeed, within a few weeks of EME hitting the headlines, a community group within W3C formed around the idea of locking away Web code, so that Web applications could only be executed but not examined online. Static image creators such as photographers are eager for the W3C to help lock down embedded images. Shortly after our Tokyo discussions, another group proposed their new W3C use-case: "protecting" content that had been saved locally from a Web page from being accessed without further restrictions. Meanwhile, publishers have advocated that HTML textual content should have DRM features for many years.
Web archiving consists of:
content ... saved locally from a Web page ... being accessed without further restrictions.
It appears that the W3C's EME will become, in effect, a mandatory feature of the Web. Obviously, the first effect is that much Web video will be DRM-ed, making it impossible to collect in replayable form and thus preserve. Google's making Chrome's video DRM impossible to disable suggests that YouTube video will be DRM-ed. Even a decade ago, to study US elections you needed YouTube video.

But that's not the big impact that EME will have on society's memory. It will spread to other forms of content. The business models for Web content are of two kinds, and both are struggling:
  • Paywalled content. It turns out that, apart from movies and academic publishing, only a very few premium brands such as The Economist, the Wall Street Journal and the New York Times have viable subscription business models based on (mostly) paywalled content. Even excellent journalism such as The Guardian is reduced to free access, advertising and voluntary donations. Part of the reason is that Googling the headline of paywalled news stories often finds open access versions of the content. Clearly, newspapers and academic publishers would love to use Web DRM to ensure that their content could be accessed only from their site, not via Google or Sci-Hub.
  • Advertising-supported content. The market for Web advertising is so competitive and fraud-ridden that Web sites have been forced to let advertisers run ads that are obnoxious and indeed riddled with malware, and to load up their sites with trackers. As a result, many users have rebelled and use ad-blockers. These days it is pretty much essential to do so, to keep yourself safe and to reduce bandwidth consumption. Sites are very worried about the loss of income from blocked ads. Some, such as Forbes, refuse to supply content to browsers that block ads (which, in Forbes' case, turned out to be a public service; the ads carried malware). DRM-ing a site's content will prevent ads being blocked. Thus ad space on DRM-ed sites will be more profitable, and sell for higher prices, than space on sites where ads can be blocked. The pressure on advertising-supported sites, which include both free and subscription news sites, to DRM their content will be intense.
Thus the advertising-supported bulk of what we think of as the Web, and the paywalled resources such as news sites that future scholars will need, will become un-archivable. Kalev Leetaru will need to add a fourth, even more outraged, item to his list of complaints about Web archives.

The prospect for academic journals is somewhat less dire. Because the profit margins of the big publishers are so outrageous, and because charging extortionate subscriptions for access to the fruits of publicly and charitably funded research is so hard to justify, they are willing to acquiesce in the archiving of their content provided it doesn't threaten their bottom line. The big publishers typically supply archives such as Portico and CLOCKSS with content through non-Web channels. CLOCKSS is a dark archive, so is no threat to the bottom line. Portico's post-cancellation and audit facilities can potentially leak content, so Portico will come under pressure to DRM content supplied to its subscribers.

Almost all the world's Web archiving technology is based on Linux or other Open Source operating systems. There is a good reason for this, as I wrote back in 2014:
One thing it should be easy to agree on about digital preservation is that you have to do it with open-source software; closed-source preservation has the same fatal "just trust me" aspect that closed-source encryption (and cloud storage) suffer from.
Lucian Armasu at Tom's Hardware understands the issue:
there may also be an oligopoly issue, because the content market will depend on four, and perhaps soon only three, major DRM services players: Google, Microsoft, and Apple. All of these companies have their own operating systems, so there is also less incentive for them to support other platforms in their DRM solutions.

What that means in practice is that if you choose to use a certain Linux distribution or some completely new operating system, you may not be able to play protected content, unless Google, Microsoft, or Apple decide to make their DRM work on that platform, too.
So it may not even be possible for Web archives to render the content even if the owner wished to give them permission.


David. said...

Peter Bright at Ars Technica reports that DRM in HTML5 takes its next step toward standardization:

"The World Wide Web Consortium (W3C), the standards body that oversees most Web-related specifications, has moved the EME specification to the Proposed Recommendation stage.

The next and final stage is for the W3C's Advisory Committee to review the proposal. If it passes review, the proposal will be blessed as a full W3C Recommendation."

David. said...

Cory Doctorow's comment on this "next step" is here.

David. said...

Firefox is now "Netflix approved" i.e. it has HTML5 DRM support. Once I've upgraded I'll see if I can turn it off.
