Monday, April 11, 2016

Brewster Kahle's "Distributed Web" proposal

Back in August last year Brewster Kahle posted Locking the Web Open: A Call for a Distributed Web. It consisted of an analysis of the problems of the current Web, a set of requirements for a future Web that wouldn't have those problems, and a list of pieces of current technology that he suggested could be assembled into a working if simplified implementation of those requirements layered on top of the current Web. I meant to blog about it at the time, but I was busy finishing my report on emulation.

Last November, Brewster gave the EE380 lecture on this topic (video from YouTube or Stanford), reminding me that I needed to write about it. I still didn't find time to write a post. On 8th June, Brewster, Vint Cerf and Cory Doctorow are to keynote a Decentralized Web Summit. I encourage you to attend. Unfortunately, I won't be able to, and this has finally forced me to write up my take on this proposal. Follow me below the fold for a brief discussion; I hope to write a more detailed post soon.

I should start by saying that I agree with Brewster's analysis of the problems of the current Web, and his requirements for a better one. I even agree that the better Web has to be distributed, and that developing it by building prototypes layered on the current Web is the way to go in the near term. I'll start by summarizing Brewster's set of requirements and his proposed implementation, then point out some areas where I have concerns.

Brewster's requirements are:
  • Peer-to-Peer Architecture to avoid the single points of failure and control inherent in the endpoint-based naming of the current Web.
  • Privacy to disrupt the current Web's business model of pervasive, fine-grained surveillance.
  • Distributed Authentication for Identity to avoid the centralized control over identity provided by Facebook and Google.
  • Versioning to provide the memory the current Web lacks.
  • Easy payment mechanism to provide an alternate way to reward content generators.
There are already a number of attempts at partial implementations of these requirements, based, as Brewster suggests, on JavaScript, public-key cryptography, blockchain, Bitcoin, and BitTorrent. An example is IPFS (also here). Pulling these together into a coherent and ideally interoperable framework would be an important outcome of the upcoming summit.

Thinking of these as prototypes, exploring the space of possible features, they are clearly useful. But we have known the risks of allowing what should be prototypes to become "interim" solutions since at least the early 80s. The Alto "Interim" File Server (IFS) was designed and implemented by David R. Boggs and Ed Taft in the late 70s. In 1977 Ed wrote:
The interim nature of the IFS should be emphasized. The IFS is not itself an object of research, though it may be used to support other research efforts such as the Distributed Message System. We hope that Juniper will eventually reach the point at which it can replace IFS as our principal shared file system.
Because IFS worked well enough for people at PARC to get the stuff they needed done, the motivation to replace it with Juniper was never strong enough. The interim solution became permanent. Jim Morris, who was at PARC at the time, and who ran the Andrew Project at C-MU on which I worked from 1983-85, used IFS as the canonical example of a "success disaster", something whose rapid early success entrenches it in ways that cause cascading problems later.

And in this case the permanent solution is at least as well developed as the proposed "interim" one. For at least the last decade, rather than build a “Distributed Web”, Van Jacobson and many others have been working to build a “Distributed Internet”. The Content-Centric Networking project at Xerox PARC, which has become the Named Data Networking (NDN) project spearheaded by UCLA, is one of the NSF’s four projects under the Future Internet Architecture Program. Here is a list of 68 peer-reviewed papers published in the last 7 years relating to NDN.

By basing the future Internet on the name of a data object rather than the location of the object, many of the objectives of the “Distributed Web” become properties of the network infrastructure rather than something implemented in some people’s browsers.
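The distinction can be made concrete with a toy sketch. This is not the NDN protocol or wire format, and the names and helper functions below are invented for illustration; the point is only that a request carries the name of a data object, and integrity is verified against a binding to that name (in NDN, a signature; here, approximated by a hash) rather than trusted because of where the bytes came from.

```python
import hashlib

def digest(content: bytes) -> str:
    """Stand-in for NDN's per-object signature: a hash of the content."""
    return hashlib.sha256(content).hexdigest()

class Node:
    def __init__(self):
        self.store = {}  # hierarchical name -> (content, binding)

    def publish(self, name: str, content: bytes):
        # The publisher binds the name to the content. In NDN this
        # binding is cryptographically signed by the publisher.
        self.store[name] = (content, digest(content))

    def fetch(self, name: str):
        # A consumer asks for a name, not a host:port. Any node
        # holding a copy whose binding verifies can answer.
        content, binding = self.store[name]
        assert digest(content) == binding, "object fails verification"
        return content

origin = Node()
origin.publish("/parc/ccn/paper/v1", b"named data")
print(origin.fetch("/parc/ccn/paper/v1"))  # b'named data'
```

Because verification travels with the object rather than the connection, a copy served by a stranger's cache is exactly as trustworthy as one served by the publisher.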

Another way of looking at this is that the current Internet is about moving data from one place to another, NDN is about copying data. By making the basic operation in the net a copy, caching works properly (unlike in the current Internet). This alone is a huge deal, and not just for the Web. The Internet is more than just the Web, and the reasons for wanting to be properly “Distributed” apply just as much to the non-Web parts. And Web archiving should be, but currently isn't, about persistently caching selected parts of the Web.
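A toy sketch of why copying-as-primitive makes caching work, assuming nothing about NDN's actual forwarding or cache-replacement machinery: because the unit of communication is a named, immutable object rather than a byte stream between two endpoints, every hop can keep a copy on the return path and satisfy later requests for the same name itself.

```python
class Producer:
    """The origin of the named objects."""
    def __init__(self, objects):
        self.objects = objects

    def get(self, name):
        return self.objects[name]

class Router:
    """A forwarding hop that caches every named object it relays."""
    def __init__(self, upstream):
        self.cache = {}
        self.upstream = upstream
        self.upstream_hits = 0  # how often we had to forward upstream

    def get(self, name):
        if name in self.cache:
            return self.cache[name]          # satisfied locally
        self.upstream_hits += 1
        data = self.upstream.get(name)       # forward toward the producer
        self.cache[name] = data              # cache on the return path
        return data

producer = Producer({"/video/frame/1": b"frame bytes"})
edge = Router(upstream=Router(upstream=producer))

edge.get("/video/frame/1")   # first request travels to the producer
edge.get("/video/frame/1")   # second is satisfied from the edge cache
print(edge.upstream_hits)    # 1
```

In the current Internet an HTTP cache has to guess, via heuristics and headers, whether two requests to an endpoint denote the same bytes; here the name settles the question, which is why popular content naturally stops traversing the network.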

I should stress that I believe that implementing these concepts initially on top of IP, and even on top of HTTP, is a great and necessary idea; it is how NDN is being tested. But doing so with the vision that eventually IP will be implemented on top of a properly “Distributed” infrastructure is also a great idea; IP can be implemented on top of NDN. For a detailed discussion of these ideas see my (long) 2013 blog post reviewing the 2012 book Trillions.

There are other risks in implementing Brewster's requirements using JavaScript, TCP/IP, the blockchain and the current Web:
  • JavaScript poses a fundamental risk, as we see from Douglas Crockford's attempt to define a "safe" subset of the language. It isn't clear that it is possible to satisfy Brewster's requirements in a safe subset of JavaScript, even if one existed. Allowing content from the Web to execute in your browser is a double-edged sword; it enables easy implementation of new capabilities, but if they are useful they are likely to pose a risk of being subverted.
  • Implementing anonymity on top of a communication infrastructure that explicitly connects endpoints turns out to be very hard. Both Tor and Bitcoin users have been successfully de-anonymized.
  • I have written extensively about the economic and organizational issues that plague Bitcoin, and that will affect other totally distributed systems such as the one Brewster wants to build. It is notable that Postel's Law, or the Robustness Principle (RFC 793), has largely prevented these problems from affecting the communication infrastructure level that NDN addresses.
So there are very good reasons why this way of implementing Brewster's requirements should be regarded as creating valuable prototypes, but we should be wary of the Interim File System effect. The Web we have is a huge success disaster. Whatever replaces it will be at least as big a success disaster. Let's not have the causes of the disaster be things we knew about all along.


David. said...

The inherent risks of JavaScript (or other ways of executing code on the client machine) are again revealed by the discovery that the Rowhammer attack can be implemented in JavaScript.

David. said...

Herbert van de Sompel points to MIT's Solid project as being relevant to this space. I need to look into it when I'm not rushed off my feet.

David. said...

Anyone who thinks that basing a new, better distributed Web on the ability to run JavaScript in people's browsers is viable needs to read Jérôme Segura's Large Angler Malvertising Campaign Hits Top Publishers.

David. said...

An interesting article about privacy risks in Named Data Networking. I need to think about this more but at first glance it seems that the task of targeted surveillance is no harder, but the task of dragnet surveillance is harder in a Named Data network.