Wednesday, October 23, 2013

Trust in Computer Systems

Ten years ago our paper Preserving Peer Replicas by Rate-Limited Sampled Voting was awarded "Best Paper" at the 2003 ACM Symposium on Operating Systems Principles. It was later expanded into a paper in ACM Transactions on Computer Systems, The LOCKSS peer-to-peer digital preservation system. According to Google Scholar, the twin papers have accumulated over 300 citations. Below the fold I discuss the ideas about trust we put forward in these papers that have turned out to have lasting significance, and are particularly important today.

The question of whether we can trust our computer systems was made newsworthy by Edward Snowden's revelations that the NSA and its partner agencies in other countries had comprehensively subverted the infrastructure of the Internet in the interest of surveillance. But it has been a major concern for the industry for a long time. As regards digital preservation, it led to the Trustworthy Repositories Audit & Certification (TRAC) process, which the CLOCKSS Archive is currently undergoing. One thing that this process has already revealed is that although the criteria against which the archive is assessed may individually be justified, the structure into which they are organized is not based on a comprehensive model of the threats against which the archive's content is to be defended. I have repeatedly pointed to the lack of explicit threat models for digital preservation systems.

If you look into the security settings for your browser you will find a long list of trusted certificate authorities. My Firefox has about 90 organizations in the list, not counting ones I explicitly added. Trust in "secure" connections to websites depends on trusting every single one of these organizations. If any one of them is acting against my interests, the little padlock icon is meaningless. The list includes many mysterious organizations I have no reason to trust.
And that's not all. Many of these organizations have decided that because I trust them, I also trust organizations which they declare are trustworthy. As of 2010, the EFF found more than 650 organizations that Internet Explorer and Firefox trusted, including the US Dept. of Homeland Security. The idea that none of these organizations could ever be convinced to act against my interests is ludicrous. SSL is a security architecture with at least one single point of failure and, as it is deployed in browsers, a large number of single points of failure.

When we started work on the LOCKSS system, the state of the art in reliable computer systems was Byzantine Fault Tolerance (BFT), which dates to 1982 and shows that 3f+1 replicas can survive f simultaneous faults even if the faults are malign rather than just random. We initially saw a number of problems in applying BFT to the problem of preserving libraries' collections of e-journals:
  • The then-known techniques for implementing BFT were too slow to be usable, a problem that was not to be solved until Castro and Liskov's 1999 paper. Even then, with the large number of replicas we hoped for, BFT would have been unrealistically costly.
  • Under increasing failure or attack BFT works perfectly until the 3f+1 criterion is violated, at which time it fails abruptly and completely. A real-world system should fail slowly and produce unambiguous signs of impending failure as it does.
  • BFT assumes a static set of replicas. We expected libraries to join and leave the system over time, so there would have to be a service that distributed the current set of replicas to the nodes. This would be a single point of failure; the bad guy could subvert it and populate the system with bad replicas.
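The arithmetic behind the 3f+1 criterion, and its abrupt failure mode, can be sketched in a few lines (a toy illustration of the quorum math, not production code):

```python
def bft_replicas_needed(f):
    """Minimum replicas for BFT to tolerate f simultaneous Byzantine faults."""
    return 3 * f + 1

def max_faults_tolerated(n):
    """Largest f such that 3f + 1 <= n."""
    return (n - 1) // 3

# With 4 replicas BFT survives a single malign fault; a second
# simultaneous fault violates the criterion and the guarantee
# fails abruptly and completely, with no warning signs.
assert bft_replicas_needed(1) == 4
assert max_faults_tolerated(4) == 1
assert max_faults_tolerated(6) == 1   # extra replicas don't help until 7
assert max_faults_tolerated(7) == 2
```

Note that tolerance grows only in steps of three replicas, which is part of why BFT with the large replica populations we hoped for would have been so costly.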
The key idea in the prototype LOCKSS system was natural for a system expected to have a lot of replicas: base trust on voting among random samples of the population of replicas, where BFT bases trust on votes of the whole population. In both cases trust was established by the replicas voting in polls. Because the replicas didn't trust each other, the bad guy had to subvert a lot of them to win votes reliably.
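A toy simulation (my own sketch, not the LOCKSS protocol itself) shows why subverting only part of the population is not enough: even controlling a substantial minority of replicas, the bad guy rarely wins a randomly sampled poll:

```python
import random

def bad_majority_probability(n_replicas, n_subverted, sample_size,
                             trials=10000, seed=0):
    """Estimate how often a random sample of voters contains a
    majority of subverted replicas. Illustrative only; the real
    protocol involves much more than this."""
    rng = random.Random(seed)
    population = [True] * n_subverted + [False] * (n_replicas - n_subverted)
    wins = 0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)   # no replacement
        if sum(sample) > sample_size // 2:
            wins += 1
    return wins / trials

# Controlling 30 of 100 replicas, the bad guy wins a majority in a
# random 10-replica poll only a few percent of the time.
p = bad_majority_probability(100, 30, 10)
```

The numbers (100 replicas, 30 subverted, samples of 10) are arbitrary choices for the illustration, not parameters from the papers.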

The most important idea in the SOSP paper was one we came to understand only gradually as we worked on it. Sampled polls make things hard for the bad guy. In a system with no damage, each replica would see polls showing complete agreement. In a system with randomly damaged replicas, each replica would see either landslide agreement, if it was undamaged, or landslide disagreement, if it was damaged. Poll results should thus be bimodal. To alter the system's idea of some content, a bad guy needed to create non-random damage: the same damage at enough replicas to tip the system's majority opinion from the good to the bad content. What made this hard was that the bad guy could not know which replicas would take part in each poll. A poll in which some of the bad guy's replicas voted alongside some he had not yet managed to subvert would produce a close-run result, neither landslide agreement nor landslide disagreement. Such a result would be an unambiguous signal of correlated damage at multiple replicas, in other words an attack.
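The bimodal signal can be sketched as a simple classifier; the 80% landslide threshold here is an arbitrary illustrative choice, not a number from the papers:

```python
def classify_poll(votes, landslide=0.8):
    """Classify a poll result as described above (a sketch):
    votes is a list of 1 (agrees with me) / 0 (disagrees)."""
    agree = sum(votes) / len(votes)
    if agree >= landslide:
        return "landslide agreement"
    if agree <= 1 - landslide:
        return "landslide disagreement"
    return "close-run: correlated damage at multiple replicas (attack?)"

# An undamaged replica in a healthy population sees agreement;
# a partly-subverted poll produces the tell-tale close-run result.
assert classify_poll([1] * 10) == "landslide agreement"
assert classify_poll([1] * 5 + [0] * 5).startswith("close-run")
```

The point is that random damage pushes results toward the two extremes, so anything in the middle is itself the alarm.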

As with most good research, our ideas were not wholly original; others had some of them at about the same time. Birman et al. in 1999 published work on reliable multicast protocols that, in effect, sampled the population of replicas, but they used a fixed sample size of 1 and thus could not use voting to establish trust, leaving their protocol vulnerable to the bad guy. We were among the first to base trust in a system on voting among random samples of replicas that did not trust each other, and I believe we were the first to show the special difficulties bimodal results in sampled voting pose for the bad guy.

A bad guy subverts an SSL connection from a browser to a website by intercepting the connection and supplying a valid certificate that is different from the one that actually certifies the website. For example, it might be signed by the Dept. of Homeland Security instead of by Thawte. The browser trusts both. Thus the question is: "how to detect that the certificate is not the one it should be?" This is a similar question to the digital preservation question: "how to detect that the content is not what it should be?" On a second or subsequent visit the browser can compare the certificate with the one it saw earlier, but this doesn't help the first time and it isn't conclusive. There are valid reasons why a certificate can change, for example to replace one that has expired.
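The compare-with-the-earlier-certificate check is essentially trust-on-first-use. A minimal sketch (the class and method names are hypothetical, not any browser's API):

```python
import hashlib

class CertMemory:
    """Sketch of the browser-side check described above: remember the
    certificate fingerprint seen on the first visit, flag any change."""
    def __init__(self):
        self.seen = {}   # hostname -> SHA-256 fingerprint

    def check(self, hostname, cert_der):
        fp = hashlib.sha256(cert_der).hexdigest()
        prior = self.seen.get(hostname)
        self.seen[hostname] = fp
        if prior is None:
            return "first visit: nothing to compare against"
        if prior == fp:
            return "unchanged"
        return "changed: legitimate renewal or interception?"

mem = CertMemory()
mem.check("example.com", b"cert-A")                        # first visit
assert mem.check("example.com", b"cert-A") == "unchanged"
assert mem.check("example.com", b"cert-B").startswith("changed")
```

The last return value is exactly the ambiguity in the text: a changed certificate might be an expired one being replaced, or an interception, and the browser alone cannot tell which.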

In 2008 Wendlandt et al published Perspectives: Improving SSH-style Host Authentication with Multi-Path Probing, which started to apply "voting among a sample" techniques to the SSL certificate problem:
Our system, PERSPECTIVES, thwarts many of these attacks by using a collection of “notary” hosts that observes a server’s public key via multiple network vantage points (detecting localized attacks) and keeps a record of the server’s key over time (recognizing short-lived attacks). Clients can download these records on-demand and compare them against an unauthenticated key, detecting many common attacks.
It is important to protect against compromised or malign notaries, and this is where LOCKSS-style "voting among a sample" comes in:
The final aspect of the notary design is data redundancy, a cross-validation mechanism that limits the power of a compromised or otherwise malicious notary server. To implement data redundancy each notary acts as a shadow server for several other notaries. As described below, a shadow server stores an immutable record of each observation made by another notary. Whenever a client receives a query reply from a notary, the client also checks with one or more of that notary’s shadow servers to make sure that the notary reply is consistent with the history stored by the shadow server.
Just as in the LOCKSS system, if there are enough notaries and enough shadow servers the bad guy can't know which notaries the client will ask, and which shadow servers the client will ask to vote on the replies.
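The shadow-server cross-check can be sketched like this (an illustration of the idea, not the PERSPECTIVES implementation):

```python
import random

def cross_validate(notary_reply, shadow_histories, rng):
    """Sketch of PERSPECTIVES-style data redundancy: check a notary's
    reply against the immutable record held by a randomly chosen
    shadow server for that notary."""
    shadow = rng.choice(shadow_histories)
    return notary_reply in shadow

# Honest shadows recorded key "K1" for this server; a compromised
# notary replying with a different key is caught by any shadow.
rng = random.Random(42)
shadows = [{"K1"}, {"K1"}, {"K1"}]
assert cross_validate("K1", shadows, rng)
assert not cross_validate("K-evil", shadows, rng)
```

As the paragraph below notes, the bad guy's problem is the same as in LOCKSS: he cannot predict which notaries the client will query, nor which shadow servers it will ask to confirm them.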

I know of two attempts to deploy systems related to PERSPECTIVES:
  • Moxie Marlinspike's Convergence, which in 2011 deployed a number of PERSPECTIVES-style notaries in the hope that users would deploy many more, together with a Firefox extension that the user could configure with their choice of notaries. See this blog post with a good discussion of the problem and the requirements for a solution, John Leyden's piece at The Register, and the YouTube video of Moxie's 2011 BlackHat presentation. Alas, Moxie's version seems no longer to have working public notaries. There is a live fork, but unfortunately it has only private notaries and does not support Convergence's technique (called "bounce notaries") for preventing the user's requests for certificate validation from revealing their browsing history to the notaries.
  • The EFF's SSL Observatory, which is a service started in 2010 allowing users to query a database built from SSL certificates provided by users of their HTTPS Everywhere extensions for Firefox and Chrome. The certificate observations in the database thus come from a vast range of locations around the network. The disadvantages are that the SSL Observatory is a centralized service that users have to trust, and that it uses Tor to anonymize certificate validation requests, which adds complexity. See these two presentations (PDFs) about the SSL Observatory.
The way SSL is currently deployed doesn't provide adequate security in practice, and neither do the current attempts to improve it. But it seems that the decade-old LOCKSS technique of sampled voting will play an important part in an eventual solution.
