Tuesday, October 28, 2014

Familiarity Breeds Contempt

In my recent Internet of Things post I linked to Jim Gettys' post Bufferbloat and Other Challenges. In it Jim points to a really important 2010 paper by Sandy Clark, Matt Blaze, Stefan Frei and Jonathan Smith entitled Familiarity Breeds Contempt: The Honeymoon Effect and the Role of Legacy Code in Zero-Day Vulnerabilities.

Clark et al. analyze databases of vulnerabilities to show that the factors influencing the rate of discovery of vulnerabilities are quite different from those influencing the rate of discovery of bugs. They summarize their findings thus:
We show that the length of the period after the release of a software product (or version) and before the discovery of the first vulnerability (the ’Honeymoon’ period) is primarily a function of familiarity with the system. In addition, we demonstrate that legacy code resulting from code re-use is a major contributor to both the rate of vulnerability discovery and the numbers of vulnerabilities found; this has significant implications for software engineering principles and practice.
Jim says:
our engineering processes need fundamental reform in the face of very long lived devices.
Don't hold your breath. The paper's findings also have significant implications for digital preservation, because external attack is an important component of the threat model for digital preservation systems:
  • Digital preservation systems are, like devices in the Internet of Things (IoT), long-lived.
  • Although they are designed to be easier to update than most IoT devices, they need to be extremely cheap to run. Resources to make major changes to the code base within the "honeymoon" period will be inadequate.
  • Scarce resources and adherence to current good software engineering practices already mean that much of the code in these systems is shared.
Thus it is likely that digital preservation systems will be more vulnerable than the systems whose content they are intended to preserve. This is a strong argument for diversity of implementation, which has unfortunately turned out to increase costs significantly. Mitigating the threat from external attack increases the threat of economic failure.

Thursday, October 23, 2014

Facebook's Warm Storage

Last month I was finally able to post about Facebook's cold storage technology. Now, Subramanian Muralidhar and a team from Facebook, USC and Princeton have a paper at OSDI that describes the warm layer between the two cold storage layers and Haystack, the hot storage layer. f4: Facebook's Warm BLOB Storage System is perhaps less directly aimed at long-term preservation, but the paper is full of interesting information. You should read it, but below the fold I relate some details.

Monday, October 20, 2014

Journal "quality"

Anurag Acharya and co-authors from Google Scholar have a pre-print at arxiv.org entitled Rise of the Rest: The Growing Impact of Non-Elite Journals in which they use article-level metrics to track the decreasing importance of the top-ranked journals in their respective fields from 1995 to 2013. I've long argued that the value that even the globally top-ranked journals add is barely measurable and may even be negative; this research shows that the message is gradually getting out. Authors of papers subsequently found to be "good" (in the sense of attracting citations) are slowly but steadily choosing to publish away from the top-ranked journals in their field. You should read the paper, but below the fold I have some details.

Wednesday, October 15, 2014

The Internet of Things

In 1996, my friend Steven McGeady gave a fascinating and rather prophetic keynote address to the Harvard Conference on the Internet and Society. In his introduction, Steven said:
I was worried about speaking here, but I'm even more worried about some of the pronouncements that I have heard over the last few days, ... about the future of the Internet. I am worried about pronouncements of the sort: "In the future, we will do electronic banking at virtual ATMs!," "In the future, my car will have an IP address!," "In the future, I'll be able to get all the old I Love Lucy reruns - over the Internet!" or "In the future, everyone will be a Java programmer!"

This is bunk. I'm worried that our imagination about the way that the 'Net changes our lives, our work and our society is limited to taking current institutions and dialling them forward - the "more, better" school of vision for the future.
I have the same worries that Steven did about discussions of the Internet of Things that looms so large in our future. They focus on the incidental effects, not on the fundamental changes. Barry Ritholtz points me to a post by Jon Evans at TechCrunch entitled The Internet of Someone Else's Things that is an exception. Jon points out that the idea that you own the Smart Things you buy is obsolete:
They say “possession is nine-tenths of the law,” but even if you physically and legally own a Smart Thing, you won’t actually control it. Ownership will become a three-legged stool: who physically owns a thing; who legally owns it; …and who has the ultimate power to command it. Who, in short, has root.
What does this have to do with digital preservation? Follow me below the fold.

Tuesday, October 7, 2014

Economies of Scale in Peer-to-Peer Networks

In a recent IEEE Spectrum article entitled Escape From the Data Center: The Promise of Peer-to-Peer Cloud Computing, Ozalp Babaoglu and Moreno Marzolla (BM) wax enthusiastic about the potential for Peer-to-Peer (P2P) technology to eliminate the need for massive data centers. Even more exuberance can be found in Natasha Lomas' TechCrunch piece The Server Needs To Die To Save The Internet (LM) about the MaidSafe P2P storage network. I've been working on P2P technology for more than 16 years, and although I believe it can be very useful in some specific cases, I'm far less enthusiastic about its potential to take over the Internet.

Below the fold I look at some of the fundamental problems standing in the way of a P2P revolution, and in particular at the issue of economies of scale. After all, I've just written a post about the huge economies that Facebook's cold storage technology achieves by operating at data center scale.

Tuesday, September 30, 2014

More on Facebook's "Cold Storage"

So far this year I've attended two talks that were really revelatory: Krste Asanović's keynote at FAST 13, which I blogged about earlier, and Kestutis Patiejunas' talk about Facebook's cold storage systems. Unfortunately, Kestutis' talk was off-the-record, so I couldn't blog about it at the time. But he just gave a shorter version at the Library of Congress' Designing Storage Architectures workshop, so now I can blog about this fascinating and important system. Below the fold, the details.

Thursday, September 25, 2014

Plenary Talk at 3rd EUDAT Conference

At the 3rd EUDAT Conference's session on sustainability I gave a plenary talk entitled Economic Sustainability of Digital Preservation. Below the fold is an edited text with links to the sources.