Saturday, February 25, 2012

Talk at PDA2012

I spoke at this year's Personal Digital Archiving conference at the Internet Archive, following on from my panel appearance there a year ago. Below the fold is an edited text of the talk with links to the sources.

Wednesday, February 22, 2012

FAST 2012


We gave a work-in-progress paper (PDF) and a well-received poster (PDF) on our economic modeling work at the 2012 FAST conference. As usual, the technical sessions featured some very interesting papers, although this year it was hard to find any relevant to long-term storage. Below the fold are notes on the papers that caught my eye.

Friday, February 17, 2012

Cloud Storage Pricing History

The original motivation for my work on the economics of long-term storage was to figure out whether it made sense to use cloud storage for systems such as LOCKSS. Last December I finally got around to looking at the history of Amazon S3's pricing, and I was surprised to see that in nearly 6 years the price for the first TB had dropped only from $0.15/GB/mo to $0.14/GB/mo. Since then, Amazon has dropped the price to $0.125/GB/mo, so the average price drop since launch is now about 3%/yr.
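As a rough check on that figure, here is a minimal sketch of the arithmetic. It assumes the drop is measured as a compound annual rate and rounds "nearly 6 years" up to exactly 6; the function name and the 6-year constant are illustrative choices, not taken from Amazon's published price history.

```python
def annual_price_drop(start_price, end_price, years):
    """Compound annual rate of decline, as a fraction (0.03 means 3%/yr)."""
    return 1.0 - (end_price / start_price) ** (1.0 / years)

# Assumption: "nearly 6 years" is treated as exactly 6 years.
YEARS = 6.0

# Prices for the first TB, in $/GB/mo, as quoted in the text.
before_cut = annual_price_drop(0.15, 0.14, YEARS)   # before the latest cut
after_cut = annual_price_drop(0.15, 0.125, YEARS)   # after the latest cut

print(f"before the latest cut: {before_cut:.1%}/yr")
print(f"after the latest cut:  {after_cut:.1%}/yr")
```

Under these assumptions the rate works out to roughly 1.1%/yr before the latest cut and roughly 3.0%/yr after it. A simple linear average (total percentage drop divided by years) gives a slightly lower 2.8%/yr, which still rounds to about 3%.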

I wondered whether this very slow price drop was representative of the cloud storage industry in general, so I went looking. Below the fold is what I found, and some of the implications for cloud use for long-term storage.

Tuesday, February 7, 2012

Tide Pools and Terrorists

There's an article in the current issue of Stanford's alumni magazine that discusses, in a very accessible way, many of the concepts people need to think about when designing systems for long-term digital preservation. Raphael D. Sagarin has evolved from studying how organisms in the Monterey tide pools adapt to climate change, to critiquing responses to terrorist incidents. He suggests we have a lot to learn from the way organisms respond and adapt to threats. For example:
STRATEGY 1: Embrace uncertainty. In the natural world, they argue, most species increase uncertainty for their enemies by deploying multiple strategies for attack or defense. Think of the octopus, says Sagarin: "It's got an ink cloud it can use; it's got a beak it can use; some of them have poison; they've got these suckers; it's got really good camouflage." If one strategy doesn't work, it can fall back on another.
Increasing uncertainty for the enemy is one major aspect of the defenses of the LOCKSS system. The more you increase your certainty about what your preservation system is doing, the easier you make it for an enemy or an error to affect large parts of the system. In the long term, randomization is your friend. So is:
STRATEGY 2: Decentralize. "Putting homeland security in the hands of a massive, plodding bureaucracy hardly represents evolutionary advancement," ... Sagarin's ideal defense would operate more like the human immune system—with units that react semiautonomously to threats, loosely governed by a central command. "Instead of relying on a centralized brain or controller for everything, you farm out the responsibility of searching for and responding to changes in the environment to many, many different agents," he says.
LOCKSS boxes are autonomous and only loosely coordinated, in exactly this manner. The more coordinated the behavior of the parts of your system, the more correlated their failures will be.

The whole article is worth a careful read, as is Sagarin's Adapt or Die article in Foreign Policy.

Thursday, February 2, 2012

Domain Name Persistence

Last December a useful workshop on Domain Name Persistence was held in conjunction with the 7th International Digital Curation Conference. My comments on the need for persistent domain names are below the fold.