Wednesday, February 19, 2014

Talk at FAST BoF

I gave a talk during a Birds-of-a-Feather session on "Long-Term Storage" at Usenix's FAST (File And Storage Technologies) meeting. Below the fold is an edited text with links to the sources.

I'm David Rosenthal from the LOCKSS (Lots Of Copies Keep Stuff Safe) Program at the Stanford University Libraries. I started working on the problem of long-term digital preservation 15 years ago. I thought it would be semi-retirement; it has turned into a job for life. There are three parts to the problem. Stuff must be ingested, preserved and disseminated. The essential task of the preservation part is storing the bits for the long term. The problems of storing bits for the long term come in two flavors, technical and economic.

I discussed the technical problems in a 2010 article for ACM Queue entitled Keeping Bits Safe: How Hard Can It Be? I used the example of a black box keeping a Petabyte for a century with a 50% chance that every bit survives unchanged. Consider each bit like a radioactive atom, subject to a random process that flips its state. The specification implies a half-life for the bits. It is about 60 million times the age of the universe. What this means is that although a storage system vendor may claim their product meets your specification, benchmarking it to validate their claim is infeasible.
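The half-life claim can be checked with a little arithmetic. If bit flips are independent with rate λ, the chance that all N bits of a petabyte survive T years is exp(−λNT); setting that to 50% and solving gives the half-life. A minimal sketch, where the only inputs beyond the spec are a rough 13.8-billion-year age for the universe:

```python
import math

BITS = 8 * 10 ** 15   # one petabyte, in bits
YEARS = 100           # the black box's design life
P_SURVIVE = 0.5       # chance that every bit survives unchanged

# P(all bits survive) = exp(-rate * BITS * YEARS); solve for the per-bit flip rate
rate = -math.log(P_SURVIVE) / (BITS * YEARS)   # flips per bit-year

half_life = math.log(2) / rate                 # years for half the bits to flip
age_of_universe = 13.8e9                       # years, rough figure

print(half_life)                    # 8e17 years
print(half_life / age_of_universe)  # ~5.8e7, i.e. about 60 million
```

Note that solving for the rate analytically is necessary: the per-bit survival probability is so close to 1 that computing it directly would round to 1.0 in floating point.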

Further, the set of threats against which stored data must be preserved includes not merely random media failures, which can be modelled with some realism, but also threats such as insider abuse and external attack which cannot. Edward Snowden's revelations show the capabilities nation-state actors had a few years ago. A few years from now many of them will be available in the exploit market to anyone motivated to corrupt your data.

The economic problem of storing data for the long term has historically been considered insignificant. The 30-year history of Kryder's Law, the exponential increase in bit density, led to an exponential drop in dollars per byte. This meant that, if you could afford to store the data for a few years, you could afford to store it forever. And this led to the concept of endowing stored data, depositing it together with a capital sum believed sufficient to pay for its eternal storage. In economic terms, the endowment is the net present value of the stream of future payments for storage.

In 2010 Serge Goldstein of Princeton described their endowed data service, based on his analysis that if they charged double the initial cost they could store data forever. I was skeptical, not least because what Princeton actually charged was $3K/TB. This meant either that they were paying $1.5K/TB for disk at a time when Fry's was selling disks for $50/TB, or that they were skeptical too.

So I built an economic model of long-term storage. The two key parameters are:
  • The interest rate that the as-yet-unexpended part of the endowment earns.
  • The Kryder rate, the rate at which the cost per byte drops.
Both are, of course, unknowable at the time the data and its endowment are deposited. I used an interest rate model based on the 20-year history of 1-year inflation-protected US Treasuries, and varied the Kryder rate to see what happened.

At the historic 30-40%/yr Kryder rate, using 2012 prices the model projects the endowment needed for 3 on-disk copies at about $1K/TB, and this number is not very sensitive to the precise Kryder rate. But as the rate drops below 20%/yr, the endowment needed starts to rise rapidly, roughly doubling by the time the rate falls to 10%/yr, and becoming quite sensitive to the precise rate.
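The shape of that result can be reproduced with a toy version of the model: each year's storage cost falls by the Kryder rate and is discounted back at the interest rate, and the endowment is the sum. The 2% real interest rate and 100-year horizon below are illustrative assumptions, not the actual model's parameters:

```python
def endowment(annual_cost, interest, kryder, horizon=100):
    """Net present value of paying for storage over `horizon` years.

    annual_cost: first-year cost of storage (e.g. $/TB/yr)
    interest:    rate earned by the un-expended endowment
    kryder:      annual fractional drop in cost per byte
    """
    total = 0.0
    cost = annual_cost
    for year in range(horizon):
        total += cost / (1 + interest) ** year
        cost *= 1 - kryder
    return total

# Endowment relative to the first year's cost, assuming 2% real interest:
print(endowment(1.0, 0.02, 0.35))  # ~2.8x at the historic Kryder rate
print(endowment(1.0, 0.02, 0.20))  # ~4.6x
print(endowment(1.0, 0.02, 0.10))  # ~8.5x: roughly double the 20%/yr figure
```

Even this crude sketch shows the key behavior: above 20%/yr the answer barely moves, below it the endowment climbs steeply.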

There is nothing to worry about, right? The Kryder rate has been 30-40% for 30 years, so it is bound to continue just as Moore's Law is.

In late 2011 the floods in Thailand destroyed 40% of the world's disk manufacturing capacity. The price per byte almost doubled, and more than 2 years later is still far above what it would have been absent the floods. Even the perennially optimistic industry road-maps now project not more than 20%/yr for the next 5 years. Industry consolidation, and the fearsome cost of the transition from PMR to HAMR let alone to BPM, mean that the days of 30-40% Kryder rates are over.

Every few months there is another press release announcing that some new, quasi-immortal medium such as stone DVDs has solved the problem of long-term storage. But the problem stays resolutely unsolved. Why is this? Very long-lived media are inherently more expensive, and are a niche market, so they lack economies of scale. Seagate did a study of the market for disks with an archival service life, which they could easily make, and discovered that no-one would pay the extra for them.

The fundamental problem is that long-lived media only make sense at very low Kryder rates. Even if the rate is only 10%/yr, after 10 years you could store the same data in 1/3 the space. Since space in the data center or even at Iron Mountain isn't free, this is a powerful disincentive to move old media out. If you believe that Kryder rates will get back to 30%/yr, after a decade you could store 30 times as much data in the same space.
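The arithmetic behind that disincentive is simple compounding: if density improves as the cost per byte falls at rate r, the same data needs (1−r)^n of the space after n years. A quick check of the numbers in the text:

```python
def space_fraction(kryder_rate, years):
    # fraction of the original rack or vault space the same data needs
    # after `years` of density improvement at `kryder_rate` per year
    return (1 - kryder_rate) ** years

print(space_fraction(0.10, 10))  # ~0.35: about 1/3 the space at 10%/yr
print(space_fraction(0.30, 10))  # ~0.03: ~30x the data in the space at 30%/yr
```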

There is one long-term storage medium that might eventually make sense. DNA is very dense, very stable in a shirtsleeve environment, and best of all it is very easy to make Lots Of Copies to Keep Stuff Safe. DNA sequencing and synthesis are improving at far faster rates than magnetic or solid state storage. Right now the costs are far too high, but if the improvement continues DNA might eventually solve the archive problem. But access will always be slow enough that the data would have to be really cold before being committed to DNA.

The reason that the idea of long-lived media is so attractive is that it suggests that you can design a system ignoring the possibility of media failures. You can't, and even if you could it wouldn't make economic sense. As Brian Wilson, CTO of Backblaze, points out, in their long-term storage environment:
Double the reliability is only worth 1/10th of 1 percent cost increase. I posted this in a different forum: Replacing one drive takes about 15 minutes of work. If we have 30,000 drives and 2 percent fail, it takes 150 hours to replace those. In other words, one employee for one month of 8 hour days. Getting the failure rate down to 1 percent means you save 2 weeks of employee salary - maybe $5,000 total? The 30,000 drives costs you $4m.
The $5k/$4m means the Hitachis are worth 1/10th of 1 per cent higher cost to us. ACTUALLY we pay even more than that for them, but not more than a few dollars per drive (maybe 2 or 3 percent more).
Moral of the story: design for failure and buy the cheapest components you can. :-)
Let me leave you with another graph. It is based on three industry numbers:
  • According to IDC, the demand for storage each year grows about 60%.
  • According to IHS iSuppli, the bit density on the platters of disk drives will grow no more than 20%/year for the next 5 years.
  • IT budgets in recent years have grown between 0%/year and 2%/year.
The graph projects these three numbers out for the next 10 years. The red line is Kryder's Law, at IHS iSuppli's 20%/yr. The blue line is the IT budget, growing at 2%/yr. The green line is the annual cost of storing the data accumulated since year 0 at the 60% growth rate projected by IDC, all relative to the value in the first year. 10 years from now, storing all the accumulated data would cost over 20 times as much as it does this year. If storage is 5% of your IT budget this year, in 10 years it will be more than 100% of your budget.
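The green line can be reconstructed as a toy projection under the stated assumptions (60%/yr data growth, 20%/yr Kryder rate, 2%/yr budget growth, everything normalized to year 0):

```python
GROWTH, KRYDER, BUDGET = 0.60, 0.20, 0.02

def storage_cost(year):
    # cost of storing all data accumulated through `year`,
    # at that year's cost per byte, relative to year 0
    accumulated = sum((1 + GROWTH) ** y for y in range(year + 1))
    return accumulated * (1 - KRYDER) ** year

ratio = storage_cost(10) / storage_cost(0)
print(round(ratio, 1))                  # ~31: "over 20 times" year 0's cost
budget = (1 + BUDGET) ** 10
print(round(0.05 * ratio / budget, 2))  # ~1.28: 5% of the budget becomes >100%
```

The exact multiple depends on how the accumulation is modelled, but any reasonable variant lands well above 20x.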

In the discussion after this introductory talk, someone from IBM reported large customers asking for large-scale, write-once on-line storage at low cost. I am skeptical that this combination of properties can be delivered more cheaply with custom components than, for example, Backblaze is delivering based on commodity components. Even more interesting, Dave Anderson reported that industry projections for the Kryder rate are now down to around 12%.

1 comment:

Unknown said...

I'm curious to hear your thoughts on Facebook's plan to use Blu-Ray as a long-term storage solution. My instinct is that optical is too fragile, too difficult to copy, and too low in I/O to make it a worthwhile medium, even with the cost savings of being able to truly run cold. I'm also a bit unsure of overprovisioning less with BD-R, given optical's historically poor stability.