Friday, November 14, 2014

Talk at Storage Valley Supper Club

I gave a very short talk to the Storage Valley Supper Club's 8th meeting. Below the fold, an edited text with links to the sources.

I'm David Rosenthal and I'm a customer. This will be a very short talk making one simple point, which is the title of the talk:
Storage Will Be
Much Less Free
Than It Used To Be
My five minutes of fame happened last Monday when Chris Mellor at The Register published this piece, with a somewhat misleading title. It is based on work I had been blogging about since at least 2011, ever since a conversation at the Library of Congress with Dave Anderson of Seagate. For the last 16 years I've been working at Stanford Library's LOCKSS Program on the problem of keeping data safe for the long term. There are technical problems, but the more important problems are economic. How do you fund long-term preservation?

Working with students at UC Santa Cruz's Storage Systems Research Center I built an economic model of long-term storage. Here is an early version. It computes the net present value of the expenditures through time needed to keep an example dataset for 100 years (the endowment for short) as the rate at which storage gets cheaper (the Kryder rate for short) varies. The different lines reflect media service lives of 1 to 5 years.

At the historic Kryder rate of 30-40%/year we are in the flat part of the graph, where the endowment is low and it doesn't vary much with the Kryder rate. This meant that long-term storage was effectively free; if you could afford to store the data for a few years, you could afford to store it "for ever" because the cost of storing it for the rest of time would have become negligible.

But suppose the Kryder rate drops below about 20%/year. Then we are in the steep part of the graph, where the endowment needed is much higher and depends strongly on the precise Kryder rate. Of course, we are not going to know that rate in advance, so the cost of long-term storage becomes much harder to predict.
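The shape of that curve can be reproduced with a toy calculation. What follows is a minimal sketch, not the model built with the UCSC students: it assumes the only cost is buying replacement media at the end of each service life, that the media price falls by the Kryder rate every year, and that future spending is discounted at a fixed 5%/year. The function name, the discount rate and the unit initial cost are all illustrative assumptions.

```python
def endowment(kryder_rate, service_life_years, horizon_years=100,
              discount_rate=0.05, initial_cost=1.0):
    """Net present value of media purchases over the horizon.

    Toy assumptions (not the real model): media are bought at year 0 and
    again every `service_life_years`; the price drops by `kryder_rate`
    each year; running costs are ignored; the discount rate is constant.
    """
    total = 0.0
    for year in range(0, horizon_years, service_life_years):
        # Price of the same capacity after `year` years of Kryder decline.
        price = initial_cost * (1.0 - kryder_rate) ** year
        # Discount that future purchase back to today.
        total += price / (1.0 + discount_rate) ** year
    return total


if __name__ == "__main__":
    # Endowment as a multiple of the initial media cost, 4-year service life.
    for rate_percent in (5, 10, 20, 30, 40):
        value = endowment(rate_percent / 100.0, service_life_years=4)
        print(f"Kryder rate {rate_percent:2d}%/year: endowment = {value:.2f}")
```

Even in this crude version, moving the Kryder rate from 40% to 30% barely changes the result, while moving it from 20% to 10% changes it a lot, which is the flat part and the steep part of the graph.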

We don't have to suppose. This graph, from Preeti Gupta at UCSC, shows that in 2010, before the floods in Thailand, the Kryder rate had dropped. Right now, disk is about 7 times as expensive as would have been predicted in 2010. The red lines show the range of industry projections going forward, 10-20%/year. In 2020 disk is projected to be between 100 and 300 times as expensive as would have been projected in 2010. As my first graph showed, this is a big deal for anyone who needs to keep data for the long term.

No-one should be surprised that in the real world exponential curves can't go on for ever. Here is Randall Munroe's explanation. In the real world exponential growth is always the first part of an S-curve.

Why has the Kryder rate slowed? This 2009 graph from Seagate shows that what looks like a smooth Kryder graph is actually the superimposition of a series of S-curves, one for each technology. One big reason for the slowing is technical: each successive technology transition gets harder, and the long delay in getting HAMR into production is the current example. But this has economic implications. Each technology transition is more expensive, so the technology needs to remain in the market longer to earn a return on the investment. And the cost of the transition drives industry consolidation, so we now have only a little over 2 disk manufacturers. This has transformed disks from a very competitive, low-margin business into a stable 2-vendor one with reasonably good margins. Increasing margins slow the Kryder rate.

This isn't about technology "hitting a wall" and the increase in bit density stopping. It is about the interplay of technological and business factors slowing the rate of decrease in $/GB. For people who look only at the current cost of storage, this is irritating. For those of us who are concerned with the long-term cost of storage, it is a very big deal.
