Saturday, February 25, 2012

Talk at PDA2012

I spoke at this year's Personal Digital Archiving conference at the Internet Archive, following on from my panel appearance there a year ago. Below the fold is an edited text of the talk with links to the sources.

At last year's PDA I sparked a lively discussion with my panel appearance called Paying for Long-Term Storage. I'm hoping to leave enough time for a similar discussion this year.

Last year's talk covered the three possible business models for long-term storage, and focused on endowment as being the only really viable one. Endowment involves depositing the data together with a capital sum sufficient to pay for its storage indefinitely.

The reason endowment is thought to be feasible is Kryder's Law, the 30-year history of exponential increase in disk capacity at roughly constant cost. Provided that it continues for another decade or so after you deposit your data, the endowment model works. Unfortunately, exponential growth curves never continue indefinitely. At some point, they stop.

This leaves us with two intertwined questions:

  • How long can we expect Kryder's Law to continue?
  • How much should we charge per TB?
The questions are intertwined because, obviously, the sooner Kryder's Law stops the more we have to charge. I was hoping that finding out how to answer these questions would be somebody else's problem. But it turned out to be my problem after all.

I've been working for the Library of Congress on using cloud storage for a LOCKSS box (PDF). It turns out that there are several meanings of "using cloud storage for a LOCKSS box", and I have some of them actually working. But as I was starting to write up this work, I realised that the question I was going to get asked was "does it make economic sense to use cloud storage for a LOCKSS box?" A real LOCKSS box has both capital and running costs, whereas a virtual LOCKSS box in the cloud has only running costs. For an apples-to-apples comparison, I need to compare cash flows through time.

Economists have a standard technique for comparing costs through time, called Discounted Cash Flow (DCF). The idea is that needing to pay a dollar in a year is the same as investing less than a dollar now so that the investment plus the accrued interest in a year will be the dollar I need to pay. Simple. In all the textbooks. But when I looked into it, two problems emerged.
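The idea fits in a few lines of Python (my own illustration, with made-up numbers, not anything from the models discussed below):

```python
# Toy illustration of Discounted Cash Flow (DCF), not from the talk's model:
# the present value of a future payment is the sum you would need to invest
# now, at the assumed interest rate, for it to grow into that payment.
def present_value(amount, annual_rate, years):
    """Sum to invest now so it grows to `amount` after `years` years."""
    return amount / (1.0 + annual_rate) ** years

# Needing to pay $1 in a year, at an assumed 5% interest rate, is the
# same as investing about 95.2 cents now.
print(round(present_value(1.0, 0.05, 1), 4))  # 0.9524
```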

First, it doesn't work in practice. You need to know what interest rate to use. Here is research from the Bank of England (PDF) showing that the interest rates investors use are systematically wrong, in a way that makes endowing data, or making any other long-term investment, very difficult.

Second, it doesn't even work in theory. Here is research from Doyne Farmer of the Santa Fe Institute and John Geankoplos of Yale (PDF), pointing out that (even assuming you could choose the correct interest rate) using a fixed real interest rate would be OK only if the outcome were linearly related to the rate. But it isn't. Using a constant interest rate averages out periods (like the 80s) of high interest rates and periods (like now) of very low (or negative) real interest rates; because discounting is nonlinear in the rate, the low-rate periods dominate the long-run outcome.

In order to model long-term investments, you need to use Monte Carlo techniques with an interest rate model. Similarly, if we assume that in the future storage costs will drop at varying rates, we need to use Monte Carlo techniques with a storage cost model.
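A minimal sketch of why the fixed-rate shortcut fails (my own toy example, with invented parameters, not the project's model): because the discount factor is a convex function of the interest rate, discounting at the average rate systematically understates the sum you need to set aside.

```python
import random

random.seed(42)

def discount_factor(rates):
    """Present value of $1 paid after len(rates) years, one rate per year."""
    factor = 1.0
    for r in rates:
        factor /= (1.0 + r)
    return factor

YEARS, MEAN_RATE, VOLATILITY = 30, 0.04, 0.03  # made-up parameters

# Monte Carlo: draw a fresh interest-rate path for each run.
runs = [discount_factor([random.gauss(MEAN_RATE, VOLATILITY)
                         for _ in range(YEARS)])
        for _ in range(20000)]
monte_carlo = sum(runs) / len(runs)
fixed_rate = discount_factor([MEAN_RATE] * YEARS)

# Varying rates imply a larger present cost than the average rate suggests.
assert monte_carlo > fixed_rate
```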

Why would we believe that in the future storage costs will drop at varying rates? Five reasons come to mind:

  • First, they just did. The floods in Thailand increased disk prices by 50-100% almost overnight. These increased prices have flattened in recent months, but are expected to remain above trend for at least a year.
  • Second, you might want to use the well-known and increasingly popular "affordable cloud storage". Here's a table of the price history of four major cloud storage providers showing that the best case is a 3% per year price drop. That's 3%, not 30%.
  • Third, disk manufacturers are already finding further increases in density difficult. To stay on the curve we should have had 4TB disks by the middle of last year at the latest, but all we have are 3TB drives. The transition to future disk technologies such as HAMR and BPM is being delayed, and desperate measures, called "shingled writes", are under way to build a 6th generation of the current technology, PMR. Shingled writing means, among other problems, that disks are no longer randomly writable; they become an append-only medium.
  • Fourth, even if we assume that Kryder's Law continues, we are in for a pause in the cost drop. The market for 3.5" disks is desktop PCs, which is collapsing. The volume consumer market is now 2.5" drives, which are on the same curve, just at a higher price per byte. And the life of the 2.5" form factor is also limited. If Kryder's Law continues until 2020 we should in theory have a $40 2.5" drive holding 14TB. But no-one is going to build this drive because no-one wants 14TB on their laptop. How would you back it up? They would much rather have a 2TB 1" drive for $15 and much less power draw.
  • Fifth, there is a hard theoretical limit to the minimum size of a stable magnetic domain at the temperatures inside a disk drive. This means Kryder's Law for magnetic disks pretty much has to stop by 2026 at the latest, and probably much earlier. Mark Kryder and Chang Soo Kim of CMU compared the various competing solid state technologies with the 2020 14TB 2.5" drive (PDF), and none of them looked like good candidates for continued rapid drop in storage costs beyond there.
So, we need a Monte Carlo model. I started building one, and it rapidly became clear that this was a problem much bigger than I could solve on my own. So we have started up a research program at UC Santa Cruz and Stony Brook University, with help from NetApp. I'm about to show you some early results from this collaboration. I need to stress that this is very much work in progress. We are just at the stage of trying to understand what a comprehensive model would look like, by building simple models and seeing if they produce plausible results.

The first model is work by Daniel Rosenthal (no relation) of UCSC. It follows a unit of storage capacity, such as a shelf of a filer, as the demand for storage grows, disks fail or age out and are replaced by drives storing more data, and power and other running costs are incurred. Daniel's model doesn't account for the time value of money, so it can only be used for short periods.

Here is a graph reproducing the well-known fact that drives (or tapes in a robot) are replaced when the value of the data they hold is no longer enough to justify the space they take up, not when their service life expires. With Daniel's parameters, the optimum drive replacement age is under 3 years.
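Daniel's model is far richer, but the flavour of the replacement decision can be captured in a back-of-the-envelope calculation (my own toy numbers, not Daniel's parameters): keep an old drive until its running cost per byte exceeds the amortized cost per byte of a new, denser drive.

```python
import math

# Toy replacement-age calculation (made-up numbers, not Daniel's parameters).
DRIVE_PRICE = 100.0   # $ per drive, assumed constant across generations
RUNNING_COST = 10.0   # $ per drive per year (power, space, etc.)
SERVICE_LIFE = 5.0    # years over which a new drive's price is amortized
KRYDER_RATE = 0.40    # capacity growth per year at constant drive price

# A new drive bought `age` years later holds (1 + KRYDER_RATE)**age as much.
# Replace the old drive when its running cost per byte exceeds the full
# (amortized purchase + running) cost per byte of a current drive:
#   RUNNING_COST / cap_old > (DRIVE_PRICE / SERVICE_LIFE + RUNNING_COST) / cap_new
break_even_age = (math.log(DRIVE_PRICE / (SERVICE_LIFE * RUNNING_COST) + 1.0)
                  / math.log(1.0 + KRYDER_RATE))
print(f"break-even replacement age: {break_even_age:.1f} years")
```

With these invented numbers the break-even age comes out a little over three years; the point is only that the economically optimal replacement age is well short of the drive's service life.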

The second model is my initial simulation. It follows a unit of data, say a TB, as it migrates between media as they are replaced, occupying less and less of them. Unlike Daniel's model, this one uses an interest rate model to properly account for the time value of money. In this case interest rates are based on the last 20 years.

Here are about a thousand runs of the model. We gradually increase the endowment and each time see what probability we have of surviving 100 years without running out of money. As you see, if storage media prices are, as we assumed, dropping 25% a year, the variation in interest rates doesn't have a big effect.
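To give the flavour of what such a run looks like, here is a drastically simplified sketch in the spirit of the model (all parameters invented for illustration, not the model's actual inputs): follow an endowed unit of data year by year, earning random interest and paying storage costs that fall at an assumed Kryder rate.

```python
import random

random.seed(7)

def survives(endowment, years=100, annual_cost=100.0, kryder_rate=0.25,
             mean_rate=0.04, volatility=0.03):
    """True if the endowment covers `years` of storage without going broke.

    All parameters are invented for illustration, not the model's inputs.
    """
    balance, cost = endowment, annual_cost
    for _ in range(years):
        balance *= 1.0 + random.gauss(mean_rate, volatility)  # earn interest
        balance -= cost                                       # pay for a year
        if balance < 0.0:
            return False
        cost *= 1.0 - kryder_rate  # Kryder's Law: storage gets cheaper
    return True

def survival_probability(endowment, trials=2000):
    return sum(survives(endowment) for _ in range(trials)) / trials

# Sweep the endowment upward and watch the survival probability rise.
for endowment in (200, 300, 400, 500, 600):
    print(endowment, survival_probability(endowment))
```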

Here are a few million runs of the model, varying the Kryder's Law rate and the endowment to get a 3D graph. If we take the 98% contour of this graph, we get the next graph.

This shows the relationship between the endowment needed for a 98% probability of not running out of money in 100 years, and the rate of Kryder's Law decrease in cost per byte, which we assume to be constant.

The plausible result is that the endowment is relatively insensitive to the Kryder's Law rate if it is large, say above 25%/yr. But if it is small, say below 15%/yr, the endowment is rather sensitive to the rate.

This is one of the key insights from our work so far. Storage industry experts disagree about the details but agree that the big picture is that Kryder's Law is slowing down. Thus we're moving from the right, flat side of the graph to the left, steep side. Even while we were in the region of the graph where the cost is relatively low and easy to predict, the economic sustainability of digital preservation was a major concern. Going forward, digital preservation faces two big problems:
  • The cost of preserving a unit of data will increase.
  • The uncertainty about the cost of preserving a unit of data will increase.

The next graph applies the model to cloud storage, assuming an initial cost of 13 cents/GB/yr and interest rates from the last 20 years. We compute the endowment needed per TB for various rates of cost decrease. For example, if costs decrease at the 3% rate of the last 6 years of S3, we need $29K/TB.
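For intuition, here is a deterministic back-of-the-envelope version of this calculation (my own simplification: a single fixed real interest rate, which I chose, and no Monte Carlo). It gives only a lower bound; the model's $29K figure is far higher, presumably because the 98% requirement is dominated by unlucky paths with sustained low or negative real returns.

```python
# Back-of-the-envelope lower bound (my own simplification, not the model):
# discount 100 years of cloud storage starting at $130/TB/yr, with costs
# falling 3%/yr, at a single assumed 1% real interest rate.
def endowment_lower_bound(annual_cost=130.0, cost_drop=0.03,
                          real_rate=0.01, years=100):
    return sum(annual_cost * (1.0 - cost_drop) ** t / (1.0 + real_rate) ** t
               for t in range(years))

# Comes to roughly $3,200/TB, far below the Monte Carlo model's $29K/TB.
print(round(endowment_lower_bound()))
```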

This is a lot of money. It is clearly possible that prices in the first 6 years of cloud storage were an anomaly, and in the near future they will start dropping as quickly as media prices. But the media price drop is slowing, and S3 does not appear to be under a lot of pricing pressure. Unless things change, cloud storage is simply too expensive for long-term use.

The last graph shows the effect on the endowment of a spike that doubles disk prices a number of years into the life of the data; the Y=0 line, with no spike, is there for comparison. As expected, the effect is big if the Kryder's Law drop is slow and the spike comes soon. Note the ridge, which shows that you are in particular trouble if the spike coincides with the 4-year drive replacement age I assumed.

As I said, we are at the very early stages of this work. It has turned out to be a lot more interesting and difficult than I could have imagined when I spoke here last year. Some of the improvements we're looking at are pluggable alternate models for interest rates (the last 20 years may not be representative) and technology evolution (we want to model the introduction of new technologies with very different properties).

We want to use these initial models to study questions such as:
  • How does the increasing short-termism discovered by the Bank of England affect the endowment required?
  • How can we choose between storage technologies with different cost structures, such as tape, disk and solid state, as their costs evolve at different rates?
  • Can cloud storage services compete for long-term storage?
By next year, we hope to have a simulation that is realistic enough for you to use for scenario planning. We are anxious to learn if you think a simulation of this kind would be useful, and what questions you would like to ask it.

1 comment:

David. said...

One interesting point that came out in questions after my talk. The Internet Archive is offering an endowment storage service - the cost is 40 times the cost of the raw disk. Given the replication, ingest and other costs, this number roughly matches the output from the model with reasonable Kryder's Law assumptions.