Tuesday, September 27, 2011

Modeling the Economics of Long-Term Storage

I gave a talk at the Library of Congress workshop on Designing Storage Architectures entitled Modeling the Economics of Long-Term Storage. It was a talk about work in progress with Library of Congress funding, expanding ideas I described in these two blog posts about ways to compare the costs of different approaches to long-term storage. I had only 10 minutes to speak, so below the fold is an expanded and edited text of the talk with links to the sources.


Prelude

Mark Kryder (of Kryder's Law) and Chang Soo Kim of Carnegie-Mellon's Data Storage Systems Center published an important paper entitled "After Hard Drives - What Comes Next?" in 2009. I regret not spotting it sooner. They point out that the disk drive industry was then working with areal densities around 500Gb/in², as against a theoretical limit around 200 times higher, and that the 40% per annum density growth rate predicts 10Tb/in² in 2020. This, they predict, implies a 14TB 2.5" drive costing $40. From the perspective of long-term storage this sounds encouraging. But it is worth noting that this also implies that Kryder's Law for disks must stop by 2026. So the more somber view is that we have at most a little more than a decade of rapidly decreasing storage costs to go.
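
As a sanity check on that date, the endpoint follows directly from the figures quoted above. Here is the back-of-the-envelope arithmetic, using only the numbers as quoted, so treat it as illustrative:

```python
# Back-of-the-envelope check of the Kryder's Law endpoint implied by the
# figures quoted above (2009 areal density ~500Gb/in², a theoretical limit
# around 200 times higher, 40% per annum density growth).
import math

growth_rate = 0.40     # 40% per annum areal density growth
headroom = 200         # theoretical limit / 2009 density

# Years until the 40%/yr curve exhausts the headroom:
years = math.log(headroom) / math.log(1 + growth_rate)
print(f"limit reached after ~{years:.1f} years, i.e. around {2009 + years:.0f}")
# Roughly 16 years, i.e. the mid-2020s, which is why Kryder's Law for disks
# must stop by about 2026.
```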

They examine 13 alternative storage technologies, plus NAND flash, against this competitive benchmark. They project the characteristics of the technologies based on the capabilities of semiconductor fabs projected by the industry, and on the research literature about each. They conclude:

"to compete with hard drives on a cost per terabyte basis will be challenging for any solid state technology because the ITRS lithography roadmap limits the densities that most alternative technologies can achieve. Those technologies with the best opportunity have a small cell size and the capability of storing multiple bits per cell. Phase change random access memory (PCRAM) and spin transfer torque random access memory (STTRAM) appear to meet these criteria."
As Dave Anderson has pointed out, we can be pretty sure that solid state memories are not going to completely displace disks in the market in the next 5-10 years, because the factories that build them are expensive and slow to construct. Simply by comparing the number of bytes of storage that existing and planned hard disk factories will produce with the number of bytes that existing and planned semiconductor fabs can produce, it is clear that the two technologies will share the storage market, as they do now.

As now, it will be factors other than raw cost per byte that determine which technologies capture which parts of the overall storage market. Flash, despite much higher cost per byte, has captured significant parts of the market because in those parts its other attributes, such as low latency, fast transfer rate, low power and physical robustness, are more important at the overall system level than cost per byte.

Introduction

I think we can all agree that if we had an unlimited budget we would have no problem preserving data for ever. The fundamental problem in digital preservation is trading off reliability against cost; we need models to be able to make these trade-offs. I've written extensively about the difficulty of modeling storage reliability, but it turns out that there are difficulties in modeling costs as well. The difficulties arise because different storage technologies have different costs through time.

A Lifetime of Investment Decisions

Keeping data safe for the long term is not a one-time investment. Because the life of the data is longer than the life of the hardware used to store it, and because different media have different purchase and running costs, and because the interest rates on the loans needed to finance the purchase of successive hardware generations vary, at each stage of the life of a unit of data an individual investment decision must be made. Is it better to continue to use the current hardware, or should it be replaced, and if so by what?

A Durable Array of Wimpy Nodes

For example, in a recent technical report Ian Adams, Ethan Miller and I argue for a storage architecture we call DAWN for Durable Array of Wimpy Nodes, based on combining solid state memory with very low-power processors in a storage network. Even though the initial cost per byte of the DAWN technology is much higher than conventional disk storage, the total cost of ownership of data stored in it for the long term would be much lower because of its much lower power and space consumption, and its much longer service life.
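
The shape of that argument can be conveyed by a toy, undiscounted total-cost-of-ownership comparison. All the numbers below are purely illustrative assumptions of mine, not figures from the technical report:

```python
# Toy, undiscounted total cost of owning one unit of storage for a fixed
# period, replacing the hardware at the end of each service life.
# All numbers are purely illustrative, not those in the technical report.

def undiscounted_tco(purchase_cost, annual_running_cost, service_life, years):
    replacements = -(-years // service_life)   # ceiling division
    return replacements * purchase_cost + years * annual_running_cost

years = 12
disk = undiscounted_tco(purchase_cost=100, annual_running_cost=40,
                        service_life=4, years=years)   # cheap, power-hungry, short-lived
dawn = undiscounted_tco(purchase_cost=250, annual_running_cost=5,
                        service_life=12, years=years)  # costly, frugal, long-lived
print(f"disk-like: {disk}, DAWN-like: {dawn}")         # 780 vs 310 over 12 years
```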

We based our argument on a simplistic economic analysis. But Kryder and Kim's work shows that trading off higher spending now against lower total system costs through time will be fundamental to the economics of long-term data storage for a long time to come. We need a more sophisticated analysis.

Discounted Cash Flow (DCF)

In order to draw conclusions such as these, we need to compare costs incurred in the future with costs incurred now. The standard technique economists use for doing so is called Discounted Cash Flow (DCF). DCF works by assuming a rate of return, a real interest rate. Then if I have to spend a dollar next year, I can look on it as being the same as investing enough less than a dollar now at the assumed interest rate so that a year from now the interest will have made up the difference and I will have the dollar I need to spend. The amount I need to invest now is called the net present value of the future dollar. The same process works for longer periods, larger sums and income as well as expenditure.
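
To make this concrete, here is a minimal sketch of the net present value calculation; the rates are arbitrary examples:

```python
# Net present value of a cost incurred `years` from now, assuming a constant
# real interest rate. The rates below are arbitrary examples.
def npv(future_cost, rate, years):
    return future_cost / (1 + rate) ** years

# A dollar spent next year, at a 2% real rate, needs only ~$0.98 set aside now:
print(npv(1.00, 0.02, 1))    # ~0.980
# A $100 hardware replacement 5 years out, at the same rate:
print(npv(100.0, 0.02, 5))   # ~90.57
```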

How do I know what interest rate to use for the calculation? If I'm certain about the future cash flow, the Treasury bond market tells me. The Treasury publishes the real yield curve every day, which relates the interest rate on inflation-protected US government debt to its term. In practice, there will be some uncertainty attached to the future cash flows, so I will add a risk premium to the interest rate to reflect the amount of uncertainty.

Does DCF Work In Practice?

Although DCF is economists' standard technique, there is evidence that it does not describe the process investors use in making decisions in the real world. Andrew Haldane and Richard Davies of the Bank of England used historical data from 624 UK and US companies to estimate the extent to which short-term thinking is reflected in the prices of their stock. In theory, the price of each company's stock should reflect the net present value of the stream of future dividends. They find that:
"First, there is statistically significant evidence of short-termism in the pricing of companies’ equities. This is true across all industrial sectors. Moreover, there is evidence of short-termism having increased over the recent past. Myopia is mounting. Second, estimates of short-termism are economically as well as statistically significant. Empirical evidence points to excess discounting of between 5% and 10% per year."
In other words:
"In the UK and the US, cash-flows 5 years ahead are discounted at rates more appropriate 8 or more years hence; 10 year ahead cash-flows are valued as if 16 or more years ahead; and cash-flows more than 30 years ahead are scarcely valued at all."
When making decisions using DCF, investors use interest rates that are systematically too high by 5-10%. This makes spending money now to save money in the future much harder to justify, a serious problem for all forms of digital preservation but especially for technologies such as DAWN.
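
To see what this does to spend-now-to-save-later decisions, compare the present value of a future saving with and without excess discounting in the range Haldane and Davies report. The 3% base rate below is an arbitrary example of mine:

```python
# Present value of a $100 saving 10 years out, with and without the excess
# discounting reported by Haldane and Davies. The 3% base real rate is an
# arbitrary example; the 8% excess is in their 5-10% range.
def npv(future_amount, rate, years):
    return future_amount / (1 + rate) ** years

print(npv(100, 0.03, 10))         # ~74.4: the value at the appropriate rate
print(npv(100, 0.03 + 0.08, 10))  # ~35.2: the value as a short-termist sees it
```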

Does DCF Work In Theory?

The Bank of England study shows that DCF doesn't work in practice. A 2009 paper by Doyne Farmer of the Santa Fe Institute and John Geanakoplos of Yale entitled Hyperbolic Discounting is Rational: Valuing the Far Future With Uncertain Discount Rates shows that it doesn't even work in theory, at least in the long term:
"What this analysis makes clear, however, is that the long term behavior of valuations depends extremely sensitively on the interest rate model. The fact that the present value of actions that affect the far future can shift from a few percent to infinity when we move from a constant interest rate to a geometric random walk calls seriously into question many well regarded analyses of the economic consequences of global warming. ... no fixed discount rate is really adequate – as our analysis makes abundantly clear, the proper discounting function is not an exponential."
Their work is complex; a more accessible description is provided in these two blog posts by Mark Buchanan.

Using a fixed discount rate averages over the variation in interest rates, so the computation never sees periods of very high or very low (possibly even negative) interest rates. But the interest rate has a non-linear effect; if these periods occur they have large impacts on the outcome. Instead, we need to average over the possible paths through the time-varying interest rate environment.
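
The non-linearity is easy to demonstrate with a toy simulation: averaging the discount factors along many random interest-rate paths gives a noticeably different answer from discounting at the average rate. The random-walk parameters below are arbitrary assumptions, chosen only to illustrate the effect:

```python
# Toy illustration of the point Farmer and Geanakoplos make: with randomly
# varying rates, the average discount factor over many rate paths differs
# from the discount factor at the average rate. Parameters are arbitrary.
import random

def path_discount_factor(years, r0=0.03, sigma=0.005):
    """Discount factor along one path of a randomly-walking real rate."""
    rate, factor = r0, 1.0
    for _ in range(years):
        factor /= (1 + rate)
        rate = max(-0.02, rate + random.gauss(0, sigma))  # rates can go very low, even negative
    return factor

years, runs = 50, 10000
avg_over_paths = sum(path_discount_factor(years) for _ in range(runs)) / runs
print(f"at a fixed 3% rate:         {1 / 1.03 ** years:.3f}")
print(f"averaged over random paths: {avg_over_paths:.3f}")
# The path average is typically noticeably larger: occasional periods of very
# low rates have a disproportionate, non-linear effect on the outcome.
```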

Model

This insight is important for long-term digital preservation, which requires a series of investments through time. As storage media wear out, become obsolete, or become no longer cost-effective, they must be replaced. Applying the interest rate prevailing now to the investment needed in a few years to replace the current hardware, as we would using DCF, is simply wrong. As is projecting the current exponential decrease in disk cost per byte uniformly into the future. Both future interest rates and future hardware prices are uncertain.

We can model this using Monte Carlo simulation. We follow a large number of possible paths through a simulated future of varying interest rates and hardware price/performances chosen at random from distributions aimed at modeling the real world. We then average over these paths to determine the most probable outcome and the range of outcomes for a particular set of assumptions.

Components of the Model

Our model is a Monte Carlo simulation of the economic history of a unit of stored data. It has the following components:
  • Yield Curves, which relate the term of a loan to the real interest rate charged. The yield curve is set each year by choosing a random year from a database of real yield curves going back to 1990, and using that year's rates for inflation-protected Treasuries of various durations.
  • Loans, modeling the amortization of the capital costs of storage. They have a principal amount, an interest rate set from the yield curve plus a margin, a remaining term and an amount outstanding.
  • Assets, representing the remaining money available in an endowment or "Pay Once, Store Endlessly" model. They earn interest determined by the yield curve.
  • Technologies, representing different ways of storing data with different costs and service lives.
The model uses discounted cash flow to compare costs over time, but following the Bank of England research it multiplies the prevailing interest rate by a "short-term-ism" factor and also introduces a planning horizon in years. Investments, irrespective of their service life, have to pay off before the planning horizon. Eventually, the model will include replication policies, but at present it only models a single copy of the unit of data.
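
To make the components concrete, here is a minimal sketch of how they might be represented. The names, fields and rates are illustrative assumptions of mine, not the model's actual code:

```python
# Minimal sketch of the model's components; names, fields and the example
# rates are illustrative assumptions, not the model's actual code.
import random
from dataclasses import dataclass

@dataclass
class YieldCurve:
    """Real interest rate as a function of loan term in years."""
    rates: dict                 # term (years) -> real rate for one past year

    def rate(self, term):
        nearest = min(self.rates, key=lambda t: abs(t - term))
        return self.rates[nearest]

@dataclass
class Loan:
    """Amortizes the capital cost of a storage purchase."""
    principal: float
    rate: float                 # yield-curve rate plus a margin
    term_remaining: int
    outstanding: float

@dataclass
class Asset:
    """Remaining endowment in a 'Pay Once, Store Endlessly' model."""
    balance: float

    def earn_interest(self, rate):
        self.balance *= (1 + rate)

@dataclass
class Technology:
    """One way of storing a unit of data."""
    purchase_cost: float
    running_cost: float         # per year
    move_in_cost: float
    move_out_cost: float
    service_life: int           # years

# Each simulated year the curve is drawn from a random historical year
# (the two example curves here are made up):
historical_curves = {1995: {1: 0.035, 5: 0.040, 10: 0.045, 30: 0.050},
                     2010: {1: 0.002, 5: 0.010, 10: 0.015, 30: 0.020}}
this_year = YieldCurve(random.choice(list(historical_curves.values())))
```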

Technology

To reflect the real world of storage, the model generates new technologies every year, which displace old technologies from the market. Per unit of storage, each technology has a:
  • Purchase Cost. For example, for disk storage this might follow Kryder's Law, decreasing exponentially through time. Or, for cloud storage, this might be zero.
  • Annual Running Cost. For example, for cloud storage this would include the rental charge, bandwidth and compute charges for integrity checks. Or, for local disk storage, it would include power, cooling, rack space, etc.
  • Move-in Cost. For cloud storage, this would include the bandwidth costs of uploading the data.
  • Move-out Cost. Thus the cost of a migration between storage technologies is the sum of the move-out cost of the old technology and the move-in cost of the new.
  • Service Life in years. For local disk, this might be 4 or 5 years.
In addition, if the model decides to deploy a particular technology, the technology includes a loan, called the purchase loan, for the amount of the purchase cost plus the migration cost from the preceding technology. The loan has a term equal to the service life, and an interest rate equal to the model's prevailing rate for loans of that term when the purchase was made.
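
For example, the annual payment on such a purchase loan can be computed with the standard amortization formula; here is a sketch with purely illustrative numbers:

```python
# Annual payment on a purchase loan covering the purchase cost plus the
# migration cost from the preceding technology, amortized over the service
# life at the prevailing rate for that term (standard annuity formula).
# The numbers below are purely illustrative.
def annual_loan_payment(principal, rate, term):
    if rate == 0:
        return principal / term
    return principal * rate / (1 - (1 + rate) ** -term)

purchase_cost = 100.0
migration_cost = 10.0   # move-out of the old technology plus move-in of the new
service_life = 5
rate = 0.02 + 0.01      # yield-curve rate for a 5-year term plus a margin

payment = annual_loan_payment(purchase_cost + migration_cost, rate, service_life)
print(f"annual purchase loan payment: {payment:.2f}")   # ~24.02 for each of 5 years
```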

Operations of the Model

The model operates with real rather than nominal prices and interest rates on an annual cycle. Each year:
  • Interest rates are set.
  • This year's set of available technologies is created by generating a number of new technologies. Technologies already in the set that can't compete with the new ones are replaced.
  • For each deployed technology:
    • If it has reached the end of its service life it is retired and replaced by the available technology with the lowest cost of ownership over the planning horizon.
    • Otherwise, the discounted cost of each of the available technologies is computed over the shorter of the deployed technology's remaining service life and the planning horizon (i.e. the move-in cost plus the running cost plus the purchase loan costs over that period). If the lowest of these costs, plus the purchase loan costs of the deployed technology over the period, is lower than the discounted running costs of the deployed technology over the period, the deployed technology is replaced by the available technology with the lowest cost (see the sketch after this list).
  • The running costs and the purchase loan costs of the deployed technology are paid, together with the purchase loan costs of any outstanding loans from prematurely retired technologies.
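
Here is the sketch referred to above: a compressed, runnable paraphrase of the yearly keep-or-replace decision. For brevity it uses a flat discount rate and straight-line loan payments, and folds migration costs into the purchase loan as described earlier; the numbers are illustrative and this is not the model's actual code:

```python
# Sketch of the yearly keep-or-replace decision for a deployed technology
# that still has service life left. Flat discount rate and straight-line
# loan payments for brevity; illustrative numbers; not the model's code.
from dataclasses import dataclass

@dataclass
class Tech:
    purchase: float        # purchase cost
    running: float         # annual running cost
    move_in: float
    move_out: float
    life: int              # service life in years

def discounted(annual_cost, years, rate):
    """Present value of a constant annual cost paid for `years` years."""
    return sum(annual_cost / (1 + rate) ** t for t in range(1, years + 1))

def candidate_cost(cand, deployed, years, rate):
    """Discounted running plus purchase-loan costs of a candidate over `years`.
    The loan covers purchase plus migration costs, repaid straight-line."""
    loan_payment = (cand.purchase + deployed.move_out + cand.move_in) / cand.life
    return discounted(cand.running + loan_payment, years, rate)

def keep_or_replace(deployed, deployed_loan_payment, age, candidates, rate, horizon):
    window = min(deployed.life - age, horizon)
    best = min(candidates, key=lambda c: candidate_cost(c, deployed, window, rate))
    switch = (candidate_cost(best, deployed, window, rate)
              + discounted(deployed_loan_payment, window, rate))  # old loan still owed
    keep = discounted(deployed.running, window, rate)
    return best if switch < keep else deployed

disk = Tech(purchase=100, running=40, move_in=5, move_out=5, life=4)
dawn = Tech(purchase=250, running=5, move_in=5, move_out=5, life=12)
choice = keep_or_replace(disk, deployed_loan_payment=25, age=1,
                         candidates=[dawn], rate=0.03, horizon=5)
print("replace" if choice is not disk else "keep")   # "keep" with these made-up numbers
```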

An Example of Using The Model

As a simple example of the possible uses for the model, we assume that Kryder's Law applies strictly, so that the cost per byte of otherwise identical disks drops exponentially, and interest rates vary as they have in the past. Then we ask how much money it takes to endow a unit of data to be stored "forever" (actually, 100 years). What the model can produce, after 7,500 runs, is this graph relating the size of the endowment to both the probability of surviving 100 years, and the most probable age at which the runs that ran out of money did so.
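
The flavor of that calculation can be conveyed by a much-simplified, runnable sketch: endow a unit of data, let hardware prices fall on a Kryder-like curve, draw real interest rates at random each year, and count how often the endowment lasts 100 years. All the parameters are illustrative assumptions of mine, not those behind the graph:

```python
# Much-simplified sketch of the endowment question: how much money, invested
# now, keeps one unit of data stored for 100 years if hardware prices fall on
# a Kryder-like curve and real interest rates vary randomly? All parameters
# are illustrative assumptions, not those behind the graph in the talk.
import random

def survives(endowment, years=100, kryder=0.25, life=4,
             price0=100.0, running_fraction=0.3):
    """One Monte Carlo run: True if the endowment lasts `years` years."""
    balance, price = endowment, price0
    for year in range(years):
        rate = random.uniform(-0.01, 0.05)    # crude stand-in for historical real rates
        balance *= (1 + rate)                 # the endowment earns interest
        if year % life == 0:
            balance -= price                  # buy replacement hardware
        balance -= running_fraction * price   # running costs, proportional to price
        price *= (1 - kryder)                 # Kryder's Law price decline
        if balance < 0:
            return False
    return True

def survival_probability(endowment, runs=2000):
    return sum(survives(endowment) for _ in range(runs)) / runs

for endowment in (200, 400, 600, 800):
    print(endowment, survival_probability(endowment))
```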

Other Uses for the Model

There are many other questions we can investigate using this model. For example:
  • What is the effect of the "short-term-ism" factor and the planning horizon on the total cost of data ownership?
  • What is the effect on the endowment needed for a unit of data of pauses of varying lengths in the Kryder's Law decrease in cost per byte?
  • How much more expensive can a DAWN architecture system be than a conventional hard disk system and still be affordable?
  • How do conventional local hard disk technologies compare with cloud storage, which has effectively zero purchase cost but significant running costs?

Feedback

What I'd like now is feedback on how realistic and how useful you think this model is, and how it might be improved.

10 comments:

David. said...

Researchers at Purdue just announced a solid-state memory technology that Kryder & Kim (obviously) didn't evaluate. It is a variant of FRAM, which they did evaluate, called FeTRAM. FRAM stores a bit in a capacitor, and has destructive read-out; FeTRAM stores a bit in a transistor and has non-destructive read-out. My amateur evaluation is that this may score better than FRAM, but probably not enough to change the overall conclusion.

Bryan Beecher said...

David, I thought you did a pretty nice job of summarizing the model in the ten minute slot.

I think a model would be a very valuable tool, and I'd like to encourage you to continue to refine it.

As I mentioned at the LC meeting, I think it would be useful if there was some way to capture the people costs too. I know at ICPSR we spend something like $5-$10 on people for every $1 we spend on technology.

(High-level summary of the meeting is here if anyone is looking for additional context.)

David. said...

I deliberately titled the talk "Modeling the Economics of Long-Term Storage" rather than "Modeling the Economics of Long-Term Preservation" because I believe that the economics of preservation are even more complex than those of storage. There's more work than I can do just in modeling storage.

I believe that staff costs do form a major part of preservation costs. Do they form a major part of your storage costs?

That being said, if in a particular case staff costs are significant, you could fold them into the running and migration costs in the model. The model is intended for "what if?" use, so I expect people to adjust its parameters.

Bryan Beecher said...

OK, I understand the distinction.

My sense, then, is that people are not *the* major cost in *storage*. For example, it sounds like your intent is not to try to capture costs in preparing the content for ingest, and that is certainly where a lot of ICPSR's people costs reside.

If I try to think about people costs in the narrow context of storage, then I have to ask where I draw the line. For instance, if a sysadmin-type writes a fixity checking tool and reviews the results periodically, is that preservation or storage? The person who migrates the content from storage array technology A to storage array technology B? The person who migrates the content from repository software technology A to repository software technology B? And so on.

Maybe any storage solution of any size needs a fixed amount of people, and then scales (for some function) with the quantity of storage thereafter? And the trick for the storage manager is to select technologies and architectures where the function grows slowly?

So perhaps the punchline is indeed to fold these into the operational and migration costs as you've suggested.

Thanks again for the interesting model and talk.

David. said...

The slides for this talk are here.

David. said...

Any true believers in Kryder's Law out there, for whom disk drive price drops are a law of nature, should take note of this comment from the sales director of a major disk distributor (my emphasis):

"There is no volume discount of any kind, inventories are tight and prices are rising,"

The reason is that historic floods in Thailand are submerging disk factories and their suppliers.

David. said...

I guess this is an awful warning against messing with economic models.

David. said...

Presentations from this meeting are now on line.

David. said...

Wow! The effects of the floods in Thailand have been severe enough that, according to The Register, the price per GB of enterprise disk is now higher than enterprise SSDs.

David. said...

Thanks to Norman Gray for pointing out that the correct link to the slides is now this and not the link in the comment above.