Thursday, November 17, 2011

Progress on the Economic Model of Storage

I've been working more on the economic model of long-term storage. As an exercise, I tried to model the effect of the current floods in Thailand on the long-term cost of storage on disk. The more I work on this model, the more complex the whole problem of predicting the cost of long-term storage becomes. This time, what emerged is that, despite my skepticism about Kryder's Law, in a totally non-obvious way I had wired into the model the assumption that disk prices could never rise! So when I tried to model the current rise in disk prices, things went very wrong. Until I get this fixed, the best I can do is to model a pause of a varying number of years before disk prices resume their Kryder's Law decrease.

For this simulation, I assume that interest rates reflect the history of the last 20 years, that the service life of disks is 4 years, that the planning horizon is 7 years, that the disk cost is 2/3 of the 3-year cost of ownership, and that the initial cost of the unit of storage is $100. The graph plots the endowment required to have a 98% probability of surviving 100 years (z-axis) against the length of the initial pause in disk cost decrease in years (y-axis), and the percentage annual decrease in disk cost thereafter (x-axis).

As expected, the faster the disk price drops and the shorter the pause before it does, the lower the endowment needed. In this simulation the endowment needed ranges from 4.2 to 17.6 times the initial cost of storage, but these numbers should be taken with a grain of salt. It is early days and the model has many known deficiencies.
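
To make the shape of the calculation concrete, here is a minimal sketch in Python. It is not the model behind the graph, only an illustration of a pause followed by a steady price decline, with the endowment read off as a percentile over random interest-rate paths. Everything beyond the figures quoted above is an assumption for illustration: the function names, the normally distributed interest rates (`mean_rate`, `rate_sd`), the flat yearly running cost, and the choice to ignore the planning horizon entirely.

```python
import random


def disk_price(year, initial_price, pause_years, annual_decrease):
    """Price of a disk bought in `year`: flat during the pause, then a
    fixed fractional decrease per year (a Kryder's-Law-style decline)."""
    declining_years = max(0, year - pause_years)
    return initial_price * (1.0 - annual_decrease) ** declining_years


def required_endowment(pause_years, annual_decrease, *,
                       initial_cost=100.0, disk_fraction=2 / 3,
                       service_life=4, horizon=100, survival=0.98,
                       trials=2000, mean_rate=0.04, rate_sd=0.02, seed=0):
    """Endowment that covers all costs over `horizon` years in a
    fraction `survival` of simulated interest-rate paths."""
    rng = random.Random(seed)
    disk0 = initial_cost * disk_fraction
    # Assumption: non-disk costs are a flat amount per year.
    running_per_year = initial_cost * (1.0 - disk_fraction) / service_life

    needed = []
    for _ in range(trials):
        discount = 1.0  # value today of $1 spent in the current year
        present_value = 0.0
        for year in range(horizon):
            cost = running_per_year
            if year % service_life == 0:  # replace the disk
                cost += disk_price(year, disk0, pause_years, annual_decrease)
            present_value += cost * discount
            # Assumption: yearly rates are independent normal draws,
            # clamped so the fund can never lose more than half in a year.
            rate = max(rng.gauss(mean_rate, rate_sd), -0.5)
            discount /= 1.0 + rate
        # The discounted total is the smallest endowment that survives
        # this particular path.
        needed.append(present_value)

    needed.sort()
    return needed[min(trials - 1, int(survival * trials))]


if __name__ == "__main__":
    for pause in (0, 2, 5):
        for decrease in (0.1, 0.2, 0.3):
            endowment = required_endowment(pause, decrease)
            print(pause, decrease, round(endowment / 100.0, 2))
```

Because each path's discounted cost is exactly the smallest endowment that survives that path, the 98th percentile across paths gives the endowment with a 98% chance of lasting the full period. The numbers this sketch prints will not match the range quoted above, since its interest-rate and running-cost assumptions are invented.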

3 comments:

Henry Newman said...

David,

I have been looking at the cost of BaFe tapes from the 2 vendors and have found the cost to be lower than LTO-5 when considering factors such as library slot costs and pick and load time. Given the hard error rate differences between tape and disk (both enterprise and consumer), could your model account for reliability as a function of long-term cost and use the hard error rate as part of the function? I think that users of large archives need to be able to define the 9s for integrity as part of the model. Just my opinion. Best regards as always
Henry

David. said...

Henry, you're asking the model to run long before it can walk. There is a vast amount of work to do to make the economic model realistic enough for the real world before trying to connect it to models of storage reliability which, as I have pointed out at length, themselves need vast amounts of work before they are realistic enough for the real world.

David. said...

Also, Henry, do you have any evidence showing that media hard error rates are a significant factor in overall storage system reliability in the field, as compared to all the other possible causes of data loss?

The research of which I'm aware shows (a) that vastly more errors than would be expected from the media error rates are detected, and (b) that the root cause of about half of them is traced to components other than the drives. This makes media error rates pretty useless as a predictor of overall system reliability.