Friday, November 30, 2012

Updating "Cloud vs. Local Storage Costs"

A number of things have changed since I wrote my "Cloud vs. Local Storage Costs" post in June that impact the results:
  • Amazon reduced S3 prices somewhat. As of December 1st, in response to a cut by Google, there will be a further cut.
  • Disk prices have continued their slow recovery from the Thai floods.
  • 4TB SATA drives are in stores now, albeit at high prices.
  • Michael Factor pointed out that I hadn't correctly accounted for RAID overhead in my original calculation.
Below the fold is some discussion of the Amazon vs. Google price war, and a recalculated graph.

Thursday, November 15, 2012

Bandwidth costs for Cloud Storage

Our analysis of the costs of cloud storage assumes that the only charges for bandwidth are those levied by the cloud storage service itself. Typically services charge only for data out of the cloud. From our privileged viewpoint at major universities, this is a natural assumption to make.

At The Register, Trevor Pott looks at the costs of backing up data to the cloud from a more realistic viewpoint. He computes the cost and time involved for customers who have to buy their Internet bandwidth on the open market. He concludes that for small users cloud backup makes sense:
I can state with confidence that if you have already have a business ADSL with 2.5Mbps upstream and at least a 200GB per month transfer limit (not hard to find in urban areas in most developed nations) then cloud storage for anything below 100GB per month will make sense. The convenience and reliability are easily worth the marginal cost.
For his example large user at 15TB/mo with a 100Mbit fiber connection, the bandwidth costs from the ISP are double the storage charges from Amazon, for a total of $4,374. And recovery from the backups would cost about as much as a month's backup, and would take a month to boot. That simply isn't viable when compared to his local solution:
The 4TB 7200 RPM Hitachi Deskstar sells for $329 at my local computer retailer. Five of these drives (for RAID 5) is $1,645; a Synology DS1512+ costs $899. A 10x10 storage unit is $233/month, and the delivery guy costs me $33 per run. So for me to back up 15TB off-site each month is $2,800 per month.
Of course, in many cases libraries and archives are part of large institutions and their bandwidth charges are buried in overhead. And the bandwidth usage of preservation isn't comparable to backup; the rate at which data is written is limited by the rate at which the archive can ingest content. On the whole, I believe it is reasonable for our models to ignore ISP charges, but Trevor's article is a reminder that this isn't a no-brainer.
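The figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1TB = 10^12 bytes), a link running at its full nominal rate unless an efficiency factor is supplied, and a single delivery run in the month. The function name transfer_days and the efficiency parameter are mine, not Trevor's.

```python
SECONDS_PER_DAY = 86_400
GB = 1e9   # assumption: decimal gigabytes
TB = 1e12  # assumption: decimal terabytes


def transfer_days(data_bytes: float, link_bits_per_sec: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move data_bytes over a link that sustains
    `efficiency` (0 < efficiency <= 1) of its nominal bit rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_bytes * 8
    return bits / (link_bits_per_sec * efficiency) / SECONDS_PER_DAY


# Small user: 100GB per month over 2.5Mbps ADSL upstream.
small_days = transfer_days(100 * GB, 2.5e6)

# Large user: 15TB per month over a 100Mbit fiber connection.
large_days = transfer_days(15 * TB, 100e6)

# Local alternative: the dollar amounts quoted from the article.
local_items = {
    "five 4TB drives": 1645,
    "Synology DS1512+": 899,
    "storage unit (one month)": 233,
    "delivery (one run, assumed)": 33,
}
local_total = sum(local_items.values())

print(f"100GB at 2.5Mbps: {small_days:.1f} days at full line rate")
print(f"15TB at 100Mbps:  {large_days:.1f} days at full line rate")
print(f"Local solution, first month: ${local_total:,}")
```

The transfer times are lower bounds, since no real link sustains its nominal rate for days on end; passing an efficiency below 1.0 lengthens them accordingly. The local tally comes to $2,810, in line with the roughly $2,800 quoted.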

Monday, November 12, 2012

Bits per square inch vs. Dollars per GB

A valid criticism of my blog posts on the economics of long-term storage, and of our UNESCO paper (PDF), is that we conflate Kryder's Law, which describes the increase in the areal density of bits on disk platters, with the cost of disk storage in $/GB. We waved our hands and said that the density increase mapped roughly one-for-one into a decrease in the cost of disk storage. We are not alone in using this approximation; Mark Kryder himself does (PDF):
Density is viewed as the most important factor ... because it relates directly to cost/GB and in the HDD marketplace, cost/GB has always been substantially more important than other performance parameters. To compare cost/GB, the approach used here was to assume that, to first order, cost/GB would scale in proportion to (density)⁻¹
My co-author Daniel Rosenthal has investigated the relationship between bits/in² and $/GB over the last couple of decades. Over that time, it appears that about 3/4 of the decrease in $/GB can be attributed to the increase in bits/in². Where did the rest of the decrease come from? I can think of three possible causes:
  • Economies of scale. For most of the last two decades the unit shipments of drives have been increasing, resulting in lower fixed costs per drive. Unfortunately, unit shipments are currently declining, so this effect has gone into reverse.
  • Manufacturing technology. The technology to build drives has improved greatly over the last couple of decades, resulting in lower variable costs per drive. Unfortunately, HAMR, the next generation of disk drive technology, has proven to be extraordinarily hard to manufacture, so this effect has gone into reverse.
  • Vendor margins. Over the last couple of decades disk drive manufacturing was a very competitive business, with numerous competing vendors. This gradually drove margins down and caused the industry to consolidate. Before the Thai floods, there were only two major manufacturers left, with margins in the low single digits. Unfortunately, the lack of competition and the floods have led to a major increase in margins, so this effect has gone into reverse.
Thus it seems unlikely that, at least in the medium term, causes other than Kryder's Law will contribute significantly to reductions in $/GB. They may even contribute to increases. We have already seen that even the industry projections have Kryder's Law slowing significantly, to no more than 20% per year for the next 5 years.
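To get a feel for what that slowdown means, here is a minimal sketch that applies Kryder's first-order approximation, cost/GB proportional to (density)^-1, over a 5-year horizon. It reads the 20% as an annual rate of areal density growth, which is my assumption, and the 40% rate is a hypothetical faster rate included purely for comparison. The function name cost_multiplier is illustrative.

```python
def cost_multiplier(density_growth_per_year: float, years: int) -> float:
    """Relative $/GB after `years`, assuming cost/GB scales as (density)^-1
    and areal density compounds at the given annual rate."""
    return (1 + density_growth_per_year) ** -years


# 0.20: the ceiling mentioned above, read as an annual rate (assumption).
# 0.40: a hypothetical faster rate, for comparison only.
for rate in (0.20, 0.40):
    m = cost_multiplier(rate, years=5)
    print(f"{rate:.0%}/yr density growth: $/GB after 5 years "
          f"is {m:.2f}x today's")
```

Under these assumptions, five years at 20% leaves $/GB at about 0.40 of today's value, whereas the hypothetical 40% rate would bring it down to about 0.19. The sketch ignores the other three factors listed above, which, as argued, are more likely to push costs up than down.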

Thursday, November 8, 2012

Format Obsolescence In The Wild?

The Register has a report that, at a glance, looks like one of the long-sought instances of format obsolescence in the wild:
Andrew Brown asked to see the echocardiogram of his ticker, which was taken eight years ago. He was told that although the scan is still on file in the Worcestershire Royal hospital, it will cost a couple of grand to recreate the data as an image because it is stored in a format that can no longer be read by the hospital's computers.
But a closer look, below the fold, shows that it isn't so simple.