Tuesday, December 11, 2012

Talk at Fall 2012 CNI

I gave a talk at CNI's Fall 2012 Membership Meeting entitled The Truth Is Out There: Preservation and the Cloud. It was an updated and shortened version of a seminar I had given in mid-November at UC Berkeley's School of Information. Below the fold is an edited text with links to the resources.

Monday, December 10, 2012

Sharing makes Glacier economics even better

Analyzing the economics of Glacier sharing S3's infrastructure in more detail than I did here makes the picture look even better from Amazon's point of view. The point I missed is that the infrastructure is shared. Follow me below the fold for the details.

Friday, December 7, 2012

Nostalgia

Google has a nice post with a short video commemorating today's 50th birthday of the Ferranti Atlas, the UK's first "supercomputer". Although it wasn't the first computer I programmed, Cambridge University's Atlas 2 prototype, called Titan, was the machine I really learned programming on, starting in 1968. It was in production from 1966 to 1973 with a time-sharing operating system using Teletype KSR33 terminals, a device-independent file system and many other ground-breaking features. I got access to it late at night as an undergraduate member of the Archimedeans, the University Mathematical Society. I wrote programs in machine code (as I recall there was no mnemonic assembler; you had to remember the numeric op-codes), in Atlas Autocode, and in BCPL. Best of all, it was attached to a PDP-7 with a DEC 340 display, which a friend and I programmed to play games.

Thursday, December 6, 2012

Updating "More on Glacier Pricing"

In September I posted More on Glacier Pricing, including a comparison with our baseline local storage model. Last week I posted Updating "Cloud vs. Local Storage Costs", which among other things updated and corrected the baseline local storage model. Thus I needed to update the comparison with Glacier too. Below the fold is this updated comparison, together with a back-of-the-envelope calculation to support the claim I've been making that although Glacier may look like tape, it could just be using S3's disk storage infrastructure.

Friday, November 30, 2012

Updating "Cloud vs. Local Storage Costs"

A number of things have changed since I wrote my "Cloud vs. Local Storage Costs" post in June that impact the results:
  • Amazon reduced S3 prices somewhat. As of December 1st, in response to a cut by Google, there will be a further cut.
  • Disk prices have continued their slow recovery from the Thai floods.
  • 4TB SATA drives are in stores now, albeit at high prices.
  • Michael Factor pointed out that I hadn't correctly accounted for RAID overhead in my original calculation.
Below the fold is some discussion of the Amazon vs. Google price war, and a recalculated graph.
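The RAID correction matters because parity drives add cost without adding usable capacity, so the cost per usable GB is higher than the cost per raw GB. The sketch below is not the model's actual calculation; it assumes a single RAID group of identical drives, some of which hold parity (one for RAID 5, two for RAID 6), ignores filesystem overhead and hot spares, and borrows the $329 4TB drive price quoted further down this page purely as an illustrative input. The function names are hypothetical.

```python
# Sketch: how RAID parity raises the cost per usable TB.
# Assumptions: one RAID group of identical drives; `parity_drives` of them
# hold parity (1 for RAID 5, 2 for RAID 6); filesystem overhead and hot
# spares are ignored. Prices and sizes are illustrative parameters.


def usable_tb(n_drives: int, drive_tb: float, parity_drives: int) -> float:
    """Usable capacity of the group: only non-parity drives hold data."""
    if n_drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (n_drives - parity_drives) * drive_tb


def cost_per_usable_tb(n_drives: int, drive_tb: float,
                       drive_price: float, parity_drives: int) -> float:
    """Total drive spend divided by usable (not raw) capacity."""
    return n_drives * drive_price / usable_tb(n_drives, drive_tb, parity_drives)


if __name__ == "__main__":
    raw = 329 / 4  # $/TB for a $329 4TB drive, ignoring RAID entirely
    for label, parity in (("RAID 5", 1), ("RAID 6", 2)):
        usable = cost_per_usable_tb(5, 4, 329, parity)
        print(f"{label}: ${usable:.2f}/TB usable, {usable / raw:.2f}x raw")
```

With five drives, RAID 5 inflates the per-TB cost by a factor of 5/4 and RAID 6 by 5/3; ignoring that factor makes local storage look cheaper than it is.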

Thursday, November 15, 2012

Bandwidth costs for Cloud Storage

Our analysis of the costs of cloud storage assumes that the only charges for bandwidth are those levied by the cloud storage service itself. Typically services charge only for data out of the cloud. From our privileged viewpoint at major universities, this is a natural assumption to make.

At The Register Trevor Pott looks at the costs of backing up data to the cloud from a more realistic viewpoint. He computes the cost and time involved for customers who have to buy their Internet bandwidth on the open market. He concludes that for small users cloud backup makes sense:
I can state with confidence that if you have already have a business ADSL with 2.5Mbps upstream and at least a 200GB per month transfer limit (not hard to find in urban areas in most developed nations) then cloud storage for anything below 100GB per month will make sense. The convenience and reliability are easily worth the marginal cost.
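As a rough check on the quoted numbers, the sketch below computes how long a month's 100GB takes to push through a 2.5Mbps uplink. It assumes decimal units, a fully saturated link and no protocol overhead, so the result is a lower bound; the function name transfer_days is hypothetical and the calculation is not taken from the article.

```python
# Sketch: lower bound on the time to move data over a link.
# Assumptions: 1 GB = 10**9 bytes, 1 Mbps = 10**6 bit/s, the link is
# saturated for the whole transfer and protocol overhead is ignored.

SECONDS_PER_DAY = 86_400


def transfer_days(gigabytes: float, link_mbps: float) -> float:
    """Days needed to move `gigabytes` over a `link_mbps` megabit/s link."""
    bits = gigabytes * 1e9 * 8
    return bits / (link_mbps * 1e6) / SECONDS_PER_DAY


if __name__ == "__main__":
    # The quote's small-user case: 100GB/month over 2.5Mbps upstream.
    print(f"{transfer_days(100, 2.5):.1f} days per month of uploads")
```

About 3.7 days of saturated uplink per month leaves plenty of slack, which fits the quote's conclusion that small users can comfortably back up to the cloud.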
For his example large user at 15TB/mo with a 100Mbit fiber connection, the bandwidth costs from the ISP are double the storage charges from Amazon, for a total of $4,374. And recovery from the backups would cost about as much as a month's backup, and would take a month to boot. That simply isn't viable when compared to his local solution:
The 4TB 7200 RPM Hitachi Deskstar sells for $329 at my local computer retailer. Five of these drives (for RAID 5) is $1,645; a Synology DS1512+ costs $899. A 10x10 storage unit is $233/month, and the delivery guy costs me $33 per run. So for me to back up 15TB off-site each month is $2,800 per month.
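Re-adding the quoted figures makes the comparison explicit. This is a sketch under our own reading of the quotes: each month's 15TB goes onto a fresh set of five drives in a new NAS, plus one month of storage-unit rent and one delivery run, and "double" is taken literally when splitting the $4,374 cloud total between Amazon and the ISP.

```python
# Sketch: re-add the monthly figures quoted above and compare them with
# the cloud total. Assumptions (our reading of the quotes): each month's
# 15TB goes onto a fresh set of five drives in a new NAS, plus one month
# of storage-unit rent and one delivery run; "double" is taken literally
# when splitting the cloud total.

DRIVE_PRICE = 329     # 4TB 7200 RPM Hitachi Deskstar
DRIVES_PER_SET = 5    # RAID 5 set
NAS_PRICE = 899       # Synology DS1512+
STORAGE_UNIT = 233    # 10x10 storage unit rent, per month
DELIVERY = 33         # one delivery run per month
CLOUD_TOTAL = 4374    # ISP bandwidth + Amazon storage for 15TB/month

local_monthly = DRIVE_PRICE * DRIVES_PER_SET + NAS_PRICE + STORAGE_UNIT + DELIVERY

# ISP charges are double Amazon's, so Amazon is one third of the total.
amazon_monthly = CLOUD_TOTAL / 3
isp_monthly = CLOUD_TOTAL - amazon_monthly

print(f"local ${local_monthly:,}/mo vs cloud ${CLOUD_TOTAL:,}/mo "
      f"(Amazon ${amazon_monthly:,.0f} + ISP ${isp_monthly:,.0f})")
```

The exact sum is $2,810, which the quote rounds to $2,800; either way the local option costs about 64% of the cloud total, before counting the cost and duration of a cloud restore.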
Of course, in many cases libraries and archives are part of large institutions and their bandwidth charges are buried in overhead. And the bandwidth usage of preservation isn't comparable to backup; the rate at which data is written is limited by the rate at which the archive can ingest content. On the whole, I believe it is reasonable for our models to ignore ISP charges, but Trevor's article is a reminder that this isn't a no-brainer.

Monday, November 12, 2012

Bits per square inch vs. Dollars per GB

A valid criticism of my blog posts on the economics of long-term storage, and of our UNESCO paper (PDF), is that we conflate Kryder's Law, which describes the increase in the areal density of bits on disk platters, with the cost of disk storage in $/GB. We waved our hands and said that it mapped roughly one-for-one into a decrease in the cost per GB of disk storage. We are not alone in using this approximation; Mark Kryder himself does (PDF):
Density is viewed as the most important factor ... because it relates directly to cost/GB and in the HDD marketplace, cost/GB has always been substantially more important than other performance parameters. To compare cost/GB, the approach used here was to assume that, to first order, cost/GB would scale in proportion to (density)⁻¹
My co-author Daniel Rosenthal has investigated the relationship between bits/in² and $/GB over the last couple of decades. Over that time, it appears that about 3/4 of the decrease in $/GB can be attributed to the increase in bits/in². Where did the rest of the decrease come from? I can think of three possible causes:
  • Economies of scale. For most of the last two decades the unit shipments of drives have been increasing, resulting in lower fixed costs per drive. Unfortunately, unit shipments are currently declining, so this effect has gone into reverse.
  • Manufacturing technology. The technology to build drives has improved greatly over the last couple of decades, resulting in lower variable costs per drive. Unfortunately HAMR, the next generation of disk drive technology, has proven to be extraordinarily hard to manufacture, so this effect has gone into reverse.
  • Vendor margins. Over the last couple of decades disk drive manufacturing was a very competitive business, with numerous competing vendors. This gradually drove margins down and caused the industry to consolidate. Before the Thai floods, there were only two major manufacturers left, with margins in the low single digits. Unfortunately, the lack of competition and the floods have led to a major increase in margins, so this effect has gone into reverse.
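The post does not say how the 3/4 figure was computed. As a hedged illustration only, one natural reading is a log-scale split: if $/GB scales as 1/density (Kryder's first-order rule quoted above), the share of the decline due to density is the ratio of the two logarithms, and the leftover factor is what the three causes above would have to supply. The function names and the input factors below are hypothetical, chosen only to produce a 3/4 share; they are not Daniel's data.

```python
import math

# Sketch: one possible way to attribute a fall in $/GB to areal density.
# Assumptions: attribution is done on a log scale, $/GB is taken to scale
# as 1/density, and every other cause is lumped into a residual factor.


def density_share(cost_drop: float, density_gain: float) -> float:
    """Fraction of the log-decline in $/GB explained by density growth.

    cost_drop:    old $/GB divided by new $/GB (e.g. 100.0 for a 100x fall)
    density_gain: new bits/in^2 divided by old bits/in^2
    """
    return math.log(density_gain) / math.log(cost_drop)


def residual_drop(cost_drop: float, density_gain: float) -> float:
    """Part of the $/GB fall left for other causes (scale, manufacturing, margins)."""
    return cost_drop / density_gain


if __name__ == "__main__":
    # Hypothetical factors, NOT measured data: a 10,000x fall in $/GB
    # alongside a 1,000x rise in areal density.
    print(round(density_share(10_000, 1_000), 2), residual_drop(10_000, 1_000))
```

Under this reading, a 3/4 share means the other causes contributed a factor equal to the fourth root of the total fall in $/GB; if those causes reverse, that factor shrinks or turns into an increase.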
Thus it seems unlikely that, at least in the medium term, causes other than Kryder's Law will contribute significantly to reductions in $/GB. They may even contribute to increases. We have already seen that even the industry projections have Kryder's Law slowing significantly, to no more than 20% per year for the next 5 years.
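To see what that projection implies, the sketch below compounds a 20% annual rate over 5 years, assuming density is the only remaining driver of $/GB. Reading the 20% as an annual rate is our assumption, and because the text does not say whether it describes growth in bits/in² or the fall in $/GB itself, both readings are shown. The function names are hypothetical.

```python
# Sketch: what a 20%/yr rate over 5 years implies for $/GB.
# Assumptions: density gains are the only remaining driver of $/GB, and
# the 20% is an annual rate. Both readings of that rate are computed.


def cost_fraction_from_density(growth: float, years: int) -> float:
    """Remaining fraction of today's $/GB if bits/in^2 grows by `growth`
    per year and $/GB scales as 1/density."""
    return (1 + growth) ** -years


def cost_fraction_from_decline(decline: float, years: int) -> float:
    """Remaining fraction of today's $/GB if $/GB itself falls by
    `decline` per year."""
    return (1 - decline) ** years


if __name__ == "__main__":
    print(f"density +20%/yr: {cost_fraction_from_density(0.20, 5):.2f} of today's $/GB")
    print(f"$/GB -20%/yr:    {cost_fraction_from_decline(0.20, 5):.2f} of today's $/GB")
```

Since the projection is "no more than 20%", these are best cases: in 5 years a GB would still cost at least about 33% to 40% of what it costs today.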