Sunday, January 20, 2008

How Hard Is "A Petabyte for a Century"?

In a comment on my "Petabyte for a Century" post, Chris Rusbridge argues that the Petabyte for a Century challenge is not a big deal: his machine holds 100GB of data, and he expects much higher reliability than a 50% chance of it surviving undamaged for a year, which would correspond to a bit half-life of the order of 100 times the age of the universe.

It is true that disk, tape and other media are remarkably reliable, and that we can not only construct systems with a bit half-life of the order of 100 times the age of the universe, but also conduct experiments that show that we have done so. Watching a terabyte of data for a year is clearly a feasible experiment, and at a bit half-life of 100 times the age of the universe one would expect to see 5 bit flips.
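The arithmetic behind these figures can be sketched as follows. This is a minimal sketch, assuming each bit flips independently with a fixed half-life, decimal units (1GB = 10^9 bytes), and a round figure of 1.38e10 years for the age of the universe; the function names and the constant are illustrative choices, not taken from the original post.

```python
import math

# Assumed round figure for the age of the universe, in years.
AGE_OF_UNIVERSE_YEARS = 1.38e10

GB, TB, PB = 1e9, 1e12, 1e15  # decimal byte units (an assumption)


def half_life_for_even_odds(n_bytes, years):
    """Bit half-life (in years) at which n_bytes has a 50% chance of
    surviving `years` with no bit flipped, assuming independent decay."""
    n_bits = 8 * n_bytes
    # P(all bits survive) = 2 ** (-n_bits * years / T) = 0.5
    # which gives T = n_bits * years.
    return n_bits * years


def expected_flips(n_bytes, years, half_life_years):
    """Expected number of flipped bits among n_bytes after `years`."""
    n_bits = 8 * n_bytes
    # Probability one bit flips: 1 - 2 ** (-years / T).
    # expm1 keeps precision when that probability is tiny.
    p_flip = -math.expm1(-math.log(2) * years / half_life_years)
    return n_bits * p_flip


chris = half_life_for_even_odds(100 * GB, 1)      # 100GB for a year
challenge = half_life_for_even_odds(1 * PB, 100)  # a petabyte for a century
flips = expected_flips(1 * TB, 1, 100 * AGE_OF_UNIVERSE_YEARS)

print(f"100GB for a year: {chris / AGE_OF_UNIVERSE_YEARS:.0f}x the age of the universe")
print(f"1PB for a century: {challenge / AGE_OF_UNIVERSE_YEARS:.2e}x the age of the universe")
print(f"expected flips in 1TB over a year: {flips:.1f}")
```

Under these assumptions the 100GB case comes out at a few tens of times the age of the universe, and the terabyte experiment at roughly 4 expected flips, the same order of magnitude as the figures quoted above.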

Nevertheless, it is important to note that this is an experiment very few people actually do. Does Chris maintain checksums of every bit of his 100GB? Does he check them regularly? How certain is he that at the end of the year every single bit is the same as it was at the start? I suspect Chris assumes that, because he has 100GB of data, most of it is over a year old and he hasn't noticed anything bad, the problem isn't that hard. Even if all these assumptions were correct, the petabyte for a century problem is one million times harder: 10,000 times as much data kept for 100 times as long. Chris' argument amounts to saying "I have a problem one-millionth the size of the big one, and because I haven't looked very carefully I believe that it is solved. So the big problem isn't scary after all."
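For readers wondering what "maintaining checksums and checking them regularly" might look like in practice, here is a minimal sketch. It assumes SHA-256 digests kept in an in-memory dictionary; the function names `build_manifest` and `audit` are hypothetical, and a real system would also have to store the manifest itself somewhere trustworthy.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its current digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def audit(root, manifest):
    """Re-read every file in the manifest and report any that are
    missing or whose digest no longer matches."""
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        path = os.path.join(root, rel_path)
        if not os.path.isfile(path):
            problems.append((rel_path, "missing"))
        elif sha256_of(path) != expected:
            problems.append((rel_path, "changed"))
    return problems
```

Calling `build_manifest` once at the start of the year and `audit` at intervals would reveal any file whose bits have changed. It cannot by itself tell silent corruption from a deliberate edit, so it is best suited to data that is not supposed to change.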

The few people who have actually measured silent data corruption in large operational data storage systems have reported depressing results. For example, the excellent work at CERN described in a paper (pdf) and summarized at StorageMojo showed that the error rate delivered to applications from a state-of-the-art storage farm is of the order of ten million times worse than the quoted bit error rate of the disks it uses.

We know that assembling large numbers of components into a system normally results in a system much less reliable than the components. And we have evidence from CERN, Google and elsewhere (pdf) that this is what actually happens when you assemble large numbers of disks, controllers, busses, memories and CPUs into a storage system. And we know that these systems contain large amounts of software which contains large (pdf) amounts (pdf) of bugs. And we know that it is economically and logistically impossible to do the experiments that would be needed to certify a system as delivering a bit error rate low enough to provide a 50% probability of keeping a petabyte uncorrupted for a century.
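One way to put a rough number on the size of the experiment is sketched below. It is an illustration under stated assumptions, not a calculation from the original post: bit flips are modelled as independent events at a constant rate, the test is a simple zero-failure test (watch the data, see no flips at all), and the 95% confidence level and the function name are arbitrary choices.

```python
import math

PB = 1e15  # decimal petabyte, in bytes (an assumption)


def bit_years_to_certify(n_bytes, years, confidence=0.95):
    """Bit-years of error-free observation needed before a flip rate
    worse than the target can be ruled out at the given confidence."""
    n_bits = 8 * n_bytes
    # Bit half-life that gives a 50% chance of n_bytes surviving `years`.
    target_half_life = n_bits * years
    # Corresponding flip rate, in flips per bit-year.
    max_rate = math.log(2) / target_half_life
    # Seeing zero flips in E bit-years has probability exp(-rate * E).
    # Require that to be at most (1 - confidence) when rate == max_rate.
    return -math.log(1 - confidence) / max_rate


needed = bit_years_to_certify(1 * PB, 100)
petabyte_years = needed / (8 * PB)
print(f"error-free observation needed: about {petabyte_years:.0f} petabyte-years")
```

Under these assumptions the answer is a little over 430 petabyte-years of observation without a single flipped bit, for example 430 petabytes watched for a year with every bit checked.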

The basic point I was making was that even if we ignore all the evidence that we can't, and assume that we could actually build a system reliable enough to preserve a petabyte for a century, we could not prove that we had done so. No matter how easy or hard you think a problem is, if it is impossible to prove that you have solved it, scepticism about proposed solutions is inevitable.


Chris Rusbridge said...

Hi David, I think you somewhat misrepresent my comment on the other post. I was ONLY addressing the "bit half life 100M times the age of the universe" factoid. For the record, I do believe that keeping a petabyte for a century is hard (I know someone who is trying to do it), and I accept much of your other evidence and many of your arguments. However, it's clear that we can keep data which represents bit half lives >> the age of the universe, even after only 60 years of trying (I don't think I quite emphasised how hard a GB for a decade would have seemed in 1968, when I first met an 8 MB disk drive the size of a fridge).

I would like to understand HOW we can achieve such miraculous-seeming results; I guess it's through distributing checks widely across many layers and many hardware devices.

David. said...

I agree that "disk, tape and other media are remarkably reliable". Disk and tape drives contain many layers of astonishing engineering, from the medium through the heads and the signal processing that extracts bits from the noisy analog signal, to the error correcting codes that clean up the bit stream. But these are nowhere near good enough to meet the demands society has for data preservation. I'm not the right person to explain these layers.

So, on top of the actual storage media we have to layer file systems and digital preservation systems to make up for their (measurable) unreliability. They increase the reliability of the bits by another large factor. The problem is not in our ability to continue to add more and more error detection and correction capabilities to the pile. I have worked in these layers off and on for many years.

There are three problems. The first is that, perhaps because they are in awe of the astonishing engineering, very few people are measuring the reliability their pile is delivering. The second is that the scanty evidence we have is that the pile is failing to deliver the reliability we would expect. The third is that, even if the pile were delivering the reliability we need, we could not perform the experiments needed to prove that it was doing so.

So, yes, it is amazing that we can store bits so reliably. But in the context of digital preservation, patting ourselves on the back about this achievement is counter-productive.

David. said...

Fixed a broken link to Junfeng Yang et al.'s paper on EXPLODE.