Thursday, November 8, 2012

Format Obsolescence In The Wild?

The Register has a report that, at a glance, looks like one of the long-sought instances of format obsolescence in the wild:
Andrew Brown asked to see the echocardiogram of his ticker, which was taken eight years ago. He was told that although the scan is still on file in the Worcestershire Royal hospital, it will cost a couple of grand to recreate the data as an image because it is stored in a format that can no longer be read by the hospital's computers.
But a closer look, below the fold, shows that it isn't so simple.

First, the expense comes not from needing to migrate the format but from needing to migrate the media, because the image of the patient's heart is stored on:
a magneto-optical disc (MOD) drive. Philips UK no longer has that particular disc drive in stock because it is out of production, so it would have to buy one in from America. The manufacturer quoted a price of £2,000 to the trust as the cost of sourcing this MOD unit.
In other words, the problem arose because the hospital didn't bother to migrate its image files off a medium, MOD, that was obviously doomed long before 2004. Although The Register claims that:
the scan is stored in a DSR-TIFF format that is only readable by a specific build of Philips Xcelera software,
I am not convinced that even if the bits could be retrieved they couldn't be rendered. A quick Google search reveals many software products that claim to be able to read DSR-TIFF image files, such as this one. And, in any case, the hospital doesn't claim that they can't be rendered, only that:
We do have the visual data on file but the cost of generating an image from what is now obsolete technology is not a cost-effective use of public money.
As usual, the issue with digital preservation isn't whether it can be done but whether we can afford to do it.

Black vs. white arguments about format obsolescence are misleading. They typically assume that the funds to prepare for the uncertain possibility of future format obsolescence are unlimited. In practice, the more you spend per byte preparing for the possibility that the format will go obsolete, the fewer bytes you can preserve. The bytes you don't preserve are almost certain to be lost. The bytes you preserve complete with elaborate format metadata are quite likely to be lost for reasons other than format obsolescence. Even if they survive and the format does go obsolete, it is very likely that the format metadata will not help.
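
The trade-off between spending per byte and bytes preserved is simple arithmetic, and a rough sketch may make it concrete. Everything in it is an assumption for illustration: the function name, the fixed budget, and the per-terabyte costs are made up, not measurements from any real archive.

```python
def terabytes_preserved(budget, storage_cost_per_tb, prep_cost_per_tb=0.0):
    """Return how many terabytes a fixed budget can preserve.

    All inputs are hypothetical. prep_cost_per_tb stands for whatever is
    spent per terabyte preparing for possible format obsolescence, for
    example generating format metadata.
    """
    total_cost_per_tb = storage_cost_per_tb + prep_cost_per_tb
    if total_cost_per_tb <= 0:
        raise ValueError("total cost per terabyte must be positive")
    return budget / total_cost_per_tb


# Made-up numbers: a budget of 100,000, storage at 500 per terabyte,
# and an extra 125 per terabyte for format preparation.
BUDGET = 100_000
STORAGE_COST = 500
PREP_COST = 125

without_prep = terabytes_preserved(BUDGET, STORAGE_COST)
with_prep = terabytes_preserved(BUDGET, STORAGE_COST, PREP_COST)
not_preserved = without_prep - with_prep

print(f"Preserved without format preparation: {without_prep:.0f} TB")
print(f"Preserved with format preparation:    {with_prep:.0f} TB")
print(f"Content the budget no longer covers:  {not_preserved:.0f} TB")
```

With these made-up numbers the same budget covers 200 TB without the preparation and 160 TB with it, so a 25% increase in cost per terabyte leaves 40 TB unpreserved. The sketch says nothing about how likely format obsolescence is; it only shows that the preparation is paid for in content that never gets preserved.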


euanc said...

Hi David,

Great post, I very much agree with what you are saying. It echoes some of what I said in a comment here e.g.:

"The problem is the cost of preventing change. Some changes are presumed to be more expensive than others to prevent. The trouble is that at the moment we don’t have good economic models to predict the cost of employing different strategies to prevent change (while also enabling access) and therefore we cannot compare the different strategies either."

In retrospect, I should have referenced your post in the comment.
Your work on the cost of long term storage is a superb example of the kind of cost modelling I believe we need for other areas in the digital preservation space.
Over here I speculated that employing an emulation strategy will probably be a cheaper option for maintaining access long term in many cases. But I was careful not to say it definitely will as we just don't have enough evidence yet to be able to make an informed decision about it.
So while I might ask for models similar to yours for use in predicting the cost of different strategies, I fear that it will be some time before we are able to produce them.


David. said...

Thanks, Euan.

I agree that we need models covering the various digital preservation processes. There is research in this area, but it is patchy. It can, for example, tell you something about the cost of generating format metadata now, but not about what it would cost if you waited until later. And it doesn't tell you what return, in terms of longer usable life of the bits, you get from generating format metadata or from other processes. So it isn't enough to help you make informed decisions about what to spend when.