Saturday, December 4, 2010

A Puzzling Post From Rob Sharpe

I'm sometimes misquoted as saying "formats never become obsolete", but that isn't the argument I am making. Rather, I am arguing that basing the entire architecture of digital preservation systems on preparing for an event, format obsolescence, which is unlikely to happen to the vast majority of the content in the system in its entire lifetime is not good engineering. The effect of this approach is to raise the cost per byte of preserving content, by investing resources in activities such as collecting and validating format metadata, that are unlikely to generate a return. This ensures that vastly more content will be lost because no-one can afford to preserve it than will ever be lost through format obsolescence.

Tessella is a company in the business of selling digital preservation products and services based on the idea that content needs "Active Preservation", their name for the idea that the formats will go obsolete and that the way to deal with this prospect is to invest resources into treating all content as if it were in immediate need of format migration. Their market is
managing projects for leading national archives and libraries. These include ... the UK National Archives ... the British Library [the] US National Archives and Records Administration ... [the] Dutch National Archief and the Swiss Federal Archives.
It isn't a surprise to find that on Tessella's official blog Rob Sharpe disagrees with my post on format half-lives. Rob points out that
at Tessella we have a lot of old information trapped in Microsoft Project 98 files.
The obsolescence of Microsoft Project 98's format was first pointed out to me at the June 2009 PASIG meeting in Malta, possibly by Rob himself. I agree that this is one of the best of the few examples of an obsolete format, but I don't agree that it was a widely used format. What proportion of the total digital content that needs preservation is Project 98?

But there is a more puzzling aspect to Rob's post. Perhaps someone can explain what is wrong with this analysis.

Given that Tessella's sales pitch is that "Active Preservation" is the solution to your digital preservation needs, one would expect them to use their chosen example of an obsolete format to show how successful "Active Preservation" is at migrating it. But instead
at Tessella we have a lot of old information trapped in Microsoft Project 98 files.
Presumably, this means that they are no longer able to access the information "using supported software". Of course, they could access it using the old Project 98 software, but that wouldn't meet Rob's definition of obsolescence.

Are they unable to access the information because they didn't "eat their own dog-food" in the Silicon Valley tradition, using their own technology to preserve their own information? Or are they unable to access it because they did use their own technology and it didn't work? Or is Project 98 not a good example of
a format for which no supported software that can interpret it exists
so it is neither a suitable subject for their technology, nor for this debate?

5 comments:

Anonymous said...

Project 98 seems an odd choice for a compelling example: if its obsolescence poses a problem to accessing important, needed, information in a format with required functionality & "significant properties", then Sharpe obviously needs to provide more details. Who needs to preserve the mundane details of project management or the functionality to allocate resources to tasks or track costs just like our forefathers did in the good ol' days of MSProject 98? Some specifics, please, and enough examples to convince me that this is a general problem. Given the continuing availability of viewer applications for Project 98 files (http://www.projec.to/, http://www.projectviewercentral.com/, and no doubt others), this looks to me like a non-example.

As you point out, it's kind of important to appraise the value to the community you serve of the info in a set of files before committing to the ongoing costs of "active" or any other flavour of preservation. Too many discussions of digital preservation, like Sharpe's, are weak on the issue of appraisal and "the fine art of destruction", as archivists have called it.

David. said...

Chris shows me up. I had taken the obsolescence of MS Project 98 on trust, but as he points out Project.to is a commercial cloud application and Housatonic Project Viewer is a commercial Windows viewer, both of which (at least claim to) support Project 98. I should have been more skeptical.

Given that there are at least two supported viewers for it, it seems that the answer to my three-part question is that Project 98 does not meet even Rob's definition of obsolete - a format for which no supported software that can interpret it exists.

Asking Google for MS Project 98 viewers gets about 450K hits, revealing a vast array of viewers that claim to support Project 98. An early one is official: "Microsoft does not supply a converter for Microsoft Project 98 or earlier versions that would allow you to open a file created in a later version of Project." So, if Rob actually meant his definition to say "a format for which no software supported by the original vendor that can interpret it exists" he would be correct that Project 98 is obsolete. But that definition of obsolescence is so restrictive as to be useless in practice.

Yet another claimed counter-example to my case falls flat.

It is still an interesting question as to why Tessella regards their Project 98 files as "trapped" despite their claims to provide "Active Preservation" for files of this kind.

Robert Sharpe said...

Hi David.
I've posted some comments on this back on our blog site (http://www.digital-preservation.com/2010/12/the-costs-of-active-preservation/).
You've mainly picked up on the example I used where, as you will see, I managed to get my versions of Project confused (it is the versions before Project 98 we can't easily read and use unsupported Project 98 to do so). I still think this is a legitimate example but, to me, the specifics of this case are less interesting than the fact that migration is a real requirement of our customers and even if this is mostly done for presentation reasons I think it should be done to the best of our ability (i.e. in a verifiable manner).
I also don't believe that this requirement imposes huge costs on the system.

David. said...

I'd like to thank Rob for participating in this debate. It is very helpful to have the various positions argued in detail. In a new post, I respond at length to Rob's comment above and to his post arguing that format migration is cheap.

I should also point out that I was misled by Microsoft's prose style in my comment above. What they are saying is that they don't provide a way for Project 98 to open files from later versions, not that Project 98 can't open earlier files, nor that later versions cannot open Project 98 files. This is consistent with Microsoft's use of gratuitous format incompatibility as a marketing tool.

Anonymous said...

Content preservation in a cost effective way will help the libraries, data centers and big businesses. As technology is advancing, new formats are coming up and so the need for innovative digital preservation products.