Friday, January 28, 2011

Threats to preservation

More than 5 years ago we published the LOCKSS threat model, the set of threats to preserved content against which the LOCKSS system was designed to preserve content. We encouraged other digital preservation systems to do likewise; it is hard to judge how effective systems are in achieving their goal of preserving content unless you know what they are intended to preserve content against. We said:
We concur with the recent National Research Council recommendations to the National Archives that the designers of a digital preservation system need a clear vision of the threats against which they are being asked to protect their system's contents, and those threats under which it is acceptable for preservation to fail.
I don't recall any other system rising to the challenge; I'd be interested in any examples of systems that have documented their threat model that readers could provide in comments.

This lack of clarity as to the actual threats involved is a major reason for the misguided focus on format obsolescence that consumes such a large proportion of digital preservation attention and resources. As I write this two ongoing examples illustrate the kinds of real threats attention should be focused on instead.

In an attempt to damp down anti-government protests, the Egyptian government shut down the Internet in their country. One copy of the Internet Archive's Wayback Machine is hosted at the Bibliotheca Alexandrina. As I write it is accessible, but the risk is clear. But, you say, the US government would never do such a thing, so the Internet Archive is quite safe. Think again. Senators Joe Lieberman and Susan Collins are currently pushing a bill, the Protecting Cyberspace as a National Asset Act of 2010, to give the US government the power to do exactly that whenever it feels like doing so.

Also as I write this SourceForge is unavailable, shut down in the aftermath of a compromise. The LOCKSS software, in common with many other digital preservation technologies, is preserved in SourceForge's source code control system. Other systems essential to digital preservation use one of a small number of other similar repositories. When SourceForge comes back up, we will have to audit the copy it contains of our source code against our backups and working copies to be sure that the attackers did not tamper with it.

I have argued for years, again with no visible effect, that national libraries should preserve these open source repositories. Not merely because, as the SourceForge compromise illustrates, their contents are the essential infrastructure for much of digital preservation, and that there are no economic, technical or legal barriers to doing so, but even more importantly they are major cultural achievements, just as worthy of future scholar's attention as books, movies and even tweets.


David. said...

In this context, I should also point to Dan Geer's wonderful essay.

Matt said...

I agree that format obsolescence should not be the principal focus of digital preservation efforts For the last 2 years, I've been working on a project called "Digital Continuity":

This project is looking at issues of short to medium term digital information management in active business use (rather than archival).

Although we began with the classic format obsolescence horror stories, we have shifted our focus towards governance and information risk management, with strong links to the information assurance and security world.

David. said...

Another close call as a Flickr employee screws up and deletes a customer's pictures. Fortunately, Flickr managed to restore them and issue a groveling apology. The Register draws the moral of the story.