Tuesday, December 5, 2017

International Digital Preservation Day

The Digital Preservation Coalition's International Digital Preservation Day was marked by a wide-ranging collection of blog posts. Below the fold, some links to, and comments on, a few of them.

Susan Reilly's We need to talk about copyright makes good points about the importance of copyright for preservation, in the context of Article 5 of the proposed EU Directive on Copyright in the Digital Single Market:
This article recognises the fact that a single copy is not sufficient for digital preservation and that it is necessary to make multiple copies. It also allows for format shifting. The proposed directive also has a provision on technical protection measures but does not go far enough in providing a mechanism for recourse should rights owners not cooperate in allowing cultural heritage institutes to circumvent these measures for the purpose of preservation.
I agree that "recourse" would be good were it practical, but the idea of libraries suing publishers to obtain access isn't. Let's talk David vs. Goliath in terms of resources. A more realistic approach is to recognize that DRM technologies will be cracked, and to provide libraries with immunity for using tools from the "dark web" to remove DRM.

Euan Cochrane's The Emergence of "Digital Patinas" is a great argument for emulation:
As software preservation and emulation is becoming more readily accessible, thanks to the work of the KEEP project, the Internet Archive, the bwFLA project, and many others, we’re beginning to see the emergence of a phenomenon whereby digital objects are displaying something that seems strikingly similar to physical patinas. Something perhaps best described as a “digital patina”.
I especially like the example of the infuriating Clippy.

I strongly disagree with Duff Johnson's The only archival digital format. He writes:
PDF’s purpose is to be a document, with all that implies (see above). But that’s not the purpose of HTML. HTML isn’t a document, it’s an experience. HTML is about making and consuming; PDF is how you keep it, and PDF/A is how you keep it forever (preserving the file’s actual bytes, of course, is up to you).
There are at least three big problems with this. First, PDF/A does not preserve all aspects of a document; PDF has many document capabilities that PDF/A excludes. Second, but more important, digital preservation is about preserving the experience! Documents are a small and decreasing proportion of the digital content that needs to be preserved. Third, the idea that the choice of a format is what digital preservation is all about might have been true two decades ago, but it is a red herring today.

David Minor's What we’ve done well, and some things we still need to figure out ends with a point I've made repeatedly:
Funding. Yes. Of course. Funding. Funding funding funding. This is the largest single mountain we still have to climb. Digital preservation, done correctly, is expensive. It just is. And it’s not a problem that technology is going to solve. Or some new whiz bang economic theory that makes sense to twelve special people. It’s only going to cease being a problem when the people who care about their precious bits fully understand why it’s expensive, and make the commitment to support it. This is the ur-issue for our field, and has been since the beginning.
Eld Zierau's Bit Preservation is NOT a Question of Technology! raises the other big non-technical aspect of preservation: organization. She stresses the need for auditing to ensure that organizations are actually delivering on the contracts they enter into:
The experience in Denmark is that we cannot rely on audit certifications. There needs to be specific audits that ensure that the contracts are followed, but also to sharpen contracts in cases where we have discovered risks that were not covered before (for example, the time lag from data arriving at an offline tape replica unit until it is finally written and securely locked away). The experience is also that many organisations are reluctant to allow audits to be performed in their organisation.
Which, of course, raises the issue of the technology to support auditing.
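The core of that technology is fixity checking: periodically re-computing digests of every stored file and comparing them against a manifest agreed with the custodian. As a minimal sketch (the manifest format, directory layout, and paths below are purely illustrative assumptions, not any particular system's design), an audit pass might look like this in Python:

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit_replica(replica_dir: Path, manifest_path: Path) -> list[str]:
    """Check one replica against a manifest mapping relative file names to
    expected SHA-256 digests; return a list of problems found."""
    expected = json.loads(manifest_path.read_text())  # {"relative/name": "hexdigest", ...}
    problems = []
    for name, wanted in expected.items():
        target = replica_dir / name
        if not target.exists():
            problems.append(f"missing: {name}")
        elif sha256_of(target) != wanted:
            problems.append(f"checksum mismatch: {name}")
    return problems

if __name__ == "__main__":
    # Hypothetical locations; a real audit would also log results and run on a schedule.
    for issue in audit_replica(Path("/replicas/offsite-1"), Path("manifest.json")):
        print(issue)
```

The hard part Zierau points to is not this code; it is getting organizations to let an outside party run something like it against their holdings.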

Richard Wright's The Future Of Television Archives makes two important points:
While drama and entertainment programmes are kept for repeats and for sale to other countries, factual content is heavily recycled to add depth and interest to current programmes. In the BBC, about 30 to 40 percent of 'the news' is actually archive material. ... Up to 2010, about 20% of the BBC television archive was accessed each year, and 95% of that use was internal: back into the BBC for adding depth to new programmes. The other 5% was commercial use.
And, of particular importance for efforts such as the Internet Archive's TV collection:
Off-air recordings are fine for viewing copies, but have real quality limitations when it comes to re-purposing the content for new programmes. The video signal for satellite transmission is compressed by a factor of 10 to 20. This is lossy compression, meaning original quality is not recoverable.  ...

Who cares? The future will care. Lossy compression today leads to 'cascaded compression' in the future, when material is recoded to new standards. Decades of experience show that there is a great risk when cascading: eventually there will be significant failures. ... transcoding errors and cascaded quality loss are the time bomb ticking in all archives containing content with lossy compression – meaning all off-air archives.
 
In addition, professional TV archives no longer get as much master material as they used to. In 1980 the BBC made about 90% of its output in-house, so the archive could get 'the master tape'. Now that figure has been cut to 30%. ... The future of master quality (production quality) video content is very much in doubt.
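The cascading effect is easy to demonstrate. Here is a small Python sketch (assuming the Pillow imaging library and a sample frame named frame.png, both illustrative choices) that repeatedly re-encodes an image as JPEG, the way archive material gets recoded generation after generation, and reports how far the pixels drift from the original; recoding between different codecs and standards, as Wright describes, compounds the effect further:

```python
import io
from PIL import Image, ImageChops, ImageStat

def recompress(img: Image.Image, quality: int = 75) -> Image.Image:
    """Encode the image as JPEG in memory and decode it again (one lossy generation)."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def mean_abs_error(a: Image.Image, b: Image.Image) -> float:
    """Average per-pixel absolute difference between two same-sized RGB images."""
    diff = ImageChops.difference(a, b)
    return sum(ImageStat.Stat(diff).mean) / 3.0

original = Image.open("frame.png").convert("RGB")  # hypothetical source frame
current = original
for generation in range(1, 11):
    current = recompress(current, quality=75)
    print(f"generation {generation}: mean error vs original = "
          f"{mean_abs_error(original, current):.2f}")
```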
Yvonne Tunnat's Plans are my reality is the sad story of someone for whom reading a post on my blog from 2009 could have saved a whole lot of work:
My preservation plan was as following:
  1. Gather all bad PDF
  2. Migrate them to good PDF
  3. Check if they still look alike
The fact that the "bad" and "good" PDFs looked the same is an example of Postel's Law in action: renderers are liberal in what they accept, so even malformed PDFs display as intended. This is especially unsurprising since:
The bad-to-good migration tool built by my co-worker turned out to be too basic, just putting the bad PDF pages into a new PDF, which then would be considered ok by JHOVE, which only checks the overall structure.
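Step 3 of the plan, checking that the migrated PDFs "still look alike", cannot rely on a structural validator like JHOVE alone; it needs rendered-page comparison. A minimal sketch of that idea, assuming poppler's pdftoppm command and the Pillow library are available, and with a tolerance threshold chosen purely for illustration:

```python
import subprocess
import tempfile
from pathlib import Path
from PIL import Image, ImageChops, ImageStat

def render_pages(pdf_path: str, out_dir: Path, dpi: int = 72) -> list[Path]:
    """Render each page of a PDF to PNG using poppler's pdftoppm."""
    prefix = out_dir / Path(pdf_path).stem
    subprocess.run(["pdftoppm", "-png", "-r", str(dpi), pdf_path, str(prefix)], check=True)
    return sorted(out_dir.glob(f"{prefix.name}-*.png"))

def pages_look_alike(original_pdf: str, migrated_pdf: str, tolerance: float = 2.0) -> bool:
    """Compare rendered pages of two PDFs; True if every page's mean pixel
    difference stays under the (illustrative) tolerance."""
    with tempfile.TemporaryDirectory() as tmp_a, tempfile.TemporaryDirectory() as tmp_b:
        pages_a = render_pages(original_pdf, Path(tmp_a))
        pages_b = render_pages(migrated_pdf, Path(tmp_b))
        if len(pages_a) != len(pages_b):
            return False
        for page_a, page_b in zip(pages_a, pages_b):
            img_a = Image.open(page_a).convert("RGB")
            img_b = Image.open(page_b).convert("RGB")
            if img_a.size != img_b.size:
                return False
            diff = ImageChops.difference(img_a, img_b)
            if sum(ImageStat.Stat(diff).mean) / 3.0 > tolerance:
                return False
    return True

if __name__ == "__main__":
    # Hypothetical file names for illustration.
    print(pages_look_alike("bad.pdf", "migrated.pdf"))
```

Visual similarity is necessary but not sufficient; as Tunnat discovered, a migration can pass both JHOVE and a look-alike check while still merely wrapping the bad pages in a new container.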
