Tuesday, November 3, 2015

Emulation & Virtualization as Preservation Strategies

I'm very grateful that funding from the Mellon Foundation on behalf of themselves, the Sloan Foundation and IMLS allowed me to spend much of the summer researching and writing a report, Emulation and Virtualization as Preservation Strategies (37-page PDF, CC-By-SA). I submitted a draft last month; it has been peer-reviewed and I have addressed the reviewers' comments. It is also available on the LOCKSS web site.

I'm old enough to know better than to give a talk with live demos. Nevertheless, I'll be presenting the report at CNI's Fall membership meeting in December complete with live demos of a number of emulation frameworks. The TL;DR executive summary of the report is below the fold.

Recent developments in emulation frameworks make it possible to deliver emulations to readers via the Web in ways that make them appear as normal components of Web pages. This removes what was the major barrier to deployment of emulation as a preservation strategy. Barriers remain; the two most important are that the tools for creating preserved system images are inadequate, and that the legal basis for delivering emulations is unclear and, where it is clear, highly restrictive. Both of these raise the cost of building and providing access to a substantial, well-curated collection of emulated digital artefacts beyond reach.

If these barriers can be addressed, emulation will play a much greater role in digital preservation in the coming years. It will provide access to artefacts that migration cannot, and even assist in migration where necessary by allowing the original software to perform it. The evolution of digital artefacts means that current artefacts are more difficult and expensive to collect and preserve than those from the past, and less suitable for migration. This trend is expected to continue.

Emulation is not a panacea. Technical, scale and intellectual property difficulties make many current digital artefacts infeasible to emulate. Where feasible, even with better tools and a viable legal framework, emulation is more expensive than migration-based strategies. The most important reason for the failure of current strategies to collect and preserve the majority of their target material is economic; the resources available are inadequate. The bulk of the resources expended on both migration and emulation strategies are for ingest, especially metadata generation and quality assurance. There is a risk that diverting resources to emulation, with its higher per-artefact ingest cost, will exacerbate the lack of resources.

Areas requiring further work if emulation is to achieve its potential as a preservation strategy include:
  • Standardization of the format of preserved system images, the way they are obtained by emulators, and the means by which emulations of them are exposed to readers. This would enable interoperability between emulation components, aiding contributions and support from the open-source community.
  • Improvements to the tools for associating technical metadata with preserved software to enable it to be emulated, and the technical metadata databases upon which they depend. This would reduce the cost of preserved system images.
  • Clarification, and if possible relaxation, of the legal constraints on the creation and provision of access to collections of preserved system images. This would encourage institutions to collect software.
