DSHR's Blog: IPRES 2017

Much as I love Kyoto, now that I'm retired with daily grandparent duties (and no-one to subsidize my travel), I couldn't attend iPRES 2017.

I have now managed to scan both the papers, and the very useful "collaborative notes" compiled by Micky Lindlar, Joshua Ng, William Kilbride, Euan Cochrane, Jaye Weatherburn and Rachel Tropea (thanks!). Below the fold I have some notes on the papers that caught my eye.

I have appreciated the Dutch approach to addressing problems ever since the late 70s, when I worked with Paul ten Hagen and Rens Kessner on the Graphical Kernel System standard. This approach featured in two of the papers:

How the Dutch prepared for certification by Barbara Sierman and Kees Waterman describes how six large cultural heritage organizations worked together to ease each of their paths up the hierarchy of repository certification from DSA to Nestor. The group added two preparatory stages before DSA (Initial Self-Assessment, and Exploratory Phase), comprising activities that I definitely recommend as a starting point. They also translated the DSA and Nestor standards into Dutch, enhanced some of the available tools, and conducted surveys and awareness-raising.
A Dutch approach in constructing a network of nationwide facilities for digital preservation together by Joost van der Nat and Marcel Ras reported that:

In November 2016, the NCDD research on the construction of a cross-domain network of facilities for long-term access to digital Cultural Heritage in the Netherlands was rewarded the Digital Preservation Award 2016 in the category Research and Innovation. According to the judges the research report presents an outstanding model to help memory institutes to share facilities and create a distributed, nationwide infrastructure network for Digital Preservation.
The NCDD didn't go all-out for either centralization or distribution, but set out to find the optimum balance for infrastructure spanning diverse institutions:

Under the motto “Joining forces for our digital memory”, a research project was started in 2014 ... This project had the purpose to find out what level of differentiation between the domains offers the best balance for efficiency. Without collaboration, inefficiencies loom, while individual institutes continue to expand their digital archives and may be reinventing the same wheel over and over again. The project’s objective was and is to avoid duplication of work, and to avoid wasting time, money, and energy. Economies of scale make it easier for the many smaller Dutch institutes to profit from available facilities, services, and expertise as well. Policy makers can now ponder the question “The same for less money, or more for the same money?”.

I've blogged before about the important work of the Software Heritage Foundation. Software Heritage: Why and How to Preserve Software Source Code by Roberto Di Cosmo and Stefano Zacchiroli provides a comprehensive overview of their efforts. I'm happy to see them making two justifications for preserving open-source software that I've been harping on for years:

Source code is clearly starting to be recognized as a first class citizen in the area of cultural heritage, as it is a noble form of human production that needs to be preserved, studied, curated, and shared. Source code preservation is also an essential component of a strategy to defend against digital dark age scenarii in which one might lose track of how to make sense of digital data created by software currently in production.

But they also provide other important justifications, such as these two:

First, Software Heritage intrinsic identifiers can precisely pinpoint specific software versions, independently of the original vendor or intermediate distributor. This de facto provides the equivalent of “part numbers” for FOSS components that can be referenced in quality processes and verified for correctness ....

Second, Software Heritage will provide an open provenance knowledge base, keeping track of which software component - at various granularities: from project releases down to individual source files — has been found where on the Internet and when. Such a base can be referenced and augmented with other software-related facts, such as license information, and used by software build tools and processes to cope with current development challenges.
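Intrinsic identifiers are computed from the content itself rather than assigned by a registry. That is what lets them work "independently of the original vendor or intermediate distributor." The Python below is a minimal sketch. It assumes the `swh:1:cnt:` identifier syntax that Software Heritage documents for file contents, which is a Git-compatible SHA-1 over a `blob <length>` header followed by the file's bytes. It computes such an identifier and feeds it into a toy provenance index that records where and when each file was seen. The `ProvenanceIndex` class and the origin URLs are hypothetical illustrations, not Software Heritage's actual data model or API. The sketch also covers only the file granularity; in the real archive, directories, revisions and releases get identifiers of their own.

```python
import hashlib
from collections import defaultdict
from datetime import date


def sha1_git(content: bytes) -> str:
    """Git-compatible blob hash: SHA-1 over a 'blob <length>\\0' header plus the bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def content_swhid(content: bytes) -> str:
    """Intrinsic identifier for one file's contents (assumed 'swh:1:cnt:' syntax)."""
    return "swh:1:cnt:" + sha1_git(content)


class ProvenanceIndex:
    """Hypothetical toy store: identifier -> set of (origin URL, date seen)."""

    def __init__(self) -> None:
        self._seen: dict[str, set[tuple[str, date]]] = defaultdict(set)

    def record(self, content: bytes, origin: str, when: date) -> str:
        # The key depends only on the bytes, never on where they came from.
        swhid = content_swhid(content)
        self._seen[swhid].add((origin, when))
        return swhid

    def where_and_when(self, swhid: str) -> list[tuple[str, date]]:
        # Earliest sighting first; .get() avoids creating empty entries.
        return sorted(self._seen.get(swhid, set()), key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    index = ProvenanceIndex()
    src = b"hello world\n"
    # Both origin URLs are made up for illustration.
    sid = index.record(src, "https://example.org/a.git", date(2017, 1, 5))
    index.record(src, "https://example.net/b.git", date(2016, 11, 2))
    print(sid)  # swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    for origin, when in index.where_and_when(sid):
        print(when.isoformat(), origin)
```

Because the identifier depends only on the bytes, two mirrors serving the same file yield the same key. Sightings from independent origins therefore accumulate under one entry, and that property is what makes such an identifier usable as a "part number" across distributors.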

Considering Software Heritage's relatively short history, the coverage statistics in Section 9 of the paper are very impressive, illustrating the archive-friendly nature of open-source code repositories.

Emulation featured in two papers:

Adding Emulation Functionality to Existing Digital Preservation Infrastructure by Euan Cochrane, Jonathan Tilbury and Oleg Stobbe is a short paper describing how Yale University Library (YUL) interfaced bwFLA, Freiburg's emulation-as-a-service infrastructure, to their Preservica digital preservation system. The goal is to implement their policy:

YUL will ensure access to hardware and software dependencies of digital objects and emulation or virtualization tools by [...] Preserving, or providing access to preserved software (applications and operating systems), and pre-configured software environments, for use in interacting with digital content that depends on them.
Yale is doing important work making Freiburg's emulation infrastructure easy to use in libraries.
Trustworthy and Portable Emulation Platform for Digital Preservation by Zahra Tarkhani, Geoffrey Brown and Steven Myers:

provides a technological solution to a fundamental problem faced by libraries and archives with respect to digital preservation — how to allow patrons remote access to digital materials while limiting the risk of unauthorized copying. The solution we present allows patrons to execute trusted software on an untrusted platform; the example we explore is a game emulator which provides a convenient prototype to consider many fundamental issues.
Their solution depends on Intel's SGX instruction set extensions, meaning it will work only on Skylake and later Intel processors. I would expect it to be obsoleted by the processor-independent, if perhaps slightly less bullet-proof, W3C Encrypted Media Extensions (EME) available in all major browsers. Of course, if SGX is available, implementations of EME could use it to render the user even more helpless.

Always on the Move: Transient Software and Data Migrations by David Wilcox is a short paper describing the import/export utility developed to ease data migration between versions 3 and 4 of the Fedora repository software. This has similarities with the IMLS-funded WASAPI web archive interoperability work with which the LOCKSS Program is involved.

Although they caught my eye, I have omitted two papers on identifiers here. I plan a future post on identifiers, into which I expect they will fit:

Permanence of the Scholarly Record: Persistent Identification and Digital Preservation – A Roadmap by Angela Dappert and Adam Farquhar.
Getting Persistent Identifiers Implemented By ‘Cutting In The Middle-Man’ by Remco van Veenendaal, Marcel Ras and Marie Claire Dangerfield.

Tuesday, October 10, 2017