Tuesday, November 24, 2020

I Rest My Case

Jeff Rothenberg's seminal 1995 Ensuring the Longevity of Digital Documents focused on the threat of the format in which the documents were encoded becoming obsolete, rendering their content inaccessible. This was understandable; it had been a common experience in the preceding decades. Rothenberg described two different approaches to the problem: migrating the document's content from the doomed format to a less doomed one, and emulating, in a current environment, the software that accessed the document.

The Web has dominated digital content since 1995, and in the Web world formats go obsolete very slowly, if at all, because they are in effect network protocols. The example of IPv6 shows how hard it is to evolve network protocols. But now we are facing the obsolescence of a Web format that was very widely used, as the long effort to kill off Adobe's Flash comes to fruition. Fortunately, Jason Scott's Flash Animations Live Forever at the Internet Archive shows that we were right all along. Below the fold, I go into the details.

Preservationists inspired by Rothenberg's article seized on migration as the only viable approach, perhaps because of emulation's greater technical challenges. They built systems that ingested content by preemptively migrating it to one of a small set of formats they assumed were unlikely to become obsolete. There were a number of problems with this "aggressive" approach, some of which I set out in the third post to this blog, Format Obsolescence: the Prostate Cancer of Preservation, such as:
Many digital preservation systems define levels of preservation; the higher the level assigned to a format, the stronger the "guarantee" of preservation the system offers. For example, PDF gets a higher level than Microsoft Word. Essentially, the greater the perceived difficulty of migrating a format, the lower the effort that will be devoted to preserving it. But the easier the format is to migrate, the lower the risk it is at. So investment, particularly in the "aggressive" approach, concentrates on the low-hanging fruit. This is neither at significant risk of loss, nor at significant risk of format obsolescence.
The idea that it was possible to assess the degree of doom a format would encounter in the future was suspect, to say the least. Right from the start of the LOCKSS Program in 1998 we disagreed with the "aggressive" approach, arguing that the most important thing was to collect and preserve the original bits, and work out how to provide access later, when it was requested. Our arguments fell on deaf ears, so in 2005 we implemented and demonstrated a technique by which on-access format migration was completely transparent to the user (see Transparent Format Migration of Preserved Web Content). This allowed the decision about the less-doomed format to be postponed until the answer was clear.
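
The on-access idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the LOCKSS implementation: the names StoredObject, serve and the converter table are hypothetical, and the list of accepted media types is assumed to come from the reader's request (for example, parsed from an HTTP Accept header). The point it shows is that the original bits are never altered, and a converter runs only when a reader cannot use the stored format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only.
Converter = Callable[[bytes], bytes]
ConverterTable = Dict[Tuple[str, str], Converter]  # (from_type, to_type) -> converter


@dataclass(frozen=True)
class StoredObject:
    """The preserved original: its media type and its unmodified bits."""
    media_type: str
    data: bytes


def serve(obj: StoredObject,
          accepted: List[str],
          converters: ConverterTable) -> Tuple[str, bytes]:
    """Return (media_type, bytes) to send to a reader.

    `accepted` lists the media types the reader can render, in
    preference order. The stored object itself is never changed.
    """
    # If the reader can still render the original format, send the original bits.
    if obj.media_type in accepted or "*/*" in accepted:
        return obj.media_type, obj.data

    # Otherwise migrate on the fly to the first accepted format
    # for which a converter is registered.
    for target in accepted:
        convert = converters.get((obj.media_type, target))
        if convert is not None:
            return target, convert(obj.data)

    # No suitable converter yet: fall back to the original bits.
    return obj.media_type, obj.data
```

Because converters are looked up only at access time, a new one can be registered years after ingest without touching the preserved content.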

But underlying this approach was an assumption that there actually existed some less-doomed format into which the doomed format could be migrated without catastrophic loss of information. In the case of Adobe Flash, even as obsolescence loomed, no-one identified such a format. Migration only became possible for Flash in 2016, when Adobe Animate could convert it to HTML5 (itself only specified in 2014), and even then it was an expensive and fragile migration. Flash content was less like a "document" in Rothenberg's sense, and more like a program.

Fortunately, as I detailed in my 2015 report Emulation and Virtualization as Preservation Strategies:
Recent developments in emulation frameworks make it possible to deliver emulations to readers via the Web in ways that make them appear as normal components of Web pages. This removes what was the major barrier to deployment of emulation as a preservation strategy.
Perhaps the most important such framework is the Internet Archive's Emularity, which injects an emulator into the reader's browser to process the preserved content. Now, Jason Scott writes:
Utilizing an in-development Flash emulator called Ruffle, we have added Flash support to the Internet Archive’s Emularity system, letting a subset of Flash items play in the browser as if you had a Flash plugin installed. While Ruffle’s compatibility with Flash is less than 100%, it will play a very large portion of historical Flash animation in the browser, at both a smooth and accurate rate.

We have a showcase of the hand-picked best or representative Flash items in this collection. If you want to try your best at combing through a collection of over 1,000 flash items uploaded so far, here is the link.

You will not need to have a flash plugin installed, and the system works in all browsers that support Webassembly.
The fact that it is now possible to access preserved Flash content is important, especially for the history of the Web. As Scott writes:
From roughly 2000 to 2005, Flash was the top of the heap for a generation of creative artists, animators and small studios. Literally thousands and thousands of individual works were released on the web. Flash could also be used to make engaging menu and navigation systems for webpages, and this was used by many major and minor players on the Web to bring another layer of experience to their users.
This period was the height of Flash. Nearly every browser could be expected to have a “Flash Plugin” to make it work, thousands of people were experimenting with Flash to make art and entertainment, and an audience of millions, especially young ones, looked forward to each new release.
Unfortunately, the misplaced priorities resulting from migration-based preservation strategies meant that formal preservation systems mostly failed to preserve Flash content, so much has probably been lost. But kudos to everyone who worked on making it possible to experience this important period in the development of the Web.


Unknown said...

Thank you!

Perserve always and proactively, Migrate when you can, emulate when you have to... and we have to emulate with flash that is more than just a movie, as you suggested.

But keeping the original bits is key.

Another motto I have found to be true "Access drives preservation"

Unknown said...

In my opinion, the success of the Internet Archive and Ruffle will depend largely on their ability to support AVM2. I'm rooting for them, but in the meantime, I'm not confident that emulation is a more viable preservation strategy for AS3 content than migration.

Unknown said...

fyi: the Perserve always and proactively, Migrate when you can, emulate when you have to... post above is from brewster kahle of the internet archive. blogger put the author as Unknown.