On the basis of the report I wrote on Emulation and Virtualization as Preservation Strategies two years ago, I was asked to give a brief talk today. That may have been a mistake; I retired almost a year ago and I haven't been following developments in the field closely. But I'll do my best, and I'm sure you will let me know where I'm out of date. As usual, you don't need to take notes; the text of what follows, with links to the sources, will go up on my blog at the end of this session.
I don't have much time, so I'll just cover a few technical, legal and business points, then open it up for questions (and corrections).
Technical

First, a point I've been making for a long time. Right now, we're investing resources in building emulations to make preserved software accessible, but we're not doing it in a preservable way. We're wrapping the hardware metadata and the system image in a specific emulator, probably bwFLA or Emularity. This is a bad idea, because emulation technology will evolve, and your collaborators may want to use a different technology than you do.
I wrote a detailed post about this a year ago, using the analogy of PDF. We didn't wrap PDFs in Adobe Reader, and then have to re-wrap them all in pdf.js. We exposed the metadata and the content so that, at rendering time, the browser could decide on the most appropriate renderer. I wrote:
- A metadata MimeType, say Emulation/MachineSpec, that describes the architecture and configuration of the hardware, which links to one or more resources of:
- A disk image MimeType, say DiskImage/qcow2, with the contents of each of the disks.
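A record along these lines could be as simple as a JSON document. The field names, the MIME-type strings, and the URL below are all hypothetical, nothing here is a published standard, but they show how the renderer could be chosen at access time from exposed metadata rather than baked in at ingest:

```python
import json

# Hypothetical Emulation/MachineSpec record. All field names and the image
# URL are illustrative, not part of any published specification.
machine_spec = {
    "mimetype": "Emulation/MachineSpec",
    "architecture": "apple2",          # hardware family to be emulated
    "cpu": "6502",
    "ram_kib": 48,
    "peripherals": ["disk-ii"],
    "disks": [
        {
            "mimetype": "DiskImage/qcow2",   # per-disk content resource
            "role": "boot",
            "href": "https://example.org/images/visicalc-boot.qcow2",
        }
    ],
}

def renderers_for(spec):
    """Toy dispatch: at rendering time, map the architecture in the spec to
    candidate emulators, the way a browser maps a MIME type to a renderer."""
    known = {"apple2": ["Emularity", "bwFLA"], "pdp-1": ["bwFLA"]}
    return known.get(spec["architecture"], [])

# The spec round-trips through serialization independently of any emulator.
restored = json.loads(json.dumps(machine_spec))
print(renderers_for(restored))  # prints ['Emularity', 'bwFLA']
```

The point of the sketch is that the emulator name appears only in the dispatch table, which can change over time, never in the preserved record itself.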
Internet Archive's emulation is impressive, but the UI is nothing like the knobs and switches I used back in the day, let alone the huge round calligraphic CRT. As the desktop and laptop gradually die out, the set of controls available to "emul.js" becomes sparser, and more different from the originals. This could be an issue for a collaboration in which, for example, one partner wanted a kiosk for access and another wanted a phone.
VisiCalc for the Apple ][ is usable in emulation only because Dan Bricklin put the reference card up on his website. I defy anyone trained on Excel to figure it out without the card. Games are supposed to be "self-teaching" but this relies on a lot of social context that will be quite different a half-century later. How is this contextual metadata to be collected, preserved and presented to the eventual user?
If you haven't read Familiarity Breeds Contempt: The Honeymoon Effect and the Role of Legacy Code in Zero-Day Vulnerabilities by Sandy Clark, Matt Blaze, Stefan Frei and Jonathan Smith, you really need to. They show that the rate of detection of vulnerabilities in software goes up with time. Exploits for these vulnerabilities on the Internet never go away. Preserved software will have vast numbers of vulnerabilities that are under active exploitation on the Internet. And, worse, it will still have zero-days waiting to be discovered. Adequately firewalling preserved software from the net will be extremely difficult, especially in a collaborative setting where partners want to access it over the net.
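One mitigation is to firewall at the emulator rather than at the institutional network. As a sketch, here is a helper that builds a QEMU command line for a preserved image with the guest cut off from the outside world; the image name is a placeholder, but the restrict=on option of QEMU's user-mode networking is real and drops all guest traffic to the host's network:

```python
# Sketch: run a preserved system image under QEMU with no network reach.
# "win98.qcow2" is a placeholder image name.
def qemu_command(image_path, mem_mib=512):
    return [
        "qemu-system-i386",
        "-m", str(mem_mib),
        "-drive", f"file={image_path},format=qcow2",
        # User-mode netdev with restrict=on: the guest sees a NIC, but its
        # packets never reach the outside world, so its known-vulnerable
        # software cannot be used to attack anyone.
        "-netdev", "user,id=n0,restrict=on",
        "-device", "ne2k_pci,netdev=n0",
    ]

cmd = qemu_command("win98.qcow2")
print(" ".join(cmd))
```

Of course, this isolation is exactly what a collaboration wanting remote access over the net cannot fully have, which is the dilemma.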
Legal

Which brings me to my first legal point. Note that I am not a lawyer, and that the following does not apply to preserving open-source software (thank you, Software Heritage, for stepping up to do so).
Software makers disclaim liability for the vulnerabilities in their products. Users in effect disclaim liability for vulnerabilities in things they connect to the net by claiming that they follow "best practice" by keeping them patched up-to-date. We're going to be treading new legal ground in this area, because Spectre is a known vulnerability that cannot be patched in software, and for which there will not be fixed hardware for some time. Where does the liability for the inevitable breaches end up? With the cloud provider? With the cloud user who was too cheap to pay for dedicated hosts?
Archives are going to connect known-vulnerable software that cannot be patched to the net. The firewall between the software and the net will disclaim liability. Are your lawyers happy that you can disclaim liability for the known risk that your preserved software will be used to attack someone? And are your sysadmins going to let you connect known-vulnerable systems to their networks?
Of course, the two better-known legal minefields surrounding old software are copyright and the End User License Agreement (EULA). I wrote about both in my report. In both areas the law is an ass; on a strict interpretation of the law almost anyone preserving closed-source software would incur criminal liability under the DMCA and civil liability under both copyright and contract law. Further, collaborations to preserve software would probably be conspiracies in criminal law, and the partners might incur joint and several liability under civil law. All it would take would be a very aggressive prosecutor or plaintiff and the wrong judge to make these theoretical possibilities real enough to give your institution's lawyers conniptions.
Copyright and EULAs apply separately to each component of the software stack, and to each partner in a collaboration. Games are typically a simple case, with only the game and an OS in the stack. But even then, a three-partner collaboration ends up with a table like this:
|      | Partner 1 | Partner 2 | Partner 3 |
|------|-----------|-----------|-----------|
| OS   | C E       | C E       | C E       |
| Game | C         | C E       |           |

(C = copyright applies, E = EULA applies.)
There's a reason why in the LOCKSS system that Vicky Reich & I designed nearly two decades ago each library's LOCKSS box gets its own copy of the journals from the publisher under its own subscription agreement. Subsequently, the box proves to other boxes that they have the same content, so that copying content from one to another to repair damage isn't creating a new, leaked copy. In a collaboration where one partner is responsible for ingesting content, and providing access to the other partners, new leaked copies are being created, potentially violating copyright, and used, potentially violating the EULA.
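A drastic simplification of the LOCKSS polling idea, with hypothetical content, can make the "prove without copying" point concrete: each box answers a challenge by hashing a fresh nonce together with its own copy, so agreement proves possession of identical bytes without shipping the content anywhere:

```python
import hashlib
import os

def audit_digest(content: bytes, nonce: bytes) -> str:
    """Digest over nonce||content. Because the nonce is fresh for each poll,
    a box can only answer correctly if it actually holds the bytes; it
    cannot replay an answer from an earlier poll."""
    return hashlib.sha256(nonce + content).hexdigest()

# Each box holds its own subscription copy of the same journal issue.
issue = b"Journal of Examples, Vol. 1, Issue 1 ..."
box_a, box_b = issue, issue

nonce = os.urandom(16)          # the challenging box picks a fresh nonce
same = audit_digest(box_a, nonce) == audit_digest(box_b, nonce)
print(same)  # prints True: identical content proven, no copy transferred
```

This is a sketch of the principle only; the real LOCKSS protocol is far more elaborate, precisely because it must also resist malicious peers.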
Even if the law were both clear and sensible, this would be tricky. As things are it is too hard to deal with. In the meantime, the best we can do is to define, promulgate, and adhere to reasonable "best practices". And to lobby the Copyright Office for exemptions, as the MADE (Museum of Arts and Digital Entertainment) and Public Knowledge are doing. They are asking for an exemption to allow museums and archives to run servers for abandoned online multi- and single-player games.
Business

In my panel presentation at the 2016 iPRES I said:
Is there a business model to support emulation services? This picture is very discouraging. Someone is going to have to pay the cost of emulation. As Rhizome found when the Theresa Duncan CD-ROMs went viral, if the end user doesn't pay there can be budget crises. If the end user does pay, it's a significant barrier to use, and it starts looking like it is depriving the vendor of income. Some vendors might agree to cost-recovery charging. But typically there are multiple vendors involved. Consider emulating an old Autodesk-on-Windows environment. That is two vendors. Do they both agree to the principle of cost-recovery, and to the amount of cost-recovery?

Somehow, all three phases of preservation need to be paid for:
- Ingest: is expensive and, obviously, has to be paid up-front. That makes it look like a capital investment. The return on the investment will be in the form of future accesses, which are uncertain. This makes it hard to justify, leading to skimping on the expensive parts, such as metadata generation, which leads to less access, which leads to less justification for ingest.
- Preservation: may not be that expensive each year, but it is an on-going cost whose total is hard to predict.
- Dissemination: is typically not expensive, but it can spike if the content gets popular, as it did with Rhizome's Theresa Duncan CD-ROMs. You only discover the cost after it's too late to plan for it.
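The interaction of the three phases can be sketched as a toy cost model. Every number below is illustrative, not data; the point is the shape of the problem, an up-front lump, a recurring trickle, and an unpredictable spike:

```python
# Toy cost model for the three phases of preservation. All figures are
# illustrative placeholders. Future spending is discounted, since ingest is
# paid today while preservation and dissemination costs arrive over time.
def total_cost(ingest, preservation_per_year, baseline_access_per_year,
               years, spike_year=None, spike_cost=0.0, discount_rate=0.03):
    cost = ingest                       # capital-like, paid up front
    for t in range(1, years + 1):
        yearly = preservation_per_year + baseline_access_per_year
        if t == spike_year:
            yearly += spike_cost        # the "content goes viral" scenario
        cost += yearly / (1 + discount_rate) ** t
    return cost

quiet = total_cost(10_000, 500, 100, years=10)
viral = total_cost(10_000, 500, 100, years=10, spike_year=3, spike_cost=20_000)
print(round(viral - quiet, 2))  # a single spike dwarfs years of baseline access
```

Even in this crude sketch, one viral year costs more than a decade of routine dissemination, which is why the spike is so hard to budget for.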