First, a few links to reinforce points that I made in the report:
One important assumption behind the use of emulation for preservation is that future hardware will be much more powerful than the hardware that originally ran the preserved digital artefact. Moore's Law used to make this a no-brainer for CPU performance and memory size. Although it has recently slowed, the long time scales implicit in preservation mean that these are still good bets. But the capabilities emulation needs are not limited to CPU and memory; they include the I/O resources needed for communication with the user. The report points out that this is no longer a good bet. Desktop and laptop sales are in free-fall and, as The Register reports, even tablet sales have been cratering over the last year. The hardware future users will use to interact with emulations will be a smartphone. It won't have a physical keyboard, and its display, and the pixels on it, will be much smaller. Most current emulations are unusable on a smartphone.
Nick Lee started a trend with his Mac emulator running on an Apple Watch. Hacking Jules has Nintendo 64 and PSP emulators running on his Android Wear watch. Not, of course, that these emulated games really recreate the experience of playing on a Nintendo 64 or a PSP. But, as with Nick Lee's Mac, they show that simply running an emulation is not that hard.
One thing that surprised me during the research for the report was that retro-gaming is a $200M/yr business. It just held a convention in Portland, complete with a keynote by Al Alcorn.
Some papers at iPRES2015 addressed issues that were raised in the report:
- Functional Access to Forensic Disk Images in a Web Service by Kam Woods et al describes using Freiburg's emulation-as-a-service on a collection of forensic disk images.
- Characterization of CDROMs for Emulation-based Access by Klaus Rechert et al is a paper I cited in the report, thanks to a pre-print from Klaus. It describes the DNB's efforts using Freiburg's EAAS to provide access to their collection of CD-ROM images. In particular, it describes an automated workflow for extracting the necessary technical metadata (a rough sketch of this kind of extraction follows this list).
- Getting to the Bottom Line: 20 Digital Preservation Cost Questions by Matt Schultz et al describes a resource to help institutions identify the full range of costs that might be associated with any particular digital preservation service. Cost is the single most important cause of the Half-Empty Archive, and one concern the report raises is that, absent better ingest tools, the per-artefact cost of emulation is too high.
- Dragan Espenschied's beautiful poster about the Theresa Duncan CD-ROMs is worth a look.
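As an aside on the Rechert et al paper: extracting technical metadata from a CD-ROM image largely means identifying the filesystem and reading its volume descriptors. The DNB's workflow does far more than this, but as a rough illustration here is a minimal Python sketch (my own, not their code) that reads the ISO 9660 primary volume descriptor from a raw image; cdrom.iso is just a placeholder name.

```python
import struct

SECTOR_SIZE = 2048          # ISO 9660 logical sector size
PVD_SECTOR = 16             # volume descriptors start at sector 16

def read_iso9660_metadata(path):
    """Return basic technical metadata from an ISO 9660 image, or None."""
    with open(path, "rb") as f:
        f.seek(PVD_SECTOR * SECTOR_SIZE)
        descriptor = f.read(SECTOR_SIZE)

    # Byte 0 is the descriptor type (1 = primary), bytes 1-5 are "CD001".
    if len(descriptor) < SECTOR_SIZE or descriptor[1:6] != b"CD001":
        return None                      # not an ISO 9660 image
    if descriptor[0] != 1:
        return None                      # not the primary volume descriptor

    return {
        "system_id": descriptor[8:40].decode("ascii", "replace").strip(),
        "volume_id": descriptor[40:72].decode("ascii", "replace").strip(),
        # Volume space size is a both-endian 32-bit field; little-endian half at 80.
        "volume_blocks": struct.unpack_from("<I", descriptor, 80)[0],
    }

if __name__ == "__main__":
    print(read_iso9660_metadata("cdrom.iso"))   # placeholder file name
```

A real workflow has to cope with much more than this: BIN/CUE dumps, hybrid HFS/ISO discs, and the metadata needed to choose a suitable emulated environment, which is why automating it matters.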
The Freiburg team have also been making their emulation-as-a-service framework easier to adopt by:
- Releasing their code under GPLv3.
- Making Docker images of the EAAS components (a minimal run sketch follows this list).
- Documenting the bootable USB image I mentioned in the report.
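The Docker images lower the barrier to standing up an EAAS instance for testing. I haven't checked what image names the Freiburg team actually publish, so the one below is purely a placeholder; a minimal sketch of pulling and starting such an image from Python might look like this:

```python
import subprocess

# Placeholder image name; substitute whatever the Freiburg team actually publish.
IMAGE = "example/eaas-server:latest"

def run_eaas_container(image=IMAGE, host_port=8080, container_port=8080):
    """Pull the image and start it detached, mapping the web UI port."""
    subprocess.run(["docker", "pull", image], check=True)
    completed = subprocess.run(
        ["docker", "run", "-d", "-p", f"{host_port}:{container_port}", image],
        check=True, capture_output=True, text=True,
    )
    return completed.stdout.strip()   # container ID printed by `docker run -d`

if __name__ == "__main__":
    print("started container", run_eaas_container())
```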
A paper at the recent SOSP by Nadav Amit et al entitled Virtual CPU Verification casts light on the causes and cures of fidelity failures in emulators. They observed that the problem of verifying virtualized or emulated CPUs is closely related to the problem of verifying a real CPU. Real CPU vendors sink huge resources into verifying their products, and this team from the Technion and Intel were able to base their research into X86 emulation on the tools that Intel uses to verify its CPU products.
Although QEMU running on an X86 tries hard to virtualize rather than emulate, it is capable of emulating, and the team were able to force it into emulation mode (there's a sketch of doing so at the end of this post). Using their tools, they were able to find and analyze 117 bugs in QEMU, and fix most of them. Their testing also triggered a bug in the VM BIOS:
But the VM BIOS can also introduce bugs of its own. In our research, as we addressed one of the disparities in the behavior of VCPUs and CPUs, we unintentionally triggered a bug in the VM BIOS that caused the 32-bit version of Windows 7 to display the so-called blue screen of death.

Their conclusion is worth quoting:
Hardware-assisted virtualization is popular, arguably allowing users to run multiple workloads robustly and securely while incurring low performance overheads. But the robustness and security are not to be taken for granted, as it is challenging to virtualize the CPU correctly, notably in the face of newly added features and use cases. CPU vendors invest a lot of effort—hundreds of person years or more—to develop validation tools, and they exclusively enjoy the benefit of having an accurate reference system. We therefore speculate that effective hypervisor validation could truly be made possible only with their help. We further contend that it is in their interest to provide such help, as the majority of server workloads already run on virtual hardware, and this trend is expected to continue. We hope that open source hypervisors will be validated on a regular basis by Intel Open Source Technology Center.

Having Intel validate the open source hypervisors, especially doing so by forcing them to emulate rather than virtualize, would be a big step forward. But note the focus on current uses of virtualization. To what extent the validation process would test the emulation of the hardware features of legacy CPUs important for preservation is uncertain. Though the fact that their verification caught a bug that was relevant only to Windows 7 is encouraging.
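For readers who want to experiment, QEMU can be forced into pure software emulation (its TCG backend) rather than hardware-assisted virtualization via KVM. This is only an illustrative sketch of launching QEMU both ways from Python, not the verification harness the paper describes, and guest.img is a placeholder for any bootable x86 disk image:

```python
import subprocess

DISK_IMAGE = "guest.img"   # placeholder; any bootable x86 disk image

def launch_qemu(use_tcg=True, memory_mb=512):
    """Launch QEMU, either forcing software emulation (TCG) or using KVM."""
    accel = "tcg" if use_tcg else "kvm"
    cmd = [
        "qemu-system-x86_64",
        "-machine", f"accel={accel}",   # tcg = emulate, kvm = virtualize
        "-m", str(memory_mb),
        "-drive", f"file={DISK_IMAGE},format=raw",
        "-nographic",                   # serial console instead of a GUI
    ]
    return subprocess.Popen(cmd)

if __name__ == "__main__":
    vm = launch_qemu(use_tcg=True)      # pure emulation, as in the paper's tests
    vm.wait()
```

The kvm setting needs access to /dev/kvm on the host; tcg runs anywhere, which is the mode that matters both for the paper's testing and for preservation on whatever hardware future users turn out to have.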