I think the two most useful things I can do this morning are:
- A quick run-down of developments I'm aware of since the report came out.
- A summary of the key problem areas and recommendations from the report.
Emulators
First, the emulators themselves. Reports of new, enthusiast-developed emulators continue to appear. Among recent ones are:
- An emulator for the CDC6400 and much of the Cyber series, but alas not yet for the CDC6600 on which I did my Ph.D.
- Warren Toomey is working to revive Unix on the PDP-7, the machine on which I learnt computer graphics.
- Seth Morabito, who used to work with the LOCKSS Program, is working on an emulator for the AT&T 3B2, a PC-sized 32-bit machine built by AT&T starting in 1983 that ran Unix. It used technology from the duplex fault-tolerant computer at the heart of phone switches such as 5ESS. It was the machine on which the Unix System V reference implementation ran. I spent much time at Sun Microsystems merging SunOS and System V into System V Release 4.0.
Although QEMU running on an X86 tries hard to virtualize rather than emulate, it is capable of emulating and the team were able to force it into emulation mode. Using their tools, they were able to find and analyze 117 bugs in QEMU, and fix most of them. Their testing also triggered a bug in the VM BIOS:
But the VM BIOS can also introduce bugs of its own. In our research, as we addressed one of the disparities in the behavior of VCPUs and CPUs, we unintentionally triggered a bug in the VM BIOS that caused the 32-bit version of Windows 7 to display the so-called blue screen of death.
Having Intel validate the open source hypervisors, especially doing so by forcing them to emulate rather than virtualize, would be a big step forward. To what extent the validation process would test the emulation of the hardware features of legacy CPUs important for preservation is uncertain, though the fact that their verification caught a bug that was relevant only to Windows 7 is encouraging.
QEMU is supported via the Software Freedom Conservancy. The Conservancy supported Christoph Hellwig's lawsuit against VMware for GPL violations. As a result, it is apparently seeing corporate support evaporate, placing its finances in jeopardy.
The report discusses the problems GPUs pose for emulation and the efforts to provide paravirtualized GPU support in QEMU. This limited but valuable support is now mainstreamed in the Linux 4.4 kernel.
Mozilla, among others, has been working to change the way in which Web pages are rendered in the browser to exploit the capabilities of GPUs. Their experimental "servo" rendering engine gains a huge performance advantage by doing so. For us, this is a double-edged sword. It makes the browser dependent on GPU support in a way it wasn't before, and thus makes the task of browser emulations such as oldweb.today harder. If, on the other hand, it means that GPU capabilities will be exposed to WebAssembly, it raises the prospect of worthwhile GPU-dependent emulations running in browsers, further reducing the barrier to entry.
Collections
Third, the collections. The Internet Archive has continued to release collections of legacy software using Emularity. The Malware Museum, a collection of currently 47 viruses from the '80s and '90s, has proven very popular, with over 850K views in about 6 weeks. The Windows 3.X Showcase, a curated sample of the over 1500 Windows emulations in the collection, has received 380K views in the same period. It is particularly interesting because it includes a stock install of Windows 3.11. Despite that, the team has yet to receive a takedown request from Microsoft.
About the same time as my report, a team at Cornell led by Oya Rieger and Tim Murray produced a white paper for the National Endowment for the Humanities entitled Preserving and Emulating Digital Art Objects. I blogged about it. To summarize my post, I believe that outside their controlled "reading room" conditions the concern they express for experiential fidelity is underestimated, because smartphones and tablets are rapidly replacing PCs. But two other concerns, for emulator obsolescence and the fidelity of access to web resources, are overblown.
Tools
Fourth, the tools. The Internet Archive has a page describing how DOS software to be emulated can be submitted. Currently about 65 submissions a day are being received, despite the somewhat technical process it lays out. Each is given minimal initial QA to ensure that it comes up, and is then fed into the crowd-sourced QA process described in the report. It seems clear that improved tooling, especially automating the process via an interactive Web page that ran the emulation locally before submission, would result in more and better-quality submissions.
Internet of Things
The Internet of Things has been getting a lot of attention, especially the catastrophic state of IoT security. Updating the software of Things in the Internet to keep them even marginally secure is often impossible because the Things are so cheap there are no dollars for software support and updates, and because customers have no way to tell that one device is less insecure than another. This is exactly the problem faced by preserved software that connects to the Internet, as discussed in the report. Thus efforts to improve the security of the IoT and efforts such as Freiburg's to build an "Internet Emulator" to protect emulations of preserved software may be highly synergistic.
Off on a tangent, it is worth thinking about the problems of preserving the Internet of Things. The software and hardware are intimately linked, even more so than smartphone apps. So does preserving the Internet of Things reduce to preserving the Things in the Internet, or does emulation have a role to play?
The To-Do List
To refresh your memories, here are the highlights of the To-Do List that ends the report, with some additional commentary. I introduce the list by pointing out the downsides of the lack of standardization among the current frameworks, in particular:
- There will be multiple emulators and emulation frameworks, and they will evolve through time. Re-extracting or re-packaging preserved artefacts for different, or different versions of, emulators or emulation frameworks would be wasted effort.
- The most appropriate framework configuration for a given user will depend on many factors, including the bandwidth and latency of their network connection, and the capabilities of their device. Thus the way in which emulations are advertised to users, for example by being embedded in a Web page, should not specify a particular framework or configuration; this should be determined individually for each access.
- If the access paths to the emulations link directly to evanescent services emulating the preserved artefacts, not to the artefacts themselves, the preserved artefacts are not themselves discoverable or preservable.
In summary, the To-Do list was:
- Standardize Preserved System Images so that the work of preparing preserved system images for emulation will not have to be redone repeatedly as emulation technology evolves, and
- Standardize Access To System Images and
- Standardize Invoking Emulators so that the work of presenting emulations of preserved system images to the "reader" will not have to be redone repeatedly as emulation technology evolves.
- Improve Tools For Preserving System Images: The Internet Archive's experience shows that even minimal support for submission of system images can be effective. Better support should be a high priority. If the format of system images could be standardized, submissions would be available to any interested archive.
- Enhance Metadata Databases: these tools, and standardized methods for invoking emulators, rely on metadata databases, which need significant enhancement for this purpose.
- Support Emulators: The involvement of Intel in QA-ing QEMU is a major step forward, but it must be remembered that most emulations of old software depend on enthusiast-supported emulators such as MAME/MESS. Supporting ways to improve emulator quality, such as external code reviews to identify critical quality issues and a "bounty" program for fixing them, should be a high priority. It would be important that any such program be "bottom-up"; a "top-down" approach would not work in the enthusiast-dependent emulator world.
- Develop Internet Emulators: oldweb.today is already demonstrating the value of emulating software that connects to the Internet. Doing so carries significant risks, and developing technology to address them (before the risks become real and cause a backlash) needs high priority. The synergies between this and the security of the Internet of Things should be explored urgently.
- Tackle Legalities: As always, the legal issues are the hardest to address. I haven't heard that the PERSIST meeting in Paris last November came up with any new ideas in this area. The lack of a reaction to the Internet Archive's Windows 3.X Showcase is encouraging, and I'm looking forward to hearing whether others have made progress in this area.
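To make the earlier point about per-access configuration concrete, here is a minimal sketch of how a page might defer the choice of delivery mode until a reader actually requests an emulation. It is not how any existing framework works; the `AccessContext` fields, the mode names and the thresholds are all hypothetical, chosen only to show the shape of the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessContext:
    """Measurements taken for one access; all field names are hypothetical."""
    bandwidth_mbps: float   # measured downstream bandwidth
    latency_ms: float       # measured round-trip time to a remote emulation service
    can_run_locally: bool   # whether the reader's device can run an emulator itself


def choose_configuration(ctx: AccessContext) -> str:
    """Pick a delivery mode for a single access. Thresholds are illustrative only."""
    # A capable device with enough bandwidth to fetch the image emulates locally.
    if ctx.can_run_locally and ctx.bandwidth_mbps >= 5.0:
        return "local"
    # Otherwise use a remote emulator if latency is low enough for interaction.
    if ctx.latency_ms <= 100.0:
        return "remote"
    # Failing that, fall back to local emulation and accept a slow image download.
    if ctx.can_run_locally:
        return "local"
    return "unavailable"
```

The link the reader follows would then name only the preserved artefact, with something like `choose_configuration` run at access time, so that neither the framework nor its configuration is baked into the page.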