Thursday, February 22, 2018

Brief Talk at Video Game Preservation Workshop

I was asked to give a brief talk to the Video Game Preservation Workshop: Setting the Stage for Multi-Partner Projects at the Stanford Library, discussing the technical and legal aspects of cooperation on preserving software via emulation. Below the fold is an edited text of the talk with links to the sources.

On the basis of the report I wrote on Emulation and Virtualization as Preservation Strategies two years ago, I was asked to give a brief talk today. That may have been a mistake; I retired almost a year ago and I haven't been following developments in the field closely. But I'll do my best and I'm sure you will let me know where I'm out-of-date. As usual, you don't need to take notes, the text of what follows with links to the sources will go up on my blog at the end of this session.

I don't have much time, so I'll just cover a few technical, legal and business points, then open it up for questions (and corrections).

Technical

First, a point I've been making for a long time. Right now, we're investing resources into building emulations to make preserved software accessible. But we're not doing it in a preservable way. We're wrapping the hardware metadata and the system image in a specific emulator, probably bwFLA or Emularity. This is a bad idea, both because emulation technology will evolve, and because your collaborators may want to use a different technology right now.

I wrote a detailed post about this a year ago, using the analogy of PDF. We didn't wrap PDFs in Adobe Reader, and then have to re-wrap them all in pdf.js. We exposed the metadata and the content so that, at rendering time, the browser could decide on the most appropriate renderer. I wrote:
The linked-to object that the browser obtains needs to describe the hardware that should be emulated. Part of that description must be the contents of the disks attached to the system. So we need two MimeTypes:
  • A metadata MimeType, say Emulation/MachineSpec, that describes the architecture and configuration of the hardware, which links to one or more resources of:
  • A disk image MimeType, say DiskImage/qcow2, with the contents of each of the disks.
Then the browser can download "emul.js", JavaScript that can figure out how to configure and invoke the appropriate emulator for that particular rendering.
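To make that division of labor concrete, here is a minimal sketch in Python of what an "emul.js"-style dispatcher might do at rendering time. The MachineSpec fields, the emulator registry, and the architecture names are all hypothetical illustrations, not part of any existing standard:

```python
import json

# Hypothetical registry mapping emulated architectures to the emulators
# available at rendering time. Entries are illustrative only.
EMULATORS = {
    "i386-pc": ["qemu", "jsdos"],
    "apple2": ["apple2js"],
    "pdp-7": ["simh"],
}

def choose_emulator(machinespec_json, available):
    """Pick an emulator for a MachineSpec (MimeType Emulation/MachineSpec).

    The spec describes the hardware; the disk images it links to
    (MimeType DiskImage/qcow2) are passed through untouched, so a
    future emulator can reuse the same preserved images.
    """
    spec = json.loads(machinespec_json)
    for emulator in EMULATORS.get(spec["architecture"], []):
        if emulator in available:
            return emulator, spec["disks"]
    raise RuntimeError("no emulator for " + spec["architecture"])

spec = json.dumps({
    "architecture": "apple2",
    "memory_kb": 48,
    "disks": ["visicalc.qcow2"],
})
print(choose_emulator(spec, available={"apple2js", "qemu"}))
# → ('apple2js', ['visicalc.qcow2'])
```

The point of the sketch is that the preserved objects (the spec and the disk images) never mention a particular emulator; only the dispatch table, which lives with the renderer, does.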

Second, one of the things "emul.js" is going to have to work out is how to emulate the UI devices of the original hardware, and how to communicate this mapping to the user. Almost a half-century ago I was quite good at SpaceWar on the PDP-7. The Internet Archive's emulation is impressive, but the UI is nothing like the knobs and switches I used back in the day, let alone the huge round calligraphic CRT. As the desktop and laptop gradually die out, the set of controls available to "emul.js" becomes sparser, and more different from the originals. This could be an issue for a collaboration in which, for example, one partner wanted a kiosk for access and another wanted a phone.
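One way to think about this is as a table of fallback chains from original input devices to whatever controls the client actually has, which also lets the UI tell the user which modern control stands in for which original device. The device and control names below are my own invention, purely for illustration:

```python
# Hypothetical fallback chains from original input devices to the
# controls a modern client might offer. None of this is standardized.
FALLBACKS = {
    "rotary-knob": ["mouse-wheel", "arrow-keys", "touch-drag"],
    "toggle-switch": ["keyboard-key", "touch-tap"],
    "light-pen": ["mouse", "touch-tap"],
}

def map_controls(original_devices, client_controls):
    """Return a device -> control mapping, so the renderer can both
    wire up the emulator and explain the substitutions to the user."""
    mapping = {}
    for device in original_devices:
        for control in FALLBACKS.get(device, []):
            if control in client_controls:
                mapping[device] = control
                break
        else:
            mapping[device] = None  # no substitute: warn the user
    return mapping

# A phone offers touch but no keyboard or mouse:
print(map_controls(["rotary-knob", "toggle-switch"],
                   {"touch-drag", "touch-tap"}))
# → {'rotary-knob': 'touch-drag', 'toggle-switch': 'touch-tap'}
```

The further down each chain the client has to go, the less the experience resembles the original, which is exactly the kiosk-versus-phone tension described above.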

Third, the eventual user needs more than just some keymappings. VisiCalc for the Apple ][ is usable in emulation only because Dan Bricklin put the reference card up on his website. I defy anyone trained on Excel to figure it out without the card. Games are supposed to be "self-teaching" but this relies on a lot of social context that will be quite different a half-century later. How is this contextual metadata to be collected, preserved and presented to the eventual user?
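One possible shape for that contextual metadata is simply a documentation manifest preserved alongside the system image, which the renderer can check and present. The field names here are my own invention, not a schema anyone has adopted:

```python
# A hypothetical manifest bundling the context a future user will need.
visicalc_context = {
    "title": "VisiCalc",
    "platform": "Apple ][",
    "reference_card": "visicalc-refcard.pdf",  # Dan Bricklin's card
    "manuals": ["visicalc-manual.pdf"],
    "notes": "The command key is '/'; nothing on screen hints at this.",
}

def missing_context(manifest):
    """List the documentation the eventual user would have to do without."""
    required = ["reference_card", "manuals", "notes"]
    return [field for field in required if not manifest.get(field)]

print(missing_context(visicalc_context))
# → [] (this record is complete)
```

Collecting these materials at ingest time, while the social context still exists, is the cheap moment to do it; a half-century later it may be impossible.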

Fourth, network access by preserved software poses a lot of very difficult problems. If you haven't read Familiarity Breeds Contempt: The Honeymoon Effect and the Role of Legacy Code in Zero-Day Vulnerabilities by Sandy Clark, Matt Blaze, Stefan Frei and Jonathan Smith, you really need to. They show that the rate of detection of vulnerabilities in software goes up with time. Exploits for these vulnerabilities on the Internet never go away. Preserved software will have vast numbers of vulnerabilities that are under active exploitation on the Internet. And, worse, it will still have zero-days waiting to be discovered. Adequately firewalling preserved software from the net will be extremely difficult, especially in a collaborative setting where partners want to access it over the net.

Legal

Which brings me to my first legal point. Note that I am not a lawyer, and that the following does not apply to preserving open-source software (thank you, Software Heritage, for stepping up to do so).

Software makers disclaim liability for the vulnerabilities in their products. Users in effect disclaim liability for vulnerabilities in things they connect to the net by claiming that they follow "best practice" by keeping them patched up-to-date. We're going to be treading new legal ground in this area, because Spectre is a known vulnerability that cannot be patched in software, and for which there will not be fixed hardware for some time. Where does the liability for the inevitable breaches end up? With the cloud provider? With the cloud user who was too cheap to pay for dedicated hosts?

Archives are going to connect known-vulnerable software that cannot be patched to the net. The firewall between the software and the net will disclaim liability. Are your lawyers happy that you can disclaim liability for the known risk that your preserved software will be used to attack someone? And are your sysadmins going to let you connect known-vulnerable systems to their networks?

Of course, the two better-known legal minefields surrounding old software are copyright and the End User License Agreement (EULA). I wrote about both in my report. In both areas the law is an ass; on a strict interpretation of the law almost anyone preserving closed-source software would incur criminal liability under the DMCA and civil liability under both copyright and contract law. Further, collaborations to preserve software would probably be conspiracies in criminal law, and the partners might incur joint and several liability under civil law. All it would take would be a very aggressive prosecutor or plaintiff and the wrong judge to make these theoretical possibilities real enough to give your institution's lawyers conniptions.

Copyright and EULAs apply separately to each component of the software stack, and to each partner in a collaboration. Games are typically a simple case, with only the game and an OS in the stack. But even then, a three-partner collaboration ends up with a table like this:

Partners' Rights (C = covered by copyright, E = covered by EULA)

         Partner A   Partner B   Partner C
  OS        C E         C E         C E
  Game      C E         C E         C E
There are only a few closed-source OSes left, so it's likely that all partners have a site license for the OS, giving them cover under both copyright and EULA for what they want to do. But suppose partner A is a national library that acquired the game under copyright deposit, and thus never agreed to an EULA. Partner C purchased the game, so is covered under both (depending on the terms of the EULA). But partner B got the game in the donated papers of a deceased faculty member.

There's a reason why in the LOCKSS system that Vicky Reich & I designed nearly two decades ago each library's LOCKSS box gets its own copy of the journals from the publisher under its own subscription agreement. Subsequently, the box proves to other boxes that they have the same content, so that copying content from one to another to repair damage isn't creating a new, leaked copy. In a collaboration where one partner is responsible for ingesting content, and providing access to the other partners, new leaked copies are being created, potentially violating copyright, and used, potentially violating the EULA.

Even if the law were both clear and sensible, this would be tricky. As things are it is too hard to deal with. In the meantime, the best we can do is to define, promulgate, and adhere to reasonable "best practices". And to lobby the Copyright Office for exemptions, as the MADE (Museum of Arts and Digital Entertainment) and Public Knowledge are doing. They are asking for an exemption to allow museums and archives to run servers for abandoned online multi- and single-player games.

Business

In my panel presentation at iPRES 2016 I said:
Is there a business model to support emulation services? This picture is very discouraging. Someone is going to have to pay the cost of emulation. As Rhizome found when the Theresa Duncan CD-ROMs went viral, if the end user doesn't pay there can be budget crises. If the end user does pay, it's a significant barrier to use, and it starts looking like it is depriving the vendor of income. Some vendors might agree to cost-recovery charging. But typically there are multiple vendors involved. Consider emulating an old Autodesk on Windows environment. That is two vendors. Do they both agree to the principle of cost-recovery, and to the amount of cost-recovery?
Somehow, all three phases of preservation need to be paid for:
  • Ingest: is expensive and, obviously, has to be paid up-front. That makes it look like a capital investment. The return on the investment will be in the form of future accesses, which are uncertain. This makes it hard to justify, leading to skimping on the expensive parts, such as metadata generation, which leads to less access, which leads to less justification for ingest.
  • Preservation: may not be that expensive each year, but it is an on-going cost whose total is hard to predict.
  • Dissemination: is typically not expensive but it can spike if the content gets popular, as it did with Rhizome's Theresa Duncan CD-ROMs. You only discover the cost after it's too late to plan for it.
Most content will not be popular, and this is especially true if it is pay-per-view, thus most content is unlikely to earn enough to pay for its ingest and preservation. Taking into account the legal complexities, charging for access doesn't seem like a viable business model for software archives. But what other sustainable business models are there?

4 comments:

David. said...

"the Entertainment Software Association, ... has come out in opposition to preserving online games, arguing that such preservation is a threat to the industry." reports Timothy Geigner at TechDirt, before shredding their arguments:

"the final complaint about these museums making these games playable when gaming companies normally charge for them is nearly enough to make one's head explode. Museums like MADE wouldn't be preserving these games except for the fact that these games are no longer operated by the gaming studios, for a fee or otherwise. What ESA's opposition actually says is that making older online games playable will compete in the market with its constituent studios' new online games. To that, there seems to be a simple solution: the game companies should keep these older online games viable themselves, then. After all, if a decade-old MMORPG can still compete with newer releases, then clearly there is money in keeping that older game up and running. Notably, game publishers aren't doing that."

David. said...

A video game-playing AI beat Q*bert in a way no one's ever seen before by James Vincent at The Verge reports that:

"They were exploring a particular method of teaching AI agents to navigate video games (in this case, desktop ports of old Atari titles from the 1980s) when they discovered something odd. The software they were testing discovered a bug in the port of the retro video game Q*bert that allowed it to rack up near infinite points."

David. said...

Two quick links about game preservation.

David Pescovitz' Meet vintage videogaming's archivist extraordinaire is about:

"Frank Cifaldi, one of the world's leading collectors of rare vintage videogames and related ephemera. ... Cifaldi founded the Video Game History Foundation, dedicated to preserving this vibrant art form's history and culture for the ages."

Jared Newman's Meet The Hardware Artisans Keeping Classic Video Games Alive covers Analogue and other companies building Super Nintendo compatible hardware that outputs HDTV, so the games look really good.