Wednesday, October 3, 2007

Update on Preserving the Record

In my post "Why Preserve E-Journals? To Preserve the Record" I used the example of government documents to illustrate why trusting web publishers to maintain an accurate record is fraught with dangers. The temptation to mount an "insider attack" to make the record less inconvenient or embarrassing is too much to resist.

Below the fold I report on two more examples, one from the paper world and one from the pre-web electronic world, showing the value of a tamper-evident record.

For the first example I'm indebted to Prof. Jeanine Pariser Plottel of Hunter College, who has compared the pre- and post-WWII editions of books published by right-wing authors in France and shown that the (right-wing) publishers sanitized the post-WWII editions to remove much of the anti-semitic rhetoric. Note that this analysis was possible only because the pre-WWII editions survived in libraries and private collections. They were widely distributed on durable, reasonably tamper-evident media. They survived invasion, occupation, counter-invasion and social disruption. It would have been futile for the publishers to claim that the pre-WWII editions had somehow been faked after the war to discredit the right. Prof. Plottel points to two examples of "common practice":

1. the books of Robert Brasillach (who was executed) edited by his brother-in-law Maurice Bardèche, Professor of 19th Century French Literature at the Sorbonne during the war, stripped of his post, after. The two men published an Histoire du cinéma in 1935. In subsequent editions published several times after the war beginning in 1947, the term "fascism" is replaced by "anti-communisme."

2. Lucien Rebatet's Les décombres (1942) was one of the best-sellers of the Occupation, and it is virulently anti-Semitic. A new expurgated version was later published under the title Mémoire d'un fasciste. Who was Rebatet? you ask. Relegated to oblivion, I hope. Still, you may remember Truffaut's film, Le dernier métro (wonderful and worth seeing, if you haven't). The character Daxiat is modeled upon Rebatet.

In a web-only world it would have been much easier for the publishers to sanitize history. Under the DMCA it would have been difficult for multiple libraries to keep copies of the original editions, and it must be doubtful whether any library copies would have survived the war. The publishers' changes would likely have remained undetected. Had they been detected, the critics would have been much easier to discredit.

The second example is here. This fascinating paper is based on Will Crowther's original source code for ADVENT, the pioneering work of interactive fiction that became, with help from Don Woods, the popular Adventure game. The author, Dennis Jerz, shows that the original was based closely on a real cave, part of Kentucky's Colossal Cave system. This observation was obscured by Don Woods' later improvements.

As the swift and comprehensive debunking of the allegations in SCO vs. IBM shows, archaeology of this kind for Open Source software is now routine and effective. This is because the code is preserved in third-party archives that use source code control systems descended from Marc Rochkind's 1972 SCCS and so provide a somewhat tamper-evident record. But Jerz shows that Crowther's original ADVENT dates from the 1975-6 academic year, when SCCS had yet to become widely used outside Bell Labs and the technology needed for third-party repositories was a decade in the future. Jerz's work therefore depended on Stanford's ability to recover data from 30-year-old backups of Don Woods' student account, an impressive feat of system administration! Don Woods vouches for the recovered code, so there's no suspicion that it isn't authentic.

How likely is it that other institutions could recover 30-year-old student files? Absent such direct testimony, how credible would allegedly recovered student files that old be? Yet they have provided important evidence for the birth of an entire genre of fiction.


Dennis G. Jerz said...

Yes, it was a fantastic stroke of luck that the files were accessible, even after I figured out who was the right person to ask. Adventure is going to be one of the case studies in the "Preserving Digital Worlds" project, funded by the Library of Congress. But there are many, many other digital artifacts that have already been lost.

A better URL for that Digital Humanities Quarterly article is

David. said...

Thank you! I fixed the link as you suggested.

David. said...

Anyone interested in Adventure needs to see Jason Scott's documentary about text-based games, Get Lamp.

David. said...

Simon Sharwood's "Source code for seminal adventure game Zork on dead mainframe exhumed onto GitHub" reports on another source code rescue of an important early game:

"Source code for seminal adventure game Zork has been recovered and published on GitHub.

While classic adventure games (aka interactive fiction) are well represented in the Internet Archive - there’s plenty of playable Zork versions here - this new trove is source code that’s been retrieved from the Massachusetts Institute of Technology, Tapes of Tech Square (ToTS) collection at the MIT Libraries Department of Distinctive Collections (DDC).