Thursday, March 24, 2016

Long Tien Nguyen & Alan Kay's "Cuneiform" System

Jason Scott points me to Long Tien Nguyen and Alan Kay's paper from last October entitled The Cuneiform Tablets of 2015. It describes what is in effect a better implementation of Raymond Lorie's Universal Virtual Computer. They attribute the failure of the UVC to its complexity:
They tried to make the most general virtual machine they could think of, one that could easily emulate all known real computer architectures. The resulting design has a segmented memory model, bit-addressable memory, and an unlimited number of registers of unlimited bit length. This Universal Virtual Computer requires several dozen pages to be completely specified and explained, and requires far more than an afternoon (probably several weeks) to be completely implemented.
They are correct that the UVC was too complicated, but the reasons why it was a failure are far more fundamental and, alas, apply equally to Chifir, the much simpler virtual machine they describe. Below the fold, I set out these reasons.
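To make the contrast concrete, here is a rough sketch of the kind of machine that can plausibly be specified in a page or two and implemented in an afternoon: a flat word-addressed memory, a program counter, and a handful of fixed-format instructions. This is not Chifir's actual instruction set (that is specified in the paper); it is only an illustration of the scale of simplicity the authors are aiming for.

```python
# A minimal sketch of a word-addressed virtual machine: flat memory of
# integers, a program counter, and a few three-operand instructions.
# Not Chifir's real instruction set; purely illustrative.

def run(memory, pc=0, max_steps=1_000_000):
    """Interpret instructions laid out as 4-word cells: [opcode, a, b, c]."""
    for _ in range(max_steps):
        op, a, b, c = memory[pc:pc + 4]
        if op == 0:                        # HALT
            return memory
        elif op == 1:                      # ADD:  mem[a] = mem[b] + mem[c]
            memory[a] = memory[b] + memory[c]
        elif op == 2:                      # SUB:  mem[a] = mem[b] - mem[c]
            memory[a] = memory[b] - memory[c]
        elif op == 3:                      # JZ:   if mem[b] == 0, jump to a
            if memory[b] == 0:
                pc = a
                continue
        elif op == 4:                      # OUT:  print mem[a]
            print(memory[a])
        else:
            raise ValueError(f"unknown opcode {op} at address {pc}")
        pc += 4
    raise RuntimeError("step limit exceeded")

# Example program: compute 2 + 3, print the result, halt.
program = [
    1, 12, 13, 14,   # ADD  mem[12] = mem[13] + mem[14]
    4, 12,  0,  0,   # OUT  mem[12]
    0,  0,  0,  0,   # HALT
    0,  2,  3,       # data at addresses 12, 13, 14
]
run(program)
```

A machine of roughly this shape is easy to re-implement from a written specification, which is the property the authors care about; whether that property addresses the real threats to long-term survival is the question below.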

The reasons are strongly related to the reason why the regular announcements of new quasi-immortal media have had almost no effect on practical digital preservation. And, in fact, the paper starts by assuming the availability of a quasi-immortal medium in the form of a Rosetta Disk. So we already know that each preserved artefact they create will be extremely expensive.

Investing money and effort now in things that only pay back in the far distant future is simply not going to happen on any scale because the economics don't work. So at best you can send an insignificant amount of stuff on its journey to the future. By far the most important reason digital artefacts, including software, fail to reach future scholars is that no-one could afford to preserve them. Suggesting an approach whose costs are large and totally front-loaded implicitly condemns a vastly larger amount of content of all forms to oblivion because the assumption of unlimited funds is untenable.

It's optimistic, to say the least, to think you can solve all the problems that will happen to stuff in the next, say, 1000 years in one fell swoop - you have no idea what the important problems are. The Cuneiform approach assumes that the problems are (a) long-lived media and (b) the ability to recreate an emulator from scratch. These are problems, but there are many other problems we can already see facing the survival of software over the next millennium. And it's very unlikely that we know all of them, or that our assessment of their relative importance is correct.

Preservation is a continuous process, not a one-off thing. Getting as much as you can to survive the next few decades is do-able - we have a pretty good idea what the problems are and how to solve them. At the end of that time, technology will (we expect) be better and cheaper, and we will understand the problems of the next few decades. The search for a one-time solution is a distraction from the real, continuing task of preserving our digital heritage.

And I have to say that analogizing a system designed for careful preservation of limited amounts of information for the very long term with cuneiform tablets is misguided. The tablets were not designed or used to send information to the far future. They were the equivalent of paper, a medium for recording information that was useful in the near future, such as for accounting and recounting stories. Although:
Between half a million and two million cuneiform tablets are estimated to have been excavated in modern times, of which only approximately 30,000 – 100,000 have been read or published.
The probability that an individual tablet would have survived to be excavated in the modern era is extremely low. A million or so survived, many millions more didn't. The authors surely didn't intend to propose a technique for getting information to the far future with such a low probability of success.
