Thus devoting a large proportion of the resources available for preservation to obsessively collecting metadata intended to ease eventual format migration was economically unjustifiable, for three reasons. First, the time value of money meant that paying the cost later would allow more content to be preserved. Second, the format might never suffer obsolescence, so the cost of preparing to migrate it would be wasted. Third, if the format ever did suffer obsolescence, the technology available to handle it when obsolescence occurred would be better than when it was ingested.
Below the fold, I ask how well the predictions have held up in the light of subsequent developments.
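The first two of the three reasons above can be made concrete with a little arithmetic. What follows is a minimal sketch, not a cost model: the function names and every number in it (the cost, the discount rate, the delay, and the probability of obsolescence) are hypothetical values chosen only for illustration.

```python
def present_value(cost: float, annual_rate: float, years: float) -> float:
    """Amount to set aside today to cover `cost` paid `years` from now,
    assuming the money earns `annual_rate` per year in the meantime."""
    return cost / (1.0 + annual_rate) ** years


def expected_deferred_cost(cost: float, annual_rate: float, years: float,
                           p_obsolete: float) -> float:
    """Present value of a deferred cost, weighted by the chance that the
    format ever becomes obsolete and the cost is actually incurred."""
    return p_obsolete * present_value(cost, annual_rate, years)


# Hypothetical figures, for illustration only.
upfront_cost = 100.0   # cost per item of preparing for migration at ingest
real_rate = 0.03       # assumed real annual return on preservation funds
delay_years = 15       # assumed time until obsolescence, if it happens
p_obsolete = 0.2       # assumed probability the format ever goes obsolete

deferred = present_value(upfront_cost, real_rate, delay_years)
expected = expected_deferred_cost(upfront_cost, real_rate, delay_years, p_obsolete)
print(f"pay at ingest: {upfront_cost:.2f}")
print(f"set aside now to pay later: {deferred:.2f}")
print(f"weighted by chance of obsolescence: {expected:.2f}")
```

Under any positive discount rate the deferred figure is smaller than the up-front one, and a probability of obsolescence below one shrinks it further. The sketch leaves out the third reason, that the tools available later are likely to be better and so cheaper to use.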
Research by Matt Holden at INA in 2012 showed that the vast majority of even 15-year-old audio-visual content was easily rendered with current tools. The audio-visual formats used in the early days of the Web would be among the most vulnerable to obsolescence. The Web site of the UK Web Archive's Interject prototype claims that these formats are obsolete and require migration:
- image/x-bitmap and image/x-pixmap, both rendered in my standard Linux environment via Image Viewer.
- x-world/x-vrml, versions 1 and 2, not rendered in my standard Linux environment, but migration tools available.
- ZX Spectrum software, not suitable for migration.
The prediction that if obsolescence were to happen to a widely used format it would happen very slowly is currently being validated, but not for the expected reason and not as a demonstration of the necessity of format migration. Adobe's Flash has been a very widely used Web format. It is not obsolete in the sense that it can no longer be rendered. It is becoming obsolete in the sense that browsers are following Steve Jobs' lead and deprecating its use, because it is regarded as too dangerous in today's Internet threat environment:
"Five years ago, 28.9% of websites used Flash in some way, according to Matthias Gelbmann, managing director at web technology metrics firm W3Techs. As of August, Flash usage had fallen to 10.3%.
"But larger websites have a longer way to go. Flash persists on 15.6% of the top 1,000 sites, Gelbmann says. That’s actually the opposite situation compared to a few years ago, when Flash was used on 22.2% of the largest sites, and 25.6% of sites overall."
If browsers won't support Flash because it poses an unacceptable risk to the underlying system, much of the currently preserved Web will become unusable. It is true that some of that preserved Web is Flash malware, thus simply asking the user to enable Flash in their browser is not a good idea. But if Web archives emulated a browser with Flash, either remotely or locally, the risk would be greatly reduced.
Even if the emulation fell victim to the malware, the underlying system would be at much less risk. If the goal of the malware was to use the compromised system as part of a botnet, the emulation's short life-cycle would render it ineffective. Users would have to be warned against entering any sensitive information that the malware might intercept, but it seems unlikely that many users would send passwords or other credentials via a historical emulation. And, because the malware was captured before the emulation was created, the malware authors would be unable to update it to target the emulator itself rather than the system it was emulating.
So, how did my predictions hold up?
- It is clear that obsolescence of widely used Web formats is rare. Flash is the only example in two decades, and it isn't obsolete in the sense that advocates of preemptive migration meant.
- It is clear that if it occurs, obsolescence of widely used Web formats is a very slow process. For Flash, it has taken half a decade so far, and isn't nearly complete.
- The technology for accessing preserved content has improved considerably. I'm not aware of any migration-based solution for safely accessing preserved Flash content. It seems very likely that a hypothetical technique for migrating Flash would migrate the malware as well, vitiating the reason for the migration.
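The pace of Flash's decline can be roughly quantified from the W3Techs figures quoted earlier. The sketch below assumes a steady compound decline between the two data points, which is a simplification; the function name is mine, and only the overall-usage figures are used because they are the only pair with a stated interval (five years).

```python
def annual_change(start_share: float, end_share: float, years: float) -> float:
    """Compound annual rate of change between two usage shares.

    A negative result means the share is shrinking.
    """
    return (end_share / start_share) ** (1.0 / years) - 1.0


# Overall Flash usage as quoted above: 28.9% five years ago, 10.3% as of August.
overall = annual_change(28.9, 10.3, 5)
print(f"overall Flash usage changed by {overall:.1%} per year")
```

On these two data points the share of sites using Flash shrank by a bit under a fifth each year.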