This document defines two URI schemes. The first, 'duri' (standing for "dated URI"), identifies a resource as of a particular time. This allows explicit reference to the "time of retrieval", similar to the way in which bibliographic references containing URIs are often written. The second scheme, 'tdb' ( standing for "Thing Described By"), provides a way of minting URIs for anything that can be described, by the means of identifying a description as of a particular time. These schemes were posited as "thought experiments", and therefore this document is designated as Experimental.As far as I can tell, this proposal went nowhere, but it raises a question that is also raised by NFTs. What is the point of a link that is unlikely to continue to resolve to the expected content? Below the fold I explore this question.
I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation.
Thursday, April 22, 2021
What Is The Point?
During a discussion of NFTs, Larry Masinter pointed me to his 2012 proposal The 'tdb' and 'duri' URI schemes, based on dated URIs. The proposal's abstract reads:
Thursday, April 15, 2021
NFTs and Web Archiving
One of the earliest observations of the behavior of the Web at scale was "link rot". There were a lot of 404s, broken links. Research showed that the half-life of Web pages was alarmingly short. Even in 1996 this problem was obvious enough for Brewster Kahle to found the Internet Archive to address it. From the Wikipedia entry for Link Rot:
I expect you have noticed the latest outbreak of blockchain-enabled insanity, Non-Fungible Tokens (NFTs). Someone "paying $69M for a JPEG" or $560K for a New York Times column attracted a lot of attention. Follow me below the fold for the connection between NFTs, "link rot" and Web archiving.
A 2003 study found that on the Web, about one link out of every 200 broke each week,[1] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.[2]One might have thought that academic journals were a relatively stable part of the Web, but research showed that their references decayed too, just somewhat less rapidly. A 2013 study found a half-life of 9.3 years. See my 2015 post The Evanescent Web.
I expect you have noticed the latest outbreak of blockchain-enabled insanity, Non-Fungible Tokens (NFTs). Someone "paying $69M for a JPEG" or $560K for a New York Times column attracted a lot of attention. Follow me below the fold for the connection between NFTs, "link rot" and Web archiving.
Tuesday, April 13, 2021
Cryptocurrency's Carbon Footprint
China’s bitcoin mines could derail carbon neutrality goals, study says and Bitcoin mining emissions in China will hit 130 million tonnes by 2024, the headlines say it all. Excusing this climate-destroying externality of Proof-of-Work blockchains requires a continuous flow of new misleading arguments. Below the fold I discuss one of the more recent novelties.
Tuesday, April 6, 2021
Elon Musk: Threat or Menace?
Although both Tesla and SpaceX are major engineering achievements, Elon Musk seems completely unable to understand the concept of externalities, unaccounted-for costs that society bears as a result of these achievements.
First, in Tesla: carbon offsetting, but in reverse, Jaime Powell reacted to Tesla taking $1.6B in carbon offsets which provided the only profit Tesla ever made and putting them into Bitcoin:
First, in Tesla: carbon offsetting, but in reverse, Jaime Powell reacted to Tesla taking $1.6B in carbon offsets which provided the only profit Tesla ever made and putting them into Bitcoin:
Looked at differently, a single Bitcoin purchase at a price of ~$50,000 has a carbon footprint of 270 tons, the equivalent of 60 ICE cars.Below the fold, more externalities Musk is ignoring.
Tesla’s average selling price in the fourth quarter of 2020? $49,333.
We’re not sure about you, but FT Alphaville is struggling to square the circle of “buy a Tesla with a bitcoin and create the carbon output of 60 internal combustion engine cars” with its legendary environmental ambitions.
Unless, of course, that was never the point in the first place.
Thursday, March 25, 2021
Internet Archive Storage

Jonah Edwards, who runs the Core Infrastructure team, gave a presentation on the Internet Archive's storage infrastructure to the Archive's staff. Below the fold, some details and commentary.
Tuesday, March 16, 2021
Correlated Failures
The invaluable statistics published by Backblaze show that, despite being built from technologies close to the physical limits (Heat-Assisted Magnetic Recording, 3D NAND Flash), modern digital storage media are extraordinarily reliable. However, I have long believed that the models that attempt to project the reliability of digital storage systems from the statistics of media reliability are wildly optimistic. They ignore foreseeable causes of data loss such as Coronal Mass Ejections and ransomware attacks, which cause correlated failures among the media in the system. No matter how many they are, if all replicas are destroyed or corrupted the data is irrecoverable.
Modelling these "black swan" events is clearly extremely difficult, but much less dramatic causes are in practice important too. It has been known at least since Talagala's 1999 Ph.D. thesis that media failures in storage systems are significantly correlated, and at least since Jiang et al's 2008 Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics that only about half the failures in storage systems are traceable to media failures. The rest happen in the pipeline from the media to the CPU. Because this typically aggregates data from many media components, it naturally causes correlations.
As I wrote in 2015's Disk reliability, discussing Backblaze's experience of a 40% Annual Failure Rate (AFR) in over 1,100 Seagate 3TB drives:
Modelling these "black swan" events is clearly extremely difficult, but much less dramatic causes are in practice important too. It has been known at least since Talagala's 1999 Ph.D. thesis that media failures in storage systems are significantly correlated, and at least since Jiang et al's 2008 Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics that only about half the failures in storage systems are traceable to media failures. The rest happen in the pipeline from the media to the CPU. Because this typically aggregates data from many media components, it naturally causes correlations.
As I wrote in 2015's Disk reliability, discussing Backblaze's experience of a 40% Annual Failure Rate (AFR) in over 1,100 Seagate 3TB drives:
Alas, there is a long history of high failure rates among particular batches of drives. An experience similar to Backblaze's at Facebook is related here, with an AFR over 60%. My first experience of this was nearly 30 years ago in the early days of Sun Microsystems. Manufacturing defects, software bugs, mishandling by distributors, vibration resonance, there are many causes for these correlated failures.Despite plenty of anecdotes, there is little useful data on which to base models of correlated failures in storage systems. Below the fold I summarize and comment on an important paper by a team from the Chinese University of Hong Kong and Alibaba that helps remedy this.
Thursday, March 4, 2021
History Of Window Systems
Alan Kay's Should web browsers have stuck to being document viewers? makes important points about the architecture of the infrastructure for user interfaces, but also sparked comments and an email exchange that clarified the early history of window systems. This is something I've wrtten about previously, so below the fold I go into considerable detail.
Subscribe to:
Posts (Atom)