Talk on LOCKSS Metadata Extraction at IIPC 2013

At the 2013 IIPC General Assembly I gave a brief introduction to the way the LOCKSS daemon extracts metadata from the content it collects. Below the fold is an edited text with links to the sources.

Talk on Harvesting the Future Web at IIPC2013

I gave a talk to introduce the workshop "Future Web Capture: Replay, Data Mining and Analysis" at the 2013 IIPC General Assembly. It was based on my talk at the Spring CNI meeting. Below the fold is an edited text with links to the sources.

Software obsolescence doesn't imply format obsolescence

Tim Anderson at The Register celebrates the 20th anniversary of Mosaic:
Using the DOSBox emulator (the Megabuild version which has network connectivity via an emulated NE2000 NIC) I ran up Windows 3.11 with Trumpet Winsock and got Mosaic 1.0 running.
This illustrates two important points:
  • Tim had no trouble resuscitating a 20-year-old software environment using off-the-shelf emulation.
  • The 20-year-old browser struggled to make sense of today's web. But today's browsers have no difficulty at all with vintage web pages.
The fact that the software that originally interpreted the content is obsolete (a) does not mean that there is significant difficulty in running it, and (b) does not mean that you need to use emulation to run it in order to interpret the content, because the obsolescence of the software does not imply the obsolescence of the format. Backwards compatibility is a feature of the Web, for reasons I have been pointing out for many years.

Moore, Kryder vs. SAW

Ashish Sood et al.'s paper Predicting the Path of Technological Innovation: SAW vs. Moore, Bass, Gompertz, and Kryder is very interesting. They propose a discontinuous model in which technology evolves in steps, separated by periods of stasis they call waits, leading them to dub the model SAW (Step And Wait). They show that it models the evolution of a wide range of technologies better than continuous models such as Moore's and Kryder's laws. Our work on the economics of long-term storage is based on Kryder's law, a continuous model. Below the fold I ask whether we need to change models.
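To make the contrast between the two kinds of model concrete, here is a minimal sketch in Python. It is not the model from the paper: the function names, the starting cost, the 20% annual drop and the fixed 4-year wait are all assumptions chosen for illustration, and the step model here is a deterministic toy in which every step has the same size and every wait the same length. The continuous curve improves a little every year; the step curve stays flat during each wait and then catches up in one jump.

```python
def continuous_cost(c0, annual_drop, years):
    """Continuous (Kryder-style) model: cost falls by the same fraction every year."""
    return c0 * (1.0 - annual_drop) ** years


def step_and_wait_cost(c0, annual_drop, years, wait):
    """Toy step model: cost is flat during each wait of `wait` years, then drops
    in a single step to where the continuous model would be at that point.
    Fixed step size and wait length are assumptions, not the paper's method."""
    completed_waits = years // wait
    return c0 * (1.0 - annual_drop) ** (completed_waits * wait)


if __name__ == "__main__":
    # Hypothetical parameters: 100 units of cost, 20%/year drop, 4-year waits.
    for year in range(9):
        smooth = continuous_cost(100.0, 0.2, year)
        stepped = step_and_wait_cost(100.0, 0.2, year, wait=4)
        print(f"year {year}: continuous {smooth:6.2f}  step-and-wait {stepped:6.2f}")
```

With these assumed parameters the two curves agree at years 0, 4 and 8, and in between the step curve sits above the continuous one, so a cost projection made during a wait would look very different depending on which model it used.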

Making Memento Successful

I gave a talk at the IIPC General Assembly on the problems facing Memento as it attempts the transition from a technology to a ubiquitous part of the Web's infrastructure. It was based on my earlier posts on Memento, my talk at the recent CNI and discussions with the Memento team, and intended to provide the background for subsequent talks from Herbert van de Sompel and Michael Nelson. Below the fold is an edited text with links to the sources.

It isn't just Kryder's Law

The drastic fall-off in PC shipments as demand switches to tablets isn't just affecting the prospects for Kryder's law reducing storage media costs, even for 2.5" drives:
PC sales are in terminal decline thanks to the continued popularity of tablets and there’s nothing an anticipated surge in ultramobiles can do to stop it.
Gartner has estimated that this year will see 2.4 billion devices shipped – that’s PCs, tablets and mobile phones combined – growing nine per cent over 2012.
The number of PCs sold in 2013 will fall 7.6 per cent compared to 2012, to 315 million units, with the only bright spot being ultramobiles, which will increase 140 per cent to 23 million units.
Tablet shipments will surge 69 per cent to 197 million units, while smartphones will make up an ever-increasing slice of the mobile phone pie. Of the 1.875 billion mobile phones Gartner predicts will be sold in 2013, a whopping 1 billion units are predicted to be smartphones, compared with 675 million units in 2012 (out of 1.746 billion).
It is also affecting the prospects for Moore's law reducing the costs of the servers that drive them:
But with memory prices stabilizing after years of double-digit drops, analysts said that DDR3 DRAM will likely have a longer-than-expected life, which could delay the wide adoption of DDR4 in computers. DRAM prices have stabilized as demand for DDR3 has exceeded supply, and the number of memory makers has also dwindled. ...
The volume shipments of PCs and servers are not enough to justify an early switch to DDR4, analysts said. Also, a lot of focus is now on the fast-growing tablet and smartphone markets, so manufacturers are shifting capacity to LPDDR3 and other forms of mobile memory and storage.

Talk at Spring 2013 CNI

Kris Carpenter Negulescu and I gave talks at the Spring 2013 CNI meeting in a project briefing entitled "It's Not Your Grandfather's Web Any Longer". They were based on the workshop we ran at the 2012 IIPC meeting at the Library of Congress looking at the problems of harvesting and preserving the future Web. I talked about the problems the workshop identified and Kris talked about the solutions people are working on. Below the fold is an edited text of my part of the talk with links to the sources.

More on Amazon's Margins

I'm not the only one doing the math to show the extortionate margins Amazon enjoys on its S3 cloud storage business. Over at The Register Simon Sharwood uses an announcement about Amazon's Cloud Drive service and a comparison with the competing Dropbox service, which runs on S3, to draw the same conclusion. He shows that, unless either Amazon or Dropbox is losing money, S3's costs must be much less than 3.7c/GB/mo:
5000 terabytes is 5,120,000 gigabytes. At $0.037 a gigabyte a month, Dropbox would have a bill of $189,440 a month. At $9.99 a month for 100 gigabytes of data, Dropbox needs 18,963 paying customers to meet that bill. 18,963 times 100 gigabytes is 1,896,296, which leaves 3,223,704 gigabytes of space Dropbox can dole out to its non-paying users. That's not 96 per cent of the capacity it pays for, but given Dropbox's customers at all levels probably don't use all their capacity it's not hard to see how Dropbox could get mighty close to a profit even if it pays AWS' published price, which we can't imagine it does.
So is Amazon making a profit on Cloud Drive's paid plans? AWS' genesis as Amazon's private cloud means it is sensible to assume Cloud Drive runs on S3 or something an awful lot like it, charged back between business units at low, low, mi casa es su casa prices. That could mean AWS can operate cloud storage space rather more cheaply than its advertised rates and almost certainly more cheaply than it charges even colossal customers like Dropbox.
Go read the whole piece.
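Sharwood's break-even arithmetic is easy to re-run. The sketch below uses only the figures in the quote (5000 TB stored, $0.037/GB/month, $9.99/month for 100 GB); the function and variable names are my own, and it assumes 1024 GB per TB as the quote does. It rounds the number of paying customers up to a whole number before multiplying, so the paid capacity comes out as 1,896,300 GB rather than the 1,896,296 GB in the quote, which multiplied the unrounded figure.

```python
import math

GB_PER_TB = 1024  # the quote treats 5000 TB as 5,120,000 GB


def dropbox_break_even(stored_tb, s3_price_per_gb, plan_price, plan_gb):
    """Return (monthly S3 bill, paying customers needed, GB left for free users)."""
    total_gb = stored_tb * GB_PER_TB
    monthly_bill = total_gb * s3_price_per_gb
    # Customers come in whole numbers, so round up.
    paying_customers = math.ceil(monthly_bill / plan_price)
    free_gb = total_gb - paying_customers * plan_gb
    return monthly_bill, paying_customers, free_gb


if __name__ == "__main__":
    bill, customers, free_gb = dropbox_break_even(5000, 0.037, 9.99, 100)
    print(f"Monthly S3 bill at list price: ${bill:,.0f}")
    print(f"Paying customers needed:       {customers:,}")
    print(f"Capacity left for free users:  {free_gb:,} GB")
```

On these numbers about 63% of the capacity would be available to non-paying users at S3's published price, which is the gap Sharwood is pointing at.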