Thursday, February 11, 2016

James Jacobs on Looking Forward

The LOCKSS Program has long been involved with government documents. Recent history, such as that of the Harper administration in Canada, is full of examples of Winston Smith-style history editing by governments. This makes it essential that copies of government documents are maintained outside direct government custody, and several private LOCKSS networks are doing this for various kinds of government documents. Below the fold, a look at the US Federal Depository Library Program, which has been doing this in the paper world for a long time, and the state of its gradual transition to the digital world.

Tuesday, February 9, 2016

The Malware Museum

Mikko Hypponen and Jason Scott at the Internet Archive have put up the Malware Museum:
a collection of malware programs, usually viruses, that were distributed in the 1980s and 1990s on home computers. Once they infected a system, they would sometimes show animation or messages that you had been infected. Through the use of emulations, and additionally removing any destructive routines within the viruses, this collection allows you to experience virus infection of decades ago with safety.
The museum is an excellent use of emulation and well worth a visit.

I discussed the issues around malware in my report on emulation. The malware in the Malware Museum is too old to be networked, and thus avoids the really difficult issues raised by running old, and thus highly vulnerable, software with access to the network.

Even if emulation can ensure that only the virtual machine and not its host is infected, and users can be warned not to input any personal information to it, this may not be enough. The goal of the infection is likely to be to co-opt the virtual machine into a botnet, or to act as a Trojan on your network. If you run this vulnerable software you are doing something that a reasonable person would understand puts other people's real machines at risk. The liability issues of doing so bear thinking about.

Tuesday, February 2, 2016

Always read the fine print

When Amazon announced Glacier I took the trouble to read their pricing information carefully and wrote:
Because the cost penalties for peak access to storage and for small requests are so large ..., if Glacier is not to be significantly more expensive than local storage in the long term, preservation systems that use it will need to be carefully designed to rate-limit accesses and to request data in large chunks.
Now, 40 months later, Simon Sharwood at The Register reports that people who didn't pay attention are shocked that using Glacier can cost more in a month than enough disk to store the data 60 times over:
Last week, a chap named Mario Karpinnen took to Medium with a tale of how downloading 60GB of data from Amazon Web Services' archive-grade Glacier service cost him a whopping US$158.

Karpinnen went into the fine print of Glacier pricing and found that the service takes your peak download rate, multiplies the number of gigabytes downloaded in your busiest hour for the month and applies it to every hour of the whole month. His peak data retrieval rate of 15.2GB an hour was therefore multiplied by the $0.011 per gigabyte charged for downloads from Glacier. And then multiplied by the 744 hours in January. Once tax and bandwidth charges were added, in came the bill for $158.
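The arithmetic in the quoted passage can be checked with a short sketch. It uses only the figures quoted above (15.2GB in the busiest hour, $0.011 per gigabyte, 744 hours in January); the function name and parameter names are hypothetical, and the sketch deliberately leaves out the tax and bandwidth charges that the report says brought the bill up to $158.

```python
def peak_retrieval_fee(peak_gb_per_hour, price_per_gb=0.011, hours_in_month=744):
    """Retrieval fee as described in the quoted report: the busiest hour's
    download volume is billed as if it had been sustained for every hour
    of the month. Excludes tax and bandwidth charges (an assumption made
    to keep the sketch to the three quoted figures)."""
    return peak_gb_per_hour * price_per_gb * hours_in_month


# Figures quoted from The Register's account of Karpinnen's bill.
fee = peak_retrieval_fee(15.2)
print(f"Peak-rate retrieval fee: ${fee:.2f}")
```

On these figures the peak-rate charge alone comes to roughly $124, which is why a single burst of downloading dominates the bill, and why rate-limiting retrievals matters so much.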
Karpinnen's post is a cautionary tale for Glacier believers, but the real problem is he didn't look the gift horse in the mouth:
But doing the math (and factoring in VAT and the higher prices at AWS’s Irish region), I had the choice of either paying almost $10 a month for the simplicity of S3 or just 87¢/mo for what was essentially the same thing,
He should have asked himself how Amazon could afford to sell "essentially the same thing" for one-tenth the price. Why wouldn't all their customers switch? I asked myself this in my post on the Glacier announcement:
In order to have a competitive product in the long-term storage market Amazon had to develop a new one, with a different pricing model. S3 wasn't competitive.
As Sharwood says:
Karpinnen's post and Oracle's carping about what it says about AWS both suggest a simple moral to this story: cloud looks simple, but isn't, and buyer beware applies every bit as much as it does for any other product or service.
The fine print was written by the vendor's lawyers. They are not your friends.

Tuesday, January 26, 2016

Emulating Digital Art Works

Back in November a team at Cornell led by Oya Rieger and Tim Murray produced a white paper for the National Endowment for the Humanities entitled Preserving and Emulating Digital Art Objects. It was the result of two years of research into how continuing access could be provided to the optical disk holdings of the Rose Goldsen Archive of New Media Art at Cornell. Below the fold, some comments on the white paper.

Monday, January 18, 2016

Bitcoin's Death Spiral

More than two years ago in my first post on Bitcoin I wrote about the difficulty of maintaining its decentralized nature. Nearly a year later I wrote Economies of Scale in Peer-to-Peer Networks, a detailed explanation of why peer-to-peer currencies could not maintain decentralization for long. In a long and fascinating post Mike Hearn, one of the original developers of the Bitcoin software, has now announced that The resolution of the Bitcoin experiment is that it has failed.

The fundamental reasons for the failure are lack of decentralization at both the organizational and technical levels. You have to read Mike's post to understand the organizational issues, which would probably have doomed Bitcoin irrespective of the technical issues. They prevented Bitcoin responding to the need to increase the block size. But the block size is a minor technical issue compared to the fact that:
the block chain is controlled by Chinese miners, just two of whom control more than 50% of the hash power. At a recent conference over 95% of hashing power was controlled by a handful of guys sitting on a single stage.
As Mike says:
Even if a new team was built to replace Bitcoin Core, the problem of mining power being concentrated behind the Great Firewall would remain. Bitcoin has no future whilst it’s controlled by fewer than 10 people. And there’s no solution in sight for this problem: nobody even has any suggestions. For a community that has always worried about the block chain being taken over by an oppressive government, it is a rich irony.
Mike's post is a must-read. But reading it doesn't explain why "nobody even has any suggestions". For that you need to read Economies of Scale in Peer-to-Peer Networks.

Friday, January 15, 2016

The Internet is for Cats

It is a truth universally acknowledged that, after pr0n, the most important genre of content on the Internet is cat videos. But in the early days of the Web, there was no video. For sure, there was pr0n, but how did the Internet work without cat videos? Follow me below the fold for some research into the early history of Web content.

Wednesday, January 13, 2016

Guest post: Ilya Kreymer on

Recently, the remarkably productive Ilya Kreymer put up an emulation-based system for displaying archived Web pages using contemporary browsers. I mentioned it in my talk at the last CNI meeting, but I had misunderstood the details, so Ilya had to correct me.

Ilya's work is much more important than I originally realized. It isn't just a very good example of the way that emulation can layer useful services over archived content. It is also a different approach to delivering emulations, leveraging the current trend towards containers and thus less dependent on specialized, preservation-only technology.

I asked Ilya to write a guest post explaining how it works, which is below the fold.