Tuesday, May 30, 2017

Blockchain as the Infrastructure for Science? (updated)

Herbert Van de Sompel pointed me to Lambert Heller's How P2P and blockchains make it easier to work with scientific objects – three hypotheses as an example of the persistent enthusiasm for these technologies as a way of communicating and preserving research, among other things. Another link from Herbert, Chris H. J. Hartgerink's Re-envisioning a future in scholarly communication from this year's IFLA conference, proposes something similar:
Distributing and decentralizing the scholarly communications system is achievable with peer-to-peer (p2p) Internet protocols such as dat and ipfs. Simply put, such p2p networks securely send information across a network of peers but are resilient to nodes being removed or adjusted because they operate in a mesh network. For example, if 20 peers have file X, removing one peer does not affect the availability of the file X. Only if all 20 are removed from the network, file X will become unavailable. Vice versa, if more peers on the network have file X, it is less likely that file X will become unavailable. As such, this would include unlimited redistribution in the scholarly communication system by default, instead of limited redistribution due to copyright as it is now.
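To put rough numbers on the redundancy argument in that quote, here is a minimal sketch (my illustration, not from either paper) of file availability under the optimistic assumption that peers fail independently; the peer counts and failure probability are made up:

```python
# Back-of-envelope availability of a file replicated on n peers,
# assuming each peer is independently unavailable with probability p.
# Both numbers below are illustrative assumptions, and independence is
# optimistic: real P2P networks show correlated failures and churn.

def file_unavailability(n_peers: int, p_peer_down: float) -> float:
    """Probability that all n replicas are unavailable at once."""
    return p_peer_down ** n_peers

for n in (1, 5, 20):
    print(f"{n:2d} peers, each down 10% of the time: "
          f"P(file unavailable) = {file_unavailability(n, 0.1):.2g}")
```

The exponential improvement only holds while peer failures really are independent, and that independence is exactly what the pressures discussed below tend to erode.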
I first expressed skepticism about this idea three years ago, when discussing a paper that proposed a P2P storage infrastructure called Permacoin. It hasn't taken over the world. [Update: my fellow Sun Microsystems alum Radia Perlman has a broader skeptical look at blockchain technology. I've appended some details.]

I understand the theoretical advantages of peer-to-peer (P2P) technology. But after nearly two decades researching, designing, building, deploying and operating P2P systems, I have learned a lot about how hard it is to actually obtain these theoretical advantages at scale, in the real world, for the long term. Below the fold, I try to apply these lessons.

Friday, May 26, 2017

I'm Doing It Wrong (personal)

Retirement, that is. Nearly six months into retirement from Stanford, I can see my initial visions of kicking back in a La-Z-Boy with time to read and think were unrealistic.

First, until you've been through it, you have no idea how much paperwork getting to be retired involves. I'm still working on it. Second, I'm still involved in the on-going evolution of the LOCKSS architecture, and I'm now working with the Internet Archive to re-think the economic model of long-term storage I built with students at UC Santa Cruz. Third, I have travel coming up, intermittently sick grandkids, and a lot of sysadmin debt built up over the years on our home network. I haven't even started on the mess in the garage.

This is just a series of feeble excuses for why, as Atrios likes to say, there will be "extra sucky blogging" for the next month or so. Sorry about that.

Thursday, May 18, 2017

"Privacy is dead, get over it" [updated]

I believe it was in 1999 that Scott McNealy famously said "privacy is dead, get over it". It is a whole lot deader now than it was then. A month ago in Researcher Privacy I discussed Sam Kome's CNI talk about the surveillance capabilities of institutional network technology such as central wireless and access proxies. There's so much more to report on privacy that, below the fold, I can offer no more than some suggested recent readings as an update to my six-month-old post Open Access and Surveillance. [See a major update at the end]

Tuesday, May 9, 2017

Another Class of Blockchain Vulnerabilities

For at least three years I've been pointing out a fundamental problem with blockchain systems, and indeed with peer-to-peer (P2P) systems in general: maintaining their decentralized nature in the face of economies of scale (network effects, Metcalfe's Law, ...) is pretty close to impossible. I wrote a detailed analysis of this issue in Economies of Scale in Peer-to-Peer Networks. Centralized P2P systems, in which a significant minority (or, in the case of Bitcoin, an actual majority) can act in coordination, perhaps because they are conspiring together, are vulnerable to many attacks. This was a theme of our SOSP "Best Paper" winner in 2003.
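As a reminder of why a coordinated majority is fatal, here is a minimal sketch (my illustration, using the standard gambler's-ruin analysis from the Bitcoin whitepaper, not the research discussed below) of the probability that an attacker controlling a fraction q of the hash power ever catches up with the honest chain from z blocks behind:

```python
# Gambler's-ruin catch-up probability from the Bitcoin whitepaper:
# an attacker with fraction q of the hash power, starting z blocks
# behind, catches up with probability (q/p)**z where p = 1 - q,
# and with certainty once q >= p.

def catch_up_probability(q: float, z: int) -> float:
    p = 1.0 - q
    if q >= p:               # a coordinated majority always wins
        return 1.0
    return (q / p) ** z

for q in (0.10, 0.30, 0.45, 0.51):
    print(f"attacker share {q:.2f}: "
          f"P(catch up from 6 blocks behind) = {catch_up_probability(q, 6):.3g}")
```

Below a majority the attack gets exponentially harder with z, which is precisely why the centralizing pressure of economies of scale matters so much.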

Now, Catalin Cimpanu at Bleeping Computer reports on research showing yet another way in which P2P networks can become vulnerable through centralization driven by economies of scale. Below the fold, some details.

Thursday, May 4, 2017

Tape is "archive heroin"

I've been boring my blog readers for years with my skeptical take on quasi-immortal media. Among the many, many reasons why long media life, such as that claimed for tape, is irrelevant to practical digital preservation is that investing in long-lived media is a bet against technological progress.

Now, at IEEE Spectrum, Marty Perlmutter's The Lost Picture Show: Hollywood Archivists Can’t Outpace Obsolescence is a great explanation of why tape's media longevity is irrelevant to long-term storage:
While LTO is not as long-lived as polyester film stock, which can last for a century or more in a cold, dry environment, it’s still pretty good.

The problem with LTO is obsolescence. Since the beginning, the technology has been on a Moore’s Law–like march that has resulted in a doubling in tape storage densities every 18 to 24 months. As each new generation of LTO comes to market, an older generation of LTO becomes obsolete. LTO manufacturers guarantee at most two generations of backward compatibility. What that means for film archivists with perhaps tens of thousands of LTO tapes on hand is that every few years they must invest millions of dollars in the latest format of tapes and drives and then migrate all the data on their older tapes—or risk losing access to the information altogether.

That costly, self-perpetuating cycle of data migration is why Dino Everett, film archivist for the University of Southern California, calls LTO “archive heroin—the first taste doesn’t cost much, but once you start, you can’t stop. And the habit is expensive.” As a result, Everett adds, a great deal of film and TV content that was “born digital,” even work that is only a few years old, now faces rapid extinction and, in the worst case, oblivion.
Note also that the required migration consumes a lot of drive bandwidth, meaning that in order to also supply the bandwidth needed to ingest incoming data you need many more drives. This reduces the tape/drive ratio, and thus decreases tape's apparent cost advantage. Not to mention that migrating data from tape to tape is far less automated, and thus far more expensive, than migrating between on-line media such as disk.
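To see how the migration burden eats into the tape/drive ratio, here is a back-of-envelope sketch; the archive size, drive throughput, duty cycle and migration window are all made-up assumptions, not figures from the article:

```python
# Back-of-envelope: drive time consumed by one periodic LTO migration.
# Every number here is an illustrative assumption, not a vendor figure.

PB = 10**15
YEAR_S = 365 * 24 * 3600

archive_bytes   = 25 * PB         # assumed: ~10,000 tapes at 2.5 TB each
drive_bps       = 300 * 10**6     # assumed sustained throughput per drive
migration_years = 3               # assumed window before compatibility runs out
duty_cycle      = 0.5             # assumed fraction of time a drive streams

read_seconds = archive_bytes / drive_bps
# Each stream ties up an old-generation drive to read and a
# new-generation drive to write, so roughly double the drive time.
total_drive_seconds = 2 * read_seconds

drives_needed = total_drive_seconds / (migration_years * YEAR_S * duty_cycle)
print(f"Drive-years consumed by one migration: {total_drive_seconds / YEAR_S:.1f}")
print(f"Drives tied up for {migration_years} years: {drives_needed:.1f}")
```

Those drives are unavailable for ingest and access during the migration, which is how the cycle quietly shrinks the tape/drive ratio that tape's cost advantage depends on.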

Tuesday, May 2, 2017

Distill: Is This What Journals Should Look Like?

A month ago a post on the Y Combinator blog announced that they and Google have launched a new academic journal called Distill. Except this is no ordinary journal consisting of slightly enhanced PDFs; it is a big step towards the way academic communication should work in the Web era:
The web has been around for almost 30 years. But you wouldn’t know it if you looked at most academic journals. They’re stuck in the early 1900s. PDFs are not an exciting form.

Distill is taking the web seriously. A Distill article (at least in its ideal, aspirational form) isn’t just a paper. It’s an interactive medium that lets users – “readers” is no longer sufficient – work directly with machine learning models.
Below the fold, I take a close look at one of the early articles to assess how big a step this is.