Friday, May 26, 2017

I'm Doing It Wrong (personal)

Retirement, that is. Nearly six months into retirement from Stanford, I can see my initial visions of kicking back in a La-Z-Boy with time to read and think were unrealistic.

First, until you've been through it, you have no idea how much paperwork getting to be retired involves. I'm still working on it. Second, I'm still involved in the on-going evolution of the LOCKSS architecture, and I'm now working with the Internet Archive to re-think the economic model of long-term storage I built with students at UC Santa Cruz. Third, I have travel coming up, intermittently sick grandkids, and a lot of sysadmin debt built up over the years on our home network. I haven't even started on the mess in the garage.

This is just a series of feeble excuses for why there will be, as Atrios likes to say, "extra sucky blogging" for the next month or so. Sorry about that.

Thursday, May 18, 2017

"Privacy is dead, get over it" [updated]

I believe it was in 1999 that Scott McNealy famously said "privacy is dead, get over it". It is a whole lot deader now than it was then. A month ago in Researcher Privacy I discussed Sam Kome's CNI talk about the surveillance abilities of institutional network technology such as central wireless and access proxies. There's so much more to report on privacy that below the fold there can't be more than some suggested recent readings, as an update to my 6-month-old post Open Access and Surveillance. [See a major update at the end]

Tuesday, May 9, 2017

Another Class of Blockchain Vulnerabilities

For at least three years I've been pointing out a fundamental problem with blockchain systems, and indeed peer-to-peer (P2P) systems in general, which is that maintaining their decentralized nature in the face of economies of scale (network effects, Metcalfe's Law, ...) is pretty close to impossible. I wrote a detailed analysis of this issue in Economies of Scale in Peer-to-Peer Networks. Centralized P2P systems, in which a significant minority (or in the case of Bitcoin an actual majority) can act in coordination perhaps because they are conspiring together, are vulnerable to many attacks. This was a theme of our SOSP "Best Paper" winner in 2003.

Now, Catalin Cimpanu at Bleeping Computer reports on research showing yet another way in which P2P networks can become vulnerable through centralization driven by economies of scale. Below the fold, some details.

Thursday, May 4, 2017

Tape is "archive heroin"

I've been boring my blog readers for years with my skeptical take on quasi-immortal media. Among the many, many reasons why long media life, such as claimed for tape, is irrelevant to practical digital preservation is that investing in long media life is a bet against technological progress.

Now, at IEEE Spectrum, Marty Perlmutter's The Lost Picture Show: Hollywood Archivists Can’t Outpace Obsolescence is a great explanation of why tape's media longevity is irrelevant to long-term storage:
While LTO is not as long-lived as polyester film stock, which can last for a century or more in a cold, dry environment, it’s still pretty good.

The problem with LTO is obsolescence. Since the beginning, the technology has been on a Moore’s Law–like march that has resulted in a doubling in tape storage densities every 18 to 24 months. As each new generation of LTO comes to market, an older generation of LTO becomes obsolete. LTO manufacturers guarantee at most two generations of backward compatibility. What that means for film archivists with perhaps tens of thousands of LTO tapes on hand is that every few years they must invest millions of dollars in the latest format of tapes and drives and then migrate all the data on their older tapes—or risk losing access to the information altogether.

That costly, self-perpetuating cycle of data migration is why Dino Everett, film archivist for the University of Southern California, calls LTO “archive heroin—the first taste doesn’t cost much, but once you start, you can’t stop. And the habit is expensive.” As a result, Everett adds, a great deal of film and TV content that was “born digital,” even work that is only a few years old, now faces rapid extinction and, in the worst case, oblivion.
Note also that the required migration consumes a lot of bandwidth, meaning that in order to supply the bandwidth needed to ingest the incoming data you need a lot more drives. This reduces the tape/drive ratio, and thus decreases tape's apparent cost advantage. Not to mention that migrating data from tape to tape is far less automated and thus far more expensive than migrating between on-line media such as disk.
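As a rough back-of-the-envelope sketch of this effect, the Python below estimates how many drives are needed for ingest alone versus ingest plus a migration. Every number in it (ingest rate, archive size, migration window, per-drive throughput, tape count) is made up for illustration, and the function name drives_needed is hypothetical. It assumes migration must both read and rewrite the whole archive within the window, and it ignores mount times, verification passes and drive failures.

```python
import math


def drives_needed(ingest_tb_per_day, archive_tb, migration_days, drive_tb_per_day):
    """Return (drives for ingest only, drives for ingest plus migration)."""
    ingest_only = math.ceil(ingest_tb_per_day / drive_tb_per_day)
    # Migration reads every old tape and writes the data to new tapes,
    # so it consumes read bandwidth plus write bandwidth for the whole window.
    migration_tb_per_day = 2 * archive_tb / migration_days
    with_migration = math.ceil(
        (ingest_tb_per_day + migration_tb_per_day) / drive_tb_per_day
    )
    return ingest_only, with_migration


# Hypothetical archive: all values are assumptions, not measurements.
tapes = 5_000
ingest_only, with_migration = drives_needed(
    ingest_tb_per_day=20,   # new data arriving each day
    archive_tb=10_000,      # existing data that must be migrated
    migration_days=365,     # time allowed to finish the migration
    drive_tb_per_day=10,    # sustained throughput of one drive
)

print(f"drives for ingest only:       {ingest_only}")
print(f"drives with migration:        {with_migration}")
print(f"tape/drive ratio, ingest only: {tapes / ingest_only:.0f}")
print(f"tape/drive ratio, migrating:   {tapes / with_migration:.0f}")
```

With these invented figures the drive count goes from 2 to 8, so the tape/drive ratio falls by a factor of four. The real numbers will differ, but the direction of the effect is the point.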

Tuesday, May 2, 2017

Distill: Is This What Journals Should Look Like?

A month ago a post on the Y Combinator blog announced that they and Google have launched a new academic journal called Distill. But this is no ordinary journal consisting of slightly enhanced PDFs; it is a big step towards the way academic communication should work in the Web era:
The web has been around for almost 30 years. But you wouldn’t know it if you looked at most academic journals. They’re stuck in the early 1900s. PDFs are not an exciting form.

Distill is taking the web seriously. A Distill article (at least in its ideal, aspirational form) isn’t just a paper. It’s an interactive medium that lets users – “readers” is no longer sufficient – work directly with machine learning models.
Below the fold, I take a close look at one of the early articles to assess how big a step this is.

Friday, April 21, 2017

A decade of blogging

A decade ago today I posted Mass-market scholarly communication to start this blog. Now, 459 posts later, I would like to thank everyone who has read it, and especially those who have commented on it.

Blogging is useful to me for several reasons:
  • It forces me to think through issues.
  • It prevents me forgetting what I thought when I thought through an issue.
  • It's a much more effective way to communicate with others in the same field than publishing papers.
  • Since I'm not climbing the academic ladder there's not much incentive for me to publish papers anyway, although I have published quite a few since I started LOCKSS.
  • I've given quite a few talks too. Since I started posting the text of a talk with links to the sources it has become clear that it is much more useful to readers than posting the slides.
  • I use the comments as a handy way to record relevant links, and why I thought they were relevant.
There weren't a lot of posts until 2011, when I started to target one post a week. I thought it would be hard to come up with enough topics, but pretty soon afterwards half-completed or note-form drafts started accumulating. My posting rate has accelerated smoothly since, and most weeks now get two posts. Despite this, I have more drafts lying around than ever.

Wednesday, April 19, 2017

Emularity strikes again!

The Internet Archive's massive collection of software now includes an in-browser emulation in the Emularity framework of the original Mac with MacOS from 1984 to 1989, and a Mac Plus with MacOS 7.0.1 from 1991. Shaun Nichols at The Register reports that:
The emulator itself is powered by a version of Hampa Hug's PCE Apple emulator ported to run in browsers via JavaScript by James Friend. PCE and PCE.js have been around for a number of years; now that tech has been married to the Internet Archive's vault of software.
Congratulations to Jason Scott and the software archiving team!