Thursday, November 19, 2015

You get what you get and you don't get upset

The title is a quote from Coach Junior, who teaches my elder grand-daughter soccer. It comes in handy when, for example, the random team selection results in a young lady being on the opposite team to her best friend. It came to mind when I read Kalev Leetaru's How Much Of The Internet Does The Wayback Machine Really Archive? documenting the idiosyncratic and evolving samples the Internet Archive collects of the Web, and the subsequent discussion on the IIPC mail alias. Below the fold, my take on this discussion.

Thursday, November 12, 2015

SPARC Author Addendum

SPARC has a post Author Rights: Using the SPARC Author Addendum to secure your rights as the author of a journal article announcing the result of an initiative to fix one of the fundamental problems of academic publishing, namely that in most cases authors carelessly give up essential rights by signing unchanged a copyright transfer agreement written by the publisher's lawyers.

The publisher will argue that this one-sided agreement, often transferring all possible rights to the publisher, is absolutely necessary in order that the article be published. Despite their better-than-average copyright policy, ACM's claims in this regard are typical. I dissected them here.

The SPARC addendum was written by a lawyer, Michael W. Carroll of Villanova University School of Law, and is intended to be attached to, and thereby modify, the publisher's agreement. It performs a number of functions:
  • Preserving the author's rights to reproduce, distribute perform, and display the work for non-commercial purposes.
  • Acknowledges that the work may already be the subject of non-exclusive copyright grants to the author's institution or a funding agency.
  • Imposes as a condition of publication that the publisher provide the author with a PDF of the camera-ready version without DRM.
The kicker is the final paragraph, which requests that the publisher return a signed copy of the addendum, and makes it clear that publishing the work in any way indicates assent to the terms of the addendum. This leaves the publisher with only three choices, agree to the terms, refuse to publish the work, or ignore the addendum.

Of course, many publishers will refuse to publish, and many authors at that point will cave in. The SPARC site has useful advice for this case. The more interesting case is the third, where the publisher simply ignores the author's rights as embodied in the addendum. Publishers are not above ignoring the rights of authors, as shown by the history of my article Keeping Bits Safe: How Hard Can It Be?, published both in ACM Queue (correctly with a note that I retained copyright) and in CACM (incorrectly claiming ACM copyright). I posted analysis of ACM's bogus justification of their copyright policy based on this experience. There is more here.

So what will happen if the publisher ignores the author's addendum? They will publish the paper. The author will not get a camera-ready copy without DRM. But the author will make the paper available, and the "kicker" above means they will be on safe legal ground. Not merely did the publisher constructively agree to the terms of the addendum, but they failed to deliver on their side of the deal. So any attempt to haul the author into court, or send takedown notices, would be very risky for the publisher.

2012 data from Alex Holcombe
Publishers don't need anything except permission to publish. Publishers want the rights beyond this to extract the rents that generate their extraordinary profit margins. Please use the SPARC addendum when you get the chance.

Tuesday, November 10, 2015

Follow-up to the Emulation Report

Enough has happened while my report on emulation was in the review process that, although I announced its release last week, I already have enough material for a follow-up post. Below the fold, the details, including a really important paper from the recent SOSP workshop.

Thursday, November 5, 2015

Cloud computing; Threat or Menace?

Back in May The Economist hosted a debate on cloud computing:
Big companies have embraced the cloud more slowly than expected. Some are holding back because of the cost. Others are wary of entrusting sensitive data to another firm’s servers. Should companies be doing most of their computing in the cloud?
It was sponsored by Microsoft, who larded it with typical cloud marketing happy-talk such as:
The Microsoft Cloud creates technology that becomes essential but invisible, to help you build something amazing. Microsoft Azure empowers organizations with the creation of innovative apps. Dynamics CRM helps companies market smarter and more effectively, while Office 365 enables employees to work from virtually anywhere on any device. So whether you need on-demand scalability, real-time data insights, or technology to connect your people, the Microsoft Cloud is designed to empower your business, allowing you to do more and achieve more.
Below the fold, some discussion of actual content.

Tuesday, November 3, 2015

Emulation & Virtualization as Preservation Strategies

I'm very grateful that funding from the Mellon Foundation on behalf of themselves, the Sloan Foundation and IMLS allowed me to spend much of the summer researching and writing a report, Emulation and Virtualization as Preservation Strategies (37-page PDF, CC-By-SA). I submitted a draft last month, it has been peer-reviewed and I have addressed the reviewers comments. It is also available on the LOCKSS web site.

I'm old enough to know better than to give a talk with live demos. Nevertheless, I'll be presenting the report at CNI's Fall membership meeting in December complete with live demos of a number of emulation frameworks. The TL;DR executive summary of the report is below the fold.

Tuesday, October 27, 2015

More interesting numbers from Backblaze

On the 17th of last month Amazon, in some regions, cut the Glacier price from 1c/GB/month to 0.7c/GB/month. It had been stable since it was announced in August 2012. As usual with Amazon, they launched at an aggressive and attractive price, and stuck there for a long time. Glacier wasn't under a lot of competitive pressure, so they didn't need to cut the price. Below the fold, I look at how Backblaze changed this.

Wednesday, October 21, 2015

ISO review of OAIS

ISO standards are regularly reviewed. In 2017, the OAIS standard ISO14721 will be reviewed. The DPC is spearheading a praiseworthy effort to involve the digital preservation community in the process of providing input to this review, via this Wiki.

I've been critical of OAIS over the years, not so much of the standard itself, but of the way it was frequently mis-used. Its title is Reference Model for an Open Archival Information System (OAIS), but it is often treated as if it were entitled The Definition of Digital Preservation, and used as a way to denigrate digital preservation systems that work in ways the speaker doesn't like by claiming that the offending system "doesn't conform to OAIS". OAIS is a reference model and, as such, defines concepts and terminology. It is the concepts and terminology used to describe a system that can be said to conform to OAIS.

Actual systems are audited for conformance to a set of OAIS-based criteria, defined currently by ISO16363. The CLOCKSS Archive passed such an audit last year with flying colors. Based on this experience, we identified a set of areas in which the concepts and terminology of OAIS were inadequate to describe current digital preservation systems such as the CLOCKSS Archive.

I was therefore asked to inaugurate the DPC's OAIS review Wiki with a post that I entitled The case for a revision of OAIS. My goal was to encourage others to post their thoughts. Please read my post and do so.