Tuesday, May 1, 2012

Catching up

This time I'm not going to apologise for the gap in posting; I was on a long-delayed vacation. Below the fold are some links I noticed in the intervals between vacating.

Digital Preservation

The Economist has an excellent leader and an article (maybe paywalled) explaining clearly the problems that copyright law poses for preserving society's memory. They conclude:
Despite the proliferation of archives, digital preservation is patchy at best. Until the law catches up with technology, digital history will have to be written in drips and drabs rather than the great gushes promised by the digital age.
JISC funded a valuable report (PDF) that provides:
high-level guidance for the strategic and engineering development of Data Management and Preservation plans for ‘Big Science’ data.
It is gratifying to find that in Section 3.2.3 they cite our work on economic models of long-term storage and find that, even in its current preliminary form, it provides useful insight.

Storage Technology

Last month saw the following headline in Cnet:
Seagate says that it has reached the milestone of storage density that offers 1 terabit (1 trillion bits) per square inch, using Heat-Assisted Magnetic recording technology that promises a 60TB hard drive within the next decade.
Making HAMR work at this density is clearly a major achievement for Seagate, but I remain sceptical that we will see an affordable 60TB drive in the next decade. The consumer market for 3.5" drives is going away. Even if HAMR achieves its potential in 2.5" drives they won't be bigger than 20TB. Just as the market for desktop PCs has gone away because laptops are "good enough" and more convenient, the market for laptops will go away because tablets are "good enough" and more convenient. Tablets will have solid state storage for reasons of packaging and robustness, so there won't be a consumer market for 20TB 2.5" drives.

Also, I hear reports that two archives have measured the bit error rate delivered to applications from their tape archives as 10^-14. I don't have a link for this yet; I'm still looking. But this is a plausible result.
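
To put a bit error rate of 10^-14 in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1PB = 10^15 bytes) and that errors are independent and uniformly spread across the data, which real media may not honour; the function names are mine, purely for illustration.

```python
import math

# Assumed bit error rate delivered to the application (errors per bit read).
BER = 1e-14


def expected_bit_errors(bytes_read: float, ber: float = BER) -> float:
    """Expected number of erroneous bits when reading bytes_read bytes."""
    return bytes_read * 8 * ber


def prob_at_least_one_error(bytes_read: float, ber: float = BER) -> float:
    """Probability that at least one bit is wrong, assuming independent errors.

    Computes 1 - (1 - ber)^bits using log1p/expm1, because (1 - ber) is too
    close to 1 for the naive formula to be accurate in floating point.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ber))


if __name__ == "__main__":
    for label, size in [("1TB", 1e12), ("1PB", 1e15)]:
        print(label,
              "expected bit errors:", expected_bit_errors(size),
              "P(at least one):", prob_at_least_one_error(size))
```

Under these assumptions, reading a full petabyte would be expected to deliver about 80 bad bits, and even a single terabyte read has a chance of several percent of containing at least one. That is why an archive cannot simply trust what its storage hands back without checking it.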

Academic Communication

The movement for open access to academic publications got support from The Economist, Harvard Library's Faculty Advisory Council, the New York Times and Cory Doctorow, who makes this interesting observation:
Many scholars sign work-made-for-hire deals with the universities that employ them. That means that the copyright for the work they produce on the job is vested with their employers -- the universities -- and not the scholars themselves. Yet these scholars routinely enter into publishing contracts with the big journals in which they assign the copyright -- which isn't theirs to bargain with -- to the journals. This means that in a large plurality of cases, the big journals are in violation of the universities' copyright. Technically, the universities could sue the journals for titanic fortunes. Thanks to the "strict liability" standard in copyright, the fact that the journals believed that they had secured the copyright from the correct party is not an effective defense, though technically the journals could try to recoup from the scholars, who by and large don't have a net worth approaching one percent of the liability the publishers face.
I have a problem with the "major publishers are too expensive" rhetoric. No-one would have a problem paying the major publishers a lot of money if they were delivering a lot of value. But they aren't, and here is an example to back up my contention that the peer review system has broken down. Via Slashdot and Yahoo, we find this commentary in Nature (paywalled) from the ex-head of global cancer research at Amgen:
During a decade as head of global cancer research at Amgen, C. Glenn Begley identified 53 "landmark" publications -- papers in top journals, from reputable labs -- for his team to reproduce. Begley sought to double-check the findings before trying to build on them for drug development.
Result: 47 of the 53 could not be replicated.
Begley attributes this to:
a skewed system of incentives that has academics cutting corners to further their careers.
On the other hand, Nature shows the kind of value that big publishers can deliver if they want to. They have released their bibliographic metadata as linked data:
The platform includes more than 20 million Resource Description Framework (RDF) statements, including primary metadata for more than 450,000 articles published by NPG since 1869. In this first release, the datasets include basic citation information (title, author, publication date, etc) as well as NPG specific ontologies. These datasets are being released under an open metadata license, Creative Commons Zero (CC0), which permits maximal use/re-use of this data.
Nature deserves a lot of credit for doing this, in particular for licensing the data correctly.
