Wednesday, April 1, 2015

Preserving Long-Form Digital Humanities

Carl Straumsheim at Inside Higher Ed reports on a sorely needed new Mellon Foundation initiative supporting digital publishing in the humanities:
The Andrew W. Mellon Foundation is aggressively funding efforts to support new forms of academic publishing, which researchers say could further legitimize digital scholarship.

The foundation in May sent university press directors a request for proposals to a new grant-making initiative for long-form digital publishing for the humanities. In the e-mail, the foundation noted the growing popularity of digital scholarship, which presented an “urgent and compelling” need for university presses to publish and make digital work available to readers.
Note in particular:
The foundation’s proposed solution is for groups of university presses to ... tackle any of the moving parts that task is comprised of, including “...(g) distribution; and (h) maintenance and preservation of digital content.”
Below the fold, some thoughts on this based on experience from the LOCKSS Program.

Since a Mellon-funded meeting with humanities librarians at the NYPL more than a decade ago, the LOCKSS team has been involved in discussions about, and attempts at, preserving the "long tail" of smaller journal publishers, especially in the humanities. Our observations:
  • The cost of negotiating individually with publishers for permission to preserve their content, and the fact that each publisher must take action to express that permission, are major problems. Creative Commons licenses and their standard electronic representation greatly reduce the cost of preservation. If for-pay access is essential for sustainability, then a standard electronic representation of permission, and a standard way of giving archives access, are necessary.
  • Push preservation models, in which the publisher sends content for preservation, are not viable in the long tail. Pull preservation, in which the archive(s) harvest content from the publisher, is essential.
  • Further, the more the "new digital work flows and publication models" diverge from the e-book/PDF model, the less well push models will work. They require the archive to replicate the original publishing platform, which is easy enough if the platform delivers static files, but much harder once the content becomes dynamic.
  • The cost of pull preservation is dominated by the cost of supporting the first publisher on a given platform. Each subsequent publisher on that platform costs much less. Thus driving publishing onto a few widely used platforms is very important.
  • Once a platform has critical mass, archives can work with the platform to reduce the cost of preservation. We have worked with Open Journal Systems (OJS) to (a) make it easy for publishers to give LOCKSS permission by checking a box, and (b) provide LOCKSS with a way of getting the content without all the highly variable (and thus impossibly expensive) customization. See, for example, work by the Public Knowledge Project.
  • The problem with OJS has been selection: much of the content is too low-quality to justify the effort of preserving it. Finding the good stuff is difficult for archives because the signal-to-noise ratio is low.
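To make concrete why a standard electronic representation of permission matters for pull preservation, here is a minimal sketch of the kind of check a harvester could run on each page it fetches. It assumes the publisher marks its license with the standard HTML `rel="license"` link type pointing at a creativecommons.org license URL; the function and class names are hypothetical, and this is not how the LOCKSS software actually implements its permission checks.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect the href of every <a> or <link> element carrying rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated values; value may be None.
        rel = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "license" in rel and href:
            self.licenses.append(href)


def has_cc_permission(html: str) -> bool:
    """Hypothetical check: True if the page declares a Creative Commons license."""
    finder = LicenseLinkFinder()
    finder.feed(html)
    finder.close()
    return any("creativecommons.org/licenses/" in href for href in finder.licenses)


page = ('<p>Article text</p>'
        '<a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>')
print(has_cc_permission(page))
print(has_cc_permission("<p>All rights reserved</p>"))
```

The point of the sketch is the cost argument: a check like this is written once and applies to every publisher using the convention, whereas a negotiated, per-publisher permission requires human effort each time.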
Between LOCKSS and CLOCKSS we have been more successful at addressing the long tail of journals than most. Thus there is a temptation, as we see for example in discussions of National Hosting, to say "let LOCKSS handle the long tail and [other archive] will handle the big publishers". But we subsidize the long tail with revenue from our work with the big publishers. In the case of CLOCKSS this is explicit. In the case of LOCKSS it is implicit, since most librarians are interested in post-cancellation access to the big publishers' content, but not in contributing to the preservation of the scholarly record. Thus there is little funding to preserve the long tail, which is at once more difficult, more expensive and more at risk.

There are significant differences between the University Press market for long-form digital humanities and the long tail of humanities journals. The journals are mostly open-access and many are low-quality. The content that Mellon is addressing is mostly paid-access and uniformly high-quality; the Presses have already done the selection. But these observations are still relevant, especially the cost implications of a lack of standards.

It is possible that no viable cost-sharing model can be found for archiving the long tail in general. In the University Press case, a less satisfactory alternative is a "preserve in place" strategy, in which a condition of funding would be that the University commit to permanent access to the output of its press, with an identified succession plan. At least this would make the cost of preservation visible, and eliminate the assumption that it is someone else's problem.

