Tuesday, October 18, 2016

Why Did Institutional Repositories Fail?

Richard Poynder has a blog post introducing a PDF containing both a lengthy introduction that expands on the blog post and a Q&A with Cliff Lynch on the history and future of Institutional Repositories (IRs). Richard and Cliff agree that IRs have failed to achieve the hopes that were placed in them at their inception in a 1999 meeting at Santa Fe, NM. But they disagree about what those hopes were. Below the fold, some commentary.

Poynder sets out the two competing visions of IRs from the Santa Fe meeting. One was:
The repository model that the organisers of the Santa Fe meeting had very much in mind was the physics preprint server arXiv. ... As a result, the early focus of the initiative was on increasing the speed with which research papers were shared, and it was therefore assumed that the emphasis would be on archiving papers that had yet to be published (i.e. preprints).
The other was:
However, amongst the Santa Fe attendees were a number of open access advocates. They saw OAI-PMH as a way of aggregating content hosted in local - rather than central - archives. And they envisaged that the archived content would be papers that had already been published, rather than preprints. ... In other words, the OA advocates present were committed to the concept of author self-archiving (aka green open access). The objective for them was to encourage universities to create their own repositories and then instruct their researchers to deposit in them copies of all the papers they published in subscription journals. As these repositories would be on the open internet outside any paywall the papers would be freely available to all. And the expectation was that OAI-PMH would allow the content from all these local repositories to be aggregated into a single searchable virtual archive of (eventually) all published research.
Poynder's summary of the state of IRs is hard to dispute:
So while the OA movement may now appear unstoppable there is a growing sense that both the institutional repository and green OA have lost their way. It is not hard to see why. Not only are most researchers unwilling to self-archive their papers, but they remain sceptical about open access per se. Consequently, despite a flood of OA mandates being introduced by funders and institutions, most IRs remain half empty. What content they do contain often consists of no more than the bibliographic details of papers rather than the full text. More strikingly, many of the papers in IRs are imprisoned behind "login walls", which makes them accessible only to members of the host institution (and this is not just because of publisher embargoes). As a result, the percentage of content in IRs that is actually open access is often pretty low. Finally, since effective interoperability remains more aspiration than reality searching repositories is difficult, time-consuming and deeply frustrating.
A small part of this is because OAI-PMH was, as Herbert van de Sompel and Michael Nelson pointed out in Reminiscing About 15 Years of Interoperability Efforts, insufficiently "webby" to be an effective basis for aggregated search across IRs. A larger cause was inadequate investment in IRs:
What has surely also limited what IRs have been able to achieve is that by and large they have been seriously under resourced. This point was graphically made in 2007 by erstwhile repository manager Dorothea Salo. Her conclusion nine years ago was: there is need for a "serious reconsideration of repository missions, goals, and means."
My analysis of the major causes is different, and differs between the advocates of pre- and post-print IRs:
  • The pre-print IR advocates missed the key advantage that subject repositories, as opposed to institutional ones, have for the user: each is a single open-access portal containing all the pre-prints (and for arXiv.org essentially all the papers) of interest to researchers in that subject. The idea that a distributed search portal built on OAI-PMH would emerge to allow IRs to compete with subject repositories demonstrates a lack of understanding of user behavior on the Web.
  • The post-print IR advocates were naive in thinking that a loose federation of librarians with little institutional clout and few resources could disrupt a publishing oligopoly generating many billions of dollars a year on the bottom line. It should have been clear from librarians' experience of "negotiating" subscriptions that, without strong support from University presidents and funding agencies, the power of the publishers was too great.
There was an interesting discussion in the comments to the blog post, to which Poynder responded here.


Hvdsomp said...

Great blog post, as always, David. I agree with most of the analysis, really. I agree with your observation that the preprint advocates (including myself) might not have fully understood the advantage of subject repositories as natural habitats for scientists of a certain discipline. But I do not agree that preprint advocates aimed at having institutional repositories compete with subject repositories. Actually, as far as I remember, the Santa Fe meeting was rather neutral when it came to this dichotomy. For example, the Universal Preprint Service experiment that Michael Nelson, Thomas Krichel, and I conducted prior to the meeting as a means to guide discussions was largely about aggregating subject repositories, see this D-Lib paper. It is correct that we (also) envisioned preprint servers at the institutional level. It kind of didn't matter whether the set-up was discipline-based or institutional as long as it was about preprints, about materials that had not been formally published. This approach was seen as a way to build a communication system parallel to the established one; not even really to compete with it. It would be disruptive in the sense of Christensen's Disruptive Technologies, whereby preprints lacked a significant feature of the published literature (perceived quality provided by peer-review) but had appealing features the established system did not have (speed of communication, availability). Anyhow, next time we preprint aficionados checked in, the institutional repositories had become anything but preprint servers. In their quest for Open Access, they had ended up focusing largely on published literature.

David. said...

Whether or not that was the intention, subject and institutional repositories were inevitably going to compete. Given the hassle of depositing a paper, authors were only going to do it once. This sets up a competition for deposits.

In a world where aggregating distributed repositories (e.g. via OAI-PMH) worked well, it might not have mattered where a paper was deposited. But aggregation hasn't worked well in search, for example. And if the aggregation was at the metadata level, users would still have to learn the UI of the individual repositories in order to access the content.

A subject portal that provided a single UI to search and access the content provides a much better user experience, even supposing that the aggregation worked well. So the authors would gravitate to the subject repositories for preprints, where access to the content is the whole point. And, having deposited their preprint there, they would have absolutely zero motivation to deposit the published version in an IR. So, in subject areas with a preprint repository, it is pretty much guaranteed to win out.