Tuesday, March 15, 2016

Elsevier and the Streisand Effect

Nearly a year ago I wrote The Maginot Paywall about the rise of research into the peer-to-peer sharing of academic papers via mechanisms including Library Genesis, Sci-Hub and #icanhazpdf. Although these mechanisms had been in place for some time they hadn't received a lot of attention. Below the fold, a look at how and why this has recently changed.

In 2001 the World Health Organization worked with the major publishers to set up Hinari, a system whereby researchers in developing countries could get free or very-low-cost access to health journals. There are similar systems for agriculture, the environment and technology. Why would the publishers give access to their journals to researchers at institutions that hadn't paid anything?

The answer is that the publishers were not losing money by doing so. There was no possibility that institutions in developing countries could pay the subscription. Depriving them of access would not motivate them to pay; they couldn't possibly afford to pay. Cross-subsidizing their access cost almost nothing and had indirect benefits, such as cementing the publishers' role as gatekeepers for research, and discouraging the use of open access.

Similarly, peer-to-peer sharing of papers didn't actually lose the major publishers significant amounts of money. Institutions that could afford to subscribe were not going to drop their subscriptions and encourage their researchers to use these flaky and apparently illegal alternatives. The majority usage of these mechanisms was from researchers whose institutions would never subscribe, and who could not afford the extortionate pay-per-view charges. Effective techniques to suppress them would be self-defeating. As I wrote in The Maginot Paywall:
Copyright maximalists such as the major academic publishers, are in a similar position. The more effective and thus intrusive the mechanisms they implement to prevent unauthorized access, the more they incentivize "guerilla open access".
Then last June Elsevier filed a case in New York trying to shut down Library Genesis and Sci-Hub. Both are apparently based in Russia, which is not highly motivated to send more of its foreign reserves to Western publishers. So the case was not effective at shutting them down. It turned out, however, to be a classic case of the Streisand Effect, in which attempting to suppress information on the Web causes it to attract far more attention.

The Streisand Effect started slowly, with pieces at Quartz and BBC News in October. The EFF weighed in on the topic in December with What If Elsevier and Researchers Quit Playing Hide-and-Seek?:
Sci-Hub and LibGen have now moved to new domains, and Sci-Hub has set up a .onion address; this allows users to access the service anonymously through Tor. How quickly the sites have gotten back on their feet after the injunction underscores that these services can't really be stopped. Elsevier can't kill unauthorized sharing of its papers; at best, it can only make sharing incrementally less convenient.
But the Streisand Effect really kicked in early last month with Simon Oxenham's Meet the Robin Hood of Science, which led to Fiona MacDonald's piece at Science Alert, Kaveh Waddell's The Research Pirates of the Dark Web and Kieran McCarthy's Free science journal library gains notoriety, lands injunctions. Mike Masnick's Using Copyright To Shut Down 'The Pirate Bay' Of Scientific Research Is 100% Against The Purpose Of Copyright went back to the Constitution:
Article 1, Section 8, Clause 8 famously says that Congress has the following power:
To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.
and the 1790 Copyright Act, which was subtitled "An Act for the Encouragement of Learning." Encouragement of learning is what Sci-Hub is for. Mike Taylor's Barbra Streisand, Elsevier, and Sci-Hub was AFAIK the first to point out that Elsevier had triggered the Streisand Effect. Simon Oxenham followed up with The Robin Hood of Science: The Missing Chapter, making the connection with the work of the late Aaron Swartz.

Barbara Fister made the very good point that Universities don't just supply the publishers with free labor in the form of authoring and reviewing:
Because it is labor - lots of labor - to maintain link resolvers, keep license agreements in order, and deal with constant changes in subscription contents. We have to work a lot harder to be publishers' border guards than people realize.
and she clearly lays out the impossible situation librarians are in:
We feel we are virtually required to provide access to whatever researchers in our local community ask for while restricting access from anyone outside that narrowly-defined community of users. Instead of curators, we're personal shoppers who moonlight as border guards. This isn't working out well for anyone. Unaffiliated researchers have to find illegal work-arounds, and faculty who actually have access through libraries are turning to the black market for articles because it seems more efficient than contacting their personal shopper, particularly when the library itself doesn't figure in their work flow. In the meantime, all that money we spend on big bundles of articles (or on purchasing access to articles one at a time when we can't afford the bundle anymore) is just a really high annual rent. We can't preserve what we don't own, and we don't curate because our function is to get what is asked for.
The Library Loon has a series of posts that are worth reading (together with some of their comments). She links to A Short History of The Russian Digital Shadow Libraries by Balázs Bodó, a must-read analysis starting in Soviet times showing that Sci-Hub is but one product of a long history of resistance to censorship. Bodó has a more reflective piece In the Name of Humanity in Limn's Total Archive issue, where he makes the LOCKSS argument:
This is the paradox of the total piratical archive: they collect enormous wealth, but they do not own or control any of it. As an insurance policy against copyright enforcement, they have already given everything away: they release their source code, their databases, and their catalogs; they put up the metadata and the digitalized files on file-sharing networks. They realize that exclusive ownership/control over any aspects of the library could be a point of failure, so in the best traditions of archiving, they make sure everything is duplicated and redundant, and that many of the copies are under completely independent control.
The Loon's analysis of the PR responses from the publishers is acute:
Why point this effluent at librarians specifically rather than academe generally? Because publishers are not stupid; libraries are their gravy train and they know that. The more they can convince librarians that it is somehow against the rules (whether “rules” means “law” or “norms” or even merely “etiquette,” and this does vary across publisher sallies) to cross or question them, the longer that gravy train keeps rolling. Researchers, you simply do not matter to publishers in the least until you credibly threaten a labor boycott or (heaven forfend) actually support librarian budget-reallocation decisions. The money is coming from librarians.
Last weekend the Streisand Effect reached the opinion pages of the New York Times with Kate Murphy's Should All Research Papers Be Free?, replete with quotes from Michael Eisen, Alicia Wise, Peter Suber and David Crotty. Alas, Murphy starts by writing "Her protest against scholarly journals’ paywalls". Sci-Hub isn't a protest. Calling something a protest is a way of labelling it ineffectual. Sci-Hub is a tool that implements a paywall-free world. Occupy Wall Street was a protest, but had it actually built a functioning alternative financial system no-one would be describing it that way.

The result of the Streisand Effect has been, among other things, to sensitize the public to the issue of open access. Oxenham writes:
vast numbers of people who read the story thought researchers or universities received a portion of the fees paid by the public to read the journals, which contain academic research funded by taxpayers.
This clearly isn't in Elsevier's interest. So, having failed to shut down the services and having garnered them a lot of free publicity, where does Elsevier go from here? I see four possible paths:
  • They can try to bribe the Russians to clamp down on the services, for example by offering Russian institutions very cheap subscriptions as a quid pro quo. But they only control a minority of the content, and they would be showing other countries how to reduce their subscription costs by hosting the services.
  • They can try to punish the Russians for not clamping down, for example by cutting the country off from Elsevier content. But this would increase the incentive to host the services.
  • They can sue their customers, the institutions whose networks are being used to access new content. In 2008 publishers sued Georgia State for:
    pervasive, flagrant and ongoing unauthorized distribution of copyrighted materials
    Eight years later the case is still being argued on appeal. But in the meantime the landscape has changed. Many research funders now require open access. Many institutions now require (but fail to enforce) deposit of papers in institutional repositories. Institutions facing publisher lawsuits would have a powerful incentive to enforce deposit, because their network isn't needed to leak open access content to Sci-Hub.
  • They can sue the sources of their content, the individual researchers whom they may be able to trace as the source of Sci-Hub materials. This would be a lot easier if the publishers stopped authenticating via IP address and moved to a system based on individual logins. Although this would make life difficult for Sci-Hub-like services if they used malware-based on-campus proxies, it would also make using subscription journals miserable for the vast majority of researchers and thus greatly increase the attractiveness of open access journals. But the Library Loon correctly points out that Sci-Hub's database of credentials is a tempting target for the publishers and others to attempt to compromise.
None of these look like a winning strategy in the longer term. One wonders if Elsevier gamed out the consequences of their lawsuit. The cost of pay-per-view access is the reason Elbakyan gives for starting Sci-Hub:
“Prices are very high, and that made it impossible to obtain papers by purchasing. You need to read many papers for research, and when each paper costs about 30 dollars, that is impossible.”
It seems I was somewhat prophetic in pointing to the risk pay-per-view poses for the publishers in my 2010 JCDL keynote:
Libraries implementing PPV have two unattractive choices:
  • Hide the cost of access from readers. This replicates the subscription model but leads to overuse and loss of budget control.
  • Make the cost of access visible to readers. This causes severe administrative burdens, discourages use of the materials, and places a premium on readers finding the free versions of content.
Placing a premium on finding the open access copy is something publishers should wish to avoid.
Elsevier and the other major publishers have a fundamental problem. Their customers are libraries, but libraries don't actually use the content access they buy. The libraries' readers are the ones that use the access. What the readers want is a single portal, preferably Google, that provides free, instant access to the entire corpus of published research. As Elbakyan writes:
On the Internet, we obviously need websites like Sci-Hub where people can access and read research literature. The problem is, such websites oftenly cannot operate without interruptions, because current system does not allow it.
The system has to be changed so that websites like Sci-Hub can work without running into problems. Sci-Hub is a goal, changing the system is one of the methods to achieve it.
Sci-Hub is as close as anyone has come to providing what the readers want. None of the big publishers can provide it, not merely because doing so would destroy their business model, but also because none of them individually control enough of the content. And the publishers' customers don't want them to provide it, because doing so would reduce even further the libraries' role in their institutions. No-one would need "personal shoppers who moonlight as border guards".


David. said...

In this context Stephen Curry's Zika virus initiative reveals deeper malady in scientific publishing is well worth a read:

"The real difficulty for the scientific community is that we remain tied to a publishing system that retards the dissemination of information because of its overwhelming preoccupation with using publications to award academic credit. Researchers will spend months chasing for spots in the most prestigious journals because at present we are locked into a system that judges us by the reputation of the journal where we publish, even though the measure of that reputation is inaccurate and dysfunctional.

In other words, the central problem is that our research ecosystem provides no incentives for publishing reliably, rapidly or openly – all features that one might hope to see in a system that works effectively."

David. said...

The Streisand Effect keeps rolling right along.

Amy Harmon at the New York Times reports in Handful of Biologists Went Rogue and Published Directly to Internet that biologists are starting to use the bioRxiv preprint server.

And John Willinsky has Sci-Hub: research piracy and the public good in Times Higher Education. He writes about his work with the Public Knowledge Project:

"As a result of all these efforts, you can now freely – and legally – access roughly a third of the research papers published in the past few years. We clearly have a long way still to go with this new economy. And ultimately, Elbakyan’s point that pirating research is different from pirating music is only part of the story.

More fundamentally, research represents a different order of intellectual property. This is not only because of the public and tax-exempt funding involved in its production and publication. It is because of how this work’s value and benefit is realised through others’ access to and use of this work."

Irene North said...

I, too, have written about Sci-Hub. It might be in a small newspaper, but people are paying attention and, hopefully, things will begin to change.

Frank Huysmans said...

There's a fifth way out for Elsevier and other publishers: jump on the open access train. In a gold or hybrid open access strategy money will keep flowing in, while the publications will not need to be pirated anymore to be 'liberated'. Perhaps this is what made Springer, SAGE, Elsevier and Wiley strike deals with the Dutch university libraries association - https://warekennis.nl/vsnu-wiley-not-such-a-big-deal-for-open-access/

David. said...

I have been remiss in not linking to How Much Does $1.7 Billion Buy You? A Comparison of Published Scientific Journal Articles to Their Pre-print Version by Sharon Farb et al from UCLA at the last CNI. It provides yet more data to reinforce the skepticism I've expressed since 2007 about the value publishers provide in return for their extortionate margins:

"we present our preliminary results based on pre-print publications from arXiv.org and their post-print counterparts obtained through subscriptions held by the UCLA Library. After matching papers via their digital object identifiers (DOIs), we applied comparative analytics and evaluated the textual similarities of components such as the title, abstract, and body. The results of our assessment suggest that the vast majority of post-print papers are largely indistinguishable from their pre-print versions. These findings contribute empirical indicators to discussions of the value that academic publishers add to scholarly communication and therefore can influence libraries’ economic decisions regarding access to scholarly publications."

David. said...

Eric Hellman's post Sci-Hub, LibGen, and Total Information Awareness emphasizes some important warnings for users of Sci-Hub, listing many of the ways they can be tracked. The best advice is:

"The best solution for a user wanting to download articles privately is to use the Tor Browser and Sci-Hub's onion address, http://scihub22266oqcxt.onion. Onion addresses provide encryption all the way to the destination, and since SciHub uses LibGen's onion address for linking, neither connection can be snooped by the network. Google and Yandex still get informed of all download activity, but the Tor browser hides the user's identity from them. ...Unless the user slips up and reveal their identity to another web site while using Tor."

Even better advice is to use Tor via Tails, The Amnesic Incognito Live System, which ensures that there's no trace of your activity on your hardware.

As Eric writes:

"It might also be a good idea to use the Tor Browser if you want read research articles in private, even in journals you've paid for; medical journals seem to be the worst of the bunch with respect to privacy.

If publishers begin to take Sci-Hub countermeasures seriously (Library Loon has a good summary of the horribles to expect) there will be more things to worry about. PDFs can be loaded with privacy attacks in many ways, ranging from embedded security exploits to usage-monitoring links.

This isn't going to be fun for anyone."

David. said...

The Economist joins the chorus of main-stream media pointing out that the current scientific publishing system is not "fit for purpose" with Taking the online medicine, concluding:

"But if more researchers feel comfortable about uploading their work to preprint servers, it will break the stranglehold of elite journals on biomedical science and accelerate discovery. That would save millions of dollars. More importantly, it would save lives."

David. said...

Joi Ito has a thoughtful piece On Disobedience that is very relevant to this discussion.

David. said...

Hybrid journals are one of the ways major publishers have tried to defuse the open access movement while not imperilling their cash flow. David Matthews at Times Higher Education reports that Wellcome Trust is taking publishers to task for taking the money but not delivering the product:

"Elsevier and Wiley have been singled out as regularly failing to put papers in the right open access repository and properly attribute them with a creative commons licence.

This was a particular problem with so-called hybrid journals, which contain a mixture of open access and subscription-based articles.

More than half of articles published in Wiley hybrid journals were found to be “non-compliant” with depositing and licensing requirements, an analysis of 2014-15 papers funded by Wellcome and five other medical research bodies found.

For Elsevier the non-compliance figure was 31 per cent for hybrid journals and 26 per cent for full open access. In contrast, for PLOS, which only publishes full open access journals, all papers were compliant."

David. said...

Jared Linzon's piece in Fortune entitled The Grateful Dead as Business Pioneers is also highly relevant to this discussion.

David. said...

If you believe that the publishers are adding value by their high-quality peer review processes, you need to explain why papers are retracted from journals indexed in MEDLINE (which are "higher-quality" journals) at a rate of nearly two a day. The rate and the proportion of papers retracted are both increasing. These data from Alison McCook in Retraction Watch's 3000th post:

"The number of retracted articles jumped from 500 in Fiscal Year 2014 to 684 in Fiscal Year 2015 — an increase of 37%. But in the same time period, the number of citations indexed for MEDLINE — about 806,000 — has only increased by 5%."
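As a rough check on the "nearly two a day" and 37% figures, here is a minimal sketch in Python. The variable names are illustrative, and the 365-day fiscal year is my assumption, not something stated in the Retraction Watch post:

```python
# Retraction counts quoted from Retraction Watch above.
retracted_fy2014 = 500
retracted_fy2015 = 684

# Year-over-year growth in retractions, as a percentage.
growth_pct = (retracted_fy2015 - retracted_fy2014) / retracted_fy2014 * 100

# Average retractions per day, assuming a 365-day fiscal year.
per_day = retracted_fy2015 / 365

print(f"growth: {growth_pct:.1f}%")
print(f"retractions per day: {per_day:.2f}")
```

The growth works out to a little under 37%, and the daily rate to a little under two, consistent with the quoted numbers.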

David. said...

Elsevier's disregard of the Streisand Effect is shared by the Chinese Government:

“One can’t help but notice how the tactic is backfiring,” said William Nee, a China researcher for Amnesty International in Hong Kong. “Conducting an aggressive manhunt against anyone allegedly involved in commenting on the letter only serves to put more attention on the letter, giving it a longer shelf life.”

David. said...

I've linked before to the work of Carolyn Caffrey Gardner and Gabriel J. Gardner. Now, via Carl Straumsheim at Inside Higher Ed, I find their study of the motivations for sharing papers entitled Fast and Furious (at Publishers): The Motivations behind Crowdsourced Research Sharing. They are to be commended for studying this interesting phenomenon, especially as they cast doubt on my assertion that:

"The majority usage of these mechanisms was from researchers whose institutions would never subscribe, and who could not afford the extortionate pay-per-view charges."

Rather, they conclude that:

"if our sample is representative of the sharing population, the typical user is not a scientist toiling away in the developing world locked out of the scholarly community due to “the cost of knowledge.” Rather, she is a social or hard science researcher who has academic library privileges but prefers crowdsourced methods of obtaining access for any of the reasons enumerated above."

There are a number of reasons to doubt that their sample is representative. First, they have only 252 responses to their survey, some incomplete. Second, although almost 80% of their sample are affiliated with Universities, this does not mean that the University to which they are affiliated subscribes to the material they need. Third, as they note, much of the traffic to LibGen comes from Iran and China, two countries where the probability of subscription is low.

However, their study does support my conclusion that:

"What the readers want is a single portal, preferably Google, that provides free, instant access to the entire corpus of published research."

As they write:

"Poor usability is also hindering our patrons from gaining access to materials. Librarians need to apply user experience thinking to all our online systems. At our respective libraries we have to click multiple times just to discover if an item is owned."

Compared to LibGen, this cognitive overhead is entirely due to the publishers' rent extraction. Its cost needs to be added to the value the publishers subtract.

David. said...

Ars Technica's contribution to the Streisand Effect focuses on the similarities between Alexandra Elbakyan and Aaron Swartz.

David. said...

Via Kevin Drum at Mother Jones, I find Brian Resnick's fascinating piece on oxytocin, replication and publication bias, How scientists fell in and out of love with the hormone oxytocin.

David. said...

Science amplifies the Streisand Effect with John Bohannon's must-read Who's Downloading Pirated Papers? Everyone. But he misses the mark when he writes:

"It’s hard to discern how threatened by Sci-Hub Elsevier and other major publishers truly feel, in part because legal download totals aren’t typically made public. An Elsevier report in 2010, however, estimated more than 1 billion downloads for all publishers for the year, suggesting Sci-Hub may be siphoning off under 5% of normal traffic."

The implication is that Elsevier et al are losing under 5% of their income. This is the same argument the record labels use, that each download is a lost sale.

But the publishers don't lose any income from Sci-Hub downloads. If the institution has a subscription, they already got paid. Librarians are not going to cancel their Elsevier subscription because some of their readers use some illegal site.

If the institution doesn't, as the article starts by pointing out, the chance that the downloader could pay Elsevier is pretty much zero.

David. said...

Justin Peters at Slate comments on the Science article in "Everyone" Downloads Science Papers Illegally, blasting Marcia McNutt's editorial that accompanies it. He reinforces the point I made:

"It is strange to see McNutt argue that academic publishers work tirelessly to make their own websites convenient for readers, given that Bohannon’s reporting indicates that readers find these websites thoroughly inconvenient. For all the money these publishers are purportedly spending on state-of-the-art web design, it’s odd that their sites are nowhere near as easy to use as the search engine Elbakyan has been able to build—and for a larger data set, to boot. But I suspect that McNutt defines the word convenient far differently than most people do. The entire business model for academic publishing relies on successfully monetizing inconvenience. This perpetual state of inconvenience is the whole reason that Sci-Hub exists."

David. said...

John Dupuis takes umbrage at Bohannon's quote from "Anonymous Publisher" who:

"lays the blame on librarians for not making their online systems easier to use and educating their researchers. “I don’t think the issue is access—it’s the perception that access is difficult,” he says."

Dupuis reinforces Justin Peters' point:

"Are our systems difficult? Aren’t you publishers the ones that “break” the hyperlink ethos of the web by creating the paywalls in the first place? And aren’t you the ones who have a different interface created by each company that people have to learn?

Google and Google Scholars are the tools most scholars use to find papers and they bypass searching systems. What those researchers are finding hard to deal with is YOUR set of barriers and Tower of Babel systems across publishers. We’re trying to make it better, you’re trying to make it worse because that’s how you make your money.

As for educating our researchers — we do, or at least we try to. You try explaining to a young researcher how the one thing that doesn’t work like the rest of the web is finding journals. Proxy servers, VPNs, Interlibrary Loans systems, content aggregators, library discovery systems, one hack or barrier after another imposed by YOU.

No wonder they use Sci-Hub, which does work like the rest of the web."

David. said...

The data behind Bohannon's paper is up on Dryad.

David. said...

For a perspective on other non-free media in the Internet, see Bob Lefsetz' Music is the Future:

"You can’t even get all the films in one place online. And flicks and TV shows come and go on Netflix. As for the vaunted victories at Amazon, “Transparent” and “Mozart In The Jungle,” they may have accolades, but few people have seen them, because they’re behind a paywall most are not paying to get through and with so much noise in the channel, people who are paying don’t know they have this access. Whereas in music you can just go on Spotify, see the chart and find out what’s happening, everything can be clicked on."

David. said...

On the liblicense list, Toby Green of OECD asks:

"What is the share of SciHub downloads at subscribing institutions? If it becomes significant, then we are failing, if it isn't, then we're not."

Toby points out that the OECD publisher has about half as many annual downloads as the whole of Sci-Hub. Ivy Anderson responds that the University of California system, a pretty big University, had about the same number of annual downloads:

"But still, 30M downloads at just the University of California, large as we are (250k students and faculty), makes 47M SciHub downloads look like not such a big deal."

Yet another argument showing that Elsevier is unlikely to be losing any significant income from Sci-Hub.
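To put the download figures quoted in these comments side by side, here is a back-of-the-envelope sketch in Python. The variable names are illustrative. Comparing Sci-Hub's total with Elsevier's 2010 all-publisher estimate assumes the two years are comparable, which the sources do not establish, and since that estimate was "more than 1 billion" the share computed here is an upper bound:

```python
# Annual download figures as quoted in the comments above.
scihub_downloads = 47_000_000              # Sci-Hub, per Ivy Anderson
uc_downloads = 30_000_000                  # University of California system
all_publishers_downloads = 1_000_000_000   # lower bound from Elsevier's 2010 report

# Sci-Hub downloads as a share of estimated legal downloads (upper bound).
scihub_share_pct = scihub_downloads / all_publishers_downloads * 100

# One university system's downloads relative to all of Sci-Hub.
uc_vs_scihub = uc_downloads / scihub_downloads

print(f"Sci-Hub share of legal downloads: at most {scihub_share_pct:.1f}%")
print(f"UC downloads relative to Sci-Hub: {uc_vs_scihub:.2f}")
```

On these numbers Sci-Hub accounts for under 5% of the estimated traffic, and a single university system generates nearly two-thirds as many downloads as the whole of Sci-Hub.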

David. said...

A much less safe way than the Tor Browser Bundle or Tails to access SciHub is to use one of the Tor2Web proxies such as OnionLink by browsing to http://scihub22266oqcxt.onion/link. These gateways connect the "light" and "dark" webs, but as OnionLink says in its security page:

"Although publishers remain anonymous, when you use OnionLink your internet service provider can see what content you are accessing. OnionLink trades privacy for speed and convenience. Do not use OnionLink if others discovering which onionsites you visit would be legally perilous."