Tuesday, April 14, 2015

The Maginot Paywall

Two recent papers examine the growth of peer-to-peer sharing of journal articles. Guillaume Cabanac's "Bibliogifts in LibGen? A study of a text-sharing platform driven by biblioleaks and crowdsourcing" (LG) is a statistical study of the Library Genesis service. Carolyn Caffrey Gardner and Gabriel J. Gardner's "Bypassing Interlibrary Loan via Twitter: An Exploration of #icanhazpdf Requests" (TW) is a similar study of one of the sources for Library Genesis. Library Genesis and #icanhazpdf both put into practice the civil disobedience called for in Aaron Swartz's Guerilla Open Access Manifesto, a protest against the malign effects of current copyright law on academic research. Below the fold, some thoughts on the state of this movement.

In the years leading up to WWII, the French built the Maginot Line as an impregnable barrier against a German invasion:
While the fortification system did prevent a direct attack, it was strategically ineffective, as the Germans invaded through Belgium, going around the Maginot Line.
Copyright maximalists, such as the major academic publishers, are in a similar position. The more effective, and thus intrusive, the mechanisms they implement to prevent unauthorized access, the more they incentivize "guerilla open access".

Some copyright owners are coming to terms with this phenomenon. Today, Hugh Pickens reports that the first 4 of the 10 episodes of Game of Thrones' new season have leaked:
The episodes have already been downloaded almost 800,000 times, and that figure was expected to blow past a million downloads by the season 5 premiere. Game of Thrones has consistently set records for piracy, which has almost been a point of pride for HBO. "Our experience is [piracy] leads to more penetration, more paying subs, more health for HBO, less reliance on having to do paid advertising. If you go around the world, I think you're right, Game of Thrones is the most pirated show in the world. Well, you know, that's better than an Emmy."
LG shows the massive scale on which "guerilla open access" is happening in the field of academic journals. At the time of the study, Library Genesis hosted nearly 23M articles identified by DOI, totaling 15TB of data. The distribution was heavily skewed toward the major publishers: it covered 77% of Elsevier's DOIs, 73% of Wiley's and 53% of Springer's, but only 36% of all DOIs. To give some idea of the scale, this is about 60% of the 38M articles in Ontario's Scholars Portal.
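The scale comparison is easy to check. This is a back-of-the-envelope sketch in Python using only the rounded figures quoted above. The implied total DOI count is my own derived estimate from the 36% coverage figure, not a number LG reports.

```python
# Figures as quoted from LG above (rounded).
libgen_articles_m = 23.0   # "nearly 23M" DOI-identified articles, in millions
scholars_portal_m = 38.0   # Scholars Portal holdings, in millions of articles
share_of_all_dois = 0.36   # Library Genesis coverage of all DOIs

# Library Genesis size relative to Scholars Portal.
relative_size = libgen_articles_m / scholars_portal_m

# Total DOI population implied by the 36% coverage figure
# (a derived estimate, not a number reported by LG).
implied_total_dois_m = libgen_articles_m / share_of_all_dois

print(f"Library Genesis / Scholars Portal: {relative_size:.0%}")
print(f"Implied total DOIs: about {implied_total_dois_m:.0f}M")
```

Because the article count is "nearly" 23M, both results slightly overstate the true values. They are still consistent with "about 60%" of Scholars Portal.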

Some open access DOIs are included, although the motivation to upload them is much weaker. A recent estimate by Khabsa and Lee Giles is that 24% of all articles are openly accessible on the Web. Their methodology excluded most content from Library Genesis. Not all DOIs from major publishers are paywalled: the publishers run some open access journals and in some cases allow Gold (author-pays) open access. Despite these elements of double counting, it appears likely that at least a majority of all articles, and significantly more than a majority of major publisher articles, can be accessed without passing through a paywall.

Although the bulk of the Library Genesis content arrived via a small number of large uploads, the median upload rate is 2720 new articles/day. Among the sources for them are:
  • The Scholar subreddit, which LG estimates sees about 45 requests/day for articles to be shared via Library Genesis.
  • Sci-Hub, a service that allows users to enter a DOI. It relies on proxies running on networks with subscriptions to paywalled publishers. If the article is not available from Library Genesis, the service tries proxies at random until it finds one that can access the paper. The paper is then both served to the user and added to Library Genesis.
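The Sci-Hub lookup described above amounts to a cache with a randomized fallback. The Python below is a minimal sketch of that logic only, under stated assumptions. The repository is modeled as an in-memory dict keyed by DOI. Each proxy is a hypothetical function that returns the paper or None. "At random" is read as trying each proxy once in shuffled order. None of these names come from Sci-Hub itself.

```python
import random
from typing import Callable, Dict, Optional, Sequence

# Hypothetical model: a proxy is a function that returns the paper's bytes
# for a DOI, or None if its network cannot reach that paper.
Proxy = Callable[[str], Optional[bytes]]


def fetch_article(doi: str,
                  repository: Dict[str, bytes],
                  proxies: Sequence[Proxy],
                  rng: random.Random) -> Optional[bytes]:
    """Serve the DOI from the repository if present. Otherwise try proxies
    in random order until one can access the paper, and store that copy."""
    if doi in repository:
        return repository[doi]          # already shared: no proxy needed
    order = list(proxies)
    rng.shuffle(order)                  # each proxy tried at most once
    for proxy in order:
        paper = proxy(doi)
        if paper is not None:
            repository[doi] = paper     # the copy is added to the repository
            return paper
    return None                         # no proxy could reach the paper
```

One consequence of this design is that every successful proxy fetch also grows the shared repository. Later requests for the same DOI are then served without touching a proxy.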
Presumably, the #icanhazpdf hashtag is another of the Library Genesis upload paths. TW analyzed 824 requests from 475 users over 3 months, or about 10/day. 674 of them were for articles, from 493 different journal titles. The mechanism doesn't provide information about how many were satisfied, or how many of the results ended up on Library Genesis.

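LG doesn't have an estimate of the Sci-Hub traffic. Unless that traffic is very large, other mechanisms must fill the large gap between the median Library Genesis upload rate of 2720 articles/day and the combined Scholar subreddit and #icanhazpdf rates. Those two channels together account for roughly 55 requests/day, an upper bound on the uploads they contribute.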
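The size of that gap can be estimated from the figures above. This sketch assumes the 3-month TW study period is about 91 days. It treats the request rates as upper bounds on uploads, since neither channel reports how many requests were satisfied or ended up on Library Genesis.

```python
# Rates quoted above. The #icanhazpdf rate is derived from TW's
# 824 requests over "3 months", assumed here to be about 91 days.
libgen_median_per_day = 2720
scholar_subreddit_per_day = 45
icanhazpdf_requests = 824
study_days = 91  # assumption: 3 months is roughly 91 days

icanhazpdf_per_day = icanhazpdf_requests / study_days

# Upper bound on uploads from these two channels: every request satisfied and uploaded.
explained = scholar_subreddit_per_day + icanhazpdf_per_day
unexplained = libgen_median_per_day - explained

print(f"#icanhazpdf: {icanhazpdf_per_day:.1f}/day")
print(f"Both channels: {explained:.0f}/day "
      f"({explained / libgen_median_per_day:.1%} of the median upload rate)")
print(f"Left for Sci-Hub and other paths: {unexplained:.0f}/day")
```

On these assumptions the two channels account for about 2% of the median upload rate. That leaves well over 2,600 articles/day to come from Sci-Hub and other paths.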

Admittedly, it takes time for newly published articles to appear outside their paywalls. Some publishers operate "moving walls", under which their articles become open access after an embargo period. It also takes time for the various mechanisms driving Library Genesis to locate and upload articles. LG shows that the most recent year in its data (2013) has only about half as many articles as the previous year. This suggests that the average delay is similar to a typical moving wall.

Paying to pass through paywalls thus delivers some value: not just access to a minority of the content but also more timely access to some of the majority. Nevertheless, the multi-billion-dollar profits of the major publishers are hard to justify, let alone the further billions that represent their costs of supplying their services. We have already seen that their peer review process fails in its assigned role of ensuring the quality of the papers they publish. Now we see that the majority of the content for which they charge these enormous sums is available without payment.

My previous posts on scholarly communication.


David. said...

Library Genesis is apparently based in Russia, which has just strengthened its anti-piracy law.

David. said...

TorrentFreak reports:

"Elsevier has filed a complaint at a New York District Court, hoping to shut down the Library Genesis project and the SciHub.org search engine."

David. said...

The Register reports on a talk by "Storm Harding" at the Chaos Computer Club about the watermarking and other techniques publishers use to track down the sources for systems such as Library Genesis.

David. said...

Glyn Moody at Techdirt posts "Copyright Fail: 'Pirating' Academic Papers Not Only Commonplace, But Now Seen As Mainstream". It points to two mainstream pieces about #icanhazpdf, from Quartz and BBC News, that treat the sharing of research articles as a reasonable thing to do. Moody writes:

"But what's striking is that after mentioning that this kind of activity may be against the law, there's none of the traditional hand-wringing about "piracy", and how it will end Western civilization as we know it unless tough measures are brought in to stop it."

David. said...

On the other hand, Glyn Moody also reports that a Canadian judge has ruled that asking for a copy of a legally-obtained but paywalled article is circumvention. The person who asked for a copy was hit with C$11,470 plus tax, plus C$2,000 in punitive damages.

David. said...

Fiona MacDonald at Science Alert has an update on Sci-Hub.

David. said...

So does Kieren McCarthy at The Register.

David. said...

Mike Masnick at Techdirt points out that Elsevier has triggered the Streisand Effect for Sci-Hub.

David. said...

Yes, the Streisand Effect has been triggered. See Simon Oxenham's piece at Big Think, "Meet the Robin Hood of Science".