In the years leading up to WWII, the French built the Maginot Line as an impregnable barrier against a German invasion:
While the fortification system did prevent a direct attack, it was strategically ineffective, as the Germans invaded through Belgium, going around the Maginot Line.Copyright maximalists such as the major academic publishers, are in a similar position. The more effective and thus intrusive the mechanisms they implement to prevent unauthorized access, the more they incentivize "guerilla open access".
Some copyright owners are coming to terms with this phenomenon. Today, Hugh Pickens reports that the first 4 of the 10 episodes of Game of Thrones new season have leaked:
The episodes have already been downloaded almost 800,000 times, and that figure was expected to blow past a million downloads by the season 5 premiere. Game of Thrones has consistently set records for piracy, which has almost been a point of pride for HBO. "Our experience is [piracy] leads to more penetration, more paying subs, more health for HBO, less reliance on having to do paid advertising. If you go around the world, I think you're right, Game of Thrones is the most pirated show in the world. Well, you know, that's better than an Emmy."LG shows the massive scale on which "guerilla open access" is happening in the field of academic journals. As of the study, Library Genesis hosted nearly 23M articles identified by DOI, 15TB of data. The distribution was heavily skewed to the major publishers, representing 77% of Elsevier's DOIs, 73% of Wiley's and 53% of Springer's, although only 36% of all DOIs. To give some idea of the scale, this is about 60% of Ontario's Scholar's Portal, which has 38M.
Although some open access DOIs are included, the motivation to upload them is much less. A recent estimate by Khabasa and Lee Giles is that 24% of all articles are openly accessible on the Web, their methodology excluded most content from Library Genesis. Not all DOIs from major publishers are paywalled, they publish some open access journals and allow Gold open access (author pays) in some cases. Despite these elements of double counting, it appears likely that at least a majority of all articles, and significantly more than a majority of major publisher articles, can be accessed without passing though a paywall.
Although the bulk of the Library Genesis content arrived via a small number of large uploads, the median upload rate is 2720 new articles/day. Among the sources for them are:
- The Scholar subreddit, which LG estimates sees about 45 requests/day for articles to be shared via Library Genesis.
- Sci-Hub, a service using proxies running on networks with subscriptions to paywalled publishers that allows users to enter a DOI. It it is not available from Library Genesis, the service tries proxies at random until one is found that can access the paper, which is both served to the user and added to Library Genesis.
LG doesn't have an estimate of the Sci-Hub traffic, but unless it is very large there must be other mechanisms filling the large gap between the Scholar subreddit and #icanhazpdf rates and the Library Genesis median upload rate.
Admittedly, it takes time for newly published articles to appear outside their paywalls. Some publishers operate "moving walls", so their articles become open access after an embargo period. It takes time for the various mechanisms driving Library Genesis to locate and upload articles. LG shows that their most recent year (2013) has only about half as many articles as the previous year, so the average delay is similar to the moving wall.
Paying to pass through paywalls thus delivers some value, not just access to a minority of the content but also more timely access to some of the majority. Nevertheless, the multi-billion dollar profits of the major publishers, let alone the other multiple billions that represent their costs in supplying their services, are hard to justify. We have already seen that their peer review process fails in its assigned role of ensuring the quality of the papers they publish. Now we see that the majority of the content for which they charge these enormous sums is available without payment.
My previous posts on scholarly communication.