The long tail of non-English science

Ben Panko's "English Is the Language of Science. That Isn't Always a Good Thing" is based on "Languages Are Still a Major Barrier to Global Science", a paper in PLOS Biology by Tatsuya Amano, Juan P. González-Varo and William J. Sutherland. Panko writes:
For the new study, Amano's team looked at the entire body of research available on Google Scholar about biodiversity and conservation, starting in the year 2014. Searching with keywords in 16 languages, the researchers found a total of more than 75,000 scientific papers. Of those papers, more than 35 percent were in languages other than English, with Spanish, Portuguese and Chinese topping the list.

Even for people who try not to ignore research published in non-English languages, Amano says, difficulties exist. More than half of the non-English papers observed in this study had no English title, abstract or keywords, making them all but invisible to most scientists doing database searches in English.
It has long been a problem that the resources for preserving e-journal content were almost exclusively devoted to providing post-cancellation access rather than to preserving the academic record (both links are from 2007). In other words, resources went to preserving content that, because it was expensive, was not at risk. Estimates of how much of the record was being preserved ranged from one half downward. It is clear that the expensive, low-risk content is almost exclusively in English.

Over the years, the LOCKSS team have made several explorations of the long tail. Among these were a 2002 meeting of humanities librarians that identified high-risk content such as World Haiku Review and Exquisite Corpse, and work funded by the Soros Foundation with South African librarians that identified fascinating local academic journals in fields such as dry-land agriculture and AIDS in urban settings. Experience leads to two conclusions:
  • Both subject and language knowledge are important for identifying the worthwhile long-tail content.
  • Long-tail content in English is likely to be open access; in other languages much more of it is available only by subscription.
Both were part of the motivation behind the LOCKSS Program's efforts to implement National Hosting networks. Librarians in-country are far more capable of identifying, and negotiating with the publishers of, worthwhile long-tail content than we are. An example is Brazil's Cariniana, established by consortia of libraries to preserve their national open access academic literature, mostly in Portuguese.

Despite the importance of national-language academic literature, as shown by Amano et al., few if any individual libraries have the resources to collect and preserve it. The collaborative networks of libraries engendered by the LOCKSS technology can operate at a national scale to address this problem more effectively and affordably.

