Inevitably Open

It's either the beginning of the end or the end of the beginning.

August 1, 2017

According to a new study uploaded in preprint form to PeerJ and profiled in Science, subscription-based publishing is a doomed business. Scientists have flocked to Sci-Hub because it’s easy to use – no traipsing from one site to another, no fumbling for a credit card, no delay while an interlibrary-loan request is processed. Just drop in a DOI and there’s a good chance you’ll get the paper instantly or, if it’s not already in Sci-Hub, very shortly. The chances that a paper will be there are greatest for the largest publishers, including those that have brought suit against Sci-Hub. (Interestingly, the researchers charted Google searches for the site's name and saw that they spike every time a suit is brought against the pirate aggregator.) The conclusion: “this is the beginning of the end for subscription scholarly publishing.”

Of course, there is one problem: libraries have to subscribe to these journals in the first place for them to end up in Sci-Hub, which relies on borrowed (or phished) library login credentials. What the popularity of Sci-Hub has demonstrated, though, is that the limited access libraries have to offer is less effective and efficient than what can happen in a less copyright-hobbled world. How are you going to keep the scientists down on the intellectual property farm once they’ve seen how much easier it could be?

Another recent Science article (hat tip to my friend Larry for pointing it out) reports a study conducted by Danish researchers (posted to the open repository BioRxiv) that found you can learn a lot more by data-mining the full text of a large number of scientific papers than by simply analyzing a lot of abstracts, which is what you do when you can’t get hold of the actual papers. The researchers created two pools of data: one, abstracts of over 15 million articles, the other the actual texts of those articles. I’m tired just thinking about it. They faced two big challenges: getting the publishers’ permission to mine their articles and making PDFs consistently machine-readable. They’re related issues: journal publishers use different formats, which makes conversion to text files labor-intensive, and (no surprise) they aren’t willing to let researchers share the articles that have been painstakingly readied for text-mining. Though 15 million is a lot of articles, the authors drew on only three sources: the backfiles of two large publishers (Springer and Elsevier) and PMC from the National Institutes of Health. It would be incredibly valuable to be able to text-mine vast amounts of scientific knowledge, but that’s tricky when so many publishers are involved, all using different formats and with little incentive to make the literature available for mining. It would be sort of like mapping the human genome, but with more intellectual property lawyers.

I have no idea where we’re going with this. If the Sci-Hub model succeeds, it will kill its host. Maybe there’s some poetic justice there; publishers that depended on library budgets have slowly but surely been killing their hosts, too. Open platforms like ArXiv, Humanities Commons, BioRxiv, SocArxiv, and similar author-upload initiatives are gaining ground. Big Deals are meeting with big resistance in Germany, where 60 universities have dropped their contracts with Elsevier, following similarly fraught negotiations in the Netherlands, Finland, and Taiwan. Another site of resistance is among editorial boards. The Journal of Algebraic Combinatorics is the latest journal to migrate to an open platform. Springer will keep the journal’s name, but the talent and the reputation are moving to a new address. (Similar “flipped” journals are moving from commercial publishers to the library-membership-funded open access platform Open Library of Humanities.)

There seem to be certain things that people want and have come to expect: to be able to read research without a hassle, to be able to share their own research without having to ask permission, and to be able to develop new ways to understand what the research is telling us without endless legal roadblocks. These things can only happen if research is open access. We’ll also need a lot of interoperability, standards, indexing and preservation plans, not to mention solving the knotty problem of sustainable funding models. Librarians and other OA activists are working on it.

But look at how much change we’ve adapted to in recent decades. A physics professor once told me what a marvelous thing it was when fax machines became common equipment. You didn’t have to wait days or weeks to read a preprint; it came in minutes! We’ll figure this out, and then we’ll wonder what took us so long. Though librarians have pushed OA for years, it’s really up to authors and societies to determine what the future looks like. All indicators are pointing toward open.

