Predictive Analytics for Publishing

Start-up uses data analytics to tackle information overload among researchers and publishers in scientific, medical and technical fields.

May 10, 2016

There are more than 34,000 peer-reviewed scholarly journals in scientific, medical and technical fields worldwide. They publish nearly 2.5 million articles a year -- about an article every 13 seconds. In a field of more than seven million researchers, how is anyone supposed to stay up to date? And how are publishers supposed to know which manuscripts will capture readers’ interest?
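The 13-second figure follows from dividing the number of seconds in a year by the annual article count. This assumes a 365-day year and exactly 2.5 million articles, evenly spread:

```latex
\[
\frac{365 \times 24 \times 3600\ \text{s}}{2.5 \times 10^{6}\ \text{articles}}
= \frac{31{,}536{,}000\ \text{s}}{2{,}500{,}000\ \text{articles}}
\approx 12.6\ \text{s per article}
\]
```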

By some measures, the problem is only getting worse. According to the International Association of Scientific, Technical and Medical Publishers, the number of journals and articles published in those fields has grown by 3 to 3.5 percent a year for the past two centuries. When the organization took inventory of that segment of the publishing market in 2015, it noted that growth appeared to be accelerating because of an influx of new researchers.
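To put that rate in perspective, the standard doubling-time formula for compound growth shows that, assuming a constant annual rate r, output at 3 to 3.5 percent a year doubles roughly every two decades:

```latex
\[
T_{2} = \frac{\ln 2}{\ln(1 + r)}, \qquad
r = 0.03 \;\Rightarrow\; T_{2} \approx 23.4\ \text{years}, \qquad
r = 0.035 \;\Rightarrow\; T_{2} \approx 20.1\ \text{years}
\]
```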

“You’ve got this global awareness problem,” said Sam Molyneux, a former cancer researcher who co-founded Meta, a Toronto-based start-up. “You can essentially capture it as, nobody has any idea what’s being published on a daily or weekly basis.”

But newcomers who need to catch up aren’t the only ones suffering from information overload, Molyneux said. It extends to those who need to keep up -- clinicians treating patients, investors pondering whom to fund and government officials debating how to regulate.

Meta, formerly known as Sciencescape, believes data analytics can benefit all of those groups. The company has built a “knowledge graph” similar to the one Google uses for its own search service (though, at the moment, only for biomedical research) that aggregates data about who publishes, and who reads, what.

The company has struck deals with publishers in science, technology and mathematics fields, which give the company access to full-text versions of more than 18,000 journals. Using machine reading and natural language processing, Meta scans those articles -- as well as the millions of articles stored in open-access repositories -- collecting information about authors, citations and topics. The participating publishers receive exposure for their journals in return.

The result, according to Meta’s website, “is a continuously evolving knowledge graph that knows every paper, person and entity that make up the universe of science, and how they are connected.”
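Meta has not published how its knowledge graph is built, so the following is only a minimal illustrative sketch of the general idea: papers, people and entities (topics) as nodes, connected by typed edges such as “authored,” “cites” and “mentions.” The class name, method names and sample IDs are hypothetical, and a production system would sit on a graph database rather than in-memory dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy knowledge graph (hypothetical, not Meta's actual design).

    Nodes are papers, people and entities; edges are typed relations
    such as 'authored', 'cites' and 'mentions'.
    """

    # node id -> node type ("paper", "person" or "entity")
    nodes: dict = field(default_factory=dict)
    # (relation, source id) -> set of target ids
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_edge(self, relation, src, dst):
        self.edges[(relation, src)].add(dst)

    def add_paper(self, paper_id, authors, cites, topics):
        """Record one machine-read paper and link it to its authors,
        the papers it cites and the topics it mentions."""
        self.nodes[paper_id] = "paper"
        for person in authors:
            self.nodes.setdefault(person, "person")
            self.add_edge("authored", person, paper_id)
        for cited in cites:
            self.nodes.setdefault(cited, "paper")
            self.add_edge("cites", paper_id, cited)
        for topic in topics:
            self.nodes.setdefault(topic, "entity")
            self.add_edge("mentions", paper_id, topic)

    def papers_by(self, person):
        """Return the sorted IDs of papers the given person authored."""
        return sorted(self.edges.get(("authored", person), set()))

    def citation_count(self, paper_id):
        """Count how many recorded papers cite paper_id."""
        return sum(
            1
            for (relation, _src), targets in self.edges.items()
            if relation == "cites" and paper_id in targets
        )


g = KnowledgeGraph()
g.add_paper("p1", authors=["alice"], cites=[], topics=["CRISPR"])
g.add_paper("p2", authors=["alice", "bob"], cites=["p1"], topics=["CRISPR"])
print(g.papers_by("alice"), g.citation_count("p1"))
```

Queries such as “who published what” or “how often is this paper cited” then become simple lookups over the stored edges; the same structure could, in principle, feed the reviewer recommendations and citation estimates the company describes.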

The public-facing side of the knowledge graph is a free literature discovery platform, Meta Science, which was introduced last month. At the time of launch, the platform featured more than 62 million journals, researchers and topics. Like Google, Meta Science learns about its users’ search patterns, ranking articles based on their interests and tracking trending topics.
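How Meta Science ranks trending topics has not been disclosed. One simple approach, shown here purely as a hypothetical sketch, compares how often each topic was mentioned (in new papers or in search queries) during the most recent window against the window just before it. The function name, the (date, topic) input format, the seven-day window and the add-one smoothing are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta


def trending_topics(mentions, today, window_days=7, top_n=3):
    """Rank topics by growth in mention count (hypothetical scoring).

    mentions: iterable of (date, topic) pairs.
    Only topics mentioned in the recent window are scored.
    """
    recent_start = today - timedelta(days=window_days)
    previous_start = recent_start - timedelta(days=window_days)
    recent, previous = Counter(), Counter()
    for day, topic in mentions:
        if recent_start < day <= today:
            recent[topic] += 1
        elif previous_start < day <= recent_start:
            previous[topic] += 1
    # Add-one smoothing keeps brand-new topics from dividing by zero.
    scores = {t: (recent[t] + 1) / (previous[t] + 1) for t in recent}
    # Highest growth first; break ties by recent volume, then by name.
    return sorted(scores, key=lambda t: (-scores[t], -recent[t], t))[:top_n]


today = date(2016, 5, 10)
this_week = today - timedelta(days=1)
last_week = today - timedelta(days=10)
mentions = (
    [(this_week, "Zika")] * 4
    + [(this_week, "CRISPR")] * 3 + [(last_week, "CRISPR")]
    + [(this_week, "statins")] * 2 + [(last_week, "statins")] * 4
)
print(trending_topics(mentions, today, top_n=2))
```

Scoring by relative growth rather than raw volume lets a newly emerging topic outrank an established one that is mentioned often but steadily, which is closer to what “trending” usually means.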

Meta hopes to make money by selling information gleaned from its knowledge graph to publishers and life science companies. For publishers, for example, the company says it can predict emerging research topics, recommend reviewers and estimate how many citations a manuscript will generate before it is published. A spokesperson said Meta is in final contract negotiations with the first few publishers that have signed up for those services.

“It’s a bird’s-eye view of what’s happening in science,” Molyneux said about Meta’s knowledge graph. “It essentially flags concepts, technologies and focal areas that [publishers] should be paying attention to.”

Meta is the latest example of how initiatives primarily meant to benefit medical science are influencing core functions of academe. The same phenomenon can be seen in the realm of student retention, where a handful of colleges are using a strategy from the health care industry to identify and advise students who may be at risk of dropping out.

With Meta Science, the company is also responding to shifting research patterns among scholars. Ithaka S+R, a nonprofit research and consulting firm, routinely surveys faculty members about information use and their attitudes toward libraries. The most recent edition of that survey, released last month, found that many faculty members are no longer starting their research by searching electronic databases, but rather by going to their library’s website or doing a simple web search.

One potential challenge to Meta’s ability to anticipate tomorrow’s scientific headlines is that it relies on published research to make those predictions. A topic that no one has written about yet won’t emerge as a trending topic -- though search data from Meta Science might alert the company to what researchers are looking for.

“There’s always going to be a fraction of information that doesn’t get published in articles,” Molyneux said. “There’s also the unpublished leading edge of science, but a lot of that is migrating into the articles on a rolling basis.” He added that Meta is going to keep developing services that analyze manuscripts well before they are published.

While the company may expand into science more generally and has even spoken with people in the social sciences, Molyneux said Meta will continue to focus on biomedicine, offering services to companies, publishers and researchers in the field.
