New Measures of Scholarly Impact

Data analytics are changing the ways we judge the influence of papers and journals.
December 17, 2010

Higher education might be a high-end marketplace of ideas, but its mechanisms for taking inventory of that marketplace have been, until recently, relatively basic. The standard method for measuring the influence of journals and authors, the “impact factor,” which counts the number of times their articles are cited by other articles, has hardly changed since 1955, when it was created by Eugene Garfield, a University of Pennsylvania graduate student who went on to found the first citation database, the Institute for Scientific Information (now owned by Thomson Reuters).
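The article describes the impact factor simply as a citation count. In the form most commonly reported today, a journal’s impact factor for a given year divides the citations received that year by items the journal published in the previous two years by the number of citable items it published in those two years. The Python sketch below computes that ratio from a hypothetical list of citation records; the record format, the function name `impact_factor`, and the journal name are illustrative assumptions, not any database’s actual schema.

```python
# Sketch of the two-year journal impact factor.
# Assumed (hypothetical) input formats:
#   citations: iterable of (citing_year, cited_journal, cited_pub_year) tuples
#   citable_items: dict mapping (journal, pub_year) -> number of citable items


def impact_factor(journal, year, citations, citable_items):
    window = {year - 1, year - 2}  # the two preceding publication years
    # Citations made during `year` to this journal's items from the window.
    cites = sum(
        1
        for citing_year, cited_journal, pub_year in citations
        if citing_year == year and cited_journal == journal and pub_year in window
    )
    # Citable items the journal published during the same window.
    items = sum(citable_items.get((journal, y), 0) for y in window)
    if items == 0:
        raise ValueError("no citable items in the two-year window")
    return cites / items


# Hypothetical data for an imaginary journal.
citations = [
    (2010, "J. Example", 2009),
    (2010, "J. Example", 2008),
    (2010, "J. Example", 2008),
    (2010, "J. Example", 2007),  # outside the window: ignored
    (2009, "J. Example", 2008),  # cited in the wrong year: ignored
]
citable_items = {("J. Example", 2009): 4, ("J. Example", 2008): 2}

print(impact_factor("J. Example", 2010, citations, citable_items))  # 0.5
```

Real citation databases must also decide which items count as “citable,” a choice this sketch leaves to the caller.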

But the way researchers read journal articles has changed, especially in the sciences. “If you look at the traffic, it’s pretty clear that most scholarly communications is consumed online, not in print,” says Johan Bollen, an associate professor of informatics at Indiana University at Bloomington.

Bollen is principal investigator for MESUR (Metrics for Scholarly Usage of Resources), a project founded in 2006 with a grant from the Andrew W. Mellon Foundation. MESUR is trying to shift the measurement of scholarly impact away from citations and toward the sort of real-time usage metrics that Web-based consumption enables. Bollen describes citations as inherently “backwards-looking … kind of like astronomers looking at a galaxy whose light reaches us 50 million years after the events that cause that light to happen.”

“If you look at the role citations have played in scholarly assessment, it’s very clear that citations originated when most scholarly publications are printed and consumed via print,” Bollen says.

These days, the availability of “usage data” (information on how many times a digital article has been downloaded, and in what context) means that people like Bollen can track the spread of an idea through a scholarly community using the same principles epidemiologists use to track the spread of a virus through a village. Usage data mean more than download counts; they also capture the browsing patterns of researchers as they move through the scholarly literature. Drawing on a database of 346,312,045 “user interactions” with digital versions of articles stored by Thomson Reuters and others, the MESUR team assessed journal impact across dozens of technical dimensions. One is “betweenness centrality,” a metric of how often a publication serves as a bridge as scholars browse from article to article, a pattern that implies “strong interdisciplinary appeal, high influence, high prestige, and high popularity,” according to Bollen.
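To make “betweenness centrality” concrete, the Python sketch below builds a small network from hypothetical browsing sessions, linking two journals whenever a reader moves directly from one to the other, and then scores each journal with Brandes’ algorithm for betweenness centrality. This is a simplification: the graph here is unweighted and undirected, and the session data, journal names, and helper names are invented for illustration; the article does not describe how MESUR actually constructed its networks.

```python
from collections import defaultdict, deque


def build_browsing_graph(sessions):
    """Link two journals whenever a reader moves directly between them."""
    adj = defaultdict(set)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            if a != b:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    nodes = list(adj)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, pred = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical sessions: two clusters of journals joined by one "Bridge".
sessions = [
    ["Phys A", "Phys B", "Phys C", "Phys A"],
    ["Phys C", "Bridge", "Bio A"],
    ["Bio A", "Bio B", "Bio C", "Bio A"],
]

centrality = betweenness_centrality(build_browsing_graph(sessions))
for journal, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{journal}: {score}")
```

In this toy network, “Bridge” lies on every shortest path between the physics and biology clusters and receives the highest score (9.0); the two journals that connect to it score 8.0, and the rest score 0.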

Now that so much journal consumption is digital, the MESUR team is confident that its analysis paints a pretty good picture of influence in the scholarly community writ large, not just a tiny subset.

Bollen and his colleagues are not the only ones applying social network analysis to scholarly publishing. Eigenfactor, a project based at the University of Washington, measures the influence of scholarly journals using the old-fashioned method of counting citations, but adds an algorithmic wrinkle similar to the one Google uses to rank search results (Google applies it to hyperlinks rather than citations): a citation counts for more when it comes from a journal that is itself frequently cited. The idea, as with Google, is to account for the fact that “a single citation from a high-quality journal may be more valuable than multiple citations from peripheral publications,” wrote Carl T. Bergstrom, one of the project’s developers, shortly after it launched in 2007. (Thomson Reuters has since incorporated Eigenfactor’s weighted metrics into its famous annual Journal Citation Reports, alongside Garfield’s original impact formula.)
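The weighting idea can be illustrated with a PageRank-style calculation, in which each journal’s score is fed by the scores of the journals that cite it, split in proportion to their outgoing citations. The Python sketch below is a simplified stand-in, not the published Eigenfactor algorithm, which adds further adjustments not reproduced here; the damping factor of 0.85 (PageRank’s conventional value), the citation counts, the journal names, and the function name `citation_influence` are all assumptions for illustration.

```python
def citation_influence(citations, damping=0.85, max_iter=500, tol=1e-12):
    """PageRank-style scores over a journal citation network (a sketch).

    citations: dict citing_journal -> {cited_journal: count}.
    A journal's score is built from the scores of journals citing it,
    so a citation from a highly scored journal carries more weight.
    """
    journals = sorted(set(citations) | {j for out in citations.values() for j in out})
    n = len(journals)
    score = {j: 1.0 / n for j in journals}
    for _ in range(max_iter):
        new = {j: (1 - damping) / n for j in journals}
        dangling = 0.0  # score held by journals that cite nothing
        for j in journals:
            out = citations.get(j, {})
            total = sum(out.values())
            if total == 0:
                dangling += score[j]
                continue
            # Split j's score among the journals it cites, by citation count.
            for k, count in out.items():
                new[k] += damping * score[j] * count / total
        for j in journals:
            new[j] += damping * dangling / n  # spread dangling score evenly
        change = sum(abs(new[j] - score[j]) for j in journals)
        score = new
        if change < tol:
            break
    return score


# Hypothetical citation counts (citing journal -> cited journal -> count).
citations = {
    "P1": {"Core": 1, "B": 1},
    "P2": {"Core": 1, "B": 1},
    "P3": {"Core": 1, "B": 1},
    "B": {"Core": 1},
    "A": {"Core": 1},
    "Core": {"A": 1},
}

influence = citation_influence(citations)
for journal in sorted(influence, key=influence.get, reverse=True):
    print(f"{journal}: {influence[journal]:.3f}")
```

Here journal “A” is cited only once, but by the heavily cited “Core,” while “B” is cited three times by journals that nobody cites; “A” ends up with a far higher score (roughly 0.41 versus 0.06).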

Jevin West, another member of the Eigenfactor development team, told Inside Higher Ed that Eigenfactor is currently looking into how to incorporate real-time usage data, as well as “citation-like references” from scholars on social networking sites. Shout-outs on scholarly blogs, Twitter, and Facebook, along with digital dog-earing on social bookmarking sites such as CiteULike and Connotea, might also be used as proxies for influence. Those data are “not quite as clean or as well-defined as academic citations,” West says, but they could prove relevant to assessing the impact of a particular article. After all, scholars are not influenced only by the articles they end up formally citing in a paper.

Some online journals are already publishing social media impact metrics alongside their articles. The Public Library of Science, widely known as PLoS, which publishes seven open-access journals, began sharing earlier this fall not only how many times an article has been cited by other academic articles, but also how many times it has been commented on, rated, blogged about, hyperlinked, and bookmarked online. And last week the publishing behemoth Springer, which publishes 1,750 digital journals, announced a real-time tracking application that shows where and when different articles are being downloaded.

Thomson Reuters, inheritor of Garfield’s original citation database, says it is hardly ignoring the trend toward usage-based metrics. “We continue to follow the initiatives in the library and publishing communities regarding usage, access and download-based metrics,” Marie McVeigh, director of the company’s Journal Citation Reports, told Inside Higher Ed via e-mail, “not only for full text, but for the increasingly diverse range of materials that are a part of emerging methods of scholarly discourse.”

What Does This Mean?

The trend toward data-driven assessment of scholarly communications has implications for how much university libraries are asked to pay, and are willing to pay, for different journals. For example, Eigenfactor uses its impact scores to rank journals based on how influential they are versus how much they cost — a potentially crucial variable for librarians who are deciding which subscriptions to renew at a time when journal prices are climbing and budgets are tight.

The American Chemical Society, which publishes 38 different journals, in 2008 started keeping track of how many times articles from each of its journals were being downloaded at individual campuses and setting different prices, based on usage, for each subscribing library.

This upset some librarians who saw their rates rise. But in some cases, librarians might be able to use their own institution’s usage data to their advantage. “Many librarians have informed me that they use usage data to build ratios for price-per-use,” says Roger Schonfeld, manager of research for the nonprofit Ithaka S+R. “If you have one provider that has a higher price than another provider, that would be a way of trying to estimate the value received from different resources.” Librarians might also use such data as leverage when negotiating a lower subscription rate, Schonfeld says.
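Schonfeld’s price-per-use ratio is simple arithmetic: divide what a library pays for a resource by how often its users access it. The Python sketch below compares two hypothetical providers; the prices, download counts, provider names, and the helper `cost_per_use` are invented for illustration, and real usage figures would come from whatever reports a given vendor supplies.

```python
import math


def cost_per_use(subscriptions):
    """Map each resource to annual price divided by recorded uses.

    subscriptions: dict of name -> (annual_price, uses). Resources with
    zero recorded uses get an infinite cost per use.
    """
    return {
        name: (price / uses if uses > 0 else math.inf)
        for name, (price, uses) in subscriptions.items()
    }


# Hypothetical figures for two providers.
subscriptions = {
    "Provider A": (12_000, 4_000),  # pricier, but heavily used
    "Provider B": (5_000, 500),     # cheaper, but rarely used
}

ratios = cost_per_use(subscriptions)
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${ratio:.2f} per use")
# Provider A: $3.00 per use
# Provider B: $10.00 per use
```

Although Provider A costs more in absolute terms, each use costs less than a third of what it does at Provider B, which is the kind of comparison a librarian could bring to a renewal negotiation.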

There are also individual implications for the scholars who write the articles and the agencies that fund their research. Since Garfield created it in 1955, and especially since citation databases went digital, various institutions have reportedly used the journal impact factor to evaluate candidates for promotion and tenure based on where they have published, and granting agencies may take the number into account when deciding which scientists to fund.

Mendeley, a fast-growing online network that uses an algorithm to make recommendations to users based on their academic interests, is taking steps toward a system that rates the impact of individual scholars based on how well their articles have done. Such novel iterations of the “impact factor” should not be given undue weight in deciding who gets tenure or whose research gets funded, says Jan Reichelt, Mendeley’s president. But, he says, more data is better than less data.

“There’s this pressure to get data to make these decisions,” says Bollen, of MESUR. “And now we have the data.”

However, Bollen is careful to note that the value of all the data generated by online scholarly communication lies not in the ability to boil down the importance of a journal or a researcher to a single number. On the contrary, more layers of data make it possible to add more nuance to the half-century tradition of measuring scholarly impact, a practice that has been criticized in the past for its simplicity.
