Essay on usage statistics and the research library

Going Meta on the Data

You leave digital footprints when you do research. Scott McLemee listens to the librarians who follow them.

In a passage surely written with tongue in cheek, Friedrich Nietzsche states that a scholar of his era would consult 200 volumes in the course of a working day. That’s far too many, he suggests: a symptom of erudition’s decay into feeble bookwormery and derivative non-thinking. “During the time that I am deeply absorbed in my work,” he says, “no books are found within my reach; it would never occur to me to allow anyone to speak or even to think in my presence.” A noble modus vivendi, if not quite an admirable one, somehow.

Imagine what the philosopher would make of the 21st century, when you can carry the equivalent of the library of Alexandria in a flash drive on your keychain. Nietzsche presents the figure of 200 books a day as “a modest assessment” – almost as if someone ought to do an empirical study and nail the figure down. But we’re way past that now, as one learns from the most recent number of Against the Grain.

ATG is a magazine written by and for research librarians and the publishers and vendors that market to them. In the new issue, 10 articles appear in a section called “Perspectives on Usage Statistics Across the Information Industry.” The table of contents also lists a poem called “Fireworks” as part of the symposium, though that is probably a mistake. (The poem is, in fact, about fireworks, unless I am really missing something.)

Some of the articles are a popularization – relatively speaking – of discussions that have been taking place in venues with titles like the Journal of Interlibrary Loan, Document Delivery & Electronic Reserves and Collections Management. Chances are the non-librarians among you have never read these publications, or even seen them at a great distance, no matter how interdisciplinary you seek to be. For that matter, discussing the ATG articles at any length in this column would risk losing too many readers. They are peer communications. But the developments they address are worth knowing about, because they will undoubtedly affect everyone’s research, sooner or later, often in ways that will escape most scholars’ notice.

Most of us are aware that the prominence and influence of scholarly publications can be quantified, more or less. The Social Science Citation Index, first appearing in 1956, is an almost self-explanatory case.

As an annual list of the journal articles where a given paper or book has been cited, SSCI provides a bibliographical service. Counting the citations then yields bibliometric data, of a pretty straightforward kind. The metric involved is simplicity itself. The number of references to a scholarly text in the subsequent literature, over a given period of time, is a rough and ready indicator of that text’s influence or prominence during said period. The reputation of an author can be similarly quantified, hashmark style.

A blunt bibliometric instrument, to be sure. The journal impact factor is a more focused device, measuring how often articles in a journal have been cited over a two-year period relative to the total number of articles that journal published over the same period. The index was first calculated in the 1970s by what is now Thomson Reuters, also the publisher of SSCI. But the term “journal impact factor” is generic. It applies to the IDEAS website’s statistical assessment of the impact of economic journals, which is published by the Research Division of the Federal Reserve Bank of St. Louis. And there's the European Reference Index for the Humanities, sponsored by the European Science Foundation, which emerged in response to dissatisfaction with “existing bibliographic/bibliometric indices” for being “all USA-based with a stress on the experimental and exact sciences and their methodologies and with a marked bias towards English-language publication.”
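For readers who like to see the arithmetic, here is a minimal sketch of the impact factor in Python. It assumes the common two-year definition, in which citations received in one year to a journal's articles from the previous two years are divided by the number of articles the journal published in those two years. The function name and the sample figures are invented for illustration and come from no real journal.

```python
def impact_factor(citations_to_prior_two_years: int,
                  articles_in_prior_two_years: int) -> float:
    """Two-year impact factor under the assumed definition.

    citations_to_prior_two_years: citations received this year to
        articles the journal published in the previous two years.
    articles_in_prior_two_years: number of articles the journal
        published in those same two years.
    """
    if articles_in_prior_two_years <= 0:
        raise ValueError("the journal must have published at least one article")
    return citations_to_prior_two_years / articles_in_prior_two_years


# Hypothetical journal: 120 articles over two years, cited 300 times this year.
print(impact_factor(300, 120))
```

On these made-up numbers the journal's articles were cited, on average, two and a half times each.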

As the example of ERIH may suggest, bibliometric indices are not just a statistical matter. What gets counted, and how, is debatable. So is the effect of journal impact factors on the fields of research to which they apply – not to mention the people working in those fields. And publication in high-impact journals can be a career-deciding thing. A biologist and a classicist on a tenure committee will have no way of gauging how good the candidate’s work on astrophysics is, as such. But if the publications are mostly in high-impact journals, that’s something to go by.

The metrics discussed in the latest Against the Grain are newer and finer-grained than the sort of thing just described. They have been created to help research libraries track what in their collections is being used, and how often and intensively. And that, in turn, is helpful in deciding what to acquire, given the budget. (Or what not to acquire, often enough, given what’s left of the budget.)

One contributor, Elizabeth R. Lorbeer, associate director for content management for the medical library at the University of Alabama at Birmingham, says that the old way to gauge which journals were being used was to look at the wear and tear on the bound print volumes. Later, comparing journal-impact factors became one way to choose which subscriptions to keep and which to cancel. But it was the wrong tool in some cases. Lorbeer writes that she considered it “an inadequate metric to use in the decision-making process because sub-discipline and newer niche areas of research were often published in journals with a lower impact factor.”

From the bibliometric literature she learned of another statistical tool: the immediacy index, which measures not how often a journal is cited, but how quickly. In some cases, a journal with a low impact factor might have a higher immediacy index, as would be appropriate for work in cutting-edge fields.
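The immediacy index can be sketched the same way. The version below assumes the usual definition, citations received in a given year to articles published in that same year, divided by the number of articles published that year. The names and figures are hypothetical.

```python
def immediacy_index(same_year_citations: int, articles_this_year: int) -> float:
    """Average number of times this year's articles were cited this year."""
    if articles_this_year <= 0:
        raise ValueError("the journal must have published at least one article")
    return same_year_citations / articles_this_year


# Hypothetical niche journal: 40 articles this year, already cited 12 times.
print(immediacy_index(12, 40))
```

A small journal in a fast-moving field can score well here even when its two-year impact factor looks unimpressive, which is the case Lorbeer describes.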

She also mentions consulting the “half-life” index for journals – a metric as peculiar, on first encounter, as the old “count the footnote citations” method was obvious. For the articles in a given journal, it measures “the number of publication years from the current year which account for 50 percent of current citations received.” This was useful for determining which journals had a long-enough shelf life to make archiving them worthwhile.
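The half-life definition is easier to follow with numbers in hand. The sketch below is a simplified, whole-year reading of the quoted definition: starting from the current year and counting backward, it reports how many publication years are needed to account for at least half of this year's citations. Published half-life figures are typically fractional, so treat this as an approximation. The function name and the citation counts are invented.

```python
def cited_half_life(citations_by_pub_year: dict[int, int], current_year: int) -> int:
    """Whole-year cited half-life (simplified; no fractional interpolation).

    citations_by_pub_year maps the publication year of the cited articles
    to the number of citations those articles received in current_year.
    """
    total = sum(citations_by_pub_year.values())
    if total == 0:
        raise ValueError("no citations to analyze")
    running = 0
    oldest = min(citations_by_pub_year)
    # Walk backward from the current year, one publication year at a time.
    for years_back, year in enumerate(range(current_year, oldest - 1, -1), start=1):
        running += citations_by_pub_year.get(year, 0)
        if 2 * running >= total:
            return years_back
    raise ValueError("citations are dated after current_year")


# Hypothetical journal: 100 citations received in 2012, by year of the cited article.
counts = {2012: 10, 2011: 30, 2010: 25, 2009: 20, 2008: 15}
print(cited_half_life(counts, 2012))
```

Here the three most recent publication years account for 65 of the 100 citations, so the half-life is three years. A journal whose count ran to ten or more would be a stronger candidate for archiving.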

Google Scholar is providing a number of metrics – the h-index, the h-core, and the h-median – which I shall mention, and point out, without professing to understand their usefulness. Lorbeer refers to a development also covered by Inside Higher Ed earlier this year: a metric based on Twitter references, to determine the real-time impact of scholarly work.

One day a Nietzsche specialist is going to be praised for writing a high-twimpact paper, whereupon the universe will end.
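For what it is worth, the Google Scholar metrics mentioned above are at least easy to compute. As I understand the definitions, the h-index is the largest number h such that at least h articles have at least h citations each, the h-core is that set of h most-cited articles, and the h-median is the median citation count within the h-core. The following is a sketch under those assumptions, with an invented function name and sample data.

```python
from statistics import median


def h_metrics(citation_counts: list[int]) -> tuple[int, list[int], float]:
    """Return (h-index, h-core citation counts, h-median)."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` articles all have at least `rank` citations
        else:
            break
    core = ranked[:h]
    h_median = median(core) if core else 0
    return h, core, h_median


# Hypothetical publication with five articles.
print(h_metrics([17, 9, 6, 3, 2]))  # (3, [17, 9, 6], 9)
```

Three articles have at least three citations each, but there are not four with at least four, so the h-index is 3 and the median of the core is 9.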

Other contributions to the ATG symposium paint a picture of today’s research library as a mechanism incessantly gathering information as well as making it available to its patrons – indeed, doing both at the same time. Monitoring the flow of bound volumes in and out of the library makes it relatively easy to gauge demand according to subject heading. And with digital archives, it’s possible to track which ones are proving especially useful to students and faculty.

A survey of 272 practicing librarians in ATG’s subscriber base, conducted in June of this year, shows that 80 percent “are analyzing [usage of] at least a portion of their online journal holdings,” with nearly half of them doing so for 75 to 100 percent of those holdings. It’s interesting to see that the same figure – 80 percent – indicated that “faculty recommendations and/or input” was used in making decisions about journal acquisitions. With book-length publications entering library holdings in digital form, the same tools and trends are bound to influence monograph acquisition. Some of the articles in the symposium indicate that it’s already happening.

Carbon-based life forms are still making the actual decisions about how to build the collections. But it’s not hard to imagine someone creating an algorithm that would render the whole process cybernetic. Utopia or nightmare? I don't know, but we're probably halfway there.