The carnage and manhunt in Boston last week obliged the Digital Public Library of America to postpone its grand opening festivities at the Boston Public Library until sometime this fall. So sudden a change of plans could only create a logistical nightmare. The roster of museums, archives, and libraries participating in DPLA runs into the hundreds, and the two-day event (Thursday and Friday) was booked to capacity, with scores of people on the standby list. But the finish line for the marathon was just outside the library, and rescheduling was unavoidable.
The delay applied only to the gala, not to DPLA itself: the site launched on Thursday at noon Eastern time, right on schedule. The response online has been, for the most part, enthusiasm just short of euphoria. The collection contains not quite 2.4 million digital “objects,” including books, manuscripts, photographs, recorded sound, and film/video. More impressive than the quantity of material, though, is how much thought has gone into how it’s made available.
That’s true even of the site’s address: DP.LA. I’ve seen at least one grumble about how anomalous this looks. Which it does, but in a good way. Even if you forget the address, it takes no effort to reconstruct. The brevity of the URL makes it convenient to type on a cellphone; when you do, the site’s homepage is readily navigable on the small screen. That demonstrates an awareness of how a good many visitors will actually use the site – more so than is often the case with library catalogs online.
DPLA is the work of people who understand that design is not just icing on the digital cake, but a significant (even decisive) factor in how we engage with content in the first place. They have made available an application programming interface (API) for the site, which is a very useful thing indeed, according to my source in the geek community. With the API, users can create new tools for sorting and presenting the library’s materials. Combine it with a geolocation API, for example, and you could put together an application displaying the available photographs of the street you are on, organized decade by decade.
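To make the geolocation example a little more concrete, here is a minimal Python sketch of the sorting step such an application might perform. It assumes the photographs have already been retrieved, from DPLA's API or anywhere else, as records carrying a title, a year, and a latitude/longitude pair. Those field names, the sample records, and the 500-meter radius are all hypothetical; nothing here reflects the actual format of DPLA's responses.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def photos_by_decade(records, lat, lon, radius_km=0.5):
    """Return {decade: [titles]} for records within radius_km of (lat, lon), oldest decade first."""
    groups = defaultdict(list)
    for rec in records:
        if haversine_km(lat, lon, rec["lat"], rec["lon"]) <= radius_km:
            decade = rec["year"] // 10 * 10
            groups[decade].append(rec["title"])
    return dict(sorted(groups.items()))


# Hypothetical records; a real application would fetch these through the library's API.
sample = [
    {"title": "Boylston Street, looking east", "year": 1923, "lat": 42.3500, "lon": -71.0780},
    {"title": "Parade on Boylston Street", "year": 1928, "lat": 42.3498, "lon": -71.0775},
    {"title": "Copley Square in snow", "year": 1951, "lat": 42.3497, "lon": -71.0770},
    {"title": "Harbor view", "year": 1935, "lat": 42.3601, "lon": -71.0500},  # about 2.5 km away
]
print(photos_by_decade(sample, 42.3499, -71.0777))
```

The harbor photograph falls outside the radius and drops out, leaving two pictures from the 1920s and one from the 1950s. A real tool would also have to cope with records that lack coordinates or firm dates, which in a collection assembled from hundreds of contributors is likely to be most of them.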
The library’s potential for assembling and integrating an incredible range of documents and knowledge is almost unimaginable. Excitement seems appropriate. But in describing my own impressions of DPLA, I want to be a little more qualified about the enthusiasm it inspires. Things are not nearly as far along as some comments have implied. This isn’t just naysaying. The site is currently in its beta version, and many of my points will probably be nullified in due course. But it’s better to be aware of some of the limitations beforehand than to visit the site expecting a digital Library of Alexandria.
One thing to keep in mind is that DPLA is not so much a library as an enormous card catalog, with the “shelves” of books, photographs, and so forth being the digital collections of libraries and historical societies, large and small, all over the country. The range of material offered through the Digital Public Library of America reflects what people running the local collections have decided to digitize and make available. What DPLA gathers and makes searchable is the metadata: descriptions of what a document contains (its subject, origins, copyright status, and so on) and of its characteristics as a digital object (size and file type).
The DPLA “card” gives the available information about an item, often accompanied by a thumbnail image of the book cover, manuscript, etc. – along with a link taking you to the digital repository in which it appears. DPLA puts the metadata into a standard format. But much of the content description will inevitably be done by local librarians and archivists, making for a considerable range in detail. Often the DPLA entry will provide a bare minimum of description, though some entries run to a paragraph or two.
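Putting metadata “into a standard format” usually means a crosswalk: a mapping from each contributor's field names onto one shared schema. The Python below is a hypothetical sketch of the idea, not DPLA's actual pipeline. The two source schemas and the list of common fields are invented for illustration. Any field the local cataloguer never filled in simply stays empty, which is one reason entries vary so much in detail.

```python
# Hypothetical crosswalks: each contributing collection names its fields differently.
CROSSWALKS = {
    "historical_society": {
        "Title": "title",
        "Photographer": "creator",
        "Date": "date",
        "Rights": "rights",
        "Link": "source_url",
    },
    "university_archive": {
        "dc_title": "title",
        "dc_creator": "creator",
        "dc_date": "date",
        "dc_description": "description",
        "dc_rights": "rights",
        "identifier_uri": "source_url",
    },
}

# The shared schema every record is mapped onto (also hypothetical).
COMMON_FIELDS = ("title", "creator", "date", "description", "rights", "source_url")


def normalize(record, source):
    """Map one local record onto the shared schema; missing fields become None."""
    mapping = CROSSWALKS[source]
    out = dict.fromkeys(COMMON_FIELDS)  # every common field starts out as None
    for local_name, value in record.items():
        common_name = mapping.get(local_name)
        if common_name is not None:  # local fields with no mapping are dropped
            out[common_name] = value.strip() if isinstance(value, str) else value
    return out


local = {"Title": "  Main Street, 1910 ", "Link": "https://example.org/item/1", "Shelf": "B-12"}
print(normalize(local, "historical_society"))
```

The sample record comes out with a title and a link and nothing else. In other words, the standardized card can only be as full as the local record behind it.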
But the entry is only as strong as its link. It seemed appropriate to make one of my earliest searches at the Digital Public Library for the quintessential American poet Walt Whitman. There were 52 hits, with 9 of the top 10 being manuscripts of his letters in the Department of Justice collection at the National Archives. Not one of the links for the letters worked. By contrast, I had no trouble getting access to photographs of the poet held by the Smithsonian Institution.
This proved par for the course. Most links worked, but out of two dozen entries for items in the National Archives, only one did. It’s hardly surprising (gremlins have a strong work ethic), but it shows the need for troubleshooting. Users of the library can be expected to point out such glitches, if encouraged to do so. It might be worth adding a widget to each record that would let users flag an inoperative link, a typographical error, or some problem with the content description. The site does have a contact page, but a button on the record itself would make reporting an error far less of a chore.
Continued thumbing through the catalog demonstrated at how early a stage DPLA is in accumulating its collection – and how much fine-tuning its search engine may need.
A search for “Benjamin Franklin” turns up more than 1,400 results. Of the first 30, all but 3 are documents (usually death certificates) for people named after the inventor and statesman. A toolbar on the left allows the user to refine the search in various ways – but the most useful filter, by subject, is at the very bottom and easy to overlook.
It was encouraging to get 17 results when searching for Phyllis Wheatley, the first African American to publish a book of poetry, but 15 of them led to records from the 1940 census, by which point she had been dead for more than 150 years. Only one of the other two was at all germane to her as a historical figure. The other concerned an Atlanta branch of the Young Women’s Christian Association named in her honor.
I expected to locate just a few things about the Southern Tenant Farmers Union of the 1930s, but in fact got no hits at all. At the other extreme, DPLA has records for more than 90 items pertaining to the Ku Klux Klan – photographs, handbills, and cartoons, both pro- and anti-. Quite likely these were among the most striking and attention-grabbing items in various collections, and were digitized for use in print publications and online. It's concrete evidence that the Digital Public Library of America's offerings will be only as representative as the decisions made by the contributing institutions.
A number of foundations and government agencies have lent their support to DPLA, and its progress towards incorporation as a 501(c)(3) organization should make it an even more appealing destination for the big philanthropic bucks. But important as funding certainly is for the library’s future, what will ultimately be decisive for its success is a massive infusion of intellectual capital. Some of it will come from code writers hacking out new applications using the library's metadata and API. More than that, though, DPLA will need to encourage the participation and the expertise of people using the site. It's an impressive foundation and scaffold, but it's up to scholars, librarians, and other knowledgeable citizens to build the library, from the ground up.
In a passage surely written with tongue in cheek, Friedrich Nietzsche states that a scholar of his era would consult 200 volumes in the course of a working day. That’s far too many, he suggests: a symptom of erudition’s decay into feeble bookwormery and derivative non-thinking. “During the time that I am deeply absorbed in my work,” he says, “no books are found within my reach; it would never occur to me to allow anyone to speak or even to think in my presence.” A noble modus vivendi, if not quite an admirable one, somehow.
Imagine what the philosopher would make of the 21st century, when you can carry the equivalent of the Library of Alexandria on a flash drive on your keychain. Nietzsche presents the figure of 200 books a day as “a modest assessment” – almost as if someone ought to do an empirical study and nail the figure down. But we’re way past that now, as one learns from the most recent number of Against the Grain.
ATG is a magazine written by and for research librarians and the publishers and vendors that market to them. In the new issue, 10 articles appear in a section called “Perspectives on Usage Statistics Across the Information Industry.” The table of contents also lists a poem called “Fireworks” as part of the symposium, though that is probably a mistake. (The poem is, in fact, about fireworks, unless I am really missing something.)
Some of the articles are a popularization -- relatively speaking -- of discussions that have been taking place in venues with titles like the Journal of Interlibrary Loan, Document Delivery & Electronic Reserves and Collections Management. Chances are the non-librarians among you have never read these publications, or even seen them at a great distance, no matter how interdisciplinary you seek to be. For that matter, discussing the ATG articles at any length in this column would risk losing too many readers. They are peer communications. But the developments they address are worth knowing about, because they will undoubtedly affect everyone’s research, sooner or later, often in ways that will escape most scholars’ notice.
Most of us are aware that the prominence and influence of scholarly publications can be quantified, more or less. The Social Sciences Citation Index, first published in 1973, is an almost self-explanatory case.
As an annual list of the journal articles where a given paper or book has been cited, SSCI provides a bibliographical service. Counting the citations then yields bibliometric data, of a pretty straightforward kind. The metric involved is simplicity itself. The number of references to a scholarly text in the subsequent literature, over a given period of time, is a rough and ready indicator of that text’s influence or prominence during said period. The reputation of an author can be similarly quantified, hashmark style.
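The counting really is that simple. Here is a minimal Python sketch, run on toy data with placeholder labels for the works and authors: tally the citing papers within a period for each work, then roll the counts up to authors. One small assumption is built in. A work cited twice in the same paper is counted once.

```python
from collections import Counter


def citation_counts(citing_papers, start_year, end_year):
    """Count how many papers published in [start_year, end_year] cite each work."""
    counts = Counter()
    for paper in citing_papers:
        if start_year <= paper["year"] <= end_year:
            # set() so that a work cited twice in one paper counts once.
            counts.update(set(paper["references"]))
    return counts


def author_tally(counts, author_of):
    """Roll per-work citation counts up to their authors, hashmark style."""
    tally = Counter()
    for work, n in counts.items():
        tally[author_of[work]] += n
    return tally


# Toy data; "work_a", "Smith", and the rest are placeholders.
papers = [
    {"year": 1975, "references": ["work_a", "work_b"]},
    {"year": 1976, "references": ["work_a", "work_a", "work_c"]},
    {"year": 1981, "references": ["work_b"]},  # outside the 1975-1979 window
]
counts = citation_counts(papers, 1975, 1979)
authors = author_tally(counts, {"work_a": "Smith", "work_b": "Jones", "work_c": "Smith"})
print(counts, authors)
```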
A blunt bibliometric instrument, to be sure. The journal impact factor is a more focused device: it takes the citations received in a given year by the articles a journal published in the two preceding years, and divides them by the number of articles the journal published in those two years. The index was first calculated in the 1970s by what is now Thomson Reuters, also the publisher of SSCI. But the term “journal impact factor” is generic. It applies to the IDEAS website’s statistical assessment of the impact of economics journals, which is published by the Research Division of the Federal Reserve Bank of St. Louis. And there's the European Reference Index for the Humanities, sponsored by the European Science Foundation, which emerged in response to dissatisfaction with “existing bibliographic/bibliometric indices” for being “all USA-based with a stress on the experimental and exact sciences and their methodologies and with a marked bias towards English-language publication.”
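For readers who want to see the arithmetic, here is a Python sketch of the standard two-year calculation as it is commonly defined. The journal and its numbers are hypothetical. What counts as a “citable item” is itself a judgment call, and the function simply takes whatever count it is given.

```python
def impact_factor(citations_received, citable_items, year):
    """Two-year journal impact factor for `year`.

    citations_received: {publication year: citations received during `year`
                         by the journal's items published in that year}
    citable_items:      {publication year: citable items the journal published}
    """
    window = (year - 1, year - 2)
    cites = sum(citations_received.get(y, 0) for y in window)
    items = sum(citable_items.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("journal published no citable items in the two-year window")
    return cites / items


# Hypothetical journal: citations counted during 2012, keyed by the cited item's year.
cites_2012 = {2011: 150, 2010: 210, 2009: 90}  # the 2009 citations fall outside the window
items = {2011: 60, 2010: 60}
print(impact_factor(cites_2012, items, 2012))  # (150 + 210) / (60 + 60) = 3.0
```

The 90 citations to 2009 articles count for nothing here. That is part of why a two-year window can shortchange fields where work takes longer to be taken up.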
As the example of ERIH may suggest, bibliometric indices are not just a statistical matter. What gets counted, and how, is debatable. So is the effect of journal impact factors on the fields of research to which they apply – not to mention the people working in those fields. And publication in high-impact journals can be a career-deciding thing. A biologist and a classicist on a tenure committee will have no way of gauging how good the candidate’s work on astrophysics is, as such. But if the publications are mostly in high-impact journals, that’s something to go by.
The metrics discussed in the latest Against the Grain are newer and finer-grained than the sort of thing just described. They have been created to help research libraries track what in their collections is being used, and how often and intensively. And that, in turn, is helpful in deciding what to acquire, given the budget. (Or what not to acquire, often enough, given what’s left of the budget.)
One contributor, Elizabeth R. Lorbeer, associate director for content management for the medical library at the University of Alabama at Birmingham, says that the old way to gauge which journals were being used was to look at the wear and tear on the bound print volumes. Later, comparing journal impact factors became one way to choose which subscriptions to keep and which to cancel. But it was the wrong tool in some cases. Lorbeer writes that she considered it “an inadequate metric to use in the decision-making process because sub-discipline and newer niche areas of research were often published in journals with a lower impact factor.”
From the bibliometric literature she learned of another statistical tool: the immediacy index, which measures not how often a journal is cited, but how quickly. In some cases, a journal with a low impact factor might have a higher immediacy index, as would be appropriate for work in cutting-edge fields.
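The immediacy index is conventionally computed as the citations a journal's articles receive in the same year they appear, divided by the number of articles it published that year. The Python sketch below sets it beside the two-year impact factor for two invented journals. It shows how the rankings can flip, as Lorbeer describes. Every name and figure here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JournalYear:
    """Citation figures for one journal in census year Y (all numbers hypothetical)."""
    name: str
    items_this_year: int      # citable items published in Y
    cites_to_this_year: int   # citations made in Y to items published in Y
    items_prev_two: int       # citable items published in Y-1 and Y-2
    cites_to_prev_two: int    # citations made in Y to items from Y-1 and Y-2

    def impact_factor(self) -> float:
        return self.cites_to_prev_two / self.items_prev_two

    def immediacy_index(self) -> float:
        return self.cites_to_this_year / self.items_this_year


niche = JournalYear("Hypothetical Niche Letters", 40, 80, 80, 120)
established = JournalYear("Hypothetical Established Review", 100, 30, 200, 800)
for j in (niche, established):
    print(f"{j.name}: impact factor {j.impact_factor():.1f}, immediacy {j.immediacy_index():.1f}")
```

The niche journal trails on impact factor (1.5 against 4.0) but leads on immediacy (2.0 against 0.3). Work in a fast-moving field gets cited quickly, even if it is not cited heavily over the longer term.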
She also mentions consulting the “half-life” index for journals – a metric as peculiar, on first encounter, as the old “count the footnote citations” method was obvious. It measures “the number of publication years from the current year which account for 50 percent of current citations received” of articles from a given journal. This was useful for determining which journals had a long-enough shelf life to make archiving them worthwhile.
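The definition reads more naturally as a procedure. Start with the current year and work backwards through publication years, adding up the citations each year's articles received. Stop once the running total reaches half of all current citations. The Python sketch below does that in whole years. That is a simplification: published half-life figures are typically interpolated to a fraction of a year. The two journals are hypothetical.

```python
def cited_half_life(citations_by_pub_year, current_year):
    """Whole-year cited half-life (simplified; published figures interpolate).

    citations_by_pub_year: {publication year of cited item: citations received
                            during current_year}
    Returns how many publication years, counting back from and including
    current_year, account for at least half of the citations received.
    """
    relevant = {y: n for y, n in citations_by_pub_year.items() if y <= current_year}
    total = sum(relevant.values())
    if total == 0:
        raise ValueError("no citations to items published by current_year")
    running = 0
    years_back = 0
    while True:
        running += relevant.get(current_year - years_back, 0)
        years_back += 1
        if 2 * running >= total:
            return years_back


# Hypothetical journals, with citations received in 2012 keyed by publication year.
fast = {2012: 10, 2011: 30, 2010: 20, 2009: 20, 2008: 20}
slow = {2012: 5, 2011: 5, 2010: 5, 2009: 5, 2000: 40, 1990: 40}
print(cited_half_life(fast, 2012), cited_half_life(slow, 2012))
```

The first journal's citations are concentrated in its last three years of issues. The second is still being cited for work more than a decade old, and that is the kind of journal worth archiving.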
Google Scholar is providing a number of metrics – the h-index, the h-core, and the h-median – which I mention here without professing to understand their usefulness. Lorbeer refers to a development also covered by Inside Higher Ed earlier this year: a metric based on Twitter references, to determine the real-time impact of scholarly work.
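For the curious, the three are less mysterious than they sound. A set of articles has an h-index of h if h of them have been cited at least h times each. The h-core is those h articles, and the h-median is the median citation count within the h-core. (The h-index was proposed by the physicist Jorge Hirsch in 2005. Google Scholar Metrics computes these figures over a publication's articles from the last five complete years, hence its “h5” labels.) The citation counts in the Python sketch below are made up.

```python
from statistics import median


def h_metrics(citation_counts):
    """Return (h-index, h-core, h-median) for a list of per-article citation counts."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:  # the rank-th article has fewer than rank citations
            break
        h = rank
    h_core = ranked[:h]  # the h most-cited articles
    h_median = median(h_core) if h_core else 0
    return h, h_core, h_median


print(h_metrics([10, 8, 5, 4, 3, 0]))  # (4, [10, 8, 5, 4], 6.5)
```

Four articles have at least four citations each, but there are not five with at least five, so h is 4. The median of the four core articles' counts is 6.5.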
One day a Nietzsche specialist is going to be praised for writing a high-twimpact paper, whereupon the universe will end.
Other contributions to the ATG symposium paint a picture of today’s research library as a mechanism incessantly gathering information as well as making it available to its patrons – indeed, doing both at the same time. Monitoring the flow of bound volumes in and out of the library makes it relatively easy to gauge demand according to subject heading. And with digital archives, it’s possible to track which ones are proving especially useful to students and faculty.
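Gauging demand by subject heading comes down to a join and a tally. Each checkout, or each download from a digital archive, is matched against the catalog's subject headings for that item, and the headings are counted. The Python sketch below assumes a hypothetical circulation log and catalog. It is an illustration of the principle, not any particular library system's reporting module.

```python
from collections import Counter


def demand_by_subject(events, subjects_of):
    """Tally checkouts (or downloads) per subject heading.

    events:      iterable of item IDs, one entry per checkout or download
    subjects_of: {item ID: list of subject headings}
    """
    demand = Counter()
    for item_id in events:
        for heading in subjects_of.get(item_id, ()):  # uncatalogued items are skipped
            demand[heading] += 1
    return demand


# Hypothetical catalog and circulation log.
catalog = {
    "b1": ["Labor unions", "Agriculture"],
    "b2": ["Agriculture"],
    "b3": ["Poetry, American"],
}
log = ["b1", "b2", "b2", "b3", "b9"]  # "b9" is not in the catalog
print(demand_by_subject(log, catalog).most_common(2))
```

An item with several headings counts toward each of them. That is one of the choices, invisible to patrons, that shapes what such a tally reports.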
A survey of 272 practicing librarians in ATG’s subscriber base, conducted in June of this year, shows that 80 percent “are analyzing [usage of] at least a portion of their online journal holdings,” with nearly half of them doing so for 75 to 100 percent of those holdings. It’s interesting to see that the same share of respondents, 80 percent, said that “faculty recommendations and/or input” was used in making decisions about journal acquisitions. With book-length publications entering library holdings in digital form, the same tools and trends are bound to influence monograph acquisition. Some of the articles in the symposium indicate that it’s already happening.
Carbon-based life forms are still making the actual decisions about how to build the collections. But it’s not hard to imagine someone creating an algorithm that would render the whole process cybernetic. Utopia or nightmare? I don't know, but we're probably halfway there.