2 Models for Digitizing Collections

June 7, 2007


Google's Library Project, which is in the process of digitizing millions of books at top university libraries around the world, announced a major expansion Wednesday: The 12 universities that make up the Committee on Institutional Cooperation have agreed to let Google digitize up to 10 million of their collective volumes -- generally those from the most distinctive parts of their collections.

The announcement brings to 25 the number of universities involved in the Google project, which is being hailed by some scholars for the way it will assure online access to volumes that have been largely available only in a few locations and that are in danger of decomposition. The project will involve both books in the public domain and copyrighted materials -- and the latter have been controversial. Groups of authors and publishers are suing Google over the Library Project, charging that it is infringing on copyrights, and those suing indicated that they would expect any eventual settlement in the case (should Google lose) to be applied to the additional works being added under the new agreement.

On the same day Google and the 12 universities made their announcement, Emory University announced a plan to digitize major portions of its collection -- independent of Google and using an intentionally different model.

The Google Expansion

The promise of the Google Library Project has always been its ability to offer an unmatched collection of digitized materials. Such major university libraries as those of Harvard, Princeton and Stanford Universities are already involved, as are key academic libraries abroad, such as those at the University of Oxford and Ghent University. Two of the CIC members are already participants: the University of Michigan and the University of Wisconsin at Madison.

The new collections involved will come from those two and the 10 other members of CIC: Indiana, Michigan State, Northwestern, Ohio State, Pennsylvania State and Purdue Universities; and the Universities of Chicago, Illinois, Iowa, and Minnesota.

The idea of the Google expansion is to take the portions of these collections that are unique and that would thus add the most to the project. While final lists of collections are still being set, they are expected to include Northwestern's Africana collection, Chicago's South Asia collection, Minnesota's Scandinavia collection, and agriculture and food science collections at the land grant institutions in the consortium. Many of the 300 languages represented in the university libraries will be represented.

The works will join the Google project and will also make up a common digital storage system so that each of the universities involved will gain immediate access to many more materials. The universities will not be paid, but Google will cover the costs, which are expected to be significant, given estimates of up to $100 per book to digitize.

Books in the public domain and copyrighted works alike will be included, but for the latter, the Google book search process will yield only background information, summaries and information on where to locate the book. For books in the public domain, full searching and reading will be provided to users.

Mark Sandler, director of the CIC's Center for Library Initiatives, said at a press briefing Wednesday that he saw the project as a significant way for libraries to fulfill their missions. "Society trusts libraries to organize and preserve our cultural heritage," he said, and libraries have historically taken a "long term view" of what works to include.

But books -- especially older works -- are threatened by deterioration, which could destroy them or force libraries to restrict access. In addition, with more and more people doing research online and not in the stacks, there is a danger that books not in digital format will be "squeezed into a smaller and smaller social space."

This project, he said, is designed to keep "generations of ideas alive."

Sandler acknowledged that universities' professors have a range of perspectives on the copyright issues involved, but he said that he sees more and more faculty attracted to the ability to share knowledge broadly and to have instant access to more materials.

Allan Adler, vice president for legal and government affairs of the Association of American Publishers, one of the groups suing Google, said that while his group would not sue libraries, he didn't want people to think that the addition of new members of the Library Project meant that the copyright issues had been resolved. The lawsuit is in discovery right now, and Adler said it is very much alive.

"Either the court cases will work themselves out or there will be a settlement in which additional libraries will be addressed in the same manner," he said. "If there is a legal decision in favor of the plaintiffs, that will certainly necessitate an unraveling of these agreements," he said.

Another Model

The same day as the Google announcement, Emory announced another model for digitizing collections. Emory is planning to digitize about 200,000 of its volumes that are in the public domain and to make the materials available online free or available for purchase as inexpensive print-on-demand volumes through Amazon.com. While people would pay for the print-on-demand books, Emory officials said that pricing would be designed just to cover costs, not to earn a profit for the university.

An early focus of the project will be Emory's extensive collections in Southern history and culture.

Martin Halbert, director for digital programs and systems at Emory's Robert W. Woodruff Library, said that his institution agreed with Google about the importance of digitizing works, especially older works in danger of deterioration. But he said that the university's effort was intentionally different from the Google project. No copyrighted materials will be involved. And the university -- not an outside entity -- will have full control over the digital product.

"We saw that as a critical thing," Halbert said. "We needed to retain our role as stewards of these assets for Emory and the public."


Comments on 2 Models for Digitizing Collections

  • transmission of text to digital format
  • Posted by Rosemary Aud Franklin, English bibliographer at University of Cincinnati on August 8, 2007 at 9:50am EDT
  • Going digital has the potential to open a resurgence of textual scholarship.
    With regard to establishing an accurate text, Richard Altick in The Art of Literary Research says, "Today we realize that accurate texts, and knowledge of which text of a particular work has a bearing upon a given problem, are indispensable to the progress of literary study." Literary scholarship has been served.

    English bibliographer at the University of Cincinnati

  • Copyrights
  • Posted by Author on June 7, 2007 at 11:45am EDT
  • How is this different from the issues with Napster? What gives Google the right to copy books and other works that are copyrighted?

    I hope they can find a way to give wider access while protecting copyrights. Maybe Google could fork over the plentiful profits it stands to make from this project to the authors whose rights are being violated.

  • Not the piracy some claim it to be
  • Posted by Kokopeli on June 7, 2007 at 12:55pm EDT
  • Author, did you read the paragraph that said

    "Books in the public domain and copyrighted works alike will be included, but for the latter, the Google book search process will yield only background information, summaries and information on where to locate the book. For books in the public domain, full searching and reading will be provided to users."

    It's nothing like Napster - seems to be more like access to a set of abstracts, or an online card catalog. Plus, think of it as a massive archive of works that will hopefully endure for decades, long after it becomes near-impossible to find some of these books in print version. I was pleased to learn my institution (one of the CIC members) will be part of this.

  • Jumping Aboard A Boat With Leaks
  • Posted by Scrawed on June 7, 2007 at 4:06pm EDT
  • Digitization of library collections would appear to allow for expanded access, and in some instances to aid in the preservation of deteriorating original documents.

    Unfortunately it is not a panacea for issues involving preservation, as many archival librarians have already discovered. Usually awareness of the limits of digitization comes after the investment of vast quantities of money in both systems subject to rapid obsolescence and media that deteriorates more rapidly than the original documents, and after the discovery of a characteristic of storage media called "time to failure."

    That being said, I wonder that there aren't more people concerned with two other issues, namely problems involving provenance, and problems implicit in transfiguration and maintenance.

    With changes in the circumstances of a document (storage, location, medium) some aspects of its content may be irretrievably lost even if reconstituted, and through the agency of recontextualization. One can no longer say with authority that a given fragment "belongs to this document." Without provenance, the ability to authenticate is diminished, and therefore the quality of reliant scholarship impugned. The procedures that are implemented may help to mitigate this issue, but I would argue that it is the fact of digitization itself that poses the greatest threat here.

    Anticipating that there may well be some people who find this argument questionable, I offer a recent real-world experience. It is now the frequent practice of agencies receiving checks to record the transaction digitally, and in some instances even to destroy the original instrument after digitizing it (usually as an image file). Recently I wrote a check, which went to such a processing agency, which failed to record the amount properly (input error) and then destroyed the check prior to its digitization. In fact then, there was NO record extant of the actual instrument, and an input error in the transaction. Such a result is actually illegal, but there are even fewer (legal) safeguards for the provenance of historical and literary documents, their accuracy, and maintenance. Substitute a document page for the check in the preceding story and the reason for concern should be made abundantly clear.

    Finally, the translation of the text into a digital document poses certain questions with respect to the accuracy and quality of the translation. Human error is a given, and the likelihood exists of malign tampering. This problem is greatly magnified with the propensity to offshore such work (numerous newspapers, attracted by cheap labor, have in fact offshored the digitization of their morgues to nations where English may be spoken, but may not be the language of daily discourse, with results ranging from humorous to disturbing). Offshoring is not the only potential source of such problems, just a particularly obvious one.

    With respect to the threat of the recontextualization and corruption of the text, I am reminded of Walt Kelly's J. Edgar Hoover caricature from "Pogo" cutting off two legs of a spider and training the arachnid as an asterisk, noting "When you control the footnotes in a book, you control its meaning." Who will control the meaning (and even the words!) of our written heritage?