At Libraries, Taking the (Really) Long View

One of the benefits of digitally encoded content is that it can’t deteriorate. With files that consist of 1’s and 0’s, there are no pages to turn yellow or brittle, tape to demagnetize or bindings to snap. In theory, that would be a boon to libraries that devote boundless resources to preserving old documents, ancient texts and even videos recorded in Betamax.

But as libraries shift more of their resources to holdings that either originate as digital or become digital through scanning, it’s become clear that just because something lives in the virtual stacks doesn’t mean it will be around forever. Anyone who’s ever suffered through a hard drive crash (or tried futilely to save a scratched DVD) has faced the inherent physical limitations of digital storage. Now librarians are having to do the same as they determine how digital holdings fit into their central mission: preserving works so that they can be accessed not just today, not just tomorrow, but indefinitely.

And for anyone who’s also worked through a mere “upgrade” in file formats or e-mail clients, it’s probably not a stretch to assert that in computer time, 10 years might as well be infinity. What does that make 100?

So, in a literal race against time — but one with a perpetually receding deadline — librarians from research universities and other institutions around the world are collaborating to tackle a whole host of problems that so far have no satisfactory solution. They include hardware complexities, such as constructing storage devices that continuously monitor and repair data while remaining easily scalable; redundancy measures, such as distributing and duplicating data across storage devices and even across the country; universal standards, such as formats that could conceivably remain readable in the distant future; and interfaces, such as open software protocols that manage digital holdings and make them accessible to the public.
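The "monitor and repair" and redundancy ideas above can be illustrated with a minimal sketch. This is not any library's actual system: it assumes each file is copied to several sites, that a SHA-256 checksum is used to detect silent corruption, and that a copy disagreeing with the majority is overwritten from a good one. The function names and site labels are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 checksum of a stored object as hex text."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict[str, bytes]) -> list[str]:
    """Compare copies held at different sites and repair the outliers.

    `replicas` maps a (hypothetical) site name to that site's copy.
    Copies whose checksum differs from the majority checksum are
    replaced in place. Returns the names of the sites that were repaired.
    """
    digests = {site: digest(blob) for site, blob in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]

    # Without a strict majority there is no trustworthy copy to repair from.
    if votes <= len(replicas) // 2:
        raise ValueError("no majority among replicas; manual review needed")

    good_site = next(s for s, d in digests.items() if d == majority_digest)
    repaired = []
    for site, d in digests.items():
        if d != majority_digest:
            replicas[site] = replicas[good_site]
            repaired.append(site)
    return repaired


# Illustrative data only: three copies, one of which has a flipped byte.
replicas = {
    "campus": b"Viking lander data",
    "offsite": b"Viking lander data",
    "partner": b"Viking lander dat\x00",
}
repaired = audit_and_repair(replicas)
print(repaired)
```

Majority voting is only one possible repair policy. It guards against storage decay, but it does nothing about the format obsolescence described in the article, which is a separate problem.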

Some of the solutions are still in development, while others are piecemeal. Various institutions are trying different approaches, and corporations are competing with each other as others collaborate on open-source approaches.

“For the most part, they’re all untested. None of the solutions have withstood the test of time yet,” said Michael Witt, an assistant professor of library science and interdisciplinary research librarian at Purdue University.

Coming Down to Earth

If worries about digital preservation seem premature or overly pessimistic about an eventual solution, it’s worth comparing the success of restoring traditional holdings with comparable digital records. In 1976, NASA’s Viking landers sent back reams of data from Mars, where they were scouring for possible evidence of extraterrestrial life. Unfortunately for scientists, the magnetic tapes used for storage became brittle and nearly unusable even after the space agency made considerable efforts to keep them in a properly controlled environment. Beyond the physical obstacles, moreover, scientists in the late 1990s found that they couldn’t read the data format anyway, and they had to crack open the original (analog) printouts to retype them.

That experience, recounted in a 2006 report from Britain’s Digital Preservation Coalition, was one of several that helped to jump-start a movement among librarians, information technology specialists and others concerned with the real possibility that much of today’s digital material is not only in flux but in danger of being lost in the ether altogether.

“The state of things is that we’re in the digital dark ages right now,” Witt said. “We’re losing a ton of valuable information that is electronic because of the transient nature of the Internet and of storage technology and how people use it.”

Tom Cramer, the associate director of digital library systems and services at Stanford University, said that NASA’s inadvertent discovery — that even machine-produced data can be lost to the environment or obsolescence — echoes his own experience. Closer to home, Stanford’s library was tasked with helping the Monterey Jazz Festival preserve its historical recordings from decades ago. Out of hundreds of tapes taken from nearly 40 years of recording history, Cramer said, only one couldn’t be recovered. But audio from a digital format the festival began using in the 1990s wasn’t as reliable: out of scores of those tapes, covering about six years, six were damaged beyond recovery.

So digital preservation encompasses not only the problem of reliable storage and recovery but of how to finance it, how to manage it and how to make such systems sustainable over the long run. For that to happen, though, enough institutions have to participate. The British report, “Mind the Gap,” found that although a slight majority of respondents in the United Kingdom said they had an institutional commitment to addressing the issue, only 20 percent said there was enough funding to tackle it, a third said there were “clear responsibilities” for handling it, and only 18 percent said there was a strategy for digital preservation at all.

Still, Stanford has been one of the pioneers in developing solutions to digital preservation, especially through its Silicon Valley ties to Sun Microsystems, which last year set up the Sun Preservation and Archiving Special Interest Group, or PASIG, to bring together leaders in research libraries, universities and the government to periodically meet and collaborate on digital archiving issues.

“We are trying to meet the needs of the evolving ‘cybrarian’ community that is grappling with storage and data management, workflow and high-level architecture trends in the area of preservation and archiving,” said Art Pasquinelli, Sun’s education market strategist, in the initial announcement.

One project Cramer has been working on is the Stanford Digital Repository, which he said currently hosts geospatial data as well as content from other scholarly sources. The SDR, according to its Web site, provides “a trusted environment for long-term digital information storage and preservation activities.”

As the project’s description implies, the trust issue is an important one for librarians. The fragility of partnering with companies was reinforced last month when Microsoft announced that it would discontinue its Live Search Books project that helped research universities scan books and journals to be accessed digitally. For many librarians, it was a signal — or a reminder — that corporate partnerships, while in many cases helpful financially, can raise questions not only of ownership, but of reliability over the short term (let alone the long view).

“I wouldn’t rely on [corporate sponsorships] as the sole source for digitizing and preserving and providing access to my materials. I think it’s very dangerous to go down that road both for reasons of the integrity of the information, any kind of ethical ... issues that may arise,” said Sarah Houghton-Jan, a blogger and the digital futures manager at the San Jose Public Library, which is run in partnership with San Jose State University.

So, many developers have instead been taking the open-source route, collaborating and building on each other’s code. Already, there are three established “repository” packages: software that manages, organizes and allows access to online materials. Fedora, one of the major ones, has about 130 registered institutions and logged about 25,000 downloads over the past 12 to 18 months, said Sandy Payette, a researcher at Cornell University and the executive director of the foundation that supports the software. (The other popular repository solutions are DSpace and EPrints.)

“Some of the principles and elements of open source software communities really reinforce ... the principles of digital preservation,” Cramer said, noting that “[y]ou don’t want any black boxes ... because when someone starts taking your content and modifying it in ways that aren’t apparent to you, you’re kind of at the whims” of the company you’re working with.

The Biggest Obstacle?

Technology aside, however, what may be the biggest obstacle to a universal, agreed-upon solution might sound familiar: “The biggest challenge is actually related to human beings,” said Witt. Libraries need to acknowledge the problem they face and work it into their management structure.

Already, he said, libraries are starting to hire “digital preservation officers.”

“But really, if you’re going to have some assurance from an institutional standpoint that someone is stewarding these objects … [there’s] a human resources issue.”

Houghton-Jan summed up the daunting task facing libraries like this: “The clarity is that there is no set course, and that things are very much in the air. It’s nice to have clear uncertainty at the very least, I guess.”

Andy Guess


Comments

Long? REALLY Long?

Michael Witt came the closest by naming the biggest challenge of long-term preservation of the digital cultural heritage as being “human beings” and the acknowledgement of the problem ... but that doesn’t complete the picture: there is also a problem with how we define “long.” In a paper on the supposed demise of the card catalog, I proposed that the book is actually a “new” format for librarians. Given the Internet time proposed in this article, I was writing in the ancient times of nearly 10 years ago. That’s a piece of the picture that needs to be reconsidered: our concept of “how long” really needs to be lengthened. Thank goodness that the Long Now Foundation is taking a slightly longer view of things ...

Librarians began their existence as stewards of the record (much more akin to the role of archivists) over 4500 years ago. As a profession, we have only just (BARELY) come to grips with print material on paper. So taking a “long” view means looking MUCH further than 100 years.

If we look at the proliferation of storage formats in the past 100 years (that would be 1908, for those doing the math), there are several items that emerge as relevant. First, the commodification of almost every facet of our cultural heritage has resulted in many conflicting platforms and formats that have tremendously complicated their stewardship (stewardship being an essential duty of those entrusted with collection care, be they curators, archivists, or librarians). Second, the dependency upon corporate support only intensifies the difficulties of this phenomenon (some staff at recipients of Microsoft library grants would characterize the “support” as a set of golden handcuffs, as such support should NOT be requiring embrace of a proprietary system). Third, we (those of us in the trade) serve a public that is equally clueless ... the quote “...reruns all become our history” is all too telling, all too accurate.

That we are looking at the open-source route is important, especially since one of the most enduring delivery mechanisms of knowledge and culture is an open source technology, but only after a long struggle was endured. When is the last time someone was arrested for owning a printing press? Have we forgotten the meaning behind “Freedom of the Press”?

Creating systems to preserve the digital cultural heritage must be as transparent as possible, as wide-spread as possible, and as free of corporate-meddling planned obsolescence as possible (I was surprised to NOT see mention of the LOCKSS (www.lockss.org/lockss/Home) and CLOCKSS (www.clockss.org/clockss/Home) Projects, as these are initiatives that build on the open-source principles).

My hope is that Mr. Guess’s essay will not be preaching to the proverbial choir; we need the activism now, not another committee or new job title to address the issues.

Dennis Moser, Digital Services Librarian at Somewhere in New England, at 8:40 am EDT on July 23, 2008

Also a Question of Footnotes

When Dr. Daniela Dimitrova and I began looking into this matter in 2004, our studies were initially dismissed as mindsets of the bygone literary era. New media and the new millennium were too exciting for librarians who increasingly spent less time preserving knowledge and more time teaching others how to disseminate and retrieve it through commercial computer networks.

It’s about time that libraries and the American Library Association in particular take the long view; they largely are responsible for the problem with their advocacy for corporate technology at the expense of scholarly preservation and archiving.

To understand the extent of the problem, visit the ALA’s strategic plan: ALAhead to 2010, available at http://www.ala.org/ala/ourassocia...aheadto2010/adoptedstrategicplan.cfm

Its vision statement seeks to promote access to information. It mostly overlooks preservation of that access.

True, there are initiatives concerning the preservation of cultural heritage. I was aware as well that in the past the ALA featured a site about the Society of American Archivists Committee. Before posting my comment, I wanted to see if the Committee had made headway in preservation efforts.

Go ahead and put that committee name in quotes on the ALA search button, and it will return an address. Click that and you’ll get a “Page Not Found” advisory.

The impact of 404 and related errors, as well as manipulated documents in library databanks, will continue to be focal points of our research on the erosion of digital footnotes and the impact of that on peer review and historical and cultural preservation.

To understand a snippet of our concerns, consider how the ALA correctly warns against interference by government on our First Amendment rights. Then consider a future in which government can manipulate primary documents and alter them to rewrite history because library science neglected its role as guarantor of reliable, unaltered primary and secondary data.

As such, the long view is not to the future but to the past and a reconsideration of the nature of consumer technology. It allows for instant access to millions of databases, social networks, e-journals and more, but makes later retrieval more difficult and sometimes impossible. As such, the greater the dissemination, the lesser the preservation without manipulation.

With each of close to a dozen follow-up studies about the erosion of footnotes due to digitization and the inherent corporate manipulation that comes with it, Dr. Dimitrova and I continue to foretell the disastrous consequences of this phenomenon on peer-review methods in the social and natural sciences.

In recent studies we have been examining the impact on the liberal sciences, particularly the discipline of history, which now has three categories of footnotes: primary, secondary and ephemeral. We have a new study on how historical research will be affected forthcoming in American Journalism (a journal of media history) and will present more findings in Chicago next month at the annual convention of the Association for Education in Journalism and Mass Communication.

In recent years the American Library Association, through hundreds of news releases and posts, has been positioning the role of libraries as a digital disseminator of information rather than a repository thereof. In doing so, it embraced the future and abandoned the past without fully assessing the importance of archiving since ancient times to the current day.

Indirectly, the IHE article above documents what libraries have overlooked in abandoning their historic, primary role as a repository rather than disseminator of knowledge and information. I encourage the ALA to promote and advocate for preservation while reclaiming its role from the corporations that compete for retrieval, for the result of the latter is easily prophesied: each commercial repository will be incomplete and documents in several might contain variations, a phenomenon not unlike that of Shakespearean times and proportions when scholars had to discern which was fair and foul in reassembling the Bard’s work for future generations.

That was the theme of my first essay in IHE, among the first to publish our concerns in April 2005 at this link: http://www.insidehighered.com/views/2005/04/22/bugeja

In the interim, we recommend that library vendors such as ProQuest continue to provide PDFs of data rather than only text as the former is less easily manipulated. We recommend paper sources whenever possible, especially in the sciences and history. We urge historians to begin addressing the third type of footnote, ephemeral, and take the long view on that in both directions: past and future.

There is a reason historians, with the exception of a few such as Anthony Grafton, have not focused fully on the threat to their discipline caused by disappearing or manipulated footnotes: They have been relying on primary and secondary documents still in paper or book form, the latter being the ultimate firewalled medium. How will later generations of historians document one of the most dynamic eras of communication history without footnotes?

For those interested in following our research, visit: http://www.halfnotes.org/

Michael Bugeja, Iowa State University, at 9:35 am EDT on July 23, 2008

Open-source IRs

Readers will be interested in the Registry of Open-Access Repositories (http://roar.eprints.edu), a listing of sharable research sites across the world, and the OAIster site at the Univ. of Michigan as a means of access to contents of those sites. For the record, Fedora installations are only about half of the EPrints sites worldwide, and a third of DSpace. UTM set up a fully functioning IR for a couple grand for server space and a few hours’ work by a too-clever student.

Richard Saunders, Univ. Tennessee at Martin, at 4:05 pm EDT on July 24, 2008

Digital preservation

In a book I wrote last year (Fool’s Gold) I bewailed the almost complete avoidance of this issue and the unwillingness of librarians (and especially ALA) to admit there’s anything really wrong with digital preservation. While it’s good to know the issue has finally flown high enough to be detected on radar, it’s still flying too low for anyone to care. We’re essentially putting all our hopes on a wing and a prayer that our digital fortunes will be in some cyberwhere in a decade or so. Let’s hope that along with the first roadkill on the information highway (literacy) we don’t also find its second: history.

MYHerring, at 11:55 am EDT on July 31, 2008
