Digital Preservation Network
Think of your favorite song. Pictures of loved ones. The apps you use every day. Now think of the devices you use to access them. If you needed to bring those files back to life a year, a decade or a generation from now, would you be able to?
“You take that scenario and apply it to research, and the risk is huge,” said Mary Molinaro, chief operating officer of the Digital Preservation Network.
The issue of digital preservation is a paradoxical consequence of an increasingly digital world. Under the right conditions, a book can survive for centuries. A Blu-ray disc may last for a couple of decades. A regular spinning hard drive, however, can die after a few years. Even though the digital revolution has produced an explosion of new content every day, much of it risks being lost forever unless it can be preserved for future generations. The alternative, experts warn, is a “digital dark age.”
DPN, pronounced “deepen,” is one of several organizations working to avert that scenario. The organization will this year begin to accept and preserve digital content from its more than 60 member universities. For the $20,000 membership fee, each university receives an annual five-terabyte allotment (enough to store more than a million photos or tens of thousands of hours of music) that it can fill with content it deems worthy of preservation. Once submitted, the content is stored in multiple locations across the country for at least 20 years, though mechanisms are in place to preserve it for much longer.
Think of it as scholarly research’s version of the Svalbard Global Seed Vault. The facility, which tunnels into the side of a mountain on a remote Norwegian island well beyond the Arctic Circle, contains hundreds of thousands of seed samples that can -- should disaster strike -- reintroduce vital crops to the world.
Like the seed vault, DPN was designed to “take in all that rich content, pass it forward and structure it in a way that guards it [against] all kinds of failure,” Molinaro said. “It’s something that you hope you won’t ever tap, but if you need it, it’s there.”
Since DPN deals in digital content, not seeds, it can take additional steps to ensure its deposits are preserved. Like a student who takes care to store a thesis on a flash drive, on a hard drive and in the cloud, DPN is working with other digital preservation organizations to create a network of storage systems, eliminating the risk of a single point of failure. If one organization’s system fails, the content is not jeopardized.
The organizations and their repositories serve as “nodes,” replicating content between them and making sure the files still work. Research submitted for preservation to the node in San Diego may be backed up by the nodes in Michigan and Texas, for example.
“That means you’ve got both technical and geographical diversity,” said R. F. German Jr., program director for the Academic Preservation Trust.
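The replication and checking that the nodes perform can be illustrated with a minimal sketch. It is not DPN's actual software: the `Node` class, the function names and the use of SHA-256 checksums are all assumptions made for illustration. The idea is only that each node keeps a full copy, a checksum recorded at deposit time reveals later damage, and a damaged copy can be restored from any node that still holds an intact one.

```python
import hashlib


def checksum(data):
    """Return the SHA-256 digest, used here as the file's fixity value."""
    return hashlib.sha256(data).hexdigest()


class Node:
    """Hypothetical stand-in for one repository in the network."""

    def __init__(self, name):
        self.name = name
        self.files = {}  # maps a file key to its stored bytes

    def store(self, key, data):
        self.files[key] = data

    def is_intact(self, key, expected):
        """True if this node holds the file and its checksum still matches."""
        data = self.files.get(key)
        return data is not None and checksum(data) == expected


def replicate(key, data, nodes):
    """Copy the content to every node and return the checksum to keep on record."""
    for node in nodes:
        node.store(key, data)
    return checksum(data)


def audit_and_repair(key, expected, nodes):
    """Restore damaged or missing copies from an intact one; return repaired node names."""
    good = [n for n in nodes if n.is_intact(key, expected)]
    bad = [n for n in nodes if not n.is_intact(key, expected)]
    if not good:
        raise RuntimeError("no intact copy of " + key + " remains")
    for node in bad:
        node.store(key, good[0].files[key])
    return [n.name for n in bad]


# Node names follow the article's example; the file content is made up.
nodes = [Node("San Diego"), Node("Michigan"), Node("Texas")]
digest = replicate("thesis.pdf", b"research data", nodes)

nodes[1].files["thesis.pdf"] = b"corrupted"  # simulate damage at one node
print(audit_and_repair("thesis.pdf", digest, nodes))  # ['Michigan']
```

With three copies, the content survives as long as one node still holds a copy that matches the recorded checksum, which is the point of removing the single point of failure.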
The APTrust is one such node. The University of Virginia-based organization uses Amazon Web Services, a cloud hosting platform, to preserve the content submitted to it. Amazon also uses multiple data centers to keep content safe, so when a university submits data to the APTrust, it is stored at the company’s data centers in Oregon and Virginia. Membership in DPN adds an “even greater level of assurance” that the content is being preserved, German said.
Since opening for business in late 2014, the APTrust has preserved more than 16 terabytes of content from its 17 university members, German said. Membership in APTrust also costs $20,000 a year, but universities get 10 terabytes of space.
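A rough comparison of the two membership fees, using only the figures quoted above, is sketched below. The function name is hypothetical, and the comparison is a headline one at best: the two services are not equivalent, since DPN's five terabytes is an annual allotment kept for at least 20 years.

```python
def fee_per_terabyte(annual_fee_usd, allotment_tb):
    """Annual membership fee divided by the storage allotment it buys."""
    return annual_fee_usd / allotment_tb


dpn = fee_per_terabyte(20_000, 5)       # DPN: $20,000 for 5 TB
aptrust = fee_per_terabyte(20_000, 10)  # APTrust: $20,000 for 10 TB

print(f"DPN: ${dpn:,.0f} per TB")          # DPN: $4,000 per TB
print(f"APTrust: ${aptrust:,.0f} per TB")  # APTrust: $2,000 per TB
```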
Other DPN nodes include HathiTrust, the book digitization project housed at the University of Michigan; DuraCloud Vault, a partnership between the University of California at San Diego and the nonprofit DuraSpace; the Stanford Digital Repository; and the Texas Preservation Node, an initiative created by library and IT organizations in the state.
Some nodes focus on preserving a particular kind of content, as HathiTrust does with books. Others, like APTrust, accept all kinds -- documents, music, photos and more. Working together, German said, the organizations can build a “preservation ecosystem” better prepared to address the issue than they could on their own.
“All of these are nonprofits … aimed at trying to get after this huge mountain of digital materials that we continue to add to at a stupefying rate,” German said. “What we’re talking about is capturing the best that human researchers are doing in the digital world and making sure that it’s still viable for future scholars to build on.”
A Growing Problem
Digital preservation isn’t just an issue facing individual scholars. Even NASA famously erased the original tapes of Neil Armstrong and Buzz Aldrin walking on the moon and had to rely on broadcasts to recover the footage. History is littered with other examples of researchers and organizations losing not just their finished projects, but their data.
Born-digital research, however, can be more difficult to preserve than research contained in a journal or on a disc. An old book can be preserved simply by placing it in a climate-controlled facility, and chances are scholars of the future will know how to open and read it. To preserve a digital project, an organization such as DPN needs to collect both the files that make up the research and any supplementary information that explains what it is and how it works.
“We want to make sure that in addition to preserving, we provide the tools that allow people to be able to use it in the future,” German said.
As digital research gains greater acceptance in higher education, the need for digital preservation strategies will likely grow stronger, scholarly communication experts say. Several professional organizations, such as the American Historical Association and the Modern Language Association, have already released guidelines to help academic departments evaluate digital scholarship for hiring, tenure and promotion purposes. That development could lead more scholars entering academe to fill their portfolios with digital projects instead of peer-reviewed papers.
Seth Denbo, director of scholarly communication and digital initiatives for the AHA, said the organization is exploring ways to educate its members about digital preservation.
“I come at this from the point of view of someone whose job is related to promoting historical scholarship, so I think about this as a historian who wants to ensure that future historians can know about our era,” Denbo said in an email. “My hobbyhorse about this is that historians need to understand preservation and be participants in the processes of decision making about what gets preserved and how we get access to it, or else we aren’t going to have born-digital resources (which will be an ever greater portion of the sources for history) that we can study and use to understand the past.”
While it is unlikely that the AHA will get into the actual business of preserving, Denbo said, he is working on a workshop series “to start a long-term discussion of the future of historical engagement with issues of access and preservation.”
“I think it’s an issue that’s central to history and the mission of the AHA, and moreover one that few historians are prepared for,” Denbo wrote.
Issues With Diversity, Storage
Even with DPN and other efforts up and running, there are still unsolved issues with digital preservation and plenty of research falling through the cracks.
One of the most significant issues is the lack of institutional diversity. Most of the institutions participating in DPN and other organizations are large or midsize research universities. Names such as Indiana University, the University of Maryland and the University of Notre Dame show up on many of the member lists.
The membership models created by preservation organizations may not, at the moment, be a good fit for the very largest and smallest institutions, Molinaro said. The largest universities have petabytes of data (one petabyte equals 1,000 terabytes) that they want to preserve, she said, while the smallest may not have the budget or dedicated staffers who focus on preservation.
The diversity issue also applies to the content being preserved. Large files -- for example, videos -- can be particularly cumbersome to preserve, Molinaro said. Indiana University has a plan to digitize its vast library of media files, but the size of that single project -- about eight to nine petabytes, Molinaro said -- is larger than DPN’s current capacity of 250 terabytes.
“Undoubtedly there will be content that is lost,” Molinaro said, but she added that there are “serious conversations” taking place at DPN about expanding membership and storage options.
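The size of the gap can be worked out from the figures quoted in this article. The sketch below is arithmetic only. It takes "more than 60" members as exactly 60 and assumes, purely for illustration, that every member fills its five-terabyte allotment in the same year.

```python
TB_PER_PB = 1_000  # one petabyte equals 1,000 terabytes, as stated in the article

dpn_capacity_tb = 250            # DPN's current capacity
indiana_low_tb = 8 * TB_PER_PB   # low end of the Indiana University estimate
indiana_high_tb = 9 * TB_PER_PB  # high end of the estimate

low_ratio = indiana_low_tb / dpn_capacity_tb
high_ratio = indiana_high_tb / dpn_capacity_tb
print(f"Indiana project: {low_ratio:.0f}x to {high_ratio:.0f}x current capacity")
# Indiana project: 32x to 36x current capacity

members = 60       # assumption: "more than 60" treated as exactly 60
allotment_tb = 5   # annual allotment per member
print(f"Full annual allotments: {members * allotment_tb} TB")
# Full annual allotments: 300 TB
```

By this estimate, a single large media project would need more than 30 times the storage DPN currently has, and the members' combined allotments alone would exceed 250 terabytes if all were used at once.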
There are also some legal hurdles to digital preservation. Membership in DPN means colleges and universities are required to sign a succession agreement detailing what would happen to the content if the institutions go out of business. Molinaro said the agreements are complicated because they essentially ask the institutions to relinquish control of their content, either to a different DPN member or to the organization itself. About 10 of DPN’s more than 60 members have signed agreements and are in the process of determining what content should go into their first five terabytes, she said.
While Molinaro said she was optimistic about tackling those and other challenges, German said there is a “dire need” for more organizations to participate in the preservation ecosystem.
“Even added together, we can’t come close to being assured that we would be able to handle all of the content that’s out there and available for preservation,” German said. “The scale of loss is doing nothing but growing every day.”