Microsoft's announcement last Friday that it would discontinue its book- and journal-scanning initiatives left its partners at university research libraries pondering the future of efforts to digitize materials in their archives. Analysts said the software giant was refocusing on its strengths, in effect conceding the digitization arena to Google, the company that in 2004 first started working with universities on book scanning -- to some fanfare as well as controversy.
Libraries increasingly see digitization as a preservation strategy. While Microsoft's departure probably won't cause significant upheaval, it will reinforce for universities the importance of retaining the rights to their scanned materials, since there is no guarantee that any given digitization project will be around next semester, let alone forever. One way to do that is to continue pursuing their own internal scanning projects, which at many libraries had existed for years before Google and Microsoft made it possible to vastly increase their scope and scale. Another is to work with nonprofit initiatives. But if there's one thing libraries agree on, it's that the competition between the two companies was healthy.
"I think we wanted to maximize our opportunity and to work with both organizations," said James Neal, Columbia University's vice president for information services and its university librarian. The library system has complementary agreements with both companies. "I felt that it was important to diversify our partnerships, and therefore the value of having the ability to work with both Microsoft and Google was attractive, but has now proven to be probably the right decision." Neither company demands exclusivity in its contracts with libraries.
The competing programs work in a similar fashion. Libraries receive funding and equipment, often through a third party, to scan books, journal articles and other materials that they select. The digitized files then become available for the libraries to offer to students and faculty through their Web portals, theoretically forever. Both companies have also made sure that the documents appear in their own search results (but not their competitor's), presumably in a bid to eventually drive scholarly traffic (and ad revenue).
That provision has driven some university libraries, such as those at the Universities of Massachusetts and Connecticut, to partner instead with the Open Content Alliance, a project of the nonprofit Internet Archive that doesn't restrict access to scanned documents in the public domain. "I think people are starting to understand what the issues are when you have a corporate entity run things," said Brewster Kahle, the head of the Internet Archive. "When it isn’t in their business interests, then it can shut down very quickly."
In Google's case, scanned materials show up in Google Books searches -- either in snippets, in the case of copyrighted works, or in full for works in the public domain. Microsoft's effort operated through the Open Content Alliance to offer scanned materials a permanent home at the Internet Archive. Eventually, the company planned on integrating the works into its Live Search service, but it's unclear whether that will now happen. Either way, the full rights to all works the company has scanned so far -- and copies of the files themselves -- will remain with its university partners.
“Today we informed our partners that we are ending the Live Search Books and Live Search Academic projects and that both sites will be taken down next week. We have learned a tremendous amount from our experience and believe this decision, while a hard one, can serve as a catalyst for more sustainable strategies," said Microsoft in a statement attributed to Cliff Guren, whose title is senior director of publisher evangelism for Live Search Books.
"To that end, we intend to provide publishers with digital copies of their scanned books. We are also removing our contractual restrictions placed on the digitized library content and making the scanning equipment available to our digitization partners and libraries to continue digitization programs. We hope that our investments will help increase the discoverability of all the valuable content that resides in the world of books and scholarly publications.” (Microsoft elaborated on those comments in a blog post last Friday.)
The company's partners, including institutions with major research libraries such as Cornell and Johns Hopkins Universities, can now opt to pursue or continue partnerships with Google, fund internal scanning operations, work with groups like the Open Content Alliance or some combination thereof. The OCA, which scanned materials with Microsoft funding, also operates on private donations and foundation money but is looking to secure replacement funding to continue its digitization work.
"We will talk with Google about whether or not they want to increase our project with them," said Janet Gertz, the Columbia library system's preservation and digital conversion director. She added that ideally, the library would redirect the books it was scanning through Microsoft to the complementary Google program. The Microsoft program, she said, "will gradually taper off over the month of June."
Neal added: "I think it will be important to monitor carefully the choices and decisions that those other libraries" in the Microsoft partnership make. "We don't know at this point whether there is going to continue to be an alternative to our Google relationship."
Anne R. Kenney, the Carl A. Kroch University Librarian at Cornell, said she appreciated Microsoft's partnership with OCA, which furthered the university's goals of offering "multiple access points" for works online as well as keeping digitized materials preserved in a secure repository. But the advantage with Google, she said, was the ability to scan copyrighted works as well as those in the public domain. The university signed with Microsoft in 2006 but more recently announced a separate partnership with Google.
The initial Microsoft contract was already set to conclude in two months, Kenney said, and the university likely would have pursued a renewal had the project not ended. So far, between 90,000 and 100,000 volumes have been scanned through the program, she said, mainly pre-1923 monographs, in English, "across the subject spectrum." For a sense of the scale, the partners scanned more books in the program's first four months than in the previous 15 years, before Microsoft signed on.
"We felt very positively about our relationship with Microsoft," Kenney said, adding that "the scanning was of extremely high caliber ... they were very responsive to our needs."
But she noted that Microsoft's announcement has sharpened research libraries' focus on "ongoing preservation" efforts and the "long-term interest in preserving cultural heritage materials respective of their commercial value" -- over centuries, as opposed to the decades or years that make up the life span of a typical company.
Google said it could not provide a specific comment but noted that the company is "extremely committed" to the Google Books project.
One alternative, besides Google, is the realm of private, open-source scanning efforts. The major player in this arena so far is the Internet Archive, which for now is looking for stopgap funding. But Kahle, in a blog post, sounded optimistic: "Onward to a completely public library system!"
Microsoft spent some $10 million on its book-scanning operations through the Open Content Alliance, working with five major research libraries. But the OCA works with over 70 libraries in total, most of which partnered with the alliance directly, not through Microsoft. In an interview, Kahle likened Microsoft's (and, in the beginning, Yahoo's) contribution to "VC funding being put into the public sphere," meaning venture capital: the development and technology it paid for will remain even though the program itself has ended. He also noted in the blog post that the Archive could keep the equipment Microsoft funded; much of that funding went directly to staffing the 13 scanning centers operated by the alliance. (Over 300,000 books have been scanned as a result over the past three years, the post said.)
"So now it’s time for the public sphere to build digital services," he said in the interview. "And fortunately, a lot of the R&D is already done. But that does mean that it has to be funded and brought forward by we libraries. But that’s what we libraries are supposed to do."
Kahle added that, as "the Web has always worked," he foresaw a return to "many different organizations" competing, or working in concert, to continue the progress made so far in digitizing library reserves.
Meanwhile, Google, along with a group of libraries, will begin investigating how to handle works whose copyright status is unclear. Neal, of Columbia, said the university would collaborate with the company to discuss how information about whether pre-1964 works were copyrighted "can be more routinely identified." The institutions will work with the U.S. Copyright Office and the Online Computer Library Center, which offers long-term preservation services for digital archives, in an effort to more quickly determine whether certain works can be scanned by confirming whether they are copyrighted.