Proponents of the Google Books project have argued that the effort to scan every printed book in the world into a digital database will be a game-changer for scholarship. Now Google is trying to help digital humanities scholars prove it.
The company plans to announce today that it is bankrolling 12 university-based research projects designed to demonstrate the potential value to scholarship of its growing digital vault.
Google has been scanning books from cooperating libraries since 2004, and currently indexes digital versions of 12 million books. Despite various legal challenges, it aspires to expand this collection over the next two decades to include all 80 million or so published works known to be in circulation, says Jon Orwant, engineering manager of Google Books.
For humanities scholars, having all the world’s writings available in a digital format opens up an entirely new realm of quantitative research to supplement the qualitative research that, because of limitations inherent to the print medium, has historically been their sole dominion, say Google officials.
“Traditionally, the conventional model of humanities research is that a professor has his graduate students do deep readings on a relatively small number of texts,” says Orwant. “Now, for the first time, we have so many books online and so many useful data mining techniques that it becomes possible, instead of reading 10 books deeply, to read 10,000 books shallowly.”
This is not to say that close reading and interpretive analysis should no longer figure into humanities scholarship, Orwant says; it just means that a different kind of research -- one based on systematic analyses of broad swaths of text -- is now possible, too.
One winning project, called “Reframing the Victorians,” seeks to test the anecdotal yet venerated thesis that well-heeled Britons living in the middle third of the 19th century were especially optimistic -- a view advanced by the scholar Walter Houghton in an influential 1957 book, The Victorian Frame of Mind. Houghton had based his thesis on an observation of the recurrence of words such as “light,” “sunlight,” and “hope.” But his sample was limited by his ability to catalog these sunny allusions. Google’s robots, which will be able to cover a much broader range of authors in much less time, will be able to explore the hypothesis more thoroughly, say Dan Cohen and Fred Gibbs, the George Mason University professors who won the grant.
Cohen and Gibbs plan to test another anecdotal thesis -- that the Victorian era marked a decline in religiosity in the United Kingdom -- by writing a program that tracks references to Biblical themes and passages in Victorian literature at a scale that would be impossible for even the most patient human scholars to achieve.
“Because this Google research program can provide word frequencies by year and country, we can finally and truly test these and other fundamental claims that have been at the heart of Victorian studies for generations,” the researchers wrote in their proposal.
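To make the idea of "word frequencies by year" concrete, here is a minimal sketch of the kind of count the researchers describe. It is an illustration only, not the Google research program or the Cohen and Gibbs code: the function name, the toy corpus, and the per-1,000-words normalization are all assumptions, and the target words are simply the three examples attributed to Houghton above.

```python
import re
from collections import defaultdict

# Target words taken from the examples attributed to Houghton in the article.
TARGET_WORDS = {"light", "sunlight", "hope"}


def relative_frequency_by_year(corpus, targets=TARGET_WORDS):
    """Return {year: target-word hits per 1,000 words} for (year, text) pairs.

    Hypothetical helper: normalizing by total words per year keeps years
    with more scanned text from dominating the comparison.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in corpus:
        # Crude tokenizer: lowercase runs of ASCII letters only.
        words = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(words)
        hits[year] += sum(1 for word in words if word in targets)
    return {
        year: 1000.0 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }


# Invented toy corpus, for illustration only; these are not real Victorian texts.
toy_corpus = [
    (1840, "hope and light filled the morning"),
    (1840, "the rain fell on the town"),
    (1870, "there was little hope in the dark street"),
]

rates = relative_frequency_by_year(toy_corpus)
for year, rate in rates.items():
    print(year, round(rate, 1))
```

A real study would also have to handle scanning errors, spelling variants, and words used in more than one sense (a "light" burden versus the "light" of day), none of which this sketch attempts.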
Many of the 12 winning research proposals, which will share a total of $479,000 over the next year, were picked because they not only take advantage of these analytic possibilities, but show potential to lay the methodological tracks for similar projects down the line, Orwant says.
For example, researchers at the University of California at Riverside and Eastern Connecticut State University will try to upgrade the metadata attached to the digital copies of pre-1801 books in the Google Books database. They will also study how to improve the project’s use of metadata -- descriptive information associated with each digital file that Google’s robots use to match files with search queries -- in general.
Cohen, the Victorian scholar, says that while he can imagine why some might see this as an attempt by Google to placate those turned off by copyright claims made against Google Books by academic authors, he believes Google’s interest in using its resources to advance scholarship is genuine.
Google has allocated a total of $1 million to supporting digital humanities research related to the Google Books project over the next two years. Next year, it will welcome new applications for the remaining $521,000, as well as renewal applications from this year’s recipients.
For the latest technology news from Inside Higher Ed, follow IHEtech on Twitter.