Open Library

Imagine if the world's most complete card catalog were just a mouse-click away. Scott McLemee chats with a young programmer who is making it happen.

August 8, 2007

Open Library is a new online tool for finding information about books – even (perhaps especially) for titles that are out-of-print, scarce, or likely to find one reader per decade, if even that. It is, so to speak, a catalog with benefits. If a text is available in digital format, there is a link to it. Citations and excerpts from reviews will be available. Likewise, cross-references to other works on related topics. A user of Open Library can see the cover of the book and, in some cases, search the contents.

The project is still very much under development. Force of habit makes us speak of the pre-optimal version of a site as its “beta” version. With Open Library, given its ambitions, chances are that “gamma” is more accurate.

But here's an encouraging sign: The basic framework is being established by my appallingly accomplished young friend Aaron Swartz -- who, at the age of 21, has already helped create RSS (that was in his early teens), published a couple of computer-science papers, and developed Infogami, a system enabling his digitally clueless elders to set up their own websites. He studied sociology as an undergraduate at Stanford University, presumably in his spare time. Aaron has written an essay called “How to Be More Productive” that can be recommended on the grounds that the author does know something on the subject.

I recently sent him a number of questions about the project. Some of his answers were, it seems, typed into a mobile phone. A transcript of the e-mail interview follows.

Q: How is Open Library funded? Are you working on it full time? And how many people are involved in the project?

A: It's currently being funded by the Internet Archive, with the help of some state and federal library grants. We have some volunteers, but also about 5 people working full-time (a couple programmers, a designer, and a product manager).

Q: What will Open Library offer that you can’t already find online? What was missing from the existing array of online book-data resources – WorldCat, Google Books, Amazon, etc. – that makes it worthwhile to create a new one?

A: As the kind of person who reads Intellectual Affairs (an academophile?), I'm often looking for interesting books on an obscure topic. I can look on Amazon, but its coverage of out-of-print books is pretty poor. (In my experience, most of the really interesting books are out of print.) I can search an academic library or WorldCat, but the quality of data is pretty weak -- you can get basic bibliographic info, but no reviews and weak search and a painful interface and most require a subscription.

So I wanted to build a site where one could more easily find those hidden great books, by combining all the data we have on them in one place and letting the people who love them go back and annotate and highlight them.

Q: With any Web 2.0 project, the question of safeguards comes up. Are any built in? I mean, to keep people from going through and systematically attributing the complete works of Shakespeare to Francis Bacon, or whatever.

A: Our plan is to leave it open and then lock things down as need be. Right now we're watching all the edits so that we can revert things if people do that and we hope to let users watch their favorite pages and so on. That kind of thing has worked pretty well for Wikipedia and we're hoping it will work similarly here. But we're willing to try other things if it doesn't.

Q: Some serious questions have come up about the shrinking depth of subject cataloging from the book records issued by the Library of Congress. That might sound like a problem just for librarians, but it isn't. It’s basic infrastructure for intellectual life, pretty much. To anyone doing research, having books adequately cataloged by subject offers tremendous benefits. Will Open Library be taking up the slack on this?

A: Yes, it's amazing the amount of politics around Library of Congress Subject Headings. (And I had no idea that they were thinking about abandoning them -- that's incredible; thanks for the pointer.) Lots of people have different opinions over how things should be characterized and cataloged and which things were important. When we first started the project, librarians kept arguing about which system we should use.

We decided early on to not be partisan but to be a clearinghouse for all the cataloging data we could get our hands on. So in areas where the Library of Congress doesn't do the cataloging, or doesn't do the cataloging to your taste, we'll try to make that data available.

We're hoping we'll be able to pull series data from the specialized libraries so that you can view them on our web site. We'll also republish them so that other libraries can import them from us.

Q: Will you be asking permission before incorporating data from, say, an academic library's online catalog?

A: Yes, we're talking to the academic libraries to make deals on how to import their catalogs. Our main pitch so far has been that this is an opportunity to contribute to a public commons -- contribute your library catalog to the public, and not only make it available to interested library users everywhere, but also contribute to a system where you'll get back everyone else's work, just like libraries have done with RLG.

Q: Open Library will also serve as a central directory for books available in digital formats. Some such material is freely available to everyone (e.g., the Project Gutenberg editions). And some of it has more limited access. Will you link to the latter? And do you have a policy or opinion about dealing with Google Books?

A: Yes, we hope to link to everything interesting -- free or not, although obviously we prefer free and can do more with it. We're planning to link to Google Books and we're hoping we can get copies of their public domain books.

Q: Do you have a long-term plan to make digitizing books part of the Open Library project? Or does it make more sense to leave that kind of initiative to others?

A: The Internet Archive has a big book digitization project, with scanning centers at the University of California, the University of Illinois Urbana-Champaign, the Brooklyn Public Library, Library of Congress, and others. We hope Open Library can raise money to increase their scanning.

Q: I have a question about Open Library to pass along from Matthew Battles, a senior editor of scholarly books at the Museum of Fine Arts in Boston and the author of Library: An Unquiet History (Norton, 2003). It’s about metadata – an important issue that I will admit just barely understanding. So before going on to the question itself, would you mind giving a crash course on the topic?

A: This is a bit tricky. Metadata generally is stuff like cataloging data. It's what lets you find books when you want to do a search more complicated than "which books have these words in them?" (or when you don't have the full text of every book made available for searching, as seems to be the case for the foreseeable future). Whenever you look for everything by a particular author or in a particular subject, you're using metadata.

It becomes useful in two cases: When you don't have all the data and when you want to ask more interesting questions. If you just want to find a particular page, searching by full text is usually enough. But if you want to do something more interesting -- like graph an author's output by year, or see which country has produced the most romance novels, or find out which genre has the most growth in the past six months -- you need metadata. Here's a dorky metaphor for you: data is literary criticism and metadata is Franco Moretti.

Q: OK, now on to the question from Matthew Battles: “I wonder how much a resource like Open Library can make itself open to metadata mashups--giving developers openings through which they might take metadata, bibliographical info, and text and organize it in undreamt-of ways... and how robust and open will the system become not only with respect to image formats, but metadata concepts? In less convoluted terms, will it be possible for Open Library to ‘accrete’ tags and other metadata, to layer cross-references and hyperlinks--for its metadata to ‘learn’ from users?”

A: Yes, opening up our data to others is a key part of the plan. We will have full database dumps and XML and other formats as export. A big hope of mine is that by making all of this data available in a centralized place, we'll make it vastly easier to build applications around books. Want to build a site that lets people find other people who have the same books who live near them? No longer do you have to build a whole bunch of infrastructure to locate and refer to books -- instead, you just need to build the part relevant to your application. (Like the geolocation stuff.)

As for "accreting" tags, we spent a lot of time building an advanced new type of database for this project so that we could load in data of all sorts from numerous sources. So if someone has been keeping track of, say, the fonts used in every book, we can import all that data and store it with the other stuff we have. Similarly for any user-created data.

Q: So what is your sense of the master plan for this project? The future course of development?

A: We're taking it step by step. Our first goal is to get catalog information for every book -- a big project in itself. We've been calling all the publishers and national libraries and research libraries to get copies of their catalogs (we'd love readers' help with this, by the way!) and then we're working on algorithms to integrate all that data into one coherent site.

After that, we want to work on improving the book-reading interface for books that we have scans of. We're hoping to make the scanned text into a wiki as well, so that people can fix typos and correct errors in our processing (OCR) of the scan. We'd also like to think about new ways that people can work with a book's full text online and what the proper interface for that should be. And, of course, we want to think about ways we can get more books scanned. One idea is a "Scan this book" button on every out-of-copyright book, where for $50 to $100, we'll page the book from a library, deliver it to the scanners, and then email you a PDF of the book and put the full text online, with a little nameplate thanking you for funding it.

And then, of course, we want to expand beyond just books. We're eager to do the same thing with journal articles: one open site where we list every journal article, all the journal articles by a particular author, sorts by subject and topic, the abstracts and references, and links to places where you can find a full text copy. I just got back from a science conference and the folks I talked to there loved the idea. And after that there's music and movies, naturally.

Q: One last thing... People should be using index-card catalogs to find print-and-ink books in brick-and-mortar libraries! This is just one more effort to turn the US into a nation of screen potatoes! Admit it -- you just hate books, don't you? (I say this tongue in cheek, but there are bound to be people muttering it in all earnestness.)

A: You found me out: I love books. Every time I walk into a library, my face just lights up. There's something so grand and inspiring about collecting all those books just to share them with people. And I visit them constantly; I always have a dozen books checked out at any one time, with a couple new ones each week. I'm sure that's nothing for most IHE readers but to my friends in the computer industry, it's like I'm some kind of bizarre alien. I do this because in a world of Googles, Amazons, and Wikipedias, all encouraging people with computers to stay at home and talk to their screens, I want to have at least one countervailing force encouraging people to go find dusty library books off of disused shelves.