Can Your Data Come Out to Play?


When I was planning to attend a conference in San Francisco last fall, I had to find a cheaper hotel than the conference venue if I was going to be able to afford to attend for more than one night. I found a small hotel a mile away that was half the price and had good online reviews. But how safe was that neighborhood? I went to EveryBlock to find out. Some property crime, a touch of auto theft, no homicides and the numbers overall were as low as you could want in an urban neighborhood. I booked a room.

EveryBlock had its start when Adrian Holovaty decided to improve on a GIS crime mapping system in Chicago that the city already made available online. You could type in an address, a beat, or a police district and find out exactly where and what kinds of crimes police were responding to. Cool! But Holovaty, then an editor and tech innovator at the Washington Post, thought he could make it better. He was one of the first people to use the Google Maps API to create a “mashup” (a melding of data from different sources), and he did it on a whim. He quickly realized other information could be mixed in to develop a real-time community news hub. He applied for a grant. Now EveryBlock (since acquired by MSNBC) offers local “news with a twist” for 16 cities. For any given neighborhood you can get not just crime reports, but news stories of local interest, blog posts, the results of restaurant inspections, photos posted to Flickr that are tagged with a location, meeting announcements, reviews of businesses, and even (in at least one case) the location of bike racks.

Holovaty took public information and available software and created something new and valuable, and it all started with him playing around in his spare time. Out of casual experimentation he came up with an innovation that was worth building on, a new approach to hyper-local news. It came naturally to a journalist to see the value of combining news with public data. Libraries could use innovation like that to build better catalogs, but OCLC, our primary cooperative for sharing catalog data, doesn't want our data to go out to play.

Most academics know OCLC through the WorldCat database that searches the holdings of over 70,000 libraries in over 100 countries. It’s an amazingly useful tool, and the organization took a huge step forward in 2006 when it made it possible to search this union catalog for free on the Web. (As I understand it, libraries have to subscribe to the proprietary version of the database to have their holdings included; OCLC needed to keep that revenue stream to fund the free access, so the interface you see in the library may look different from the free one.) WorldCat is a vast pool of data, and a lot of clever people could build interesting things with it. But here’s the rub: if others can build things with the records libraries have contributed to OCLC’s WorldCat database, OCLC will no longer be the destination, and member libraries (which pay OCLC tens of thousands of dollars annually in fees) might find more innovative and cheaper alternatives. How could the organization fund new projects, much less the upkeep on its home campus or its staff members’ salaries? (Though most of the staff involved in innovations pull down salaries that are within market norms, the CEO earns over a million a year and board members, normally unpaid in non-profit organizations, earn as much as or more than the average librarian’s salary. Preserving this business model doesn't come cheap.)

Last year, OCLC proposed a policy for records use that kicked up a storm among librarians. It seemed to claim total ownership of all the records contributed over the years by libraries and to impose new limits on their use by member libraries; the argument was that letting library records run feral on the Internet would weaken OCLC’s position. To remain strong and retain value, it must preserve its preeminence as the only mega-database of library records. That policy was withdrawn, but a similar policy was adopted last June. Karen Coyle, a blogger who has followed all kinds of cataloging developments more closely than I ever could, commented that “it reads as if the purpose of membership were to sustain OCLC (instead of the purpose of OCLC being to support libraries).”

Things have gotten murkier since a new company, SkyRiver, tried to sell libraries a cheaper system to catalog their books. OCLC punished libraries that switched their cataloging process over to this new company (libraries at public institutions hurt badly by the recession and desperately needing to cut costs) by raising their rates for loading records into WorldCat more than tenfold, enough to erase any savings the library might have realized. At least one of these libraries went ahead with the cheaper cataloging system, saying they will continue to loan their books to other libraries, but anything they catalog from 2009 on won’t show up in WorldCat. It would simply cost them too much to upload records at that new higher rate. OCLC officials have argued cataloging revenue is essential support for their unique position among libraries.

But at the same time, OCLC was rolling out an integrated library system to compete with commercial ones, including one sold by a company affiliated with SkyRiver. (These are basically inventory control systems with a public interface for those who want to find a book; they cost libraries tens of thousands of dollars annually and generally leave much to be desired.) SkyRiver brought an anti-trust suit against OCLC; OCLC has filed a motion to dismiss . . . it’s like going to family court and watching a messy divorce in which the children are staring at their feet as their parents try to destroy each other over custody.

All that SkyRiver drama aside, the end result of the records use policy is that those who might improve on what we can do with catalogs can’t play with the data, not without getting OCLC's approval. That doesn’t sound like “play” to me.

Nor does it to Jonathan Rochkind, who writes from the Bibliographic Wilderness. In a response to a discussion at the Open Knowledge Foundation blog on the best way to share bibliographic data (linked? copied? corralled?) he wrote that there is a moment in a busy developer’s life when he or she might glimpse a cool idea that could be shared and tried out by others, and with a little extra effort, that developer might just throw something together and release it into the wild to see what happens. That’s not going to happen if you have to check with the boss, the boss with the cooperative, the cooperative with a lawyer, and on and on . . .

Those extra steps have a chilling effect on innovation, and when chilling is built purposefully into the system in order to protect the way we’ve always done things, or rather to protect an organization that worries that it might lose control, we’re losing the kind of playful “huh, I wonder what would happen if . . . ?” moment that ended up combining public data, newsfeeds, and anything else that can be mapped into this nifty thing called EveryBlock.

Who knows what nifty new catalog isn’t being dreamed up in someone's spare time?

(Hat tip to my friends at LSW for pointing me toward Rochkind's blog post.)
