Scraping Campus Bookstore Data in the Hunt for Cheaper Textbooks

Textyard has open sourced the tool it built for harvesting course and textbook data from college bookstores. Textyard used the tool to build its textbook price comparison site, and now that the startup's founders are moving on to a new project, they're releasing the technology in the hope that other students and programmers can build projects with it.

February 22, 2012

Ben Greenberg and Rui Xia, the co-founders of Textyard, are moving on to other projects, but in doing so they've open sourced the code that powers the Textyard website, asserting that "any college student with rudimentary coding skills will now be able to take on their local bookstore."

Textyard is a textbook comparison site that makes it easier for students to do a little research before buying their textbooks at their local campus bookstore. The service is meant to help counter the high price of textbooks by giving students more options and information about where to purchase their books. The website lets you enter all your classes and sections for the semester, pulls the list of required and recommended textbooks, and gives you their price on Amazon, Chegg, eBay and other online outlets.

How does Textyard know which textbooks are required? It "scrapes" the college bookstores' websites, meaning that it programmatically harvests the data about courses and textbooks. As Textyard's Greenberg notes, many colleges use one of six major online storefronts, so the team has written the code that can extract the information from each.
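To make "scraping" concrete, here is a minimal sketch in Python of what programmatically harvesting textbook data from a page can look like. It is not Textyard's code. The page markup, the `data-isbn` and `data-status` attributes, the placeholder ISBNs, and the names `BookListParser` and `extract_books` are all assumptions made up for illustration; each real bookstore storefront has its own layout and would need its own extraction rules.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a bookstore's course-materials page.
# Real storefronts differ; the ISBNs and titles here are placeholders.
SAMPLE_PAGE = """
<ul id="course-materials">
  <li class="book" data-isbn="9780000000001" data-status="required">Intro to Biology</li>
  <li class="book" data-isbn="9780000000002" data-status="recommended">Lab Manual</li>
</ul>
"""


class BookListParser(HTMLParser):
    """Collects one record per <li class="book"> element in the page."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._current = None  # record being built while inside a book <li>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "book":
            self._current = {
                "isbn": attrs.get("data-isbn"),
                "status": attrs.get("data-status"),
                "title": "",
            }

    def handle_data(self, data):
        # Text inside the current book element is treated as its title.
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "li" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.books.append(self._current)
            self._current = None


def extract_books(html):
    """Return a list of dicts with isbn, status, and title for each book."""
    parser = BookListParser()
    parser.feed(html)
    parser.close()
    return parser.books


if __name__ == "__main__":
    for book in extract_books(SAMPLE_PAGE):
        print(book["status"], book["isbn"], book["title"])
```

A real scraper would first download each page over HTTP and would need a separate parser along these lines for every storefront format, which is presumably why the number of major storefronts matters.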

Greenberg also argues that this isn't illegal, although in the past some web-scraping companies have been sued for "trespass to chattels" (most famously, perhaps, when eBay sought an injunction to stop Bidder's Edge from scraping its site to display auction information). In the case of Textyard, Greenberg argues that course and textbook information must be available to all students and bookstores under the Higher Education Opportunity Act. He also contends that if scraping does not keep a website from functioning (and typically it doesn't), there's really no way to say that the tool damages the site's business.

There's been quite a bit of interest in Textyard's open sourced scraping tools since the startup announced their release last week. As eCampus News' Dennis Carter notes, at least one student has already said he'll use the code in a peer-to-peer textbook exchange site he's building for the University of Texas, Austin.

As the eCampus News story notes, there are several other websites that offer similar price comparisons (SwoopThat, for example). But by opening up its code, Textyard has made it easy for plenty of other projects to bloom, whether they become full-fledged startups or just local side projects.

It's also an indication that the lack of transparency in the textbook market won't be able to continue, whether the change comes from the kinds of tools Textyard has released into the wild or from other shake-ups (open educational resources, for example) that undermine the expectation that students will unquestioningly cough up hundreds of dollars a year for their textbooks.

But the use of Web scraping tools to build startups and websites also raises a number of questions about other aspects of data transparency and accessibility at colleges and universities. Another new startup, OneSchool, recently launched with a mobile app that pulls together course information, campus maps, and activities for eight universities. OneSchool also gets its data by scraping campus websites.

The response to some Web scraping efforts is "Shut it down!", whether through lawsuits, terms-of-service changes, or anti-bot services. But as there's clearly an interest in the data, from students and from entrepreneurs, perhaps universities and campus bookstores alike need to figure out a way to open up their data, create an API, monitor what people are doing with it, and perhaps even monetize it.

