In the age of too much data, researchers believe that massive ecosystems of literary detritus can unlock new insights into culture, language, and history. Google’s book-scanning project has given rise to an emerging field of study, “culturomics”: the use of large-scale data as a window into culture. Then there’s Twitter. Though many in academe malign the popular micro-blogging service as a digital landfill for gibberish, some scholars consider it a valuable trove containing shreds of the collective diary of 21st-century humanity, shreds that, with the right tools and access, might be assembled into coherent volumes.
But recent changes to Twitter’s application programming interface, or API (the rules governing how outside software can interact with data on Twitter’s servers), may affect how easily academics and other researchers can get hold of the data they want to study. Software experts who have been assisting research projects involving Twitter say the new rules could make it hard for scholars, especially those without technical savvy, to gather enough data to begin fitting discrete pieces into coherent puzzles.
The restrictions, first reported by ReadWriteWeb, are subtle and not necessarily insurmountable for scholars who want to use Twitter for research. But according to experts contacted by Inside Higher Ed, the new rules could make it harder for researchers to secure large sets of raw data from the Twitter archives and export those data to their own computers for analysis.
The Web Ecology Project's 140kit and Twapperkeeper.com — two websites that have received funding from academic groups to help researchers more easily acquire and sort raw data from Twitter — have scrambled to adjust to Twitter’s new API rules, which were promulgated last month by developers at the company.
Twitter makes it relatively easy to request and receive metadata for tweets that share common elements, such as origin, words, themes, and time period, says Devin Gaffney, managing director of the Web Ecology Project, but only in limited amounts: about 150 data requests per hour.
“That sounds like a lot, but it’s actually relatively limited,” says Gaffney, who got seed money for the Web Ecology Project from Harvard Law School’s Berkman Center for Internet and Society. “It’s definitely good access if you’re doing something basic and not necessarily high-volume,” he says, “… but not something you’d want to bet your dissertation on.”
For research that requires larger data culls, Gaffney says, the researcher has to make a “whitelisting” request. Whitelisting allows applicants who get Twitter’s blessing to make 20,000 requests per hour for user metadata from the company’s rapidly evolving archive of tweets. Academics generally prefer the larger allowance, which can be the only way to keep up with the thousands of new tweets that materialize every minute, says Gaffney. “The more data you can get, the more solid your assertions become,” he says.
Twitter’s new API rules, however, put an end to whitelisting permissions for all new entities.
They also stipulate that established facilitators such as Twapperkeeper.com and 140kit, which retain some of their whitelisting privileges through a kind of grandfather clause, may not redistribute raw data to the researchers who use their services to query the Twitter archive. Both had previously allowed researchers to export the data on request.
While both Gaffney and Twapperkeeper.com’s John O’Brien say they are working on patches that will give researchers useful access to raw data even if they don’t own it, both admit that the new API rules may raise the barrier for curious academics who lack technical skills but think there might be something in the Twitter archive for them.
Under the old regime, “At any time [users] could just walk into the system and download it,” says O’Brien. “…That was where the real power was for the academics.”
There are plenty of non-academics, often in business and advertising, who have been using data acquired through whitelisting requests for their own ends, says Gaffney, but those users will probably suffer less than scholars from the new rules against redistribution of raw data. “Really the only people who care about raw catalogs are [academic] researchers,” he says.
Professors with enough clout and seniority to assemble a technical staff to help them adjust to the tighter restrictions on Twitter API culls might not be hurt as much by the changes as younger academics who want to explore what Twitter archives might teach them, Gaffney says. In other words, the price of poking around just went up. That is bad for academics, Gaffney says, because not all the possible uses of Twitter are obvious in advance. The media scholar or the social network theorist might be able to write a persuasive grant proposal on spec, but the epidemiologist perhaps could not.
Not all academics believe that Twitter’s API changes spell big trouble for academic research. Leslie Johnston, a manager of technical architecture initiatives in the digital preservation wing of the Library of Congress, says she thinks the new rules will mainly affect those who wish to make money off the data, not those who wish to learn from them.
“Overall I think this has more impact on businesses that were hoping to redistribute the Twitter archive,” says Johnston. “…I think for [scholars] it is not going to have a big impact except to keep them from sharing the data they have done their research on.” If Twitter were in fact shutting scholars out of the archive, rather than just limiting the rate at which some are able to pull data and what they are allowed to do with it, that would be a problem, she adds. Under the new rules, “I think it will take you longer to get it,” she says, “but you can still get what you need.”
About a year ago, the Library of Congress announced that it had negotiated a deal with Twitter that would give the library access to the entire archive. Johnston says the logistics of that deal — including how much data scholars are able to request from the archive through the library — are still being hashed out.
Twitter officials did not respond to interview requests.
For the latest technology news and opinion from Inside Higher Ed, follow @IHEtech on Twitter.