Certain research topics seem destined to inspire the question, “Seriously, you study that?” So it is with the field of Twitter scholarship. Which -- just to get this out of the way -- is not actually published in 140 characters or less. (The average “tweet” is the equivalent of two fairly terse sentences. It is like haiku, only more self-involved.)
The Library of Congress announced in April that it was acquiring the complete digital archives of the “microblogging” service, beginning with the very first tweet, from ancient times. At present, the Twitter archive consists of 5 terabytes of data. If all of the printed holdings of the LC were digitized, they would come to 10 to 20 terabytes (a figure that does not include manuscripts, photographs, films, or audio recordings).
Some 50 million new messages are sent on Twitter each day, although one recent discussion at the LC suggested that the rate is much higher -- at least when the site is not shutting down from sheer traffic volume, which seems to be happening a lot lately. A new video on YouTube shows a few seconds from the "garden hose" of incoming Twitter content.
When word of this acquisition was posted to the Library of Congress news blog two months ago, it drew comments from people who could not believe that anything as casual and hyper-ephemeral as the average tweet was worth preserving for posterity, let alone analyzing. Thanks to the Twitter archive, historians will know that someone ate a sandwich. Why would they care?
Other citizens became agitated at the thought that “private” communications posted to Twitter were being stored and made available to a vast public. Which really does seem rather unclear on the concept. I’m as prone to dire mutterings about the panopticon as anybody -- but come on, folks. The era of digital media reinforces the basic principle that privacy is at least in part a matter of impulse control. Keeping something to yourself is not compatible with posting it to a public forum. Evidently this is not as obvious as it should be. Things you send directly to friends on Twitter won't be part of the Library's holdings, but if you celebrated a hook-up by announcing it to all and sundry, it now belongs to the ages.
A working group of librarians is figuring out how to “process” this material (to adapt the lingo we used when I worked as an archival technician in the Library's manuscript division) before making the collection available to researchers. But it’s not as if scholars have been waiting around until the collection is ready. Public response to the notion of “Twitter studies” might be incredulous, but the existing literature gives you some idea of what can be done with this giant pulsing mass of random discursive particles.
A reading of the scholarship suggests that individual tweets, as such, are not the focus of very much attention. I suppose the presidential papers of Barack Obama will one day include an annotated edition of postings to his Twitter feed. But that is the exception and not the rule.
Instead, the research so far tends to fall into two broad categories. One body of work focuses on the properties of Twitter as a medium (or, what amounts to a variation on the same thing, on Twitter as one part of an emerging new-media ecosystem). The other involves analyzing gigantic masses of Twitter data for evidence about public opinion or mood.
Before giving a thumbnail account of some of this work – which, as the bibliography I’ve consulted suggests, seems intrinsically interdisciplinary – it may be worth pointing out something mildly paradoxical: the very qualities that make Twitter seem unworthy of study are precisely what render it potentially quite interesting. The spontaneity and impulsiveness of expression it encourages, and the fact that millions of people use it to communicate in ways that often blur the distinction between public and private space, mean that Twitter has generated an almost real-time documentary record of ordinary existence over the past four years.
There may be some value to developing tools for understanding ordinary existence. It is, after all, where we spend most of our time.
Twitter shares properties found in numerous other new-media formats. The term “information stream” is sometimes used to characterize digital communication, of whatever sort. Inside Higher Ed “flows” at the rate of a certain number of articles per day during the workweek. An online scholarly journal, by contrast, will probably trickle. A television network’s website -- or the more manic sort of Twitter feed -- will tend to gush. But the “streaming” principle is the same in any case, and you never step into the same river twice.
A recent paper by Mor Naaman and others from the School of Communication and Information at Rutgers University uses a significant variation on this concept, the “social awareness stream,” to label Twitter and Facebook, among other formats. Social awareness streams, according to Naaman et al., “are typified by three factors distinguishing them from other communication: a) the public (or personal-public) nature of the communication and conversation; b) the brevity of posted content; and, c) a highly connected social space, where most of the information consumption is enabled and driven by articulated online contact networks.”
Understanding those “articulated online contact networks” involves, for one thing, mapping them. And such mapping efforts have been underway since well before Twitter came on the scene. What makes the Twitter “stream” particularly interesting is that, unlike Facebook and other social-network services, the design of the service permits both reciprocal connections (person A “follows” person B, and vice versa) and one-sided ones (A follows B, but that’s it). This makes for strong and weak communicative bonds not only within networks but also among them. And various conventions have emerged to allow Twitter users to signal one another or to urge attention to a particular topic or comment. Besides “retweeting” someone’s message, you can address a particular person (using the @ symbol, like so: @JohnDoe) or index a message by topic (noted with the hashtag, thusly: #topicdujour).
All of this is, of course, familiar enough to anyone who uses Twitter. But it has important implications for just what kind of communication system Twitter fosters. To quote the title of an impressive paper by Haewoon Kwak and three other researchers from the department of computer science at the Korea Advanced Institute of Science and Technology: “What is Twitter, a Social Network or a News Media?” (No sense protesting that “media” is not a singular noun. Best to grind one’s teeth quietly.)
Analyzing almost 42 million user profiles and 106 million tweets, Kwak and colleagues find that Twitter occupies a strange niche that combines elements of both mass media and homophilous social groups. (Homophily is defined as the tendency of people to sustain more contact with those they judge to be similar to themselves than with those whom they perceive to be dissimilar.) "Twitter shows a low level of reciprocity," they write. "77.9 percent of user pairs with any link between them are connected one-way, and only 22.1 percent have reciprocal relationships between them.... Previous studies have reported much higher reciprocity on other social networking services: 68 percent on Flickr and 84 percent on Yahoo."
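To make the reciprocity figure concrete, here is a minimal Python sketch, assuming the follow graph is available as a list of directed (follower, followed) pairs. The function `reciprocity` and the sample accounts are hypothetical illustrations, not Kwak and colleagues' code; the sketch simply collects every pair of users linked in at least one direction and asks what share of those pairs follow each other.

```python
def reciprocity(follows):
    """Return (one_way_share, reciprocal_share) over user pairs with any link.

    `follows` is an iterable of directed (follower, followed) pairs.
    """
    # Deduplicate edges and ignore anyone "following" themselves.
    edges = {(a, b) for a, b in follows if a != b}
    # Each unordered pair of users connected in at least one direction.
    pairs = {frozenset(edge) for edge in edges}
    mutual = 0
    for pair in pairs:
        a, b = tuple(pair)
        if (a, b) in edges and (b, a) in edges:
            mutual += 1  # both users follow each other
    total = len(pairs)
    if total == 0:
        return 0.0, 0.0
    return (total - mutual) / total, mutual / total


# Hypothetical toy graph: ann and bob follow each other; everyone follows cnn.
sample = [("ann", "bob"), ("bob", "ann"), ("ann", "cnn"),
          ("bob", "cnn"), ("dee", "cnn")]
print(reciprocity(sample))  # (0.75, 0.25)
```

On this toy graph three of the four linked pairs are one-way, a miniature version of the lopsided pattern the researchers report.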
In part, this reflects the presence on Twitter of already established mass-media outlets, not to mention already-famous people who have millions of “followers” without reciprocating. But the researchers find that informal but efficient “retweet trees” also function “as communication channels of information diffusion.” Interest in a given Twitter post can rapidly spread across otherwise disconnected social networks. Kwak’s team found that any retweeted item would “reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted [again] almost instantly on the second, third, and fourth hops away from the source, signifying fast diffusion of information after the first retweet.”
Eventually someone will synthesize these and other analyses of Twitter’s functioning -- along with studies of other institutional and mass-media networks -- and give us some way to understand this post-McLuhanesque cultural system. In the meantime, research is being done on how to use the constant landslide of Twitter messages to gauge public attitudes and mood.
As Brendan O’Connor and his co-authors from Carnegie Mellon University note in a paper published last month, the usual method of conducting a public-opinion poll by telephone can cost tens of thousands of dollars. (Besides, lots of us hang up immediately on the suspicion that it will turn into a telemarketing call.)
Using one billion Twitter messages from 2008 and ’09 as a database, O’Connor and colleagues ran searches for keywords related to politics and the economy, then generated a “sentiment score” based on lists of 1,600 “positive” and 1,200 “negative” words. They then compared these “text sentiment” findings to the results of more traditional public opinion polls concerning consumer confidence, the election of 2008, and the new president’s approval ratings. They found correlations strong enough to be encouraging, and noted that by the summer of 2009, when many more people were on Twitter than had been the case in 2008, the text-sentiment results proved a good predictor of consumer confidence levels.
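The keyword-and-word-list approach can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not O’Connor and colleagues’ actual pipeline: the tiny `POSITIVE` and `NEGATIVE` sets are hypothetical stand-ins for the roughly 1,600 and 1,200 words mentioned above, and the score here is just the ratio of on-topic messages containing a positive word to on-topic messages containing a negative one.

```python
import re

# Hypothetical toy lexicons standing in for the much longer
# "positive" and "negative" word lists described in the text.
POSITIVE = {"good", "great", "hope", "win"}
NEGATIVE = {"bad", "worse", "fear", "lose"}


def tokenize(message):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def sentiment_ratio(messages, topic):
    """Ratio of positive to negative messages among those mentioning `topic`.

    A message counts as positive if it contains any positive word and as
    negative if it contains any negative word; it may count as both.
    Returns None when no on-topic message contains a negative word.
    """
    positive = negative = 0
    for message in messages:
        words = set(tokenize(message))
        if topic not in words:
            continue  # keep only messages matching the topic keyword
        if words & POSITIVE:
            positive += 1
        if words & NEGATIVE:
            negative += 1
    return positive / negative if negative else None


tweets = [
    "The economy looks good",
    "Fear the economy will get worse",
    "Great day at the beach",  # off-topic, ignored
    "economy bad, jobs bad",
    "I hope the economy can win",
    "Good news, the economy is growing",
]
print(sentiment_ratio(tweets, "economy"))  # 1.5
```

To compare a score like this with polls, one would presumably compute it separately for each day or week and then line the resulting series up against the survey results.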
A different methodology was used in “Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena” by Johan Bollen of Indiana University and two other authors. They collected all public tweets from August 1 to December 20, 2008 and harvested from them data about the content that could be plugged into “a well-established psychometric instrument, the Profile of Mood States” which “measures six individual dimensions of mood, namely Tension, Depression, Anger, Vigor, Fatigue, and Confusion.” This sounds like something from one of Woody Allen’s better movies.
The data crunching yielded “a six dimensional mood vector” covering the months in question. Which, as luck would have it, coincided with both the financial meltdown and the presidential election of 2008. The resulting graphs are intriguing.
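A toy version of the “mood vector” idea, again in Python and heavily simplified: each of the six Profile of Mood States dimensions is represented here by a few hypothetical words (the real instrument, and Bollen and colleagues’ adaptation of it for tweets, are far larger and more elaborate), and a day’s tweets are reduced to each dimension’s share of the mood words found. Normalizing to proportions is an assumption made for this sketch, not a description of the paper’s method.

```python
import re
from collections import Counter

# Hypothetical stand-in word lists, three words per mood dimension.
MOOD_TERMS = {
    "Tension": {"tense", "nervous", "anxious"},
    "Depression": {"sad", "hopeless", "unhappy"},
    "Anger": {"angry", "furious", "annoyed"},
    "Vigor": {"energetic", "lively", "active"},
    "Fatigue": {"tired", "exhausted", "weary"},
    "Confusion": {"confused", "bewildered", "muddled"},
}
DIMENSIONS = list(MOOD_TERMS)  # fixed order for the output vector


def mood_vector(tweets):
    """Return each dimension's share of all mood words found in `tweets`."""
    counts = Counter()
    for tweet in tweets:
        for word in re.findall(r"[a-z]+", tweet.lower()):
            for dimension, terms in MOOD_TERMS.items():
                if word in terms:
                    counts[dimension] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIMENSIONS)  # no mood words that day
    return [counts[d] / total for d in DIMENSIONS]


day = ["So tired and sad today", "Feeling energetic after the vote!", "Angry and tired"]
print(DIMENSIONS)
print(mood_vector(day))  # [0.0, 0.2, 0.2, 0.2, 0.4, 0.0]
```

Running the same function over each day’s tweets would produce one six-component vector per day, which is the kind of series the graphs described here would plot.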
Following the election, the negative moods (Tension, Depression, etc.) fell off. There was “a significant spike in Vigor.” Examination of samples of Twitter traffic showed “a preponderance of tweets expressing high levels of energy and positive sentiments.”
But by December 2008, as the Dow Jones Industrial Average fell below 9,000 points, the charts show a conspicuous rise in Anger, and an even stronger one for Depression. The researchers write that this may have been an early signal of “what appears to be a populist movement in opposition to the new Obama administration.”
“Tweets may be regarded,” write Bollen and colleagues, “as microscopic instantiations of mood.” And they speculate that the microblogging system may do more than reflect shifts of public temper: “The social network of Twitter may highly affect the dynamics of public sentiment…[O]ur results are suggestive of escalating bursts of mood activity, suggesting that sentiment spreads across network ties.”
As good a reason as any to put this archive of the everyday into the time capsule. And while my perspective on this may be a little off-center, I think it is fair that the Twitter record should be stored at the Library of Congress, which also houses the papers of the American presidents up through Theodore Roosevelt.
Almost 20 years ago, I started working there, just around the corner from the bound volumes containing, among other things, the diaries of George Washington. The experience of taking a quick look at them was something like a rite of passage for people working in the manuscript division. And to judge by later conversations among colleagues, the experience was usually slightly bewildering.
You would open the volume and gaze at the very page where his hand had guided the quill. You would start to read, expecting deep thoughts, or historical-seeming ones, at any rate. And this, more or less, is what you found on every page:
“Rained today. Three goats died. Need to buy a new plow.”
He had another 85 characters to spare.