It Seemed Like a Good Idea at the Time

Scott McLemee explores the Library of Congress’s recent announcement that it will no longer be a repository of every public posting to Twitter.

January 5, 2018

The Library of Congress has the reputation of holding a copy of every book ever published, or at least every book published in the United States -- a reputation that is unfounded and that persists in spite of the institution’s efforts to correct it. The collection is huge, a bibliomane's utopia, but it has never claimed to be exhaustive. Indiscriminate accumulation is a sign of hoarding, not of librarianship.

But an exception was made over the past seven years as the LC tried to create a repository of every public posting to Twitter. That experiment is now over. Henceforth, according to a white paper issued in late December, the library will “acquire tweets but will do so on a very selective basis,” in accord with its wider digital-collections policy. A lot goes unsaid in the document, which is perhaps best understood as a sign that the LC is finally getting its bearings again after a long period of erratic leadership.

As noted in this column a few weeks after the project was announced in April 2010, Twitter's initial gift to the library was a complete set of public posts from the social media platform's first four years -- some 21 billion tweets. (Private messages between users were not included.) Going forward, the collection would be supplemented by new batches of tweets that could be made available to library patrons at least six months after they had been tweeted. At that stage about 30 million users had Twitter accounts and produced an average of 50 million new tweets per day. Both figures have increased tenfold since then. And while there is no way to know how many human beings are actually behind the accounts, or how much of the content is computer generated, Twitter itself has grown so ubiquitous as to be a factor in the lives even of people who never use it. We will remember 2017 as the year when a Twitter message leading to war began to seem like a matter of time.

Meanwhile, the archive has been in limbo. Five years ago, an update on the Library of Congress’s blog announced that the work of establishing “a secure, sustainable process for receiving and preserving a daily, ongoing stream of tweets through the present day” was within a month of completion, along with “a structure for organizing the entire archive by date.” I returned to the subject in 2015 with a column about a researcher from Germany who received a fellowship to work with the collection -- only to learn that it still wasn’t available for her to study. The recent white paper is at least candid in being noncommittal about when, if ever, the archive will be open for use: “The Twitter collection will remain embargoed until access issues can be resolved … There is no projected timetable for providing public access at this time.”

When a mission fails, one possibility is to redefine it retroactively. "The library now has a secure collection of tweet text," says the white paper, "documenting the first 12 years (2006-17) of this dynamic communications channel -- its emergence, its applications and its evolution." On those terms, the archive is complete, if also completely useless. When a collection is too huge for search and retrieval, being "secure" just means it's unavailable. Conversely, the decision to curate the Twitter stream "on a very selective basis" -- with an emphasis on "events such as elections, or themes of ongoing national interest, e.g. public policy" -- comes at a time when this will mean duplicating the efforts of other institutions with a vested interest in preserving the record. The president's tweets, for example, fall under the purview of the National Archives.

In retrospect, the decision to acquire the Twitter archive may go down as an example of the problems that beset the final decade (at least) of James Billington’s tenure as librarian of Congress from 1987 to 2015. A report by the Government Accountability Office issued during Billington's final year found significant deficiencies in how the library managed its information technology resources. He appointed the library's first chief information officer only a short time before his own retirement. Acquiring the Twitter archive in 2010 must have seemed like a gesture that would wave off all those complaining that the LC was falling behind. Under good leadership, the LC might have assessed the problems created by the initial Twitter acquisition and gone on to develop the tools and policy needed to create a useful collection. Probably the best thing the institution could do now is to invite scholars to study the records of the whole episode, to see when it went hopelessly wrong and whether it offers any lessons by negative example.
