Essay on the digital humanities' data problem

In 2010, the National Science Foundation and National Endowment for the Arts convened a historic workshop -- it was their first jointly funded project. This meeting marked the beginning of a new level of national conversation about how computer science and other STEM disciplines can work productively with arts and design in research, creation, education, and economic development. A number of projects and follow-up workshops resulted in 2011. I was lucky enough to attend three of these events and, in the midst of all the exciting follow-up conversations, I couldn't help but wonder: What about the digital humanities?

After all, the digital humanities have made it now. A recent visualization from University College London shows more than 100 digital humanities centers spread across the globe. There are dedicated digital humanities funding groups within the National Endowment for the Humanities and Microsoft Research. The University of Minnesota Press published a book of Debates in the Digital Humanities in January.

So why don't the digital humanities have more of a seat at the table? Why does the stereotype persist that, while computer scientists and digital artists have much to discuss, digital humanists want to talk only about data mining with the former and data visualization with the latter? I believe it is because the perception has developed, helped along by many in the field itself, that digital humanities is primarily about data.

Certainly a grasp of data -- the historical record, our cultural heritage -- is a great strength of the humanities. But in the digital world, the storage, mining, and visualization of large amounts of data is just one small corner of the vast space of possibility and consequence opened by new computational processes -- the machines made of software that operate within our phones, laptops, and cloud servers.

A key experience in my journey to understanding this began with a debate about James Meehan's Tale-Spin, the first major story generation system. I had always been basically uninterested in Tale-Spin, though I knew it was considered a landmark on the computer science end of electronic literature. I simply didn't get excited by the stories I had seen reprinted in the many scholarly discussions of the system.

During the debate it became clear that I would have to look a little deeper. When I looked at Tale-Spin's computational processes, what I found was surprising and complex, as evocative and strange as any of Calvino's invisible cities. Tale-Spin operates according to rules constructed as a simulation of human behavior, built according to cognitive science ideas that were current at Yale in the mid-1970s, when it was designed. For example, in this model, when characters interact, they take elaborate psychological actions, projecting multiple possible worlds to see if any course of action might create a world they desire.

In short, I learned that it is Tale-Spin's processes that have the literary value, creating a fictional world that gets its fascinating strangeness from taking a recognizable aspect of human behavior, exaggerating it, and stripping away almost everything else -- answering the question, "What would fiction look like if we accept the model of humanity being proposed by this kind of cognitive science?" More broadly, reading the processes of Tale-Spin also helped me think about the limits of simulations of human behavior, even those informed by the most recent scientific ideas, as well as how ideas and biases can be encoded in software in ways that are invisible to those who only see the output.

Finally, it taught me an important lesson about making media: processes that remain hidden, however fascinating and successful, do little to strengthen the audience's experience. As a result of these realizations, I had to apologize to colleagues for dismissing Tale-Spin -- and my fascination with the project grew until it became a central object of study for my book Expressive Processing.

Over the years since, it has become clear to me that there are many other processes that cry out for attention. All the tools of our software society, from the document-crafting Microsoft Word to the architecture-designing AutoCAD, are enabled and defined by processes. Software processes operate Walmart's procurement system and Homeland Security's terrorist watch list. The interactivity of mobile apps and websites and video games is created through the design of processes. In other words, it is human-designed and human-interpretable computational processes that enable software to shape our daily work, our homes, our economy, our interpersonal communication, and our new forms of art and media. Processes even enable the data mining that drives much digital humanities work (and Amazon's recommendation system).

For these reasons and more, when computer scientists and digital artists get together, most of what they talk about is novel processes. Why invite digital humanists, if they're going to keep dragging the conversation back to data?

Of course, this stereotype is a distortion of the history and present of humanist engagement with the digital world, but it passes for truth far too often. Something needs to be done to fight it. I believe all of us with a stake in the future of the digital humanities -- and perhaps more of us have a stake than realize it at the moment -- should push for a vision of the field that acknowledges that it has never simply been about data. Here are two areas where I think pressure is particularly important.

First, the humanities is not simply defined by the data it has mastered. Whether in literature, philosophy, media studies, or some other discipline, humanists understand the data they study through particular methods. Two decades ago Phil Agre powerfully demonstrated that humanities methods could shed important new light on software processes. In his Computation and Human Experience, he performs close readings of computational systems and situates them within histories of thought. His analysis serves a primary humanities mission of helping us understand the world in which we live, while also helping reveal sources of recurring patterns of difficulty for computer scientists working in AI.

It is an early example of what is now increasingly being called "software studies" -- a tradition in which my work on Tale-Spin participates. In software studies, humanities methods and values engage with the specific workings of computational processes. This sort of approach has the potential to become an exciting point of connection between the humanities and computer science, both pedagogically (as a route to the "computational thinking" that is increasingly being put forward as a key component of 21st-century general education) and as a critical and ethical complement to the models of interpreting processes found in most computer science.

The good news is that work of this sort is already becoming more established, with the MIT Press having recently founded both a book series for software studies and one for its sibling "platform studies" (which focuses on the material conditions that shape and inspire the authoring of computational processes). The promise of software studies is that the digital humanities can be central to one of the most pressing issues of our time: helping us both to understand and to live as informed, ethical people within a world increasingly defined and driven by software.

And we can also go further, helping to create this world. More than a quarter-century ago, Brenda Laurel's dissertation established how deep knowledge of subject matter developed within the humanities -- in Laurel's case, classical drama -- could be used to inform the design of new technologies. Laurel became a leading creator and theorist of digital media by adapting insights and models from a long history of humanities scholarship on the arts. Such work is, if anything, even more vital today -- and is the second area of digital humanities in which I believe we should press forward. With the rise of computer games as a cultural and educational form (along with other emerging media technologies), computer scientists are increasingly being called, both in universities and industry, to develop computational processes that make new forms of media possible.

But computer science has no knowledge or methods appropriate for guiding or evaluating the primary, media-focused aspects of this work. Computer science's next level of dialogue with the digital arts community is certainly encouraging, but there is also an essential role for the humanities to play in both contributing to innovative media technology projects and helping set the agenda. Unfortunately, unlike software studies, this area of digital humanities work does not yet have a name and is often not even identified as humanities, despite its deep grounding in humanities knowledge and methods (the scholars involved generally also have identities as digital artists/designers or computer scientists).

But the importance of addressing this lack is becoming clear. In fact, I am happy to announce that an unprecedented group of partners (including the NSF, NEH, NEA, and Microsoft) have stepped forward to help convene a workshop on this topic that Michael Mateas, Chaim Gingold, and I will host at UC Santa Cruz later this year. Our planned outcomes range from developing a greater understanding of this area of digital humanities to matchmaking a set of projects that are explicitly at the intersection of computer science, digital arts, and digital humanities.

Now for the bad news. Unfortunately, as digital humanities is coming to public consciousness, the vision of the field being put forth in the most high-profile venues entirely leaves out possibilities such as these. In January, Stanley Fish wrote in The New York Times that digital humanities is concerned with "matters of statistical frequency and pattern," and summarized digital humanities methodology as "first you run the numbers, and then you see if they prompt an interpretive hypothesis." Earlier in January, at the Modern Language Association mega-conference, a workshop on Getting Started in Digital Humanities suggested that the field's promise lies in the fact that "Scholars can now computationally analyze entire corpora of texts or preserve and share materials through digital archives."

How will digital humanities ever come to be something more diverse and relevant if both detractors and supporters seem to agree that its sole focus is storing and analyzing data? I believe digital humanists must begin by recognizing and developing important areas of work, already part of the field's history, that such conceptions marginalize. And those in the field must see these areas as important places for digital humanities to grow, even if they lie beyond the narrow confines of the wall digital humanists are inadvertently helping build around themselves.