Academics must collaborate to develop guidelines for ChatGPT (opinion)

Seeing Past the Dazzle of ChatGPT

To help put text generators in the proper perspective, we need to turn toward each other to determine guidelines for the use of such tools, Anna Mills writes.

Some academics are worried and some are excited that AI text generators such as the newly minted ChatGPT and the latest version of GPT-3 can produce ever more passable prose. This is the moment to notice how the isolating, hypnotic process of getting to know this technology can influence our judgments about how it should be used.

As a teacher of writing for some 20 years, I was shocked when I first experimented with GPT-3, a text generator designed by OpenAI and licensed by Microsoft. I had only to suggest a topic and—presto—fluent, even elegant sentences filled the screen. For example, I wrote, “Watching waves, we often feel mesmerized. Brain science is only beginning to glimpse why the movements of water resonate so deeply.” GPT-3 continued thus:

Much of the answer lies in the fact that water is the stuff of life, a substance that fills the spaces between the cells of all living things. At a more basic level, water is the substance of all matter, a compound that, as the poet e. e. cummings once wrote, “is more wonderful than death.”

I recognized some flaws in this text (more about those later), but it wasn’t copied. What stood out were the humanlike clarity and relevance. So what now? This technology is available to any student who cares to make a free trial account.

I started experimenting with GPT-3 in OpenAI’s “playground,” ostensibly to see what it meant for teaching composition in higher ed. But it affected me strangely. In the weeks after my first encounter, I felt a constant undercurrent of adrenaline—as if an attractive person were leveling a desiring gaze on me. I had to cut back from four cups of black tea a day to one; my body and mind were too revved up.

Each time I waited for GPT-3 to produce, I was hoping it would astound me. When it flopped, I sighed and tweaked my prompt. The speed at which the model composed fluent sentences was exhilarating. What a contrast to the masochistic persistence I had practiced for so many years and preached to my struggling students.

But more than that, my assignations with GPT-3 felt intimate. I could ask it for anything, as if it were a priest or a shrink. With each pleasing or useful generation of text, I felt I had been met at a deep level. Even the opacity of GPT-3 made it more alluring. It ingested my prompt and immediately offered a “completion” that, at least some of the time, felt nothing short of oracular.

I might have wow-tweeted about GPT-3 and then ChatGPT’s prowess indefinitely had not a fellow teacher called me out as a promoter of AI hype. This Twitter denizen labels her account “Madame Pratolungo … Where Big Tech + artificial intelligence meet history and critical thinking.” I like to refer to her as Madame Incognito, since she keeps her role in academe separate from her sometimes fiery tweets.

It was she who pushed me to acknowledge GPT-3’s statistical nature. As I gradually came to recognize, GPT-3 was—in the words of sociologist Alex Hanna and computational linguist Emily Bender—less an artificial “intelligence” than a “mathy math,” a model of language sequences built by “scraping” the internet and then, with massive computing, “training” the model to predict the sequence of words most likely to follow a user’s prompt. Such large language models (LLMs) can mimic language patterns, but they cannot understand them—much less intentionally craft sentences in order to share an idea or a feeling. That kind of “AI,” at least for now, exists only in fiction.

I thought Madame Incognito was perhaps overstating the degree to which I had fallen prey to “the Eliza effect”—the tendency, first observed in the 1960s, for humans to attribute sentience to any entity that appears to “speak” sensible language. But I couldn’t deny there was some projection involved in my quasi-erotic feelings for GPT-3.

Madame Incognito pointed me to critical perspectives from Sasha Costanza-Chock, Kate Crawford, Gary Marcus, Safiya U. Noble and others. I learned that LLMs are biased because they mimic the stereotypes and preferences embedded in data scraped from internet sites hardly renowned for their wisdom or objectivity. Training large models also carries a significant carbon footprint, and, as Meredith Whittaker observes, “AI” today reflects the concentrated power of the few companies that have access to troves of data and vast computational resources.

In my own experiments, I saw that GPT-3 often contradicted itself and made logical errors—as does ChatGPT. Since it had no ability to check the accuracy of the word sequences it generated, it made up sources, facts and figures. Take, for example, that quote mentioned above from E. E. Cummings, the famous poet. It turns out that Cummings never called water “more wonderful than death.” In a society already struggling with fake news, a plausible fabrication like that, I realized, is more deceptive and dangerous than dumb doozies like “water is the substance of all matter.”

Collaborating for Clarity

Given all these considerations, what should I, as someone newly in the know about language models, recommend to my students, other teachers and my college? Should I ask students to prompt a language model and then critique its output? Learning to spot the shortcomings of automated outputs might boost students’ confidence and help them clarify their ideas. Or would it just distract them? As Madame Incognito points out, assigning students to read text-generator prose has opportunity costs: it means they read less prose by humans.

If I have a crush on text generators, I also have an enduring love for the organic writing process. I have long preached that writing struggles have purpose and meaning. In my textbook How Arguments Work: A Guide to Writing and Analyzing Texts in College, I frame academic reading and writing as a mode of slow thinking, an antidote to snap judgments. If LLMs help to remove some of the friction from our writing, will we get the same quality of thought? And whose thoughts will be blending with our own?

When I considered improvising with GPT-3 in the classroom, I felt a thrill at the thought of showcasing such a powerful tool. But I also had an intuition that I shouldn’t, at least not until I was clearer about why I was doing it. At first, I had been confident that through my tête-à-tête with GPT-3, I could come up with a solid stance worth sharing. Now I was coming to realize that the lone-cowgirl approach wouldn’t cut it.

On social media, in magazines and in newsletters, I could see that the idea of teaching with and about language models was as charged for many others as it was for me. Was the idea of a new quest or intellectual puzzle simply irresistible? Surely a desire to be personally associated with something cutting-edge played a role as well. Maybe all these individual takes were not the only or the best way forward.

I have come to think we need to turn toward each other and toward our existing networks to figure out guidelines for text-generator use. We need to acknowledge our emotional reactions and help each other to get distance from them. Given the complexity of the ethical, legal and social issues that LLMs touch upon, we need collaborative processes to seek clarity.

In The New Laws of Robotics, legal scholar Frank Pasquale argues for guidance from professional organizations about whether and how to use data-driven statistical models in domains such as education or health care. His vision is that democratic oversight of such technologies should grow out of research and deep discussion among experts. What we don’t want is a future driven by technophile crushes or the marketing hype emanating from behemoth corporations.

Despite my continuing attraction to GPT-3 and its cousin ChatGPT, neither is a partner to the shaping of this essay. Rather, it is Madame Incognito and others who have helped me along. I am tempted to ask ChatGPT for an ending—I imagine it could come up with a zinger. But its performance isn’t the point. The point is what we do with it. What ends will it serve?