
I wouldn’t declare what I’m about to say as an iron law of the ChatGPT era, something that is guaranteed to be true in all cases, but I’m starting to feel pretty confident in it.

Here goes: if a large language model (like ChatGPT or its brethren) can generate a product similar to or better than humans on the same writing task, that writing task is not worth doing.

And to be clear, I’m not talking about the writing task not being worth doing by the humans and simply outsourcing it to the AI. I’m saying it isn’t worth doing at all.

The examples validating my claim have been mounting since shortly after the first appearance of ChatGPT. The standard high school and college English essays are obsolete—and not a moment too soon from my perspective, as those exercises are examples of what I call “academic cosplay,” producing artifacts that give off an illusion of achievement without being traceable to any substantive learning.

Writing recently at The Chronicle of Higher Education, English professor Michael McClune agrees, declaring that for the vast majority of his 18-year career what he’s been asking his students to produce resembles “bullshit … essays that meet all of the official criteria for student writing: They have a thesis; they are polished, coherent and well-argued; they support their points with evidence. They also lack any trace of surprise or originality, make no new connections and are devoid of any striking use of language or evidence of individual human sensibility.”

College application essays, recommendation letters, lesson plans for teachers and more have all been shown to be replicable by ChatGPT and other large language models primarily because they are pro forma exercises absent any original thought or feeling, putting them squarely in the wheelhouse of the AI.

Two fresh examples came across my radar this week. The first is more than capably handled by Dave Karpf at his (highly recommended) newsletter, writing about a working paper on an experiment run by Ethan Mollick (and co-authors) using GPT-4 to enhance the “productivity” of business consultants.

Mollick, a professor of entrepreneurship at Wharton, has become something like the pied piper of generative AI, showing across a number of well-constructed, valid experiments how LLMs (like GPT-4) seem to enhance the work of consultants. The most recent paper that Karpf breaks down involves a series of activities that constitute core consultant work. This is me quoting Karpf, quoting Mollick’s working paper.

“… creative tasks (‘Propose at least 10 ideas for a new shoe targeting an underserved market or sport.’), analytical tasks (‘Segment the footwear industry market based on users.’), writing and marketing tasks (‘Draft a press release marketing copy for your product.’) and persuasiveness tasks (‘Pen an inspirational memo to employees detailing why your product would outshine competitors.’).”

Mollick’s paper finds that GPT-4 enhances both productivity and quality on these tasks, suggesting, in Karpf’s words characterizing Mollick’s conclusions, “(1) the business opportunities are phenomenal and (2) the people who get rich will be the first-movers who really develop their skills in this grand new landscape.”

But Karpf has a different interpretation, one that I share: “an alternate reading would be something like ‘hey! I hear you think A.I. is a bullshit generator. Well, we gave a whole profession of bullshit generators access to A.I. and you’ll never believe how much more productive they became at generating bullshit! This is such a big deal for the Future of Work!’”

During my forays into market research, we researchers were sometimes tasked with designing studies that could sift through this B.S. to determine what, if any, of the B.S. generated by the methods and thinking Mollick is replicating had any utility in actual marketplaces. Often the answer to the question of how much utility was “not much.”

Regardless, it is impossible to say if the output Mollick’s subjects are producing has any value unless and until it is put to an actual test in the market, so it is impossible to say whether generating more of the stuff is necessarily a good thing.

As Karpf says, the technology is mostly a way for consultants like McKinsey to produce bullshit at scale. Applying my law of GPT production, we can pretty much trust that collectively we’d be better off if there were much less of this stuff in aggregate, not more.

The other example was provided by another carefully designed study that looked at whether or not human raters could tell the difference between feedback on student writing generated by people versus feedback generated by ChatGPT.

Students in grades six through 12 produced the writing during two 50-minute in-class periods, responding to one of two prompts[1] and using source materials provided to them. Human raters were tasked with using rubrics designed to provide actionable, encouraging and formative feedback.

Long story short, as measured against the criteria used to evaluate the feedback, there was very little difference between that generated by humans and that generated by ChatGPT.

The rub is that the feedback is useless if the goal is to help students actually improve as writers and thinkers. I say this as someone who has evaluated thousands of student essays and puzzled over the problem of helping students develop as writers, to the point of writing two books (with a third on the way) about this challenge.

It is an example of the kind of mutual academic cosplay that passes between students and teachers when the work is rooted in inauthentic, uninspiring, unchallenging writing tasks meant to satisfy a culture of standardization and assessment. The student produces something by rote formula and the teacher responds (via rubric) with a formula of their own.

This is not learning.

We can be thankful that the study was conducted so rigorously and so carefully because it clearly demonstrates that if learning, as opposed to academic cosplay, is the goal, none of this stuff is worth doing.

Given the focus of my work over the last decade or so, I didn’t need convincing that this was the case. In my teaching career, I produced volumes of pro forma feedback that followed all of the recommended criteria, dutifully responding to student writing with words that were part of a performance but had little relationship to meaningful educational experiences. I know I’m not alone in feeling this way. Michael McClune’s—I don’t know what to call it … “confession”?—suggests he’s harbored the same doubts for over a decade.

I’m trying to wrap my head around being excited about automating work that is actively not worth doing by anyone or anything.

I’m a bit of a broken record on this front, but ChatGPT is an opportunity to use the technology as a lens to see more clearly what work is worth doing and what work should be set aside. If ChatGPT can do it, this is a very strong indication that we should do something differently.

Instead, it’s likely we’ll go the other direction, primarily because there are dollars to be made and efficiencies to be gained. Karpf lays out a chilling blueprint for how the AI tools will be wielded by start-up culture to “colonize” various spaces, hoovering up money in the process.

Automated feedback on student writing has been a kind of holy grail for ed-tech companies for a long time, significantly predating the arrival of the large language models that make such things achievable. This tech will be sold as tools to liberate teachers from the drudge work, but as I’ve argued multiple times here before, the embrace of automated feedback to student writing will enshrine a whole host of literally inhuman practices, allowing the cosplay to stand in for the real.

It should be rejected outright on ethical and moral grounds, but this new study also shows its even worse effect—of enshrining the avoidance of original or interesting thought as the central goal of writing in order to please the rubric.

I almost can’t imagine a worse future, and it may be just around the corner unless we wise up pretty quickly.

[1] “How did the Delano grape strike and boycott succeed?” or “How did the Montgomery bus boycott succeed?”
