How to change writing assessment in a GPT world

I think I have a new mantra for how faculty should think about approaching student writing assignments and assessment in this new ChatGPT era.

It’s a bit of a throwback idea, borrowed from MTV’s seminal reality show, The Real World, the tagline used at the end of the opening title and credits: “It’s time to find out what happens when people stop being polite and start getting real.”

This thought was triggered by a recent piece published at Matthew Yglesias’s Slow Boring newsletter written by the newsletter’s intern and current Harvard student Maya Bodnick.

As an experiment, Bodnick fed versions of class assignment prompts from first-year courses into GPT-4 and then had the actual graders for the courses assign scores. To prevent bias, the graders were told the writing could be human or AI, but in reality, everything was written by the AI.

The bot did pretty well, grade-wise:

Microeconomics: A-minus
Macroeconomics: A
Latin American Politics: B-minus
The American Presidency: A
Conflict Resolution: A
Intermediate Spanish: B
Expository Writing: C
Proust Seminar: Pass

The primary initial response to the piece—including my own—was to zero in on the rather uninspiring nature of the assignments themselves, for example, this one from the course in Latin American Politics: “What has caused the many presidential crises in Latin America in recent decades (5-7 pages)?”

While I share the concern of many who look at the prompts and wonder what is going on, it’s important to remember that these assignments are decontextualized from the larger framework of the individual courses. We only know what was shared in the piece, which is not much.

For example, I have some familiarity with the Harvard College Writing Program, which is responsible for the Expos courses, and know that an assignment telling students to write a four- to five-page close reading of Middlemarch without additional context or purpose is not consistent with the ethos that underpins the program.

So, OK. It’s fun to take some shots at Harvard when it seems like they’re not all that, and I reserve the right to do so in perpetuity, but the information made available provides a more interesting opportunity to mine insights on how to operate in a GPT world by looking more closely at these GPT-produced artifacts and the instructor responses.

First, we should recognize a couple of truths: 1. There is no reliable detection of text produced by a large language model. Policing this stuff through technology is a fool’s errand. And 2. While there is much that should be done in terms of assignment design to mitigate the potential misuse of LLMs, it is impossible to GPT-proof an assignment.

This means that the primary focus—as I’ve been saying since I first saw an earlier version of GPT at work—needs to be on how we assess and respond to student writing.

The fact that it’s impossible to GPT-proof an assignment was driven home to me specifically by one of the sample assignments that’s rather close to one I use in my text The Writer’s Practice. In the course on conflict resolution, students are asked to “Describe a conflict in your life and give recommendations for how to negotiate it (7-9 pages).”

In a meta twist, GPT wrote a paper from the POV of a student who feels that his roommate’s use of generative AI to do assignments is cheating. It earned an A from the instructor, along with some very strong praise.

To my ear, the paper is written in a kind of cloying bullshitter tone of a diligent student acting diligently and trying to impress, e.g., “Neil, you see, is an incredible student, brilliant and diligent, with a natural talent for solving complex equations and decoding the mysteries of quantum physics. We’ve been sharing not only our room but also our academic journeys since we were freshmen, supporting each other through all-nighters, exam anxieties, and the odd existential crisis. Yet, in our senior year, I’ve found my faith in him—and in our friendship—shaken.”

I would not call this good writing in any context outside of a school assignment. It’s weird, a put-on to impress a teacher, not a genuine attempt at communication. This is a student saying, “Look how smart I am,” which is not a particularly difficult thing for GPT (or most students) to do.

In order to move away from this kind of performance, it’s time to stop being polite and start getting real.

The most important thing I do in my version of the conflict resolution experience is to change the assignment into three different pieces of writing, completed in sequence.

The first is literally a rant letter, addressed to the person with whom the student is in conflict where I tell students to let them have it, no holds barred. For the student, this exercise serves as a kind of catharsis as they unburden their pent-up anger and resentment at the target (on the page, at least).

Next, I have students exchange rants in a workshop where they are given a process for reading their colleague’s rant and then imagining how the intended recipient of the rant would receive it. The answer in just about every case is: not well.

Here we talk about approaches to conflict resolution, rhetorical sensitivity and how they might analyze the dispute in a way that would craft a win-win solution, rather than engaging in a series of escalations.

After that, they write a second letter to the person they’re in conflict with, this time trying to express understanding of the other’s perspective and then moving the conversation to a territory where that solution might be forged.

But wait, there’s more! The final piece of writing is a short reflective piece where the students analyze their own rhetorical choices, comparing and contrasting the two letters, and then spend time thinking about their own emotional states as they worked on the different pieces. Many realize that while being angry provides a brief and exciting emotional charge, they feel tangibly better when working through the piece on conflict resolution.

Rather than demonstrating content knowledge in the context of a real situation by writing to a teacher (polite), I make students directly address the situation (real). No doubt, my approach is less “academic,” but it requires the application of the same concepts, arguably in a more sophisticated and challenging way.

Another example from the experiment where the “stop being polite and start getting real” framework would add value is the GPT answer to the question about Harry Truman’s presidency.

The style of the response is a true masterclass of pseudoacademic B.S., the elevated tone designed to signal to a teacher that the student is smart, but it also reads like a performance of “studentness” rather than a genuine style coming from a unique intelligence. This is the paper’s opening:

“The American presidency is an emblem of political power and leadership that has been shepherded by a medley of personalities, each carrying distinct ideologies and governing styles. Among the pantheon of American presidents, Harry S. Truman’s tenure stands out as a compelling period of profound successes and notable failures. Truman’s presidential period was framed by a post-war world, a landscape dotted with challenges and opportunities alike. His presidency was marked by pivotal decisions, policy shifts, and ground-breaking initiatives that have continued to echo in the corridors of history. However, alongside his triumphs, his tenure was also characterized by several disappointments and missteps.”

While the prose is fluid and even attempts a kind of style, e.g., “shepherded by a medley of personalities,” once you get past that surface-level fluency, it literally says nothing more than, “Harry Truman did some good things and some bad things.”

This kind of performance has traditionally been highly valued in academic contexts. This looks like diligence and skill but really is exactly that, a performance. My students would eagerly tell me all the different ways they performed for teachers on their writing assignments, making sure to give them the things they were looking for, often surface-level things, like basic transitions, that essentially sent a message: I’m a good student who is paying attention.

This was me. I was a sucker for making sure students used claim verbs when summarizing sources. If you had a claim verb, you got at least a B. If the claim was at all accurate … A.

This bar is far too low, not just because GPT can clear it, but because it fails to give students something substantive to chew on.

This work is all very polite, but it wouldn’t take much to make it real. Simply require the student to develop and express their own opinion on the topic at hand. Ideally, the question is more specific than whether Truman was a good or bad president. Find a prompt or frame that asks students to reflect on the past in the context of what they know and believe about the world.

When it comes right down to it, isn’t this the actual work of scholars?

The last example where I think the “stop being polite and start getting real” framework helps us rethink assessment is in non-A grades—B on the Intermediate Spanish, B-minus on the Latin American Politics, and the C on the Expository Writing.

Again, we don’t have the context to fully evaluate the meaning of the specific grades, but the comments shared by Bodnick suggest that the evaluators found fundamental shortcomings in the writing.

The Spanish professor said the paper had “no analysis.” The Latin American Politics professor said, essentially, that the thesis is wrong and unsupported. The Expository Writing instructor likewise said the effort lacks analysis.

The comments are on target, but a traditional A-through-F grading system allows the pro forma output of GPT to pass. Here’s where we can get real by changing how we view grades.

Rather than waving this performance through, simply require revision until it reaches the specific threshold for passing. This criterion may change from assignment to assignment, but in the above cases, if the goal is for the student to produce analysis, do not accept the assignment for credit until it meets that threshold.

This is where alternative grading strategies work well, because I don’t tell students they’ve “failed.” I tell them they’re not done. If they’ve used GPT to do the work for them, maybe they’re convinced to try doing it themselves next time around and save the hassle.

Or if they’re going to keep using GPT, at the very least they need to be more thoughtful and purposeful about how they’re employing the tool. Maybe they learn some of the principles around critical thinking I’m trying to drive home in the process.

The solutions that Bodnick offers are rooted in a very narrow notion of what school is about and illustrate how deeply the idea of performing for a grade, rather than demonstrating learning, is embedded in the existing system. Trying to make it so GPT can’t be used while maintaining the status quo of what we ask students to do is a failure to take advantage of an opportunity to rethink approaches that already don’t work.

In-person essays or proctored exams are absolutely biased toward proficient performers (and even bullshitters), as the standards for content and analysis are lowered because of the pressures of time. This was the chief reason I gravitated toward classes with these assessments in college.

Why go backward when GPT is giving us a lens to think about new and better ways to engage and teach students?

Let’s be real.
