The leap in capability between ChatGPT (GPT-3.5) and the recently released GPT-4 model from OpenAI is pretty astounding, particularly considering it has only been a handful of months since ChatGPT came into the wider world.
For example, GPT-4 scores near the 90th percentile on the LSAT and the bar exam. In fact, it knocks out strong scores on just about any standardized test. OpenAI has made its application programming interface available to users, kicking off a race between OpenAI and competitors like Google, Meta and Anthropic (a lesser-known company started by former OpenAI folks) to capture the market for the integration of large language model (LLM) systems into existing applications like Duolingo and Khan Academy and basically everything else.
We are going to see chatbot technology everywhere very quickly, including in education.
I would like to propose a framework for thinking about the inevitable deluge of LLM-enhanced education technology as a way to prevent being swept away by the fervor attached to these developments.
Beware the Education-Technology Hype Cycle
Remember MOOCs? They were going to revolutionize education. Did they? They did not. Did some institutions dump millions of dollars into a fantasy? Yes. Were any lessons learned from this, just the latest example of technology hype failing to live up to reality?
Time will tell.
Developers will be coming with their products, making all manner of promises about increased engagement, enhanced learning and superior outcomes. It will be sorely tempting to believe that failing to get on the hype train means falling behind, but it is important to remember that none of these promises has yet been proven true. None of them have been tested, and it's likely that none of these products is a silver bullet that solves any existing problem in education.
Resist Believing in the Existence of a ‘Teaching Machine’
The first part of this recommendation is to read Audrey Watters’s Teaching Machines: The History of Personalized Learning, which covers the long arc of people believing that machines can—and even should—replace the human labor of teaching.
Watters shows that the various attempts at creating a teaching machine fail not necessarily because the technology is not yet ripe, but because the very mind-set that leads one to believe in the possibility of a teaching machine is at odds with the human experience of life and learning.
To Define Learning Down to What a Machine Can Do Is to Miss Crucial Elements of the Experience
The amazing feats of GPT-4 make it tempting to believe that this time the technology is ripe for the creation of a teaching machine, but this is the kind of hubris that has produced the string of failures scattered across history.
Sure, maybe this is the time, but there’s no urgency to throw one’s institution or students into the maw in order to test that proposition. Let the technology prove itself first.
Connect With Your Root Values
Passing the legal bar exam is an impressive feat for a syntax assembly machine, but rather than saying, “OMG, AI can be a lawyer now!” we should instead be questioning whether or not the things we ask our human students to do are genuinely meaningful experiences related to learning.
The bar exam is a good example of a barrier to entry into a guild that doesn't have all that much to do with the actual practice of law. Most lawyers will tell you they remember very little from the exam itself.
This should be an occasion to examine why we ask students to do what we ask them to do.
This has been the focus of my thinking when it comes to teaching writing since long before ChatGPT revealed to the world that the five-paragraph essay format is disconnected from the kinds of experiences that help students learn to write.
I wouldn’t say this is an ironclad rule, but in general, if GPT-4 can do something, we should be looking hard at whether or not it’s something our human students should continue to do.
If it is something students should continue to do, we should be considering what aspects of the experience we should be valuing in terms of assessment of and response to student work.
More about that in my next point.
Do Not Mistake What Could Be Done by the Technology for What Should Be Done by the Technology
One of the areas where I expect LLM-enabled tech to show up pretty quickly is in grading student writing. Before LLMs, we had semireliable machine language "scorers" of student writing that could even offer some measure of feedback, as long as students were responding within closed-universe prompts and formats.
LLMs massively expand the size of that universe and, even without task-specific training, can offer plausible (if not always correct) feedback on a piece of writing.
For an example, you can scroll toward the bottom of a post from my newsletter in which I prompted ChatGPT to produce a steadily improving beginning to a short story and then asked it which of two versions was superior and why.
I would not have believed such a thing possible prior to ChatGPT, but there it is. It is going to be sorely tempting to allow this kind of feedback to substitute for human response to student writing, and I'm even willing to grant there are some occasions where AI responses may be genuinely helpful to student learning.
But I’ll also repeat a couple of things I say in Why They Can’t Write. One is that we should not ask students to produce writing that is not going to be read, and LLMs cannot “read.” They cannot think. They do not have human, emotional reactions to text.
Two, because writing is an embodied process and writing is thinking, the best feedback on student writing is not summative (which LLMs will do passably) but formative, where the instructor can help the student reconsider and reflect upon some part of their process. Asking a writing teacher to do this when they have not read the student work is like asking a coach to work with a team when they know the score but have not watched the game itself.
It does not make sense in a world where learning, rather than cost or efficiency, is the root value.
In a lot of ways, I think this technology will be a test of how much we value our own humanity. The companies developing these technologies appear willing to take big risks with safety in releasing them, all in the name of winning the arms race and turning a profit.
In a recent episode of The Ezra Klein Show podcast, Klein and his guest, tech journalist Kelsey Piper, suggest that the direction of development for this artificial intelligence will follow the money (a good bet) and then speculate on a number of different ways that the AI could escape its controls and cause significant havoc.
It’s not Skynet sending a Terminator, but it’s not not that, either.
It’s a dark thought, and it’s not something most of us have any control over, so my hope is that the people with the power to regulate this technology are working quickly to put some governors around its use. Given that some observers see this AI as the most profound threat to humanity to emerge since the advent of the atomic bomb, this seems pretty urgent.
Where we do have control, we have to be as mindful as possible before leaping into the unknown.
Perhaps this means embracing something like a Hippocratic oath for the adoption of this technology: first do no harm.
This is actually not that hard to achieve, but it will mean resisting the tide that insists the future is inevitably heading in a particular direction. Last I checked, we’re the humans and still in charge and absolutely have the power to unplug the machines.