I often say that I’d teach for free but need to be paid to grade.
Grading is hard work. It’s not only time-consuming but also stressful, exhausting and at times even agonizing. All too often, grading triggers conflicts with students.
What if there were a stress-free way to grade? Could artificial intelligence take the pain out of grading?
If you ask a text generator to grade an essay, you’ll be surprised by the quality of the feedback. I am afraid that the comments will be better than those that students typically receive. The feedback will be more specific, detailed, comprehensive and constructive than what instructors customarily provide.
The reason is simple: robo-graders rely on explicit rubrics that spell out the standards of evaluation.
Too often, time-pressed, cross-pressured, overwhelmed faculty members and teaching assistants grade essays through instinct.
A grading rubric divides an assignment into a series of component parts and identifies detailed criteria for evaluating student performance along each dimension. Rubrics are designed to make the grading process more transparent and objective by communicating to students what’s expected of them. They also align course assignments with an instructor’s learning objectives.
As assessment specialist Barbara E. Walvoord explains,
“Rubrics are also especially helpful for three groups of students: first-generation college students, students who didn’t come from elite high schools and students who aren’t majoring in your field. In other words, the majority of your classroom can benefit from a clear, comprehensible statement of what makes for an A, C or F paper as far as sources, arguments, mechanics and the like.”
Rubrics clarify an instructor’s standards and expectations. They help ensure consistency in assessing student work. They show students why they received a particular grade. Most important of all, rubrics shift attention to an assignment’s most important learning objectives.
If you and I are to do a better job of grading than ChatGPT or other auto-graders and if we are to provide at scale the constructive feedback that students need, we have a lot to learn from the AI-powered robo-graders.
The most important lesson: develop a rubric appropriate for a particular assignment and then use that rubric religiously.
At their most basic level, automated essay-scoring systems focus on mechanics: grammar, word choice, word usage, lexical and syntactic sophistication, clarity, style, and voice. But these tools can also grade at a higher level, evaluating relevance to a prompt, text cohesion and coherence, use of evidence and examples, organization and structure, and development of ideas.
A growing body of research suggests that automated grading systems demonstrate greater consistency, reliability and validity than human graders.
If you and I are to grade better than an auto-grader, we need to think seriously about what our writing assignments are designed to accomplish. We must also ask an epistemological question: How do we know whether students have actually met our expectations?
I am a historian, and in evaluating a history essay I look for certain things that automated essay scoring isn’t yet very good at.
- Whether an essay has a succinct thesis statement that is original, imaginative and sophisticated and that builds on existing scholarship.
- Whether an argument is developed with nuance and a respect for historical complexity.
- Whether the essay supports its argument with evidence.
- Whether the essay situates its arguments in an appropriate context.
- Whether the essay considers counterarguments and alternative perspectives.
- Whether the essay is engagingly written.
Everyone who grades or evaluates student work should create a rubric, preferably in collaboration with their students. By making your evaluation criteria explicit, a rubric clarifies your expectations and makes grading more consistent, fair and objective.
A well-designed rubric can also highlight a student’s areas of strength and areas needing improvement. It can help you provide specific, actionable, constructive feedback by showing students why they received the grade they did. In cases where students question or dispute their grade, rubrics can provide a concrete foundation for this discussion, focusing on specific aspects of their work rather than on general impressions of an essay’s quality.
Here’s a sample rubric that I use to evaluate a history essay.
- Title: Is it engaging, relevant and reflective of the essay’s content?
- Introduction: How effectively does the essay introduce the topic and engage the reader? Does it provide a clear thesis statement?
- Thesis and argument: Does the essay take a definitive stance? Are the essay’s thesis statement and argument clear, original, creative and insightful?
- Organization: Is the argument developed in a logical, coherent, well-reasoned manner and are transitions between paragraphs and sentences smooth?
- Accuracy: Is the information presented factually correct?
- Research depth: Is the research base sufficient in depth and breadth to support the essay’s argument?
- Use of evidence: Is the essay’s evidence accurate, credible, relevant and sufficient in quantity? Does the author evaluate the evidence’s validity, reliability and biases? Is the evidence properly cited?
- Integration of sources: How smoothly are the sources integrated into the text? Does the evidence support the argument? Are the sources properly explained, evaluated and interpreted?
- Depth of analysis: Does the essay demonstrate a sophisticated and nuanced understanding of the topic’s complexity? Does it provide original insights or an innovative interpretation and consider the material from multiple perspectives?
- Contextualization: How well does the essay locate its argument within the broader historical, literary, theoretical or social context?
- Conclusion: How effectively does the conclusion summarize the essay’s main points and provide closure? Does it draw broader conclusions, articulate the essay’s relevance or significance, or suggest avenues for further investigation?
- Persuasiveness: How convincingly and compellingly does the essay support its thesis?
- Critical thinking: Does the essay go beyond surface-level analysis and apply higher-level thinking skills to the material? Does it, for example, draw comparisons and contrasts, render sophisticated judgments, critique or defend various arguments, effectively synthesize information, construct compelling generalizations, and apply existing knowledge, methods and theories in a new context?
- Sophistication: Does the essay reflect an understanding of multiple and alternative viewpoints and counterarguments?
- Significance: Do the essay’s arguments and conclusions adequately recognize the topic’s implications, importance, relevance and consequences?
- Reflectivity: Does the essay reflect on the limitations of the argument or approach, including potential biases, gaps in evidence or areas for further research?
- Writing quality: Are the essay’s grammar, word choice and usage, mechanics, lexical and syntactic sophistication, clarity, style, and voice correct and appropriate?
Too many categories? Perhaps. But these do convey the dimensions that most concern me.
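For readers who want to see what using a rubric “religiously” might look like in practice, here is a minimal sketch in Python. It is an illustration, not a description of how any actual auto-grader works. The 0-to-4 scale, the equal weighting of criteria, and the names `Criterion`, `RUBRIC` and `grade` are all my assumptions; the rubric above assigns no point values. Only four of the categories are included to keep the example short.

```python
from dataclasses import dataclass

# Assumption: a 0-4 scale per criterion. The rubric itself specifies no points.
MAX_SCORE = 4


@dataclass(frozen=True)
class Criterion:
    name: str
    question: str


# Four criteria taken from the rubric above (hypothetical subset).
RUBRIC = [
    Criterion("Thesis and argument", "Does the essay take a definitive stance?"),
    Criterion("Accuracy", "Is the information presented factually correct?"),
    Criterion("Persuasiveness",
              "How convincingly and compellingly does the essay support its thesis?"),
    Criterion("Writing quality",
              "Are grammar, mechanics, clarity, style, and voice correct and appropriate?"),
]


def grade(scores: dict[str, int]) -> dict:
    """Score an essay against every criterion in RUBRIC.

    Raises ValueError if a criterion is skipped or a score is off the scale,
    so no dimension can be quietly ignored.
    """
    missing = [c.name for c in RUBRIC if c.name not in scores]
    if missing:
        raise ValueError("No score given for: " + ", ".join(missing))
    for c in RUBRIC:
        if not 0 <= scores[c.name] <= MAX_SCORE:
            raise ValueError(f"Score for {c.name} must be between 0 and {MAX_SCORE}")

    total = sum(scores[c.name] for c in RUBRIC)
    return {
        "total": total,
        "percent": round(100 * total / (MAX_SCORE * len(RUBRIC)), 1),
        # Criteria earning full marks are reported as strengths.
        "strengths": [c.name for c in RUBRIC if scores[c.name] == MAX_SCORE],
        # Criteria at or below the midpoint are flagged for feedback.
        "needs_work": [c.name for c in RUBRIC if scores[c.name] <= MAX_SCORE // 2],
    }


if __name__ == "__main__":
    report = grade({
        "Thesis and argument": 4,
        "Accuracy": 3,
        "Persuasiveness": 2,
        "Writing quality": 3,
    })
    print(report)
```

The point of the sketch is the discipline it enforces: every criterion must receive a score, and the feedback names the specific dimensions where an essay excelled or fell short rather than offering a general impression.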
There are those who disparage rubrics, claiming that they are too rigid, forcing students to conform to a Procrustean bed of inflexible expectations. Critics also argue that rubrics don’t do a good job of measuring improvement or hard-to-assess skills like grit, collaboration, curiosity and initiative.
But I think that view is misguided. Rubrics help make grading less arbitrary. They also make it easier to provide “meaningful and timely feedback” and promote “self-regulated and independent learning.”
Rubrics “share the rules of the game,” allowing our students to peek behind the curtain and better understand what we look for in an assignment.
Isn’t it ironic that software engineers and artificial intelligence and machine learning specialists have taken the lead in designing sophisticated tools for evaluating student writing? Their instruments can help students diagnose writing problems and then make their written arguments with greater precision, clarity, logic and depth.
Instead of treating text generators as adversaries, I think we’d do better to treat them as allies that can help provide the kinds of constructive feedback that students need—and, far too often, don’t get.
No, instructors can’t yet hand off essay grading to machines. But our role as graders and classroom teachers should change. We need to provide the kind of comments that AI can’t, whether that means helping students write with style, grace and elegance or make arguments that are more sophisticated, complex and compelling.