Ethics of Grading III: Revisiting the Question of Who (What?) Does Grading
The question of whether computers should grade work is a question of professional ethics. Those who teach should be clear about what makes them professionals: their ability to judge.
In the past several days we’ve heard about automated essay-grading software from edX, my example company in my recent post on MOOCs, and about the still-amorphous Minerva, a concept (I can’t bring myself to call it a start-up yet) for creating an elite online college.
I will reiterate my previous comment that I like the idea of MOOCs. And about 12 years ago, when I was a doctoral student, I actually wrote something arguing that there was a market opportunity to start new elite higher-education institutions, so clearly I like that idea as well. But it’s all about the business model, and the question of for whom value is being created: it bears repeating that business models are amazingly ethics-laden. Grading of written work by computer does not serve students, particularly in the absence of professors with whom to discuss the reasons for a particular grade; what was done well; and where to improve, not only in terms of content but also structure, concept, argument, even creativity. Which means faculty need to read most kinds of written work.
The more complex the assignment, which is to some extent linked to the complexity of the subject matter/field and level of the course, the more important the expertise of the grader. I continue to believe that grading takes experience, and is one of the most nuanced things faculty do; it is the kind of integrative task that cannot readily be replicated by a machine -- or delegated away.
And here is why it is important to distinguish among assessment goals rather than assessment types per se: contrary to some claims, it is possible to assess first-order critical thinking skills with machine-scored multiple-choice, true-false, matching, and other item types. Such skills include, to name a few, the location and use of evidence; the ability to relate an example or instance to a concept; analogical reasoning; alignment and internal consistency (e.g., between question and response, or among the elements of an argument); specificity (differentiation) and relevance; categorization and hierarchical ordering; and even rudimentary forms of lateral thinking. Doing so requires a case-based or other foundational reading/problem approach, and the application of sound test-construction practices. But it works, and I do it (including the machine-scoring part) all the time for testing basic application of skills and concepts. This is a far cry from testing declarative -- what I refer to as spit-back -- or even simple procedural knowledge.
What this type of testing cannot properly assess is the higher-order thinking sought in original (student-constructed), creative, integrative application of learning; quality of research/evidence; structure and support of arguments; the many aspects of appropriateness to audience; the value, quality, and suitable use of graphics such as models, graphs, tables, and other items that may contain important elements of analysis or argument. For example, I teach innovation, and I don’t see how it could evaluate a business opportunity -- not even something as limited as the financial aspects.
Nor, more important, do I see how a computer could coach and question students throughout the innovation process leading up to their final written submission. I teach leadership, and it is difficult to see how a computer would evaluate an individual’s grounded model of leadership. I teach ethics, and I am not sure how a computer would evaluate the way a student represents his or her meaning-making around the relationship among values, beliefs, ethics, morality, integrity, and the law. There is just too much context involved -- and in all these examples, there is not necessarily a right, or a final, answer. The same applies to any social science field, the humanities, and any other profession or science (physics comes to mind) dealing in unstructured problems and problems that cross disciplinary boundaries (and what does not, except maybe math?).
And herein is why machine grading would not work for elite education. Very capable students already do well on first-order critical thinking; it is a common joke in elite higher education that the top students are going to do well on the traditional academic stuff no matter what -- indeed, that’s partly why everybody wants them. But such students are not satisfied with the level of work that can be examined and graded by a computer. They want the more complex, nuanced, individual (or small-group), creative work. And while they can do a great deal in interaction with each other, they need, and want, the guidance of experts with depth and breadth in the field at hand. They want and need feedback because they don’t yet have experience in solving those kinds of problems. Neither are they satisfied just to get their A -- for many top students, A’s are easy, and the A in and of itself does nothing to motivate them; at best it presents a false sense of complete mastery, since you can get an A and still need to advance to the next level of thinking. So for elite students, the teacher is mentor, coach, prodder, and supervisor, providing guidance through feedback. Every student at every level deserves this, but in a hierarchy of needs, top students who have already moved well beyond first-order thinking skills really need it. The best students will outright demand it.
So let’s not confuse grading with evaluation, or grades with feedback. Some of the former may be done by a computer, but the latter can only be done by those who have developed judgment, even wisdom, over long experience, and who can recognize when a very-well-written paper is actually garbage or an unusually structured document is brilliant; or recognize when a crucial piece of evidence is missing, an ostensibly neatly crafted argument encompasses a fatal flaw, a skill is not yet perfected. Even were we to get rid of grading altogether -- and many of us would be happy if we did -- students would still need this questioning and guidance and correction from professors, just as employees need it from mentors and supervisors, physicians need it from attendings and chiefs, ballet dancers need it from ballet masters and mistresses. Evaluation is the job -- and the privilege -- of those who have reached the top of their fields.
Please comment below or in confidence through the above comment form. Follow @janeerobbins.