Let’s Stop Relying on Biased Teaching Evaluations

Using such evaluations reflects colleges’ lack of a true commitment to diversity, writes Joanna Wolfe, who offers three actions institutions should take sooner rather than later to change the situation.

January 21, 2022

In the wake of protests surrounding George Floyd’s murder, colleges and universities across the nation have doubled down on their public efforts to confront racism on their campuses. They have created and filled new high-profile administrative positions focused on diversity, equity and inclusion; founded new centers and partnerships; and dedicated new faculty lines to increase representation of scholars of color.

In the midst of all this outpouring of funds and publicity to promote equity and justice, many institutions persist in their use of teaching evaluations that study after study has demonstrated to be biased against women and faculty of color, with Black faculty receiving the lowest scores of any racial group. Despite wide acknowledgment that evaluations do not actually measure teaching effectiveness, colleges and universities rely on this sexist, racist and fundamentally flawed measure in a higher education landscape where fewer than 13 percent of full-time faculty are Black, Latina/o or Native American.

The sheer arbitrariness of teaching evaluations is easy to see. Simply change the biographical information for an online instructor whom students never meet from male to female and watch student ratings drop in every aspect of the class. Want to see those biases in action yourself? Just look at Ben Schmidt’s “Gendered Language in Teaching Reviews,” an interactive database analyzing 14 million reviews from RateMyProfessor.com. Across every discipline, male instructors are more “laid-back,” “funny” and “smart” while women are more “strict,” “annoying” and “crazy”—and more likely to be called “teacher” rather than “professor.”

The questions we use in our evaluations almost seem designed to produce such biases. We know that asking people to share vague, general impressions—versus providing them with well-defined criteria—produces skewed, stereotypical judgments. And yet we assess our teaching by asking students questions like, “Over all, how would you rate this faculty’s teaching?” or “How would you rate the overall quality of this course?” Such questions measure bias much more effectively than they measure teaching.

Some might be skeptical that student evaluations of teaching could make a difference in the representation of our faculty. But consider that women, underrepresented minorities and BIPOC faculty tend to be concentrated in the adjunct, teaching and lecturer tracks, where these evaluations are often the primary (or only) measure of their job performance. And what about the emotional costs of a system that seems to reward cisgender white men just for acting the part? How many potential underrepresented faculty are turned off by such clearly biased evaluation mechanisms?

There is no need to completely abolish these evaluations, as some people have proposed. Student evaluations of teaching can tell administrators about behaviors that need to be stopped or improved—especially when they form a clear pattern in one’s teaching repertoire—and they can provide instructors with insights that can be used to improve their teaching. But we can no longer afford to keep using the same flawed instruments year after year, while simultaneously proclaiming our institutions’ commitment to inclusivity, equity and diversity.

Here are three actions institutions can take sooner rather than later.

No. 1: Remove or replace vaguely worded questions prone to eliciting bias. These are the questions that ask about general impressions of the teacher or course. Vaguely worded, they tell us nothing about teaching or what students learned. They do, however, tell us about factors outside the instructor’s control, including not just the color of her skin or the timbre of her voice, but the size and comfort of the room and whether the instructor brought cookies to class. They also tell us a lot about stereotypes and implicit biases, and there is no reason any college or university in 2022 should have such questions on its evaluations.

Likewise, no institution should ask students if assignments were “returned promptly” when this vaguely worded and highly unreliable question can easily be rephrased with concrete parameters. Ask students to specify a time frame in which assignments were returned (e.g., one week)—not to report their feelings on whether the instructor was prompt.

While some students will undoubtedly provide unreliable answers when asked concrete questions, it should be easy to spot the outliers and perhaps even flag those students as unreliable evaluators. If a student hasn’t been paying enough attention to know when assignments were returned or if class started on time, are they really qualified to give input on other aspects of the class?

No. 2: Educate students to be less biased evaluators. Evaluation is central to academic and professional life. Learning how to do it well is a key life skill. While in college, students don’t just evaluate their teachers; they also evaluate their peers on team projects and on some class assignments. When they leave, they will be asked to evaluate colleagues and subordinates. The evidence suggests that with a small amount of training, students can become better evaluators of their peers or less biased evaluators of teachers. Giving students strategies for evaluating others fairly should be part of our mission.

We might consider holding students accountable for their evaluations. Just because student evaluators need to be anonymous to their instructors does not mean that others cannot know who they are. A student who makes a clearly sexist or racist comment should be confronted. We might also imagine a system, especially at smaller elite and liberal arts colleges, where students review their evaluations with academic advisers to receive feedback on how helpful their comments are. If students making clearly biased evaluations had to talk them over with someone, they might think twice before they put pen to paper. For some students, no better lesson might be learned in the university than that they should be accountable for checking their own biases.

It should also go without saying that racist, sexist and homophobic comments should be removed from instructors’ files, preferably before the instructor sees them. No faculty member should be subjected to such abuses and violence.

No. 3: Incentivize faculty members to study ways to mitigate bias in teaching evaluations. In a recent Inside Higher Ed article, Laurel Smith-Doerr writes, “Colleges have excellent faculty who are contributing new knowledge on equity and inclusion, but sadly that knowledge is rarely applied to the institutions themselves.” That dynamic applies to teaching evaluations. We have faculty in business and psychology whose expertise could be tapped to develop better evaluation instruments. We have faculty in machine learning and computational linguistics who could help us automatically flag problematic comments and ratings.

Institutions are already spending millions on DEI initiatives. Offering course releases to a few faculty with domain-specific expertise to give them time to implement and study a revised evaluation system would be a small extra expenditure with tangible effects at every level of the institution.

In sum, revising our teaching evaluations in the name of equity may not make for an attention-grabbing press release, but it is a concrete, low-cost step that can make a real difference in attracting, retaining and promoting a diverse professoriate.


Joanna Wolfe is a teaching professor of English at Carnegie Mellon University. She is the author of Team Writing: A Guide to Working in Groups and Digging Into Literature: Strategies for Reading, Analysis, and Writing.


