The potential bias in so-called peer evaluations of teaching (opinion)

Who Is a Faculty Peer?

Breda Eubank, Irene Shankar and Mary-Lee Mulholland explore the potential bias in so-called peer evaluations of teaching.


Small figure of a woman pushing up against a large red gavel wielded by a male hand

Prasong Maulae/Istock/Getty Images Plus

Over the past few decades, colleges and universities have become increasingly corporatized, to the point that students are seen as consumers who are recruited through promises of a safe and wonderful learning experience. As such, students’ “satisfaction” is diligently measured so that institutions can trumpet the positive results in marketing campaigns to increase student enrollment.

Student satisfaction has been measured through annual institutional surveys, through national and international multi-university surveys, and by magazines such as U.S. News & World Report and, in Canada, Maclean’s, which rank institutions based on, among other things, students’ experiences and perceptions. Most often, universities conduct quantitative surveys with open-ended questions, commonly known as student evaluations of instruction, to assess “good teaching.”

In addition to significant concerns regarding the reliability and effectiveness of those evaluations, dozens of articles have demonstrated an equity bias in student evaluations. That is, the evaluations can often be more about the race, gender identity, weight and perceived accent of the instructor than about class content. The widespread use of such biased and discriminatory results has had detrimental effects on hiring, tenure and promotion decisions, especially for women and those from other marginalized groups. As a result, many faculty senates and unions are calling for an end to the use of student evaluations as assessments of effective teaching, particularly for tenure, promotion or securing job contracts.


Faced with calls to reduce or eliminate the use of student evaluations, colleges and universities are increasingly turning to peer assessments of teaching. Originally conceptualized as peer observations of teaching, they began as nonevaluative, voluntary, formative, reciprocal, self-reflective and collaborative modes of professional development in teaching. In those modes, peer observations of teaching are quite effective ways to improve teaching.

Over the past 20 years, however, such exchanges have become required, formalized and summative. In fact, in Canadian universities, peer evaluations of teaching are increasingly being used to inform personnel decisions such as tenure, promotion and the hiring of instructors. Moreover, such evaluations are often no longer conducted by peers. Rather, the people performing the observations are usually tenured and, in some cases, program chairs, while the observed are frequently untenured or precariously employed instructors. Thus, a power differential always exists between the observer and the observed, making the use of the term “peer” misleading and deeply problematic. Peer observations have often become bureaucratic evaluations.

At some institutions, untenured faculty encounter several teaching evaluations over their first five years, performed by the chair or chair designate, an internal departmental tenured peer, and/or an external (to the department) tenured peer. Although the observers are required to undergo training, it often does not include content on discrimination, racism, ableism, fatphobia, transphobia, homophobia and gender-based bias. Two of the authors of this piece, Mary-Lee Mulholland and Breda Eubank, undertook a cursory scan of 25 universities in Canada and found that a handful of them require training, yet only one referenced equity, diversity and inclusion as part of its training module.

Given the concerning power differential present within these contexts and the potentially negative implications of such evaluations for people’s careers, higher education needs to study the impact of bias, power and hierarchy within peer evaluations of teaching. That is especially so because we already know how evaluative frameworks in the postsecondary context, such as student evaluations of teaching and tenure, can discriminate against teachers located within intersections of race, gender, class, disability, nationality, gender identity and sexuality.

Equally pressing and related to the issue of bias is a very foundational, yet seemingly unanswered, question: Who should constitute a peer in these evaluations of teaching? Should peers be of the same rank? Should they be from the same academic discipline? Which peers are equipped to evaluate the feminist, Indigenous or anti-racist pedagogies of their colleagues?

To answer those questions, more nuanced discussions and research are required to assess the validity and impact of peer evaluations of teaching. Specifically, to avoid the same pitfalls that occur with student evaluations of teaching, we need more information on who is doing the evaluation, what is being evaluated and how it is being evaluated. Based on our observations, experiences and research, we have serious concerns regarding the validity of peer observations of teaching, as currently conducted, being used as a measure of “good” teaching.

In the meantime, because peer evaluations of teaching have not themselves been evaluated through a critical and intersectional lens, faculty should approach them with great caution. Under what circumstances can peer evaluation be done effectively? By whom? For what purpose? Can any training make it better?

Underlying those questions are larger questions regarding who gets to decide what constitutes “good teaching.” More to the point, we need to carefully examine whether measures of “good teaching” used in peer evaluations are reflections of feminist, anti-racist or decolonial pedagogies or whether they are products of privilege. It is not lost on us that those who do the evaluation disproportionately have racial, gender and other forms of privilege that have led to their current position of power within academe. Similarly, those evaluated are disproportionately from historically marginalized and/or currently underrepresented groups.

Thus, how can we ensure that peer evaluations of teaching do not get constituted as gatekeeping by those who arrived first and are seen as “natural” (read as white and male) inhabitants of academia?

Although we realize that academia is not about to do away with teaching evaluations in the immediate future, we urge caution against an uncritical large-scale adoption of peer evaluations of teaching. Instead, we need research on their efficacy in the formal, summative mode in which they are used today. In particular, the use of such evaluations must be informed by research on the impact of the power imbalance and bias that can frequently occur in them.

Breda Eubank is an assistant professor in the health and physical education department at Mount Royal University and former vice-chair of the faculty evaluation committee. Irene Shankar is associate professor of sociology in the department of sociology and anthropology at Mount Royal. Mary-Lee Mulholland is professor of anthropology in the department of sociology and anthropology and has served as chair of the faculty evaluation committee at Mount Royal.