Flawed Evaluations

AAUP survey finds declining response rates on student reviews of professors, too many colleges that do little beyond student reviews, and concerns about bias against women, minorities and adjuncts. But association panel wants to improve system, not end it.

June 10, 2015

They’re almost universally loathed by professors as being too subjective and an unreliable indicator of performance. But beyond that, surprisingly little is known about student evaluations of faculty teaching. How many colleges require them, and what do they ask? How many students complete them, and what effect do they have on instructors’ careers?

A committee of the American Association of University Professors wanted to help answer some of the questions, and help stir discussions about a better way to rate professors in the classroom. Survey responses gathered by the committee from some 9,000 professors suggest diminishing student response rates for course evaluations, too much focus on such evaluations alone in personnel decisions -- especially for non-tenure-track faculty -- and a creep of the kinds of personal comments seen on teacher rating websites into formal evaluations.

But while the committee argues that whatever value student evaluations ever had is shrinking, it says student surveys can play an important role in a more holistic faculty evaluation system.

“I’m a department chair myself, and it matters to me to get some feedback from students about how their experience in the classroom was,” said Craig Vasey, who heads both the AAUP committee that conducted the study and the department of classics, philosophy and religion at the University of Mary Washington. “But [student evaluations] have to be supplemented by class visits by peers and reviews of syllabi, and participation in ongoing faculty development.”

Noting that one survey respondent had offered up what is perhaps a more fitting name for student evaluations -- “student satisfaction surveys” -- Vasey added, “We’re not calling for them to be abolished, but there’s something dishonest about what they are and how they’re being used.”

Last fall, the AAUP’s Committee on Teaching, Research and Publication sent out 40,000 invitations to tenure-track and non-tenure-track faculty members to participate in its online survey about teaching evaluations. It asked questions about institution type and required mechanisms for evaluating teaching -- such as student evaluations on paper or online, development of teaching portfolios, engagement with an on-campus center for teaching, and evaluation by peers or administrators. It asked about the existence of a faculty mentoring program, how student teaching evaluations are crafted and by whom, and faculty members’ feelings about them.

The committee received about 9,000 responses back. The majority came from tenured professors (54 percent). Some 18 percent came from full-time, non-tenure-track faculty members and 15 percent from tenure-track, not tenured faculty members, according to a write-up of the data to be presented at the annual meeting of the AAUP in Washington this week. Most respondents were from four-year, teaching-intensive institutions (48 percent), followed by four-year research institutions (35 percent). The rest were from community colleges or professional schools.

Here’s what respondents said: frequent use of online and paper evaluations is now about even. Required use of quantitative evaluations beat required qualitative evaluations, at 55 percent versus 44 percent, respectively.

Respondents who said their institutions had adopted online evaluations reported much lower student return rates than those who stuck with paper evaluations: 20-40 percent versus 80 percent or higher.

“With such a rate of return, all pretensions to ‘validity’ are rendered dubious,” the paper says. “Faculty report that the comments coming in are from the students on either of the extremes: those very happy with their experience and/or their grade, and those very unhappy.”

Faculty members said they had little to no input in crafting evaluation instruments, and pointed out that teaching in one field is quite different from the next -- something evaluations should reflect.

In a comment section of the survey, some faculty members said they’d seen the kind of “abusive, bullying effects of anonymity that are today pervasive on websites… making their way into student evaluations,” the committee says. “Women faculty and faculty of color report negative comments on their appearance and qualifications, and it appears that anonymity may encourage these irrelevant and inappropriate comments and attacks, which are sometimes overtly discriminatory.” Those findings are in line with recent research suggesting strong gender bias in student evaluations.

Other professors talked about how being a tough professor works against them in student evaluations. Here’s an example: “My students often give me (I'm a woman) lower course evals than my peers because I assign a lot of work and hold them to high standards. They don't like this in the moment, but I know from talking with them that a few years later, students are able to see the ways in which this work influences their current abilities and vision and they are grateful. But I don't get the benefit of this perspective.”

Most evaluations are done in the last weeks of the semester, according to the survey. Some institutions allow students to complete the evaluation even after they’ve received their final grade, potentially compromising objectivity.

Some 25 percent of professors said their evaluations were frequently published for students and others to see. Other means of evaluation vary. About half of respondents said they were evaluated frequently by administrators, and about two-thirds by peers.

“The development of teaching portfolios, mentoring of junior colleagues or teaching assistants, or engagement with an on-campus center for teaching and learning, while often recommended, was tagged as required by very few respondents,” the committee says.

Roughly half of respondents reported a mentoring program for junior faculty on their campuses, but few were involved with one. And while 75 percent of respondents said there’s a center for teaching and learning, most said the centers were better known for helping instructors with technological needs than pedagogical ones.

Most agreed that teaching and learning centers demonstrate a campus’s commitment to pedagogical excellence, and 86 percent supported the idea of mentoring programs for junior faculty. Even more respondents (90 percent) said institutions should evaluate teaching with the same seriousness as research and scholarship. While two-thirds of respondents said student evaluations create upward pressure on grades, some 77 percent opposed administration-imposed quotas as a way to fight grade inflation.

Who decides what goes into a student evaluation instrument? Some 55 percent of respondents said that was not primarily the faculty’s job. Some 62 percent said decisions concerning the use of student evaluations in personnel decisions, such as promotion, tenure and merit, did not lie with the faculty.

Overall, some 69 percent of respondents said they saw some or a strong need for student feedback on their teaching. But only 47 percent said teaching evaluations were effective.

“We saw numerous claims that faculty are evaluated and recommended (or not) for contract renewal or promotion as a result of the grades they assigned, especially claims that there is administrative pressure to pass many students who deserve to fail courses,” the committee says.

The committee pays significant attention in its write-up to the concerns of adjunct faculty, including graduate teaching assistants, saying that most respondents noted that traditional monitoring of teaching was limited to those on the tenure track.

For adjunct faculty, the committee says, there is “significantly less support and, oftentimes, exclusion from participation in mentoring, teaching programs, instructional development and peer evaluations. Given the reality that [non-tenure-track] faculty are responsible for teaching the majority of courses and that graduate students represent the next generation in higher education, this lack of mentoring and attention to quality seems disturbing and a cause for concern.”

The committee also argues that online course evaluations, with their low rates of return, “aren’t working” for any faculty member, tenure track or not. It endorses having faculty within departments and colleges -- not administrators -- develop their own, more holistic teaching evaluations, and it raises the possibility of ending student anonymity, saying that students might be more accurate and fair if required to give their names.

Perhaps most importantly, the committee calls on “chairs, deans, provosts and institutions to end the practice of allowing numerical rankings from student evaluations to serve as the only or the primary indicator of teaching quality, or to be interpreted as expressing the quality of the faculty member’s job performance.”

Addressing adjunct faculty concerns, the committee adds, “We especially call on administrations to stop the lazy practice of making contract renewals on the basis of such partial, biased and unreliable data.”

Philip B. Stark, a professor of statistics at the University of California at Berkeley and co-author of a widely read 2014 paper that was critical of student evaluations of teaching, said he was even more opposed to them now, given the growing body of evidence of their unreliability -- especially concerning gender bias.

“I no longer think [student evaluations] should be used in any formal way by any institution, especially not as a measure of teaching quality and especially not for the purposes of hiring, merit evaluations, firing, tenure, et cetera,” Stark said. “They do not measure what they purport to measure.”

Stark said he thought that basic items such as “Could you hear the instructor from the back of the room?” or “Could you read the instructor's handwriting?” or even “Did you enjoy the class?” might be worth collecting, but only for the instructor’s eyes.

Vasey’s committee doesn’t claim that its sample is representative as a whole. In fact, the paper discusses at length the fact that non-tenure-track faculty, for example -- a minority of respondents -- actually make up the majority of the teaching force. But it says the survey results offer a valuable snapshot nonetheless.

Adrianna Kezar, director of the Delphi Project on the Changing Faculty and Student Success at the University of Southern California, has studied student evaluations extensively, along with how adjunct faculty employment affects student learning. She said student evaluations are the primary means of evaluating non-tenure-track faculty, and often the basis for decisions not to rehire them. Like the AAUP committee, she said student evaluations shouldn’t be abolished, but that non-tenure-track faculty need more robust, complete measures of performance.

“Research demonstrates that student evaluations can be valuable among several sources of input on faculty teaching but need to be combined with other sources including peer observations, syllabus review, portfolio analysis and teaching philosophy and reflection, among other approaches,” she said. “Single metrics of teaching have not been found to provide a complete enough picture for improvement.”

Kezar said via email that the issue has implications for student success, namely that overreliance on teaching evaluations in rating adjunct faculty has discouraged instructors from “integrating effective educational practices such as active or collaborative learning because students often resist new evidence-based teaching approaches that require greater engagement and challenge and therefore penalize instructors who use such approaches. Faculty are often given higher evaluations if they do not challenge students to work hard.”

Of course, not all course evaluations are created equal. Ken Ryalls is president of IDEA, which offers colleges and universities research-based course evaluation systems that can control for class size, student motivation and other factors. He said he understands faculty concerns that low response rates yield statistically unreliable data, but that the correlation between response rates and teacher ratings is actually quite low.

Ryalls said IDEA has an 81 percent response rate on paper and 66 percent online, and that a mobile delivery system eliminates the paper advantage, since students can complete the evaluation on a device at their fingertips.

“Even without a mobile option, response rates can be just as high online as with paper if teachers take certain actions,” he said. “Faculty can clearly communicate their expectations for student compliance, ensure confidentiality, monitor response rates, send reminders, and create a culture that values student feedback.” The most effective way to get students to complete course evaluations is to assure them that their responses will be valued and will make a difference in the future of the course, Ryalls added.

Qualitative comments, just like numerical scores, “should be used as part of a global picture and analyzed over time to look for informative patterns of feedback that may help instructors improve,” he said. Instructors also may guide students on how to write helpful comments before the surveys are given.

Ryalls said he agreed with the AAUP paper that faculty members should have input into what questions are asked. He also supported a more comprehensive faculty ratings system. Ideally, he said, student evaluations should count for no more than 30 to 50 percent of the overall assessment of one’s teaching, and ratings should be collected from at least six classes before summative decisions about effectiveness are made.

“When we stop thinking of evaluation as an event that occurs at the end of the semester and start thinking of it as an ongoing process that is based on multiple sources of information,” Ryalls said, “we will begin to accept the value of student ratings gathered from a reliable and valid system.”
