Questioning what student evaluations of teaching actually measure, various institutions have already said they won't use them in high-stakes personnel decisions or as the primary measure of teaching effectiveness.
Now the American Sociological Association and 17 other professional organizations, including the American Historical Association, are urging all colleges and universities to do the same.
"Because these instruments are cheap, easy to implement, and provide a simple way to gather information, they are the most common method used to evaluate faculty teaching for hiring, tenure, promotion, contract renewal and merit raises," reads a new statement from the sociological association, endorsed by other scholarly groups.
Despite these evaluations' "ubiquity," however, "a growing body of evidence suggests that their use in personnel decisions is problematic." The statement cites more than a dozen studies finding that student evaluations are weakly related to other measures of teaching effectiveness, are used in statistically problematic ways and can be influenced by factors such as time of day and class size. It notes that both observational and experimental research has found these evaluations to be biased against women and people of color, and says that adjuncts are particularly vulnerable in a system that depends on these evaluations for teaching performance data.
Given these "limitations," the association "encourages institutions to use evidence-based best practices for collecting and using student feedback about teaching."
Feedback, Not Ratings
More specifically, the association recommends that questions on student evaluations should be framed as "an opportunity for student feedback, rather than an opportunity for formal ratings" of teaching effectiveness. It nods to Augsburg University and the University of North Carolina at Asheville, which have both revised their evaluation instruments and renamed them (as the university course survey and the student feedback on instruction form, respectively), to emphasize the difference between feedback and ratings.
Moreover, the association says -- echoing many researchers and faculty advocates -- these evaluations should "not be used as the only evidence of teaching effectiveness." They should instead be used, when used at all, as part of a "holistic assessment that includes peer observations, reviews of teaching materials, and instructor self-reflections."
Such holistic approaches have been used at teaching-focused campuses for years, and are making their way to research institutions, the association says. It names as examples the University of Oregon's ongoing efforts to that end and the University of Southern California's relatively new peer-review faculty evaluation process. The University of California, Irvine, also requires faculty members to submit two types of evidence of teaching effectiveness. Beyond student evaluations, professors there may submit reflective teaching statements, peer evaluations of teaching and even something like physicist and science educator Carl Wieman's Teaching Practices Inventory.
Institutions should not use student evaluations of teaching to "compare individual faculty members to each other or to a department average," the association further recommends. Rather, as "part of a holistic assessment, they can appropriately be used to document patterns in an instructor's feedback over time."
When and if quantitative scores are reported via evaluations, they should include distributions, sample sizes, and response rates for each question on the instrument, to provide "an interpretative context for the scores." The idea is that survey items with low response rates, for example, should be given little weight. And evaluators, such as chairs, deans and personnel committees, should be trained in interpreting evaluations, according to the sociological association.
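As a rough illustration of what such a per-question report could contain, here is a minimal Python sketch. It is not drawn from the association's statement: the function name `summarize_item`, the 1-to-5 rating scale, and the sample numbers are all assumptions made for the example.

```python
from collections import Counter


def summarize_item(responses, enrolled, scale=(1, 2, 3, 4, 5)):
    """Summarize one survey question (hypothetical helper, assumed 1-5 scale).

    Returns the full score distribution, the sample size, and the
    response rate, rather than a single average.
    """
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    counts = Counter(responses)
    n = len(responses)
    return {
        # How many students chose each point on the scale, including zeros.
        "distribution": {score: counts.get(score, 0) for score in scale},
        # Number of students who answered this question.
        "sample_size": n,
        # Share of enrolled students who answered this question.
        "response_rate": n / enrolled,
    }


# Made-up data: six responses from a class of 24 enrolled students.
summary = summarize_item([5, 4, 4, 2, 5, 3], enrolled=24)
print(summary)
```

In this invented case only a quarter of the class answered, which is the kind of context that would tell a reviewer to give the item little weight.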
Teresa Ciabattari, director of research, professional development and academic affairs at the association, said it's hard to know exactly how many institutions currently use evaluations in ways that she and many other experts recommend against. Most colleges and universities have "some sort of process in place for student evaluation of teaching," she said. But how much weight evaluations are given in faculty personnel decisions and what other kinds of evidence are included vary.
In any case, she said, the association has been thinking about student evaluations for some time and consulted widely with members in drafting the statement. The goal is that the document becomes a "tool" with which members may engage their colleagues in "fruitful conversation about the recent research on this topic," Ciabattari said.
Ultimately, "our hope is that institutions will begin moving toward more holistic evaluation approaches."
Philip Stark, a professor of statistics at the University of California at Berkeley who has written a number of studies demonstrating how student evaluations are flawed as a measure of teaching effectiveness, said that his own views against them have strengthened over time, and with new research. Evaluations may measure students' "satisfaction," he said, but that's not the same as teaching effectiveness.
Stark said he didn't know if institutions will necessarily heed professional organizations' call on the matter. Yet associations do have an important role to play in educating members about student evaluations' limitations, he said. And what might get institutions to listen is a burgeoning threat of class-action lawsuits over the use of evaluations in personnel decisions, despite the expert consensus against doing so.
"Evaluations have a disparate impact on protected groups and disadvantage them," he said.
Ken Ryalls, president of IDEA, a popular vendor of student feedback instruments, said that his organization generally supports the association's position.
Calling the "overuse" of student ratings data in the faculty review process an "epidemic," Ryalls said that IDEA offers an evaluation instrument focusing only on observable teaching methods and what students feel they've learned. The company also counsels institutions to ask students only those questions they're qualified to answer, and recommends using student feedback as part of a holistic analysis of teaching effectiveness with faculty development in mind. The idea is not to use student ratings "as some sort of club to wield against faculty."
IDEA does use "comparative baselines," Ryalls said, "so that users might have accurate numbers against which they can make judgments." But it counsels clients "to not focus only on numbers, but instead use the baseline as a guide to interpret the data in a holistic manner."
Baseline numbers "generated from a national database can be useful in making sense of the feedback," he added.