Evaluating Faculty Quality, Randomly

A new study uses one institution's unusual method of assigning students to courses to test the link between teacher quality and student evaluations.
July 11, 2008

The question of how to measure the quality of college teaching continues to vex campus administrators. Teaching evaluations, on which many institutions depend for at least part of their analysis, may be overly influenced by factors such as whether students like the professors or get good grades. And objective analyses of how well students learn from particular professors are difficult: if they are based on a standardized test or on grades, the results can be distorted because professors “teach to the test.”

A new paper tries to inject some rigorous analysis into the discussion of how well students learn from their professors and how effectively student evaluations track how well students learn from individual instructors.

James West and Scott Carrell co-wrote the study, which was released by the National Bureau of Economic Research. “Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors” examines students and professors at the U.S. Air Force Academy from fall 1997 to spring 2007 to try to measure the quality of instruction.

The Air Force Academy was selected because its curricular structure avoids many of the pitfalls of traditional evaluation methods, according to the report. Because students at the Air Force Academy are randomly assigned to sections of core courses, there is no threat of the sort of "self-selection" in which students might choose to study with easier or tougher professors. "Self-selection," the report notes, makes it difficult to measure the impact professors have on student achievement because "if better students tend to select better professors, then it is difficult to statistically separate the teacher effects from the selection effects."

Also, professors at the academy use the same syllabus and give similar exams at about the same time. In the math department, grading is done collectively, with each professor grading certain questions for all students in the course, which cuts down on the subjectivity of grading, according to the report. Students are also required to take a common set of “follow-on” courses, in which they are again randomly assigned to professors.

The authors acknowledge that situating the study at the Air Force Academy may also raise questions of the "generalizability" of the study, given the institution's unusual student body. "Despite the military setting, much about USAFA is comparable to broader academia," the report asserts. It offers degrees in fields roughly similar to those of a liberal arts college, and because students are drawn from every Congressional district, they are geographically representative, the report says.

Carrell, an assistant professor of economics at the University of California at Davis, attended the academy as an undergraduate and the University of Florida as a graduate student, and has taught at Dartmouth as well as the Air Force Academy and Davis. "All students learn the same," he said.

For math and science courses, students taught by professors with higher “academic rank, teaching experience, and terminal degree status” tended to perform worse in the "contemporaneous" course but better in the “follow-on” courses, according to the report. This is consistent, the report asserts, with recent findings that students taught by "less academically qualified instructors" may become interested in pursuing further study in particular academic areas because they earn good grades in the initial courses, but then go on to perform poorly in later courses that build on that earlier material.

In humanities, the report found no such link.

Carrell offered a few possible explanations for why no such link existed in humanities courses. One is that professors have more "latitude" in how they grade, especially with essays. Another is that later courses in the humanities may not build on earlier classes the way science and math courses do.

One of the study's major points is its examination of the effectiveness of student evaluations. Although the evaluations can accurately predict students' performance in the "contemporaneous" course -- the course in which the professor teaches the student -- they are "very poor" predictors of how a professor's students perform in later, follow-on courses. Because many universities use student evaluations as a factor in promotion and tenure decisions, this "draws into question how one should measure professor quality," according to the report.

"It appears students reward getting higher grades," Carrell said.

