Sweetening the Deal

October 18, 2007

Few would argue that the student evaluation of faculty is an exact science, but even the most hardened skeptics might be surprised by the latest findings on how easy it is for professors to influence students who are filling out the ratings forms.

A new study shows that giving students chocolate leads to improved results for professors. “Fudging the Numbers: Distributing Chocolate Influences Student Evaluations of an Undergraduate Course” is set to be published in an upcoming edition of the journal Teaching of Psychology.

While they were graduate students at the University of Illinois at Chicago, the paper's authors, Benjamin Jee and Robert Youmans, became interested in what kind of environment instructors created right before handing out the evaluations. Their theory: Outside factors could easily play a role in either boosting or hurting a professor's rating.

The experiment involved about 100 students in three different lecture sections taught by the same instructor. In each section, half the students were given chocolate on the day of the mid-semester evaluation, and half were not. Jee, a postdoctoral fellow in psychology at Northwestern University, said he had noticed that professors distributing candy around the time of evaluations was "a fairly common practice."

In all three cases, the groups that received the chocolate gave their professor a higher rating than did those in the control group, even though the instructor wasn't the one handing out the sweets. In fact, the person doing the distribution identified himself as being unaffiliated with the course. Students were told the chocolate was left over from a prior event.

Still, the researchers surmise that the simple gesture of the offer, no matter the source, was enough to improve the students' perceptions of the instructor. For instance, on the question "Teacher is Enthusiastic About Conducting the Course," students in the group offered chocolate responded with an average score of 3.92, versus the control group's 3.58 (on a five-point scale). On a "friendliness" question, the instructor received a 4.2 from the chocolate group and 3.9 from the control group.

The findings are significant given that student evaluations can be used in a professor's tenure evaluation, and that many responses are now published online for anyone to view. Research has shown that factors such as a professor's "hotness" or perception as an easy grader can positively influence the ratings.

"Some professors live and die by these things," said Youmans, now an assistant professor of psychology at California State University at Northridge. "Frankly, professors are judged by these because they are quantifiable. Obviously they are imperfect, but that's balanced against the fact that they give you some measure of teaching effectiveness."

Youmans said the study isn't an attempt to criticize the student evaluation of faculty as a valueless exercise. But by showing how easily manipulation can occur, the researchers say they want colleges to keep the evaluation results in context.

"There are lots of factors that affect a student's evaluation of a professor," Jee said. "Some will be hard to change and may be legitimate. Our argument is not that instructors should benefit by giving chocolate but that all evaluations should be given in more standardized ways to limit the effects of extraneous things like how they are handed out."

Youmans said colleges should consider instructing those who distribute evaluations not to give anything out along with them. Faculty might also get a more accurate read of students' opinions if they give out the forms a few weeks before the end of a term, he said.

"It's a worry that evaluations are given out at the end, when maybe a student has just walked out of a bad exam in another class and come in with a negative mindset," Youmans said.

As a rule, he doesn't hand out chocolate at evaluation time. But before students take tests? That's a different story.

Comments on Sweetening the Deal

  • Posted by Bloom on October 18, 2007 at 8:20am EDT
  • Does it have to be Godiva or will Mars do the trick?

  • Chocolates and evaluations
  • Posted by Former faculty member on October 18, 2007 at 8:20am EDT
  • No surprise here! My wife, who taught 4th grade for many years, found that giving her students donuts on the morning of the state-required assessments greatly improved performance.

  • Posted by Professor Cadbury on October 18, 2007 at 8:45am EDT
  • At my university it's gotten to the point that faculty compete with bribes for evaluations. Chocolate might not get you that great ratings since other faculty give out donuts, etc. right before evaluations. One professor even bakes cookies to boost his evaluations. I never would have thought my lack of baking talent would affect the amount of my salary increase, but I guess that's what happens when institutions place so great an emphasis on student evaluations which, ironically, may have little to do with how good or bad a teacher you are.

  • Posted by Hoosier Prof on October 18, 2007 at 9:05am EDT
  • And this is news?

  • Ethical?
  • Posted by T-bone on October 18, 2007 at 9:25am EDT
  • How far might this go before it becomes unethical? How about handing out cash to students?
    I've heard of stories where professors agree to decrease students' workload in exchange for positive reviews.
    Student evaluations of teaching are easily manipulated and should only constitute one very small piece of the evaluation of teaching practice. I would suggest that demonstrations of student learning gained from that course should play a much larger role.

  • Posted by TBD on October 18, 2007 at 9:55am EDT
  • This tears it. Let's just do away with student evaluations.

  • Posted by Jim on October 18, 2007 at 9:55am EDT
  • "I would suggest that demonstrations of student learning gained from that course should play a much larger role"

    I agree! Sounds great! How would you propose to do that? Grades at the end of a semester do not tell us about 'learning gained', unless we assume all students started the course with equally low levels of knowledge regarding the course content.

    I have no problem with course evaluations. I think it is important to give the students opportunity to provide feedback. What I do have a problem with is using those course evaluations as an indicator of teaching effectiveness. That is where the problem is.

  • Posted by Angelo , Professor at Liberal arts College on October 18, 2007 at 10:40am EDT
  • In my liberal arts college, it is expressly forbidden to hand out any treats on the day of the student evaluations. It is also forbidden for the instructor to say that s/he enjoyed the class, or to tell students that evaluations affect his/her salary, promotion, etc. I still would never hand out evaluations on a day when my best students are absent and the worst students are all in attendance.

  • Practical significance?
  • Posted by Inquiring Mind on October 18, 2007 at 10:40am EDT
  • Should narrow range of the responses with the given measurement scale (likely a 5-point ordinal scale which tends to be skewed and skewed left to the higher score) and the statistical treatment for the data type being used in this exploration (ordered categorical and may not be continuous) be reported and discussed in this case?

    Hypothetically, would the average scores of 3.92 versus 3.58 with standard deviations around 1.00 (the standard deviations shown here is only my hypothetical value and they were not given by the author) really help decision makers to tell any practical difference or significance between the "average" scores?

    This seems to require lots of assumptions to "sweetening the deal."
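The hypothetical in the comment above can be made concrete with a standardized mean difference (Cohen's d). The following is a minimal sketch, not the study's analysis: the two means are the ones reported in the article, while the standard deviation of 1.00 is the commenter's own assumed value, and the function name `cohens_d` is introduced here purely for illustration.

```python
# Hypothetical illustration: Cohen's d for the two reported means,
# using the commenter's ASSUMED standard deviation (not a figure from the study).

def cohens_d(mean_treatment: float, mean_control: float, pooled_sd: float) -> float:
    """Standardized mean difference: (M1 - M2) / pooled SD."""
    if pooled_sd <= 0:
        raise ValueError("pooled_sd must be positive")
    return (mean_treatment - mean_control) / pooled_sd

chocolate_mean = 3.92  # reported in the article ("enthusiastic" question)
control_mean = 3.58    # reported in the article
assumed_sd = 1.00      # hypothetical value taken from the comment above

d = cohens_d(chocolate_mean, control_mean, assumed_sd)
print(f"Cohen's d under the assumed SD: {d:.2f}")
```

Under that assumed spread, the gap is roughly a third of a standard deviation; a larger or smaller true standard deviation would shrink or enlarge the figure proportionally.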

  • Posted by Kathy , prof at Georgia State on October 18, 2007 at 10:40am EDT
  • Our university does all course evaluations on line, a move which perhaps mitigates the influence of chocolate. Could I bake the students virtual cookies?

  • Posted by Professor Tweed on October 18, 2007 at 11:05am EDT
  • Faculty Evaluations are as unreliable as course grades as a measurement of effective learning. Students, especially at the undergrad level, rate professors based on their interpersonal skills, skills as an entertainer, and their compassion. It may not be until years later that a student truly values a particular class experience. Cookies, candy, doughnuts and other incentives are certainly going to sway results.... why is this news? Do we really think that freshmen in core courses have enough maturity in their reflective learning to disregard tangible incentives?

    How would it go over if this technique was used for k-12 teachers - and their salary increase or continued employment was based upon their students' opinions?

  • Posted by Anne on October 18, 2007 at 11:15am EDT
  • The teacher with the highest ratings from students on RateMyTeacher.com is the same one who bakes for her classes every week. The students even mention this in their rating comment.

  • What a great idea
  • Posted by Manny on October 18, 2007 at 11:15am EDT
  • In order to eliminate any bias in the evaluations due to some profs giving treats and others not, how about the college giving chocs to EVERY student on evaluation day. That way, no unfair advantage vis-a-vis evaluations to anyone!

    Or better still, print the evaluations on chocolate flavored paper so the students can either eat them or fill them out.

  • Posted by Young Prof on October 18, 2007 at 11:15am EDT
  • "It is also forbidden for the instructor to say that s/he enjoyed the class, or to tell students that evaluations affect his/her salary, promotion, etc."

    I always tell my students what evaluations are used for, because I find that they have absolutely no idea, and will not take the evaluations seriously as a result.

  • Posted by CMH on October 18, 2007 at 11:20am EDT
  • Before we get all excited about getting rid of student evaluations, there are two points to consider:

    1. they are consistent over time. that is, although there will be instances where one course rates a professor higher, in general, student evaluations stay consistent throughout one's career.

    2. student evaluations correlate with faculty evaluations of other faculty teaching. That is, if a faculty member watches another faculty member teach, they rate the faculty member similarly to how the students rate the faculty member.

    I think there are problems with student evaluations, no doubt. However, I also think there are some really good reasons for doing them and we should continue to use them as a COMPONENT of faculty evaluation.

  • To reward or not
  • Posted by Fred Flener , Retired on October 18, 2007 at 11:25am EDT
  • There are lots of studies which show that extrinsic rewards improve performance. One teacher I know places a small statue of "the thinker" on a student's desk when that student contributes something very insightful to the class discussion. I once gave four micro-brews from my stash to a group that came up with the most "interesting" contribution to our study of cyclic groups in an algebra class. When I went back to teach high school one year (to see if I could do what I told my college students to do when they were teaching) I gave rewards to kids all the time. However, I gave these only, only, only if they showed some insight, positive understanding, etc. Sometimes the reward need not be tangible, but simply a positive comment. When I taught a geometry class that year we studied tessellation (that Escher-type art stuff, rotating lizards, etc.) and the students created their own. I had an exceptionally artistic student in class (with absolutely no help from me). When we were going to hang the art work for parents, I asked for a volunteer. "I need someone who is willing to hang their tessellations next to 'Pauline's' (the artistically gifted one) because hers are so good anyone else will be embarrassed in contrast." Now Pauline was a very mediocre math student who couldn't put together a solid proof if her life depended on it, but enjoyed the "extra curricular" ventures (tessellations, golden rectangles, etc.) and when the course ended I received an unanticipated visit from her parents who claimed that this course was Pauline's favorite math course ever. They said it was because of all the compliments I gave her regarding her artistic talent. Certainly, it was not for her success in learning geometry proofs.

    On the other hand, when I completed the year of teaching I asked students to "evaluate" the things I did and when I asked about the projects one student wrote, "I still have nightmares about those damned projects." I am not sure who wrote that but I have a guess, and it was probably by one of the solid A students who did learn about geometry proofs. Course evaluations should be used to guide us as we modify our teaching, not for tenure, promotions, etc.

    I remember when student evaluations first came into the performance evaluations of faculty. It came under the guise of helping us "improve" our teaching. Yeah, right. I sat on a gazillion faculty committees evaluating faculty with respect to the big three, teaching, research and service. The latter two seemed reasonable (although I must admit some of the "research" was so removed from my knowledge base I might just as well have thrown darts), but teaching seemed to evolve into a single numerical document--student evaluations. "This guy got a 3.0 out of 5, so he is obviously a poor teacher." So we have done this to ourselves. As a retired math prof, I appreciate the reverence given to numerical data, but in my heart I believe it is a simple cop-out from making the hard decisions about whether someone teaching really is the "best" for the students in a course. Our department used a "standard" student evaluation form, and the first item was to rate whether they thought "the instructor's objectives are clear." When I taught a math education course (in contrast to a math course) I said to them, "If you rate me high on that item, you don't understand the question." My "goals" in a math education course were fluid, always bending with the talent before me. In a math class it was a bit more clear as to what the students were expected to learn, but even then "things changed" whenever it became apparent that prior knowledge among them was not a well defined constant. So when we began using student evaluations, I did use them for the purpose intended (luckily I was tenured--and I believe, a full prof when we first started using them), and for a couple of decades before retiring fought constantly over what I believed was the stupidity of measuring teaching performance on the basis of how high the students rate someone.

  • Clarification and Comment From the Lead Author of Study
  • Posted by Rob Youmans at California State University, Northridge on October 18, 2007 at 11:40am EDT
  • Perhaps I can respond to some of these interesting comments...
    "Hypothetically, would the average scores of 3.92 versus 3.58 with standard deviations around 1.00 (the standard deviations shown here is only my hypothetical value and they were not given by the author) really help decision makers to tell any practical difference or significance between the “average” scores?"
    There are a couple of ways to analyze this data, but first remember that this is a report about the peer-reviewed manuscript which will appear in the peer-reviewed journal Teaching of Psychology, so please do read the article when it appears this month to address concerns. We borrowed our 9 questions from the official university evaluation form. Nearly every question received higher ratings in the chocolate conditions. To avoid the risk of conducting multiple stats tests on each question, each with a risk of type I error, the statistical approach we ended up taking was to take the average of all 9 official student questions per condition, and those findings were significantly different at the .05 (or more strict .01) alpha level.

    To paraphrase several other comments to the effect of "Duh..why does this matter...of course...etc." I just want to clarify to those who may have skimmed the article: THE INSTRUCTOR DID NOT GIVE OUT ANY CHOCOLATE, it was the administrator of the evaluations that gave out chocolate, and did so in a way that made it clear that it was not from the instructor. The implications are that events entirely unrelated to course or instructor still affect deliberate evaluations of instructors. So, our work does not speak to deliberate bribes from instructors, payments, talks from the instructors, or any other direct manipulation (although I am sure these also affect evaluations, this is NOT the effect we report in Teaching of Psychology). Pretty sweet!
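The multiple-testing concern the author mentions can be illustrated with a short calculation. This is a sketch under a loudly stated assumption: it treats the nine per-question tests as statistically independent, which evaluation items almost certainly are not, so the numbers are only a rough guide. The function name `familywise_error_rate` is hypothetical, and the calculation is not taken from the paper.

```python
# Illustration only: chance of at least one false positive across several tests,
# ASSUMING the tests are independent (real evaluation questions are likely correlated).

def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of one or more type I errors across num_tests independent tests."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** num_tests

alpha = 0.05
num_questions = 9  # number of evaluation questions mentioned in the comment

fwer = familywise_error_rate(alpha, num_questions)
bonferroni_alpha = alpha / num_questions  # one common per-test correction

print(f"Family-wise error rate for {num_questions} tests: {fwer:.2f}")
print(f"Bonferroni-adjusted per-test alpha: {bonferroni_alpha:.4f}")
```

Averaging the nine questions into a single score and running one test, as the comment describes, sidesteps this inflation because only one comparison is made at the chosen alpha level.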

  • Posted by peter biesemeyer on October 18, 2007 at 11:45am EDT
  • Seems like the smart move would be a platter of fresh-baked toll house cookies for the tenure committee!

  • "Balancing the 'Sweets'"
  • Posted by Dee Fink , National Consultant in Higher Education on October 18, 2007 at 12:30pm EDT
  • People in the field of instructional evaluation have known for some time that student ratings of teachers - when used alone - are subject to manipulation. This is not news; however, the degree of effect by a non-teacher source of the chocolates is somewhat surprising.

    For a view of what might be used to balance student ratings, for both a more meaningful and probably more reliable assessment, people might want to look at an essay of mine that is being published this month in "To Improve the Academy", an annual collection of articles by the POD Network in Higher Education. Title: "Evaluating Teaching: A New Approach to an Old Problem." In this essay I basically argue for using multiple sources of information about multiple aspects of teaching.

    The article is on my website at: http://www.finkconsulting.info/publications.html (see first item under "Evaluating College Teaching").

  • Chocolate evaluations
  • Posted by cjprof on October 18, 2007 at 12:45pm EDT
  • I see a real problem with students who may have a chocolate allergy, or those who would prefer Gummie Bears, and of course, the law suits filed against the faculty and University for contributing to the over all decline of health and the increase in waist (waste) size! What about the Vegan students and those who really want a tofu bar?

    Academic institutions which haunt us about assessments, and reliability and validity studies should take their own advise. Eat Chocolate!

  • Chocolate correction
  • Posted by cjprof on October 18, 2007 at 1:15pm EDT
  • I also suggest we proof read before submitting!!
    ADVICE is often unsolicited, so I advise you to solicit more than one.

    Taking my own advice!

  • (Making it more Objective)
  • Posted by George Martinez on October 18, 2007 at 1:30pm EDT
  • At our college, an outside staff or faculty member actually administers the student evaluations. That would be a way to eliminate the 'bias' or 'chocolate effect' to some degree. The staff member arranges a time to go to the class BEFORE the faculty member arrives and gives the evaluation. Then the faculty member begins his/her class AFTER the evaluation process is completed. The only snag might be if the faculty member rewarded the class with chocolate at the class meeting BEFORE the evaluation, but it may have already worn off by the time that the actual evaluation is administered (smile). Just a suggestion to remove any hint of ethical violation.

  • Posted by jenny franklin at University of Arizona on October 18, 2007 at 2:15pm EDT
  • In a well crafted, comprehensive faculty performance appraisal system (e.g. see Arreola's "Designing Comprehensive Faculty Evaluation Systems") ratings are but one source among many, which would mitigate to a considerable degree transient sources of bias over time.

    Having a standard administration policy that makes it clear that (1) the teacher must leave the room, (2) students may not discuss their ratings while they are completing them, and that the teacher reads as simple standard state, the ratings are anonymous and not reported until after grades have been filed and (4) offers no other discussion or goodies combined with a signed student monitor document attesting to the fact the ratings were collect according to the rules also helps to some degree.

    However, objectively the effect size of even well-documented sources of systematic variation such as academic discipline, course size and level, etc pale in comparison to the actual effects of errors of interpretation, in my opinion. Having heard over the course of my career the questions that many faculty and administrators raised about specific ratings results (or failed to raise in some cases)-- I would be a lot more concerned about (1) the effects of erroneous or unfounded interpretations of ratings data or (2) inappropriately reported data (e.g. means reported with double digit precision and no indication of the margin of error) or (3) inadequate or unrepresentative samples resulting from low response rates.

  • ...and pizza
  • Posted by Jen on October 18, 2007 at 2:15pm EDT
  • We have one 30-year veteran who orders pizza for his classes. Frankly, chocolate would influence me before saturated fats.

  • Posted by TBD on October 18, 2007 at 3:55pm EDT
  • Most faculty evaluations are qualitative, so they don't "correlate" with anything.

  • Posted by Jeff on October 18, 2007 at 3:55pm EDT
  • George, if you read Rob's comment, you'll notice that it is not a "bribe" from the unethical instructor that is at issue here but rather outside influence in general: a force or forces over which we have no control. If a student can be influenced to provide positive feedback from a "feel-good" situation of simply being offered chocolate, then how many students would provide negative feedback for instructors based on an equally unrelated experience (say, a traffic jam)?

    In all, the system is seriously flawed even before addressing the problems associated with word choice in the questions and debates over which elements of teaching are the "core" elements about which to inquire.

    Finally, let me nod to Professor Tweed who points out that students are not necessarily experts in their answers. So while it is important to continue to collect student evaluation data, to use it as significantly as many institutions are using it is irresponsible.

  • Author Response to Questions Raised Here...
  • Posted by Rob Youmans at CSUN on October 18, 2007 at 3:55pm EDT
  • I think Jenny raised some really interesting points that I’ll respond to.

    Jenny wrote: “Having a standard administration policy that makes it clear that (1) the teacher must leave the room, (2) students may not discuss their ratings while they are completing them, and that the teacher reads as simple standard state, the ratings are anonymous and not reported until after grades have been filed and (4) offers no other discussion or goodies combined with a signed student monitor document attesting to the fact the ratings were collect according to the rules also helps to some degree.”

    To clarify, in our study the teacher was not in the room, students were not permitted to discuss ratings, and the ratings were anonymous.

    Jenny continues, “However, objectively the effect size of even well-documented sources of systematic variation such as academic discipline, course size and level, etc pale in comparison to the actual effects of errors of interpretation, in my opinion.”
    The effect size of chocolate in our study was d = 0.33; by Cohen’s (1992) standards, this result amounts to a small-to-medium effect. Could you post the effect size that you allude to for errors of interpretation?
    Jenny concludes, “Having heard over the course of my career the questions that many faculty and administrators raised about specific ratings results (or failed to raise in some cases)— I would be a lot more concerned about (1) the effects of erroneous or unfounded interpretations of ratings data or (2) inappropriately reported data (e.g. means reported with double digit precision and no indication of the margin of error) or (3) inadequate or unrepresentative samples resulting from low response rates.”
    Well, you probably hear lots of these comments because of your choice of career (you work in the field of assessment, not teaching per se, correct?). I agree with your points regarding how data should be published, which is why we chose to publish our full findings in a peer-reviewed journal, not here. But, you will be relieved to know that in that article we publish means with quadruple-digit precision, indications of the margins of error, a well-thought-out statistical analysis, and with other analyses to satisfy other measures of external validity that peer-reviewers thought of and questioned (e.g., there was no difference between participants’ final grades in the course by condition, etc.).
    I suppose that the scientist in me can’t help but disagree with your overall point that evaluations are on completely solid ground and that the errors are all in our interpretation of them; here we have a well-controlled study suggesting otherwise!

  • make them write
  • Posted by ddp on October 18, 2007 at 3:55pm EDT
  • I think the solution is to do away with the numerical ranking and make student evaluations of teaching exclusively based on written responses to questions. I doubt that any student would write that he or she liked the course and the professor because he/she got candy (or maybe not). Given that few students actually take the time to include written comments, this would make the evaluations a far more reliable measurement too. I know that this would be hard for schools with large classes. However, I know that administrations across the country would pay to have staff deal with this cumbersome data, because they truly care about quality teaching.

  • The Easier Method- grade inflation
  • Posted by rufus on October 18, 2007 at 7:50pm EDT
  • Chocolate aside, we had an assistant prof in our department who told her TAs to make sure that all her students got As or Bs until she got tenure. I have no idea how long this went on because I never TAed for her. It worked in that her course evaluations were outstanding. But, it didn't work in the end because eventually one of her TAs felt guilty and talked to the department head.

  • Posted by Marvin McConoughey on October 19, 2007 at 3:30am EDT
  • I think Rob Youmans is onto something important, and that is the power of seemingly small influences to materially change the assessments made by individuals. The study, which I have not yet read, tells of the outcome of an intentional intervention, albeit not from the professor. We have political and business leaders making weighty decisions every day and it is informative to realize that those decisions may be affected by influences that neither we nor they are aware of having an impact.

  • Response to CMH
  • Posted by Christian Nelson on October 19, 2007 at 3:30am EDT
  • CMH writes that we shouldn't be in a rush to get rid of student evaluations. In defense of them s/he claims that they are "consistent over time." One's looks, hotness and general presentational style are also consistent and multiple studies show that these factors have a HUGE effect on student evaluations of teaching. CMH also claims that when a faculty member rates a colleague's teaching it mirrors the students' ratings of that colleague. But of course it does. Most professors have as much knowledge of teaching as most students do (i.e., practically none). Further, appearance and presentational style don't have such a powerful effect on students' evaluations of professors because students are exceptionally shallow, but because ALL people make long lasting judgments of others based on first impressions of appearance and presentational style. Social psychologists have amply demonstrated this.

  • Consistent?
  • Posted by Bazooka on October 20, 2007 at 3:00pm EDT
  • If my evaluations were, as CMH claims, "consistent over time," I would be extremely worried. I was a mediocre teacher when I started out as a TA. I cared, and I tried, but I had a lot to learn about my craft. I got mediocre student reviews.

    Now I know a great deal more about teaching, and I get better reviews. I wouldn't say that any of my reviews, then or now, are fully accurate, but I would venture to claim that they have generally risen along with my experience level. And if I had to start teaching a class I've never taught before, I would expect my ratings to dip some until I got a handle on the material and felt fully comfortable with it. And when I change schools, I won't be at all surprised to see different (and possibly lower) ratings across the board until I assimilate the culture of the school and the department.

    I am always striving to improve as a teacher. I'm not so naive that I expect my ratings to continue to rise as I learn more and more, and I certainly don't believe that my ratings rate only my teaching ability, but I would be concerned if my ratings were consistent over my career.

    I think that perhaps CMH means that once an instructor is established in his or her career, the average evaluations every year remain relatively consistent?

  • To heck with chocolate; give them dictionaries!
  • Posted by Killer Komposition Klown on October 20, 2007 at 3:00pm EDT
  • We all know that there are lots of problems with student evaluations as they now stand. I find this study disturbing but not particularly surprising.

    In my experience, some of my first-term freshman comp students have had trouble understanding some of the questions on the evaluations. On a few occasions I have been asked (after the evals were finished and I came back into the room), what a particular word on the eval forms meant. Once the evals were in and I could see the results, I realized that all my students had answered all the questions, including the one(s) that they didn't understand. I would think that if they don't understand a question, they ought to leave it blank. But many seemed to approach it as they would a multiple-choice test--answer all questions at any cost! I wish I had kept a dictionary in the room, just in case the students needed it.

    And when a particular question didn't apply at all, they answered that one, too, even though I told them not to. I once taught small-group tutorials all semester and didn't have office hours that term. I received extremely low ratings for all of the questions pertaining to office hours, and my department did not strike those questions from the survey. I looked like an idiot. Now, perhaps if I'd brought chocolate to those nonexistent office hours...

  • Chocolate coatings: bitter pills?
  • Posted by Prof Ed , Director, Faculty Development at California State University Channel Islands on October 24, 2007 at 5:20am EDT
  • Rob Youmans conjectured to Jenny Franklin:

    "Well, you probably hear lots of these comments because of your choice of career (you work in the field of assessment, not teaching per se, correct?)."

    Rob, a literature search on student evaluations should have introduced you to Dr Franklin already. She's the author/editor on two of the major books and a load of papers on student evaluations that span well over a decade. Most of the researchers of student evaluations are professors who teach. Likewise, those who study evaluation are not necessarily the experts in assessment of student learning.

    I had no impression from either her post here or her well known publications that she argued for evaluations being "on completely solid ground and that the errors are all in our interpretation of them." Her papers I have read note that the damage that leads to a certain hatred of student evaluations is largely caused by misuse of a tool based upon inaccurate assumptions of what it should, can and cannot provide.

    The trend that students generally know when they are learning is on solid ground. This is one quality that makes it a useful measure worth taking. The problem is that this general trend established on large populations is not one of high predictability. Therefore, the trend doesn't apply well to evaluating small subsets or individuals. As a result, student ratings aren't safe to use by themselves to make decisions of the importance that affect any individual's livelihood.

    Another trend on solid ground is that student ratings derive largely from the affective domain and not only from supposedly pure cognitive considerations about pedagogy or content learning. If the tool only told us the degree to which students were satisfied, that alone would make it worth administering. Why shouldn't we want to know that? Who else should we ask other than the students?

    We have known generally of the importance of affect in education since 1964, when the seldom-read second companion volume to Bloom's Taxonomy of the Cognitive Domain affirmed the importance of the affective domain. It is a powerful influence in teaching, learning and life. Those who ignore the affective domain's power are unlikely to be very successful in any of the three.

    Nonverbal communication is surely a factor, as shown by thin-slices research where participants who viewed less than 30 seconds of silent videos of professors teaching gave ratings that correlated very strongly with the final ratings of students who actually took the classes of these professors. Meta-analyses show instructor enthusiasm as a dominant factor in student ratings--which brings me to the issue of chocolate.

    Did those who administered the survey with chocolate behave exactly as those who administered without it? Could eye contact, smiles, and the like, which are part of the act of distributing a treat, have driven up the ratings rather than the chocolate itself? Parts of good communication might be associated with the affirming behavior of giving a treat and acknowledging individuals in the class.

    In 2007, do we need a peer-reviewed article to confirm that we can measurably change the atmosphere in a room with others by how we act in that room? How do we know that this effect isn't what was really measured?

  • Comment to Ed
  • Posted by Rob Youmans on October 25, 2007 at 5:00am EDT
  • Ed,
    Even if you are right, and how the administrator of the evaluations acted in the room changed evaluations, and not the chocolate that was passed out, it still shows just how vulnerable student evaluations are to variables that are completely external to the class. Chocolate or body language, what's the difference? Neither has a thing to do with how well the class was taught. The work on thin slices suggests that energy in teaching correlates with students' overall ratings of teaching effectiveness, but my guess is that energy in teaching also correlates with teachers’ actual performance: teachers who care enough to make their teaching great are usually energetic, and students have learned this association. But I don't see the connection between that and my work, except that both have to do with evaluating a professor. My study shows that the mood in a classroom, brought on by something that has nothing to do with that class (either the chocolate, or if you prefer the less-likely-but-impossible-to-control-for body language of the person who gave out the chocolate) affects evaluations, and I think that is pretty interesting. The thin slicing literature is also very interesting, but I think the research in that area has very different implications.

    As for whether or not we need a study in 2007 on this topic, I guess I’d say that based on the range and number of comments on this webpage from faculty, we do!

  • The best conclusions are not based on exceptional data
  • Posted by Prof Ed, Director, Faculty Development at California State University Channel Islands on October 26, 2007 at 4:10am EDT
  • The fact that different administrators' manners might trigger different evaluation responses is important, because administrators are always present and chocolate isn't. It is important because, over large populations, varied administrators' behaviors are randomized. Over large populations, the trend established by meta-analyses of student performance (as measured by tests of known reliability) and student ratings of satisfaction is positive with r about equal to 0.5. That includes whatever random variations were introduced during administration of surveys. The fact that you could increase ratings with chocolate is not random variation; it is systematic manipulation and just not that surprising to those of us familiar with the ratings literature. Manipulation has been reported in many papers.

    Your ability to manipulate ratings with chocolate seems little different from the famous Dr. Fox study where participants gave high instructional ratings to emptiness delivered with great eloquence. The book Generation X Goes to College further reveals that one can systematically manipulate ratings by affective influence for a period long enough to get tenure. Anyone familiar with the literature need not guess that "energy in teaching" (enthusiasm) is a factor. In almost all significant studies, enthusiasm and promoting interest repeatedly stand out as the most important of teaching behaviors that garner high student ratings. In short, people made to feel good are more optimistic and more prone to give more optimistic responses than are those in a bad or indifferent mood. However, feeling good about a course does not mean that most got that feeling by being manipulated.

    I surely believe that you found a significant difference between evaluation of those manipulated and those not. What I would quarrel with is the baseless extrapolation, which seems to state (based on the tone of the IHE column): "I deliberately manipulated some student evaluations. All other evaluations done are represented by my study, and we should get rid of student evaluations."

    Creation scientists use the same reasoning to argue for an age of our planet of a few thousand years: point to a score of erroneous radiometric dates out of several tens of thousands that establish a clear pattern; ignore the pattern, and base a global conclusion on the exceptions. That's not science; it is delusion born of advocacy.

    Despite the many documented exceptions, the trend noted above seems unassailable. Every researcher with a huge database has gotten about the same r=0.5 correlation. Your exception can't change the pattern established: in general, students do know when they are learning.

  • Authority is a Poor Substitute for Science
  • Posted by Rob Youmans on October 26, 2007 at 4:00pm EDT
  • Ok, Ed writes:

    “The fact that different administrators’ manners might trigger different evaluation responses is important, because administrators are always present and chocolate isn’t. It is important because, over large populations, varied administrators’ behaviors are randomized.”

    That’s right, Ed, if random administrators were always the ones who gave out evaluations, then over time you would expect the administrators of evaluations to act in some ‘average’ way because of random sampling. It’s a good argument for having random administrators. Unfortunately, I bear witness to at least two universities where evaluations are NOT given out by administrators; they are given out by the professor of the class. You reason from a very false assumption when you assume that there is a dedicated team of administrators at every university that handles evaluations. For example, at my current institution (in your same CSU system), evaluation forms are left in my box, and I pass them out. So, one cannot rely on the power of random selection in that case because the administrator of the evaluation is not random.

    “Your ability to manipulate ratings with chocolate seems little different from the famous Dr. Fox study where participants gave high instructional ratings to emptiness delivered with great eloquence. The book Generation X Goes to College further reveals that one can systematically manipulate ratings by affective influence for a period long enough to get tenure. Anyone familiar with the literature need not guess that “energy in teaching” (enthusiasm) is a factor.”

    Yes, and my point is that enthusiasm is a VALID factor upon which to rate an instructor, whereas chocolate, or the demeanor of the administrator, is not.

    “I surely believe that you found a significant difference between evaluation of those manipulated and those not. What I would quarrel with is the baseless extrapolation, which seems to state (based on the tone of the IHE column): “I deliberately manipulated some student evaluations. All other evaluations done are represented by my study, and we should get rid of student evaluations.””

    One famous problem with email or other typed messages is that, in fact, they have no tone. I think you have greatly mistaken my meaning in reporting my findings. But rather than your interpretation of my tone, let me refresh your memory using the above article itself (This is a direct quote from the article, just look up):
    “Youmans said the study isn’t an attempt to criticize the student evaluation of faculty as a valueless exercise. But by showing how easily manipulation can occur, the researchers say they want colleges to keep the evaluation results in context.”
    And here is a direct quote from my co-author, just scroll to the top of this page to see it for yourself...
    “There are lots of factors that affect a student’s evaluation of a professor,” Jee said. “Some will be hard to change and may be legitimate. Our argument is not that instructors should benefit by giving chocolate but that all evaluations should be given in more standardized ways to limit the effects of extraneous things like how they are handed out.”
    So, I really just think you somehow did not read those lines, because they are pretty clearly not saying what you interpreted as our meaning.
    You end with the unkind comparison:
    “Creation scientists use the same reasoning to argue for an age of our planet of a few thousand years: point to a score of erroneous radiometric dates out of several tens of thousands that establish a clear pattern; ignore the pattern, and base a global conclusion on the exceptions. That's not science; it is delusion born of advocacy.”

    I agree, but we are not guilty of this! Again, did you read the article above? However, I might add that another delusion of science, one you appear to be in danger of committing, is reasoning from authority. I would hope that young scientists do not abandon their research because “the famous studies of Dr. Fox” or the famous Dr. Franklin, or the famous Prof. Ed and company who were not surprised and who knew all of this already said so. As I recently taught my research methods class, science is about data and scholarship (reading helps too), but not about intimidation. I am aware that there are many other very great scientists who have much to say on this topic. On the other hand, the earth is not the center of the solar system, in spite of famous people who said so.

  • statistical misunderstandings
  • Posted by david, Assistant Prof on November 5, 2007 at 12:15pm EST
  • The meaning of statistical "significance" is unclear here. Why are multiple tests anything about which to worry? Why mention two values for alpha? In any case, why would you suspect that your nil null hypothesis could be exactly true?

    The reference to Cohen's standards is equally problematic. Of what relevance are the findings of Cohen's literature review to the substantive interpretation of the observed difference (standardized or otherwise) between classes in this study?

    When will the thoughtless and mechanical application of inferential statistical tools end?