In a rare moment of inattention a couple of years ago, I let myself get talked into becoming the chair of my campus’s Institutional Review Board. Being IRB chair may not be the best way to endear oneself to one’s colleagues, but it does offer an interesting window into how different disciplines conceive of research and the many different ways that scholarly work can be used to produce useful knowledge.
It has also brought home to me how utterly different research and assessment are. I have come to question why anyone with any knowledge of research methods would place any value on the results of typical learning outcomes assessment.
IRB approval is required for any work that involves both research and human subjects. If both conditions are met, the IRB must review it; if only one is present, the IRB can claim no authority. In general, it’s pretty easy to tell when a project involves human subjects, but distinguishing nonresearch from research, as it is defined by the U.S. Department of Health and Human Services, is more complicated. It depends in large part on whether the project will result in generalizable knowledge.
Determining what is research and what is not is interesting from an IRB perspective, but it has also forced me to think more about the differences between research and assessment. Learning outcomes assessment looks superficially like human subjects research, but there are some critical differences. Among other things, assessors routinely ignore practices that are considered essential safeguards for research subjects as well as standard research design principles.
A basic tenet of ethical human subjects research is that the research subjects should consent to participate. That is why obtaining informed consent is a routine part of human subject research. In contrast, students whose courses are being assessed are typically not asked whether they are willing to participate in those assessments. They are simply told that they will be participating. Often there is what an IRB would see as coercion. Whether it’s 20 points of extra credit for doing the posttest or embedding an essay that will be used for assessment in the final exam, assessors go out of their way to compel participation in the study.
Given that assessment involves little physical or psychological risk, the coercion of assessment subjects is not that big of a deal. What is more interesting to me is how assessment plans ignore most of the standard practices of good research. In a typical assessment effort, the assessor first decides what the desired outcomes in his course or program are. Sometimes the next step is to determine what level of knowledge or skill students bring with them when they start the course or program, although that is not always done. The final step is to have some sort of posttest or “artifact” -- assessmentspeak for a student-produced product like a paper rather than, say, a potsherd -- which can be examined (invariably with a rubric) to determine if the course or program outcomes have been met.
On some levels, this looks like research. The pretest gives you a baseline measurement, and then, if students do X percent better on the posttest, you appear to have evidence that they made progress. Even if you don’t establish a baseline, you might still be able to look at a capstone project and say that your students met the declared program-level outcome of being able to write a cogent research paper or design and execute a psychology experiment.
From an IRB perspective, however, this is not research. It does not produce generalizable knowledge, in that the success or, more rarely, failure to meet a particular course or program outcome does not allow us to make inferences about other courses or programs. So what appears to have worked for my students, in my World History course, at my institution, may not provide any guidance about what will work at your institution, with your students, with your approach to teaching.
If assessment does not offer generalizable knowledge, does assessment produce meaningful knowledge about particular courses or programs? I would argue that it does not. Leaving aside arguments about whether the blunt instrument of learning outcomes can capture the complexity of student learning or whether the purpose of an entire degree program can be easily summed up in ways that lend themselves to documentation and measurement, it is hard to see how assessment is giving us meaningful information, even concerning specific courses or programs.
First, the people who devise and administer the assessment have a stake in the outcome. When I assess my own course or program, I have an interest in the outcome of that assessment. If I create the assessment instrument, administer it and assess it, my conscious or even unconscious belief in the awesomeness of my own course or program is certain to influence the results. After all, if my approach did not already seem to be the best possible way of doing things, as a conscientious instructor, I would have changed it long ago.
Even if I were the rare human who is entirely without bias, my assessment results would still be meaningless, because I have no way of knowing what caused any of the changes I have observed. I have never seen a control group used in an assessment plan. We give all the students in the class or program the same course or courses. Then we look at what they can or cannot do at the end and assume that the course work is the cause of any change we have observed. Now, maybe this a valid assumption in a few instances, but if my history students are better writers at the end of the semester than they were at the beginning of the semester, how do I know that my course caused the change?
It could be that they were all in a good composition class at the same time as they took my class, or it could even be the case, especially in a program-level assessment, that they are just older and their brains have matured over the last four years. Without some group that has not been subjected to my course or program to compare them to, there is no compelling reason to assume it’s my course or program that’s causing the changes that are being observed.
If I developed a drug and then tested it myself without a control group, you might be a bit suspicious about my claims that everyone who took it recovered from his head cold after two weeks and thus that my drug is a success. But these are precisely the sorts of claims that we find in assessment.
I suspect that most academics are either consciously aware or at least unconsciously aware of these shortcomings and thus uneasy about the way assessment is done. That no one says anything reflects the sort of empty ritual that assessment is. Faculty members just want to keep the assessment office off their backs, the assessment office wants to keep the accreditors at bay and the accreditors need to appease lawmakers, who in turn want to be able to claim that they are holding higher education accountable.
IRBs are not supposed to critique research design unless it affects the safety of human subjects. However, they are supposed to weigh the balance between the risks posed by the study and the benefits of the research. Above all, you should not waste the time or risk the health of human subjects with research that is so poorly designed that it cannot produce meaningful results.
So, acknowledging that assessment is not research and not governed by IRB rules, it still seems that something silly and wasteful is going on here. Why is it acceptable that we spend more and more time and money -- time and money that have real opportunity costs and could be devoted to our students -- on assessment that is so poorly designed that it does not tell us anything meaningful about our courses or students? Whose interests are really served by this? Not students. Not faculty members.
It’s time to stop this charade. If some people want to do real research on what works in the classroom, more power to them. But making every program and every faculty member engage in nonresearch that yields nothing of value is a colossal, frivolous waste of time and money.