Assessing the Assessments

November 5, 2009


When the country's two major associations of public universities were trying to craft a new accountability system three years ago, they found that many of their member institutions (and especially their faculties) were dead set against the idea of choosing one measure of student learning outcomes.

"Their reaction was, we don't want a single test along the lines of No Child Left Behind -- we want multiple tests from which to choose," said David Shulenburger, vice president for academic affairs at the Association of Public and Land-Grant Universities, which designed the Voluntary System of Accountability along with its partner, the American Association of State Colleges and Universities.

In response, the groups settled on three possible options that institutions could use to fulfill the "student learning outcomes" portion of the VSA (the Council for Aid to Education's Collegiate Learning Assessment, the Educational Testing Service's Measure of Academic Proficiency and Progress, and ACT, Inc.'s Collegiate Assessment of Academic Proficiency), thereby avoiding the single test problem.

But it created another potential issue, Shulenburger says: uncertainty about whether the results on one test (chosen by one institution) would be comparable to the results for another institution that chose another of the three tests, and the possibility that institutions would try to game the system by seeking to use a test on which they thought they might perform better.

On Tuesday, the groups released a federally funded analysis of a "test validity study" conducted by the makers of the three tests showing that the three tests produced comparable outcomes at the institutional level, based on having been administered at a diverse range of 13 institutions, big and small, public and private.

In other words, a college that ranked in the 95th percentile for critical thinking using one of the tests would rank in roughly the same place using the critical thinking component of one of the other two tests, and vice versa.

The study, which was part of a larger $2.4 million grant financed by the Fund for the Improvement of Postsecondary Education and led by the Association of American Colleges and Universities, doesn't necessarily mean that the tests measure exactly the same thing, given their differences, but that an institution will fare (or "rank") essentially the same no matter which measure it uses.

The significance of that finding, in Shulenburger's view, is that "it means that within the VSA, we can offer some diversity in measurement" to satisfy faculty and other concerns about a one-size-fits-all approach "and still be able to say that we're using consistent measurement from school to school."

The study may have solved that political problem for VSA, but it did nothing to ease concerns among those (including some leading psychometricians and researchers) who question the accountability system's underlying dependence on tests that purport to measure student learning, and especially an institution's role in driving that improvement among its students.

"Even if the tests do measure the same thing, there is no evidence that they measure learning and, more specifically, learning that is the result of what the student has experienced in college," Victor Borden, associate vice president for university planning, institutional research, and accountability at Indiana University at Bloomington, said in an e-mail message.

While he acknowledged that the study released Tuesday was not intended to prove the tests' ability to measure student learning, and that the creators of the Voluntary System of Accountability cited previous validity studies in embracing the CLA, MAPP and CAAP three years ago, Borden is unpersuaded. "The research conducted to date does not demonstrate that these exams measure any aspect of college learning."

Shulenburger, in reply, said only that the public university groups had been satisfied by the existing evidence that the three tests can be used to measure the "value added" student learning that colleges and universities contribute.

That argument is unlikely to be settled for some time. But the release of the study on the three tests' comparability raises some other, more immediate issues.

With the tests' predictive powers eliminated as a reason for choosing one over another, since institutions would fare comparably whichever they chose, colleges can now focus on other factors in deciding which of the three exams to use, the researchers said. "[T]he decision about which measures to use will probably hinge on their acceptance by students, faculty, administrators and other policy makers. There also may be trade-offs in costs, ease of administration, and the utility of the different tests for other purposes, such as to support other campus activities and services," the VSA analysis says.

What that may mean for the three tests and their providers is unclear. Backers of the Collegiate Learning Assessment have been viewed (and resented) in some quarters as arguing, often none too subtly, that their test is better than the others at measuring value-added learning. With that advantage arguably wiped away by the comparability study, Steve Klein, director of research at the Council for Aid to Education, focused on what students might learn -- and what institutions might want them to gain -- from taking the CLA. Unlike the more standardized CAAP and MAPP tests, the CLA focuses on giving students problems to solve.

"The skills that you would need to do one are different from the skills you'd need to do the other," Klein said in a telephone interview Tuesday. "When we look at the mission statements of colleges, they emphasize the kinds of things we're testing. What message do you want to send to students and faculty about the skills you think are important? Is it about regurgitation, or the kinds of analysis you'd have to do to take a test like the CLA?"

But some testing experts speculated that the finding that the tests predict equivalently could hurt the CLA, which is significantly more time-consuming, and somewhat more costly, for colleges to administer (though its protocols call for fewer students to be tested than do those of its competitors).

Jim Sconing, who directs ACT's statistical research department and represented it on the FIPSE study, said there was no doubt that "some colleges prefer the open-ended type of questions" contained in the CLA, because they "think it has more face validity with their faculty." But "other people are drawn to the fact that multiple choice tests tend to have higher reliability," Sconing said -- an assertion challenged by the FIPSE validity study, Klein said.

And many colleges, Sconing added, are increasingly likely to "base their choice of test on other things, such as ease of use and, yes, cost."

The validity test also could end up opening the way for more competition for all three of the tests that are already considered VSA-worthy, Shulenburger noted.

"We now have a benchmark for considering the addition of other measures of value added learning outcomes," he said. "If folks come up with other value added measures that correlate highly with one or two of these, then perhaps we have a candidate for adding other measures."


Comments on Assessing the Assessments

  • More needed...
  • Posted by David Eubanks on November 5, 2009 at 7:15am EST
  • The standardized tests are convenient, but largely dissociated from actual program-level learning outcomes. Some of the latter can be found by patient searchers who sift through accreditation reports online (see: http://highered.blogspot.com/2009/11/404-learning-outcomes.html ) but it would be great to have something like the VSA that focuses on program-level outcomes. I've only scanned the validity report so far, but the idea that a bunch of standardized tests tend to agree with one another isn't surprising. It also doesn't mean that the results are valid (i.e. useful) for the classroom instructor or program coordinator, or a potential student who wants to know how good the chemistry program is. It's wishful thinking, if that's the intent.

  • Measurement of Gen Ed outcomes
  • Posted by Bob Kallgren , VP Institutional Effectiveness at Columbia International University on November 5, 2009 at 8:30am EST
  • The previous comment is true for program-level outcomes measurement, but don't these instruments serve a useful purpose for general education outcomes measurement on the institutional level? I'm sure many of us are hoping so! :)

  • Comparable Results does not Value Added Make
  • Posted by Victor Borden , University Planning, Institutional Research and Accountability at Indiana University on November 5, 2009 at 8:30am EST
  • David Shulenburger's Comment, 'the public university groups had been satisfied by the existing evidence that the three tests can be used to measure the "value added" student learning that colleges and universities contribute' is at the heart of the matter. None of the research to date has provided any direct evidence to back this assertion and the indirect evidence is weak and inconsistent. The intentions of Shulenburger and his colleagues involved with the VSA and other such efforts are earnest, well intentioned, and address a very important objective. Moreover, the test-makers have done due diligence to ensure that these tests differentiate among examinees in a way that demonstrates variation in skills and abilities that are in the domain of critical thinking and other general learning outcomes. However, we are doing a great disservice to assessment for both accountability and improvement purposes if we skirt or mask the most important aspect of measure validation as posited by such leaders in the field of measurement as Messick and Cronbach: The validity of a measure is based on evidence regarding the inferences and assumptions that are intended to be made and the uses to which the measure will be put. Showing that the three tests in question are comparable does not support Shulenburger's assertion regarding the value-added measure as a valid indicator of institutional effectiveness. The claim that public university groups have previously judged the value-added measure as appropriate does not tell us anything about the evidence upon which this judgment was based nor the conditions under which the judgment was reached. As someone familiar with the process, I would assert that there was no compelling evidence presented that these instruments and the value-added measure were validated for making this assertion (no such evidence was available at the time), which is the intended use in the VSA.

  • criterion validity
  • Posted by Dan Bernstein , Center for Teaching Excellence at University of Kansas on November 5, 2009 at 9:45am EST
  • Before we endorse any short-cut instrument to measure learning, it should be demonstrated that scores on the short-cut are highly related to a richer measure of understanding, not just to other brief instruments. I recently heard a presentation (at Indiana University) by researchers from the University of Cincinnati who have looked for such a correlation between CLA scores and highly reliable judgments of the same skills as demonstrated in course-embedded assessments. In this first study (there are others underway), there was little or no relation between CLA scores and course-embedded assessment. More research needs to be done, but we should all be following this line of inquiry to address the criterion validity of CLA and its cousins. This is a much more important question than whether the three standardized tests agree with each other. Unless they are a good short-cut instrument to measure something faculty and communities care about, they can't be offered as a substitute for reading and evaluating student work.

  • Sure, but
  • Posted by David Longanecker , President at WICHE on November 5, 2009 at 11:30am EST
  • All of the comments posted prior to mine are legitimate, but I am nonetheless very pleased to see the results of the FIPSE study and pleased that the VSA is using these instruments to look at student learning. I am so pleased for two reasons. First, assessment of student learning's time has come and we can't let the perfect be the enemy of the good. Yes, we have work to do to improve our assessment of student learning. But while we enjoy arguing about external versus internal validity, reliability, significance of differences, value added, etc., the rest of the World, including those policy makers upon whom we rely for both public and financial support, see us as the Emperor with no clothes. As Pirsig so aptly put it in Zen and The Art of Motorcycle Maintenance, "if you can't say what quality is, . . . then for all practical purposes it doesn't exist." So, we have evidence now that these are pretty good measures of whether students are learning what they need to know and be able to do, and we should use them. Second, because they are not perfect but are going to be used, we as an academic community have both the responsibility and incentive to improve upon the measures; not just talk about their limits. So, bravo to VSA and FIPSE.

    Dave Longanecker

  • The Easy Way Out
  • Posted by Cliff Adelman , Senior Associate at Institute for Higher Education Policy on November 5, 2009 at 11:45am EST
  • What does your daughter or brother-in-law talk about learning in college at the Thanksgiving dinner table? "Critical Thinking" (whatever that is)? Making and Breaking Arguments (half of the CLA)? Really? I don't know about you, but I learned a lot about Ferro-liquids from a son majoring in Chemistry, and about undercurrents in the French Enlightenment from another son majoring in History, and I might have come out with a better understanding of the Physicians' Desk Reference from a child majoring in Nursing or the difference between FASB and GASB from an accounting major. Eubanks is right on: these tests do not assess the reasons our children or other relatives go to college to study. They measure only what is indirectly taught (maybe), and their results have no impact whatsoever on the lives and learning of students. Too, with small samples of voluntary test-takers (however much the sellers of these tests tell you that those samples are "representative"), they provide an easy way out for academic administrators who want to avoid the time-and-effort consuming but incredibly valuable task of developing detailed major program learning outcome statements (even the specialized accrediting bodies don't get down to the level of discrete, operational statements that guide faculty toward appropriate assessment design). We now have, instead, the model of the European "Tuning" projects in the disciplines, under an exploratory trial in the U.S. in three state systems (Indiana, Minnesota, and Utah). Watch for the results of this first exploration and where the process subsequently seeds itself in distinctly U.S. contexts. When you get done with Tuning, you won't need to buy those other exams -- and, not so by-the-way, the principal beneficiaries are all the students in a major program.

  • assessment, the hard way
  • Posted by OldProf on November 5, 2009 at 12:45pm EST
  • This week my students handed in essays on a topic arising out of the readings in my course. I spent a very long time reading each one of them. I labelled the strong points "good" and briefly wrote a note about why they were good. I indicated places where the individual student could have done things better, or really got things awfully wrong. I made a brief suggestion at the end of each essay about what to concentrate on in their next written assignment. Several students came to see me and discuss my comments and suggestions.

    That's education, folks. That's assessment. That's what you're paying for when you send your kids to a college with a decent-sized student-faculty ratio.

    If somebody really cared about "value added," they could look at each student's first essay in this course, and compare it with that same student's last essay in this course. This person could then evaluate each individual student's increased mastery of the subject-matter in the course (there's a lot) and also the increased writing skill, if any.

    We teach sophisticated subject-matter in college. Most students know relatively little about, say, anthropology, or history of science, or organic chemistry, or Japanese painting, when they come to college, and taking a standardized test on these subjects at the beginning of their college career would produce relatively low scores. But we also use and build on more general skills, like critical thinking and the ability to write clearly. Students are admitted to selective colleges because they have these skills to a greater extent than students who are rejected. These skills cannot be separated out from student success in learning sophisticated subject-matter, because understanding anthropology, or history of science, or organic chemistry, or Japanese painting, is not a matter of absorbing individual facts, but learning facts and ways of thinking about them in a seamless, synthetic way. No assessment scheme that neglects these obvious facts about higher education is going to do anybody any good, and we'll be wasting valuable intellectual and financial resources if we try to design one.

    The most important assessment instrument is called Exam Week. Check it out.

  • Efficient, Valid, Cost-Effective, Respected - CT Measures
  • Posted by Peter Facione , Research Director on November 5, 2009 at 1:45pm EST
  • No question finding a critical thinking testing tool that is valid, reliable, cost-effective, efficient to use and student friendly is tough enough. Finding one that also has the face validity to pass muster with a skeptical faculty committee not versed in measurement science, adds additional challenges. For twenty years our research group has been addressing these issues. Today we have academic, governmental, business and NGO clients using our critical thinking assessment tools on-line or in paper-and-pencil form in English or in one of their many authorized translations.

    The California Critical Thinking Skills Test (CCTST), used by hundreds of colleges and universities throughout the US and worldwide, has quietly become one of the leading measures of critical thinking skills. Its companion tool, the California Critical Thinking Disposition Inventory (CCTDI), assesses the habits of mind which incline one to use critical thinking in real world problem solving and decision making. Research shows that having the skills to think well and having the disposition to use those skills in key judgment situations is not highly correlated and yet few educational settings are taking this into account.

    The validity and utility of our measurement tools have been affirmed by independent researchers and their studies are published in peer reviewed journals. Today the tests are used for a variety of purposes including learning outcomes assessment, curriculum evaluation, educational research and program admissions data gathering.

    Professional fields use versions of the CCTST designed to assess critical thinking skills in their practice and educational contexts. The Health Sciences Reasoning Test, the Business Critical Thinking Skills Test, the Legal Studies Reasoning Profile and the Military and Defense Critical Thinking Inventory all are versions of the CCTST tailored to these various professional groupings. These tools have been proven to predict professional licensure, and academic and workplace success. They are known to be user-friendly, well-designed, valid, efficient and cost-effective.

    So, why do we not hear more about the California Critical Thinking family of instruments in the ongoing national discussions about college learning outcomes assessment? Frankly, we wonder about this ourselves. But, for whatever reason, the CCTST and the CCTDI have made their own way without the same level of national attention.

    These critical thinking measures are published by a for-profit company that gives deep discounts for non-profit users and that shares our deep commitment to fostering critical thinking in academic and workplace settings. That company, Insight Assessment, has been around for many years, serving clients and supporting faculty, deans and doctoral students across the full spectrum of disciplines with testing tools that work.

  • abandon this project
  • Posted by Chris , BMF at HKU on November 5, 2009 at 3:30pm EST
  • I agree with the tone of most commenters. Assessment using standardized tests is bad enough in K-12 (and there are strenuous disagreements over "value added" due to self-selection into schools, a problem that would only be worse in college studies), but it is ludicrous at the college level. Before the cries of "you are just scared of accountability" emerge, I am accountable to my peers who are competent to judge the quality of my courses.

  • Assessment missing
  • Posted by RBG on November 5, 2009 at 5:30pm EST
  • How long before one of these experts will make an effort to test how much the student wants to learn? There is an old saw about a horse and water, but all we ever see considered is the water -- never the horse.

  • Reply to President Longanecker
  • Posted by Dan Bernstein on November 5, 2009 at 7:15pm EST
  • You wrote: Second, because they are not perfect but are going to be used, we as an academic community have both the responsibility and incentive to improve upon the measures; not just talk about their limits.

    Like the other earlier commenters, I agree with you, and that is what we are trying to do. One of the partners with APLU in the FIPSE grant is the Association of American Colleges and Universities, and its LEAP and VALUE projects are doing just what you ask for. They are national in scope, and their products can be used reliably with prepared readers. Why should we accept a short-cut assessment of learning just because our faculty colleagues think it is not worth their time? If learning is important, then we should be willing to consider a measuring system that gets at what we care about, even if it means having faculty time devoted to it. One of the loudest arguments in favor of standardized tests is pure convenience. Asking for real faculty judgment is not asking for perfection, just asking for some commitment to academic success.

  • It does matter which test is selected
  • Posted by Jeremy Penn on November 6, 2009 at 10:00am EST
  • The article states, "In other words, a college that ranked in the 95th percentile for critical thinking using one of the tests would rank in roughly the same place using the critical thinking component of one of the other two tests, and vice versa."

    In practice, no institution administers all three of these exams in the same year.

    The institutional scoring process used to determine whether or not your institution is "above, below, or at" expected is based on comparisons to all other institutions who completed that specific exam (e.g., CLA). Therefore, the institution's choice of which exam to use can make a big difference in your institution's final result because it is relative to the performance of all of the other institutions who used that specific exam. An institution trying to 'game' the system could select the test they believe gives them the best chance to outperform the other institutions who select the same test.

  • Outcomes Assessment
  • Posted by Mildred Hopenzsche, EHD on November 8, 2009 at 3:15pm EST
  • The Buddha says the best results will come if you detach from outcomes.

  • I agree with Mildred Hopenzsche
  • Posted by DFS on November 8, 2009 at 10:00pm EST
  • And, if assessment were assessed accurately -- and I realize that's a difficult concept for assessors -- they would assess that their product is shit.

    In fact, they may even realize that 'assessment' is not the way to go. Let's instead return to what worked before man landed on the moon.

    We wouldn't be able to get there now, I'm afraid, without further casualties.

    Despite all of our present Bells & Whistles.

  • Self-reflecting validation?
  • Posted by PQuincy , Prof at a middling R1 on November 11, 2009 at 5:15am EST
  • Given the lack of shared specific 'learning goals' in much of college education, and the utter vagueness of the goals that could be generally shared ('critical thinking'), the assessment movement has not gotten very far in convincing us faculty that their work enjoys the kind of rigor they claim.

    That's not to say that self-reflection as well as examination of graduates and near-graduates is not valuable, and that curricula, teaching methods and teachers themselves should not be subject to review.

    Still, I have to wonder: how much of the 'success' recorded by the fans of assessment results simply from the fact that they are paying attention? I'm reminded of the early time-motion studies in factories that seemed to be resulting in splendid improvements...until someone thought to run a control in which researchers in white coats entered the factory floor, listened carefully to the workers with respect, and instituted utterly random changes -- which also led to splendid improvements. The acts of testing and measuring and caring may generate 'improvements' -- but that does not mean that the changes instituted because of the testing are causing the improvements.

    When dealing with humans -- teachers and students both -- reflexive effects matter a lot. Again, that isn't to say that both empirical measurement (of what, though?) and systematic assessment, feedback, and adjustment in light of the measurements isn't sometimes a valuable exercise. But I have yet to be persuaded that the novel apparatus arising to do so is any better at it than previous generations' novel apparatuses, which we still continue to apply.