The new U.S. secretary of education, Arne Duncan, will be a fresh and welcome breeze for those in higher education who have struggled so mightily to be understood in the past eight years. He is known for his collaborative and inclusive style, and he brings at least a little actual experience, if not a degree, in education.
It should not be expected, however, that Secretary Duncan’s tenure at the Education Department will automatically defuse the longstanding pressure that advocates of assessment for the purpose of accountability have placed on advocates of assessment for program and institutional improvement. First, he relied heavily on the results of standardized testing to drive reform in his previous position as chief executive of the Chicago Public Schools. Second, forces remain strong in Congress and among business interests for measurements of learning outcomes that can be used to compare institutions and hold them accountable. And third, faculty are not going to change their opinions any time soon about standardized tests.
The impasse is alive and well.
At the core of the problem is an issue that hasn’t been discussed much when it comes to measuring learning outcomes in the higher cognitive skills: We literally don’t know what we’re talking about. Oh, we’re fairly specific and certain when it comes to outcomes of content knowledge in the disciplines, and there doesn’t seem to be much of a snag in assessing the kind of skills that require students to arrive at a single right answer, such as calculating the trajectory of a missile or transposing a musical score to another key.
Higher cognitive skills, however, are hard to measure, precisely because it is impossible to measure something well if one doesn’t know exactly what it is, and frankly, defining those skills is the part that has been largely ignored. The first step in assessment, according to most authorities, is to define what students are expected to learn. There is widespread agreement that “critical thinking,” for example, is terribly important to teach: the term pops up in nearly every curriculum guide and college catalog. There is no agreement, however, about what critical thinking is.
Hundreds of definitions and even a few good theoretical constructs float through the literature, but none of them has gained enough currency to serve as the foundation for widespread assessments. A few common threads run through the various notions, of course, but assessments require scorers to make fine judgments based on specific criteria. A slightly different emphasis or turn of phrase in the criteria will result in widely variable judgments.
The fundamental difficulty in developing definitions of critical thinking precise enough to use in assessments of student work is that no such definitions exist as concrete reality, out there, as they say, in a positivist sense. By contrast, if we were measuring gravity, we could probably rely upon gravity existing and acting the same way regardless of whether it is being investigated by a physicist in Indiana or a physicist in India.
Furthermore, it seems likely that gravity would carry on, dragging every bouncy thing back to earth, even if the human race were wiped out by aliens.
Could the same thing be said of critical thinking? If there were no humans to think, would critical thinking exist? (Please don’t bring up chimpanzees -- that’s different.) Critical thinking probably exists only as we humans think it up, and it is therefore socially constructed, highly dependent upon specific social, historical, and cultural contexts, and doomed forever to evolve as the people who use it evolve. Definitions of critical thinking have meaning to the persons who use them communally in everyday discourse, thereby developing common understandings of them based in real-life situations over time, but the definitions are not portable from Indiana to India in the same way gravity is.
How, then, can critical thinking be assessed? Many standardized tests purport to measure critical thinking, but test makers fail to explain exactly what is being measured and how. One can read the definitions or descriptions of critical thinking found in their promotional literature, of course, but those should be actively ignored, since they do not necessarily bear a close resemblance to what is actually being tested.
To discern exactly what definitions or constructs are actually tested, one would have to examine not only the test itself, but the procedures for scoring the student responses. Who is hired to do the scoring, how are they trained, and what models of student work are used in the process of norming the scorers? Policy makers and the public generally trust the technical expertise of test developers to answer such questions.
They should not. These questions are tricky, and they go to the very heart of what it is we value and attempt to teach. Let’s say, for example, that we hope to teach students to make “logical inferences.” What counts as a logical inference? If Franklin examines the information in a test question and concludes that “a bailout is necessary to save the financial system,” and Eleanor reads the same material and concludes that “a bailout is a violation of free market principles,” which one has made a logical inference? Which one earns a score of six and is admitted to Harvard, and which one earns a score of one and goes to community college?
Test makers, of course, avoid such controversial, value-laden, and messy questions and stick with something easier to score. And there’s the rub.
The critical thinking skills students actually need pertain to real-life, ill-structured, very messy problems, while the tests require something far simpler so that reliable scoring can be obtained. Franklin and Eleanor can answer the question with almost any opinion, the test makers say, as long as they can support it.
Test-preparation companies, which along with testing companies profit richly from the current reliance on standardized tests, suggest that supporting paragraphs on essay questions consist of one example from literature, one from history, and one from personal experience, with a couple of big words thrown in somewhere. In other words, “critical thinking” consists of supporting whatever opinion a student holds when walking in the door for the test with whatever evidence comes to mind. That is what the scoring mechanism rewards. Even the much-touted Collegiate Learning Assessment, which at least presents a problem to be solved by examining some documents, uses highly structured, artificial materials with carefully planted faux pas of logic for students to catch.
The scoring mechanism, indeed the entire testing regimen, fails to reward many of the intellectual habits associated with critical thinking that we would most like students to develop: obtain as much information as possible; read about the problem, or discuss it with people who hold a variety of perspectives, to understand its complexity; evaluate the credibility and trustworthiness of sources carefully; double-check the accuracy of facts, data, quotations (and spelling!); reject snap judgments and easy conclusions based on ideology; take time to reflect deeply upon how one’s own values, motives, and assumptions are influencing one’s thinking; consider the long-term ramifications of opinions in the light of fundamental values; and be willing to modify opinions and decisions when warranted by new information and thinking. Under test conditions, there is no time and no opportunity for that sort of thing, short of cheating.
When high stakes are attached to the results of simplistic tests that reward only superficial thinking, faculty must focus on those lower-level, facile skills that will improve student scores, while the most important goals of a liberal education in the classic sense (facing ill-structured problems with mature thinking) are neglected. The consequence is a dumbing-down of the entire educational enterprise. Although faculty have not generally been persuasive in articulating it, that is the primary reason for their resistance to standardized testing. If Duncan can grasp this single idea, it could be the beginning of a dramatic breakthrough in the assessment impasse.
Before serious progress can be made in measuring higher cognitive skills such as writing and critical thinking, higher education will have to work long and hard on developing clear definitions of what the skills are. Here are several suggestions for that work:
1) Make the conversation broad-based and inclusive. While a few experts can certainly write some nice-sounding phrases, those phrases will lack meaning and precision for anyone who has not been party to the conversation.
2) Rather than a top-down process in which a small number of people control the conversation, institute a bottom-up process that engages all willing faculty at the local level in developing their own unique and meaningful definitions. Later, work gradually to find commonalities among those definitions and develop a variety of national definitions that meet the needs of our varied institutions.
3) Put the conversation online so that all faculty responsible for teaching the skills can participate in constructing their definitions and access the reasoning and examples on which they are based.
4) Frame the definitions with both the theoretical constructs and the empirical research from the relevant disciplines. This provision meets the requirement of the definition of validity jointly adopted by the American Psychological Association (APA), the National Council on Measurement in Education (NCME), and the American Educational Research Association (AERA): “Validity refers to the degree to which evidence and theory support the interpretation of test scores entailed by proposed uses of tests.”
5) Accompany each part of the definition with samples of actual student work that illustrate the level of achievement expected. Without such grounding in real examples, abstract terms such as “convincing,” “appropriate,” or “logical” have little meaning in the task of assessing student work.
6) View the work as an ongoing project rather than a discrete event. Times change, people change, cultures change, and the kinds of critical thinking and communication skills students need in the future will be different from what is required now, just as the requirements of Eisenhower’s day, or Lincoln’s, or Washington’s, are outmoded for our time.
7) Be patient. Quality work takes time. The Bologna Process has been in the works for 10 years, and its participants are just now “fine-tuning” standards. Allow two to three years for local startups to become fully functional, and another two years for campuses to visit each other’s Web sites, find peer institutions in the region whose missions and students are similar, and unite their “online assessment communities” (accreditors can facilitate this process). Then allow another few years for regional groups, first visiting and then participating in each other’s online “assessment communities,” to work toward national standards for each kind of peer institution.
If Secretary Duncan hopes to institute accountability in higher education more successfully than his predecessor did, he should start by tackling the problem of developing clear constructs of higher cognitive skills, supported by theory and research, generated by widespread faculty participation, and grounded in samples of actual student work. Providing Web space and technical assistance for assessment communities would be a great contribution the federal government could make to that local work. Only after general agreement has been reached about what higher cognitive skills are can they be assessed meaningfully. And by then, standardized tests won’t be needed at all, because faculty will be assessing those skills quite well, from Indiana to India, as if they were gravity, without any external intervention.