How to Reform Testing
Many educators believe that standardized tests, such as the SAT and the ACT, do not fully measure the spectrum of skills that are relevant for college and life success, and yet, year after year, the same tests hang around with only cosmetic changes. What is to be done?
First, one needs to understand the persistence of standardized tests, and then one can address what can be done. Consider each in turn.
At the beginning of the 20th century, tests like the SAT made reasonably good sense. Most test-takers were upper-middle-class or upper-class white males, so there was little variation in socioeconomic status. Differences in test scores thus largely represented differences in achieved academic skills rather than differences in opportunities to achieve those skills. But as the testing population broadened, test scores increasingly reflected differences in opportunity: socialization in the home, school quality, and the willingness and ability of parents to invest in their children's education.
Standardized tests have great staying power for several reasons, described in more detail in my 2010 book, College Admissions for the 21st Century. Here are some of them:
- The Appearance of Quantitative Precision. The tests give an exact set of numbers as scores, so they appear to be very precise, and people are enchanted with the appearance of precision. Many end-users do not fully take into account the standard error of measurement: a score represents not a pinpoint-accurate level but rather a fairly wide range of possible levels of achievement. End-users may take even less account of the standard error of estimate: test scores predict only quite imprecisely the criteria they are supposed to predict.
- Similarity. To get to a position in which you make decisions about applicants, in part on the basis of their test scores, you had to do well enough yourself on the tests to reach that position of decision-making authority. A fundamental principle of interpersonal attraction is that we are attracted to others like ourselves. So the decision-makers select those who, like themselves, did well on the tests.
- Accountability. Admissions officers may be afraid that if they admit students with lower scores and the students do not perform well in college, it will make them — the admissions officers — look bad. If they admit students with high scores and the students do not do well, they always can blame the tests.
- Publication Mania. Colleges are evaluated by U.S. News & World Report and other media in part on the basis of standardized test scores. So the system becomes self-perpetuating: Colleges admit students to get high ratings, and to keep their high ratings (or to rank higher), they need to rely ever more on the tests.
- Sunk Costs. Society has invested so much in testing (whole admissions and scholarship systems are built around it) that it is hard to admit that something that may have worked fairly well long ago is today far from optimal.
- Rewards for True Believers. As is the case with pharmaceutical companies, researchers and practitioners who show they are true believers or, at least, supporters can reap financial rewards from testing companies through funding of research, research awards, prestigious committee memberships, invitations to conferences, and the like. (Disclosure: My colleagues and I were funded by the College Board for the Rainbow Project described below, and by the College Board and the Educational Testing Service for another project on Advanced Placement; we remain grateful for that funding, without which we could not have done the research. The results of both projects were positive, showing that supplementary tests could provide substantial benefits, and they were originally published in leading journals, Intelligence [one article] and Contemporary Educational Psychology [two articles], with secondary publications elsewhere. The College Board stopped funding the research once we got those positive results because, we were told, the research would not lend itself to upscaling. The research has since been upscaled using modified measures, with funding from other sources.)
- Superstition. College personnel may see that everyone or almost everyone who succeeds at their college has scores above a certain level. So they conclude that, to succeed, students need scores at or above that level. If they do not admit students with lower scores, they may never find out if the students with lower scores could have succeeded.
- Faith. To some people, standardized tests are virtually a religion. They believe! They have faith in the tests and do not react well to, and may become enraged by, those who question their faith, although they might not admit this to themselves. Moreover, the true believers are organized and may become angry en masse. For some of them, tests are measures of their value as a person: they are their test scores. Because these individuals view themselves as smart (they typically did well on the tests), they see their position as reasoned, not as based on faith.
- Free (for institutions)! Students pay for the tests, not colleges and universities, so the institutions get information at no cost.
- Tests Work to Some Degree. The standardized tests are predictive, at low to moderate levels, of many outcomes for many groups under fairly diverse circumstances. Whether they add statistically and practically significant prediction of college success over and above that provided by high school grade-point average is less clear and varies from one institution to another. End-users may decide the levels of prediction the tests achieve are good enough for their purposes.
There are sound reasons to wish for more than scores on the standardized tests as they now exist. The tests measure academic knowledge and analytical reasoning skills. They do not measure many of the skills that are truly important for success, broadly defined, in college and life: ethical reasoning, creative thinking, practical problem solving, leadership skills, emotional intelligence, motivation, initiative, self-discipline, ability to delay gratification, belief in self-modifiability, character, belief in one’s ability to succeed in one’s work, and the like. It is understandable that, when the tests were created in the early 20th century, these and other attributes were not measured: psychologists did not yet know how to measure them reasonably.
Today we have come a long way, but we are nevertheless still using measures that are roughly the same as those used a century ago. Changes are mostly cosmetic. Imagine if contemporary medical tests still resembled those of the early 1900s! Tests of these other complementary constructs are not as well developed as conventional tests of knowledge and analytical thinking, but tests of at least some of these constructs could add to the power of prediction beyond what one can obtain from conventional tests alone and potentially reduce ethnic and other group differences. They could provide useful supplements to existing measures.
In our own research on the Rainbow and Kaleidoscope Projects (conducted on participants across the country while I was IBM Professor of Psychology and Education at Yale University and then when I was dean of arts and sciences at Tufts University), my colleagues and I have shown in various refereed publications how tests of creative, practical, and wisdom-based skills can improve prediction of academic and extracurricular success above what one obtains from SAT or ACT alone, and also decrease ethnic-group differences. We found that such tests also increase applicant goodwill toward the assessment process, because applicants feel and report that the assessments measure more than narrow test-taking skills. These broader assessments make a public statement that a college values much more than the academic skills assessed by conventional standardized tests. At Oklahoma State University, we are implementing broader assessments of creative, analytical, practical, and wisdom-based skills in our Panorama Project to predict which undergraduate applicants have the leadership skills potentially to make the world a better place in which to live.
What’s to be done? First, we can dispense with the notion that testing companies represent some kind of "evil empire." They don’t. They are businesses like any other, and their primary end-goal is to make money. They are no better or worse than other typical large companies in the United States. For-profit companies use the profits in part to pay shareholders; nonprofits plow the money back into the business. But they all try to maximize their bottom line. If people keep using their tests as they now exist, test-makers will keep making the same old same old. Why should they change what they believe to be a winning game? And that is the basis for the solution.
Colleges must demand more. When the University of California system threatened to stop using the SAT unless changes were made, the College Board made the changes that the University of California (in particular, Richard Atkinson, at the time president of the UC system) wanted. The College Board presumably did not want to lose such a valuable customer. If end-users demanded more of testing companies, they would get more. If they take what they get and don’t act, they will continue to get what they are getting. Testing companies will change when their consumers demand change — the same as any other kind of company. Organizations, like individuals, can be modifiable if they need to be.
Most colleges are not as large as the California system, so they need to organize, at least informally. Businesses listen to their customers, or they cease to be businesses. Testing companies are no exception. Like all businesses, what they listen to most is the metaphorical ringing of the cash register: the effect of customer actions on their bottom line. Customers can make all the difference, but they have to act, not just grouse. When they talk, companies may or may not listen; when they act collectively, companies listen. If the companies don’t change the tests, colleges should go test-optional, introduce supplementary measures, or abandon the tests. It may take a year or two or three to change, but so what? Many universities have taken one or another of these paths, and they are still surviving and even thriving. The world will not end. There are other measures of academic skills that institutions can use, most notably school grades, which typically are the best single predictor of college academic success.
My hope, of course, is that the testing companies will act, but their past record is not encouraging. At the same time, they have not experienced much consumer pressure at the cash register, except in the past through the University of California system. They need to hear a clear, direct message that spells out the consequences of their endless surface-structural change without deep-structural change.
There is one metaphorical fly in the ointment: Many people and organizations have a vested interest in testing as it now exists. They may write items for the tests, publish the tests, publish articles on the tests, feel comfortable using the tests, feel that their self-esteem is threatened if the tests are deemed not to be the measures of human worth they believe them to be, run test-preparation companies, etc. Some of them form a vocal "amen chorus" for tests as they now exist.
The existing standardized tests are quite decent for the kind of limited assessment they represent. But these tests are narrow and incomplete in what they measure. They show high correlations with socioeconomic status and IQ. They thus produce large socioeconomic (as well as ethnic and other) group disparities. Moreover, scores on the tests frequently are over-interpreted — they often are used as though they indicate far more about an applicant than they really do. These tests could be and should be supplemented with modules, chosen by students or colleges, measuring additional characteristics that are important for college and life success.
Why not get rid of the tests altogether? Because they provide potentially valuable, if limited, information. That is why they were created in the first place. The issue is not whether the tests are any good — they are. The issue is that our colleges, and our society, deserve better. And we can get better if we act to force the testing companies to give us the better products we need and deserve.
Robert J. Sternberg is provost, senior vice president, and Regents Professor of Psychology and Education at Oklahoma State University. He was president of the American Psychological Association in 2003 and is president-elect of the Federation of Associations of Brain and Behavioral Sciences.