Many educators believe that standardized tests, such as the SAT and the ACT, do not fully measure the spectrum of skills that are relevant for college and life success, and yet, year after year, the same tests hang around with only cosmetic changes. What is to be done?
First, one needs to understand the persistence of standardized tests, and then one can address what can be done. Consider each in turn.
At the beginning of the 20th century, tests like the SAT made reasonably good sense. Most test-takers were upper-middle class or upper-class white males, and so there was little variation in socioeconomic status. Differences in test scores thus largely represented differences in achieved academic skills rather than differences in opportunities to achieve those skills. But as the testing population broadened, test scores more and more reflected differences in opportunity — socialization in the home, school quality, and willingness and ability of parents to invest in the education of their children.
Standardized tests have great staying power for several reasons, described in more detail in my 2010 book, College Admissions for the 21st Century. Here are some of them:
The Appearance of Quantitative Precision. The tests give an exact set of numbers as scores. They thus appear to be very precise. And people are enchanted with the appearance of precision. Many end-users do not fully take into account the concept of standard error of measurement — that a score represents not a pinpoint-accurate level but rather a fairly wide range of levels of possible achievement. End-users may take into account even less the standard error of estimate — that test scores predict only quite imprecisely the criteria they are supposed to predict.
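The point about the standard error of measurement can be made concrete with a small sketch. The numbers below are hypothetical round values, not official figures for any actual test; the calculation simply shows how an SEM converts a single reported score into a band of plausible "true" scores.

```python
# Illustrative sketch (hypothetical numbers, not official test statistics):
# how a standard error of measurement (SEM) turns one reported score
# into a range of plausible true-score values.

def score_band(observed_score, sem, z=1.96):
    """Return the interval within which the 'true' score plausibly lies,
    assuming measurement error is roughly normal with the given SEM."""
    return (observed_score - z * sem, observed_score + z * sem)

# Suppose a section scored on a 200-800 scale has an SEM of about
# 30 points (an assumed, plausible value for illustration only).
low, high = score_band(650, 30)
print(f"A reported 650 is consistent with true scores from {low:.0f} to {high:.0f}")
```

Two applicants whose reported scores differ by 20 or 30 points may thus be statistically indistinguishable, which is exactly the nuance the appearance of precision invites end-users to ignore.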
Similarity. To get to a position in which you make decisions about applicants, in part on the basis of their test scores, you had to do well enough yourself on the tests to get into that position of decision-making authority. The fundamental principle of interpersonal attraction is that we are attracted to others like ourselves. So the decision-makers select those who, like themselves, did well on the tests.
Accountability. Admissions officers may be afraid that if they admit students with lower scores and the students do not perform well in college, it will make them — the admissions officers — look bad. If they admit students with high scores and the students do not do well, they always can blame the tests.
Publication Mania. Colleges are evaluated by U.S. News & World Report and other media in part on the basis of standardized test scores. So the system becomes self-perpetuating: Colleges admit students to get high ratings, and to keep their high ratings (or to rank higher), they need to rely ever more on the tests.
Sunk Costs. Society has invested so much in testing — whole admissions and scholarship systems are built around it — that it is hard to admit that something that once, long ago, may have worked fairly well is today far from optimal.
Rewards for True Believers. As is the case with pharmaceutical companies, those researchers and practitioners who show they are true believers or, at least, supporters, can reap from testing companies financial rewards through funding of research, research awards, prestigious committee memberships, invitations to conferences, and the like. (Disclosure: My colleagues and I were funded by the College Board for the Rainbow Project described below, and by the College Board and the Educational Testing Service for another project on Advanced Placement, funding for which we remain grateful; we could not have done the research without it. The results of both projects were positive — showing that supplementary tests could provide substantial benefits — and were originally published in leading journals — Intelligence [one article] and Contemporary Educational Psychology [two articles] — with secondary publications elsewhere. The College Board stopped funding the research once we got those positive results because, we were told, the research would not lend itself to upscaling. The research has since been upscaled using modified measures, with funding from other sources.)
Superstition. College personnel may see that everyone or almost everyone who succeeds at their college has scores above a certain level. So they conclude that, to succeed, students need scores at or above that level. If they do not admit students with lower scores, they may never find out if the students with lower scores could have succeeded.
Faith. To some people, standardized tests are virtually a religion. They believe! They have faith in the tests and do not react well to those who question that faith; indeed, they may become enraged, although they might not admit this to themselves. Moreover, the true believers are organized and may become angry en masse. For some of them, tests are measures of their value as a person: they are their test scores. Because these individuals view themselves as smart — they typically did well on the tests — they see their position as reasoned, not as based on faith.
Free (for institutions)! Students pay for the tests, not colleges and universities, so the institutions get information at no cost.
Tests Work to Some Degree. The standardized tests are predictive, at low to moderate levels, of many outcomes for many groups under fairly diverse circumstances. Whether they add statistically and practically significant prediction of college success over and above that provided by high school grade-point average is less clear and varies from one institution to another. End-users may decide the levels of prediction the tests achieve are good enough for their purposes.
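The "over and above high school GPA" question is a question about incremental validity, and it can be illustrated with the standard two-predictor multiple-correlation formula. The three correlations below are hypothetical round numbers chosen only to show the shape of the calculation, not findings from any particular study.

```python
# Hedged illustration of incremental validity: how much predictive power
# a test score adds beyond high school GPA alone. The three correlations
# are hypothetical values for illustration, not results from any study.

def r2_two_predictors(r_y1, r_y2, r_12):
    """R-squared for predicting y from two predictors, given the three
    pairwise correlations (standard multiple-correlation formula)."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

r_gpa   = 0.45  # hypothetical: HS GPA vs. college GPA
r_test  = 0.35  # hypothetical: test score vs. college GPA
r_inter = 0.30  # hypothetical: HS GPA vs. test score

r2_gpa_only = r_gpa**2
r2_both = r2_two_predictors(r_gpa, r_test, r_inter)
print(f"GPA alone explains {r2_gpa_only:.1%} of the variance")
print(f"Adding the test raises that to {r2_both:.1%} "
      f"(increment: {r2_both - r2_gpa_only:.1%})")
```

With these illustrative inputs the test adds only a few percentage points of explained variance; whether an increment of that size is "good enough" is precisely the judgment each institution must make for itself.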
There are sound reasons to wish for more than scores on the standardized tests as they now exist. The tests measure academic knowledge and analytical reasoning skills. They do not measure many of the skills that are truly important for success, broadly defined, in college and life: ethical reasoning, creative thinking, practical problem solving, leadership skills, emotional intelligence, motivation, initiative, self-discipline, ability to delay gratification, belief in self-modifiability, character, belief in one’s ability to succeed in one’s work, and the like. It is understandable that, when the tests were created in the early 20th century, these and other attributes would not be measured: psychologists did not yet know how reasonably to measure them.
Today we have come a long way, but nevertheless are using measures that are roughly the same as those used a century ago. Changes are mostly cosmetic. Imagine if contemporary medical tests still resembled those of the early 1900s! Tests of these other complementary constructs are not as well-developed as conventional tests of knowledge and analytical thinking — but tests of at least some of these other constructs could add to the power of prediction beyond what one can obtain merely from conventional tests and potentially reduce ethnic and other group differences. They could provide useful supplements to existing measures.
In our own research on the Rainbow and Kaleidoscope Projects (conducted on participants across the country while I was IBM Professor of Psychology and Education at Yale University and then when I was dean of arts and sciences at Tufts University), my colleagues and I have shown in various refereed publications how tests of creative, practical, and wisdom-based skills can improve prediction of academic and extracurricular success above what one obtains from the SAT or ACT alone, and also decrease ethnic-group differences. We found that such tests also increase applicant goodwill toward the assessment process, because applicants feel and report that the assessments measure more than narrow test-taking skills. These broader assessments make a public statement that a college values much more than the academic skills assessed by conventional standardized tests. At Oklahoma State University, we are implementing broader assessments of creative, analytical, practical, and wisdom-based skills in our Panorama Project to predict which undergraduate applicants have the potential leadership skills to make the world a better place.
What’s to be done? First, we can dispense with the notion that testing companies represent some kind of "evil empire." They don’t. They are businesses like any other, and their primary end-goal is to make money. They are no better or worse than other typical large companies in the United States. For-profit companies use the profits in part to pay shareholders; nonprofits plow the money back into the business. But they all try to maximize their bottom line. If people keep using their tests as they now exist, test-makers will keep making the same old same old. Why should they change what they believe to be a winning game? And that is the basis for the solution.
Colleges must demand more. When the University of California system threatened to stop using the SAT unless changes were made, the College Board made the changes that the University of California (in particular, Richard Atkinson, at the time president of the UC system) wanted. The College Board presumably did not want to lose such a valuable customer. If end-users demanded more of testing companies, they would get more. If they take what they get and don’t act, they will continue to get what they are getting. Testing companies will change when their consumers demand change — the same as any other kind of company. Organizations, like individuals, can be modifiable if they need to be.
Most colleges are not as large as the California system. So they need to organize, at least informally. Businesses listen to their customers, or they cease to be businesses. Testing companies are no exception. Like all businesses, what they listen to most is the metaphorical ringing of the cash register — the effect of customer actions on their bottom line. Customers can make all the difference, but they have to act, not just grouse. When they talk, companies may or may not listen; when they act collectively, companies listen. If the companies don’t change the tests, colleges should go test-optional, or introduce supplementary measures, or abandon the tests. It may require a year or two or three to change, but so what? Many universities have taken one or another of these paths, and they are still surviving and even thriving. The world will not end. There are other measures of academic skills that institutions can use, most notably school grades, which typically are the best single predictor of college academic success.
My hope, of course, is that the testing companies will act, but their past record is not encouraging. At the same time, they have not experienced much consumer pressure at the cash register, except in the past through the University of California system. They need to hear a message that is clear and direct, and that spells out the consequences of their endless surface-structural change without deep-structural change.
There is one metaphorical fly in the ointment: Many people and organizations have a vested interest in testing as it now exists. They may write items for the tests, publish the tests, publish articles on the tests, feel comfortable using the tests, feel that their self-esteem is threatened if the tests are deemed not to be the measures of human worth they believe them to be, run test-preparation companies, etc. Some of them form a vocal "amen chorus" for tests as they now exist.
The existing standardized tests are quite decent for the kind of limited assessment they represent. But these tests are narrow and incomplete in what they measure. They show high correlations with socioeconomic status and IQ. They thus produce large socioeconomic (as well as ethnic and other) group disparities. Moreover, scores on the tests frequently are over-interpreted — they often are used as though they indicate far more about an applicant than they really do. These tests could be and should be supplemented with modules, chosen by students or colleges, measuring additional characteristics that are important for college and life success.
Why not get rid of the tests altogether? Because they provide potentially valuable, if limited, information. That is why they were created in the first place. The issue is not whether the tests are any good — they are. The issue is that our colleges, and our society, deserve better. And we can get better if we act to force the testing companies to give us the better products we need and deserve.
Robert J. Sternberg
Robert J. Sternberg is provost, senior vice president, and Regents Professor of Psychology and Education at Oklahoma State University. He was president of the American Psychological Association in 2003 and is president-elect of the Federation of Associations of Brain and Behavioral Sciences.
Submitted by Ryan Craig on October 11, 2011 - 3:00am
If you’ve seen “Moneyball,” the new baseball film about the unlikely success of the Oakland A’s and their out-of-the-box-thinking general manager, Billy Beane, you may have already drawn parallels to the current state of higher education. If not, we’re pleased to do it for you.
Early in "Moneyball" there’s a funny scene of Billy sitting around a table with his scouts, wise old men of America’s pastime. The scouts jaw on about players’ arms, legs and bodies and their potential. One scout insists that an ugly girlfriend means that a player doesn’t have confidence. The scouts are entranced by the obvious. And when it comes to metrics, the scouts focus on what’s easy to measure. The scouts love high school pitchers: “High school pitchers had brand-new arms, and brand-new arms were able to generate the one asset scouts could measure: a fastball’s velocity,” Michael Lewis writes in the book on which the movie was based.
But Billy isn’t fooled. He decides to bring data to the table in the form of Peter Brand, a Yalie with an economics degree and a statistics-spewing laptop ready at hand.
It turns out that high school pitchers are much less likely to go on to successful major league careers than are comparable pitchers who have attended college. And when you try to correlate a range of statistics to runs scored, batting average is a poor indicator, whereas on-base percentage (OBP) is highly correlated. So Billy and the A’s eschew high school pitchers and focus on OBP; the A’s begin to value and acquire players with a knack for getting on base any way they can, especially by taking walks.
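The correlation exercise behind this insight is easy to sketch. The six "teams" below are made-up toy data, constructed only so that on-base percentage tracks runs scored more tightly than batting average does, mirroring the pattern the A's exploited; none of these numbers come from real seasons.

```python
# A minimal sketch of the Moneyball exercise: correlate team statistics
# with runs scored. The six "teams" below are hypothetical toy data chosen
# so that OBP tracks runs more closely than batting average does.

from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

runs = [700, 750, 800, 850, 900, 950]              # toy season run totals
obp  = [0.320, 0.330, 0.338, 0.349, 0.360, 0.371]  # rises steadily with runs
avg  = [0.262, 0.275, 0.258, 0.280, 0.266, 0.285]  # a much noisier relationship

print(f"OBP vs. runs: r = {pearson(obp, runs):.2f}")
print(f"AVG vs. runs: r = {pearson(avg, runs):.2f}")
```

Run on real team-season data, the same three-line comparison is what separates a statistic worth paying for from one that merely looks impressive.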
The result, chronicled in the entertaining film based on Lewis's book, is an unlikely group of major leaguers who, during the 2002 season, win 20 games in a row — still an American League record — and make the playoffs.
“My only question is if he’s that good a hitter, why doesn’t he hit better?”
-- Billy Beane
Like baseball 10 years ago, higher education is focused on what’s easy to measure. For baseball it may have been body parts, batting averages and the numbers on the radar gun. For higher education, it’s the 3Rs: research, rankings and real estate. Each of these areas is easily quantified or judged: research, by citations or the number of publications in Nature and Science; rankings, by U.S. News (or by a plethora of new entrants to the ranking game, including the international ranking by Shanghai Jiao Tong University); and real estate, by how much has been spent on a new building and how stately, innovative and generally impressive it appears.
Unfortunately, the 3Rs correlate about as closely to student learning and student outcomes as batting average or fastball velocity, which is to say, not at all. Buildings are the “ugly girlfriend” of higher education.
Universities that continue to focus on the 3Rs in the wake of the seismic shifts currently roiling higher education (state budget cuts, increased sticker shock, technology-based learning) are either not serious about improving student learning and student outcomes, or they’re like the baseball fan who has lost her car keys in the stadium parking lot at night. Where does she look for them? Not where she lost them, but under the light because that’s where she can see.
“A young player is not what he looks like, or what he might become, but what he has done.”
-- Billy Beane
Similarly, a university is not what its buildings look like, or what its reputation or rankings say, but what it has done. And by done, we don’t mean research. The link between research and instructional efficacy is unproven at best. We define instruction of students to mean producing measurable outcomes in terms of student learning and employment.
The first step will be to get the data; before we find the Billy Beane of higher education, we first need to find Bill James. With his famous Baseball Abstract, Bill James revolutionized how data were tracked and which metrics were understood to matter most to the success of teams and individual players. James jump-started a movement, called sabermetrics, that systematically gathered data no one had tracked before: the pitch count at the end of at-bats, pitch types and locations, the direction and distance of batted balls.
A report issued last month by Complete College America, an organization funded by the Bill & Melinda Gates Foundation and the Lumina Foundation for Education, demonstrates just how ripe higher education is for sabermetrics. While the report was sobering in the data it did present (e.g., of every 100 students who enroll in a public college in Texas, 79 enroll in a community college -- of these 79, only seven have completed a program in four years’ time), more fundamental are the huge holes in the data – larger than the holes in the Houston Astros infield! According to Stan Jones, president of Complete College America, the data are incomplete because students who enroll part-time or who transfer are not tracked: “We know they enroll, but we don’t know what happens to them,” he said. “We shouldn’t make policy based on the image of students going straight from high school to college, living on campus, and graduating four years later, when the majority of college students don’t do that.”
“The great thing about college players: they had meaningful stats. They played a lot more games, against stiffer competition, than high school players. The sample size of their relevant statistics was larger, and therefore a more accurate reflection of some underlying reality. You could project college players with greater certainty than you could project high school players.”
-- Michael Lewis, Moneyball
How ironic that we may be doing a better job gathering baseball statistics at colleges than we are at gathering education statistics. It is essential that we begin to track persistence data on part-time and transfer students on a systematic basis. The Department of Education should lead this initiative. Failing that, Gates, Lumina and others undoubtedly will pick up the slack.
Just as the Moneyball approach has narrowed the gap between teams with $40 million payrolls and teams with payrolls three times higher (see, e.g., Tampa Bay Rays storming back in the month of September and taking the American League wild card berth away from Boston with a payroll of $41 million, 25 percent of the Red Sox payroll), finding and tracking the OBP of higher education will do the same for data-driven institutions of all stripes, including those that do not receive state subsidies, and those that pay taxes.
With the right data, dozens of would-be Billy Beanes will spring up across the country arguing what the on-base percentage equivalent for higher education is, coalescing on persistence and completion metrics that are meaningful for all students (i.e., traditional/adult, full-time/part-time, on-ground/online) and helping their institutions reform and restructure to increase “wins.”