Many educators believe that standardized tests, such as the SAT and the ACT, do not fully measure the spectrum of skills that are relevant for college and life success, and yet, year after year, the same tests hang around with only cosmetic changes. What is to be done?
First, one needs to understand the persistence of standardized tests, and then one can address what can be done. Consider each in turn.
In the beginning of the 20th century, tests like the SAT made reasonably good sense. Most test-takers were upper-middle class or upper-class white males, and so there was little variation in socioeconomic status. Differences in test scores thus largely represented differences in achieved academic skills rather than differences in opportunities to achieve those skills. But as the testing population broadened, test scores more and more reflected differences in opportunity — socialization in the home, school quality, and willingness and ability of parents to invest in the education of their children.
Standardized tests have great staying power for several reasons, described in more detail in my 2010 book, College Admissions for the 21st Century. Here are some of them:
The Appearance of Quantitative Precision. The tests give an exact set of numbers as scores. They thus appear to be very precise. And people are enchanted with the appearance of precision. Many end-users do not fully take into account the concept of standard error of measurement — that a score represents not a pinpoint-accurate level but rather a fairly wide range of levels of possible achievement. End-users may take into account even less the standard error of estimate — that test scores predict only quite imprecisely the criteria they are supposed to predict.
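To make the point about the standard error of measurement concrete, here is a minimal sketch using the classical test theory formula, SEM = SD x sqrt(1 - reliability). The standard deviation of 100, the reliability of 0.90, and the helper name `score_band` are illustrative assumptions, not figures or tools from any testing company.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate band around an observed score (z = 1.96 gives roughly 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Assumed values for illustration only: SD of 100 points, reliability of 0.90.
sem = standard_error_of_measurement(sd=100.0, reliability=0.90)
low, high = score_band(600.0, sd=100.0, reliability=0.90)
print(f"SEM = {sem:.1f}; approximate 95% band = {low:.0f} to {high:.0f}")
```

Under these assumed values, a reported score of 600 is consistent with a band roughly 60 points wide on either side, which is the "fairly wide range" that a single number conceals.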
Similarity. To get to a position in which you make decisions about applicants, in part on the basis of their test scores, you had to do well enough yourself on the tests to get into that position of decision-making authority. The fundamental principle of interpersonal attraction is that we are attracted to others like ourselves. So the decision-makers select those who, like themselves, did well on the tests.
Accountability. Admissions officers may be afraid that if they admit students with lower scores and the students do not perform well in college, it will make them — the admissions officers — look bad. If they admit students with high scores and the students do not do well, they always can blame the tests.
Publication Mania. Colleges are evaluated by U.S. News & World Report and other media in part on the basis of standardized test scores. So the system becomes self-perpetuating: Colleges admit students to get high ratings, and to keep their high ratings (or to rank higher), they need to rely ever more on the tests.
Sunk Costs. Society has invested so much in testing -- whole admissions and scholarship systems are built around it -- that it is hard to admit that something that once, long ago, may have worked fairly well is today far from optimal.
Rewards for True Believers. As is the case with pharmaceutical companies, those researchers and practitioners who show they are true believers or, at least, supporters, can reap financial rewards from testing companies through funding of research, research awards, prestigious committee memberships, invitations to conferences, and the like. (Disclosure: My colleagues and I were funded by the College Board for the Rainbow Project described below, and by the College Board and the Educational Testing Service for another project on Advanced Placement, funding for which we remain grateful; we could not have done the research without such funding. The results of both projects were positive -- showing that supplementary tests could provide substantial benefits -- and the results were originally published in leading journals (Intelligence [one article] and Contemporary Educational Psychology [two articles]), with secondary publications elsewhere. The College Board stopped funding the research once we got those positive results because, we were told, the research would not lend itself to upscaling. The research has since been upscaled using modified measures, with funding from other sources.)
Superstition. College personnel may see that everyone or almost everyone who succeeds at their college has scores above a certain level. So they conclude that, to succeed, students need scores at or above that level. If they do not admit students with lower scores, they may never find out if the students with lower scores could have succeeded.
Faith. To some people, standardized tests are virtually a religion. They believe! They have faith in the tests and do not react well to, and may become enraged by, those who question their faith, although they might not admit this to themselves. Moreover, the true believers are organized and may become angry, en masse. For some of them, tests are measures of their value as a person: they are their test scores. Because these individuals view themselves as smart (they typically did well on the tests), they see their position as reasoned, not as based on faith.
Free (for institutions)! Students pay for the tests, not colleges and universities, so the institutions get information at no cost.
Tests Work to Some Degree. The standardized tests are predictive, at low to moderate levels, of many outcomes for many groups under fairly diverse circumstances. Whether they add statistically and practically significant prediction of college success over and above that provided by high school grade-point average is less clear and varies from one institution to another. End-users may decide the levels of prediction the tests achieve are good enough for their purposes.
There are sound reasons to wish for more than scores on the standardized tests as they now exist. The tests measure academic knowledge and analytical reasoning skills. They do not measure many of the skills that are truly important for success, broadly defined, in college and life: ethical reasoning, creative thinking, practical problem solving, leadership skills, emotional intelligence, motivation, initiative, self-discipline, ability to delay gratification, belief in self-modifiability, character, belief in one’s ability to succeed in one’s work, and the like. It is understandable that, when the tests were created in the early 20th century, these and other attributes would not be measured: psychologists did not yet know how reasonably to measure them.
Today we have come a long way, but nevertheless are using measures that are roughly the same as those used a century ago. Changes are mostly cosmetic. Imagine if contemporary medical tests still resembled those of the early 1900s! Tests of these other complementary constructs are not as well-developed as conventional tests of knowledge and analytical thinking, but tests of at least some of these other constructs could add to the power of prediction beyond what one can obtain merely from conventional tests and potentially reduce ethnic and other group differences. They could provide useful supplements to existing measures.
In our own research on the Rainbow and Kaleidoscope Projects (conducted on participants across the country while I was IBM Professor of Psychology and Education at Yale University and then when I was dean of arts and sciences at Tufts University), my colleagues and I have shown in various refereed publications how tests of creative, practical, and wisdom-based skills can improve prediction of academic and extracurricular success above what one obtains from SAT or ACT alone, and also decrease ethnic-group differences. We found that such tests also increase applicant goodwill toward the assessment process, because applicants feel and report that the assessments measure more than narrow test-taking skills. These broader assessments make a public statement that a college values much more than the academic skills assessed by conventional standardized tests. At Oklahoma State University, we are implementing broader assessments of creative, analytical, practical, and wisdom-based skills in our Panorama Project to predict which undergraduate applicants have the leadership skills potentially to make the world a better place in which to live.
What’s to be done? First, we can dispense with the notion that testing companies represent some kind of "evil empire." They don’t. They are businesses like any other, and their primary end-goal is to make money. They are no better or worse than other typical large companies in the United States. For-profit companies use the profits in part to pay shareholders; nonprofits plow the money back into the business. But they all try to maximize their bottom line. If people keep using their tests as they now exist, test-makers will keep making the same old same old. Why should they change what they believe to be a winning game? And that is the basis for the solution.
Colleges must demand more. When the University of California system threatened to stop using the SAT unless changes were made, the College Board made the changes that the University of California (in particular, Richard Atkinson, at the time president of the UC system) wanted. The College Board presumably did not want to lose such a valuable customer. If end-users demanded more of testing companies, they would get more. If they take what they get and don’t act, they will continue to get what they are getting. Testing companies will change when their consumers demand change — the same as any other kind of company. Organizations, like individuals, can be modifiable if they need to be.
Most colleges are not as large as the California system. So they need to organize, at least informally. Businesses listen to their customers, or they cease to be businesses. Testing companies are no exception. Like all businesses, what they listen to most is the metaphorical ringing of the cash register: the effect of customer actions on their bottom line. Customers can make all the difference, but they have to act, not just grouse. When they talk, companies may or may not listen; when they act collectively, companies listen. If the companies don’t change the tests, colleges should go test-optional, or introduce supplementary measures, or abandon the tests. It may require a year or two or three to change, but so what? Many universities have taken one or another of these paths, and they are still surviving and even thriving. The world will not end. There are other measures of academic skills that institutions can use, most notably school grades, which typically are the best single predictor of college academic success.
My hope, of course, is that the testing companies will act, but their past record is not encouraging. At the same time, they have not experienced much consumer pressure at the cash register, except in the past through the University of California system. They need to hear a message that is clear and direct, and that spells out the consequences of their endless surface-structural change without deep-structural change.
There is one metaphorical fly in the ointment: Many people and organizations have a vested interest in testing as it now exists. They may write items for the tests, publish the tests, publish articles on the tests, feel comfortable using the tests, feel that their self-esteem is threatened if the tests are deemed not to be the measures of human worth they believe them to be, run test-preparation companies, etc. Some of them form a vocal "amen chorus" for tests as they now exist.
The existing standardized tests are quite decent for the kind of limited assessment they represent. But these tests are narrow and incomplete in what they measure. They show high correlations with socioeconomic status and IQ. They thus produce large socioeconomic (as well as ethnic and other) group disparities. Moreover, scores on the tests frequently are over-interpreted — they often are used as though they indicate far more about an applicant than they really do. These tests could be and should be supplemented with modules, chosen by students or colleges, measuring additional characteristics that are important for college and life success.
Why not get rid of the tests altogether? Because they provide potentially valuable, if limited, information. That is why they were created in the first place. The issue is not whether the tests are any good — they are. The issue is that our colleges, and our society, deserve better. And we can get better if we act to force the testing companies to give us the better products we need and deserve.
Robert J. Sternberg
Robert J. Sternberg is provost, senior vice president, and Regents Professor of Psychology and Education at Oklahoma State University. He was president of the American Psychological Association in 2003 and is president-elect of the Federation of Associations of Brain and Behavioral Sciences.
Submitted by Ryan Craig on October 11, 2011 - 3:00am
If you’ve seen “Moneyball,” the new baseball film about the unlikely success of the Oakland A’s and their out-of-the-box-thinking general manager, Billy Beane, you may have already drawn parallels to the current state of higher education. If not, we’re pleased to do it for you.
Early in "Moneyball" there’s a funny scene of Billy sitting around a table with his scouts, wise old men of America’s pastime. The scouts jaw on about players’ arms, legs and bodies and their potential. One scout insists that an ugly girlfriend means that a player doesn’t have confidence. The scouts are entranced by the obvious. And when it comes to metrics, the scouts focus on what’s easy to measure. The scouts love high school pitchers: “High school pitchers had brand-new arms, and brand-new arms were able to generate the one asset scouts could measure: a fastball’s velocity,” Michael Lewis writes in the book on which the movie was based.
But Billy isn’t fooled. He decides to bring data to the table in the form of Peter Brand, a Yalie with an economics degree and a statistics-spewing laptop ready at hand.
It turns out that high school pitchers are much less likely to go on to successful major league careers than are comparable pitchers who have attended college. And when you try to correlate a range of statistics to runs scored, batting average is a poor indicator, whereas on-base percentage (OBP) is highly correlated. So Billy and the A’s eschew high school pitchers and focus on OBP; the A’s begin to value and acquire players with a knack for getting on base any way they can, especially by taking walks.
The result, chronicled in the entertaining film based on Lewis's book, is an unlikely group of major leaguers who, during the 2002 season, win 20 games in a row -- still a record -- and make the playoffs.
“My only question is if he’s that good a hitter, why doesn’t he hit better?”
-- Billy Beane
Like baseball 10 years ago, higher education is focused on what’s easy to measure. For baseball it may have been body parts, batting averages and the numbers on the radar gun. For higher education, it’s the 3Rs: research, rankings and real estate. Each of these areas is easily quantified or judged: research citations or number of publications in Nature and Science; U.S. News ranking (or colleges can choose from a plethora of new entrants to the ranking game, including the international ranking by Shanghai Jiao Tong University); and in terms of real estate, how much has been spent on a new building and how stately, innovative and generally impressive it appears.
Unfortunately, the 3Rs correlate about as closely to student learning and student outcomes as batting average or fastball velocity, which is to say, not at all. Buildings are the “ugly girlfriend” of higher education.
Universities that continue to focus on the 3Rs in the wake of the seismic shifts currently roiling higher education (state budget cuts, increased sticker shock, technology-based learning) are either not serious about improving student learning and student outcomes, or they’re like the baseball fan who has lost her car keys in the stadium parking lot at night. Where does she look for them? Not where she lost them, but under the light because that’s where she can see.
“A young player is not what he looks like, or what he might become, but what he has done.”
-- Billy Beane
Similarly, a university is not what its buildings look like, or what its reputation or rankings say, but what it has done. And by done, we don’t mean research. The link between research and instructional efficacy is unproven at best. We define instruction of students to mean producing measurable outcomes in terms of student learning and employment.
The first step will be to get the data; before we find the Billy Beane of higher education, we first need to find Bill James. With his famous Baseball Abstract, Bill James revolutionized how data was tracked, and which metrics were most important to the success of teams and individual players. James jump-started a movement, called sabermetrics, that collected data that had never before been systematically collected: the pitch count at the end of at-bats, pitch types and locations, the direction and distance of batted balls.
A report issued last month by Complete College America, an organization funded by the Bill & Melinda Gates Foundation and the Lumina Foundation for Education, demonstrates just how ripe higher education is for sabermetrics. While the report was sobering in the data it did present (e.g., of every 100 students who enroll in a public college in Texas, 79 enroll in a community college -- of these 79, only seven have completed a program in four years’ time), more fundamental are the huge holes in the data -- larger than the holes in the Houston Astros infield! According to Stan Jones, president of Complete College America, the data are incomplete because students who enroll part-time or who transfer are not tracked: “We know they enroll, but we don’t know what happens to them,” he said. “We shouldn’t make policy based on the image of students going straight from high school to college, living on campus, and graduating four years later, when the majority of college students don’t do that.”
“The great thing about college players: they had meaningful stats. They played a lot more games, against stiffer competition, than high school players. The sample size of their relevant statistics was larger, and therefore a more accurate reflection of some underlying reality. You could project college players with greater certainty than you could project high school players.”
-- Michael Lewis, Moneyball
How ironic that we may be doing a better job gathering baseball statistics at colleges than we are at gathering education statistics. It is essential that we begin to track persistence data on part-time and transfer students on a systematic basis. The Department of Education should lead this initiative. Failing that, Gates, Lumina and others undoubtedly will pick up the slack.
Just as the Moneyball approach has narrowed the gap between teams with $40 million payrolls and teams with payrolls three times higher (see, e.g., the Tampa Bay Rays, with a payroll of $41 million, 25 percent of the Red Sox payroll, storming back in the month of September and taking the American League wild card berth away from Boston), finding and tracking the OBP of higher education will do the same for data-driven institutions of all stripes, including those that do not receive state subsidies, and those that pay taxes.
With the right data, dozens of would-be Billy Beanes will spring up across the country arguing what the on-base percentage equivalent for higher education is, coalescing on persistence and completion metrics that are meaningful for all students (i.e., traditional/adult, full-time/part-time, on-ground/online) and helping their institutions reform and restructure to increase “wins.”
Much attention has been directed at college completion rates in the past two years, since President Obama announced his goal that the United States will again lead the world with the highest proportion of college graduates by 2020. The most recent contribution to this dialogue was last month’s release of "Time Is the Enemy" by Complete College America.
Much in the introduction to this report is welcome. Expanding completion rate reporting to include part-time students, recognizing that more students are juggling employment and family responsibilities with college, acknowledging that many come to college unprepared for college-level work -- such awareness should inform our policy choices. All in higher education share the desire expressed by Complete College America that more students complete their programs, and do so in less time.
The graduation rates for two-year institutions included in "Time Is the Enemy" show, however, just how inadequate our current measures are for assessing community college student degree progress -- a shortfall also acknowledged by the appointment of the federal Committee on Measures of Student Success, which is charged with making recommendations to the U.S. education secretary by April. Our current national completion measures for community colleges underestimate the true progress of students, presenting a misleading picture of the performance of these open-admissions institutions.
The following suggestions might inform a new set of national metrics for assessing student performance at two-year institutions.
Completion Rates for Community Colleges Should Include Transfers to Baccalaureate Institutions. Although community colleges usually advise students aiming for a bachelor’s degree to complete their associate degree before transferring, to reap the benefits of additional tuition savings and attain a credential, transferring before attaining the associate degree is, for many students, a rational decision. Accepting admission and assimilating into competitive baccalaureate programs and institutions, establishing mentorships with professors in the intended baccalaureate major, or embracing the residential college experience may all lead students to transfer before completing the associate degree. In addition, for a variety of reasons, universities may delay admission of incoming freshmen to the spring semester and advise them to start in the fall at a community college. These students are not seeking degrees at the community college, and will transfer after one semester. Thus, for two-year institutions, preparing students for transfer to a four-year institution should be considered an outcome as favorable as a student earning an associate degree.
The appropriate completion measure for community colleges is a combined graduation-transfer rate. The preferred metric is the percentage of students in the initial cohort who have graduated and/or transferred to a four-year institution. It is important to include transfers to out-of-state institutions in these calculations. In Maryland, a fourth of the community college transfers to baccalaureate institutions enroll in colleges and universities outside of Maryland. Reliance on state reporting systems that do not utilize national databases such as the National Student Clearinghouse to report this metric results in serious underestimates of student success. The need to track transfers across state lines is a major reason for the so-far-unsuccessful push for a national unit record system.
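As a rough sketch of the combined metric described above, the following counts each student once if he or she graduated, transferred to a four-year institution, or both. The `StudentOutcome` record, its field names, and the ten-student cohort are hypothetical, invented only to show how omitting out-of-state transfers lowers the reported rate.

```python
from dataclasses import dataclass


@dataclass
class StudentOutcome:
    graduated: bool                 # earned an associate degree
    transferred_in_state: bool      # enrolled at an in-state four-year institution
    transferred_out_of_state: bool  # enrolled at an out-of-state four-year institution


def graduation_transfer_rate(cohort, include_out_of_state=True):
    """Share of the cohort that graduated and/or transferred (each student counted once)."""
    if not cohort:
        raise ValueError("cohort is empty")
    successes = 0
    for s in cohort:
        transferred = s.transferred_in_state or (
            include_out_of_state and s.transferred_out_of_state
        )
        if s.graduated or transferred:
            successes += 1
    return successes / len(cohort)


# Hypothetical cohort of ten students.
cohort = (
    [StudentOutcome(True, False, False)] * 2     # graduated only
    + [StudentOutcome(True, True, False)]        # graduated and transferred: counted once
    + [StudentOutcome(False, True, False)] * 2   # transferred in state without a degree
    + [StudentOutcome(False, False, True)]       # transferred out of state without a degree
    + [StudentOutcome(False, False, False)] * 4  # neither outcome yet
)

rate_national = graduation_transfer_rate(cohort)
rate_state_only = graduation_transfer_rate(cohort, include_out_of_state=False)
print(rate_national, rate_state_only)
```

The gap between the two rates in this made-up cohort is the kind of underestimate that a state-only reporting system would produce.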
Comparisons of completion rates at community colleges and four-year institutions, where transfer is not included in the community college measure, are inappropriate. Reports such as "Time Is the Enemy" that report graduation rates for community colleges, with table labels such as “Associate Degree-seeking Students,” are misleading in that these calculations include many students who are pursuing baccalaureate transfer programs with no intention of earning the associate.
Completion Rate Calculations Should Exclude Students Not Seeking Degrees. Community colleges serve many students not seeking a college degree, and these students should be excluded from the calculation of completion rates. A student’s stated intent at entry is not adequate to identify degree-seekers, since students may be uncertain about their goals and goals may change. Enrollment in a degree program is not adequate, since students without a degree goal must declare a program in order to be eligible for financial aid, and many colleges force students to choose a major in order to gauge student interest for advising purposes.
A better way to define degree-seeking status is based on student behavior. Have students demonstrated pursuit of a degree by enrolling in more than two or three classes? A minimum number of attempted hours is the preferred way of defining the cohort to study. In Maryland, to be included in the denominator of graduation-transfer rates, a student must attempt at least 18 hours within two years of entry. Hours in developmental or remedial courses are included. This way of defining the cohort has several benefits. It does not exclude students beginning as part-time students, as IPEDS does. It eliminates transient students with short-term job skill enhancement or personal enrichment motives. By using attempted hours as the threshold, rather than earned credits as in some other states, this definition does not bias the sample toward success. Students who fail all their courses and earn zero credits will still be in the cohort if they have attempted 18 hours. And finally, it seems reasonable that students show some evidence of effort to persist if institutions are to be held accountable for their degree attainment.
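The behavioral definition above can be sketched as a simple filter on attempted hours. The 18-hour threshold and two-year window come from the Maryland rule described in the text; the `TermRecord` structure, the field names, and the sample students are hypothetical, and treating "within two years of entry" as terms starting less than two years after entry is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_ATTEMPTED_HOURS = 18  # threshold described in the text
WINDOW_YEARS = 2.0        # attempted hours must fall within two years of entry


@dataclass
class TermRecord:
    student_id: str
    years_since_entry: float  # start of the term, measured from first enrollment
    attempted_hours: int      # includes developmental or remedial hours
    earned_hours: int         # recorded but deliberately not used for cohort membership


def degree_seeking_cohort(records):
    """Return the IDs of students who attempted enough hours inside the window."""
    attempted = defaultdict(int)
    for r in records:
        if 0 <= r.years_since_entry < WINDOW_YEARS:
            attempted[r.student_id] += r.attempted_hours
    return {sid for sid, hours in attempted.items() if hours >= MIN_ATTEMPTED_HOURS}


records = [
    # A: steady part-time student, 24 attempted hours over four terms -> included
    TermRecord("A", 0.0, 6, 6), TermRecord("A", 0.5, 6, 6),
    TermRecord("A", 1.0, 6, 3), TermRecord("A", 1.5, 6, 6),
    # B: one personal-enrichment course -> excluded
    TermRecord("B", 0.0, 3, 3),
    # C: attempted 18 hours but earned none -> still included
    TermRecord("C", 0.0, 9, 0), TermRecord("C", 0.5, 9, 0),
    # D: only 9 hours inside the window; later hours do not count -> excluded
    TermRecord("D", 0.0, 9, 9), TermRecord("D", 2.5, 12, 12),
]

cohort_ids = degree_seeking_cohort(records)
print(sorted(cohort_ids))
```

Because membership depends on attempted rather than earned hours, student C stays in the denominator despite earning no credit, which is what keeps the cohort from being biased toward success.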
Recognize that Community College Students Who Start Full-time Typically Do Not Remain Full-time. A number of studies suggest that the majority of community college students initially enrolling full-time switch to part-time attendance. This contrasts with students at most four-year institutions, who start and remain full-time. For example, 52 percent of students at community colleges that participate in the Achieving the Dream project began as full-time students. Yet only 31 percent attended full-time for the entire first year. Studies of Florida’s community colleges find similar results. Most students end up with a combination of full-time and part-time attendance, regardless of their initial status. Among students enrolled at least three additional semesters, only 30 percent of Florida’s “full-time” community college students enrolled full-time every semester. As a Florida College System report concludes, “Expecting a ‘full-time’ student to complete an associate degree in two years or even three assumes that the student remains full-time and this is most often not the case. As a result, students will progress at rates slower than assumed by models that consider initial full-time students to be full-time throughout their time in college.” Thus, comparisons of completion rates at two-year and four-year institutions, even controlling for full-time status in the first semester, are misleading. Studies at my college suggest that completion rates of community college students who start full-time and continuously attend full-time without interruption are comparable to completion rates attained at many four-year institutions.
Extend the Time for Assessing Completion to at Least Six Years. “Normal Time” to completion excludes most associate degree completers. Due to part-time attendance, interrupted studies, and the need to complete remedial education, most associate degree graduates take more than three years to complete. Completion rates calculated at the end of three or four years will undercount true completion. It is not uncommon for a third of associate degree completers to take more than four years to complete their degree. At my institution, fully 5 percent of our associate degree recipients take 10 or more years to complete their “two-year” degree. These students are not failures; they are heroes. Yes, we would all like students to complete their degrees more quickly. But if life circumstances dictate a slower pace, let us support these students in their remarkable persistence. And, in our accountability reporting, let us recognize that our completion rate statistics are time-bound and fail to account for all who will eventually succeed in their degree pursuit.
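A small sketch shows why the measurement window matters. The cohort below is entirely made up (twenty entrants, ten of whom eventually finish) and is not data from any institution; `completion_rate_by_window` is a hypothetical helper.

```python
def completion_rate_by_window(years_to_degree, window_years):
    """Share of the entering cohort that completed within the window.

    `years_to_degree` holds one entry per entering student: the number of
    years taken to finish, or None if the student has not completed.
    """
    if not years_to_degree:
        raise ValueError("cohort is empty")
    completed = sum(
        1 for y in years_to_degree if y is not None and y <= window_years
    )
    return completed / len(years_to_degree)


# Hypothetical cohort: ten eventual completers and ten non-completers.
years_to_degree = [2, 2, 3, 3, 3, 4, 4, 5, 6, 10] + [None] * 10

rates = {w: completion_rate_by_window(years_to_degree, w) for w in (3, 4, 6, 10)}
for window, rate in rates.items():
    print(f"completed within {window} years: {rate:.0%}")
```

In this invented cohort the same students produce a rate that nearly doubles between the three-year and six-year windows, so the choice of window, not student performance, drives much of the reported figure.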
When Comparing Completion Rates, Compare Institutions with Similar Students. Differences in completion rates among institutions largely reflect differences in student populations. Community college students who are similar to students at four-year institutions in academic preparation, and in their ability to consistently attend full-time, achieve completion rates comparable to those at many four-year institutions. In Maryland, if you include transfer as a community college completion, community colleges have four-year completion rates equal to or higher than the eight-year bachelor’s degree graduation rates at a majority of the state’s four-year institutions with open or low-selectivity admissions. And the completion rate of college-ready community college students -- those not needing developmental education -- is similar to that at all but the most selective four-year schools. At my college, 88 percent of the students in our honors program have graduated with an associate degree in two years. This graduation rate is comparable with that of Johns Hopkins and above that of the flagship University of Maryland at College Park.
Students at four-year institutions who are similar in profile to the typical community college student have completion rates similar to those attained at community colleges. This is not a new finding. A March 1996 report, "Beginning Postsecondary Students: Five Years Later," identified the following “risk factors” affecting bachelor’s degree completion: delayed enrollment in higher education, being a GED recipient, being financially independent, having children, being a single parent, attending part-time, and working full-time while enrolled. Fifty-four percent of the students who had none of these risk factors earned the bachelor’s degree within five years. The graduation rate for students with just one of these risk factors fell to 42 percent. For students with two risk factors the bachelor’s degree graduation rate was 21 percent, and for those with three or more the graduation rate was 13 percent.
Readers of this essay who work at community colleges are probably smiling to themselves. For most community colleges, the majority, if not the overwhelming majority, of students are coping with several of these risk factors. And this list does not account for the need of most community college students for developmental or remedial education. The comparability of completion rates at two- and four-year institutions, when student characteristics are controlled for, should not be a surprising finding.
If we must compare completion rates, it is incumbent upon analysts to account for differences in the academic preparation and life circumstances of student populations. This can be done by sophisticated statistical analysis, or in the selection of peer groups of institutions with similar admissions policies and student body demographics.
Support Hopeful Signs at the Federal Level. The work to date of the Committee on Measures of Student Success authorized by the Higher Education Act of 2008 is encouraging. The committee is to make recommendations to the Secretary of Education by April 2012 regarding the accurate reporting of completion rates for community colleges.
A number of the recommendations in the committee’s draft report issued September 2, 2011 would greatly improve reporting of completion statistics for community colleges:
Defining the degree-seeking cohort for calculating completion rates by looking at student behavior, such as a threshold number of hours attempted.
Recognizing that “preparing students for transfer to a four-year institution is an equally positive outcome as a student earning an associate’s degree.”
Reporting a combined graduation-transfer rate as the primary outcome measure for degree-seeking students.
Creating an interim, persistence measure combining lateral transfer with retention at the initial institution.
These recommendations show an understanding of the student populations served by community colleges. Inclusion of these definitions and measures in federal IPEDS reporting would provide more meaningful peer, state, and national benchmarks for all community colleges.