The Flaw of Overall Rankings

January 24, 2011

Many college administrators are uncomfortable with rankings of colleges and universities, such as those found in U.S. News & World Report. Perhaps they don’t like the idea of measuring the quality of an institution of higher learning, or they don’t like the way the measurements are done. But from a psychological point of view — psychology is my field — there is a more fundamental problem. Overall rankings obscure what is most interesting about an institution. Consider an analogy to the assessment of human intellectual qualities.

In 1904, Charles Spearman, a British psychologist, proposed that quality of mind, at least as characterized by human intelligence, could be summarized as a single attribute, which he referred to as "general ability," or g. His assertion was based on his observation that various tests of quality of mind — for example, verbal, mathematical, spatial — correlated positively with each other, suggesting to him that they were different measures of the same thing, except for the relatively uninteresting aspects of thinking that were wholly particular to each kind of test.

Spearman’s view was eventually challenged. By 1938, an American psychologist, Louis Thurstone, suggested that Spearman’s view was an oversimplification — that the more variegated qualities actually were important in their own right. Thurstone labeled qualities such as verbal ability, mathematical ability, and spatial ability as "primary mental abilities." For example, you might care more about verbal ability for an English major or future journalist or novelist, more about mathematical ability for a finance major or future accountant or actuary, and more about spatial ability for an engineering major or future civil engineer or air-traffic controller. It might be nice to have an air-traffic controller with a good command of the English language, but in the end, what passengers and airport officials likely most care about is whether the controller can visualize the trajectories of airplanes in a way that prevents their infringing on each other’s airspace, so long as the controller can communicate this information to pilots.

Spearman and Thurstone got into a bitter argument over which of their theories was correct. But as often happens in science, the two theorists represented a Hegelian thesis and antithesis in a dialectical argument. What was needed was a synthesis.

The argument was largely resolved in 1993 when American psychologist John B. Carroll built on previous work and showed that general and more specific qualities of mind could be understood hierarchically, with general ability at the top, so-called "primary mental abilities" beneath it, and still more specific abilities beneath those. Carroll’s hierarchical theory is widely accepted today, although certainly not by everyone. There is still some dispute about just how general "general" ability is. Psychological theorists, Howard Gardner and I among them, have suggested that "general ability" may not, in fact, be as general as some have claimed. For example, so-called "general ability" might be more useful in predicting performance of a pupil in primary school than in predicting performance of a pianist, plumber, politician, or poet. In college admissions, "general ability" would correspond loosely to a composite ACT or summed SAT score.

If we now return to institutional assessments, we see that roughly the same logic can be applied to assessments of the quality of colleges and universities. At some general level, colleges and universities near the top of the U.S. News ratings, such as Harvard and Yale Universities, probably excel in some meaningful way over those institutions near the bottom of such rankings, just as people with higher composite ACTs have certain academic skills that are more developed than those in people with lower composite ACTs. But such global assessments miss the qualities that make institutional differences, like individual differences, interesting. They actually can fool people into missing what is most important in distinguishing entities, whether individuals or institutions. For example, the University of California at Los Angeles and the University of Virginia, tied for the second rank among public universities in recent U.S. News ratings, would provide very different experiences to undergraduates (as anyone who has visited UCLA and UVA likely would notice). They differ in the roles of undergraduate versus graduate students, social traditions, and, of course, campus ambiance, among other things.

There is no definitive list of the analogues to the primary mental abilities for institutions of higher learning. But administrators pretty much know what some of the major ones are: quality of research, quality of teaching, quality of extracurricular programs, quality of leadership development, amount of attention individual students receive, effectiveness with which the institution is led, and so on. These differential primary qualities matter greatly in institutions, just as they do in individuals. At the individual level, employers conduct interviews in large part because they realize that job applicants can score high on tests of cognitive ability and yet have poor or, in some cases, sorely deficient social and emotional skills. Similarly, the financial crisis of 2008 was in part the result of the work of people with impressive quantitative skills who nevertheless lacked common sense and an ethical compass. Those selecting an institution of higher learning at which to study or work need to do the same kinds of "job interviews."

When students (or faculty or staff, for that matter) select an institution of higher learning, overall rankings may obscure the information individuals most need to make an informed choice. Some of the best research institutions in the country show relatively little concern with teaching and some of the best teaching institutions put only modest emphasis on research. Of course, there are institutions that care about both and even those that care about neither (so long as they meet their projected bottom line). If one were to select an institution solely on overall quality, one would miss these important differences and many others, such as size, view of undergraduate versus graduate versus professional students, kind of campus life, role of religion on campus, salience of athletics on campus, availability of particular degree programs, pride in traditions, and so forth. In the case of my own institution, Oklahoma State University, the rankings would not take into account its fidelity to its land-grant mission of serving the state of Oklahoma, the nation, and the world.

As with individual qualities of mind, one can and probably should become even more specific when evaluating institutions. With regard to research, one institution may excel in basic research, another in applied research. With regard to teaching, one institution might have fine teaching in large lectures but few small seminars because of a large ratio of students to teaching faculty; another institution might have excellent teaching, most of it through small seminars. Such differences are consequential for students because they provide different kinds of education. Overall ratings fail to take these differences into account. For example, college students may think that higher rankings mean a better education, when in fact they may mean that professors are less available to them, not more.

At some level, students applying to college and scholars looking for jobs know all this. But they, or their parents or other family members, may be willing to overlook the particular qualities that make institutions differentially great in favor of some overall prestige or reputational factor, possibly the result of evaluations by people who know little or even nothing about the schools they are evaluating. In the human-abilities field, one can see the same problem when some institutions become enamored of composite standardized test scores while largely ignoring the individual qualities that make a particular applicant unique; similarly, applicants may become enamored of the overall ranking and ignore important differences in institutions. But if people, whether students or employees, are unhappy with their institutional choice, it will not likely be because of overall prestige but rather because of the mismatch of the institution to their personal preferences and goals.

A further obvious problem is that the presidents, provosts, and admissions deans making certain of the ratings generally know very little about the large majority of the institutions they are rating, and are likely to fall into the trap of giving ratings on the basis of stereotypes. So, at least in "reputational ratings," one may learn more about prevailing stereotypes than about institutional qualities. It is perhaps embarrassing that people in top positions would fall into the trap of providing ratings based on stereotypes (or that anyone would ask them to do these ratings), but we all are susceptible to falling into these traps in various aspects of our lives.

If evaluators of institutions truly wanted to perform a service for seekers of instruction and of jobs, they would rate the schools in terms of the dimensions that matter to potential applicants and pay little or no attention to overall ratings, which may mean relatively little. Indeed, providing such an overall rating may tempt people to simplify their evaluations in a way that ultimately hurts both the institution and themselves. Evaluators of institutions could learn from psychological research that often the most interesting information is not at the top level of a hierarchy, but somewhere in the middle. It is at this middle level that one finds the information that will matter most in decision making about institutions of higher learning.


Robert J. Sternberg is provost, senior vice president and professor of psychology at Oklahoma State University.


