You're Not No. 1

September 29, 2010

WASHINGTON -- The advance briefing for reporters covering Tuesday's release of the National Research Council's ratings of doctoral programs may have made history as the first time a group doing rankings held a news conference at which it seemed to be largely trying to write them off.

While the NRC committee that produced the rankings defended its efforts and the resulting mass of data on doctoral programs now available, no one on the committee endorsed the actual rankings, and committee members went out of their way to say that there might well be better ways to rank -- better than either of the two methods unveiled.

And choose between those two methods? The committee declined to do so. In fact, its members insisted that the five-year process, costing well over $4 million (much of it paid by universities and not counting enormous time spent by graduate schools slicing and dicing the data), could not yield precise rankings of anything.

Rankings have been criticized in the past for suggesting false levels of precision, but that isn't a criticism you'll hear about this process.

"We can't say this is the 10th best program. We can say it's probably between 5th and 20th," said Jeremiah P. Ostriker, chair of the NRC committee that prepared the rankings, and a professor of astronomy and former provost at Princeton University. The approach used is "a little bit unsatisfactory, but at least it's honest," he said. When one of the reporters on a telephone briefing about the rankings asked Ostriker and his fellow panelists if any of them would "defend the rankings," none did so.

As Ostriker suggested, these rankings are much more complicated than what most people are used to. There are two overall rankings -- with different methodologies -- each calculated as ranges that should give 90 percent confidence. There are sub-rankings in a variety of topics. And there is a wealth of data, covering 5,000 programs in 62 fields at 212 universities.

Past NRC rankings -- while always criticized by some -- have been taken seriously by academic leaders. For a number of reasons, many involved in graduate education have been particularly anxious for this new version to come out. The last rankings were released in 1995, based on 1993 data, and were considered seriously out of date. In the intervening time, much has changed in graduate education -- and departments are feeling particular pressure now to demonstrate their value. Doctoral programs tend to be expensive for institutions to operate, but are also highly prestigious -- so having valid comparisons of programs is theoretically something everyone wants.

But many graduate education experts say that these rankings are arriving well past their sell-by date, given that the process started in 2005, data were collected in the 2005-6 academic year, and the original release date was supposed to be 2007. The time lapse is particularly troublesome, many university officials say, because the years that have passed have seen huge economic turmoil in higher education -- with many top universities adopting early retirement programs, and some up-and-coming universities pushing hard to raid talent from elsewhere.

Further, the highly sophisticated methodology is seen as too complicated by many, and differences in some fields between the two separate rankings issued by the NRC have some wondering whether either one is valid. And some departments -- notably at the University of Washington -- are this morning charging the NRC with publishing inaccurate data.

Despite all of these concerns, there are plenty of individual departments that will be cheering today (or that have been cheering in recent weeks as graduate deans have shared some of the advance information). Boston University didn't even wait for the embargo to be lifted to boast about its results. Some departments have already been analyzing how they can use the data to recruit new faculty stars or top grad students. At universities that have kept information tightly held, chairs have been reaching out to anyone (including this reporter) who might be able to reassure them that they ended up about where they believe they should be.

And even among the many graduate education experts who are not happy with the rankings methodologies or presentation or timeline, many are hopeful about the resulting database, and say that -- if properly updated -- it could become a valuable tool to improve programs.

For those wondering who came out on top (to the extent there is a top), Inside Higher Ed has prepared a downloadable table showing all the departments that could be as high as first, second or third using each of the two methodologies. In most cases, more than one department could be as high as first. In some cases, the same university appears more than once for a discipline, because it has multiple departments that fit into the discipline and more than one qualifies for top ratings (for example, the University of Wisconsin at Madison appears multiple times under animal science in the two methodologies, reflecting high scores for its departments of animal science, dairy science and zoology, all of which are deemed animal science departments by the NRC).

While individual disciplines will feature surprises, the universities most people would expect -- Harvard, Princeton and Stanford Universities and the University of California at Berkeley -- turn up time and again among the top-ranked departments. In the technology fields and some social sciences, the Massachusetts Institute of Technology and California Institute of Technology are all over the place. Land-grant universities, not surprisingly, dominate the agricultural disciplines (although many also do well elsewhere).

Those looking at universities with departments ending up at the high end of the ranges will see, as expected, other Ivies, the Universities of Chicago and Michigan, and so forth. Some long-term trends are also evident -- such as MIT showing excellence not only in its historic areas of strength in the physical sciences, but also in the biological sciences; or the spread of the highest ranks of research excellence in the University of California beyond Berkeley and UCLA to other UC campuses.

By collecting data from so many graduate programs, the NRC is also able to provide information on broad trends in doctoral education -- comparing departments considered in both the 1995 and 2010 rankings. Comparing the two, some of the highlights:

  • The number of students enrolled has increased in engineering by 4 percent and in the physical sciences by 9 percent, but has declined in the social sciences by 5 percent and the humanities by 12 percent.
  • Female enrollments are up across all disciplines, with the largest percentage gains coming in fields, such as engineering, where women's overall share of the student population remains small. (More recent data released this month for 2008-9 by the Council of Graduate Schools indicate that women have now overtaken men in doctoral degrees awarded in the United States.)
  • The percentage of Ph.D.s awarded to students from underrepresented minority groups has increased for all fields. Minority Ph.D.s increased from 5.2 percent to 10.1 percent in engineering, and from 5 percent to 14.4 percent in the social sciences.
  • "Time to degree" varies widely for doctoral programs. More than 50 percent of students in the agricultural sciences and engineering complete their degrees in six years or less, while only 37 percent of those in the social sciences do -- the same as the percentage of humanities students who finish within eight years.

Generally those statistics won't shock those who have been tracking graduate education, and in many cases the data on which they are based are already old, although the trends have most likely continued (or even accelerated). Still, graduate officials say that the magnitude and uniformity of the data collected could be a great model if departments and the NRC find a way to update the database and keep it current.

The Methodologies

The methodologies used to produce the rankings are complicated enough that many graduate deans -- including those for whom regression analysis is an everyday activity -- are calling them dizzying.

For the "S" ratings, the methodology was released last year, and although it has been revised in some ways, the basic approach has stayed the same. Faculty members in the disciplines studied were surveyed on a range of factors that might go into judging a program to see which they would value most highly. Then the formula for judging each discipline was tweaked to give more weight to the factors that a discipline values. So, for example, in fields where winning outside grants is considered a key measure of quality (as is the case in the biological and physical sciences), that counts for a larger share than it does in humanities fields, where many top scholars don't receive much outside support. The idea is that fields are so different -- to give another example, some care about citation index measures and others don't -- that judging them all by any one standard would be wrong.
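The weighting idea behind the S ratings can be illustrated with a minimal sketch. It is not the NRC's actual formula: the measure names, the weights and the program's values below are all invented for illustration, and the measures are assumed to be already standardized to a common 0-to-1 scale.

```python
# Illustrative only: measure names, weights and program values are invented.

def s_score(measures, weights):
    """Weighted sum of a program's (already standardized) measures."""
    return sum(weights[name] * value for name, value in measures.items())

# Hypothetical field weights, standing in for what a faculty survey
# in each field might say matters most.
biology_weights = {"grants": 0.5, "citations": 0.3, "completion": 0.2}
history_weights = {"grants": 0.1, "citations": 0.2, "completion": 0.7}

# One hypothetical program: strong on grants, weaker on completion.
program = {"grants": 0.9, "citations": 0.5, "completion": 0.4}

bio = s_score(program, biology_weights)
hist = s_score(program, history_weights)
print(f"scored with biology weights: {bio:.2f}")
print(f"scored with history weights: {hist:.2f}")
```

The same hypothetical program scores higher under the grant-heavy weights than under the completion-heavy ones, which is the point of letting each field set its own weights.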

For the "R" ratings, a different methodology was used. Faculty members were surveyed broadly on which programs they thought were the best and then an analysis was done of the qualities those programs had -- so the R methodology favors the programs that have the characteristics that were attractive to faculty members when they picked the best programs. Generally, larger programs did better with R than S, suggesting that while faculty members don't necessarily say that they value large programs, they tend to prefer them to smaller programs.
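A rough sketch of the R idea, under heavy simplification: fit a line relating one program characteristic to the ratings faculty gave a sample of programs, then apply that fitted relationship to other programs. The single predictor (faculty size), the sample values and the program names are assumptions made up for this example; the NRC's analysis drew on many characteristics, not one.

```python
# Illustrative only: one invented predictor and invented survey ratings.

def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical sample of programs that faculty were asked to rate (1-5 scale).
sampled_size = [10, 20, 30, 40]
sampled_rating = [2.0, 3.0, 3.5, 4.5]

slope, intercept = fit_line(sampled_size, sampled_rating)

# Apply the fitted relationship to programs that were never rated directly.
unrated = {"Program A": 15, "Program B": 35}
predicted = {name: intercept + slope * size for name, size in unrated.items()}
for name, rating in predicted.items():
    print(f"{name}: predicted rating {rating:.2f}")
```

In this toy sample the larger programs drew higher ratings, so the fitted line rewards size even though no one was asked whether size matters, which mirrors the pattern described above.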

For both the S and the R ratings, additional calculations were done to determine a range of ratings that would project 90 percent confidence. This was done, NRC officials said, because of variability in any given year, such that a faculty member may land a big grant one year but not the next, or a program may enroll its best students one year and not another. In some fields, there is considerable overlap between S and R ratings, but in others there isn't so much. Originally, the NRC planned to release ratings with only a 50 percent confidence level, not 90 percent. Ostriker said at the press briefing that the change was made in part because there was too much divergence -- at 50 percent confidence -- between the S and R ratings.
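One generic way to turn year-to-year variability into a range of ranks is to recompute the ranking many times with random noise added to each program's score, then report the middle 90 percent of the ranks observed. The sketch below does that with invented program names, scores and noise level; it is an assumption-laden illustration, not a description of the NRC's own calculations.

```python
import random

def rank_ranges(base_scores, noise, trials=500, seed=0):
    """Return {program: (low_rank, high_rank)} covering the middle 90% of trials."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    ranks = {name: [] for name in base_scores}
    for _ in range(trials):
        # Perturb every score, then rank programs from highest to lowest.
        noisy = {name: score + rng.gauss(0, noise)
                 for name, score in base_scores.items()}
        ordered = sorted(noisy, key=noisy.get, reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    ranges = {}
    for name, observed in ranks.items():
        observed.sort()
        low = observed[int(0.05 * trials)]        # drop the lowest 5% of ranks
        high = observed[int(0.95 * trials) - 1]   # drop the highest 5% of ranks
        ranges[name] = (low, high)
    return ranges

# Hypothetical programs with closely bunched scores.
scores = {"A": 0.70, "B": 0.68, "C": 0.66, "D": 0.50}
for name, (low, high) in rank_ranges(scores, noise=0.05).items():
    print(f"{name}: ranked between {low} and {high}")
```

Programs whose scores sit close together trade places often and end up with wide ranges, while a program far from its neighbors keeps a narrow one. Raising the hypothetical `noise` value widens every range.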

Ostriker characterized the R and S approaches as two equally valid methods of judgment. He noted, for example, that a researcher studying diet could take multiple approaches. "Suppose you want to know what people eat -- you can watch them eat or ask them what they eat and both would have errors," he said.

Pressed on various criticisms of the rankings methodology, Ostriker and others didn't get into defending their choices, although they did defend the need to make some choices. "Any such effort is based on values," he said, so the question is "where do the values come from." In many rankings efforts, he said, those doing the rankings gather in a room and the relative weights for different measures are "off the top of their heads." NRC officials also said that there was enough "stability" in universities to make the rankings valid even five years after much of the data were collected.

Richard Wheeler, a member of the NRC committee who is interim vice chancellor for academic affairs at the University of Illinois at Urbana-Champaign, said that the results of the rankings "would have been quite different had the methodology been tweaked in different ways," and "we don't want to claim that these are the only possible results."

The Reactions

Criticisms offered of the methodologies cover a range of issues. From the start of the process, some have, for example, questioned whether a disciplinary approach is the best way to judge doctoral education when so many of the hot ideas are interdisciplinary.

But now that the rankings are out, many are finding additional flaws. Janet A. Weiss, graduate dean and vice provost at the University of Michigan, said that "parts of this are going to be helpful," and she cited the actual data as being that part. But Weiss is fairly dubious of the ratings -- and that may be significant because Michigan has many programs that did quite well.

She noted that the methodology is "nonintuitive" and likely to confuse. And she questioned why the weighting was done in both systems entirely on faculty views of what matters. "Faculty are a very important constituency and what they think is very important, but other groups might have come up with very different weights," she said, citing both students and deans as people who might have relevant views. For example, she said that at Michigan, many people from all groups might place a higher value on interdisciplinary work than do those who assigned weights in the national faculty survey. Likewise, she said Michigan places a high value on student diversity in its graduate programs -- something given relatively little weight in the national surveys. But for Michigan to compare its departments in the ratings systems, the university would accept weights it doesn't necessarily agree with, she said.

Even if Michigan liked the weights, she added, it is hard to have confidence in the very wide ranges in some of the overall R and S rankings -- and the way those ranges differ so radically from field to field.

In some fields, for instance, the R and S ratings are fairly close and are in ranges that suggest proximate conclusions of quality. For Michigan's mathematics program, for example, the R range is 4-12 and the S range is 7-20, roughly conveying the message that Michigan may not be at the very top of the heap, but that it has a notably strong program.

Shift to communications, however, and there Michigan has an R range of 2-58 and an S of 7-22, raising questions, Weiss said, about both why one range is so much larger than the other, and what a range means when it runs from the top of the field to very much the middle of the pool. And Michigan is far from unique in having very wide ranges in this and other fields. Sticking to communications, the University of Texas at Austin has a program that could be No. 1 or could be 69th (in its R rating). Cornell University's similar range is from 2 to 67. North Dakota State University could be as high as 7 or low as 78.

Why, Weiss asked, do some fields have narrow ranges and others "gigantic ranges?"

David E. Shulenburger, vice president of academic affairs at the Association of Public and Land-Grant Universities and former provost of the University of Kansas, agreed. "The S and R rankings are where the complexity is, and I'm not sure there was much utility in doing the rankings, and particularly with the discrepancies," he said. "It's just going to leave people scratching their heads, arguing for one or the other."

Many experts commented on the issue of complexity, combined with the age of data. John V. Lombardi, president of the Louisiana State University System, has himself written extensively on how to compare research universities and has helped devise systems for comparing them. (Lombardi has blogged for Inside Higher Ed.) He said that the NRC methodology illustrates "the quixotic nature of any effort to measure research university performance in this way."

Lombardi said that, for measuring research performance, existing data from the National Science Foundation are "reasonably comparable" and provide "much more useful results." If the NRC efforts show anything, he said, it may be about "the challenge of the task -- but the result must have been exceptionally frustrating to all the experts who participated in producing this report."

Asked if he would use the rankings in the kind of work facing LSU and many other university systems these days of deciding which programs to build up and which to eliminate, Lombardi said "probably not" because "the data are old" and the focus ignores the way a doctoral program's faculty may also contribute to undergraduate education. He noted that the research performance of a graduate department "may have nothing at all" to do with a department's undergraduate role or state research priorities.

'Weird and Not Very Reliable'

Brian Leiter, the John P. Wilson Professor of Law and director of the Center for Law, Philosophy & Human Values at the University of Chicago, who writes frequently about ratings of academic departments (especially in philosophy and law), was not impressed with the NRC effort. He was particularly critical of the R rankings (the one in which faculty were first asked to rate departments overall, and then the characteristics of those programs were used to rank everyone). The roots of the R rankings are thus "a secret reputational survey" that wasn't actually published, with little guidance to those who were surveyed. "The NRC insists the R ranking is not a reputational survey, and that is right," he said. "The R rating is essentially a weird and not very reliable approximation of a reputational survey of an unknown group of evaluators."

Leiter also noted the impact of "the huge time lag" on the S rankings -- and said that given that various measures of "research activity" that are fundamental to the S rankings are based on faculty productivity, this is too long a passage of time for the movement of faculty members not to have a major impact. In Leiter's blog, he frequently notes the movement of faculty members because -- especially in a field like philosophy, which does not have mammoth departments -- one or two departures matter a great deal.

He cited some examples. The high end of Yale University's ranges for philosophy would be 25th (R ranking) and 39th (S ranking, which would be influenced by faculty research accomplishments). In 2005-6, it didn't yet have two of the most "highly decorated and recognized senior philosophers" around, who are now there -- Stephen Darwall (who moved from Michigan) and Thomas Pogge (who moved from Columbia). In programs with small departments, "it's hard to see how just these two, even by the NRC's criteria, would not have changed the results significantly."

Similarly, he said that in 2005-6, Chicago's faculty roster would have included John Haugeland (a Guggenheim winner who has since died), Charles Larmore (a fellow of the American Academy of Arts and Sciences, who left for Brown University) and William Wimsatt (an influential figure in philosophy who has since retired).

Leiter said these examples could be "multiplied in both directions," making it foolish to count on measures of faculty accomplishment and reputation that are so out of date.

Whether the impact of the time lag matters across the board or especially in smaller departments is a matter of some debate in the graduate school world. Many who are familiar with departments like philosophy argued that those programs are particularly vulnerable. But Weiss of Michigan said she viewed this as a serious issue across the board. She said that among Michigan's arts and science faculty, 25 percent has turned over since the NRC collected information.

Another issue some have raised concerns potential errors. In one case noted in the NRC's report on the ratings, a well-known department (not identified by the council) emerged with unexpectedly low ratings. Upon closer examination, the NRC found that the GRE scores submitted (as part of the formula for the S ratings) were of all applicants, not admitted applicants, and thus were much lower than those of peer institutions. When that mistake was fixed, there was a significant shift in the department to roughly where people expected it to end up.

Accuracy Questioned

(This section has been updated from an earlier version.)

It isn't clear all errors were caught in time. The University of Washington's computer science department on Thursday posted a statement alleging the use of "erroneous data" in the rankings, such as the use of an incorrect faculty list, and wrong figures on faculty awards and other factors. Washington's engineering college is expressing concerns about "clear inaccuracies." The NRC posted some revisions to the computer science calculations, in part due to the complaints from Washington, and also noted that it made some other fixes (such as reclassifying the University of Miami as a private institution).

The National Research Council issued a statement noting that its officials had written to the University of Washington and "acknowledged that an error in computer science was found in the student placement variable called 'Percent with Academic Plans,' which was corrected in the data sheet and the illustrative rankings for computer science programs released on September 28. They also said that it was unfortunate that faculty lists for several programs at the University of Washington were not submitted correctly to the NRC. Other universities had corrected similar mistakes in their submissions during the data validation process."


But the University of Washington is standing by its statements that the instructions from the NRC were not clear. And the Computing Research Association released a statement backing the university and suggesting that the problems are significant. "CRA has serious concerns about the accuracy and consistency of the data being used in the evaluation of the computer science discipline," says the statement. "CRA has identified a number of instances in which data were reported under different assumptions by institutions, leading to inconsistent interpretation of the associated statistical factors. CRA has further identified a number of instances where the data is demonstrably incorrect – sometimes very substantially – or incorrectly measures the intended component."

Hoping for the Best From the Data

Debra Stewart, president of the Council of Graduate Schools, said that she thought the "primary contribution" of the NRC project would be data, not rankings. She said that at a time when doctoral programs in the United States remain tops in the world, but also face competition, the availability of comparable data can help programs use benchmarking to improve. Graduate deans and graduate program directors have "a very strong spirit of believing in quality assessment," so these data -- if updated consistently -- could help them.

But she was much more skeptical of the rankings. Stressing that she was offering only her personal opinion, she said that "mindless obedience to a ranking someone else constructs is not necessarily the best way to improve programs," especially given the different missions of departments in the same disciplines. She said she hoped the rankings and methodology debate would encourage discussion. "You really need to think about what you are trying to achieve and how to measure it before getting into the rankings game," she said.

Michigan's Weiss said: "This process was hugely expensive and time-consuming for the institutions and the faculty. If we are to do it again, I think it would make much more sense to focus on collecting a core set of data that we could use to benchmark ourselves and abandon this process for calculating rankings."

