How Ph.D. Programs Will Be Judged

National Research Council releases long-awaited methodology for still-awaited rankings. Is the system too complicated?
July 10, 2009

WASHINGTON -- After years of delays and debates, the National Research Council on Thursday released the methodology for its rankings of doctoral programs. The rankings themselves still have not been released, but for the first time graduate schools know definitively how they will be judged when that report comes out in the next few months.

As NRC officials have promised, the new system is data heavy, and does not rely on surveys of reputation, as the last rankings did in 1995. Instead, the NRC rankings rely on a series of measures of faculty quality, the student experience, and diversity -- and will include rankings on those broad areas (and many more specific ones) as well as an overall ranking for each discipline evaluated at various universities. The new system attempts to let the rankings reflect disciplinary values, so that the weighting of different criteria will vary from discipline to discipline.

Further, to stress the subtleties of the analysis, no single ranking of top to bottom will be given for programs; rather, programs will end up with a range of possible scores, based on their numbers, so a physics department might find out that it is most likely between No. 15 and No. 25, or a Spanish department might find out it is somewhere between No. 1 and No. 5.
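One way to picture how a program could end up with a range rather than a single rank is the minimal sketch below. It is not the NRC's procedure, which the article does not spell out. It assumes, purely for illustration, that the weights are randomly perturbed many times and that the best and worst rank each program receives are recorded. The function name `rank_range`, the `jitter` parameter, and all program data are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def rank_range(
    programs: Dict[str, List[float]],
    base_weights: List[float],
    trials: int = 500,
    jitter: float = 0.5,
    seed: int = 0,
) -> Dict[str, Tuple[int, int]]:
    """Return (best rank, worst rank) per program over perturbed weights.

    Assumption for illustration only: each trial scales every base weight
    by a random factor in [1 - jitter, 1 + jitter], then ranks programs
    by their weighted sum (highest score gets rank 1).
    """
    rng = random.Random(seed)
    ranks: Dict[str, List[int]] = {name: [] for name in programs}
    for _ in range(trials):
        weights = [w * rng.uniform(1 - jitter, 1 + jitter) for w in base_weights]
        scores = {
            name: sum(w * v for w, v in zip(weights, values))
            for name, values in programs.items()
        }
        ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    return {name: (min(r), max(r)) for name, r in ranks.items()}


# Invented data: two measures per program (not real NRC figures).
programs = {"A": [2.0, 2.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
ranges = rank_range(programs, base_weights=[1.0, 1.0])
print(ranges)
```

In this toy data, program A leads on both measures, so it stays at No. 1 however the weights shift, while B and C can trade places depending on which measure counts for more.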

According to NRC officials, this approach will encourage people to avoid the oversimplification of some rankings and to focus on a wealth of data, parts of which may be more important to some than others. A prospective Ph.D. student might pay more attention to time-to-degree figures, while a female faculty member might look at gender mix in departments, and a research agency might pay attention to citation ratios. And because many of the figures are on a per capita basis, the council hopes that its figures will not punish small departments, as some have said was a problem before.

But whether this completely revised methodology will be popular remains to be seen. While an unscientific sampling of experts on graduate education and rankings had praise for some parts of the methodology -- and expressed relief that it was finally out -- several said that they feared it was far too complicated in ways that would confuse people and undercut the credibility of the project.

Others expressed fear that with the NRC taking years to finish the work (despite having collected much of the data three years ago), changes in departments will make the rankings questionable. Notably, this skepticism was clear Thursday even though the rankings themselves still aren't out -- when that happens, you can expect more criticism from those whose departments fared poorly.

The NRC rankings are important for many reasons. Most rankings of colleges focus on undergraduate and professional education, and most rankings aren't held in high regard by educators. The NRC rankings, with a narrow focus, have had more clout. And that's especially the case because doctoral programs are both prestigious and expensive. These rankings will arrive at a time when many universities are giving close attention to programs that are relatively low in enrollments and high in expenses -- conditions that apply to even the best of doctoral programs.

"These rankings are going to be viewed through the lens of tremendous budget constraints," said Susan Herbst, executive vice chancellor and chief academic officer of the University System of Georgia. "These may have more weight than we ever thought."

So how do these rankings weight factors?

The 1995 rankings were largely based on reputation, with faculty surveys determining programs of quality in various fields.

Jeremiah P. Ostriker, chair of the NRC committee that produced the new system, said that the research council wanted to move past reputation, seeing it as a flawed approach. Many people assume departments at outstanding universities must be outstanding as a result, even if that's not the case, or people who associate certain stellar researchers with a department may not know that they have retired. "I've certainly seen cases where I knew the rankings were out of date, where there were famous universities with very highly ranked departments last time which shouldn't have been because their best people have already left and retired," said Ostriker, an astrophysicist at Princeton University who formerly was provost there.

So the idea the committee developed was to come up with a series of measures in different categories. In the area of faculty research activity, the list includes average publications per faculty member, average citations per publication, percentage of faculty holding grants, percentage of faculty receiving outside honors or awards and so forth. In the area of student support, criteria include the percentage of students fully funded, time to degree, and placement of graduates in academic positions. In diversity, criteria include the percentages of faculty members and students from underrepresented groups, the percentage who are female, and the percentage of international students.

But then, two measures were taken to see how much each of these factors should count, both within the subcategories and over all. Faculty members by discipline were asked to rank the relative importance of these factors, with the idea that different disciplines may prize different qualities more or less. Some science disciplines, for example, tend to pay a lot of attention to citations of papers, but that isn't a key factor in the humanities. But as Ostriker noted, there is no guarantee that what faculty members say counts in creating a top doctoral program is accurate. "You can ask people what they eat or you can watch them eat," he said by way of comparison.

So to "watch them eat," the NRC asked faculty members to rank the quality of doctoral programs -- and then ran analyses to see which qualities were associated with success. The results generally matched those produced by the first method. And not only do the faculty weights within the categories of research, students and diversity differ by discipline, but the relative importance of those categories differs as well. Generally, faculty paid the most attention to research, followed by students.

So the final system features significantly different weights for each discipline. Looking at similar groups of disciplines, the humanities disciplines value books and honors, but don't weight grant-winning capacity as a key measure of departmental success. The health sciences and agricultural sciences place a very high value on grant-winning.
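A minimal sketch of what discipline-specific weighting could look like follows. It assumes a simple weighted average of standardized measures, which may differ from the NRC's actual formula. The measure names, the weights, and the sample program are all invented for illustration.

```python
from typing import Dict


def weighted_score(measures: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of standardized measures (illustrative formula only)."""
    total_weight = sum(weights.values())
    return sum(weights[name] * measures[name] for name in weights) / total_weight


# Hypothetical weights: humanities stress publications and honors,
# health sciences stress grant-winning. These are not NRC figures.
HUMANITIES_WEIGHTS = {"publications": 0.5, "honors": 0.4, "grants": 0.1}
HEALTH_SCIENCES_WEIGHTS = {"publications": 0.3, "honors": 0.1, "grants": 0.6}

# One invented program, expressed as standardized values per measure:
# strong on publications, above average on honors, weak on grants.
program = {"publications": 1.0, "honors": 0.5, "grants": -1.0}

humanities = weighted_score(program, HUMANITIES_WEIGHTS)
health = weighted_score(program, HEALTH_SCIENCES_WEIGHTS)
print(round(humanities, 2), round(health, 2))
```

The same invented program scores about 0.6 under the humanities-style weights and about -0.25 under the health-sciences-style weights, which shows how identical data can look strong in one discipline's terms and weak in another's.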

Ostriker -- who has seen some modeling of how the actual rankings would appear -- said that he has done no analysis of how many programs moved significantly up or down. He predicted that the range for "a substantial fraction" of programs will include their ranking in 1995, but that there will be others with significant movement. He said that, even if the rankings are similar, they will be more accurate this year, as there will no longer be a danger of the perceptions being based on departments of 20 years ago.

He said that he didn't have one model for how the rankings based on the new methodology should be used. He noted that in Britain, rankings of this sort are used to provide more support to programs that are already successful. But he said that "you could take the opposite point of view," adding that "someone could say, in our university, all our health sciences are good except this one department -- that's an embarrassment, so we should improve it."

Personally, Ostriker said he hoped departments and graduate schools and universities would avoid formulaic approaches, and that desire is related to the approach of the new methodology. "To me, you give the support to the departments that use it best," he said.

Jeffery Gibeling, graduate dean at the University of California at Davis, echoed many of his colleagues in saying that while he hadn't had time to carefully review the methodology, "the most positive change is that it's out, and it signals that the NRC is making real, concrete progress toward releasing the data" on individual programs. Gibeling said that deans who want to use the rankings constructively have been frustrated by not knowing when they would finally appear -- and have not known how to plan conversations as a result.

"The challenge has been to try to time the conversations correctly," he said. "I see the methodology guide as a signal that it's time to re-engage with the faculty on the rankings." Generally, he said he applauded the move away from "uniformity" in measuring program success. "We can't all be No. 1, but we should all strive for improvement," he said, and this approach may encourage that.

Many others aren't so sure -- especially on issues of clarity.

Herbst of the University System of Georgia has been critical in the past of the NRC approach to rankings. But she called the methodology "pretty fair. It's not perfect, but perfection is impossible." She said that the methodology shows that the committee members spent time listening to various experts in the field, and tried to be responsive.

But she said her reaction to the entirety of the methodology and the plans for presenting the rankings was that "it's awfully technical for many non-social scientists." (Herbst is a political scientist.) She said that the NRC needs to remember that "however many caveats are in there," many administrators "will use this as authoritative."

For that reason, she said she wished the methodology or additional materials provided tools for administrators to use to start discussions based on the numbers. "I think there is a responsibility to provide tools to enable the kind of informed discussion they would like to see come of this," she said.

For example, Herbst said that many departments that do not do well will say that the problem is one of resources. So administrators will want to know "if they invest X in a department to improve it, what kind of results should there be? How do the NRC rankings help us speak to return on investment?" Or if a department says that the problem was that it lost a few good faculty members, how does an administrator use these rankings to determine if that's true, and whether this suggests more funds are needed? Herbst said she wasn't sure that the methodology suggested answers to those kinds of questions, which are what senior administrators will be asking for.

Generally, Herbst said, she hoped the eventual rankings provided more than the numbers. "When you undertake a project like this, you need to think about how these data are going to be used," she said.

Robert Morse, who runs the rankings operation at U.S. News & World Report, said that the methodology released by the NRC was "very sophisticated," and that although he realized its intended audience was more sophisticated than his, "I'm not sure that the people they think will understand it actually will understand it."

Surveys of reputation make up all of some U.S. News graduate program rankings and a significant portion of the rest, and Morse said the result was an "easily understandable" ranking. He was also critical of the time lag between the NRC faculty surveys and the rankings, which have yet to appear. He noted that many of the NRC categories are based on faculty members. "How many faculty have changed jobs since this started? It seems to me that the information is losing its validity."

He said he understood the importance of peer review in all NRC decisions, but that this project seems to have taken so long that "there's a cost in timeliness."

Brian Leiter, the John P. Wilson Professor of Law and director of the Center for Law, Philosophy and Human Values at the University of Chicago, writes extensively about rankings in law and philosophy and created The Philosophical Gourmet Report, a popular ranking of graduate programs in philosophy.

He was not impressed with the NRC methodology. "It looks like the triumph of pseudo-science over good sense," he said via e-mail. "The staggering complexity -- 21 variables, multiple weightings of the variables, regression analyses and corrections based on them -- will make it very hard for all but the most ambitious readers to interpret the results," he said.

Leiter said that there may be considerable value in the data gathered and compared, category by category. "But the aggregation is likely to produce a meaningless 'nonsense number,' " he said.

The addition of per capita comparisons, he said, would "reward small departments with a handful of very prominent faculty, and punish large departments with more prominent faculty but the same number of underperformers. Is program reputation a function of the best or the average? It's hard to know, but the NRC has apparently been captured by the advocates for the small departments. A shame."

Despite those concerns, Leiter -- like everyone interviewed for this article -- said that one of the best things the NRC can do at this point is finish up and release the gradually aging rankings. "The sooner the better," he said.

