Today the National Research Council released its new rankings of doctoral programs, the first it has produced since 1995. You can read more about the report “here”:http://sites.nationalacademies.org/PGA/Resdoc/.

The NRC calculates these rankings two different ways, using 3 inputs that largely come from data gathered in 2006-2007 (see “here”:http://chronicle.com/article/After-Years-of-Delay-NRC-D/65918/). The 3 inputs are as follows:

1. Department-level data collected by the NRC on 20 different indicators (e.g., publications, citations, proportion of students who graduate within six years, etc.).

2. Survey data that asked respondents to evaluate how important each of the 20 indicators is to graduate program quality.

3. Survey data that asked respondents to rate a random subsample of 15 programs on a six-point scale.

In the NRC S rankings, S denotes “survey-based.” In the S rankings, the indicator data obtained in (1) are weighted by the importance ratings from the survey data in (2). Since there is inherent uncertainty in these rankings, the weighting was done 500 times, each time on a random half-sample of respondents, which generates percentile rankings. The NRC presents the range from the 5th to the 95th percentile. In other words, the ranking for any school is not a single estimate (#1, #2, etc.) but a 90% confidence interval of sorts.
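To make the resampling logic concrete, here is a minimal sketch in Python with NumPy. It is not the NRC's actual code: the function name @s_rank_intervals@, the array layouts, and the synthetic data are all assumptions made for illustration, and the real procedure involved additional steps that are omitted here.

```python
import numpy as np

def s_rank_intervals(indicators, importance, n_draws=500, seed=0):
    """Hypothetical sketch of the survey-based (S) ranking idea.

    indicators: (n_programs, n_indicators) standardized indicator values
    importance: (n_respondents, n_indicators) stated importance weights
    Returns the 5th and 95th percentile rank of each program (1 = best).
    """
    rng = np.random.default_rng(seed)
    n_programs = indicators.shape[0]
    n_resp = importance.shape[0]
    half = n_resp // 2
    ranks = np.empty((n_draws, n_programs), dtype=int)
    for d in range(n_draws):
        # Draw a random half-sample of respondents without replacement.
        idx = rng.choice(n_resp, size=half, replace=False)
        # Average their stated weights, then score every program.
        weights = importance[idx].mean(axis=0)
        scores = indicators @ weights
        # Convert scores to ranks: the highest score gets rank 1.
        order = np.argsort(-scores)
        ranks[d, order] = np.arange(1, n_programs + 1)
    low, high = np.percentile(ranks, [5, 95], axis=0)
    return low, high

# Synthetic example (assumed data, not NRC data): 30 programs, 200 respondents.
demo_rng = np.random.default_rng(1)
demo_indicators = demo_rng.normal(size=(30, 20))
demo_importance = demo_rng.uniform(0.0, 1.0, size=(200, 20))
s_low, s_high = s_rank_intervals(demo_indicators, demo_importance)
```

Each program ends up with a range of ranks (@s_low@ to @s_high@) rather than a single number, which is the form in which the NRC reports its results.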

In the NRC R rankings, R denotes “regression-based.” In the R rankings, the ratings in (3) are regressed on the indicators in (1) to generate a regression-based set of weights. These weights are then applied to the indicators in (1) for the full set of schools to generate rankings. Since there is uncertainty in these rankings as well, the regressions were estimated 500 times on random half-samples of respondents, which generates the percentile rankings.
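The regression-based approach can be sketched the same way. Again, this is an illustration under stated assumptions, not the NRC's implementation: it uses plain least squares on each program's average rating, and the function name @r_rank_intervals@, the use of @NaN@ for unrated programs, and the synthetic data are all hypothetical.

```python
import numpy as np

def r_rank_intervals(indicators, ratings, n_draws=500, seed=0):
    """Hypothetical sketch of the regression-based (R) ranking idea.

    indicators: (n_programs, n_indicators) standardized indicator values
    ratings: (n_respondents, n_programs) ratings, NaN where a respondent
             was not asked about a program
    Returns the 5th and 95th percentile rank of each program (1 = best).
    """
    rng = np.random.default_rng(seed)
    n_resp, n_programs = ratings.shape
    half = n_resp // 2
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(n_programs), indicators])
    ranks = np.empty((n_draws, n_programs), dtype=int)
    for d in range(n_draws):
        idx = rng.choice(n_resp, size=half, replace=False)
        sub = ratings[idx]
        rated = ~np.isnan(sub)
        counts = rated.sum(axis=0)
        has = counts > 0  # programs rated by someone in this half-sample
        mean_rating = np.where(rated, sub, 0.0).sum(axis=0)[has] / counts[has]
        # Regress average ratings on indicators to obtain implied weights.
        coef, *_ = np.linalg.lstsq(X[has], mean_rating, rcond=None)
        # Apply the weights to every program, rated or not, then rank.
        predicted = X @ coef
        order = np.argsort(-predicted)
        ranks[d, order] = np.arange(1, n_programs + 1)
    low, high = np.percentile(ranks, [5, 95], axis=0)
    return low, high

# Synthetic example (assumed data): each respondent rates 15 random programs.
demo_rng = np.random.default_rng(2)
n_prog, n_raters = 40, 300
demo_ind = demo_rng.normal(size=(n_prog, 20))
quality = demo_ind[:, 0] + 0.5 * demo_ind[:, 1]
demo_ratings = np.full((n_raters, n_prog), np.nan)
for r in range(n_raters):
    picked = demo_rng.choice(n_prog, size=15, replace=False)
    noisy = 3.5 + quality[picked] + demo_rng.normal(size=15)
    demo_ratings[r, picked] = np.clip(np.round(noisy), 1, 6)
r_low, r_high = r_rank_intervals(demo_ind, demo_ratings)
```

The key difference from the S sketch is where the weights come from: here they are inferred from how respondents actually rated programs, not from what respondents said they value.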

Eric Lawrence and I constructed graphs of the top 25 schools in each of the NRC’s rankings. This is an arbitrary cut-off, of course. We indulged in a bit of self-congratulation by putting GW in red.

Three things to note.

First, the NRC methodology has changed a great deal since the 1995 rankings. The two sets of rankings are not comparable.

Second, the R and S rankings are not the same. Some schools are in the top 25 of both, some are not. The ordering of schools changes as well. What might account for differences between the S and R rankings? The R rankings weight the 20 indicators so that the indicators themselves best account for a program’s reputation. The traits that historically prestigious departments have will be more important in the R ranking to the extent that (1) perceptions of these departments remain high and (2) the weights obtained via regression differ from the weights stated by respondents in the survey. And in fact, the weights differ across the two sets of rankings. In the R method, publications per faculty member is the sixth most important criterion, behind (in order) the number of Ph.D.s granted, average GRE score, cites per publication, awards per faculty, and number of student activities offered. According to the survey items used in the S method, publications per faculty member is the most important criterion.

Third, and most importantly, the rankings contain considerable uncertainty. We simply cannot statistically distinguish many departments from one another. This is a good thing. Too much is made of specific ranks, even though no one department is clearly the “best” and lots of departments are just pretty similar.

UPDATE: See also this “Inside Higher Ed story”:http://www.insidehighered.com/news/2010/09/28/rankings.