Horse-race political science

David Walker’s assessment of the performance of forecasting models of the 2008 election provides an occasion for me to raise a question that troubles me about these models, which, to use the old World War II phrase, is “Is this trip really necessary?” By “this trip,” I’m referring to Walker’s article only indirectly. My question is really directed toward the practice of political science-based election forecasting.

Others may tell the story differently, but I’ll begin with John Mueller’s oft-cited study, War, Presidents and Public Opinion. Mueller’s work lives on today in the cottage industry of analyses of presidential popularity that his study prompted. But Mueller also, albeit inadvertently, provided a basis for election forecasting by political scientists when, in the course of discussing the peculiarities of Gallup’s presidential popularity question, he offhandedly dismissed the president’s standing in the polls as “a very imperfect indicator of electoral success or failure for a president seeking reelection.” But could that really be true? He presented no data to support his supposition, and it seemed implausible that the president’s standing in the polls would have little bearing on his performance on Election Day.

Intrigued, other researchers, myself included, rushed in to put Mueller’s assertion to the test, and what they uncovered, contra Mueller, was a very high correlation between presidents’ job ratings in the final pre-election Gallup Poll and their share of the popular vote in the general election. These reanalyses weren’t theoretically progressive or methodologically innovative. They simply took issue with a brief digression in Mueller’s wide-ranging consideration of presidential popularity. Point made, case closed, yes?

Well, no.

From these humble beginnings as well as from some other sources developed a spirited competition among political scientists, and between political scientists and economists, to determine whose model could provide the best forecasts of presidential and congressional election outcomes. A wave of new forecasting models soon appeared. By 2000 a dozen or more models were competing against one another and against the wholly different, market-based forecasting approach of the Iowa Political Stock Market. Since then the refinement of existing models and the development of new models have continued apace, with new analyses appearing with clocklike regularity at two- and four-year intervals timed to the electoral cycle.

But toward what end? In an important early election forecasting study, Steven Rosenstone (1983) argued that scholars should not regard the forecasting of election outcomes as a high-priority undertaking in and of itself. After all, rather than going to the trouble of trying to forecast an election outcome a few weeks before Election Day, researchers could simply wait and see how the election turned out – and what would be lost? Instead, Rosenstone argued, election forecasting should be treated as a convenient vehicle for addressing the more important question: “What determines election outcomes?” That is, election forecasting should be undertaken as a means of testing the empirical implications of theoretical models. Campaign consultants and others with a vested interest in seeing that one side or the other wins the election naturally place a high value on forecasting election outcomes. But why should this be considered a high priority for political scientists?

As time has passed, it has become increasingly evident that, notwithstanding Rosenstone’s demurrer, political scientists have approached election forecasting primarily as an end in itself rather than as a means of testing models derived from theory. What we have is a profusion of studies designed to see whether forecasting accuracy can be improved by some increasingly technical tweaks in model specification – typically ad hoc tweaks – and by adding one more election to the dozen or so on which most of these models are based. These models are generally built from the bottom up (that is, their logic is “Well, these predictors worked last time so let’s see if I can introduce some small changes, such as a slightly different specification or a new data point, that would make them do even better”) rather than from the top down (in which the logic would be “Aha! Here’s a good opportunity to test my theory of the determinants of electoral success”). Moreover, to complicate the scorekeeping, it is strikingly unclear what would constitute a good test of a model’s performance. The conventional standard is how close a certain model comes to “getting it right,” i.e., how close a particular out-of-sample forecast came to the actual outcome. But for a variety of reasons, “getting it right” in a single election is hardly a meaningful test.

I’m not characterizing these exercises as worthless from a political science perspective. After all, it took such a model, albeit an extremely simple one, to overturn Mueller’s assertion that the president’s standing in the polls has little bearing on the outcomes of presidential elections. But I truly wonder what all the fuss is about, above and beyond the potential thrill of victory in showing that one’s own pet model has outperformed its rivals by some small margin in a particular election. Is there some important payoff there? Are these models being used in the way that Rosenstone – rightly, I think – advocated? Too often, I think, these exercises are triumphs of technique over theory, exemplifying the tendency of political scientists with high-level statistical skills to concentrate on the specifics of statistical estimation without paying due heed to the broader purposes that such modeling is supposed to serve. Political scientists often speak scornfully of what they call “horse-race journalism.” Perhaps we should be more careful about where we throw our rocks, lest our own windows get shattered.
