I’ll leave it to John to write the bigger post on how much better the election results support models based on aggregates of polls than the bloviations of the pundits who were flinging poo at them a couple of days ago. I have a smaller question – is the “Nate Silver” (or Simon Jackman, or Drew Linzer, or Sam Wang) equilibrium sustainable over the longer term? More precisely – might these models cannibalize the individual polls that they need to draw on for data?
Here’s the potential problem. These models need to crunch lots of polls, at the state and national level, if they’re going to provide good predictions. It doesn’t matter if these various polls work on different assumptions – indeed, this may sometimes be an advantage (if polls make different assumptions about likely voters, etc., and these assumptions each capture different bits of the truth, then the aggregate prediction will be the better for it). These polls are carried out for a number of reasons. Some are commissioned by news organizations, which hope to use them to sell newspapers or attract eyeballs. Others are carried out more or less as loss leaders by polling organizations trawling for business.
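For readers who want the arithmetic behind the parenthetical above, here is a minimal sketch of why a spread of assumptions can help. Everything in it is made up for illustration: the true vote share, the five "house effects" and the unweighted average are my assumptions, not the method used by Silver, Jackman, Linzer or Wang, whose models weight and adjust polls in more sophisticated ways.

```python
from statistics import mean

# Hypothetical numbers, chosen only for illustration.
TRUE_SHARE = 0.52  # assumed true vote share for a candidate

# Assumed "house effects": each poll's likely-voter model pushes its
# estimate in a different direction and by a different amount.
HOUSE_EFFECTS = [0.020, -0.015, 0.010, -0.020, 0.010]


def poll_estimates(true_share, house_effects):
    """Return one estimate per poll: the truth plus that poll's bias."""
    return [true_share + effect for effect in house_effects]


def aggregate(estimates):
    """Simple unweighted average of the polls (a deliberate simplification)."""
    return mean(estimates)


estimates = poll_estimates(TRUE_SHARE, HOUSE_EFFECTS)

# Average distance from the truth for a single poll, versus the distance
# from the truth of the averaged estimate.
typical_poll_error = mean(abs(e - TRUE_SHARE) for e in estimates)
aggregate_error = abs(aggregate(estimates) - TRUE_SHARE)

print(f"typical single-poll error: {typical_poll_error:.3f}")
print(f"aggregate error:           {aggregate_error:.3f}")
```

The averaging only helps because the assumed biases point in different directions and partly cancel. If every pollster made the same mistake, the aggregate would inherit it in full, and the fewer polls there are to average, the less cancellation there is to be had.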
But if politically interested consumers start paying more attention to the aggregate tracking models instead of triumphing or despairing in response to the vagaries of individual polls, then there is less incentive to produce these individual polls in the first place. If people want to read about the results of Nate Silver’s model rather than the one-shot picture provided by e.g. the Washington Post’s latest poll, then the Washington Post obviously has less incentive to pay for an expensive poll that will garner less readership. Similarly, if people aren’t interested in individual polls, then those polls are going to be much less effective as loss leaders for polling firms.
And this presents a problem, because Silver, Jackman and everyone else _need_ to feed their models with lots of individual polls. To put the problem more abstractly, individual polls are a necessary input into aggregate polling models. But the people putting these models together are not, in fact, the customers for these polls. And, from the perspective of the _actual_ final consumers, the outputs of the aggregate polling models are a good (and arguably superior) substitute for the individual polls. The models might, over the longer term, drive the individual polls out of the market and so undermine their own supply of data.
I don’t want to stretch this too far – we are a long way away from observing this kind of effect in real life. Still, it seems plausible that if aggregate models become _too_ popular, they may cannibalize the conditions of their own operation, unless they can figure out a different business model through which they can generate the necessary data.