Reading somebody else’s statistics rant made me realize the inherent contradictions in much of my own statistical advice.

Jeff Lax sent along this article by Philip Schrodt, along with the cryptic comment:

Perhaps of interest to you. perhaps not. Not meant to be an excuse for you to rant against hypothesis testing again.

In his article, Schrodt makes a reasonable and entertaining argument against the overfitting of data and the overuse of linear models. He states that his article is motivated by the quantitative papers he has been sent to review for journals or conferences, and he explicitly excludes “studies of United States voting behavior,” so at least I think Mister P is off the hook.

I notice a bit of incoherence in Schrodt’s position–on one hand, he criticizes “kitchen-sink models” for overfitting and he criticizes “using complex methods without understanding the underlying assumptions” . . . but then later on he suggests that political scientists in this country start using mysterious (to me) methods such as correspondence analysis, support vector machines, neural networks, Fourier analysis, hidden Markov models, topological clustering algorithms, and something called CHAID! Not to burst anyone’s bubble here, but if you really think that multiple regression involves assumptions that are too much for the average political scientist, what do you think is going to happen with topological clustering algorithms, neural networks, and the rest??

As in many rants of this sort (my own not excepted), **there is an inherent tension between two feelings**:

1. The despair that people are using methods that are too simple for the phenomena they are trying to understand.

2. The near-certain feeling that many people are using models too complicated for them to understand.

Put 1 and 2 together and you get a mess. On one hand, I find myself telling people to go simple, simple, simple. When someone gives me their regression coefficient I ask for the average, when someone gives me the average I ask for a scatterplot, when someone gives me a scatterplot I ask them to carefully describe one data point, please.

On the other hand, I’m always getting on people’s case about too-simple assumptions, for example analyzing state-level election results over a 50-year period and thinking that controlling for “state dummies” solves all their problems. As if Vermont in 1952 is the same as Vermont in 2002.

When it comes to specifics, I think my advice is ok. For example, suppose I suggest that someone, instead of pooling 50 years of data, do a separate analysis of each year or each decade and then plot their estimates over time. This recommendation of the secret weapon actually satisfies both criteria 1 and 2 above: the model varies by year (or by decade) and is thus more flexible than the “year dummies” model that preceded it; but at the same time the new model is simpler and cleaner.
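For readers who have not seen the secret weapon in action, here is a minimal sketch of the idea in Python: fit the same simple regression separately within each year and collect the slope and its standard error so they can be plotted over time. The function name `secret_weapon`, the single-predictor model, and the simulated data are all assumptions made up for illustration; a real analysis would use actual election data and probably more predictors.

```python
import numpy as np


def secret_weapon(years, x, y):
    """Fit y = a + b*x separately within each year.

    Returns a dict mapping year -> (slope estimate, standard error).
    Hypothetical helper for illustration only.
    """
    years, x, y = (np.asarray(v) for v in (years, x, y))
    estimates = {}
    for yr in np.unique(years):
        mask = years == yr
        n = int(mask.sum())
        # Design matrix: intercept column plus the single predictor
        X = np.column_stack([np.ones(n), x[mask]])
        beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        resid = y[mask] - X @ beta
        # Classical least-squares standard error (n - 2 degrees of freedom)
        sigma2 = resid @ resid / (n - 2)
        cov = sigma2 * np.linalg.inv(X.T @ X)
        estimates[int(yr)] = (float(beta[1]), float(np.sqrt(cov[1, 1])))
    return estimates


# Simulated data (purely illustrative): the true slope drifts over time,
# which a single pooled regression with dummies would average away.
rng = np.random.default_rng(0)
year_grid = np.arange(1952, 2004, 10)          # 1952, 1962, ..., 2002
true_slopes = np.linspace(0.5, 1.5, len(year_grid))
years = np.repeat(year_grid, 50)               # 50 observations per year
x = rng.normal(size=years.size)
y = 1.0 + np.repeat(true_slopes, 50) * x + rng.normal(scale=0.5, size=years.size)

results = secret_weapon(years, x, y)
for yr, (slope, se) in results.items():
    print(f"{yr}: slope = {slope:.2f} +/- {se:.2f}")
```

The natural next step is to plot the estimates, with their standard errors as error bars, against year. Each fit is about as simple as a regression can be, yet the collection of fits lets the coefficient change over time.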

Still and all, there’s a bit of incoherence in telling people to go more sophisticated and simpler at the same time, and I think people who have worked with me have seen me oscillate in my advice, first suggesting very basic methods and then pulling out models that are too complicated to fit. As I like to say, I always want to use the simplest possible method that’s appropriate for any problem, but said method always ends up being something beyond my ability to compute or even, often, to formulate.

To return to Schrodt’s article, I have a couple of minor technical disagreements. He writes that learning hierarchical models “doesn’t give you ANOVA.” Hold that thought! Check out chapter 22 of ARM and my article in the Annals of Statistics. And I think Schrodt is mixing apples and oranges by throwing in computational methods (“genetic algorithms and simulated annealing methods”) in his list of models. Genetic algorithms and simulated annealing methods can be used for optimization and other computational tasks but they’re not models in the statistical (or political science) sense of the word.

Finally, near the end of his paper Schrodt is looking for a sensible Bayesian philosophy of science. I suggest he look here, for a start.