Straight talk about polling: Probability sampling can be helpful but it's no magic bullet

Mark Blumenthal and Natalie Jackson of Huffpost Pollster write:

If you read polls in the news, you’re probably familiar with the term “margin of error.” What you may not know is that pollsters disagree fiercely about when it should be used. . . .
Based on the sample size (and some other factors) and utilizing statistics, the pollster can calculate the margin of sampling error. This describes how close the sample’s results likely come to the results that would have been obtained by interviewing everyone in the population — in theory — within plus or minus a few percentage points.
Over the years, the margin of sampling error has typically been provided to give readers a sense of a poll’s overall accuracy. That simple idea requires some critical assumptions, however: It presumes that the sample was chosen completely at random, that the entire population was available for sampling and that everyone sampled chose to participate in the survey. It also assumes that respondents understood the questions and that they answered in the desired way. For pre-election surveys, it assumes that pollsters have accurately defined and selected the population of likely voters.
In the real world, these assumptions are never fully satisfied. If some part of a population is not sufficiently covered or does not respond, for example, and that missing portion is different on some characteristic or attitude of interest, the survey results could be off in ways not reflected in the margin of sampling error. . . .
All of this brings us back to the often contentious debate among pollsters about whether it is appropriate to report a margin of error for Internet-based surveys that do not begin with a random sample. . . .
Online surveys typically start out with the convenient: They use nonrandom methods to recruit potential respondents for “opt-in” panels and then select polling samples from these panels. But professional Internet pollsters don’t stop there. To make the nonrandom sample look like the population, these pollsters use weighting and modeling techniques that are similar to, albeit more statistically complex than, the methods used with random-sample polls conducted by phone.
The argument against reporting a margin of error for opt-in panel surveys is that without random sampling, the “theoretical basis” necessary to generalize the views of a sample to those of the larger population is absent.
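
For readers who have not seen the calculation, here is a minimal sketch of the textbook margin of sampling error that the passage above describes. It assumes simple random sampling with full response, which is exactly the set of assumptions under dispute, and the function name and example numbers are mine for illustration, not anything from HuffPost or YouGov.

```python
import math

def margin_of_sampling_error(p_hat, n, z=1.96):
    """Half-width of an approximate 95% interval for a proportion.

    Assumes a simple random sample of size n with no nonresponse,
    no coverage problems, and no weighting (an idealized setting).
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical poll: 1,000 respondents, 50% supporting a candidate.
moe = margin_of_sampling_error(0.5, 1000)
print(f"+/- {100 * moe:.1f} percentage points")  # +/- 3.1 percentage points
```

The formula depends only on the sample size and the estimated proportion, so it has nothing to say about coverage, nonresponse, or question wording.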

At this point I should probably give you my take on all this:

Just about any survey requires two steps:
1. Sampling.
2. Adjustment.
There are extreme settings where either 1 or 2 alone is enough.
If you have a true probability sample from a perfect sampling frame, with 100 percent availability and 100 percent response, and if your sampling probabilities don’t vary much, and if your data are dense relative to the questions you’re asking, then you can get everything you need—your estimate and your margin of error—from the sample, with no adjustment needed.
From the other direction, if you have a model for the underlying data that you really believe, and if you have a sample with no selection problems, or if you have a selection model that you really believe (which I assume can happen in some physical settings, maybe something like sampling fish from a lake), then you can take your data and adjust, with no concerns about random sampling. Indeed, this is standard in non-sampling areas of statistics, where people just take data and run regressions and that’s it.
In general, though, it makes sense to be serious about both sampling and adjustment, to sample as close to randomly as you can, and to adjust as well as you can.
Remember: just about no sample of humans is really a probability sample or even close to a probability sample, and just about no regression model applied to humans is correct or even close to correct. So we have to worry about sampling, and we have to worry about adjustment.
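
To make the adjustment step concrete, here is a minimal sketch of poststratification on a single variable, which is about the simplest adjustment there is. All the numbers are made up for illustration, the group labels and function names are hypothetical, and real adjustments (including the weighting and modeling that pollsters actually use) involve many more variables and more careful modeling.

```python
def raw_estimate(group_means, group_counts):
    """Unadjusted sample mean: each group counts in proportion to its sample size."""
    total = sum(group_counts.values())
    return sum(group_means[g] * group_counts[g] for g in group_means) / total

def poststratified_estimate(group_means, population_shares):
    """Reweight group means by known population shares (shares must sum to 1)."""
    return sum(group_means[g] * population_shares[g] for g in group_means)

# Hypothetical sample that overrepresents older respondents.
group_means = {"under_45": 0.60, "45_plus": 0.40}    # share supporting a candidate
group_counts = {"under_45": 200, "45_plus": 800}     # respondents in the sample
population_shares = {"under_45": 0.5, "45_plus": 0.5}  # assumed known, e.g. from a census

raw = raw_estimate(group_means, group_counts)
adjusted = poststratified_estimate(group_means, population_shares)
print(f"raw: {raw:.2f}, adjusted: {adjusted:.2f}")  # raw: 0.44, adjusted: 0.50
```

The adjustment only fixes the imbalance on the variable you adjust for. If respondents differ from nonrespondents within each age group, the adjusted number is still off, which is why both sampling and adjustment matter.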

OK, back to Blumenthal and Jackson:

This ongoing debate creates a dilemma for The Huffington Post’s reporting of results from the opt-in online panel surveys we conduct in partnership with YouGov. Although YouGov calculates a “model-based margin of error” for each survey, we have not been reporting it when we discuss the survey results in HuffPost.
The problem: If we cite YouGov’s margin of error, we violate AAPOR’s [that’s the American Association for Public Opinion Research] Code of Ethics. If we leave out the margin of error, however, we fail to offer readers guidance on the random variation that’s present with this type of survey, which we believe is also an ethical lapse. As members and proponents of AAPOR, we consider neither situation satisfactory. And the margin of error does offer valuable information when you’re comparing two results from a survey or surveys — it tells you how large differences have to be in order to mean something.
So we’ve come up with this solution: We’ll add the following text to the methodological details we note when we report on HuffPost/YouGov surveys and link to the additional information prepared by YouGov:
“Most surveys report a margin of error that represents some, but not all, potential survey errors. YouGov’s reports include a model-based margin of error, which rests on a specific set of statistical assumptions about the selected sample, rather than the standard methodology for random probability sampling. If these assumptions are wrong, the model-based margin of error may also be inaccurate. Click here for a more detailed explanation of the model-based margin of error.”

If they want to do this, I think they should also write this:

We’ll add the following text to the methodological details we note when we report on Gallup, NYT, Pew, etc., surveys and link to the additional information prepared by the survey organization:
“Most surveys report a margin of error that represents some, but not all, potential survey errors. [Survey organization] reports include a model-based margin of error, which rests on a specific set of statistical assumptions about the selected sample, rather than the standard methodology for random probability sampling. If these assumptions are wrong, the model-based margin of error may also be inaccurate.”

Blumenthal and Jackson also found this charming 2009 quote from Gary Langer, “then ABC News director of polling”:

By claiming sampling error, samples taken outside that framework try to nose their way out of the yard and into the house. They don’t belong there. I have yet to hear any reasonable theoretical justification for the calculation of sampling error with a convenience sample.

Speaking from the yard, or perhaps the doghouse, let me just say that I have yet to hear any reasonable theoretical justification for the calculation of sampling error for a survey with a 90 percent nonresponse rate.
Speaking as a statistician, I hate hate hate hate hate when people throw around tough-guy phrases such as “I have yet to hear any reasonable theoretical justification.” I’m sure this guy has some practical expertise, and if he thinks this expertise is relevant to his evaluations, he can make the case. That makes more sense than just talking about “theoretical justification” as if he knows what he’s talking about.
P.S. Yes, this is an insiders’ discussion but it’s relevant to political science and politics, as a lot of what we know about public opinion comes from polls.
