
More on Data Availability

January 27, 2009

In response to my original post, Doug Hess and Chris Kennedy — both with considerable experience doing research in an applied setting (for Project Vote and Rock the Vote, respectively) — make some useful comments, which I hope you’ll read. Andy gives his thoughts here. A few more of my own are below.

Originally, I suggested that various groups who were active in the 2008 campaign (e.g., doing vote contact) should make their data public. In a snotty fashion, I then expressed my doubt they would do so. Doug and Chris sought to illuminate the incentives these organizations have, why that makes my suggestion somewhat pie-in-the-sky, and what strategies scholars might pursue to gain access to these data.

All of these strategies are extremely useful. Doug says, in comments on Andy’s post, that the key is personal contact and relationships. From what I understand, that is what helped launch the partnership between CMAG and Ken Goldstein and the Wisconsin Advertising Project — a partnership that has made public some extraordinary data at a negligible cost to scholars, for which we should praise Professor Goldstein and others at Wisconsin on a daily basis. I’m sure there are other similar relationships. And I would say that the more such relationships exist, the easier it will be for other scholars to form new relationships. Trust begets trust.

But let’s leave aside what is realistic, given the status quo, and instead focus on what could be. What are the benefits and costs of organizations and advocacy groups making their data public? Doug and Chris point out some costs that I blithely ignored:

* The data could reveal strategies or other trade secrets. This could be a boon to competing organizations.

* It takes time to prepare the data for public dissemination (e.g., writing a codebook).

* There is no guarantee that scholarly use of these data would benefit the parent organizations.

All of those points are well-taken.[1] But at the same time, all of these points apply to academics as well. We spend our time and resources gathering data. Providing those data reveals something about our “strategic” goals, entails costs in terms of data preparation, and generates the risk that others’ use of these data won’t help us. (In fact, often other scholars do the precise opposite: use someone’s data to prove them wrong!) And yet we subscribe — at least in theory, if not always in practice — to the norm of data dissemination.

I still think this norm would ultimately benefit organizations. First, in the realm of political campaigns (the focus here), most scholars are interested in precisely the same questions that these organizations care about: namely, what effects campaign activity had on voters.

Second, by disseminating data only to particular scholars who have built the necessary relationship, these organizations limit what they can learn from their data. This may arise because scholars who have developed a relationship with an organization will feel pressure (if only unconsciously) to produce findings that confirm the organization's goals (i.e., that its campaign activity "mattered") and thereby nurture the relationship. Moreover, there is no guarantee that any one scholar would analyze the data well. Even randomized field experiments aren't so simple to analyze. Think of the debate about the original Green and Gerber research on voter turnout (here, here). Now, I think this debate hasn't dislodged their central findings (see esp. this forthcoming paper by Ben Hansen and Jake Bowers), but still: undoubtedly more will be learned by releasing the data and letting various scholars have at it.[2]

If the costs of preparing the data for release are considerable, perhaps the organization could contract with one scholar or set of scholars who would help prepare the data and then have the right to take the first crack at it. The data could be released publicly at some specified time thereafter.

The only issue I can't speak to, out of general ignorance, is the first issue above, that of proprietary and strategic value. I'll only note that I think organizations will vary in how proprietary they feel; political campaigns will be more proprietary than some public interest groups, for example. It's also quite possible that there are workaround solutions. CMAG imposed an embargo on some of its campaign advertising data, since it was profitable for the company to sell those data for a period after an election. Or, if aspects of an organization's strategy are more "secret" than others, perhaps the released data could exclude certain things. For example, I could imagine that a campaign is more comfortable releasing data about who was targeted than data about the specific messages used in doing so.

Ultimately, I hope this sketches a middle ground that is mutually beneficial. Thanks again to Doug, Chris, and Andy for their thoughts and feedback.

fn1. Chris also suggests that "the demand for publicly available data is merely an attempt to minimize the time costs for academics to conduct their own supplemental research." Yes, it saves academics time and money, but at the same time, compared to most organizations operating in the campaign setting (candidate organizations, labor unions, public interest groups, whoever), academics simply have fewer resources. And I mean vastly fewer resources. A career-making grant from the NSF is a rounding error in the average presidential candidate's budget. That's something academics really can't do anything about, unless the NSF is going to get its budget quadrupled anytime soon. So, if we're going to allow these organizations their interests, let's allow academics theirs.

fn2. An example involving the CMAG data: thanks to the availability of these data, other scholars have been able to point out their strengths (accurate reporting of ad airings) and weaknesses (overestimates of the costs of ad buys). See this paper by Michael Hagen and Robin Kolodny.
