
Data availability

- January 25, 2009

John writes,

A great next step would be for Catalist, the SEIU, and other organizations to make their 2008 campaign data publicly available. I won’t hold my breath, of course.

To which Chris Kennedy of Rock the Vote replies,

If a political scientist is looking for organizations to publish their voter contact data, one of the most effective methods in my experience is to offer an in-kind evaluation/analysis of the voter contact program. As Doug hints, data publishing needs to be rational from the organization’s perspective given their limited time and money. . . .

Then John:

I can certainly understand that it takes money and resources to collect these data. But I think the scholarly norm of publicly releasing the original data (not just the results) should still be more operative than it is. For one, it’s not like it “costs” these organizations anything to release the data. Second, knowledge would accumulate faster were the data in the public domain, and thus these organizations would learn a lot more than they would from contracting with individual scholars. Let a thousand flowers bloom, etc. I actually think it’s a win-win situation for both these organizations and scholars.

And Chris again:
In theory I agree that releasing data publicly is good for organizations because they will receive future benefits. There are three issues that need to be addressed, though. . . . First, it is nontrivial to release campaign field experiments . . . Second, publishing voter contact data can reveal targeting choices and other strategic, proprietary advantages of a campaign. . . . Third, there is no guarantee of any results from releasing data publicly. If I’m going to spend a week on data management so I can publish 15 experiments, 3 polls, and 4 side projects, there should be some guarantee of intellectual revenue; otherwise it was a waste of time. Given these constraints, I don’t see publicly available data as a no-brainer activity for political organizations.

I offer the counter-argument that the demand for publicly available data is merely an attempt to minimize the time costs for academics to conduct their own supplemental research, and that any truly beneficial analyses could be conducted nearly as easily by contacting the organization and making a case to gain access to the data. This method encourages knowledge generation while also catering to the concerns of political organizations.

Everything that John and Chris say above is reasonable. I just have a few comments to add:

1. I’ve had good experiences with the Census Bureau. For a government survey organization, it’s basically their job to help you with the data, no matter who you are.

2. There are many, many national surveys done all the time, especially before an election, and it can be hard to get access to them. What’s frustrating is that many of these surveys seem to exist only to supply a single day’s headline. I think it would be a win-win situation if these data were shared (and if fewer such surveys were done; what a waste of time so many of them are)!

3. I’m supportive of Chris’s suggestion to build a relationship with the data-gathering organizations, and I suspect that John might have some success with the Catalist people. That said, I’ve often had difficulty making such connections with other organizations, even in settings where I already have personal contacts. Maybe it’s just that it’s work to put together a dataset, I don’t know, but I’ll say that the advice to build a relationship is easier said than done.

4. My own experience coming at it from the other side: I have data and code from my books on the web. And people email me all the time asking for cleaner code, or help with the data, or advice on debugging their code, or whatever. I don’t mind (really, it’s no problem; when I’m too busy I don’t have to reply), but it does seem to be true that when you give stuff away, people start asking for more…