Studying Social Media: Twitter and Replication

- August 10, 2011

I closed my last post on “social media and mass mobilization”:http://tmc.org/blog/2011/08/09/more-on-social-media-england-and-collective-action/ by identifying a subject that I thought warranted future study. On a related note, in a recent “opinion piece in Science”:http://www.sciencemag.org/content/331/6018/719.full, Harvard Professor “Gary King”:http://gking.harvard.edu/ addressed the opportunities and challenges that the huge amounts of data now generated by social media like Twitter pose for the social sciences. While the “whole article”:http://www.sciencemag.org/content/331/6018/719.full is worth a read, I wanted to highlight one particular section:

The potential of the new data is considerable, and the excitement in the field is palpable. The fundamental question is whether researchers can find ways of accessing, analyzing, citing, preserving, and protecting this information. Although information overload has always been an issue for scholars, today the infrastructural challenges in data sharing, data management, informatics, statistical methodology, and research ethics and policy risk being overwhelmed by the massive increases in informative data. Many social science data sets are so valuable and sensitive that when commercial entities collect them, external researchers are granted almost no access. Even when sensitive data are collected originally by researchers or acquired from corporations, privacy concerns sometimes lead to public policies that require the data be destroyed after the research is completed—a step that obviously makes scientific replication impossible and that some think will increase fraudulent publications.

The reason I’m focusing on the question of _replication_ is that Twitter, while at least somewhat facilitating the use of its data by researchers, appears to be prohibiting the distribution of that same data in replication data sets. Details, and an actual example of this behavior, can be found in this “post on Zero Intelligence Agents”:http://www.drewconway.com/zia/?p=2784. As Drew Conway, the author of the post, notes:

Despite its desire to be portrayed as the engine of social change, Twitter’s dirty secret is that it fights to prevent people from actually showing evidence of this. Through a combination of opaque adjudications for whitelist appeals, and contradictory and confusing language in the terms of service, Twitter has effectively locked researchers out of their data. Even if you’re an academic with the inclination and ability to hack together scripts for collection, as Michael was with the #25bahman data, all of your effort is subject to the whim of Twitter’s subjective approval.

In my view, this is a great tragedy of contemporary social science. The academy is very slowly beginning to understand the breadth of research topics that Twitter data can be applied to. In most cases this has been within technical disciplines, like computer science, but the real opportunity for knowledge building is in the social sciences. For it to be successful, however, Twitter needs to allow for reasonable fair use of their raw data in academic research and for this data to be redistributed widely. A simple Google search reveals that such fair use claims have legal precedent, but Twitter needs to proactively move their usage terms to the right side of this argument.

Clearly, policies such as these will make it even harder to ensure the replicability of studies, the problem King raises in his article. However, earlier this week “Mark Huberty”:http://markhuberty.berkeley.edu/, a graduate student at UC Berkeley, posted “an announcement”:http://www.drewconway.com/zia/?p=2798#more-2798 about an attempt to mobilize researchers to pressure Twitter to change its policies. Those interested in getting involved in this effort should see “Zero Intelligence Agents”:http://www.drewconway.com/zia/?p=2798#more-2798 for more details.
