Selection bias in the study of chain emails

- June 10, 2010

I blogged a “couple of months ago”:https://themonkeycage.org/2010/04/the_political_science_of_chain.html about an interesting piece by Liben-Nowell and Kleinberg on diffusion patterns in chain email, and suggested that the emails sampled might be at least somewhat atypical. Make that ‘plausibly _very_ atypical.’ A new article in PNAS proposes a model under which the rather odd-looking patterns of diffusion that Liben-Nowell and Kleinberg found can be explained if these chains are the rare survivors of a process in which most chain emails fail much earlier.

bq. In particular, the simple Galton–Watson epidemic model suffices to generate trees reaching many nodes yet having long chains as in the data. To show this, we first fit the parameters of a Galton–Watson process by using maximum-likelihood estimation on the basis of one of the trees inferred by Liben-Nowell and Kleinberg. Then we simulate the process and examine only the rare outcomes in which a chain letter with these parameters spreads as widely as those that were observed. Most realizations are very small and have virtually no chance of being observed; we are interested in the properties of those rare ones that are big enough to match the public radio and war petitions described above. … The main difference between their approach and ours is that we do not explicitly model the network or detailed mechanics of the distribution process. We focus only on the random variable describing how many children each node has and on the selection bias. Those two ingredients alone suffice to produce a conditional distribution concentrated on trees with the right shapes.
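
To get a feel for the selection effect the authors describe, here is a rough sketch of the "simulate, then keep only the big ones" idea. It is not the paper's code: the offspring probabilities below are invented for illustration rather than fitted by maximum likelihood, and the names @simulate_tree@ and @conditioned_sample@ are my own. Each node independently has 0, 1, or 2 children, and we compare all simulated trees with the rare ones that reach a minimum size.

```python
import random
from statistics import mean


def simulate_tree(offspring_probs, rng, max_nodes=10_000):
    """Grow one Galton-Watson tree generation by generation.

    offspring_probs[k] is the probability that a node has k children.
    Returns (size, depth). Growth stops once max_nodes is reached,
    so very large trees are truncated.
    """
    size, depth, current = 1, 0, 1
    counts = range(len(offspring_probs))
    while current > 0 and size < max_nodes:
        # Draw a child count for every node in the current generation.
        children = sum(rng.choices(counts, weights=offspring_probs, k=current))
        if children == 0:
            break  # the chain letter died out
        size += children
        depth += 1
        current = children
    return size, depth


def conditioned_sample(offspring_probs, min_size, n_trials, seed=0):
    """Simulate n_trials trees; return (all_trees, survivors).

    Survivors are the trees with at least min_size nodes, i.e. the
    ones large enough that an observer might plausibly see them.
    """
    rng = random.Random(seed)
    all_trees = [simulate_tree(offspring_probs, rng) for _ in range(n_trials)]
    survivors = [t for t in all_trees if t[0] >= min_size]
    return all_trees, survivors


if __name__ == "__main__":
    # Illustrative values only (NOT the fitted parameters from the paper):
    # most recipients forward to one person, a few to two, some to nobody.
    probs = [0.10, 0.85, 0.05]
    all_trees, survivors = conditioned_sample(probs, min_size=50, n_trials=2000)
    print("share of trees kept:", len(survivors) / len(all_trees))
    print("mean depth, all trees:", mean(d for _, d in all_trees))
    if survivors:
        print("mean depth, survivors:", mean(d for _, d in survivors))
```

The point of the exercise is that nothing about the network is modeled. The only ingredients are the number of children per node and the filter on size, which is the paper's argument in miniature.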