Since the early days of social media, there has been excitement about how the data traces users leave behind can be exploited to study human behaviour. Researchers who were once restricted to surveys or laboratory experiments now have access to huge amounts of “real-world” data from social media.
The research opportunities enabled by social media data are undeniable. However, researchers often analyse these data with tools that were not designed for the large, noisy, observational datasets social media produces.
We explored problems that researchers might encounter due to this mismatch between data and methods.
What we found is that the methods and statistics commonly used to provide evidence for seemingly significant scientific findings can also seem to support nonsensical claims.
The motivation for our paper comes from a series of research studies that deliberately present absurd scientific results.
Why would a researcher go out of their way to explore such ridiculous ideas? The value of these studies is not in presenting a new substantive finding. No serious researcher would argue, for example, that a dead salmon has a perspective on emotions in photos.
Rather, the nonsensical results highlight problems with the methods used to achieve them. Our research explores whether the same problems can afflict studies that use data from social media. And we discovered that indeed they do.
Researchers found the letters X, Y, and Z make tweets more shareable. The nonsensical result shows how easily statistics can be misused.
Positive and negative results
When a researcher seeks to address a research question, the method they use should be able to do two things:
reveal an effect, when there is indeed a meaningful effect
show no effect, when there is no meaningful effect.
For example, imagine you have chronic back pain and you take a medical test to find its cause. The test identifies a misaligned disc in your spine. This finding might be important and inform a treatment plan.
However, if you then discover the same test identifies this misaligned disc in a large proportion of the population who do not have chronic back pain, the finding becomes far less informative for you.
The test’s failure to distinguish negative cases (no back pain) from positive cases (back pain) does not mean the misaligned disc in your spine is non-existent. That part of the finding is as “real” as any finding. But the failure means the result is not useful: “evidence” that is as likely to turn up when there is a meaningful effect (in this case, back pain) as when there is none is simply not diagnostic, and non-diagnostic evidence is uninformative.
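The back-pain example can be made concrete with Bayes’ rule. The numbers below are hypothetical, chosen only to sketch the point: if the “finding” appears almost as often in people without the condition as in people with it, observing it barely shifts our belief.

```python
def posterior(prior, p_given_pain, p_given_no_pain):
    """Bayes' rule: belief that the disc explains the pain, after seeing it."""
    evidence = prior * p_given_pain + (1 - prior) * p_given_no_pain
    return prior * p_given_pain / evidence

# Disc found in 80% of back-pain patients, but also in 75% of everyone else:
print(round(posterior(0.5, 0.80, 0.75), 2))  # → 0.52, barely above the 0.5 prior

# A genuinely diagnostic test would separate the two groups:
print(round(posterior(0.5, 0.80, 0.05), 2))  # → 0.94, a strong update
```

The first result shows why non-diagnostic evidence is uninformative: the posterior is nearly identical to the prior, so the observation told us almost nothing.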
What is a ‘meaningful’ finding?
The issues raised in our paper are not new, and there are indeed many research practices that have been developed to ensure results are meaningful and robust.
For example, researchers are encouraged to pre-register their hypotheses and analysis plans before starting a study to prevent a kind of data cherry-picking called “p-hacking”. Another helpful practice is to check whether results are stable after removing outliers and controlling for covariates. Also important are replication studies, which assess whether the results obtained in an experiment can be found again when the experiment is repeated under similar conditions.
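Why p-hacking produces spurious “findings” can be shown with a toy simulation (a rough sketch, not an analysis from the paper): generate pure noise, test many hypotheses, and report only the smallest p-value. Even though no real effect exists, some tests will look “significant” by chance.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def noise_p_value(trials=200):
    """Permutation-test p-value comparing two groups of pure noise."""
    a = [random.gauss(0, 1) for _ in range(50)]
    b = [random.gauss(0, 1) for _ in range(50)]
    observed = abs(sum(a) / 50 - sum(b) / 50)
    pooled = a + b
    extreme = 0
    for _ in range(trials):
        random.shuffle(pooled)
        diff = abs(sum(pooled[:50]) / 50 - sum(pooled[50:]) / 50)
        if diff >= observed:
            extreme += 1
    return extreme / trials

# Test 20 unrelated "hypotheses" on noise and keep the best-looking one:
p_values = [noise_p_value() for _ in range(20)]
print(min(p_values))  # often below 0.05, despite there being no effect at all
```

Pre-registration blocks exactly this move: by committing to the hypotheses in advance, a researcher cannot quietly run twenty tests and report only the one that happened to cross the significance threshold.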
These practices are important, but they alone are not sufficient to deal with the problem we identify. While standardised research practices are needed, the research community must first think critically about what makes a finding in social media data meaningful.