Fraud, Disclosure, and Degrees of Freedom in Science
There are lies, damned lies, and statistics.
Posted May 10, 2012
I point out in The Folly of Fools that science is naturally self-correcting—it requires experiments, data gathering and modes of analysis to be fully explicit, the better to be replicated and thus verified or falsified—but where humans or social behavior are involved, the temptation for quick and illegitimate progress is accelerated by the apparent importance of the results and the difficulty of checking on their veracity. Recently cases of deliberate fraud have been uncovered in the study of primate cognition (Harvard), the health benefits of resveratrol (U Conn), and numerous social psychology findings (Tilburg U, Netherlands). I will devote some later blogs to other aspects of fraud in science but will begin here with a very clever analysis of statistical fraud and lack of data sharing in psychology papers published in the United States. This and related work suggest that the problem of fraud in science is much broader than the few cases of deliberate, large-scale fraud might suggest.
Wicherts and co-authors made use of a little-noted feature of all papers published in the more than 50 journals of the American Psychological Association (APA): the authors of these papers commit by contract to sharing their raw data with anyone who asks for it, in order to attempt replication. Yet earlier work by this same group showed that for 141 papers in four top APA journals, 73 percent of the scientists did not share data when asked to. As they point out, statistical errors are known to be surprisingly common, accounts of statistical results are sometimes inaccurate, and scientists are often motivated to make decisions during statistical analysis that are biased in their own preferred direction. So they were naturally curious to see if there was any connection between failure to share data and evidence of statistical bias.
Here is where they got a dramatic result. They limited their research to two of the four journals, those whose scientists were slightly more likely to share data and most of whose studies were similar in having an experimental design. This gave them 49 papers. Again, the majority failed to share any data, instead behaving as a parody of academics. Of those asked, 27 percent failed to respond to the request or to two follow-up reminders (the first, and best, line of self-defense: complete silence), 25 percent promised to share data but had not done so after six years, and 6 percent claimed the data were lost or there was no time to write a codebook. In short, nearly 60 percent of (alleged) scientists avoided the first requirement of science: everything explicit and available for inspection by others.
Was there any bias in all this non-compliance? Of course there was. People whose results were closer to the fatal cut-off point of p=0.05 were less likely to share their data. Hand in hand, they were more likely to commit elementary statistical errors in their own favor. For example, for all seven papers where the correctly computed statistics rendered the findings non-significant (10 errors in all), none of the authors shared the data. This is consistent with earlier data showing that it took considerably longer for authors to respond to queries when the inconsistency in their reported results affected the significance of the results (these were responses without data sharing!). Of a total of 1148 statistical tests in the 49 papers, 4 percent were incorrect based only on the scientists’ summary statistics, and a full 96 percent of these mistakes were in the scientists’ favor. Authors would say that their results deserved a ‘one-tailed test’ (easier to achieve), but they had already set up a one-tailed test, so when they halved the p-value again they created a ‘one-half tailed test’. Or they ran a one-tailed test without mentioning this even though a two-tailed test was the appropriate one. And so on. Separate work shows that only one-third of psychologists claim to have archived their data; the rest make reanalysis impossible almost at the outset! (I have 44 years of ‘archived’ lizard data; be my guest.) It is likely that similar practices are entwined with the widespread reluctance to share data in other “sciences” from sociology to medicine. Of course this statistical malfeasance is presumably only the tip of the iceberg, since in the undisclosed data and analysis one expects even more errors.
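To make the ‘one-half tailed test’ concrete, here is a minimal sketch in Python. It assumes a simple z statistic evaluated against the standard normal distribution, and the value z = 1.70 is a made-up illustration, not a number taken from any of the papers Wicherts and co-authors examined.

```python
import math

ALPHA = 0.05  # the conventional significance cut-off


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


def one_tailed_p(z: float) -> float:
    """One-tailed (upper-tail) p-value for a z statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical test statistic chosen to sit near the cut-off.
z = 1.70

p_two = two_tailed_p(z)   # the appropriate test when no direction was predicted
p_one = one_tailed_p(z)   # exactly half of the two-tailed value for z > 0
p_half = p_one / 2        # halving again: the illegitimate 'one-half tailed test'

for label, p in [
    ("two-tailed", p_two),
    ("one-tailed", p_one),
    ("one-half tailed", p_half),
]:
    print(f"{label:>16}: p = {p:.3f}  significant at 0.05: {p < ALPHA}")
```

With this made-up statistic the two-tailed p is about 0.089 (not significant), the one-tailed p is about 0.045, and halving it a second time gives about 0.022. The data never changed; only the choice of test did.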
The depth of the problem was beautifully revealed in a recent paper by Simmons and co-authors. The take-home message is in (part of) the title: “Undisclosed flexibility in data collection and analysis allows presenting anything as significant”. And they mean anything. In one faux study they ran on real subjects, they managed to prove that listening to one kind of music changed one’s birth-date compared to listening to another. How did they achieve this astonishing (and very important!) result? By introducing father’s age of each subject as a covariate meant to “control for variation in baseline age across participants”. Probably the most common and effective ploy is to continue to gather data until a result is significant, then stop. Authors have a large number of “degrees of freedom” regarding data analysis and presentation, which give them ample opportunity not to “massage” their data, as scientists like to put it, but to create truth out of randomness.
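The ploy of gathering data until the result is significant is easy to simulate. The sketch below is my own illustration, not the procedure Simmons and co-authors used. It assumes two groups drawn from the same normal distribution (so every "significant" difference is a false positive), a z-test with known unit variance, and arbitrary settings: start testing at 10 subjects per group, add one per group at a time, and give up at 50.

```python
import math
import random

ALPHA = 0.05


def p_value(mean_diff: float, n: int) -> float:
    """Two-tailed z-test p-value for a difference in means of two groups of
    size n, assuming both groups have known variance 1."""
    z = mean_diff / math.sqrt(2.0 / n)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_sims: int = 2000, start_n: int = 10, max_n: int = 50, seed: int = 1):
    """Return (fixed_rate, peek_rate): false-positive rates when testing once
    at max_n versus testing after every added subject from start_n onward."""
    rng = random.Random(seed)
    fixed_hits = 0
    peek_hits = 0
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        peeked_significant = False
        fixed_significant = False
        for n in range(1, max_n + 1):
            # Both groups come from the same distribution: there is no true effect.
            sum_a += rng.gauss(0.0, 1.0)
            sum_b += rng.gauss(0.0, 1.0)
            if n >= start_n:
                significant = p_value(sum_a / n - sum_b / n, n) < ALPHA
                # A peeking researcher would stop and report at the first hit.
                peeked_significant = peeked_significant or significant
                if n == max_n:
                    fixed_significant = significant
        fixed_hits += fixed_significant
        peek_hits += peeked_significant
    return fixed_hits / n_sims, peek_hits / n_sims


fixed_rate, peek_rate = simulate()
print(f"test once at n=50:        false-positive rate = {fixed_rate:.3f}")
print(f"test after every subject: false-positive rate = {peek_rate:.3f}")
```

Testing once at the planned sample size should give a false-positive rate near the nominal 5 percent, while peeking after every subject pushes it well above that, even though there is nothing to find in the data.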
So there is good news and bad news. The bad news is that there is ample latitude for statistically significant deception and a lot of motivation to produce it and then hide it. This can operate at varying degrees of consciousness. The good news is that science is self-correcting, the studies cited above being nice examples. To have a science of truth, especially regarding human social life, requires a science of untruths, including new methodologies for exposing them.
Wicherts, JM, Bakker, M and Molenaar, D. 2011. Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS One 6: 1-7.
Simmons, JP, Nelson, LD and Simonsohn, U. 2011. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22: 1359-1366.