Scientific Reform Works
How reliable can psychology be when people use methods from reformers?
Posted Sep 13, 2019 | Reviewed by Abigail Fagan
Last weekend, I attended a new scientific conference: Metascience 2019. Metascience is when scientists study science itself—not "what is the new scientific finding?" but "how did scientists get that new finding?" The experience was at times exhilarating, because it was affirming to see that so many other people cared about understanding and improving the processes by which science is produced.
It was also at times depressing, as several talks involved important but difficult stories. One project that tried to replicate findings from cancer biology started by looking at 30+ papers and found that not a single one contained the information needed to replicate it without asking the authors for more details. Another project tried to reproduce epidemiological studies using the exact same data and instructions from the paper and couldn’t in over 30 percent of cases. And a simulation study showed that our current grant system would be more efficient if it implemented a lottery for everyone above a basic threshold of competence.
There was a lot to digest from the conference, but I think one point that is easy to overlook relates to a large social psychology project whose results were unveiled at the conference. It’s a major point of optimism on my part, and makes me feel good about the progress the field is making. But before I explain it, some context is needed.
Social psychology, along with related disciplines, has a problem with replication. Slides from speaker Simine Vazire summarize the results of several large-scale and well-known replication projects: Experiments only replicated about 45 percent of the time. Given the data we’ve got, our best estimate of the chance that a study in social psychology will turn out the same way if you do it twice is a little worse than a coin flip.
Not summarized but also well known is that effects found in the literature tend to be smaller in replication attempts. Jonathan Schooler presented the results of a large-scale project where he and his collaborators investigated this “Decline Effect” in studies. Schooler hypothesized that the decline in effect sizes was due to something mysterious, maybe even paranormal. But to support this claim, he needed to establish that the Decline Effect couldn’t be attributed to more normal causes of scientific error.
These causes include well-known issues with scientific reporting, such as:
- Using questionable statistical practices to make an effect look bigger than it actually is.
- Using several measures and only reporting the one with the biggest effect.
- Running lots of studies and only reporting those that have big effects.
These types of strategies bias science so that the results that get published are larger than the real effect size.
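To see why selective reporting inflates published effects, here is a minimal simulation sketch. It is an illustration under assumed numbers, not an analysis from any of the projects discussed: the true effect (0.2 standard deviations), the sample size (20 people per group), the number of studies (5,000), and the rule that only statistically significant positive results get "published" are all hypothetical choices made for the example.

```python
import random
import statistics


def simulate_study(true_effect, n_per_group, rng):
    """Run one two-group study and return the observed effect size (Cohen's d)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treatment = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    pooled_sd = ((statistics.variance(control) + statistics.variance(treatment)) / 2) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


def run_simulation(true_effect=0.2, n_per_group=20, n_studies=5000, seed=1):
    """Compare the average effect across all studies with the average among 'published' ones."""
    rng = random.Random(seed)
    observed = [simulate_study(true_effect, n_per_group, rng) for _ in range(n_studies)]

    # With equal group sizes, t = d * sqrt(n / 2).
    # 2.02 is the approximate two-sided .05 cutoff for 38 degrees of freedom,
    # so it is only appropriate for the default of 20 people per group.
    t_critical = 2.02
    published = [d for d in observed if d * (n_per_group / 2) ** 0.5 > t_critical]

    return statistics.mean(observed), statistics.mean(published), len(published) / n_studies


all_mean, published_mean, share = run_simulation()
print(f"Average effect across all studies:    {all_mean:.2f}")
print(f"Average effect among published ones:  {published_mean:.2f}")
print(f"Share of studies that got published:  {share:.1%}")
```

Under these assumptions, the average across all studies lands close to the true effect, while the "published" subset can only contain studies whose observed effect cleared the significance cutoff, which with samples this small is several times larger than the true effect. A faithful replication of one of those published studies would then be expected to show a smaller effect, with no mysterious cause required.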
So Schooler and his team completed a project where they found 16 new effects in social psychology (though results were completed for only 13) and tested whether they would decline once they were replicated. To rule out that the decline was due to poor reporting and fudging the numbers, Schooler’s team adopted the best practices scientific reformers have been advocating for:
- They preregistered their hypotheses, writing down their specific predictions before conducting their tests.
- They collected large samples, making it possible to give a more definitive answer as to whether an effect exists or not.
- They checked their effects in multiple labs, with experiments being run by multiple people.
The results: no evidence for a decline effect, but 89 percent success in replicating new effects. Compare that to the estimate of replicability Vazire reported for social psychology and related disciplines more broadly: 45 percent. When psychologists conduct research under normal conditions—with no strict checks on whether they are being careful in their methods—we have a replication rate below 50 percent. When psychologists conduct research rigorously sticking to the standards of good science, we have a replication rate around 90 percent.
This is a huge success for the Credibility Revolution. We now have a direct comparison of how reliable research is when we don’t address the concerns of reformers and how reliable it becomes when we do. Of course, this is a small sample of replications where we are sure the proposals of preregistration, high power, and direct replication have been used. But this is also a result that is easy to detect without even doing a statistical test. As my stats professors used to joke, it passes the “interocular trauma test”: It hits you right between the eyes.
Seeing how much psychology research could be improved by adopting methods we already know makes me excited to stay in the field. I want to see people making huge improvements in the way they do research. I can imagine the advice we give patients, policymakers, and everyday people trying to improve their lives being twice as effective. I can imagine psychologists writing fewer self-help books and implementing more programs that actually improve society, and the self-help books we do write doing more to help people. I can see students thinking of psychology classes not as fun and easy but as rigorous and important. Society has a lot of problems, and stronger psychology can address them with more reliable and effective solutions. Who wouldn’t want to see that version of psychology?