Looking Under the Hood of Research

Some potential weaknesses in psychological research.

Posted Oct 27, 2011

Psychology research has the power to change the world.  Understanding the way people think and act toward others lets us create evidence-based recommendations for making ourselves more effective.  Research on mental illness can also transform people's lives.

But the quality of the recommendations we draw from the research is only as good as the data on which those recommendations are based.  So, it is worth looking under the hood of research a bit and thinking about some of the weaknesses in the way that data are collected.

Most research that is presented in scientific journals has to reach a particular level of scientific reliability that could be called the 5% rule.  That is, if an experiment reports that two groups or experimental treatments differ, the statistical procedures that we use are designed so that, if the groups did not actually differ, there would be only a 5% chance of seeing a difference that large.  This rule is so important that papers that do not satisfy it are rarely accepted for publication.
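To make the 5% rule concrete, here is a minimal simulation sketch. It assumes two groups drawn from the same population (so no real difference exists), and it uses a simple z-test with a known standard deviation rather than whatever test a real study would use. The function names, group size, and number of simulated experiments are illustrative choices, not anything taken from the paper.

```python
import random
from statistics import NormalDist, mean


def z_test_p(a, b, sd=1.0):
    """Two-sided p-value for a difference in means, assuming a known sd."""
    se = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_per_group=20, n_experiments=5000, alpha=0.05, seed=1):
    """Fraction of no-effect experiments that still come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution: there is no true effect.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p(a, b) < alpha:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    print(false_positive_rate())
```

When the sample size is fixed in advance and a single measure is tested once, the printed fraction should land close to 0.05.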

A forthcoming paper in the journal Psychological Science by Joseph Simmons, Leif Nelson, and Uri Simonsohn examines some common scientific practices and suggests that this 5% rule does not really hold.

There are a few practices that researchers use that cause trouble for this rule.

1) When doing an experiment with a new method, researchers often run a small group of participants to see if the study is working.  If it is working as expected, they continue to collect data, checking it periodically until the 5% rule is satisfied.  If the study is not working as expected, the experimenters stop and repair the method and run it again.   This process is repeated until the study starts producing interesting results or until the experimenters give up on that line of research.

2) Researchers sometimes collect a few different measures, but only report the ones that exhibit strong effects.  The idea is that other measures may not have been sensitive enough.

3) Researchers may run several conditions in the study, but only report the ones that show reliable effects.  The other conditions are seen as a distraction in the writeup.

Simmons, Nelson, and Simonsohn do statistical analyses to demonstrate that these practices can turn the 5% rule into a 60% rule.  That is, even when no real difference exists, researchers who use these procedures can be more likely than not to find a statistically significant one.
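The inflation is easy to see in a rough simulation. The sketch below is not the authors' analysis. It assumes no true effect, independent measures, a z-test with known standard deviation, and a researcher who checks the data after every 10 participants per group and stops as soon as any measure reaches the 5% threshold. All names and settings are hypothetical.

```python
import random
from statistics import NormalDist, mean


def z_test_p(a, b, sd=1.0):
    """Two-sided p-value for a difference in means, assuming a known sd."""
    se = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def study_finds_effect(rng, n_measures, looks, alpha=0.05):
    """Simulate one no-effect study; True if any measure is 'significant' at any look."""
    group_a = [[] for _ in range(n_measures)]
    group_b = [[] for _ in range(n_measures)]
    for n in looks:  # each look is a per-group sample size at which we peek
        for m in range(n_measures):
            while len(group_a[m]) < n:
                group_a[m].append(rng.gauss(0, 1))
                group_b[m].append(rng.gauss(0, 1))
            if z_test_p(group_a[m], group_b[m]) < alpha:
                return True  # stop collecting and report this measure
    return False


def false_positive_rate(n_measures, looks, n_studies=2000, seed=1):
    rng = random.Random(seed)
    hits = sum(study_finds_effect(rng, n_measures, looks) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    # One measure, one test at a fixed sample size.
    print("strict:  ", false_positive_rate(1, [50]))
    # Three measures, peeking after every 10 participants per group.
    print("flexible:", false_positive_rate(3, [10, 20, 30, 40, 50]))
```

Under these assumptions the strict design should stay near 5%, while the flexible one comes out several times higher. The exact figure depends on how many measures and looks are allowed.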

To demonstrate this point, these authors ran a study in which they had people listen either to a Beatles tune (When I'm 64) or a neutral song.  They found that the age people reported was significantly lower after listening to the Beatles song than after listening to the neutral song.

Of course, even if this result were statistically valid, it would be nonsensical.  Listening to a song cannot make you younger (though it generally does make you a few minutes older).  What the researchers actually did, though, was to collect a large number of measures, and to start testing the reliability of differences after running 10 participants.  They kept checking until one of the measures achieved a reliable result and then reported that result.   

The authors of this paper make a number of recommendations for authors and reviewers to help guard against these problems in the future.  In this blog entry, though, I am going to focus on what you as a reader can do.

A key part of experimentation is replication.  Often, studies are not done in isolation.  Rather, one group of researchers finds an intriguing result and then other labs pick up that procedure and extend it.  If a paper presents several studies that reveal the same basic pattern of results, that is a good sign.  When different labs report compatible results, that should also increase your confidence in a finding.  If you stumble on a single report of an amazing result, and no other studies come out that replicate the finding, that is a sign that the finding may not be reliable.
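A little arithmetic shows why replication helps. The sketch below assumes the studies are fully independent and that each one has the same chance of producing a false positive, which is a simplification. The function name is made up for illustration, and the 0.05 and 0.6 values correspond to the 5% and 60% rules discussed earlier.

```python
def chance_all_false_positives(false_positive_rate, n_studies):
    """Chance that every one of n independent no-effect studies looks 'significant'."""
    return false_positive_rate ** n_studies


if __name__ == "__main__":
    for rate in (0.05, 0.6):
        for k in (1, 2, 3):
            print(rate, k, chance_all_false_positives(rate, k))
```

The chance that a spurious finding survives shrinks with every independent replication, though it shrinks much more slowly when each study uses the flexible practices described above.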

There are two types of research that are most likely to be affected by these research practices.

First, some areas of work thrive on novelty.  Newspapers and blogs often report strange findings from psychology.  Researchers looking for novel findings often try a number of different methods, and explore many measures.  These practices may increase the chance that a new finding is not reliable.

Second, some areas of research are quite expensive to do.  In particular, studies in cognitive and social neuroscience that use functional Magnetic Resonance Imaging cost a lot, and so there is a real premium in these studies on ensuring that something useful will come out of every study.  Likewise, many developmental psychology studies are expensive, because it is difficult to get infants and toddlers into the lab. 

When reading research in these areas, it is particularly important to treat new findings with some skepticism until more work can be done to ensure that the results are obtained reliably. 

In the end, the field of Psychology is quite healthy, and the large amount of ongoing research ensures that we generally find out which phenomena are reliable fairly quickly.  Nonetheless, any research finding that seems too good (or weird) to be true should be kept at arm's length.
