Credibility and Its Discontents
Science reformers got change, but did they lose something important?
Posted Jun 23, 2019
One of the most important changes the Credibility Revolution in psychology has brought about is widespread awareness that making confident statements about psychology requires collecting a lot of data. As I wrote previously, an earlier generation of psychologists was trained to collect data comparing groups of just 10 or 20 people. Statisticians, meanwhile, have been telling us all along that these sample sizes were too small. To make confident statements about how People Work in General, we need to collect data on a lot more people*.
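The statisticians' point can be made concrete with a standard power calculation. The sketch below uses the textbook normal-approximation formula for the sample size needed in a two-group comparison; the effect sizes (Cohen's d of 0.5 and 0.2), the α of 0.05, and the 80% power target are conventional illustrative values, not figures from this article.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sample comparison:
    n per group ≈ 2 * ((z_{alpha/2} + z_beta) / d) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.5))  # "medium" effect: 63 per group (~126 participants total)
print(n_per_group(0.2))  # "small" effect: 393 per group (~786 participants total)
```

With two groups of 10 or 20, a study has reasonable power only for effects far larger than psychology typically finds, which is the statisticians' complaint in a nutshell.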
The good news, from a recently published manuscript examining published articles in several top social psychology journals, is that this awareness is translating into practical change. In the four journals surveyed, the median sample size jumped from just under 100 people per study in 2011** to around 175 in 2016. Scientists are collecting more data per study.
The bad news, according to some, is that this increase in participants has come from psychologists' growing reliance on collecting data online. The new data show that the proportion of online studies jumped from ~10% in 2011 to ~50% in 2016. What can we learn about real behavior through online studies? Is this victory in meeting a statistical criterion actually causing more harm by making our research less meaningful?
These arguments have been part of the informal conversation about scientific reform for the last decade (at least), being shared at conference bars, in Facebook discussion groups, and in heated Twitter discussions. What follows is a more deliberate and thoughtful version of my frustrated social media response to these arguments.
First, real behavior takes place online. I keep up with friendships, get news, play games, and do my work online. Many people meet their significant others online. Online behavior is thought to have significantly influenced the outcome of the last U.S. presidential election. To have a complete picture of how people operate in the real world, psychology needs to be examining how people behave online. This behavior comes in the form of making decisions about what to click on, writing messages, and completing short tasks in a web browser—exactly the types of things psychologists are assessing in online studies!
More to the point, these are often the types of tasks psychologists ask people to complete in the lab. Many an undergraduate student in Psychology 101 has had the experience of walking into a psychology lab only to be placed in front of a computer and asked to fill out a survey. Most psychology research in the past generation was conducted on “samples of convenience,” meaning the undergraduate students who happened to be around a researcher who decided to run a study. Collecting data online typically allows researchers to assess people across a wider age range (18 to 70 or 80), a wider range of socio-economic statuses (not just people who can afford to go to college), and a wider range of geographic areas (across the U.S.—or the globe). This ultimately should make us more confident that our results from online samples are about People In General, as compared to research done just on college students.
But what about behaviors that can’t be captured in an online task? For example, Milgram’s classic experiments on obedience involved an elaborate set-up where participants thought they were delivering electric shocks to other people as part of an experiment on learning. That kind of scenario is something you can only create in the lab. Should we worry that this type of research wouldn’t be possible in today’s reform-oriented scientific culture?
Well, around 50% of the studies published in most social psychology journals are still not conducted online, so it seems there is still room for research done in the lab. I’d guess that visionary work like Milgram’s would still get published in top-tier social psychology journals. What we’re really talking about is squeezing out lab-based research that would have been considered of marginal quality (but still worth publishing) under an earlier regime.
There is a tradition in social psychology of conducting highly “theatrical” research using elaborate scenarios in the lab. For example, a person might be brought into a room and told to complete a series of mazes, and then, after the researcher leaves the room, her phone goes off. The real outcome of interest might be whether the participant goes to find the researcher to tell her that her phone went off, and from that we are asked to infer something about the kinds of situations that cause people to be helpful.
However, these kinds of theatrical scenarios rely on a lot of complicated and unmeasured factors to work. Does the experimenter look competent, or a little frazzled and like they could use a hand? Does the participant make small talk with the experimenter before sitting down to do the mazes? If yes, how does the experimenter respond? If not, does the experimenter try to strike up a conversation? Does she give an encouraging smile? Is the smile the same for everyone? Ultimately, scenario-based research that examines “real behavior” could be influenced by dozens of variables that aren’t being measured or reported in the results.
Unless researchers can be confident that these factors won’t influence the outcome being measured, it’s very difficult to trust the knowledge that comes from this sort of experiment. Any given result could be due to the key theoretical variable, or to the idiosyncrasies of the naturalistic social interaction, including things like the mood, dress, or non-verbal responses of the experimenter—to say nothing of differences between participants. In short, computerized tasks give us more careful control over our experiments. We can be more confident that the effects we find using these controlled tasks are really due to our theory, and not due to chance factors. Removing the least carefully done scenario-based research and keeping only the best of it seems like an approach that will make the scientific literature more reliable overall.
Perhaps most importantly, research as it was done in previous generations just wasn’t good enough. Increasing sample sizes isn’t just a perk, like getting Bluetooth in your car. It’s core to being able to make statements about how things really work, like having a working engine in your car. We needed to increase our sample sizes to do good science.
Conducting studies via online surveys and tasks is one way to collect more data more easily, but it’s not the only one. Data can be passively collected through participants’ smartphones, allowing researchers to study things like activity level and location throughout the day. Field experiments can be done in which real behavior is observed and catalogued out in the world. If we changed requirements for psychology undergraduates from doing surveys in the lab to going out and observing real behavior, we could also collect large samples of behavioral data. Now that we know what we need to do better psychology, the answer isn’t to complain about what’s being lost, but to embrace technology and creativity to meet the requirements of a robust science.
* Larger sample sizes are just one of the things we need to do to make statements about People In General. Another is collecting data on many different types of people, not just college undergraduates or random people from the internet.
** This suggests scientists were already improving on the studies with just 20 people that were popular in the ’70s and ’80s. It took 30+ years to jump from 20 or 40 people per study to 100 per study, and only 5 years to jump from 100 to 175 people.