How Do You Get Psychology Experiments to Work?
New research addresses informal practices that make studies work.
Posted Sep 29, 2019
How do you get a psychology experiment to work? A new paper in the journal Collabra: Psychology by Jonna Brenninkmeijer and colleagues interviews researchers about the informal knowledge they need to get their experiments to work. They describe things like projecting a professional appearance, designing materials to be easy to read, and making sure that people who experienced an experimental manipulation provide responses on the outcome of interest soon afterwards (e.g., not having too many filler tasks). As someone who has worked in several psychology labs, I find that these sound like pretty standard measures taken by many researchers. But an issue that comes up in the manuscript is that psychologists don’t really know whether these steps matter. They think “you have to do it this way to get ‘quality data’ from an experiment” but this belief is just based on intuition. They haven’t systematically checked whether professionally dressed research assistants, nicely designed materials, or time spent on filler tasks matters for an experiment.
This can be a problem, because not everyone agrees what the right “standard operating procedures” for getting good data are. One researcher says you should always start by asking about demographics, then move on to more psychological questions (e.g., about attitudes, emotions, etc.). Another researcher says the opposite, arguing that demographics should always go last. When describing professionalism, some researchers say that an experimenter should be slightly removed, not engaging in too much chit chat with participants. Later on, researchers argue that some experiments require “people skills” from an experimenter and lots of talking with participants to put them at ease. A social psychologist who does lab experiments says he tries to “make it lively” for participants, which is mostly done in the way he delivers instructions.
These ad hoc techniques for trying to get an experiment to work make intuitive sense, and have some justification in the philosophy of science. Ian Hacking describes physics experiments as hard to get right. The typical high schooler finds a science lab hard precisely because there is a certain amount of implicit knowledge needed to stain a cell properly or to know where to look in a microscope to see a structure. People critical of the value of replication studies often latch on to this point. There are all kinds of little decisions that a researcher makes when setting up an experiment, and many of these are the kinds of informal things described in Brenninkmeijer and colleagues’ work. Couldn’t it be the case that when a new study fails to replicate a published effect, the reason is that the experimenter missed one of these small, informal steps that was crucial for getting high quality data? Couldn’t they be like the high schooler who can’t properly stain a cell?
That can certainly happen sometimes. But always arguing that a failed replication is due to some hidden detail of an experiment that someone didn’t get right isn’t reasonable, because it implies that the core hypothesis being tested can never be wrong. In essence, it is arguing that every failed replication is due to some mistake in the informal parts of running the experiments—something that’s been called the “hidden moderators” argument in the scientific debate. This argument recasts all psychology research as methodological. The underlying effect can’t be challenged; the only thing a researcher can ever claim is that a particular method doesn’t work. No researcher can ever contradict the interesting, theoretical part of another researcher’s work—the actual psychology.
In case you think I’m taking the argument too far, I will point you to the work of influential psychologist William McGuire. In an essay on the philosophy of science in psychology, McGuire argued, “we cannot really test a theory, because all theories are true” (p. 417). Drawing out the implications more fully, he writes:
“… if a person who apparently has been contemplating the origins of leadership says to us, ‘You know, taller people are more likely to rise to power than shorter ones,’ the statement is probably ipso facto true for having been derived. Of course, it is quite possible that some other reasonable person may come up with the opposite hypothesis that shorter people are more likely to rise to leadership. However, this is no embarrassment because not only are all hypotheses true, but so are their contraries if they too have been honestly and thoughtfully generated.” (p. 417)
So according to McGuire, the contrary hypotheses “taller people rise to power more” and “shorter people rise to power more” are both true because a psychologist came up with them. What we should do in psychology research is just accept that everyone is always right, and then look for the specific conditions that make people right. Maybe tall people succeed as CEOs and short people succeed as movie stars. The point is that no hypothesis is ever wrong; it’s just conditional.
This leads McGuire to propose the following advice for coming up with new studies to run: “Take a very straightforward proposition (such as, ‘people like others to the extent that the others are like themselves’), reverse it by stating its contrary, and then generate the special circumstances under which the contrary would be true” (p. 417). The successful psychologist isn’t trying to build a comprehensive and precise understanding of a topic by pitting competing explanations against each other (as Platt argues good sciences do in the strong inference method). The successful psychologist is trying to come up with something counter-intuitive that they can somehow still get to come out their way.
The real work in this paradigm comes behind the scenes in tweaking the setup. McGuire argues that psychologists “expend several times as much effort in … prestudies to get the conditions right as one spends in the final study that is written up for publication” (p. 418, 2013). Of course, McGuire recognizes that this causes problems. As he notes, if we are only presenting a few clean results out of the dozens of studies that we run to get the situation just right, then we aren’t presenting the most difficult and important part of our work: systematically working out when an effect will hold.
The biggest problem comes in the way we report results. As McGuire puts it, “to find the conditions under which their outlandish, nonobvious hypothesis was indeed valid,” psychologists had to conduct lots of unreported studies and arrange conditions “often outrageously” so that “the far-out hypothesis was indeed confirmed.” But what got reported was that the general counter-intuitive statement was true, and “by implication … the obvious hypothesis was wrong.” So social psychology worked for a long time by reporting results that were exciting and counter-intuitive but misleading. Counter-intuitive statements that implied a relationship held in general were not true in general at all. From the McGuire perspective, classic social psychology effects aren’t generalizable statements about how the world works; they are highly fragile and specific results that likely rely on very precise conditions, only some of which were reported.
Under this paradigm, McGuire was right to argue that what psychology really needed was to report the full set of studies that led up to the studies that “worked” and were reported in the journal. The problem, for McGuire, was that we weren’t willing to report our “failed experiments.” Instead, social psychologists misrepresented our process as testing a hypothesis, when really we were assuming the hypothesis was true and the work came in all kinds of background shenanigans to make sure it came out right. But what McGuire wouldn’t give up was the belief that psychologists coming up with theories were always right.
This perspective is frustrating because it refuses to let us compare the real hypotheses we care about. Do tall people or short people rise to power more quickly? Just saying “both are true sometimes” isn’t satisfying. What we need is some underlying theory of how to do psychology experiments so that we can isolate the particular situation we’re talking about and remove the influence of factors we don’t care about. We can begin to address this by stating our results more honestly: not saying “short people rise to leadership more” but saying that “short people rise in contexts, like Hollywood, where personal charisma matters more.”
Longer term, we want to have background knowledge—and even some theory—about where we are testing things. This is what Paul Meehl described as a “theory of the instrument.” Students who can’t properly stain a cell for observation under a microscope don’t have to guess blindly at why the process didn’t work. They can rely on some core understanding of how staining and microscopes work to guide them to the right technique.
Imagine a biologist saying that a specific cell both has a specific protein and doesn’t have that protein. We wouldn’t just say “of course! They’re both true!” We would want a frame of reference for understanding what “normal conditions” were, and we’d want to know what was the case under those normal conditions. If there were other typical conditions that changed protein composition, we’d want to check those out. Then if there were special conditions unique to this protein where it was more or less likely to be present, we’d want to know about those, too. In short, we wouldn’t accept the “everybody’s right in some situation” premise; we’d want a detailed theory to talk about when statements are and aren’t true.
The new paper by Brenninkmeijer and colleagues is important because it’s a first step towards gathering this kind of information on the way that psychology experiments are, and should be, run. The way we do this is by developing a detailed theory of how background variables work (things that we don’t inherently care about, like the way experimenters explain instructions, the order of questions, and the design of materials), and then using this theory to minimize their influence.
At one point, the interviewers ask the researchers whether they would be interested in a handbook of informal information on how to conduct good experiments. One jokingly said Brenninkmeijer and colleagues should write this sort of handbook themselves. I hope they—or others—consider doing this. A handbook of the role of background variables would help us adjudicate claims about the validity of replications, because we wouldn’t be relying on after-the-fact, motivated claims by researchers that “of course doing an experiment in a room with a video camera wouldn’t work.” It would help us determine if the replication should be expected to work based on our understanding of the background variables. As a psychologist interested in building a deep, cumulative theory of how people work, I do not want to spend my career in McGuire’s world coming up with “outrageous” situations to support “far-out” hypotheses. I want to build the tools needed to test real, important theoretical questions. To do this, we need to start figuring out what background factors really matter in experiments, and which don’t.