Getting to the Source
Confessions of a Replication Scientist
Posted Sep 02, 2015
As of the latest tally on the Google News scroll, the coverage of the results of the Reproducibility Project has hit 338 articles and blogs (or "re-blogs"). Some of these represent responsible coverage, even providing avenues for next steps, while others are laden with errors and can't even seem to be bothered to spell "psychology" correctly. It seems there are as many lessons to be learned here about the reporting of science in the news as about the reporting of results in journals. To that end, I will just recommend that individuals go to the original source, the research article itself, before drawing conclusions. It isn't that long. If you really want to be a responsible consumer of science, you could even get into the nitty-gritty of looking at each of the 100 studies replicated, as each individual report is posted - with data - for the world's scrutiny on the Open Science Framework.
Among those individual studies, you can find the contribution of our Social Relations Collaborative. When we contacted the Center for Open Science in the late summer of 2014, we did not anticipate that our simple volunteering to "help out" would turn into a Science article sending ripples through our field. Rather, after having a positive experience being one of fifteen labs selected for one of the Center's earlier replication endeavors (i.e., the project to replicate important findings in social psychology), we merely wanted to pay the Center back for their support and see what we could do in the way of service. After all, we felt the Center was fostering an important avenue of self-examination for our field and was encouraging an essential, but oft-neglected, ingredient of the scientific method: replication.
We had anticipated that perhaps we would be asked to review the grant proposals of others. Instead, we were asked to look over the list of studies the Center had compiled from 2008 editions of three top tier journals in psychology (Psychological Science, the Journal of Personality and Social Psychology, and the Journal of Experimental Psychology: Learning, Memory, and Cognition)1. We were to see whether there was a study we would be willing and able to try to replicate. In fact, knowing we were a relationships lab, they kindly suggested a number of relationships studies. Given the time frame, our resources, and our respect for Murray's work, we chose “Balancing connectedness and self-protection goals in close relationships: A levels-of-processing perspective on risk regulation” by Murray, Derrick, Leder, and Holmes (2008) in the Journal of Personality and Social Psychology.2
As lead on the project, I recall completing a questionnaire which included a question about whether I had any reason to expect that the findings of the original article would not be replicated. I answered "no."
My entire perspective going into the replication process of any study is to trust the original finding and to expect replication. Why not? I have faith in my field and think highly of my colleagues and their work. My goal, and that of my colleagues within the Collaborative, is to test (or rather re-test) that faith. After all, faith in science is a faith that relies on confirmation (and re-confirmation) of effects. In our scientific culture, we look for, want, and hope for confirmations, whether of hypotheses, theoretical assertions, new perspectives, or past findings. However, we also wish to remain objective, reporting the facts - just the facts. Thus, to avoid our own confirmation biases and file drawer effects, we need to go into each research study knowing that failure to confirm (which isn't the same as disconfirmation) is important. If we can value the contribution of both, we can conduct our science employing our best efforts rather than chasing the p-value less than .05.3
Does the "failure" to find the exact same results as in the original papers mean I have lost my faith? No.
It would be absurd to say that one study in psychology means that the field of psychology is "psycho-babble" or "total bunk." Rather, much as has been echoed by other authors in the Open Science Collaboration, we believe that the present study is actually evidence of science working as it should. More sciences, beyond psychology, should pursue studies to determine replicability rates within their fields. After all, issues of replication are not limited to psychology (see also here and here). In fact, John Ioannidis - now infamously - asserted that most published scientific studies are false, estimating that approximately half of results in medicine were likely erroneous but that the problem could be even worse in other fields (e.g., economics, biology, neuroscience). It just so happens that we in psychology are fond of self-examination.
Do the findings of the Reproducibility Project make me question the integrity of my fellow social psychologists? No.
Psychology seems to be taking a lot of hits (particularly for having a lot of "hits"; see Figure 1), and social psychology in particular has been called out as "devastated." It is not as if we aren't accustomed to being under attack as a false science. However, there are studies which have attempted to document the rate of unethical practices across sciences, and I do not believe that 50-60% of my colleagues are all Diederik Stapels hiding in ivory towers with locked file drawers engaging in deceitful practices (-cue the evil laugh-). As noted in the original Science article, there are myriad reasons why a study might not replicate. Further, it is important to pursue narrow-and-deep replication attempts in addition to the broad-and-shallow approach pursued by the Reproducibility Project.
So what does it mean?
Let's start with the goal. Although scientists pursuing replications have been called meanies, twits, envious second stringers, bullies, or replication police (and some may or may not be), my goal in participating in replication research is to reveal our strengths and our areas for improvement to make our science stronger. My goal is the same as it is for all of my scientific pursuits: to uncover answers to questions and build knowledge. Well-intentioned curiosity, not McCarthy-esque persecution, drives me. Name-calling, on either side, gets us nowhere.
Yes, a number of studies failed to replicate completely. This lack of replication is important, as it highlights the value of direct replication (in addition to conceptual replication) as well as the need for potential culture shifts to reflect that value and facilitate the practice. It also illustrates that scientific findings should not be treated like some sort of one-hit wonder, but rather that the practice of science needs to be programmatic, meticulous, and (preferably) well-documented. Because guess what? Science is hard. Especially the science of deciphering the puzzle of human dynamics.
The pieces to the human puzzle are not black and white. After all, a simple black-and-white puzzle would be pretty boring.
Let us keep in mind that there were a number of replications that were "successful," and it is worth noting that the question of "did it replicate or not" is not as black and white as it sounds. The replications fell on a spectrum, as noted in the Nature article (see the decidedly not black and white Figure 2) that preceded the official Science publication. This spectrum was also recently noted by Alex Etz, who proposed a Bayesian means to compute replicability rates which takes into account that replication "success" isn't an either/or conclusion. Qualified answers are something most scientists are used to, hence the required "more research is needed" discussion sections.
For example, even in the replication we ran, we partially replicated certain findings and found, interestingly, that individuals vary in their responses to perceived partner criticism depending on their level of self-esteem. In particular, those low in self-esteem withdraw social support - but still report desiring it - from their partner during times of relational uncertainty (i.e., when believing their partner sees them as having numerous faults). Those high in self-esteem exhibit the opposite pattern: reaching out to lend help to partners in times of need, but reporting that they are less likely to seek help themselves from a partner perceived as critical. This is an interesting finding and consistent with the theoretical model being tested.
What didn't work? Mostly the prime (i.e., priming "approach goals"). And this isn't the first time that priming has failed to replicate. Daniel Kahneman noted that social priming research was in need of scrutiny.
Likewise, in our earlier replication of a classic study in social psychology of the Romeo and Juliet effect (Driscoll, Davis, & Lipetz, 1972)4, we failed to find evidence to support the notion that interference from parents increased love and commitment. However, we did find consistent - and strong - support for the social network effect (Felmlee, 2001), showing that approval from social network members - friends and family - bolstered relationship quality (e.g., love, commitment, trust), whereas disapproval (and interference) harmed relationships. And, in fact, with regard to our finding that parental interference heightened partner criticism and lowered trust, that was actually something found by Driscoll and colleagues originally but it was the novel effect -- interference heightening love -- that had legendarily lived on in texts, lore, and on the web. We do like our novel things, after all. In fact, I think there may even be some psychological research on that.
I think there may also be some psychological research on us giving extra attention to bad over good. Which may be why headlines surrounding the Reproducibility Project proclaiming "failures" seem to prevail.
In sum, I urge my colleagues, scientists at large, and all those who are critical consumers and fans of science to take a step back from the sensationalist headlines, take a deeper look before overlooking, and take a deep breath before responding. There are solutions to be had (see also here and here and here). Some of these are simply rooted in our also being more meticulous about addressing the methodological issues about which we preach. So while we are advancing new ideas, let's also revisit Merton's (1942) norms of science and Meehl's (1993) lessons on the Philosophy of Science in comparison to current cultural norms. This isn't a crisis. This is a step forward. Future research is needed.
1. Note, the entire planned design for a reproducibility collaboration was laid out by the Center in a 2012 article in Perspectives on Psychological Science.
2. As with other researchers, we were only asked to replicate the central question of the last one of the multi-studies in the original paper. Plus, all materials, including the study script, were kindly provided by Dr. Murray herself.
3. Admittedly, I still get excited when my analyses yield p < .05. It is a difficult automatic association to overcome.
4. This replication was also conducted with the input of Dr. Driscoll throughout (plus, full disclosure, I consider Dr. Keith Davis a personal friend and was in contact with him as well), and the study was chosen because our lab had been pursuing the Romeo and Juliet effect for years, resulting in new discoveries about the influence of social networks.