What Is Wrong With Social Psychological Science?

It goes way beyond the "replication crisis."

Posted Oct 13, 2016

Source: Wikimedia Commons

This blog entry summarizes some of the many deep dysfunctions that characterize psychological science and, especially, social psychological science. Some of this involves "looking under the hood"—i.e., the technical, statistical, and methodological aspects of social psychological research.

Isn't this incredibly boring, compared to, e.g., reading articles on “Six Ways to Better Sex” or “Five Signs That You Are Living with a Sociopath”—the type of articles one often finds in other Psych Today blogs? Why should anyone other than scientists care about this?

This is why you should care and why I think many of you will find this interesting.  The “information” in those advice columns can come from two places: The professional and personal experiences of the authors, or the science.  In the absence of good science, professional and personal experience is all we have to go on, and it is definitely better than nothing.

But the gold standard is science.  When science is conducted in a high quality manner and yields clear answers to some question—that is the answer I would go with every time. And if getting the right answer is important to you, you should try to track down the science.

Would you rather conduct your life based on the example of your 95-year-old great-grandfather, who smoked three packs every day till he finally kicked, or on the mountain of evidence that smoking places you at much higher risk for all sorts of awful diseases and an earlier death? Of course, this only makes sense if the science is sound…

So here is the view from under the hood...

The Controversial Fiske Essay

This is my second post inspired by a controversial, even inflammatory essay, by an eminent social psychologist. (Go here for the first entry).

In that essay, which was framed as a call for greater civility in scientific criticism, former President of the Association for Psychological Science, Susan Fiske, lashed out at online critics of psychological science, calling them all sorts of nasty names, such as "bully" and "methodological terrorist."

In this post, I review the deeply dysfunctional scientific context from which Fiske's essay emerged.  There is a military term, beginning with "cluster" and followed by THE four-letter word, for an operation in which multiple things have simultaneously gone badly wrong, once popularized by Jon Stewart on The Daily Show.  It arguably applies well to the current state of social psychology.


The Replication Crisis

Sometimes, the problems in social psychology are referred to as a "replication crisis" because:

1. Many of us believe we have recently discovered that much of our published empirical literature fails to replicate.  All sorts of famous, cute, and counterintuitive findings have been subject to replication failures.  See Are Most Published Social Psychological Findings False for more details.  Towards the end of this post, I provide a list -- a very, very long list -- of phenomena and findings, many of which were quite famous and made splashes in mainstream news outlets because they were amazing(!), that have been subject to failed replications or other sources of doubt.

Source: Lee Jussim, the floods are a-risin'

2. However, there is not even a consensus in social psychology as to what "counts" as a successful versus a failed replication.  For example, when the Open Science Framework (OSF) conducted multi-site replication attempts of many social psychological studies, they estimated replication in a variety of ways and came up with a figure between 25% and 50%, depending on the criteria.

Gilbert et al. (2015) argued for different standards, and concluded that nearly 85% replicated.

Simonsohn (2016) argued that both were wrong.

I am not going to "resolve" this issue; my main point here is that social psychologists cannot even agree on what counts as a replication.  Until we, as a field, reach some sort of consensus on this issue, we* will be reduced to endless internal quarreling about what replicates and what does not, and what it all means.

* And by "we" I mean social psychological researchers.  Tangentially, if we have no consensus on these issues, we might consider being a bit more tolerant of laypeople who do not necessarily always believe our sometimes lofty claims.

Regardless of what one thinks of the OSF replication study, many social psychologists do believe there is a replication problem, and for good reason.  Lots of stuff does not replicate.  I have posted repeatedly on Psych Today about this very problem:

See my two Unicorns blogs (Social Psych Unicorns and Unicorns of Social Psych) and my more recent entry titled "Are Most Published Social Psychology Findings False?" Spoiler: I doubt the answer is "yes," but it is very hard to know for sure.  And it is even more difficult to know which findings, in our vast history of research publication, are likely to be robust and valid, which are completely false, which are not completely false but wildly overstated, and which we can really hang our hats on.

However, "Replication Crisis" is a misnomer, because it focuses too narrowly on failed replications as a threat to social psychological science.  Failed replications are a threat, and a big one.  But...

The Validity Crisis: Dysfunctions in Social Psychological Science Go Way Beyond Failed Replications

Here is a brief sampling of additional practices that threaten the integrity of social psychological science, and of how they create potential, and often very real, problems that undermine its validity.

Tiny samples.  Statistics computed with tiny samples border on meaningless, comparable to saying, “we are pretty sure the temperature tomorrow will be somewhere between 10 and 100 degrees Fahrenheit”.  The Zimbardo Prison Study is perhaps one of the most famous examples of a study given wildly intense publicity based on a tiny sample of 24 participants. 
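To make the "tiny samples" point concrete, here is a back-of-the-envelope sketch (the numbers are illustrative, not from any particular study): the 95% confidence interval around a correlation observed in a Zimbardo-sized sample of 24, versus one of 500, using the standard Fisher z transformation.

```python
import math

def corr_ci(r, n):
    """Approximate 95% confidence interval for a correlation r
    observed in a sample of size n (Fisher z method)."""
    z = math.atanh(r)                    # Fisher z transform of r
    se = 1 / math.sqrt(n - 3)            # standard error of z
    lo, hi = z - 1.96 * se, z + 1.96 * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale

# A "moderate" observed correlation of .30:
print(corr_ci(0.30, 24))    # roughly (-0.12, 0.63): tells us almost nothing
print(corr_ci(0.30, 500))   # roughly (0.22, 0.38): actually informative
```

With n = 24, the interval runs from a small negative correlation to a large positive one, which is the statistical equivalent of "somewhere between 10 and 100 degrees."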

And here is a tidbit that is really, really wacky.  A recent study by science reformers Chris Fraley and Simine Vazire (founder of the Society for the Improvement of Psychological Science) found that the more prestigious and influential journals in social psychology -- the ones that "everyone" (in the field) reads and which have huge impact factors and citation indices -- average smaller sample sizes than do the less prestigious and impactful journals.  Who cares?  Studies with small samples produce less credible results than those with larger samples.  You might think that the "better" scientific journals would be guarantors of high-quality science.  But if you thought that, you would be wrong (which is not to say that everything published there is bad, only that these journals have frequently failed to uphold one of the most important and obvious requirements of good science).

Ten years ago, I would have written "you can shoot me now."  However, it is precisely because of the evidence and advocacy of reformers like Fraley and Vazire that things are, in fact, beginning to get better (exactly how is a blog for another day, but one simple change is that many journals are beginning to require larger sample sizes).

Source: Lee Jussim, a building that lacked integrity

Unrepresentative samples.  We often unjustifiably reach conclusions about “people” based on participants who clearly do not represent "all of humanity."  Usually, they are not representative of any known group, not even college students (they can be college students without being representative of college students in the same way that you can be from Iowa without being representative of Iowans).

P-hacking and lack of transparency.  We sometimes run lots of statistics, cherry-pick the ones that say what we want, then concoct a compelling story.  This has been called "p-hacking" to capture the idea that, intentionally or not, researchers sometimes "hack" their way to the Holy Grail of "statistical significance," which is usually necessary to get a paper published.  That is, we sometimes conduct our statistics like an archer who shoots an arrow, then draws a target around wherever the arrow lands, with the arrow in the bullseye.  The Garden of Forking Paths captures the idea that there are many ways to analyze the data, and we only report the subset that gets us where we want to go (preferably, a publication in a prestigious journal).
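Here is a toy simulation of p-hacking (all numbers are hypothetical): with no true effect anywhere, a "researcher" who tests ten outcome measures per study and reports whichever one comes out significant will strike gold far more often than the advertised 5% of the time.

```python
import math
import random
import statistics

def t_test_p(a, b):
    """Two-sample p-value via a normal approximation (adequate for this demo)."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # two-tailed p-value under the standard normal
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
hits = 0
for study in range(1000):
    # Ten outcomes; the null is TRUE for all of them
    # (both groups are drawn from the same distribution).
    ps = [t_test_p([random.gauss(0, 1) for _ in range(30)],
                   [random.gauss(0, 1) for _ in range(30)])
          for outcome in range(10)]
    if min(ps) < 0.05:      # report only the "significant" outcome
        hits += 1

print(hits / 1000)  # well above 0.05, despite there being no effect at all
```

The archer has drawn the target around the arrow: pick the best of ten tries and "significance" arrives roughly 40% of the time even when nothing is there.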

Cronyism.  Scientists are sometimes biased in favor of the work of their friends and of their current and former students.

Suboptimal statistics.  The workhorse stats in most of psychology do not answer the question researchers usually want answered.  Usually, researchers want to know, “Is my theory or hypothesis right, or, at least, more right than known alternatives?”  The stats most of us use answer a different question: “How likely is the relationship or difference I obtained, or one even larger (more extreme), to have occurred if the null hypothesis is true?” (the null hypothesis usually states that, in the larger population, there is no effect or relationship at all).  If this sounds convoluted to you, and you cannot follow it, do not be too upset.  It is convoluted.  Actually, to reach the conclusion, "Eureka, my hypothesis is confirmed!" the logic gets even more convoluted.  That is because the logic needs to be tortured to get the stats to look like they say "My hypothesis is confirmed!" when they do not actually do that.
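If that convoluted question still feels slippery, here is what it looks like as a simulation (purely illustrative numbers): the p-value is simply the fraction of null-world experiments that produce a difference at least as large as the one we observed. It is NOT the probability that our hypothesis is true.

```python
import random
import statistics

random.seed(7)
observed_diff = 0.5   # suppose our study found a mean difference of 0.5
n = 20                # participants per group

extreme = 0
trials = 5000
for _ in range(trials):
    # Null world: both groups drawn from the SAME distribution
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    if abs(statistics.mean(a) - statistics.mean(b)) >= observed_diff:
        extreme += 1

p = extreme / trials
print(p)  # P(data this extreme | null is true), NOT P(hypothesis | data)
```

Note what the number does and does not tell you: it says how surprising the data would be in a world with no effect, and nothing directly about whether your theory is right.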

Publication biases.  Journals and researchers usually require empirical studies to reach a statistical Holy Grail before publishing ("statistical significance," aka "p<.05").  For the statistically uninitiated, you might be wondering: what the Hell is p<.05?  It means "there is less than a 5% chance that these data, or data more extreme, would occur if the null hypothesis were true."  What is the "null hypothesis"?  It is the hypothesis that there is no effect whatsoever of one's experimental conditions or intervention, or that there is no relationship whatsoever between two variables.  So, e.g., one "null hypothesis" would be, "Confidence is uncorrelated with performance."

P<.05 has long been interpreted as meaning “Eureka, my amazing! world-changing!! hypothesis is confirmed!” and journals were rarely interested in publishing unconfirmed hypotheses.  One consequence of this is that the published literature often exaggerates the power of social psychological processes and hypotheses.  Think about it.  If only “strong” (p<.05) effects get published, then many of the effects near zero do not get published.  But we need to know about those near-zero effects to figure out the power of the phenomenon being studied.  And we do not know about them because they are not published.
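The effect-size inflation described above is easy to see in a toy model of the file drawer (all parameters hypothetical): a small true effect, small samples, and a journal that only prints p<.05 results.

```python
import math
import random
import statistics

random.seed(42)
true_d, n = 0.2, 20     # small true effect; 20 participants per group
published = []

for study in range(4000):
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(true_d, 1) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(2 / n)               # known-variance standard error
    if abs(diff) / se > 1.96:           # "significant," so it gets published
        published.append(diff)

# The literature's estimate of the effect, versus reality:
print(statistics.mean(published))  # far larger than the true effect of 0.2
```

In this sketch, the handful of studies that clear the significance bar report effects several times larger than the true one; the near-zero results sit in the file drawer, so the published literature systematically oversells the phenomenon.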

More publication biases.  Carl Sagan once said, “Extraordinary claims require extraordinary evidence.”  This is because something extraordinary is highly unlikely to be true.  To overcome the justified skepticism about such a claim, the evidence should be very strong.  Social psychology has functioned in an almost entirely opposite manner: “Extraordinary claims require tiny little sample sizes.”  This is because social psychology has a weird culture of valuing “surprising” and “counterintuitive” findings.  The poster children for these dysfunctions are the infamous priming elderly stereotypes/walking down the hall study that helped trigger the replication crisis, which had two experiments “demonstrating” this effect, each with a mere 30 participants; and stereotype threat, in which the largest study (114 participants) found no significant stereotype threat effect, but one was reported in studies with 40 and 47 participants.  Such findings are seen as innovative and creative, so, rather than holding researchers to higher evidentiary standards, many hold such studies to lower standards.

The upshot is that the social psychological scientific literature may be peppered with amazing! counterintuitive! world-changing! dramatic! results that are, at best, difficult to replicate, and at worst, not true at all.  This is the scientific version of the old saying, "If it sounds too good to be true, it probably is."  How many findings does this describe? Which ones? No one yet knows.

Storytelling and overselling.  Researchers often make extraordinary claims based on very weak evidence.  My book goes into case after case of this, where self-fulfilling prophecies and other biases are routinely touted as powerful and pervasive when, in fact, the evidence shows they are weak, fragile, and fleeting.  Stereotype threat, in which a minor situational tweak is claimed to completely eliminate racial achievement gaps, is another example.

Researcher confirmation biases and questionable interpretive practices.  Many in earlier generations of psychological scientists (PhDs obtained 1970-2010) were trained to tell “compelling narratives.”  Researchers have gotten so good at telling these stories that, sometimes, they do so regardless of what the data say.  My recent article, Interpretations and Methods, goes through example after example (after example after example) of social psychological papers telling such compelling narratives, even though the data did not support those narratives.

Cherry-picking.  By cherry-picking results and studies, researchers can often reach almost any conclusion they prefer.  Alternative explanations are often overlooked or ignored, and are often more viable than the published explanations.  Sometimes, social psychologists go so far as to claim certain things are true without any evidence at all (e.g., claims that stereotypes are inaccurate, that humans are a blank slate, that standardized tests are invalid, etc.).  Famous studies often get cited at much higher rates than the failed replications of those same studies, even after the failed replications get published.

Undead theories.  Some of us believe that falsification has a central role in science.  Some of us believe that scientific progress is reflected in replacing old, bad, or suboptimal theories with better and more valid ones.  It is, however, almost impossible to declare any theory in social psychology “wrong” or falsified.  Some possible reasons for this bizarre state of scientific affairs can be found here: "Is it offensive to declare some psychological claim to be wrong?"

Status biases.  Psychologists sometimes act like those of us with named chairs and Ivy League appointments warrant greater voice, attention, and credibility – and access to publication and funding outlets – than do those of us without such status.  But science should be about the quality of the data, not the status of the authors.

Political biases.  For a long time, social psychology has been something of a club for people with left-wing political views.  This has driven some scholars away from the field, and it has distorted conclusions in politicized topics.


So, how much of social psychology is wrong?  The short answer is ... no one knows.  Some probably is; some is surely fine; and we currently have little or no consensus on how to figure out which is which.  In fact, that is probably the wrong question.  The right question, or, at least, a better question is: Which social psychological findings are sound and valid, and which are not?

Each of the following results or conclusions was once widely accepted.  They are all now under a cloud because they are either:

  • Known to be false, exaggerated, or misrepresented,
  • Subject to failed replications,
  • Or their conclusions are dubious or otherwise in doubt.

If you are not familiar with any of these areas of research, you can find out more as follows:

1. Topics with links are to my blogs or papers that discuss them.

2. For nonlinked topics, you can do a Google search for that topic + "replication" or "replication failure" or "reproducibility" and you will usually be able to find the original result and the source of doubt.

Source: Lee Jussim, a collapsing building

The controversies over whether swaths of social psychology are collapsing around our ears have ignited scientists' passions.  These are sometimes righteous passions, sometimes even self-righteous and insulting passions, about what is science, what counts, who is doing good science, and who isn't.  That is the scientific context from which Fiske's essay, with its uncivil charges of incivility, emerged.