Bargh et al (1996)

Main finding: Priming people with the elderly stereotype led them to walk more slowly down the hall.  

Citation count: 2062. (Note: Citation counts are one indication of how influential an article is; most articles get a few dozen citations; articles cited hundreds of times or more are usually classics in a field.)

Importance: Priming of behavior, embodied cognition, stereotypes. Helped boost the modern wave of emphasis on the power of automatic and unconscious processes. 

Published failures to replicate: Cesario et al (2006); Doyen et al (2012); Hull et al (2002).

Notes: Bargh has claimed that Cesario et al and Hull et al did replicate the finding (see Bargh's blog post titled "Priming Effects Replicate Just Fine, Thanks"). However, neither produced statistically significant main effects of elderly primes on walking speed (p's ranging from .06 to .13). All three papers did, though, report some conditions under which the priming effect seems to have "worked."

Unpublished Reports of or References to Failures to Replicate

Lemm, K. (2012). Lieberman, M. (2012). Pashler et al. (2008). 

Darley & Gross (1983)

Main finding: Social class stereotypes biased judgments concerning the motivation and performance of a student in the presence of individuating information, but not in its absence.  

Citation count: 786.

Importance: Interpreted as showing that stereotypes act as hypotheses that lead to their own confirmation; seems consistent with a conclusion that stereotypes pervasively bias judgments.  

Published Failures to Replicate and Studies Finding Opposite Patterns

Baron et al (1995). Two exact attempts to replicate. Found the exact opposite pattern: Social class stereotypes biased judgments in the absence of individuating information but not in its presence. Citation count: 29.

Locksley et al (1980, 1982). Studies of sex stereotypes and stereotypes of day/night people.  Showed that stereotypes biased judgments with useless individuating information but did not bias judgments in the presence of relevant individuating information.  

Krueger & Rothbart (1988). Studies of sex stereotypes showed that the less individuating information people had, the more they relied on stereotypes. 

Note: Darley & Gross (1983) has been cited over 600 times since 1996, i.e., after the two exact replication failures by Baron et al had been published.

Snyder, Tanke, & Berscheid (1977)

Main finding: Physical attractiveness stereotypes created a self-fulfilling prophecy whereby people were friendlier to those they believed were more attractive, and this evoked friendlier behavior from those believed to be attractive (thereby "fulfilling" the stereotype that attractive people are warmer, friendlier and just plain better people).

Citation count: 949.

Importance: One of the most influential social psychological studies of the 1970s, producing results that fit nicely with some of the main theoretical and political narratives in social psychology (emphasizing the pervasive power of stereotypes to bias judgments and create reality and injustice).  

Published Failure to Replicate

Anderson & Bem (1981). I know of no other attempts at exact replication. 

Snyder & Swann (1978)

Main finding: People seek information about others in ways that are heavily biased towards confirming their own expectations.

Citation count: 712.

Importance: Results augmented theoretical narratives in social psychology emphasizing people's supposedly irrational, suboptimal, and error-prone natures.  

Published Failures to Replicate

I know of no published exact failures to replicate. That is because researchers quickly realized there was something off kilter about the original study's methods. Snyder & Swann required participants to choose from hypothesis-confirming or hypothesis-disconfirming leading questions (e.g., "What would you do to liven up a dull party?").

In the real world, people are not required to seek social information with leading questions. Subsequent research, therefore, either gave people the option of choosing from leading or non-leading questions (e.g., "If you were at a dull party, would you try to liven it up?"), or allowed people to make up their own questions. This research consistently found that, overwhelmingly, people asked diagnostic, unbiased questions. Snyder & Swann's conclusion was disconfirmed in contexts other than the artificial one that they studied. See: Devine et al (1990), Skov & Sherman (1986), Trope & Bassok (1982), Trope et al (1984).

Interesting note: When added together, these papers have been cited fewer times than Snyder & Swann (1978).


Word, Zanna, & Cooper (1974)

Main finding: Racial stereotypes lead to their own fulfillment.

Citation count: 688.

Importance:  Another influential pair of social psychological studies of the 1970s, producing results that fit nicely with some of the main theoretical and political narratives in social psychology (emphasizing the pervasive power of stereotypes to create injustice).  

Failed replications: None.

Replications:  None. I have been able to identify zero replications in the nearly 40 years since the study was published. It is a good study, but it seems important to find out if the results hold up. 

Rosenthal & Jacobson (1968)

Main finding: Teachers' expectations are self-fulfilling (they cause students' achievement).

Citation count: 4742.

Importance: The first study to empirically demonstrate that interpersonal expectations can be self-fulfilling. Launched a major area of research. Seemed to explain inequalities of race, class, and gender.

Published Failures to Replicate

There are many. Nonetheless, self-fulfilling prophecy effects do occur. They just do not occur very often or very powerfully. Rosenthal himself (Rosenthal & Rubin, 1978) found that almost 2/3 of experiments produce no statistically significant evidence of self-fulfilling prophecy.   

Naturalistic studies focus on what happens in the real world, rather than under highly controlled and sometimes artificial experimental conditions. My book (Jussim, 2012) includes a table describing the effects obtained in every naturalistic study of teacher expectations and self-fulfilling prophecies that I could find. Those results were shocking, as can be easily seen in this chart:    

Larger samples produce self-fulfilling prophecy effects that hover barely above 0.  Small samples produce larger effects. Smaller samples, however, cannot "cause" larger effects. What we have here is dysfunctional methods and review processes. Small studies should be suspect, but, when they produce "significant" effects (which must be larger to reach significance in small compared to large studies), they have a good shot at being published. This pattern, therefore, reflects the dysfunction of focusing on "successful" studies. The effect size in the real world is much smaller than it seems (as indicated by the large studies), because of publication bias in favor of "successful" studies and against publishing null results. Let the failed replications be published!  
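The logic of that last paragraph can be illustrated with a toy simulation. This is only a sketch under assumptions of my own choosing, not an analysis of any study listed here: the true effect is fixed at 0.1 standard deviations, scores are normally distributed with a known standard deviation of 1, and a study counts as "published" only if it finds a positive effect with z > 1.96. The function name, sample sizes, and number of simulated studies are all hypothetical.

```python
import math
import random
import statistics


def simulate_published_effects(n_per_group, true_effect=0.1, n_studies=1000, seed=0):
    """Simulate many two-group studies that share one true effect.

    Returns (mean effect over all studies, mean effect over "published" studies).
    A study is treated as published only if its effect is positive and
    significant by a simple z-test, a deliberately crude publication filter.
    """
    rng = random.Random(seed)
    # Standard error of a difference between two group means when SD = 1.
    se = math.sqrt(2.0 / n_per_group)
    all_effects = []
    published = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        effect = statistics.fmean(treated) - statistics.fmean(control)
        all_effects.append(effect)
        if effect / se > 1.96:  # "significant" result in the expected direction
            published.append(effect)
    mean_published = statistics.fmean(published) if published else float("nan")
    return statistics.fmean(all_effects), mean_published


if __name__ == "__main__":
    for n in (20, 500):
        mean_all, mean_pub = simulate_published_effects(n)
        print(f"n per group = {n}: all studies {mean_all:.2f}, published only {mean_pub:.2f}")
```

With 20 people per group, a result has to exceed roughly 0.62 standard deviations to clear the significance bar, so every "published" small study overstates a true effect of 0.1 many times over. The average across all simulated studies, published or not, stays close to the true value at both sample sizes.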

If you, dear reader, know of any comparable failures to replicate influential classic studies in psychology (or of replications of the studies listed here), please contact me. I'd love to find out more about them.


