"Gender Stereotypes are Inaccurate" if You Ignore the Data
Where's the bias?
Posted June 5, 2018
Let’s start with a quiz.
1. Who was more likely to vote for Donald Trump in 2016, men or women?
2. Who is more likely to commit a murder, men or women?
3. Who receives higher grades in high school, boys or girls?
4. Who is more likely to be labeled as having some sort of behavior problem in elementary school, boys or girls?
The answers are at the end of this paragraph. If you got at least one right, you have just demonstrated to yourself that not all beliefs (stereotypes) about males and females are wrong. If you got three or four right, you should be convinced that your gender stereotypes are not inaccurate. You’re not alone: lots of other people may hold fairly accurate gender stereotypes, and many actually do. (The answers are, respectively: men, men, girls, boys.)
This helps explain why I was mystified when the Annual Review of Psychology published a major review in January declaring how inaccurate gender stereotypes are. The Annual Review of Psychology is widely considered to be one of the most influential and high-impact repositories of the highest quality reviews in the field. “Annual Review of Psychology offers expert, integrative reviews that go beyond top-of-the-head sound bites or clickbait, instead examining a topic’s nuances and pros and cons, the weight of the evidence, and gaps in our knowledge to date,” the editors state on the journal’s website. “What’s more, our authors and topics are vetted by the expert Editorial Committee. Our articles are carefully reviewed by devoted colleagues and by the editors. So, take the time to experience some curated wisdom from hand-picked experts.” In this particular case, the claim to only publish work that evaluates nuances, pros, and cons, and the weight of the evidence, fell short of the reality.
Disclaimers and Nuance
Accuracy is a simple idea referring to correspondence of belief with reality. Because I define a stereotype as people's beliefs about groups, stereotype accuracy, then, simply means “the extent to which a belief about a group corresponds to what the group is actually like.” It does not matter, with respect to assessing accuracy, how the group got that way. It could be biology, culture, socialization, history, or anything else.
Also, if there are no accuracy criteria for certain beliefs, it means we are in no position to declare them accurate or inaccurate. When I state “gender stereotypes are mostly accurate,” what I mean is: “When accuracy has been assessed, the correlation of gender stereotypes with criteria is one of the largest relationships in all of social psychology.”
Last, accuracy and bias are not mutually exclusive. A belief can be mostly accurate, but also sometimes lead to biases. In fact, gender stereotypes do sometimes lead to biases – but these tend to be, on average, quite small.
Claims about Gender Stereotype (In?)Accuracy
Ellemers’ review of gender stereotype (in?)accuracy starts off reasonably enough. From the abstract:
“There are many differences between men and women. To some extent, these are captured in the stereotypical images of these groups.”
This would seem to acknowledge at least a moderate degree of accuracy. But then Ellemers’ abstract continues:
“Stereotypes about the way men and women think and behave are widely shared, suggesting a kernel of truth.”
The “kernel of truth” phrasing has a long history in social psychology. From my 2012 book, Social Perception and Social Reality, p. 314:
Variations on the idea that there might be some truth to stereotypes became known as the “earned reputation” theory and the “kernel of truth” hypothesis both of which emphasized that, although stereotypes were largely inaccurate exaggerations, they did contain “a kernel of truth” ... I do not know whether those promoting this idea thought about it in the following manner, but it always brought to my mind an image of a single kernel of decent corn (the “kernel of truth”) in an otherwise entirely rotten cob (the rest of the stereotype exaggerating and distorting that truth). Still, one kernel is better than none.
The exaggeration hypothesis has long and deep roots within social psychology. It was long the only perspective that permitted researchers to acknowledge that people were not always completely out of touch with social reality, while simultaneously allowing researchers to position themselves well within the longstanding traditions emphasizing stereotype error and bias. But in case one has any doubts that this is what Ellemers meant, she continues on pp. 276-277:
“Yet the stereotypical perception that a particular feature characterizes membership of a specific group typically leads people to overemphasize differences between groups and underestimate variations within groups.”
She concludes her discussion of inaccuracy in gender stereotypes on p. 277:
“If there is a kernel of truth underlying gender stereotypes, it is a tiny kernel, and does not account for the far-reaching inferences we often make about essential differences between men and women.”
Ellemers has clearly cast gender stereotypes as, except for a “tiny kernel of truth,” mostly inaccurate. This claim is not merely wrong, it is instructively wrong.
Scientists cannot be in the business of simply ignoring evidence that is inconsistent with their narrative. This does not mean they need to accept the evidence at face value. But ignoring evidence should not be a scientific option, especially in an outlet that prides itself on presenting nuance and pros and cons and the weight of the evidence regarding some issue.
In this particular case, however, there are 11 papers published in peer-reviewed journals reporting a total of 16 studies that have directly assessed the accuracy of gender stereotypes. Ellemers’ review did not cite a single one. In addition to those 11 papers, several reviews of this evidence have appeared in other sources (a list appears at the end of this article). How could Ellemers’ review have just missed all that?
For example, women are, on average, better at reading nonverbal cues than are men, and people are pretty good at recognizing that. Nearly all of the stereotype accuracy correlations exceed .50, and many are over .80. A correlation of .50 can be interpreted as people being right 75% of the time; a correlation of .80, as people being right 90% of the time. Furthermore, those 16 studies provide at least as much evidence that people underestimate gender differences as that they “overemphasize” them.
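The 75% and 90% figures are consistent with a standard conversion known as the binomial effect size display, which maps a correlation r to a "percent right" of 0.5 + r/2. The sketch below assumes that this is the conversion intended; the function name is hypothetical and chosen only for illustration.

```python
def percent_right(r: float) -> float:
    """Binomial effect size display: map a correlation to a 'hit rate'.

    Assumption: the essay's 75% / 90% figures come from this conversion,
    which treats r as the difference between two rates centered on 50%.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return 0.5 + r / 2.0


if __name__ == "__main__":
    for r in (0.50, 0.80):
        print(f"r = {r:.2f} -> right about {percent_right(r):.0%} of the time")
```

A correlation of zero maps to 50%, which is what guessing would produce, so the conversion is best read as a rough interpretive aid rather than a literal accuracy rate.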
On page 285, when discussing a model on “privileging stereotype consistent communication," Ellemers states:
“Female applicants received lower ratings than male applicants on forms containing gendered evaluation labels and were less likely to have their applications awarded, even though there was no difference in the perceived quality of the proposals they submitted (van der Lee & Ellemers 2015).”
This is a paper published in the prestigious Proceedings of the National Academy of Sciences, which claimed (in the abstract): “Results showed evidence of gender bias in application evaluations and success rates.”
Except they didn't, really. Albers (2015) argued that they actually found Simpson’s paradox. This refers to situations where a statistical relationship that is true for a population is not true for population subgroups. In this particular case, women received fewer grants, but that in itself is not evidence that bias was involved. Instead, women applied more in fields where grants were less likely to be funded (life and social sciences); and applied less in fields where grants were more likely to get funded (e.g., chemistry and physics).
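To see how this can happen, here is a minimal sketch with invented numbers. These counts are purely hypothetical and are not the figures from van der Lee & Ellemers or from Albers; the field labels are placeholders. Within each field, men and women are funded at identical rates, yet the pooled rates differ sharply because women apply mostly in the field where funding is scarce.

```python
# Hypothetical counts, made up for illustration only:
# field -> group -> (applications, awards)
APPLICATIONS = {
    "high_funding_field": {"men": (80, 24), "women": (20, 6)},
    "low_funding_field": {"men": (20, 2), "women": (80, 8)},
}


def field_rate(data, field, group):
    """Funding rate for one group within a single field."""
    applied, funded = data[field][group]
    return funded / applied


def pooled_rate(data, group):
    """Funding rate for one group after pooling all fields together."""
    applied = sum(groups[group][0] for groups in data.values())
    funded = sum(groups[group][1] for groups in data.values())
    return funded / applied


if __name__ == "__main__":
    for field in APPLICATIONS:
        men = field_rate(APPLICATIONS, field, "men")
        women = field_rate(APPLICATIONS, field, "women")
        print(f"{field}: men {men:.0%}, women {women:.0%}")
    print(
        f"pooled: men {pooled_rate(APPLICATIONS, 'men'):.0%}, "
        f"women {pooled_rate(APPLICATIONS, 'women'):.0%}"
    )
```

With these made-up counts the within-field rates match exactly (30% and 10%), while the pooled rates are 26% for men and 14% for women. The aggregate gap comes entirely from where people applied, not from how applications were judged.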
In their reply to Albers, van der Lee & Ellemers (2015b) acknowledged that “correcting for scientific discipline indeed reduces the effect of applicant gender, so that the overall effect is no longer significant” (though they claimed that differences in funding rates within some of the fields support the idea "that applicant gender contributes to early career funding success").
Yet in 2018, Ellemers simply declares that “females were less likely to have their grants funded.”
This raises all sorts of questions: How is it possible for this paper to promote a debunked conclusion? Does the conclusion of bias even hinge on what the data say? How is it possible that, if the main evidence presented as showing bias actually does not show bias, the conclusion remains intact? And if the “bias” conclusion cannot be disconfirmed by data, is it a scientific conclusion?
The Problem Goes Well Beyond This Review
The editors of ARP declared the value of having the papers they publish vetted by devoted colleagues and by the editors themselves. How is it possible that such a large literature on gender stereotype accuracy was ignored or overlooked by so many experts? How is it possible that the editor and reviewers of Ellemers’ PNAS article on grant funding all overlooked Simpson’s paradox, which has been well-known since at least the 1970s to constitute an alternative explanation to sexist bias for many outcomes? This is impossible to know. We do know, however, that, throughout the social sciences, empirical findings that contest social justice narratives often are systematically ignored, overlooked, denigrated, and dismissed.
This type of problem cannot be solved by better statistics or improved methods. Nonetheless, it threatens the scientific credibility of social psychology at least as much as do unreplicable findings, faulty statistics, and suboptimal research methods. It also risks undermining public support for the social sciences more broadly. Why should the public continue to support funding social sciences, if it cannot be reasonably assured that scientists’ conclusions will be responsive to their own data?
There is an alternative. Social psychology could live up to its scientific ideals. Conclusions that enter the field’s canon cannot be based on cherry-picked evidence to support a narrative. ARP’s stated goals of presenting nuanced perspectives, pros and cons, and evaluating the weight of the evidence are exactly right. That is exactly how science should be conducted.
As usual, please read my guidelines for commenting before doing so. In short, no snark, sarcasm, or insults, and please stay on topic.
A draft of this essay was sent to Dr. Ellemers and the editors of ARP inviting them to provide feedback regarding anything that might be inaccurate or misrepresented. As of this writing, I have received no reply from any, except Dr. Schacter, who informed me that he was not the action editor for the Ellemers chapter and that I had misspelled his name (the prior draft had it spelled Schachter).
Reviews of Stereotype Accuracy (Including Gender Stereotypes)
Jussim, L., Crawford, J. T., Anglin, S. M., Chambers, J., Stevens, S. T., & Cohen, F. (2016). Stereotype accuracy: One of the largest relationships and most replicable effects in all of social psychology. In T. Nelson (ed.), Handbook of prejudice, stereotyping, and discrimination (2nd ed.), pp. 31-63. Hillsdale, NJ: Erlbaum.
Jussim, L., Crawford, J.T., & Rubinstein, R. S. (2015). Stereotype (in)accuracy in perceptions of groups and individuals. Current Directions in Psychological Science, 24, 490-497.
Jussim, L., Cain, T., Crawford, J., Harber, K., & Cohen, F. (2009). The unbearable accuracy of stereotypes. In T. Nelson (ed.), Handbook of prejudice, stereotyping, and discrimination, pp. 199-227. Hillsdale, NJ: Erlbaum.
The 11 Articles Reporting 16 Studies Assessing Gender Stereotype Accuracy Not Included in the ARP Chapter Claiming Gender Stereotypes Have “Only A Tiny Kernel Of Truth”
Allen, B. P. (1995). Gender stereotypes are not accurate: A replication of Martin (1987) using diagnostic vs. self-report and behavioral criteria. Sex Roles, 32, 583-600. (Note: despite the title, the article found a correlation of .61 between sex stereotypes and criteria after removing a single outlier – see Jussim et al., 2016, referenced above.)
Beyer, S. (1999). The accuracy of academic gender stereotypes. Sex Roles, 40, 787-813.
Briton, N. J., & Hall, J. A. (1995). Beliefs about female and male nonverbal communication. Sex Roles, 32, 79-90.
Cejka, M. A., & Eagly, A. H. (1999). Gender-stereotypic images of occupations correspond to the sex segregation of employment. Personality and Social Psychology Bulletin, 25, 413-423.
Hall, J. A., & Carter, J. D. (1999). Gender-stereotype accuracy as an individual difference. Journal of Personality and Social Psychology, 77, 350-359.
Halpern, D. F., Straight, C. A., & Stephenson, C. L. (2011). Beliefs about cognitive gender differences: Accurate for direction, underestimated for size. Sex Roles, 64, 336-347.
Lockenhoff, C. E., Chan, W., McCrae, R. R., De Fruyt, F., Jussim, L., De Bolle, M., … & Pramila, V. S. (2014). Gender stereotypes of personality: Universal and accurate? Journal of Cross-Cultural Psychology, 45, 675-694.
Martin, C. L. (1987). A ratio measure of sex stereotyping. Journal of Personality and Social Psychology, 52, 489-499.
McCauley, C., & Thangavelu, K. (1991). Individual differences in sex stereotyping of occupations and personality traits. Social Psychology Quarterly, 54, 267-279.
McCauley, C., Thangavelu, K., & Rozin, P. (1988). Sex stereotyping of occupations in relation to television representations and census facts. Basic and Applied Social Psychology, 9, 197-212.
Swim, J. K. (1994). Perceived versus meta-analytic effect sizes: An assessment of the accuracy of gender stereotypes. Journal of Personality and Social Psychology, 66, 21-36.