When the College Board released the results of 2013 SAT performance, it found that, once again, boys outperformed girls on the mathematics section of the test. In fact, this sex difference was the latest entry in an uninterrupted trend dating back to the 1970s.
And the blog wars began.
According to Dr. Mark Perry of the conservative American Enterprise Institute, this "huge, statistically significant +30-point gender gap on the SAT" is a clear indication that "there are some innate differences by gender for mathematical ability" and so "closing the STEM gender jobs gap may be a futile attempt in socially engineering an unnatural, and unachievable, outcome."
Those who look askance at sex differences beg to differ. According to University of Wisconsin-Madison psychology professor Janet Hyde, "There just aren’t gender differences anymore in math performance. So parents and teachers need to revise their thoughts about this. Stereotypes are very, very resistant to change, but as a scientist I have to challenge them with data.”
The purpose of this blog is to explain how these seemingly divergent views can be both right and wrong at the same time. The key is appreciating the full meaning of the words of fellow blogger Dr. Steve Stewart-Williams:
Let's start with a simple fact: Most women do not have the right aptitude to be professors at top STEM departments. This is unfortunate, perhaps, but it’s true. It’s also true, though, that most men don’t have the right aptitude! Only a small minority of people do. The phenomenon we’re trying to explain is not why half the population (men) can do it whereas half the population (women) can’t. Most of the population can’t, and of the tiny fraction who can, some are men and some are women. The only question is: Why is the tiny fraction of men working in STEM fields today somewhat larger than the tiny fraction of women?
To fully appreciate the wisdom of asking the question this way, consider the SAT math score range required for admission to top American public universities. For top engineering schools, it is 630-800. Now consider this breakdown of the recently published SAT data:
First, notice that only 7.2% of the 1.7 million students who took the test in 2013 scored in the "genius" (700-800) range. Of that tiny percentage, 4.5% were male and 2.7% were female, a male-to-female ratio of 1.6 to 1. Only 17.9% scored in the "above average" category (600-690), where the male-to-female ratio is much narrower (1.2 to 1). Finally, almost 30% scored in the "average" category (500-590), and about half were male and half were female; the ratio is just about 1:1.
So it simply is not the case that every male outperforms every female on math, nor is it even the case that the majority of males outperform the majority of females on math. Yet this is typically the conclusion drawn in the popular press when SAT performance scores are reported.
In fact, some claim that this 32-point difference constitutes evidence not only of innate male superiority in mathematics but of male superiority overall. The comment section following Perry's article is quite telling in this regard: many readers interpret a 32-point sex difference on one subsection of a paper-and-pencil college entrance exam as support for patriarchy as the "natural human order".
So let's look more closely at what the 32-point difference means. First, compare the graph from Perry's blog with the same data redrawn using the full range of SAT scores.
Note how the "enormous" sex difference actually appears quite small when the Y-axis is more truthfully drawn.
Now compare the distributions of male and female math SAT scores in the following graph:
Notice how similar the distributions are, and how close together the means of the distributions are.
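That similarity can be put into a number. For two roughly normal distributions, the proportion of overlap is approximately 2Φ(−|d|/2), where d is the standardized mean difference and Φ is the standard normal CDF. The sketch below uses the published 2013 summary statistics (male: mean 531, sd 121; female: mean 499, sd 114):

```python
import math

# Overlap of two normal curves, approximated as 2 * Phi(-|d| / 2),
# where d is the standardized (Cohen's) mean difference.
# Summary statistics are the published 2013 SAT math figures.
def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

pooled_sd = math.sqrt((121**2 + 114**2) / 2)
d = (531 - 499) / pooled_sd
overlap = 2 * normal_cdf(-abs(d) / 2)
print(f"distributional overlap = {overlap:.0%}")  # about 89%
```

In other words, roughly nine-tenths of the two score distributions sit on top of each other, which is what the graph shows visually.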
So if there is actually very little difference in performance between the vast majority of males and females, how could a 32-point mean difference be statistically significant?
There is no secret to this. It is simply a matter of sample size and variability: The larger the sample and the more tightly clustered the scores, the smaller the difference needed to achieve statistical significance. A total of nearly 1.7 million students took the SAT tests in 2013. The math scores ranged from 200 to 800 points and were normally distributed, with an overall mean of 514 and a standard deviation (the "average spread" of scores around the mean) of a little over 100 points (sd = 118). With a sample that large and scores that tightly clustered, even a tiny difference in average performance will be statistically significant.
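The arithmetic behind this claim can be sketched with a simple two-sample z test. The figures below are assumptions for illustration: roughly 850,000 test-takers of each sex (the actual 2013 split was close to even) and the overall sd of 118 reported above:

```python
import math

# Why tiny mean differences come out "significant" at n = 1.7 million.
# Assumed for illustration: ~850,000 test-takers per sex, sd = 118.
n_per_group = 850_000
sd = 118

def z_for_difference(diff):
    """Two-sample z statistic for a given difference in group means."""
    se = math.sqrt(sd**2 / n_per_group + sd**2 / n_per_group)
    return diff / se

for diff in (1, 5, 32):
    z = z_for_difference(diff)
    print(f"{diff:>2}-point gap -> z = {z:6.1f} (|z| > 1.96 is 'significant' at p < .05)")
```

Even a 1-point gap clears the conventional significance threshold by a wide margin at this sample size, which is exactly why significance alone tells you so little here.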
Because significance tests can sometimes be misleading, scientific journals typically require other statistics to assess the importance of a result. The most common are measures of effect size, which tell you how large the effect actually is. Using the data released by the College Board (male mean = 531, sd = 121; female mean = 499, sd = 114), it turns out that about 2% of the variability in SAT math scores can be attributed to the sex of the test-taker; 98% is due to other factors, presumably differences in training and natural aptitude in math (Cohen's d = .27, effect size r-squared = .02).
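These effect-size figures follow directly from the published summary statistics. A minimal sketch of the calculation, using the standard formulas for Cohen's d (pooled-sd version) and its conversion to a correlation:

```python
import math

# Effect size from the published 2013 SAT math summary statistics.
m_male, sd_male = 531, 121
m_female, sd_female = 499, 114

# Cohen's d: mean difference scaled by the pooled standard deviation.
pooled_sd = math.sqrt((sd_male**2 + sd_female**2) / 2)
d = (m_male - m_female) / pooled_sd

# Convert d to a point-biserial correlation; r squared is the share
# of score variance associated with the sex of the test-taker.
r = d / math.sqrt(d**2 + 4)
print(f"d = {d:.2f}, r^2 = {r**2:.3f}")
```

The r-to-d conversion used here assumes equal group sizes, which is close enough to true for the 2013 cohort.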
Now if men and women are about the same in terms of mathematics aptitude, how do we explain these facts:
In my next blog post, we'll discover the answer.
Copyright March 17, 2014 Dr. Denise Cummins
Dr. Cummins is a research psychologist, a Fellow of the Association for Psychological Science, and the author of Good Thinking: Seven Powerful Ideas That Influence the Way We Think.
More information about me can be found on my homepage.
Follow me on Twitter.
And on Google+.