Smelling Sexual Arousal and Scientific Common Sense

Can men smell women's sexual arousal? Check out this claim.

Posted Jul 24, 2020

Recent research suggested that heterosexual men can smell women's sexual arousal.
Source: Photo by Rene Asmussen from Pexels

A recent academic paper claimed that women produce a unique scent when sexually aroused, and that men can pick up on it. Not only can men pick up on it, but this effect is huge. It’s bigger than the differences between conservatives and liberals on the importance of social equality as a value. It’s bigger than the effect of being an extravert on talking more. It’s more in the ballpark of men (on average) being heavier than women. Or, as a recent commentary by John Sakaluk puts it, “we are asked to imagine a world in which a heterosexual man could smell a sexually aroused women … to an extent that dwarfs most other psychological effects.”

The problem with this study is that it’s implausible. The effects are too big to be true. It flies in the face of previous psychological literature (and many people’s personal experience) that men tend to overestimate women’s sexual interest. How could they, if arousal is so clear and easy to detect? If the reported effects were true, Sakaluk asks, “How could heterosexual men inhabit gyms, dance clubs, and the like—or even more innocuous settings like a classroom, or [drive] a car—without individually and collectively becoming psychologically triggered … whenever a nearby woman experiences sexual arousal?”

People can often tell when something is too good to be true about a scientific finding.
Source: Photo by Rene Asmussen from Pexels

Identifying errors in scientific journal articles is often thought of as a matter of high-tech statistical forensics. Such arguments have indeed been made against the arousal-sniffing study, but Sakaluk’s approach also involves a good dose of common sense. Knowledge of the typical sizes of effects in psychology experiments, and of how many people you need to study to detect them, plays an important role. People who read, review, and conduct lots of psychology experiments get a sense of what’s typical. They can use this to flag implausible findings for deeper investigation.

This might surprise educated laypeople who want “purely objective” measures of scientific evidence. “Well, why couldn’t it be the case that these researchers just happened to find a big effect? Aren’t you being biased by saying this is improbable?” But ignoring expertise, such as knowledge of how experiments in a field typically work, is itself a kind of bias. It assumes that implausible claims, ones that don’t line up with expert judgment or the everyday experience of many people, deserve just as much credence as claims that do. If a researcher were to report that women were, on average, taller than men, we would have grounds to be skeptical. Since that’s the opposite of what we experience in everyday life, we should require stronger evidence before changing our position.

This is not to say that common sense can never be overturned. Scientific breakthroughs are often counterintuitive. The point, rather, is that “extraordinary claims require extraordinary evidence.” We are licensed to use our common sense in evaluating science. If something seems off, we are allowed to ask for more evidence before we believe.

Of course, in the case of this study, the common-sense red flag led to more technical red flags being raised. For example, the Granularity-Related Inconsistency of Means (GRIM) test checks whether a reported average is arithmetically possible, given the sample size and the rating scale used.

Intuitively, you can think about the fact that if you ask 25 people to give a rating on a 1 to 5 scale, the average can only end up having certain values. If everyone gave a 5 except one person, and that person gave a 4, then the average would be 4.96. There is no possible way for the average to come out 4.98. In four of six experimental conditions in the smelling arousal study, the means were impossible. Even having this occur once is evidence of an error.
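The arithmetic behind this kind of check can be sketched in a few lines of code. This is not the authors' published GRIM software, just a minimal illustration of the logic: a mean of n whole-number ratings must equal some integer total divided by n, so we only need to test the one or two integer totals closest to the reported mean.

```python
import math

def grim_consistent(reported_mean, n, decimals=2):
    """Minimal GRIM-style check: can the mean of n integer
    responses round to the reported value?"""
    # Any sum of n integer ratings gives mean = total / n, so only
    # the integer totals nearest reported_mean * n need checking.
    candidates = {math.floor(reported_mean * n), math.ceil(reported_mean * n)}
    target = round(reported_mean, decimals)
    return any(round(total / n, decimals) == target for total in candidates)

print(grim_consistent(4.96, 25))  # True:  124 / 25 = 4.96 exactly
print(grim_consistent(4.98, 25))  # False: no integer total works
```

Running the check on the example above confirms it: a mean of 4.96 from 25 raters is possible, while 4.98 is not, because 4.98 × 25 = 124.5 is not a whole number of scale points.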

Statistical tests can detect when reported results are not internally consistent.
Source: Photo by Andrea Piacquadio from Pexels

A related test, Sample Parameter Reconstruction via Iterative Techniques (SPRITE), uses similar logic to determine whether the reported average and variability (standard deviation) of a study are jointly possible. In this case, every condition in every study failed the test. The data reported by the authors are not internally consistent: there must be (at least) one error in each study reported.
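The real SPRITE procedure searches iteratively, but for small samples the same idea can be shown by brute force. The sketch below (my own illustration, not the published SPRITE code) enumerates every possible multiset of integer ratings and keeps those whose mean and standard deviation both round to the reported values; if nothing survives, the reported statistics are impossible.

```python
from itertools import combinations_with_replacement
import statistics

def possible_samples(mean, sd, n, scale=(1, 5), decimals=2):
    """SPRITE-style consistency check by exhaustive search (small n only):
    list every multiset of integer ratings whose mean AND standard
    deviation round to the reported values."""
    lo, hi = scale
    hits = []
    for sample in combinations_with_replacement(range(lo, hi + 1), n):
        if (round(statistics.mean(sample), decimals) == mean and
                round(statistics.stdev(sample), decimals) == sd):
            hits.append(sample)
    return hits

# A mean of 3.00 and SD of 1.58 from 5 raters on a 1-5 scale is
# achievable (e.g., ratings 1, 2, 3, 4, 5) ...
print(possible_samples(3.0, 1.58, 5))
# ... but a mean of 3.00 with SD 0.50 is impossible: no integer
# ratings produce it, so the search comes back empty.
print(possible_samples(3.0, 0.5, 5))
```

When every reported condition fails a search like this, as Sakaluk found, the conclusion is forced: the numbers in the paper cannot all be describing the same data.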

Another battery of tests evaluated whether the p-values (probability values) in the study are plausible, given how many people were studied. These tests rely on the well-understood principle that the fewer participants a study has, the less likely a statistical test is to pick up a true effect. Small studies are therefore unlikely to show significant effects consistently, even when the effect is real. Seeing three small studies all reporting a significant effect, with no failures, is highly implausible and raises concern that the results were “lucky” (or were “cherry-picked” by the researchers). They are not likely to represent what would happen if the same study were repeated.

Even though the studies were published in a peer-reviewed academic journal, the finding that heterosexual men can smell women’s arousal is not valid. A careful review of the evidence suggests that the studies reported have major errors. That’s not to say that this couldn’t be true, just that we would need larger, error-free studies to support that conclusion. Further, this critique is an invitation to think of science as more than just technical wizardry. It’s also a process of reasoning about the real world outside of the lab. Science needs to do more than just wow and surprise us. It also needs to explain and make sense of the wider world.