The P-Hack Rap
How to Improve Psychological Science One Hip Hop Song at a Time
Posted Feb 19, 2018
This is a guest post by Daniel Rosenfeld, an undergraduate psychology major and lab manager at Cornell University. His research centers on food choice, morality, and identity, with a particular focus on vegetarianism.
If you prefer to get right to the rap, there is an actual YouTube video of an actual rap, The P-Hack Rap, about 3/4 of the way down.
I am an undergraduate student. I have a mere two years of experience conducting psychological research. In this short time, however, I have developed an immense appreciation for the wide scope, meaningful achievements, and practical implications of psychological science. But I have also learned about scientific replicability—and I’m quite concerned.
The Replication Crisis
I am concerned about how we as a field will train an emerging generation of scholars. I am concerned that admirable efforts to resolve a “replication crisis” will not reach enough laypeople, including other undergraduates. I am concerned that today’s rising psychological scientists will miss many of the conversations that will shape the future of their discipline.
At the same time, I am optimistic. This past semester, I completed a seminar built around Chris Chambers’ The Seven Deadly Sins of Psychology, which discussed methodological, statistical, and ethical dilemmas in psychological research. I learned about emerging movements toward open science, which, in a myriad of ways, attempts to move psychological science out of the shadows of proprietary research labs (where the original hypotheses, post-hoc storytelling, practices, statistical decisions, and full set of methods are kept behind closed doors, known only to that lab) and into the daylight (where all can see them and, crucially, examine them for errors and biases, and attempt to replicate the work using methods close to those of the original study). One such practice is preregistration. Preregistration involves researchers preparing a written document, before collecting or analyzing their data, that identifies their original hypotheses and planned analyses. It is important because it renders the full set of practices far more transparent and, sometimes, publicly available. It makes clear exactly which hypotheses were tested a priori and which were exploratory, thereby allowing consumers of that research (including other scientists) to have a better sense of which findings are more credible and which are more tentative.
P-Hacking: The Quest for “Statistical Significance”
I learned that we—as individuals and as a field—can address pressing issues that threaten scientific validity by revising normative ways of conducting research. Here, I would like to share my perspective on an issue discussed at great length in my seminar: the concept of “p-hacking.”
For lay readers: the “p” in p-value stands for probability, and a p-value below .05 is typically referred to as “statistically significant.” P-values are usually obtained when researchers compare experimental conditions or compute correlations. Without going into too much detail, if all sorts of assumptions are met, a p-value is the probability of finding the observed difference or correlation (or an even larger one) by random chance alone, if there really (out in the wider world) is no difference or no correlation.
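One way to see this definition in action is a permutation test, which builds the “random chance” world directly by reshuffling which participant belongs to which group. This is just one illustrative technique (most published studies use t-tests or similar tests rather than permutation tests), and the function name `permutation_p_value` and the example scores below are hypothetical:

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Shuffling the group labels simulates a world with no real difference,
    so any difference that appears after shuffling is due to chance alone.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign participants to the two groups
        shuffled_diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled_diff >= observed:
            at_least_as_large += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_large + 1) / (n_permutations + 1)


# Hypothetical exam improvements (in points) for two small groups.
online = [4, 7, 5, 9, 6, 8]
paper = [3, 5, 2, 6, 4, 5]
print(permutation_p_value(online, paper))
```

If shuffled labels rarely produce a difference as large as the real one, the p-value is small; if they often do, the p-value is large.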
P-values below .05 are often taken to mean “eureka, my results are real!” because such a result was long mistakenly believed to mean “my results are systematic and valid; they are not due to random chance!” (It does not mean any of these things, but that is a blog for another day.) What is important, however, is that until very recently, it was almost impossible for psychological scientists to publish their findings without p < .05, i.e., a “statistically significant” result. And, of course, in the world of academic science, it is “publish or perish,” meaning psychology researchers are highly incentivized to “reach” statistical significance, which some do by any means possible. Some such means are scientifically sound, such as obtaining very large samples; this “works” because, when a real effect exists, large samples have more statistical “power” and are therefore more likely to produce significant findings. But some means are not so sound.
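The point about large samples and power can be checked with a small simulation. This Python sketch assumes normally distributed scores (SD = 1), a true difference between conditions of 0.3 SD, and a two-sided z-test that treats the SD as known; these values and the `simulated_power` helper are illustrative assumptions, not drawn from any particular study. It reports the share of simulated studies that reach p < .05 at each sample size:

```python
import random
from statistics import NormalDist


def simulated_power(n_per_group, true_effect, n_studies=1000, alpha=0.05, seed=1):
    """Share of simulated two-group studies that reach p < alpha.

    Assumes normal scores with SD = 1 in both groups and a two-sided
    z-test on the difference in means, with the SD treated as known.
    true_effect is the real difference between conditions, in SD units.
    """
    rng = random.Random(seed)
    std_error = (2 / n_per_group) ** 0.5
    significant = 0
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        z = (sum(treated) / n_per_group - sum(control) / n_per_group) / std_error
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        if p < alpha:
            significant += 1
    return significant / n_studies


for n in (20, 80, 320):
    print(f"n = {n:>3} per group: power ~ {simulated_power(n, true_effect=0.3):.2f}")
```

With the same true effect, the larger samples reach significance far more often. Setting `true_effect=0.0` instead should give a rate near 5% at every sample size, which is all the .05 threshold promises when there is no effect.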
Which gets us to p-hacking. Imagine that you are conducting a study to test the effect of textbook reading mode on academic performance. You select students from an introductory psychology class, have them take an exam, and then randomly sort them into two groups. You instruct the first group of students to read all upcoming textbook chapters in a print textbook and the second group to read them in an online textbook. You test a clear hypothesis: The online textbook readers will show greater improvements in performance over the course of one month.
After the follow-up exam, you analyze the data. There was no significant difference in performance change between the two groups! “Damn,” you think to yourself, “That can’t be the story. Maybe reading a textbook online only benefits students who scored lower than the 80% average on the baseline exam.” So you re-run the analyses—this time excluding all students who scored 80% or higher at baseline—and out pops a significant result: a p-value of .04. You write this up, conclude that below-average-performing students who read textbooks online show greater academic improvements, and submit your manuscript for review.
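To see why that second analysis is a problem, here is a minimal Python simulation of this scenario under the assumption that reading mode has no effect whatsoever. It is a sketch, not a reanalysis of any real study: the baseline scores (mean 80, SD 10), the improvement scores (mean 0, SD 1), the group size of 100, and the simple z-test with a known SD are illustrative assumptions, and the helpers `z_test_p` and `false_positive_rate` are hypothetical:

```python
import random
from statistics import NormalDist


def z_test_p(group_a, group_b, sd=1.0):
    """Two-sided p-value for a difference in means, treating the SD as known."""
    std_error = sd * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (sum(group_b) / len(group_b) - sum(group_a) / len(group_a)) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_per_group=100, n_studies=4000, cutoff=80, seed=2):
    """Share of null studies where the full-sample OR the subgroup test hits p < .05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # (baseline exam %, improvement) per student. Reading mode has NO
        # effect: both groups' improvements come from the same distribution.
        paper = [(rng.gauss(80, 10), rng.gauss(0, 1)) for _ in range(n_per_group)]
        online = [(rng.gauss(80, 10), rng.gauss(0, 1)) for _ in range(n_per_group)]

        # Analysis 1: the planned test on every student.
        p_full = z_test_p([g for _, g in paper], [g for _, g in online])

        # Analysis 2: the after-the-fact rule, keeping only students who
        # scored below the cutoff at baseline.
        low_paper = [g for b, g in paper if b < cutoff]
        low_online = [g for b, g in online if b < cutoff]
        p_sub = z_test_p(low_paper, low_online) if low_paper and low_online else 1.0

        if p_full < 0.05 or p_sub < 0.05:
            hits += 1
    return hits / n_studies


print(f"False-positive rate with two analyses: {false_positive_rate():.1%}")
```

Because either analysis is “allowed” to succeed, the share of no-effect studies that end up with a publishable p < .05 comes out above the nominal 5%. Every additional exclusion rule, moderator, or outcome measure pushes it further.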
It appears that you got your study to work, but only with the help of p-hacking. Colloquially, p-hacking can be thought of as “cooking the data” or “torturing the data until they confess.” It is not fraud, in the sense that researchers are not making up data, but they perform so many different analyses on the data that, almost inevitably, something will cross the .05 threshold into “statistical significance.” When p-hacking is common across an entire body of scientific literature, the lines between established phenomena and mere statistical noise may become completely blurred.
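The “almost inevitably” can be made concrete with a bit of arithmetic. If there is no real effect and a researcher runs k independent analyses, each with a 5% false-positive rate, the chance that at least one crosses .05 is 1 - (1 - .05)^k. The hypothetical helper below computes this, assuming the analyses are fully independent:

```python
def chance_of_false_positive(k_analyses, alpha=0.05):
    """P(at least one p < alpha) across k independent tests when no effect exists."""
    return 1 - (1 - alpha) ** k_analyses


for k in (1, 5, 10, 20):
    print(f"{k:>2} independent analyses: {chance_of_false_positive(k):.0%}")
```

Under this independence assumption, 5, 10, and 20 analyses give roughly 23%, 40%, and 64%. Real analyses of the same data set are correlated, so the actual inflation is usually smaller, but it still grows with every extra look at the data.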
As a field, we need creative ways to engage an emerging generation of scholars in current conversations on scientific replicability. Efforts are afoot to render psychological research more transparent, so that problems and errors can be more readily identified, and, therefore, to render its findings more reproducible. Now that we see psychologists celebrating, on social media, independent labs’ preregistered direct replications of their studies, we have much to gain from teaching the scholars of tomorrow about the benefits of open and sound methodology.
Combining my own involvement in research with my passion for hip-hop, I have written a rap on p-hacking.
The P-Hack Rap
Trying to publish in journals
That have a high impact
But quite often, in fact
Methods are far from intact
With a p-hack we see that
We lack proper a priori testing
But in a system where novelty
Is viewed as the best thing
Validity flies out the door
As we test moderators and more
Lose sight of reality
Fearing a boot through the door
Fearing being that person
Who has a low H-index
Because their analyses were clean
With stats morals like Windex
Stuck yielding null findings
Not sticking to the binding
Conditions that define
Any hypothesis any time
Maybe delete this outlier
Only one, just to try it
And unknowingly succumb
To sheer confirmation bias
Or one can start hypothesizing
After results are known
Just to make a novel claim
That in reality is overblown
Like a fisherman in the sea
One can reel for false positives
Uncover a novel discovery
When in reality it's the opposite
Searching data like it's a gold mine
Trying to earn a good merit
Need a p less than .05
Cause it's publish or perish
Together we can revise
Conventional norms and systems
As a field we can rise
Toward methodological wisdom
We need greater transparency
To beat this replication crisis
To overcome biases that emerge
When left to our own opaque devices
Revision starts with new visions
That ignite new ignitions
Preregistration will reveal
All those post hoc decisions
Lee Jussim tweets as PsychRabble, and you can follow him here: https://twitter.com/PsychRabble where he covers issues of science reform, diversity, prejudice, stereotypes, discrimination, and political psychology.