The P-Hack Rap

How to Improve Psychological Science One Hip Hop Song at a Time

Posted Feb 19, 2018

This is a guest post by Daniel Rosenfeld, an undergraduate psychology major and lab manager at Cornell University.  His research centers on food choice, morality, and identity, with a particular focus on vegetarianism.

If you prefer to get right to the rap, there is an actual YouTube video of an actual rap, The P-Hack Rap, about three-quarters of the way down.

---------------------------------------

I am an undergraduate student. I have a mere two years of experience conducting psychological research. In this short time, however, I have developed an immense appreciation for the wide scope, meaningful achievements, and practical implications of psychological science. But I have also learned about scientific replicability—and I’m quite concerned.

The Replication Crisis

Lee Jussim
Source: Lee Jussim

I am concerned about how we as a field will train an emerging generation of scholars. I am concerned that admirable efforts to resolve a “replication crisis” will not reach enough laypeople, including other undergraduates. I am concerned that today’s rising psychological scientists will miss many of the conversations that will shape the future of their discipline.

At the same time, I am optimistic. This past semester, I completed a seminar that focused on Chris Chambers’ The Seven Deadly Sins of Psychology and discussed methodological, statistical, and ethical dilemmas in psychological research. I learned about emerging movements toward open science – which, in a myriad of ways, attempts to move psychological science out of the shadows of proprietary research labs (where the original hypotheses, post-hoc storytelling, practices, statistical decisions and full set of methods are kept behind closed doors, known only to that lab) and into the daylight (where all can see them – and, crucially, examine them for errors and biases, and attempt to replicate the work using methods close to those of the original study). One such practice is preregistration. Preregistration involves researchers preparing, in advance, a written document identifying their original hypotheses and planned analyses; it is important because it renders the full set of practices far more transparent, and, sometimes, publicly available. It makes transparent exactly which hypotheses were tested a priori, and which were exploratory, thereby allowing consumers of that research (including other scientists) to have a better sense of which findings are more credible and which are more tentative.

P-Hacking: The Quest for “Statistical Significance”

I learned that we—as individuals and as a field—can address pressing issues that threaten scientific validity by revising normative ways of conducting research. Here, I would like to share my perspective on an issue discussed at great length in my seminar: the concept of “p-hacking.”

For lay readers: the “p” in p-value means probability, and a p-value below .05 is typically referred to as “statistically significant.” P-values are usually obtained when researchers compare experimental conditions or compute correlations. Without going into too much detail, if all sorts of assumptions are met, a p-value is the probability of finding an observed difference or correlation (or one even larger) by random chance, if there really (out in the wider world) is no difference or no correlation.
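For readers who like to see a definition in code, here is a minimal sketch of one way to estimate a p-value: a permutation test. It is only an illustration, not the method any particular study used. The scores are made up, and the names `permutation_p_value`, `print_gains`, and `online_gains` are hypothetical. The idea is to shuffle the group labels many times (which is what "no real difference" would look like) and count how often the shuffled difference is at least as large as the observed one.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    Shuffling the pooled scores mimics a world with no real group
    difference; the p-value is the share of shuffles whose difference
    is at least as large as the one actually observed.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled_diff >= observed:
            at_least_as_large += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (n_shuffles + 1)


# Made-up exam-score improvements for two hypothetical groups.
print_gains = [2, -1, 4, 0, 3, 1, -2, 5]
online_gains = [3, 1, 6, 2, 4, 0, 5, 7]

p = permutation_p_value(print_gains, online_gains)
print(f"estimated p-value: {p:.3f}")
```

A small value here would only say that a difference this large is rare when the labels are meaningless; it would not say that the effect is real or important.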

P-values below .05 are often taken to mean, “eureka, my results are real!” because they were often mistakenly believed to mean “my results are systematic and valid, they are not due to random chance!” (they do not mean any of these things, but that is a blog for another day). What is important, however, is that until very recently, it was almost impossible for psychological scientists to publish their findings without a p<.05, i.e., a “statistically significant” result. And, of course, in the world of academic science, it is “publish or perish” – meaning psychology researchers are highly incentivized to “reach” statistical significance, which some do by any means possible. Some such means are scientifically sound – such as obtaining very large samples; this “works” because large samples have more statistical “power” and are therefore more likely to detect a real effect as significant. But some means are not so sound.

Which gets us to p-hacking. Imagine that you are conducting a study to test the effect of textbook reading mode on academic performance. You select students from an introductory psychology class, have them take an exam, and then randomly sort them into two groups. You instruct the first group of students to read all upcoming textbook chapters via print textbook and the second group to read via online textbook. You test a clear hypothesis: The online textbook readers will show greater improvements in performance over the course of one month.

After the follow-up exam, you analyze the data. There was no significant difference in performance change between the two groups! “Damn,” you think to yourself, “That can’t be the story.  Maybe reading a textbook online only benefits students who scored lower than the 80% average on the baseline exam.”  So you re-run the analyses—this time excluding all students who scored 80% or higher at baseline—and out pops a significant result: a p-value of .04. You write this up, conclude that below-average-performing students who read textbooks online show greater academic improvements, and submit your manuscript for review.

It appears that you got your study to work, but only with the help of p-hacking. Colloquially, p-hacking can be thought of as “cooking the data” or “torturing the data until they confess” – it is not fraud, in the sense that researchers are not making up data, but they perform so many different analyses on the data that, almost inevitably, something will cross the .05 threshold into “statistical significance.” When p-hacking is common across an entire body of scientific literature, the lines between established phenomena and mere statistical noise may become completely blurred.
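To see why trying many analyses "almost inevitably" produces something significant, here is a rough simulation of the textbook study, offered as a sketch under stated assumptions and not as a model of any real dataset. Everything in it is hypothetical: 50 students per group, baseline scores centered on 80, a true effect of exactly zero, and a simple z-test that treats the spread of score gains (`SIGMA`) as known. Each simulated study runs the planned analysis plus six subgroup "forks" based on baseline cutoffs of 70, 80, and 90.

```python
import random
from statistics import NormalDist

SIGMA = 5.0  # assumed known SD of score gains (a simplifying assumption)


def z_test_p(a, b, sigma=SIGMA):
    """Two-sided p-value for a difference in means, treating sigma as known."""
    if not a or not b:
        return 1.0  # an empty subgroup cannot be tested
    standard_error = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (sum(a) / len(a) - sum(b) / len(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_study(rng, n_per_group=50):
    """One study in which reading mode truly has no effect on gains."""
    def student():
        return rng.gauss(80, 10), rng.gauss(0, SIGMA)  # (baseline, gain)
    print_group = [student() for _ in range(n_per_group)]
    online_group = [student() for _ in range(n_per_group)]
    return print_group, online_group


def all_p_values(print_group, online_group, cutoffs=(70, 80, 90)):
    """The planned p-value first, then one p-value per subgroup fork."""
    def gains(group, keep):
        return [gain for baseline, gain in group if keep(baseline)]

    filters = [lambda baseline: True]  # planned analysis: everyone
    for cutoff in cutoffs:
        filters.append(lambda baseline, c=cutoff: baseline < c)
        filters.append(lambda baseline, c=cutoff: baseline >= c)
    return [z_test_p(gains(print_group, f), gains(online_group, f))
            for f in filters]


def false_positive_rates(n_studies=2000, seed=1):
    """Share of null studies called significant, planned vs. p-hacked."""
    rng = random.Random(seed)
    planned_hits = hacked_hits = 0
    for _ in range(n_studies):
        p_values = all_p_values(*simulate_study(rng))
        planned_hits += p_values[0] < 0.05    # stick to the plan
        hacked_hits += min(p_values) < 0.05   # report whatever "worked"
    return planned_hits / n_studies, hacked_hits / n_studies


planned_rate, hacked_rate = false_positive_rates()
print(f"planned analysis only: {planned_rate:.3f}")
print(f"best of all analyses:  {hacked_rate:.3f}")
```

Under these assumptions, the planned analysis should be fooled about 5% of the time, while picking the best of several analyses is fooled noticeably more often, even though no data were made up.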

As a field, we need creative ways to engage an emerging generation of scholars in current conversations on scientific replicability. Efforts are afoot to render psychological research more transparent, so that problems and errors can be more readily identified, and, therefore, to render its findings more reproducible. As we now see psychologists celebrating, on social media, independent labs’ preregistered direct replications of their studies, we have much to gain from teaching the scholars of tomorrow about the benefits of open and sound methodology.

Combining my own involvement in research with my passion for hip-hop, I have written a rap on p-hacking.

Enjoy.

The P-Hack Rap

Trying to publish in journals

That have a high impact

But quite often, in fact

Methods are far from intact

With a p-hack we see that

We lack proper a priori testing

But in a system where novelty

Is viewed as the best thing

Validity flies out the door

As we test moderators and more

Lose sight of reality

Fearing a boot through the door

Fearing being that person

Who has a low H-index

Because their analyses were clean

With stats morals like Windex

Stuck yielding null findings

Not sticking to the binding

Conditions that define

Any hypothesis any time

Maybe delete this outlier

Only one, just to try it

And unknowingly succumb

To sheer confirmation bias

Or one can start hypothesizing

After results are known

Just to make a novel claim

That in reality is overblown

Like a fisherman in the sea

One can reel for false positives

Uncover a novel discovery

When in reality it's the opposite

Searching data like it's a gold mine

Trying to earn a good merit

Need a p less than .05

Cause it's publish or perish

Together we can revise

Conventional norms and systems

As a field we can rise

Toward methodological wisdom

We need greater transparency

To beat this replication crisis

To overcome biases that emerge

When left to our own opaque devices

Revision starts with new visions

That ignite new ignitions

Preregistration will reveal

All those post hoc decisions

-------------------------------------------------------------------------------

If you enjoyed this, you might also enjoy this poem:

Graduate Students Revenge

Lee Jussim tweets as PsychRabble, and you can follow him here: https://twitter.com/PsychRabble where he covers issues of science reform, diversity, prejudice, stereotypes, discrimination, and political psychology.