
Small Effects, Big Stories

Why weak evidence can still drive moral panics and shape public opinion

Key points

  • Study design, sample size, and analytics shape research results and their interpretation.
  • Small effects can look big when framed to fit an appealing or moralized narrative.
  • Motivated reasoning makes us less critical of results that align with our beliefs.
  • Weak effects can still influence policy, opinion, and public debate if the story fits.
Source: OpenAI/ChatGPT

Potatoes, smartphones, social media, and AI. What do they have in common?

On the surface, not much. But all four have recently been the subject of scientific claims—some modest, others dramatic—that quickly became headlines. In each case, the research was used to shape a narrative: Potatoes increase your risk of diabetes, smartphones hurt academic performance, social media is destroying an entire generation’s mental health, and AI is quietly ghostwriting scientific papers. Some of these claims contain a kernel of truth. Some are overblown.

The problem isn’t that researchers are asking bad questions or running bad studies (though that certainly happens). It’s that the conclusions—and more importantly, the way they’re communicated—often seem reverse-engineered to serve a larger story.

In today's climate, a weak but statistically significant result can pass as "evidence" to support a narrative that’s way overblown. The science may be technically correct; the takeaway often isn’t reasonable.

This isn’t a new problem, but it’s now a more visible one. Psychologist Christopher J. Ferguson (2025) recently argued that much of modern social science runs on weak effect sizes and strong convictions. With enough data, there’s support for almost any hypothesis.

And once that support clears the magical p < .05 threshold, it becomes easy to overstate what’s been found—especially if the result aligns with an intuitively appealing narrative. That’s how weak effects become headlines and how nuance gets crowded out by moral panic.

Let’s talk a little about how that happens.

Small Effects, Big Stories

At the risk of oversimplifying, there are three primary elements that shape the results of any study.

First, the design—detailing what’s measured and how—determines the inferences that can be made and the sample size needed. Second, sample size drives whether an observed effect reaches “statistical significance.” With large enough samples, even tiny, practically meaningless effects look impressive on paper.

Finally, analytics—the statistical methods used—determine which patterns emerge, which are ignored, and how strong the results appear. These choices are integral to how a messy dataset becomes a clean, compelling story.
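To make the sample-size point concrete, here's a minimal sketch in Python (the effect size and sample sizes are hypothetical, chosen only for illustration): the same tiny underlying correlation can miss the p < .05 threshold in a modest sample and clear it comfortably in a large one.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)

def observed_effect(true_r, n):
    """Simulate n (x, y) pairs with a true correlation of true_r
    and return the observed correlation and its p-value."""
    cov = [[1.0, true_r], [true_r, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return pearsonr(x, y)

# The same tiny "true" effect (r = 0.05) tested at two sample sizes.
for n in (200, 20_000):
    r_obs, p = observed_effect(0.05, n)
    print(f"n = {n:>6,}: observed r = {r_obs:+.3f}, p = {p:.4f}")
```

With 200 people, a true correlation of .05 is usually indistinguishable from noise; with 20,000, it will almost always come out "statistically significant," even though nothing about its practical importance has changed.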

Each of these elements involves choices, and they’re rarely neutral. I’ve argued previously that the way we define concepts, decide what counts as relevant, and choose which comparisons to make can completely change the story the data tells.

So, there are a lot of decisions involved in transforming a research idea into a published study. Each of those decisions represents an opportunity—conscious or not—to nudge the result toward a preferred interpretation. And when that interpretation aligns with a compelling or morally charged narrative, even the smallest effects can be turned into big stories.

This is, in essence, the point Ferguson was making. When the design, sample, and analytics all slant—intentionally or not—toward producing a “positive” result, it becomes much easier for a small, technically significant effect to be framed as decisive evidence.

He illustrates this with a couple of graphics—and I asked ChatGPT to produce two similar examples using the same scaling for each [1] (see Figures 1 and 2)—representing two different correlational effects. With the first chart, the trend is obvious: People at the low and high ends differ noticeably, though there’s still some random variability (noise).

Figures 1 and 2. Two correlations with identical axis scaling.
Source: OpenAI/ChatGPT

With the second chart, the trend is barely visible. People at the low end of the scale look similar to those at the high end, and the data is far noisier.

Yet, that second effect—visually unimpressive as it may be—is the average effect size reported in pre-registered psychology studies (Schäfer & Schwarz, 2019) [2]. On paper, it’s “statistically significant.” In headlines, it can become “proof.” But in practice, it may be little more than a statistical blip.
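If you want to reproduce that visual comparison yourself, here's a rough sketch of the kind of plot Ferguson's graphics (and my ChatGPT versions) represent. The specific effect sizes below (r ≈ .6 and r ≈ .1) are illustrative assumptions, not the exact values from his figures.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def correlated_sample(r, n=300):
    """Simulate n standardized (x, y) pairs with approximate correlation r."""
    x = rng.standard_normal(n)
    y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)
    return x, y

fig, axes = plt.subplots(1, 2, figsize=(9, 4))
for ax, r in zip(axes, (0.6, 0.1)):  # illustrative strong vs. weak effect
    x, y = correlated_sample(r)
    ax.scatter(x, y, s=12, alpha=0.6)
    ax.set_title(f"true r = {r}")
    ax.set_xlim(-4, 4)  # identical scaling on both panels,
    ax.set_ylim(-4, 4)  # so the difference in trend and noise is visible
plt.tight_layout()
plt.show()
```

Put side by side on identical axes, the first panel shows a clear upward trend; the second looks like a cloud.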

Narratives and Moral Panics

If a small effect can be dressed up to look “significant,” it’s even easier to fit it into a compelling story. In research, compelling stories are currency—they get published, covered, and acted upon.

All it takes is a result that fits easily into a conclusion aligned with an existing moral panic. Consider a recent randomized controlled trial involving more than 17,000 students, testing whether removing phones from classrooms would improve academic performance (Sungu et al., 2025). The study found a statistically significant improvement in grades—but the effect size was minuscule (d = 0.086), which, in keeping with our correlational visuals, is equivalent to a correlation of about r = 0.043.

In practical terms, that’s negligible. And the researchers found no significant changes in well-being, academic motivation, online harassment, or overall digital use. It’s easy to imagine how “phone bans show tiny effects on grades” becomes “removing phones boosts performance” in headlines and policy debates. The story fits an existing cultural concern about phones and attention, and once it’s framed that way—helped along by the study’s own title—the small effect becomes a footnote rather than the focal point.
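For anyone who wants to check that conversion, the standard formula for translating a standardized mean difference (Cohen's d) into a correlation, assuming two groups of roughly equal size, is r = d / √(d² + 4):

```python
import math

def d_to_r(d):
    """Convert Cohen's d to a point-biserial correlation (equal group sizes assumed)."""
    return d / math.sqrt(d**2 + 4)

print(round(d_to_r(0.086), 3))  # 0.043 -- the value cited above
```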

This dynamic also affects how results are received. Decades of research on motivated reasoning show that people are more willing to accept statistics and arguments that align with their existing beliefs or desired outcomes, and far less likely to scrutinize their weaknesses (Kunda, 1990; Taber & Lodge, 2006). The effects are often small to moderate in size (Ditto et al., 2019)—not overwhelming, but strong enough to tilt judgments toward an appealing narrative.

When a result feels true—when it fits the story we already believe—it’s easier to overlook how small the effect actually is or to not bother examining it closely in the first place. At that point, small effects can pass as big ones—not because the evidence is strong, but because it’s convenient. That’s how weak results, paired with a receptive audience, can drive policy, shape public opinion, and fuel moral panics.

Wrapping Things Up

Potatoes, smartphones, social media, and AI don’t share much on the surface, but the way research about them has been packaged and received follows a familiar pattern. A small or highly specific finding gets distilled into a headline-friendly conclusion, often one that aligns with a preexisting narrative. Once it fits the narrative, the effect size becomes secondary—or disappears entirely from public discussion.

This isn’t an argument against studying these topics. Good research questions sometimes produce small effects, and those results can still be valuable. But when weak effects are presented as strong evidence—whether through selective framing, moral urgency, or simply the gravitational pull of motivated reasoning—we risk inflating both their scientific and practical importance.

The challenge is resisting the urge to overhype implications, paying attention to effect sizes, and distinguishing statistical significance from practical relevance—applying the same level of scrutiny to results we like as to those we don’t.

In the end, science is best served when it informs our narratives, not when it’s bent to fit them. Weak effects will always exist. We just have to be willing to treat them for what they are, not more than that.

Footnotes

[1] Using the same scaling makes it easier to see the real difference between the two: one shows a strong, visible relationship; the other, barely any pattern at all.

[2] The average effect includes both between- and within-subjects designs. The median effect for between-subjects designs is r = .12, which would look like even more noise. Thanks to Chris Ferguson for pointing me toward this particular study.
