Artificial Intelligence
Why We’re All Just Making It Up (Even When We’re Not)
AI’s strange explanations mirror how humans justify decisions after the fact.
Posted June 9, 2025 | Reviewed by Michelle Quirk
Key points
- Large language models often provide plausible explanations that don’t reflect how they reached an answer.
- Humans do the same, especially when decision processes are unconscious or unclear.
- We tend to believe fluent explanations, even when they aren’t grounded in actual reasoning.
Imagine asking an AI like Claude to solve a simple math problem: What’s 36 plus 59? Claude quickly replies: “95.” So you follow up with a natural next question: How did you get that answer?
Claude responds with the kind of explanation you’d expect from a middle school student—or a calculator with charm: “I added the ones—6 and 9—to get 15, carried the 1, and then added the tens—3, 5, and the 1—to get 9, resulting in 95.”
It sounds perfectly reasonable. Logical. Familiar. Exactly how we’d want Claude to explain it.
But here’s the catch: That’s not what Claude actually did.
According to recent research by Anthropic (2025a and 2025b),1 Claude’s real reasoning process was far less orderly. It approximated values—adding things like “40ish and 60ish” or “36ish and 57ish”—and then separately processed the last digits before cobbling together a final answer. And the explanation it gave? That wasn’t a transparent report of its internal steps. It was a narrative—a plausible-sounding reconstruction based on the kinds of things people tend to say when asked to explain basic math.
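To make the gap between the two descriptions concrete, here is a small Python sketch. It is my own toy illustration, not Anthropic's actual mechanism: carry_method spells out the textbook procedure Claude describes, while rough_parallel_paths caricatures the looser "estimate the magnitude, track the last digit, reconcile" process the research reports. The function names and the reconciliation step are invented for the example.

# Toy contrast between the explanation Claude gives and the rougher
# process the research describes. Illustration only; not Anthropic's
# actual mechanism.

def carry_method(a: int, b: int) -> int:
    """The story Claude tells: add the ones, carry the 1, add the tens."""
    ones = (a % 10) + (b % 10)            # 6 + 9 = 15
    carry, ones_digit = divmod(ones, 10)  # carry 1, keep the 5
    tens = (a // 10) + (b // 10) + carry  # 3 + 5 + 1 = 9
    return tens * 10 + ones_digit         # 95

def rough_parallel_paths(a: int, b: int) -> int:
    """A loose caricature of the described process: estimate the overall
    magnitude, track the last digit separately, then reconcile the two."""
    rough_total = round(a, -1) + round(b, -1)    # "40ish + 60ish" -> about 100
    last_digit = ((a % 10) + (b % 10)) % 10      # 6 + 9 ends in 5
    # Pick the number near the rough estimate whose last digit matches.
    candidates = [n for n in range(rough_total - 9, rough_total + 10)
                  if n % 10 == last_digit]
    return min(candidates, key=lambda n: abs(n - rough_total))

print(carry_method(36, 59))          # 95
print(rough_parallel_paths(36, 59))  # 95 for this example; the heuristic
                                     # reconciliation is not a general algorithm

Both calls print 95, but only the first mirrors the explanation Claude offered.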
It’s easy to chalk this up as a limitation of artificial intelligence (AI). But the reality is more unsettling: We do the same thing.
The Stories We Tell Ourselves
Sometimes our reasoning is exactly what it appears to be: deliberate, logical, and consciously constructed. But other times—especially when we don’t really know why we made a decision—we tell ourselves a story after the fact. And we believe it.
Back in 1977, Nisbett and Wilson reviewed a variety of studies and concluded that we often lack direct access to the mental processes that produce our behavior. When those processes operate outside conscious awareness, we still offer explanations—but those explanations tend to reflect our assumptions or implicit theories about what must have influenced us, rather than the actual causes.
That doesn’t mean we’re intentionally fabricating an explanation. Rather, we’re filling in the gaps in ways that feel plausible. And, importantly, Nisbett and Wilson noted that people are more accurate in their self-reports when the relevant causes are salient—clear, noticeable, and consciously available. In other words, when we do know why we did something, we’re generally pretty good at explaining it. But when we don’t, we often think we know anyway.
The same dynamic appears in large language models like Claude. In some cases, Claude generates an explanation that reflects its actual internal reasoning, such as when it successfully computes something like the square root of 0.64. But in other cases—especially when it’s given a subtle hint about the correct answer—it produces a tidy, coherent chain of thought (CoT) that never actually happened. It builds the story backward from a predetermined conclusion.
So while asking AI models to “show their work” might seem like a safeguard against error, it isn’t always. Sometimes the explanation is faithful to what happened. Sometimes it’s just a story that makes sense. And that’s something humans and machines seem to share.
We Don’t Just Explain—We Predict
In a previous post, I questioned whether intuition is truly unconscious. I noted that while we often treat intuition as some kind of unconscious magic, it’s more accurate to see it as fast, experience-based recognition. What feels like a sudden “knowing” is often our brain picking up on patterns we’ve seen before—often below the level of conscious awareness.
And here’s the kicker: We’re usually unaware of how we recognized the pattern. We might be aware that something feels familiar or that a particular option just seems right. But we can’t typically reconstruct the process that got us there. As Herbert Simon (1992) once put it, “We are aware of the fact of recognition… we are not aware of the processes that accomplish recognition” (emphasis in original).
That makes intuitive decisions fertile ground for post-hoc explanation. The reasoning often happens before we realize a decision has even been made—so by the time we’re asked to explain it, we’re left with the same challenge Claude faces: Reconstruct a plausible story. And if we’re confident and articulate enough, we may even come to believe it ourselves.
In that light, Claude’s fabricated reasoning isn’t just a technical limitation. It reflects something deeper about cognition—whether artificial or biological. When the real process is hidden or inaccessible, we don’t stop and say, “I don’t know.” We fill in the blanks. We predict. We narrate.
And sometimes, we make it up.
Fluency as a False Signal
Whether it comes from a human or an AI, a good explanation has one thing going for it above all else: fluency. When a response is coherent, well-structured, and easy to follow, we tend to trust it. We assume that clarity reflects competence, that confidence reflects accuracy, and that logical flow reflects actual reasoning.
But those assumptions can be misleading.
In psychology, this tendency is known as the fluency heuristic—the idea that things that are easier to process are perceived as more accurate, truthful, or trustworthy. If something sounds right, we’re more likely to believe it—even if it’s wrong. That’s part of what can make both human and AI-generated bullsh*t so convincing: The explanation may be false, but if it’s presented fluently enough, we’re more likely to accept it.
Claude’s explanations are a prime example. When it gives a textbook-perfect breakdown of how it solved 36 + 59, it sounds like it understands the logic—even though we now know it doesn’t. The same goes for motivated reasoning: If Claude reverse-engineers an answer to match our hint and explains it with a plausible CoT, we may not question it. The output is fluent, but it’s not faithful.
And the same thing happens when people explain their decisions. If someone speaks confidently, walks you through a reasonable sequence of events, and hits all the right rhetorical notes, we tend to believe them. We take fluency as a proxy for insight. But unless we know how the decision was actually made, we’re just trusting the story—not the process.
That’s the problem with fluency: It’s a signal we’re wired to trust, even when it has nothing to do with truth.
When the Explanation Isn’t the Process
When we ask someone—or something—to explain how a decision was made, we assume the explanation reflects the reasoning. But as we’ve seen, that isn’t always the case.
Claude’s fabricated CoTs reveal more than a technical quirk—they expose a broader truth about cognition: We often don’t know how we arrive at a conclusion. And because we don’t realize that we don’t know, we construct explanations that feel right—even when they aren’t. Sometimes those explanations are grounded in real reasoning. Other times, they’re just fluent fictions.
That’s not a reason to dismiss explanation altogether. But it’s a reason to treat explanations—whether from humans or AI—as outputs, not guarantees. A fluent story might reflect the real process, or it might just be a convincing reconstruction.
The danger lies in our inability to tell the difference.
As AI becomes more embedded in how we reason and decide, that risk grows. We’ll need to ask not just whether an explanation sounds good, but whether it’s likely to be faithful to the process that produced the answer.
Because sometimes, “How did you get that?” is a much harder question than it seems.
Footnote
1. These are both drawn from a more comprehensive research report entitled On the Biology of a Large Language Model.