The Double-Edged Sword of Artificial Intelligence
Why the most sophisticated AI models are also the most convincing bullsh*tters.
Posted February 10, 2025 | Reviewed by Michelle Quirk
Key points
- More sophisticated LLMs are more accurate and fluent, but they are also better bullsh*tters.
- This can lead to significant risks in high-stakes contexts.
- The challenge lies in managing the risks while still reaping the benefits.
Each new iteration of a large language model (LLM) feels like a step forward—better at understanding nuanced questions, more capable of providing detailed answers, and increasingly adept at sounding, well, human. These advancements are celebrated as breakthroughs in artificial intelligence (AI), and for good reason.
But we also have to remember that LLMs themselves are just tools trained by humans, regardless of how sophisticated they get. They cannot evaluate the truth of the responses they produce. As I’ve argued in the past, their responses are nothing but bullsh*t, which is information that is communicated with little regard for its accuracy. And a recent study by Zhou et al. (2024) suggests that as LLMs get more sophisticated, they may also get better at giving us plausible-sounding incorrect answers. In other words, as these systems become more educated, they also become better bullsh*tters.
This duality—impressive sophistication paired with an enhanced capacity for generating convincing falsehoods—raises important questions. How do these models become so adept at bullsh*tting, and what does that mean for the way we use and trust AI? To explore this, we need to look at the mechanisms behind their design, the risks they pose, and the role human expertise plays in consuming and acting on their outputs.
The Mechanisms Behind LLMs as Sophisticated Bullsh*tters
At their core, LLMs, like ChatGPT, are pattern-matching machines. They predict the next word in a sequence based on probabilities derived from their training data. These models are trained on vast data sets of text, capturing language patterns that make their responses coherent and contextually appropriate. However, their design prioritizes plausibility over accuracy.
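To make this concrete, here is a minimal sketch of next-word prediction, assuming the open-source Hugging Face transformers library and the small GPT-2 model (a much simpler relative of the models behind ChatGPT). The model assigns a probability to every possible next token and simply reports the most likely continuations, whether or not they happen to be true.

```python
# Minimal sketch of next-word prediction, assuming the Hugging Face
# "transformers" library and the small GPT-2 model (not ChatGPT itself).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of Australia is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # a score for every vocabulary token

# Turn the scores for the final position into probabilities and list
# the five most likely continuations.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")
```

Whichever word tops that list wins on statistical likelihood alone; nothing in the process checks the answer against reality.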
Reinforcement learning from human feedback (RLHF), a common method used to refine these models, exacerbates this tendency. Earlier, less sophisticated versions of these models were more prone to produce outputs that were either (1) nonsensical (incoherent and clearly wrong) or (2) avoidant (failing to address the user’s query meaningfully).1 These obvious errors made it easier to identify when the model was off track.
With RLHF, human testers reward responses that feel natural and convincing, but this doesn’t always ensure factual accuracy. You see, both the clearly wrong and the avoidant answers get flagged, teaching the LLM to minimize these types of responses. Over time, the model learns to prioritize reasonable-sounding answers—even when they deviate from the truth. The more fluent and articulate a model becomes, the easier it is for users to mistake its output for truth, especially when the falsehoods sound reasonable.
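For readers who want to see the mechanics, here is a toy PyTorch sketch of the preference step at the heart of RLHF. Real systems train a full reward model on many human comparisons and then optimize the LLM against it (often with an algorithm such as PPO); the embeddings and labels below are invented purely for illustration. The key point is that the training objective only encodes which answer the rater preferred, not which answer was true.

```python
import torch
import torch.nn.functional as F

# Toy reward model: a single linear layer scoring a (made-up) response embedding.
reward_model = torch.nn.Linear(8, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=0.01)

# Hypothetical embeddings for two answers to the same question.
# The human rater preferred the fluent (but subtly wrong) answer over the
# awkward refusal; the label records what sounded better, not what was true.
preferred = torch.randn(1, 8)  # confident, plausible-sounding answer
rejected = torch.randn(1, 8)   # "As a language model, I cannot..."

for _ in range(200):
    # Standard pairwise (Bradley-Terry) preference loss: push the reward
    # of the preferred answer above the reward of the rejected one.
    loss = -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The LLM is later tuned to maximize this learned reward, so it inherits
# whatever the raters actually rewarded: answers that sound convincing.
```

If raters consistently prefer fluent, confident answers over awkward refusals, the learned reward, and the model tuned to maximize it, inherits exactly that preference.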
The Risks of Plausible Falsehoods
This enhanced ability to generate plausible but incorrect responses poses real risks. Because the language appears authoritative, users may accept the information at face value. These mistakes might lead to minor issues in many or even most settings but can have more significant consequences in others. For example, a doctor could misdiagnose a patient,2 a lawyer could build a case on incorrect legal interpretations (Dahl et al., 2024), or a business executive could approve a new product line based on inaccurate market analysis3—all because the outputs produced by the AI were plausible-sounding but inaccurate. In such cases, the stakes are too high to place blind trust in AI outputs.
It’s tempting to anthropomorphize AI, attributing human-like intentions to its behavior. When discussing the Zhou et al. article, Krywko described the results as indicating AI is more likely to “lie.”4 But this framing is misleading. To lie requires intent—an understanding of truth and a decision to deceive. LLMs lack both. These models don’t understand the words they generate; they merely predict patterns based on training data. When they produce falsehoods, it’s not because they intend to mislead—it’s because they don’t know what’s true.
Anthropomorphizing AI not only distorts public understanding but also obscures the real challenges of designing and using these systems effectively. Instead of framing these errors as lies, it’s more accurate to view them as the byproduct of RLHF. LLMs are designed to sound human, not to be infallible repositories of truth. This distinction matters because it shifts the focus from moral judgments about AI to practical strategies for managing its limitations.
Yet, it’s also essential to recognize the flip side. The same sophistication that enables bullsh*tting also allows these models to provide remarkable benefits. They can draft essays, translate languages, and assist with complex problem-solving tasks, all within seconds. For example, researchers have used LLMs to accelerate scientific discovery by summarizing vast bodies of literature and generating new hypotheses (Pearson, 2024), and at least one firm is using AI to identify potential new patent opportunities.5
The challenge lies in balancing these capabilities with safeguards that reduce the likelihood of acting on misinformation. Safeguards could include educating users about the limitations of AI and ensuring human oversight in critical decisions (something I’ve written about previously). In high-stakes contexts, these safeguards aren’t optional—they’re essential. One potential safeguard involves improving the transparency of AI confidence in its outputs.6 Instead of presenting all responses with equal authority, LLMs could indicate how certain they are about their predictions, helping users assess when additional scrutiny is needed.
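As a rough illustration of what such transparency could look like, the sketch below (again assuming the Hugging Face transformers library and GPT-2) attaches a crude confidence label to a generated answer, based on the average probability the model assigned to its own tokens. Token probabilities are an imperfect and often poorly calibrated proxy, so this shows the interface idea rather than a ready-made solution; the thresholds are arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The first person to walk on the Moon was"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate an answer and keep the per-step token scores.
out = model.generate(
    **inputs,
    max_new_tokens=10,
    do_sample=False,
    output_scores=True,
    return_dict_in_generate=True,
    pad_token_id=tokenizer.eos_token_id,
)

new_tokens = out.sequences[0, inputs.input_ids.shape[1]:]
# Probability the model assigned to each token it actually produced.
step_probs = [
    torch.softmax(score[0], dim=-1)[tok].item()
    for score, tok in zip(out.scores, new_tokens)
]
confidence = sum(step_probs) / len(step_probs)

answer = tokenizer.decode(new_tokens, skip_special_tokens=True)
# Arbitrary, illustrative thresholds for the user-facing label.
label = "high" if confidence > 0.8 else "medium" if confidence > 0.5 else "low"
print(f"{answer.strip()}  [model confidence: {label} ({confidence:.2f})]")
```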
Conclusion
Each iteration of an LLM brings us closer to realizing the promise of AI—but it also brings new challenges. These models aren’t liars, but they are increasingly skilled bullsh*tters, generating plausible-sounding falsehoods with unsettling ease. Understanding this trade-off is key to using AI responsibly.
The solution lies not in rejecting these tools but in embracing a partnership between AI and human expertise. With careful oversight and a commitment to accuracy, we can better ensure that as AI becomes more sophisticated, it remains a tool that enhances, rather than undermines, our understanding of the world.
References
1. An example discussed by Zhou et al. would be providing an LLM with a series of numbers to add and then receiving a response that amounted to either a refusal (e.g., “As a large language model, I am not programmed to perform mathematical operations involving such large numbers”) or one that does not conform to what was requested (e.g., “a lot of power”).
2. Rebecca Sohn. Biased AI can make doctors' diagnoses less accurate. LiveScience. December 21, 2023. Note that recent evidence also suggests the opposite: AI can sometimes do a better job, but doctors need to know how to use it well. This adds further evidence that AI has a role, but the human-AI interface still needs work.
3. This is a more hypothetical example, included to show that the risk extends beyond medicine and law.
4. Jacek Krywko. The more sophisticated AI models get, the more likely they are to lie. Ars Technica. October 4, 2024.
5. Lucas Laursen. This AI-Powered Invention Machine Automates Eureka Moments. IEEE Spectrum. October 8, 2024.
6. Rupa Chaturvedi. Design human-centered AI interfaces. Reforge. More transparency in confidence is one of several recommendations made by the author for making AI more usable.