Spot the Chatbot: Study Provides Reasons for Hope, Fear

A New York Times article revealed experts were unable to spot ChatGPT essays.

Key points

  • A recent New York Times article revealed experts were unable to spot essays written by ChatGPT.
  • Educators worry that AI-assisted writing might lead to widespread plagiarism and the decline of writing skills.
  • Students' patterns of error in writing are seldom consistent, unlike chatbots' output, which suggests that attention to fine detail is useful in assessing writing.

ChatGPT, a chatbot powered by artificial intelligence (AI), is less than two months old but is already terrorizing teachers who now envision ninth graders and college freshmen alike submitting ChatGPT’s essays in lieu of their own.

ChatGPT can churn out an essay in perhaps 20 seconds, far faster than finding and copying an essay online or cribbing from SparkNotes. For the cribbers and plagiarizers, the antidote has long been Turnitin.com, software that draws from a database of millions of texts and assignments, which handily tags each bit of borrowed prose with its origins, right down to the college or location of the high school where the content was submitted. Technology giveth, and technology taketh away.

Now, however, teachers fret over the prospect of AI becoming undetectable, since ChatGPT can generate essays de novo, without copying strings from online works, fears a recent feature in the New York Times seemed to confirm. But those fears, at least for now, are a bit premature, even if AI like ChatGPT requires teachers to pay more attention to their students' writing.

Four journalists created an experiment that assessed whether a panel of experts could tell the difference between essays written by fourth- and eighth-grade students and those written by ChatGPT. Predictably, the experts failed, a bit like the hapless recruits once asked to distinguish between a psychotherapist and ChatGPT’s great-grandmother, Joseph Weizenbaum’s Eliza, the first chatbot that promised to pass the Turing test by behaving the way a human would. However, Eliza merely drew off scripts and used pattern-matching to interact with users, famously mimicking a Rogerian psychotherapist to fold users’ responses into questions. In contrast, ChatGPT can write music, converse, write essays, and as a recent post in Psychology Today mentioned, even write essays in the style of anyone who has published any text online.

Yet ChatGPT's essays should not have flummoxed the New York Times' panel, which consisted of a fourth-grade teacher, a writing tutor, a Stanford professor of education, and YA novelist Judy Blume. The Times helpfully turned its feature on ChatGPT into a test that invites readers to spot the ChatGPT-generated essay. Despite the panel's fumbled guesses, ChatGPT's efforts are fairly easy to detect, even when its instructions included making a few typos. First, ChatGPT's responses are peculiarly generic but consistent in their handling of sentence structure, grammar, usage, and punctuation. For example, in one of ChatGPT's essays, written as a fourth grader, every noun in one paragraph appears alongside an adjective:

I like to bring a yummy sandwich and a cold juice box for lunch, and sometimes I'll even pack a tasty piece of fruit or a bag of crunchy chips.

In contrast, the actual fourth grader's essay spares the adjectives entirely:

First I eat my sandwich then I open my drink, then eat my fruit and last but not least my treat… We usually play four square or play on the play ground. If we are not on the playground or on the four square ground we are on the field playing tag, kickball, or soccer.

No paired adjectives, just stark nouns for lunch, followed by the kind of inconsistent handling of playground, which is first two words, then one. ChatGPT is a paragon of following rules; kids aren’t.

Moreover, ChatGPT, like all things digital, is an engine of consistency. While the NYT journalists instructed ChatGPT to include several typos in the eighth-grade essays, ChatGPT is poor at emulating the irregular patterns of error that students make, as anyone using the chatbot will discover. Ask it to make grammatical and punctuation errors in an essay (something few students would request, even to make their essays seem authentic), and ChatGPT responds with comma splices in every sentence and omitted commas before coordinating conjunctions. But students are seldom so consistent in their punctuation. In fact, they frequently catch and correct some errors while failing to see others, consistent with their incomplete mastery of the rules of punctuation, as well as their cursory-to-nonexistent proofreading after they finish writing.
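That uniformity can even be made measurable. As a purely illustrative sketch (not a method from the Times experiment, and with sample texts invented here for demonstration), one rough heuristic is to compute how much the commas-per-word rate varies from sentence to sentence: relentlessly rule-following prose shows almost no spread, while a student's erratic punctuation does.

```python
import re
from statistics import pstdev

def comma_rate_spread(text):
    """Split text into rough sentences and return the population
    standard deviation of each sentence's commas-per-word rate.
    A spread near zero means suspiciously uniform punctuation."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.split()]
    rates = [s.count(",") / len(s.split()) for s in sentences]
    return pstdev(rates) if len(rates) > 1 else 0.0

# Invented example of bot-like prose: one comma per clause, every sentence.
botlike = ("I packed a tasty sandwich, and I ate it. "
           "I opened my cold juice, and I drank it. "
           "I grabbed my ripe fruit, and I finished it.")

# Invented example of kid-like prose: commas pile up in one sentence, vanish in others.
kidlike = ("First I eat my sandwich then I open my drink. "
           "We usually play four square, or tag, or kickball, or soccer. "
           "Then we go inside.")

print(comma_rate_spread(botlike) < comma_rate_spread(kidlike))  # → True
```

A single statistic like this is far too crude to grade real essays by, but it captures the article's point: the tell is not whether errors or commas appear, but whether they appear with machine-like regularity.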

Back to ChatGPT's tells and the bewildered panel's reactions to the essays, which offer striking insights into the broad, impressionistic ways teachers respond to students' writing, the very habits that left the panelists unable to distinguish the chatbot from the flesh-and-blood kid. The Stanford professor of education believes a chatbot would never use extended dialogue in an essay but then fails to spot a particularly glaring series of tells in the same paragraph:

The man chuckled. “I understand your confusion, Madam President, but the fact remains that you are now the President of the United States. You were chosen by the previous president to be his successor in the event that something were to happen to him.”

Here, again, the consistency of the correct punctuation suggests a bot, not an eighth grader, as does the subjunctive mood in "in the event that something were to happen to him," which even savvy ninth or 10th graders would, at best, associate with if, rather than with the much vaguer subjunctive cue in the event that.

Still terrified that your students will start turning to ChatGPT for their essays? Try the GPT-2 Output Detector, based on GPT-2, an AI precursor to ChatGPT, which rapidly and accurately estimates the odds that an essay was produced by a bot rather than a kid, especially when the sample runs to more than 50 words.

Technology giveth, and technology taketh away.

More from Yellowlees Douglas Ph.D.