
Verified by Psychology Today


Can Humans Detect Text by AI Chatbot GPT?

Scientists evaluate how well people can spot GPT AI-generated text.

Key points

  • Researchers have warned about the risks of potential fraud and spread of misinformation due to the use of AI-generated text.
  • The popularity of ChatGPT in business and academic contexts makes it important to understand how humans engage with AI text.
  • According to a recent study, humans are significantly better than random chance at detecting AI-generated text.

The rise of powerful conversational artificial intelligence (AI) chatbots, such as OpenAI's Generative Pre-trained Transformer (GPT), amplifies the need to distinguish real from fake text. A new peer-reviewed study evaluates how well humans can detect text generated by OpenAI's GPT models in more realistic scenarios than prior studies.

In February 2023, researchers at the University of Pennsylvania School of Engineering and Applied Science presented their study at the 37th Association for the Advancement of Artificial Intelligence conference.

“Neural language models (LMs) are capable of generating increasingly natural-sounding text,” wrote lead author Chris Callison-Burch, Associate Professor in the Department of Computer and Information Science (CIS), along with the team consisting of Liam Dugan, Daphne Ippolito, Arun Kirubarajan, and Sherry Shi. “One growing worry is that bad actors may attempt to pass off automatically generated text as genuine.”

The researchers point out the risks of potential fraud and spread of misinformation with AI-generated false news articles and fraudulent reviews of products and services.

“These harms will inevitably become more and more prevalent as language models become better and cheaper to deploy,” the researchers wrote.

The use of AI-generated text by large language models (LLMs) is on the rise among educators, students, and professionals.

A new study published by the Walton Family Foundation reveals that most teachers and many students are already using ChatGPT. The March 2023 survey of K-12 teachers and students aged 12 to 17 found that 51 percent of teachers reported using ChatGPT and 33 percent of students had already used it for school.

Professionals are starting to use ChatGPT for work. A January 2023 survey of 4,500 professionals conducted by Fishbowl, a social network for professionals that was acquired by Glassdoor in 2021, shows that 27 percent already use ChatGPT to assist with work-related tasks, with the highest adoption rates in the marketing, advertising, and technology sectors. Respondents included professionals working at Google, Twitter, Amazon, Meta, IBM, Edelman, McKinsey, JP Morgan, Nike, and thousands of other companies.

“As text generated by large language models proliferates, it becomes vital to understand how humans engage with such text, and whether or not they are able to detect when the text they are reading did not originate with a human writer,” wrote the researchers.

A new study tests human detection of AI text

To answer this question, the team took a unique approach. Rather than testing whether humans can detect that an entire passage was AI-generated, the researchers adopted a more nuanced one. The more than 240 study participants were senior undergraduates or graduate students taking an AI course at the University of Pennsylvania.

“In this paper, we study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models,” the scientists reported.

To detect this transition point, the researchers measured human ability on a boundary-detection task rather than a classification task. This let them evaluate the performance of various generative systems while also quantifying each model's risks.

The scientists collected human annotations using RoFT (Real or Fake Text), a tool created by Dugan, Ippolito, Kirubarajan, and Callison-Burch for measuring human detection of AI-generated text, presented at the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. RoFT is set up as a game: players are shown one sentence at a time, earn points for guessing close to the point where the text stops being human-written and becomes machine-generated, and then select a reason for their decision.
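The "points for guessing close" mechanic can be illustrated with a short sketch. This is a hypothetical scoring function, not the exact scheme RoFT uses: it assumes a maximum of 5 points for pinpointing the boundary sentence, fewer points the further a guess lands past it, and nothing for flagging a sentence before the true boundary (those sentences are still human-written).

```python
def roft_score(guess_idx: int, boundary_idx: int, max_points: int = 5) -> int:
    """Score a RoFT-style boundary guess (hypothetical point scheme).

    guess_idx    -- index of the sentence the player flagged as generated
    boundary_idx -- index of the first machine-generated sentence
    """
    if guess_idx < boundary_idx:
        # The flagged sentence is still human-written: no credit.
        return 0
    # Full credit for an exact hit, decaying with distance past the boundary.
    return max(0, max_points - (guess_idx - boundary_idx))
```

For example, `roft_score(3, 3)` returns 5 (exact hit), while `roft_score(5, 3)` returns 3 (two sentences late). The asymmetry reflects the task: once generation starts, every later sentence is also generated, so a late guess is partially right, but an early guess is simply wrong.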

RoFT draws its human-written text from 1.8 million articles published by The New York Times from 1987 to 2007, 963 speeches given by U.S. presidents from 1789 to 2015, short stories from Reddit Writing Prompts, and recipes from the Recipe1M+ dataset; the machine-generated continuations come from the language models GPT2, GPT2-XL, and CTRL.

“In addition to producing valuable data for analyzing detectability, our study serves as the first large-scale attempt at using a gamified platform to analyze the detectability of generated text,” the researchers wrote.

In more than 7,800 game rounds, the scientists collected over 42,000 annotations, which were filtered to produce a final dataset of more than 21,000 annotations over 7,000 continuations.

“We found that players were significantly better than random chance at the boundary detection task, correctly selecting the boundary sentence 23.4 percent of the time (chance being 10 percent),” the researchers reported. “For rounds with at least one generated sentence, players selected a generated sentence as the boundary sentence 72.3 percent of the time.”

The researchers found wide variance in player skill, and that accuracy improved over time when players received additional instruction and extra credit proportional to their game score. Reading the help guide, which contained tips and examples, was the most predictive feature of an annotator's accuracy.

“We show that, while annotators often struggle at this task, there is substantial variance in annotator skill and that given proper incentives, annotators can improve at this task over time,” concluded the scientists.

Copyright © 2023 Cami Rosso All rights reserved.
