
AI Embodies Our Education Standards, For Better or Worse

AI aims for a passing grade. It is up to educators to decide what that means.

Educators everywhere are freaked out, confused, or intrigued (sometimes all three at once) by the invasion of AI into classrooms. It is still too early to know how this will pan out. But at some level, we educators bear responsibility for its ultimate form. That is because we are the ones who define what is “smart” and what is “correct.”

Every educator assesses their students in one way or another. It might be a letter or number grade on an exam, a threshold specification for acceptable homework completion, a written evaluation based on classroom performance, or myriad other instruments. Whatever it is, our assessment is ultimately compared to a standard of knowledge. Often this means defining what constitutes three things: unacceptable work, acceptable work, and exceptional work. This is important for AI makers, since they, like educators, are in the business of building knowledgeable entities.

Obviously, AI makers do not want products that produce unacceptable work (though there is plenty of that). And exceptional work is hard to achieve; it is best left to the experts: computer models trained to do just one thing, such as chess bots (more on them below). “Expert systems” of this kind operate on different principles than the LLM chatbots and the like that typify modern AI.

What modern AI strives for is acceptable work. Indeed, why would educators have defined a level of acceptability in students’ work unless that standard had some bearing on personal growth, moral or ethical rightness, or economic utility? The goal of AI is to meet this standard: nothing more, nothing less. In doing so, it will undoubtedly upend every industry to a greater or lesser extent, since every industry employs people who have been educated according to some standards.

Educators are the ones who get to decide what will be acceptable for AI. This will be determined via millions of small decisions we make every day about what is acceptable work and knowledge.

The US isn't going to be the only country exercising this power. Educators (and often their government overseers) in other countries and cultures define their own standards, which will shape the success or failure of their AI. In China, for example, AI standards of knowledge may very well be higher than in the US, simply because the bar for acceptable schoolwork is often higher there, especially in STEM fields. However, educational standards in China also dictate that certain topics, such as the Tiananmen Square Massacre of 1989, are off-limits. As The Guardian has reported, DeepSeek complies with this content standard, while Western chatbots provide accurate descriptions of these kinds of events. The point is that many dimensions come into play in defining what constitutes acceptable knowledge.

Maybe this has been obvious to other people, especially non-educators, but it is just dawning on me. I suppose the good news is that every educator, myself included, has a kind of superpower I had not recognized before. If you want to know how AI will eventually be utilized in a given culture, look to its educators and educational standards.

***

As I have written, YouTubers have been having great fun recently showing how poorly chatbots play chess. I pointed out that this is ironic, since the modern wave of computer intelligence was launched when a computer program of the “expert systems” variety, IBM’s Deep Blue, defeated the human world chess champion, Garry Kasparov, in 1997.

International Master Levy Rozman (aka GothamChess) recently had a go with the DeepSeek model. He found that DeepSeek, like ChatGPT and Bard, frequently makes illegal moves and overall plays poorly. Rozman wins his game against the chatbot, though it can hardly be called a game at all, given DeepSeek’s deep confusion about the rules.

Rozman then sets DeepSeek against ChatGPT. In this case, both systems play fairly standard openings, which can be found on any of the thousands of websites and in the many books about chess. The game proceeds rationally for a dozen moves or so.

But then the game descends into chaos, with both systems resurrecting captured pieces and often losing track of where pieces are on the board. ChatGPT comes out marginally better, but both models ultimately go haywire.

Perhaps this is more evidence for my point about educators. The US and China are middling powers in the number of chess grandmasters they produce per capita, and neither country incorporates chess widely into formal education. In contrast, as of 2016, Russia required 33 hours of chess study for all first graders. If Russian scientists ever build a globally competitive LLM, I predict it will be better at chess than US or Chinese models.

Special thanks (again!) to Wesley for inspiring this post!

Copyright © 2025 Daniel Graham. All rights are reserved, including those for text and data mining, AI training, and similar technologies. For reprint requests, email reprints@internetinyourhead.com.
