The Not-So-Hidden Bias of AI

Why artificial intelligence is biased, and what we can do about it.

Posted Jan 07, 2020

Photo by Markus Spiske on Unsplash

Is AI prejudiced? Is deep learning biased? The short answer is yes: without safeguards, machines simply replicate what humans do, and humans themselves are biased.

At the Learning Agency Lab, we’ve increasingly been working on issues of artificial intelligence and natural language processing in education, and we’ve seen something missing in the debate over bias within AI-based tools.

Too often, people forget that without clear checks, AI will develop forms of bias. Indeed, the “natural” tendency of machine learning is to develop biases.

The reason for this issue is simple. AI models try to “imitate” humans, and humans are biased. We are a deeply social species, one that easily defines “in” and “out” groups based on made-up social distinctions. In other words, we learn biases easily and quickly. 

At the same time, almost all AI technologies rely on large sets of training data that have been created by people. People often have biases, both conscious and unconscious, and these biases easily slip into algorithms.

In some cases, the process of building an algorithm might amplify the bias present in human scorers. In other cases, the process might attenuate the bias. But without some checks, the patterns will pass from humans to machines. 

These biases can have a dramatic impact. Some of Amazon’s recognition tools, for example, cannot reliably identify darker skin tones. Or consider that Google Photos has auto-tagged Black people as gorillas.

Sometimes the bias is subtle. Many natural-language-processing programs—the engines behind ed-tech tools for writing—tend to be trained on “standard” or “professional” English and do not account for valid dialects, such as African-American Vernacular English.

In education and the learning sciences, bias can have a particularly large effect for a number of reasons, often due to the complexity of schooling and the importance of context in learning engineering.

Consider one example: the testing company ETS estimates that one of its AI scoring tools produced a gap of more than two points (on a seven-point scale) between the group the algorithm was most biased in favor of (test-takers from China) and the group it was most biased against (African American test-takers), when compared to expert rater evaluations.

Follow-up research suggested a reason why the ETS program was so biased: The algorithm over-valued textual complexity and under-valued the development of ideas. For instance, it was fooled by “shell text”—text memorized and inserted into essays, but largely devoid of content. 

Test-takers from China frequently began essays with two or three sentences that they had memorized beforehand, followed by repetitive, less elaborate sentences that failed to develop an argument fully. Expert human raters were less impressed by such tactics. Expert human raters were also more open to departures from the traditional five-paragraph essay format.

The ETS solution to this problem has been to keep humans in the loop, and others designing interactive scoring systems should take a similar approach. Keeping a human in the loop can improve a system’s performance by bridging the gap between humans and machines, as in the sketch below.
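To make this concrete, here is a minimal sketch, in Python, of what a human-in-the-loop gate for an automated essay scorer might look like: essays the model scores with low confidence, or that land near a consequential cut point, are routed to a human rater. The thresholds, the one-to-seven score range, and the function itself are illustrative assumptions, not a description of ETS’s actual system.

    # Minimal sketch of a human-in-the-loop review gate for automated essay
    # scoring. The thresholds and the 1-7 score range are illustrative
    # assumptions, not details of any real vendor's system.

    def needs_human_review(model_score: float,
                           model_confidence: float,
                           confidence_threshold: float = 0.8,
                           boundary_scores: tuple = (3.5, 4.5)) -> bool:
        """Flag an essay for a human rater instead of auto-accepting the score."""
        # Low model confidence: send it to a person.
        if model_confidence < confidence_threshold:
            return True
        # Scores near a consequential cut point deserve a second look.
        low, high = boundary_scores
        if low <= model_score <= high:
            return True
        return False

    # Example: a mid-range score with modest confidence gets routed to a human.
    print(needs_human_review(model_score=4.2, model_confidence=0.75))  # True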

Another approach is to identify (and remove) biased ratings from the AI training set. Here, subjectivity lexicons are built and used to detect biased language in ratings; the ratings flagged as biased are then removed from the training data, which improves the automated scores. A rough sketch of this filtering step appears below.
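This sketch assumes each rating comes with a free-text rater comment and uses a tiny, made-up lexicon; a real pipeline would rely on a much larger resource, such as the MPQA subjectivity lexicon, and a more careful notion of what counts as biased language.

    # Sketch of filtering potentially biased ratings out of a training set
    # with a subjectivity lexicon. The lexicon and the data format (a score
    # plus a free-text rater comment) are assumptions made for illustration.

    SUBJECTIVITY_LEXICON = {"lazy", "sloppy", "brilliant", "hopeless", "typical"}

    def is_biased(rater_comment: str) -> bool:
        """Return True if the rater's comment contains loaded, subjective language."""
        words = {w.strip(".,!?").lower() for w in rater_comment.split()}
        return bool(words & SUBJECTIVITY_LEXICON)

    ratings = [
        {"essay_id": 1, "score": 5, "comment": "Clear thesis and solid evidence."},
        {"essay_id": 2, "score": 2, "comment": "Typical sloppy work from this group."},
    ]

    # Keep only the ratings whose comments pass the lexicon check.
    clean_ratings = [r for r in ratings if not is_biased(r["comment"])]
    print(len(clean_ratings))  # 1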

Another central approach is to mitigate bias at the start of building new models. One way to do this is to include a domain-specific data set in addition to the training data. This additional data set can function as background knowledge for the machine. Essentially, it provides the machine with data that we humans may consider “common sense.” A simple sketch of this idea follows.
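The sketch below assumes a basic scikit-learn text classifier and hypothetical file names; it simply pools the general training data with the domain-specific “background” set before fitting the model. In practice, the extra data might instead be weighted or used to pre-train embeddings, but pooling keeps the idea visible.

    # Sketch of augmenting training data with a domain-specific "background
    # knowledge" set before fitting a model. The file names and the
    # scikit-learn pipeline are illustrative assumptions.

    import json

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def load_examples(path):
        """Each line is assumed to be a JSON object: {"text": ..., "label": ...}."""
        with open(path) as f:
            return [json.loads(line) for line in f]

    general = load_examples("general_essays.jsonl")            # broad training data
    domain = load_examples("classroom_dialect_samples.jsonl")  # domain-specific set

    combined = general + domain
    texts = [example["text"] for example in combined]
    labels = [example["label"] for example in combined]

    # Pool both sources and fit a simple text classifier on the combined data.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)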

Still, there are many open questions for future research to address. Are there better ways of measuring bias? Are there particular approaches that can minimize bias? Can bias be eliminated altogether?

Open data sets and transparent development methods are a few ways we can begin to address these questions. To create systems that people trust, it is crucial to put processes in place that mitigate discriminatory consequences.