The Achilles' Heel of AI Computer Vision

The binding problem in AI and neuroscience.

Posted Jan 16, 2019


Would you ride in an autonomous vehicle if you knew that it was subject to visual problems? How about undergoing cancer treatment based on a computer interpretation of radiological images such as an x-ray, ultrasound, CT, PET, or MRI scan, knowing that computer vision could easily be fooled? Computer vision has a problem: it only takes slight changes in input data to fool machine learning algorithms into “seeing” things wrongly.

Recent advances in computer vision are largely due to improved pattern-recognition capabilities through deep learning, a type of machine learning. Machine learning is a subset of artificial intelligence in which a computer learns concepts from input data without explicit programming, either through supervised learning, where the training data is labeled; unsupervised learning, where it is not; or a combination of the two. The “depth” of deep learning refers to the number of artificial neural processing layers in its neural network.
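That notion of depth can be sketched in a few lines of NumPy. The toy network below is an invented illustration (the layer sizes and random weights are arbitrary, not from any real model): each weight matrix is one processing layer, and the network's depth is simply how many such layers the input passes through.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # A simple nonlinearity applied between layers.
    return np.maximum(0.0, x)

# A toy "deep" network: each weight matrix is one processing layer.
layers = [rng.standard_normal((4, 8)),   # input -> hidden layer 1
          rng.standard_normal((8, 8)),   # hidden layer 1 -> hidden layer 2
          rng.standard_normal((8, 3))]   # hidden layer 2 -> output

def forward(x, layers):
    for w in layers[:-1]:
        x = relu(x @ w)          # nonlinear transform at each hidden layer
    return x @ layers[-1]        # final linear read-out

x = rng.standard_normal(4)       # a single input with 4 features
scores = forward(x, layers)
print(len(layers), scores.shape) # depth of 3 layers, 3 output scores
```

Stacking more weight matrices into `layers` makes the network "deeper" without changing anything else about the code.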

A team of artificial intelligence (AI) researchers including Kevin Eykholt, Ivan Evtimov, and colleagues from the University of California, Berkeley, the University of Michigan, Stony Brook University, and the University of Washington discovered that slight changes to a stop sign using black and white stickers are enough to cause state-of-the-art deep neural networks (DNNs) to misclassify it. The team published their findings in April 2018 on arXiv.
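The idea behind such attacks can be sketched on a toy model. The snippet below is a hypothetical illustration, not the researchers' stop-sign system: it uses a hand-picked linear classifier and a small perturbation in the direction of the gradient's sign (the style of the well-known "fast gradient sign" attack) to flip the predicted label while barely changing the input.

```python
import numpy as np

# Hypothetical toy classifier: the score w·x decides the class.
# Weights and input are invented for illustration only.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 0.1, 0.2])          # original input; its score is positive

def predict(x):
    return int(w @ x > 0)              # 1 = "stop sign", 0 = "other"

# Nudge each feature slightly in the direction that lowers the score
# (the sign of the score's gradient with respect to x).
eps = 0.15
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))      # original vs. adversarial label
```

Even though no feature moves by more than 0.15, the predicted class changes, which is the same qualitative effect the stickers have on a real image classifier.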

One of the current drawbacks of deep learning is the large amount of data required to train the computer. In sharp contrast, once a child learns what a bird is, she or he can easily identify an animal as a bird without having to learn all of the different species of avians in existence.

Various regions of the brain process different types of input. For example, the parietal lobe is the area of the brain where sensory input for touch, temperature, and pain is processed. The occipital lobe interprets vision. The temporal lobe plays a role in hearing. Given that different regions of the brain process different sensory inputs, how does the brain form a unified experience? This is the binding problem.

For example, when a jet airplane high up in the sky passes overhead, the brain knows that the swooping sound corresponds to it. The brain recognizes that the wings, tail, fuselage, and white contrail (condensation trail) belong to the jet, and not to the surrounding sky, sun, or background clouds. Somehow, the human brain is able to take in various sensory data such as sight, sound, taste, smell, and touch, and compose a cohesive experience. Yet exactly how the brain does this remains a mystery to scientists.

British mathematician and neuroscience professor Simon Stringer of the Oxford Foundation for Theoretical Neuroscience and Artificial Intelligence is currently searching for neurons in the brain that act as “binding neurons” and has ambitions to bestow “rat-like intelligence on a machine within 20 years.”

For now, the workaround for AI researchers is to aim for good average performance when it comes to correctly interpreting visual images.

“The eye sees only what the mind is prepared to comprehend.” – Robertson Davies

Copyright © 2019 Cami Rosso All rights reserved.


National Geographic. “Brain.” Retrieved 1-16-2019.

Eykholt, Kevin; Evtimov, Ivan; Fernandes, Earlence; Li, Bo; Rahmati, Amir; Xiao, Chaowei; Prakash, Atul; Kohno, Tadayoshi; Song, Dawn. “Robust Physical-World Attacks on Deep Learning Visual Classification.” arXiv:1707.08945v5. 10 April 2018.

Geddes, Linda. “The ‘weird events’ that make machines hallucinate.” BBC. 5 December 2018.
