DeepMind on the Brain’s Dopamine System and AI

Study shows AI distributional reinforcement learning also applies to the brain.

Posted Feb 19, 2020


Artificial intelligence (AI) researchers strive to advance machine intelligence by applying theories and concepts of human intelligence for learning, motivation, memory, reasoning, and more. Some concepts in AI, notably neural networks, are loosely inspired by the brain. Less expected is the reverse: using machine learning concepts to help explain how the biological brain works. In a recent twist, AI researchers applied distributional reinforcement learning to neuroscience in order to better understand the interplay of dopamine, pessimism, and optimism in the brain.

In January 2020, research scientists at DeepMind published peer-reviewed research in Nature that applied machine learning concepts to neuroscientific experiments. The findings reveal similarities between the biological brain’s dopamine system and AI distributional reinforcement learning algorithms. These discoveries may validate recent advances in AI machine learning and inform neuroscience research on motivation and mental health.

The research team of Will Dabney, Zeb Kurth-Nelson, Naoshige Uchida, Clara Kwon Starkweather, Demis Hassabis, Remi Munos, and Matthew Botvinick set out to test the hypothesis that the brain’s dopamine-based reward prediction is characterized by a probability distribution, rather than a single mean, representing multiple future outcomes simultaneously and in parallel.

Dopamine is a neurotransmitter and hormone that impacts pleasure, learning, locomotion, cognition, emotion, working memory, motivation, and pain processing, among other functions. A common denominator among schizophrenia, amphetamine addiction, and Parkinson’s disease is the brain’s dopamine system.

Reinforcement learning is a concept that applies across many disciplines, such as psychology, economics, behavioral research, education, game theory, information theory, operations research, swarm intelligence, and genetic algorithms. Examples of reinforcement learning algorithms include Monte Carlo methods, Q-learning, SARSA, and Deep Q-Network (DQN), among many others.

American psychologist, behaviorist, and American Psychological Association Lifetime Achievement Award recipient B.F. Skinner put forth the concept of operant conditioning in the 1930s, in which behavior is shaped by the consequences of reinforcement or punishment, and changes in behavior result from responses to events in the environment.

AI reinforcement learning is a type of machine learning in which an agent is trained by interacting with its environment via a system of reward and punishment. The agent seeks to maximize reward and minimize penalty. Deep reinforcement learning combines deep neural networks with reinforcement learning architecture.

The reward prediction error (RPE) theory of dopamine explains how the brain represents reward and value. The temporal difference (TD) learning algorithm was developed to predict reward. It works by comparing its prediction of reward at one moment with its prediction at the next moment. When new information arrives, the difference between the two predictions (the TD error) is used to adjust the old prediction toward the new one, progressively bringing predictions in line with actual outcomes.
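As a rough illustration, classic TD learning can be sketched in a few lines of Python. The toy task, state names, and learning rate below are invented for illustration and are not from the study:

```python
import random

# Tabular TD(0) value learning on a toy one-step task (illustrative sketch).
# A cue is followed by a reward of 1.0 with 50% probability, so the learned
# prediction for the cue should settle near the mean reward, 0.5.

alpha = 0.05  # learning rate: how far to move toward new information
gamma = 1.0   # no discounting over this short horizon
V = {"cue": 0.0, "end": 0.0}

random.seed(0)
for trial in range(5000):
    reward = 1.0 if random.random() < 0.5 else 0.0
    # TD error: (reward + predicted value of next state) minus old prediction
    td_error = reward + gamma * V["end"] - V["cue"]
    # Nudge the old prediction toward the new one
    V["cue"] += alpha * td_error

print(round(V["cue"], 2))  # hovers near the mean reward, 0.5
```

With a single learned value per state, all the algorithm can represent is the average of future rewards, which is exactly the limitation the distributional view addresses.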

Researchers P. Read Montague, Peter Dayan, and Terrence J. Sejnowski published their findings in The Journal of Neuroscience in 1996, describing “how activity in the cerebral cortex can make predictions about future receipt of reward and how fluctuations in the activity levels of neurons in diffuse dopamine systems above and below baseline levels would represent errors in these predictions that are delivered to cortical and subcortical targets.” This suggested that the brain uses a temporal difference learning algorithm, a concept that has since been widely accepted in the neuroscience community.

In computer science, distributional reinforcement learning algorithms have improved reinforcement learning in neural networks. Unlike the temporal difference learning (TD) algorithm, distributional reinforcement learning algorithms use a range of predictions that captures the full probability distribution over future rewards.

Mathematically, it seems intuitive that capturing the full probability distribution would provide richer learning than using a single quantity: the average over all potential reward outcomes, weighted by their respective probabilities. The DeepMind researchers put this to the test.

“We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel,” wrote the DeepMind researchers. “This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.”

The DeepMind researchers trained five mice on a variable-probability task and six different mice on a variable-magnitude task. In the variable-probability task, the mice were presented with one of four different odors, followed by a pause, then either a reward (3.75 μl of water) or a penalty (an air puff). The chance of reward varied by odor: 90% for odor 1, 50% for odor 2, and 10% for odor 3, while odor 4 signaled a 90% chance of an air puff. The odor assignments were randomized. In the variable-magnitude task, 90% of trials delivered a randomly chosen reward magnitude (0.1, 0.3, 1.2, 2.5, 5, 10, or 20 μl of water), and the other 10% of trials presented an odor cue indicating no reward. Half of the rewarded trials were preceded by an odor cue signaling that a reward was coming, but not its magnitude; the other half had no odor cue.
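The variable-magnitude design can be sketched as a simple trial generator. This is an illustrative reconstruction of the probabilities described above, not the lab’s actual code, and the field names are invented:

```python
import random

# Illustrative generator for the variable-magnitude task as described:
# 90% of trials deliver a randomly chosen water reward, 10% are odor-cued
# no-reward trials; half of the rewarded trials carry a predictive odor cue
# that signals a reward is coming but not its size.
MAGNITUDES = [0.1, 0.3, 1.2, 2.5, 5.0, 10.0, 20.0]  # microliters of water

def generate_trial(rng):
    if rng.random() < 0.10:
        return {"cue": "no-reward odor", "reward_ul": 0.0}
    reward = rng.choice(MAGNITUDES)
    cued = rng.random() < 0.5  # cue reveals that reward comes, not how much
    return {"cue": "reward odor" if cued else None, "reward_ul": reward}

rng = random.Random(0)
trials = [generate_trial(rng) for _ in range(10000)]
rewarded = [t for t in trials if t["reward_ul"] > 0]
print(len(rewarded) / len(trials))  # close to 0.9
```

The seven-magnitude reward distribution is the key design choice: it gives dopamine neurons a genuinely multi-valued set of possible outcomes, so a single mean prediction and a full distribution make different testable predictions.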

The team used optogenetics to identify the neurons they recorded from. Specifically, to tag dopamine neurons during recording, VTA neurons in transgenic mice were labeled with channelrhodopsin-2 (ChR2).

The predictions of the distributional TD model closely matched the responses of the brain’s dopamine cells to the seven different reward magnitudes. Different dopamine cells amplified positive and negative reward prediction errors to different degrees, calibrating individual neurons to different levels of optimism or pessimism, and together they operated as a whole in a manner similar to distributional reinforcement learning.
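A minimal sketch of this mechanism, assuming an expectile-style update in which each simulated cell scales positive and negative prediction errors differently (the learning rate, asymmetry values, and equiprobable reward draw are illustrative simplifications, not the paper’s implementation):

```python
import random

# Distributional TD sketch with asymmetric learning rates. Each "cell" has an
# asymmetry tau: positive prediction errors are scaled by tau and negative
# ones by (1 - tau), so high-tau cells learn optimistic value predictions and
# low-tau cells learn pessimistic ones.

random.seed(1)
rewards = [0.1, 0.3, 1.2, 2.5, 5.0, 10.0, 20.0]  # magnitudes from the task
taus = [0.1, 0.3, 0.5, 0.7, 0.9]  # one asymmetry per simulated cell
values = [5.0 for _ in taus]      # each cell's learned reward prediction
alpha = 0.02

for trial in range(20000):
    r = random.choice(rewards)    # equiprobable draws, as a simplification
    for i, tau in enumerate(taus):
        delta = r - values[i]     # reward prediction error for this cell
        scale = tau if delta > 0 else (1.0 - tau)  # asymmetric amplification
        values[i] += alpha * scale * delta

print([round(v, 1) for v in values])  # ascending: pessimistic -> optimistic
```

The population of learned values spans the reward distribution (the tau = 0.5 cell settles near the mean), which is how a set of differently calibrated predictors can jointly encode a distribution rather than a single average.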

The DeepMind researchers wrote that their findings “provide strong evidence for a neural realization of distributional reinforcement learning,” and that this may open the path for future neuroscience research, as the distributional hypothesis of dopamine may have implications for the mechanisms of mental disorders such as addiction and depression. And that is how the interdisciplinary fields of mathematics, behavioral psychology, optogenetics, statistics, and AI machine learning in combination are contributing to discoveries in neuroscience.

Copyright Cami Rosso 2020 All Rights Reserved.