How Do We Learn From Our Mistakes?
Insights from the theory of reward prediction error.
Posted Jan 06, 2020
Suppose you’re at a vending machine and you insert a dollar bill to get a candy bar. But instead of the one candy bar you expect, you receive two. You think: “This is a pleasant surprise, better than I expected.” As a result, your dopamine (a pleasure chemical in the brain) response goes up. If you receive one candy bar, as expected, there is no change in dopamine activity. On the other hand, if you get no candy bar at all, dopamine neurons show depressed activity.
The basic idea in this example is that neurons release dopamine in proportion to the difference between the expected and realized rewards of a particular event. Unpredictable rewards cause more dopamine release than predictable ones. And more dopamine means more pleasure.
Part of the appeal of live sporting events is their inherent unpredictability. People keep coming back as if addicted to the joy of experiencing unexpected rewards. In fact, gambling is designed to produce surprising rewards. The gambler is buying the prospect of a positive surprise. The anticipation of an uncertain reward, as well as the reward itself, causes intense excitement. The habituated pursuit of that excitement can lead to addiction.
The pleasant surprise is a positive reward prediction error. Schultz (2016) explains that a prediction error exists when the reward received differs from the reward predicted. The error is the difference between what actually occurs and what was expected: positive when the outcome is better than expected, negative when it is worse. We desire positive prediction errors and hate negative prediction errors.
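As a minimal sketch of this definition, the vending machine example can be written as simple arithmetic. The function name `prediction_error` and the use of candy bars as reward units are illustrative assumptions, not part of Schultz's formulation.

```python
def prediction_error(received, expected):
    """Return the reward prediction error: received minus expected reward."""
    return received - expected


# Hypothetical reward units: number of candy bars from the vending machine.
expected = 1
errors = {received: prediction_error(received, expected) for received in (2, 1, 0)}

for received, error in errors.items():
    if error > 0:
        label = "positive error (pleasant surprise)"
    elif error < 0:
        label = "negative error (disappointment)"
    else:
        label = "no error (as expected)"
    print(f"received {received}, expected {expected}: {error:+d} -> {label}")
```

Two candy bars give a positive error, one gives no error, and none gives a negative error, matching the three dopamine responses described at the start of the article.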
The purpose of the dopamine surge is to make the brain pay attention to new and potentially important stimuli. When the stimulus ceases to be novel, we become accustomed to it. If you are told that you will get a prize for sure and are then given that prize, there is no surprise and no dopamine release.
The sensitivity to unexpected outcomes plays a key role in our ability to learn new things every day. We learn whenever something unexpected happens but not when things are predictable. Highly predictable environments, by contrast, can lead to reduced attention and lowered arousal (sleepiness).
For example, a better-than-predicted meal in a restaurant teaches us that the meal differs from our prediction, and that we had better adjust our expectation of good food in that restaurant. In a classroom setting, students are more persuaded by a surprising explanation that goes against their expectations. Nothing focuses the mind like a surprise.
Many loud noises (e.g., car sirens, a slamming door, or even someone yelling on the street) can startle veterans who associate sudden loud noises with immediate danger. Eventually, however, they learn that a loud noise signals something harmless. Thus the prediction error functions to update expectations about future events.
When the learned rule is violated, the dopamine neuron responds. The reward system gets the message that old rules don’t apply anymore and it may be time to learn a new association.
In short, how much we learn in life depends on how big the difference is between what we expected and what actually happened. Although mistakes are usually poorly regarded, they nevertheless help us to get a task right in the end and obtain a reward. If no further error occurs, the behavior will not change until the next error.
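One common way to formalize this idea is an error-driven update in the style of the Rescorla-Wagner learning rule. The article does not name that rule, so it is used here only as an illustrative assumption. In the sketch below, the function name `update_expectation` and the `learning_rate` value are hypothetical. The expectation moves toward the outcome by a fraction of the prediction error, so a bigger error produces a bigger change.

```python
def update_expectation(expected, received, learning_rate=0.5):
    """Move the expectation toward the outcome by a fraction of the error.

    Returns the new expectation and the prediction error that drove the update.
    """
    error = received - expected
    return expected + learning_rate * error, error


# Assumed scenario: we expect one candy bar, but the machine keeps giving two.
expected = 1.0
history = []
for trial in range(5):
    expected, error = update_expectation(expected, received=2.0)
    history.append((error, expected))
    print(f"trial {trial + 1}: error {error:.4f}, new expectation {expected:.4f}")
```

With the assumed learning rate of 0.5, the error halves on every trial. As the surprise becomes predictable, the expectation approaches two candy bars and the updates shrink toward zero, which is the sense in which behavior stops changing when no further error occurs.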
This mechanism also has unwanted side effects. We become familiar with the things around us, and reaching a goal can make us unhappy. New activities are exciting at first but then become boring. Additional material goods and services initially provide extra pleasure, but it is usually temporary. The extra pleasure wears off.
Habituation is similar to drug tolerance. It drives us to always want more rewards. Nothing is ever as good as that first time. As humans, we get used to things. The trick is to keep habituation in check so that you can continue to savor the pleasure of the activities you really enjoy. To gain happiness is to learn how to desire the things we already have. As a saying often attributed to the Buddha puts it, the secret to happiness is to learn to want what you have and not want what you don’t have.
Schultz, Wolfram (2016). Dopamine reward prediction error signalling: a two-component response. Nat. Rev. Neurosci. 17, 183–195.