Everyone knows that giving a dog a reward for responding in the correct way during training changes his behavior. For example, when we are lure training a dog to sit, we move a treat over the dog's head and toward his back while we give the command "Sit." In order to keep his eyes on the treat, the dog rocks back into the sitting position. Once the dog is in the correct position, we give him that treat. After a few repetitions of this action, we find that the dog now responds to the "Sit" command by sitting.
Dog trainers take it for granted that giving the dog rewards has changed his behavior, but behavioral scientists still want to know the mechanism for why and how this works. A new study headed by Molly Byrne at Boston College suggests that there is a very simple bit of behavioral programming, most likely genetic, which accounts for the effectiveness of training rewards.
Let's take a step back and see what is really involved in dog training. Dogs, like most living things (including people), are behavior emitters. That's just a technical way of saying that they do things, lots of different things. The trick involved in training a dog is to get him to emit the specific behavior that we desire, such as sitting on command, and to avoid emitting other unwanted or unneeded behaviors, such as lying down, spinning in circles, jumping up, and so forth. But of course, when you start training, the dog has no clue as to what you want. There are so many different behaviors that he can produce.
The same thing goes on in problem-solving. There is only one behavior that will solve the problem and all of the other behaviors are irrelevant. For example, suppose you have arrived at a garden gate. You push the gate to open it, but it doesn't work. Do you continue to push at the gate? Of course not. You try something else — let's say pulling the gate. It still doesn't work. So you don't continue pulling the gate; instead, you try yet another behavior. This time you lift the latch so that the gate can swing open.
The next time you encounter this gate, you will not push or pull it. Since you have been rewarded for a specific behavior previously, you will immediately reach for the latch to open it. You are engaging in what psychologists call a "win-stay-lose-shift" strategy. This means that if you try a behavior and it doesn't bring you the reward you desire, you don't do it again but rather try a different behavior. If you try a behavior and it does get you the reward you want, then you repeat it. If this simple cognitive strategy were genetically wired into dogs, it would guarantee that we could use rewards as a means of training them. This would certainly work in training the dog to sit: when he sits on command he gets the reward (hence the sitting behavior is repeated), while other behaviors are not rewarded and the dog doesn't repeat them.
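The rule described above is simple enough to write down in a few lines of code. What follows is only an illustrative sketch using the garden gate example, not anything taken from the study. The function names and the assumption that a "shift" means moving to the next behavior in a fixed list are invented here for the sake of the example.

```python
def win_stay_lose_shift(previous_choice, was_rewarded, options):
    """Pick the next behavior under a win-stay-lose-shift rule.

    Assumption for illustration: "shift" means moving to the next
    behavior in the list, wrapping around at the end.
    """
    if was_rewarded:
        return previous_choice  # win-stay: repeat what worked
    index = options.index(previous_choice)
    return options[(index + 1) % len(options)]  # lose-shift: try something else


def attempts_until_reward(options, rewarded_behavior, first_choice):
    """Return the behaviors tried, in order, until one is rewarded."""
    choice = first_choice
    tried = []
    while True:
        tried.append(choice)
        if choice == rewarded_behavior:
            return tried
        choice = win_stay_lose_shift(choice, False, options)


# Hypothetical list of behaviors for the garden gate example.
gate_behaviors = ["push", "pull", "lift latch"]

# First visit: start by pushing, shift after each failure.
first_visit = attempts_until_reward(gate_behaviors, "lift latch", "push")

# Second visit: the last behavior was rewarded, so it is repeated.
second_visit_start = win_stay_lose_shift(first_visit[-1], True, gate_behaviors)
```

On the first visit the sketch tries pushing, then pulling, then lifting the latch. On the second visit it goes straight to the latch, which is the pattern described for the person at the gate.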
To determine whether dogs have this win-stay-lose-shift cognitive strategy, the Boston College research team tested 323 adult dogs with an average age of about three years. The dogs were first shown that if they knocked over a plastic cup they could obtain a food reward hidden under it. Next, they were presented with two plastic cups, open side down, on a surface in front of them, one on the left side of the field and the other on the right. Now only one of the cups contained a treat. The dogs were released and allowed to choose one of the cups. If dogs have this win-stay-lose-shift strategy, then when they knock over a cup on a particular trial and find a treat under it, we would expect them to select the cup on the same side of the field the next time they are offered the same choice (win-stay). If there was no reward, they should change their behavior and select the cup on the opposite side (lose-shift). In fact, that is what they did. After a rewarded choice, approximately two-thirds of the dogs chose the same side on the next trial, while after an unrewarded choice nearly 45 percent shifted to the opposite side.
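To make the two measures concrete, here is a rough sketch of how win-stay and lose-shift rates could be tallied from a sequence of cup choices. It is not the research team's analysis. The function name, the data layout, and the trial sequence are all made up for illustration and do not come from the study.

```python
def wsls_rates(trials):
    """Tally win-stay and lose-shift rates for one sequence of trials.

    `trials` is a list of (side, rewarded) pairs in the order they occurred.
    Each trial is compared with the one that follows it.
    Returns (win_stay_rate, lose_shift_rate); a rate is None when there
    were no trials of that kind to count.
    """
    wins = stays = losses = shifts = 0
    for (side, rewarded), (next_side, _) in zip(trials, trials[1:]):
        if rewarded:
            wins += 1
            stays += next_side == side   # same side after a reward
        else:
            losses += 1
            shifts += next_side != side  # other side after no reward
    win_stay = stays / wins if wins else None
    lose_shift = shifts / losses if losses else None
    return win_stay, lose_shift


# Invented trial sequence for a single hypothetical dog (not study data).
example_trials = [
    ("left", True),
    ("left", False),
    ("right", True),
    ("right", True),
    ("left", False),
    ("left", True),
]

win_stay_rate, lose_shift_rate = wsls_rates(example_trials)
```

In this invented sequence the dog stays on the same side after two of its three rewarded choices and shifts after one of its two unrewarded choices.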
Now the question remains whether this win-stay-lose-shift behavior is a strategy that adult dogs have learned to be useful over their lifetime, or whether it is part of their genetic wiring. To answer this, the research team conducted an identical set of tests using 334 puppies who were between 8 and 10 weeks of age. The results were nearly identical. When the cup that a puppy selected had a treat under it, approximately two-thirds chose the cup on the same side on the next trial. In contrast, if there had been no reward for the prior choice, nearly half of the puppies shifted to the other side on the next trial. Because this behavioral strategy appears so early in the life of a dog, a sensible guess is that it is a genetically coded canine behavioral predisposition.
So it seems that the mystery of how rewards serve as an effective means of training dogs may be solved: a very simple strategy appears to be wired into canines. It says, "If something you have done has given you a reward, repeat it. If not, try something else." It is a remarkably simple bit of behavioral programming, but it works, and it allows humans to successfully use rewards to train our dogs.
Copyright SC Psychological Enterprises Ltd. May not be reprinted or reposted without permission.
Molly Byrne, Emily E. Bray, Evan L. MacLean, Angie M. Johnston (2020). Evidence for Win-Stay-Lose-Shift in Puppies and Adult Dogs. http://scholar.google.ca/scholar_url?url=https://cognitivesciencesociet…