The Premack Principle is named after David Premack, a psychologist who did some research that turned out to be very useful to dog trainers – and anyone else trying to train in the real world, rather than controlled laboratory conditions.
Premack defined a reward as “the opportunity to engage in a preferred activity”. The preferred activity varies from time to time for one individual, as well as between individuals. Your dog may like various activities, but whether something is rewarding depends on whether, at that moment, your dog prefers it to the behaviour you want to reinforce.
According to Premack, a reward or positive reinforcer is not a “thing”. Try not to think of it as a piece of food or a ball. In practice, this means that a reward is not always a reward i.e. food is only a reward if your dog wants it more than any other available option at this moment. I always think of Ian Dunbar’s very funny send-up of a frustrated dog trainer furiously trying to cram fillet steak down the bloody dog’s throat, when the dog prefers to play rather than come when called. In that case the steak could actually function as a punishment rather than a reward.
Premack’s research included the “candy and pinball machine effect” where children could freely choose how much time to spend on each activity (playing or eating) to establish their “pre-training preferences”. Then Premack introduced a contingency (“if you do X you can have Y”) and found that giving the children limited access to their first preference, as a reward for spending some time on the other activity, could be used to reinforce the second activity.
This has led some commentators to describe the Premack Principle as “grandma’s law” i.e. “eat your greens first, then you can have dessert”. Other versions of this idea are “eat your dinner, then you can watch TV” or “do your homework first, then you can play”. However, this explanation does not capture the full significance of the Premack Principle. All it suggests is the principle of contingency, i.e. that the reward is contingent upon the desired behaviour being performed, which is simple positive reinforcement. (Look at the Bailey Basics – a reward must be contingent). There is more to Premack than simple contingency.
In my case, I can go down the street and the library is on my left, the swimming pool on my right. I could spend hours in the library, even though it would be beneficial to go across the road to the swimming pool. I like swimming but hardly ever do it. If I had to dip a toe into the water before being allowed into the library, my rate of swimming would increase, I would be choosing to do it more often. I have a friend who likes to exercise but has an aversion to libraries. He actually wants the information contained in books, but feels sick, bored and tired as soon as he enters a library or classroom. In his case, one could reward him for opening a book by letting him into the gym. I end up exercising more and liking it more, as a result of being given a library pass, while my friend ends up reading more and liking it more as a result of being given a gym pass.
In dog training, the principle leads to assessing the value of rewards relatively rather than absolutely. Write down your dog’s top three rewards, and then produce a different list for different environments – at home, at dog training class, and out in the park. You need to be aware of what difference it makes to the dog – for example, when out in the park, what difference does it make whether you use food, praise, play with a toy, release to run around or release to play with dogs as the reward for coming when called? Not only can the opportunity to engage in the preferred activity reinforce the dog for engaging in a lesser preference, but attempting to use a lower preference as a reward can have a punishing effect.
The important issue, though, is the point made earlier. You may start off with an artificial primary reward to reinforce a behaviour that you want (which the dog is offering less than you would like). If you control your dog’s access to his or her favourite activity, and use it to reinforce the other activity, the dog will offer you more of the other behaviour. However, in time, not only will the behaviour increase, but your dog’s preferences will shift. Your dog will actually want to come to you more in the park as a result of being rewarded by being allowed to go off and play. I will start to enjoy my exercise classes, rather than just regarding them as a means to an end.
In summary, Premack’s contribution was to say:
- a reward is not just a thing – a reward is the opportunity to engage in an activity – not just a piece of food, but the opportunity to eat, not just a ball, but the opportunity to play
- there are usually (in real life as distinct from a Skinner box) many activities available, all of which can be rewarding, but preferences change
- if you find out a person’s or a dog’s preferences (some would rather play with friends, some would rather play ball by themselves, some would rather have a snack) you can then use the favourite activity to reinforce (and therefore increase the amount of time the person or dog chooses to spend on) the second preference
- a less preferred activity, even though it might be rewarding in other circumstances, will not reinforce a preferred activity, in fact it will punish and reduce it
Reward-based training in everyday life requires more than just food rewards. If you are walking your dog and you want to teach your dog not to pull on the lead, all other things being equal, you can give your dog a food reward for walking nicely. However, as soon as your dog starts to drag you towards an interesting bush to sniff or pee on, the game has changed. It’s no good offering your dog food if pulling achieves the major reward of reaching the bushes. The way to turn this situation to your advantage, using the Premack Principle, is to restrict your dog’s access to the bushes, by stopping or moving away when your dog pulls, and releasing your dog to go and have a sniff as a reward for not pulling. In other words, your dog no longer has free access to go anywhere – going to the bushes (the numero uno activity) is contingent upon walking on a loose lead (a lesser preference).
W.A. trainer Sue Hogben posted on the OZObedienceTrialling list about one of her dogs that “he didn’t naturally play tug, but it was shaped using food. Other dogs I own weren’t good eaters but were very enthusiastic tuggers so they learnt that if you eat the treat you get to play tug so they learnt to love food treats.”
My comment: Classic Premack!!! Some dogs like candy, others like pinball machines. Which activity reinforces the other depends on the dog’s preferences if given a free choice.
From Wikipedia: Premack’s Principle
From Wikipedia: David Premack
Premack’s first publication (1959) was a new theory of reinforcement (which became known as Premack’s principle). It argued that the more probable response in any pair of responses could reinforce the less probable response—demonstrating that reinforcement is a relative, not an absolute property.
This theory predicts six conditions, all of which have been supported by evidence:
- Reinforcement is a relative property. Responses A, B, C have a descending rank order of probability. A will therefore reinforce both B and C. C will reinforce neither. This suggests that reinforcement is an absolute property. However, B corrects this view. B will reinforce C, but not A. B is both a reinforcer and not a reinforcer. Reinforcement is therefore a relative property.
- Reinforcement is a reversible property. When drinking is more probable than running, drinking reinforces running. When the probabilities are reversed, running reinforces drinking.
- Historically, consummatory responses, eating and drinking, have served exclusively as reinforcers, but consummatory responses are, like any other response, subject to reinforcement.
- Reinforcement and punishment, traditionally contrasted as opposites, are in fact equivalent except for sign. If response A leads contingently to B, and B is more probable than A, A will increase in frequency (reinforcement); conversely, if A leads contingently to B, and B is less probable than A, A will decrease in frequency (punishment). The major contrast is not between reward and punishment; but between reward and punishment as contrasted with freedom. Freedom is the condition in which stimuli are freely (not contingently) available to an individual.
- When motorized running is more probable than lever pressing but less probable than drinking, then running reinforces lever pressing and punishes drinking. In other words, the same response can be both a reinforcer and a punisher – at the same time and for the same individual.
- The equivalence of reinforcement and punishment is further suggested in this interesting fact: rats are either sensitive to both reinforcement and punishment, or insensitive to both; they are never sensitive to one but insensitive to the other.
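The relative rule in the list above can be put into a few lines of Python. This is only a rough sketch to make the logic concrete, not anything taken from Premack’s own work: the function names, the `Effect` labels and the free-choice numbers are all made up for illustration, and treating two equally probable activities as “no predicted change” is my own assumption, since the quoted text does not cover that case.

```python
from enum import Enum


class Effect(Enum):
    REINFORCES = "reinforces"
    PUNISHES = "punishes"
    NEUTRAL = "no predicted change"  # assumption: the source does not cover ties


def premack_effect(p_behaviour: float, p_consequence: float) -> Effect:
    """Predict what a contingent consequence does to a behaviour.

    Both arguments are free-choice probabilities, i.e. how likely the
    individual is to pick each activity when nothing is contingent.
    """
    if p_consequence > p_behaviour:
        return Effect.REINFORCES  # more probable activity follows: behaviour increases
    if p_consequence < p_behaviour:
        return Effect.PUNISHES  # less probable activity follows: behaviour decreases
    return Effect.NEUTRAL


# Hypothetical free-choice probabilities, invented for this example only.
free_choice = {"drinking": 0.6, "running": 0.3, "lever pressing": 0.1}


def effect_of(consequence: str, behaviour: str, prefs=free_choice) -> Effect:
    """Look up both activities and apply the relative rule."""
    return premack_effect(prefs[behaviour], prefs[consequence])


print("running", effect_of("running", "lever pressing").value, "lever pressing")
print("running", effect_of("running", "drinking").value, "drinking")
```

With these made-up numbers, running comes out as a reinforcer for lever pressing and a punisher for drinking, which is the “same response, both reinforcer and punisher” point in the list. Swap the numbers for drinking and running and the prediction reverses, which is the reversibility point.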
Kaye’s Note: Wikipedia says “Freedom is the condition in which stimuli are freely (not contingently) available to an individual.”
This is what I mean by the Autopilot (be a dog) Level of Control. This is why it is so important to have a special release word to release your dog to Autopilot. It is highly motivating, because your dog can choose the best reward the environment has to offer and this will reinforce a less preferred activity. So you should control that contingency and make the most of it. This means your dog pays attention to you, comes to you, settles down, sits, stops pulling (or whatever you want) BEFORE being released to go free in the park.
Let me give you an update.
Kerryn does Flyball with her Spoodle, Bailey. Bailey likes playing with a ball. Bailey races out over the hurdles to get the ball from the box. Unfortunately Bailey ends up with only moderate times because he always comes back slowly. If only he came back as fast as he went out! Kerryn rewards him when he gets back with gourmet roast chicken, so she can’t understand why he continues to be so slow.
My interpretation is this – in that environment, Bailey’s most highly preferred activity is playing with the ball. Getting the ball is contingent on going out over the hurdles. This behaviour gets reinforced, with the result that Bailey does it more enthusiastically. Returning over the hurdles has the consequence that Bailey loses the ball, and therefore the opportunity to engage in his highest preference activity. Losing something high value is “negative punishment” in terms of the quadrants of operant conditioning.
Running back with the ball is followed by being given a piece of roast chicken, which is a lower preference than playing ball. Eating roast chicken in this context is a punishment. A lower preference activity punishes a higher preference activity. As a result, coming back with the ball may not have been eliminated altogether, but Bailey is doing it less enthusiastically and more slowly.
There you have it. If you want Bailey to come back at lightning speed, STOP PUNISHING HIM!!