Schedules of Reinforcement

Just to recap what you’d have learnt in puppy classes:

Reinforcing a behaviour makes that behaviour more likely to recur.

When we ‘positively reinforce’ a dog for a behaviour, we’re adding something that particular dog values as a consequence of the dog performing that behaviour. In puppy class, the ‘reinforcer’ (reward) is normally food, as most dogs like food, and food is quick and easy to deliver, meaning you can practise more times within a given time period. We normally couple the food with praise so that eventually the food and the praise can become interchangeable.

Why Reinforce?

Are we bribing the dogs or rewarding them? We’re rewarding them, not bribing them: rewards are earned for good behaviour, whereas bribes are offered to avoid or stop bad behaviour (which we don’t do anyway; you’ll have learnt other methods of avoiding unwanted behaviour during the puppy course).

Dogs are really good at understanding consequences, so by rewarding their ‘good’ behaviour they’ll quickly learn what you’re asking of them.

When your boss asks you to do something, do you do it because you respect him? Possibly, but more likely you’ll do it because you’re being paid! The harder the job, the more you’d expect to be paid; conversely, the more you enjoy your job, the more willing you’d be to do it for free.

We don’t always want the dog to be reliant on us having food in order to do what we ask, so we need to phase out the continuous rewards and make them more random. We do this using ‘reinforcement schedules’, also known as ‘reinforcement patterns’. There are three types of schedule.

Continuous Reinforcement Schedule

A continuous reinforcement pattern reinforces/rewards the dog every time he does what you wanted.

In human terms, this is like you putting money into a vending machine and getting a bar of chocolate out; your reward for putting money into the vending machine is to get the bar of chocolate that you wanted.

This is a good reinforcement pattern to start off with.

Fixed Reinforcement Schedule

A fixed reinforcement pattern reinforces/rewards the dog after he does what you wanted a set number of times; for example, the dog gets a treat after every third sit.

In human terms, this is like a quarterly bonus;  you get a £500 reward every 3 months, providing you hit your target.

This is not a particularly effective reinforcement schedule, as what tends to happen is that productivity increases just before the bonus is due, then lapses until just before the next bonus is due.

Variable Reinforcement Schedule

A variable reinforcement pattern reinforces/rewards the dog after he does what you wanted, but randomly, so the dog doesn’t know whether he’s going to get the treat or not.

In human terms, this is like putting money into a slot machine; you put your money in, even though you don’t particularly expect to get anything out, because when you do win, it’s really exciting!

This is the best reinforcement schedule, as the behaviour it builds is addictive and hard to break!
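For anyone who likes to see things written out as rules, the three schedules above can be sketched in a few lines of Python. This is only an illustration: the function names, the nine-sit session and the one-in-three chance of a treat are assumptions made for the example, not part of the class material.

```python
import random


def continuous(sit_number):
    """Continuous schedule: every correct sit earns a treat."""
    return True


def fixed(sit_number, every=3):
    """Fixed schedule: a treat after every `every`-th sit (every third sit here)."""
    return sit_number % every == 0


def variable(sit_number, chance=1 / 3, rng=random):
    """Variable schedule: a treat at random, so the dog cannot predict it.

    `chance` is an assumed value chosen purely for illustration.
    """
    return rng.random() < chance


def rewarded_sits(schedule, total_sits=9):
    """Return the sit numbers (1, 2, 3, ...) that earn a treat under a schedule."""
    return [n for n in range(1, total_sits + 1) if schedule(n)]


if __name__ == "__main__":
    print("continuous:", rewarded_sits(continuous))
    print("fixed:     ", rewarded_sits(fixed))
    print("variable:  ", rewarded_sits(variable))  # differs from run to run
```

Running it a few times shows the difference: the continuous and fixed lists never change, while the variable list is different each time, which is exactly why the dog keeps trying.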

Where to start?

If the best reinforcement pattern is the variable reinforcement schedule, why not just start with that?

It’s easiest to start with the continuous reinforcement pattern; it’s consistent and easier for the dog to learn that ‘good’ behaviour has ‘good’ consequences.

In human terms, if you use a vending machine in work every day, and every day you get your bar of chocolate, there’s an element of trust that by putting your money in the machine, you’ll get your bar of chocolate out;  you have a good ‘reinforcement history’ with the vending machine at work.

How to change?

Let’s take ‘sit’ as an example. Firstly, you’ll need a really good ‘reinforcement history’ with sit, meaning that you’ve consistently rewarded the sit for a good length of time. The longer the reinforcement history, the less likely your dog is to think that you’re ‘broken’ if he doesn’t get a reward.

In human terms, if the vending machine at work, which you’ve been using for the past year, doesn’t dispense your bar of chocolate when you put your money in, you’re much less likely to consider the machine to be broken than if the same thing happened at a vending machine at the train station, one that you’ve used only a couple of times in the past year. You’ve got a good ‘reinforcement history’ with the machine at work; it’s never failed you before, so you’re more likely to put more money in, assuming at first that maybe the coin you put in was faulty. You have far less history with the vending machine at the train station, so you’ll decide that it’s broken, and stop putting money in it, much more quickly than you would with the machine at work.

So, the shorter the ‘reinforcement history’, the easier and quicker it is to ‘break’ the dog’s behaviour; the better the ‘reinforcement history’, the more likely your dog is to try again, even when not rewarded.

However!  Let’s consider the vending machine at work:

You put your money in and get nothing out. This is unusual, so it must be a faulty coin. You put another coin in and get nothing out. Although you have trusted this machine in the past, you’re going to realise quite quickly that it no longer gives you chocolate, so not only do you stop putting money in and lose faith in the machine, but you’re probably going to get pretty frustrated at having put so much money and effort in, with nothing in return.

It’s therefore important to change from a continuous reinforcement pattern to a variable reinforcement pattern gradually, to avoid frustration, and to make sure that the random reward is something that the dog really values, to get him addicted to sitting when asked.

Don’t forget that praise is free, and should be used on a continuous reinforcement basis.
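As a rough illustration of what ‘gradually’ might look like, here is a small Python sketch of a training plan in which praise is given every time while the chance of a food reward tapers off session by session. The ten sessions, the straight-line taper and the final 30% chance of food are all assumed numbers picked for the example; go at whatever pace suits your own dog.

```python
def food_reward_chance(session, total_sessions=10, floor=0.3):
    """Chance of a food reward in a given session.

    Starts at 1.0 (continuous reinforcement) in session 1 and falls in equal
    steps to `floor` (variable reinforcement) by the last session.
    All of the default numbers are illustrative assumptions.
    """
    if total_sessions < 2:
        return 1.0
    session = min(max(session, 1), total_sessions)  # keep within the plan
    step = (1.0 - floor) / (total_sessions - 1)
    return max(floor, 1.0 - step * (session - 1))


def session_plan(total_sessions=10, floor=0.3):
    """Build a plan: praise every time, food less and less often."""
    return [
        {
            "session": s,
            "praise": True,  # praise is free, so it stays continuous
            "food_chance": round(food_reward_chance(s, total_sessions, floor), 2),
        }
        for s in range(1, total_sessions + 1)
    ]


if __name__ == "__main__":
    for row in session_plan():
        print(row)
```

The point of the taper is that the dog never meets a sudden run of unrewarded sits, which is what causes the ‘broken vending machine’ frustration described above.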