DopaMinecraft - how we learn from our errors to get what we want

About once a month I find myself down the YouTube rabbit hole of ‘motivation’ videos. They are normally long pieces stitched together from inspiring music and audio excerpts from Rocky and other sports films of the last 40 years. And as I walk around my room in my pants, brushing my teeth, bleary-eyed at 2am from staying up watching Rick and Morty repeats, I think ‘Yeah, I can do this thing’. Of course it doesn’t matter that I don’t know what ‘this’ is or whether I want it or not, but I feel that energy coursing through my body. Motivation. Yes, dear reader, today we will be talking about motivation, how Reward Prediction Error - one of the pathways for motivation and learning - works, and how we can start to think about applying this to our agile teams.

To start with, what does it feel like to be you right now? What sensations are you experiencing in your body right now? I'd encourage you to close your eyes and spend a couple of breaths on that question. 

Now look up and look around the room for something you can see that you want. How does it feel to look up and go and get something? How does it feel different from sitting still with the here and now? 

The reason I'm indulging in this hippy shit is because it illustrates the differences between what scientists Daniel Lieberman and Michael Long refer to as the Here and Now system, and the More system. This broadly maps to the different physical structures in the brain dealing with things within reach - the 'peripersonal space', and those outside of your immediate reach - the 'extrapersonal space'. 

The Here and Now systems involve the molecules that make us feel things in the moment: serotonin, oxytocin, endorphins. In the paraphrased words of Andrew Huberman, when you are stroking a dog, you don't want to stroke a thousand dogs (citation needed - lots of my friends have argued with me on this point). You want to stroke this dog because it feels good and you enjoy that moment.

On the other hand, we have dopamine, which Lieberman and Long refer to as the Molecule of More. It's the molecule that nature evolved to make it feel good to do things that look like they will further our survival. Dopamine is the driver and the medium by which our body says ‘I need something that I don’t already have’.

A good illustration of this is with rats. Scientists who enjoy doing elaborate things to rats and mice have shown that if you either block dopamine in a rat, or run tests on mice that naturally don't produce much dopamine, the animals lose their get up and go to search for food, even to the point where they die of malnutrition. Dopamine is the driver to get us out of our peripersonal space and out into the big bad world of extrapersonal space where our survival lies.

Dopamine feels good, but it feels the kind of insatiable good that is different from the Here and Now molecules. If Here and Nows are generated by stroking a dog, dopamine is eating just one pretzel, or just one square of chocolate - things that feel impossible. In these cases a large part of the enjoyment comes from the salty/sweet/chocolatey reward creating a dopamine spike of anticipation and reward. Interestingly, the dopamine level then falls back down below baseline, so it takes more chocolate for the second serving to feel as good as the first.

It's potentially dangerous stuff; addiction is the hijacking of these dopamine pathways where people will be motivated to go to extraordinary lengths - and at extraordinary costs - to meet their dopamine needs. 

Dopamine gets a bad rap nowadays, with constant access to dopaminergic phones, games, and foods generating some unhealthy behaviours. However, it’s also the molecule that gets us up to the fridge so we don't starve to death; it gets us to the gym, to the first date (and hopefully the second). It gets us out to vote for social change and home to bed before a big exam. The question for us is how we can harness the properties and the patterns of dopamine for our benefit.

One pattern I want to talk about in this post is Reward Prediction Error. What is it that keeps us checking our phones when we know we don't have a notification? Why does the gambler keep playing at the slot machine even though they know that most people lose money and have lost much of theirs? The answer lies partly in Reward Prediction Error. I'm going to call it RPE for convenience. 

There's a curious phenomenon where certain triggers only cause a dopamine response when the outcome is unexpectedly good or bad. The outcome is what scientists call a reward - where reward can mean anything that would make you want to do a behaviour again. In our day to day lives, we make predictions about what will produce rewards. When I go to the fridge, I expect to be rewarded with the amount of food that was in the fridge 5 minutes ago when I last checked. That’s my prediction. If I discover an unexpected tupperware with chicken soup and a sticker that says ‘Daniel please eat this’, I’ll have made a Reward Prediction Error, and in all likelihood it will feel pretty great. Dopamine will spike, and I’m more likely to keep checking the fridge. If I went to the fridge and found that all the food had been cleared out with a note saying ‘soz, was hungry’, I’d be less likely to check as frequently.

What happens to dopamine when our predictions are either right or wrong? 

I could explain it myself, but I found a good summary in a review about Neural Circuitry of RPE, so I shall borrow it:

When monkeys receive unexpected reward, dopamine neurons fire a burst of action potentials. If the monkeys learn to expect reward, that same reward no longer triggers a dopamine response. Finally, if an expected reward is omitted, dopamine neurons pause their firing at the exact moment reward is expected (Hollerman et al., 1998). Together, these results suggest that dopamine neurons signal the difference between the reward an animal expects to receive and the reward it actually receives. When reward is greater than expected, dopamine neurons fire; when reward is the same as expected, there is no response; when reward is less than expected, activity is suppressed. [bold my own]

This is the essence of RPE. When we make errors in our predictions about whether a good or bad thing will happen, dopamine kicks in to make us feel good or bad. This helps us learn to do more of the unexpectedly good thing and less of the unexpectedly bad thing. Hopefully this intuitively makes sense - these systems evolved to help us discover more about the world, and survive better. If doing a thing produces something unexpectedly good, nature has helped it feel good so we do more of it (or vice versa). If walking along an unfamiliar path leads you unexpectedly to a bush full of berries, that feels good, and it will make you want to go berry gathering there again.
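For the code-minded, here is a minimal sketch of that idea. It assumes a very simple learner that nudges its prediction towards whatever actually happened, which is a common textbook simplification rather than anything taken from the review quoted above. The function names, the reward values of 1.0 and 0.0, and the learning rate of 0.5 are all my own illustrative choices.

```python
# A toy model of Reward Prediction Error (RPE).
# Assumption: the learner moves its prediction a fixed fraction of the way
# towards the actual reward on every trial. All names and numbers here are
# hypothetical and chosen only for illustration.


def prediction_error(expected: float, actual: float) -> float:
    """Positive when the reward beats the prediction, negative when it falls short."""
    return actual - expected


def update_expectation(expected: float, actual: float, learning_rate: float = 0.5) -> float:
    """Nudge the prediction towards what actually happened."""
    return expected + learning_rate * prediction_error(expected, actual)


def run_trials(rewards, expected: float = 0.0, learning_rate: float = 0.5):
    """Return the prediction error observed on each trial."""
    errors = []
    for reward in rewards:
        errors.append(prediction_error(expected, reward))
        expected = update_expectation(expected, reward, learning_rate)
    return errors


if __name__ == "__main__":
    # Ten fridge visits with soup (reward 1.0), then one visit to an empty fridge.
    errors = run_trials([1.0] * 10 + [0.0])
    for trial, error in enumerate(errors, start=1):
        print(f"trial {trial:2d}: prediction error {error:+.3f}")
```

Run it and the error starts at +1.0 for the first surprise soup, halves with every repeat visit as the soup becomes expected, and then swings to roughly -1.0 on the day the fridge is empty. That mirrors the three cases in the quote: better than expected, same as expected, and worse than expected.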

So this goes some way to explaining the compulsive checking of the phone when you’re expecting news; maybe there will be a message! And the news will be good! You are less likely to get the same feeling or compulsion when you know that you have a WhatsApp notification or the email about the new job, because there’s no prediction to be wrong about. It doesn't feel good in the same way as when you get one unexpectedly. When you pull the slot machine handle, it wouldn't be as motivating if you knew for a fact that every tenth pull would lead to a payout. The uncertainty makes for the excitement and therefore the motivation to perform a certain behaviour.

To be clear, this isn't necessarily about addiction. Addiction is one type of dopamine pattern, and while all addiction probably involves dopamine in one form or another, RPE is just one dopaminergic pathway; there are others.

So what's the agile angle on this? 

RPE is a mechanism for steering us towards favourable outcomes. If the uncertain thing is good, it feels good and we do more of it. If it is worse than expected it feels bad and we avoid it. Sounds a lot like feedback loops to me. 

So one way we could leverage our knowledge of RPE in agile teams is by focussing on increasing the salience of the reward (or lack of reward) in different agile team settings.

Here's a couple of ways I can imagine this playing out and what it might mean for an agilist trying to bring these ideas to the team. Some of these practices are basics, but hopefully the RPE angle reinforces them. I’m keen to hear what else you think I could include! 

Stakeholder feedback in the sprint review

We can find ways to visualise and concretely emphasise stakeholders' response to the finished increment. If the increment is good and the team is heading in the right direction, maybe a confidence vote or a thumbs up/down could help the team feel the dopamine response of being praised.

Cues for continuous integration 

The modern DevOps practices of Continuous Integration and Delivery involve encouraging developers to commit small pieces of code frequently. A suite of automated tests runs with each commit, and if the code passes the quality checks the developer gets feedback that their code works. This leads to simpler code which can be released and fixed with more confidence.

In this case the notification from the automated tests or the little green tick is the reward, and there is uncertainty about whether it will come or not. Perhaps we should be conscientious about making little pleasing ping noises, or an automated delivery of a boiled sweet, every time the code is successfully committed, to encourage those small simple iterative builds.

OKRs with lagging indicators  

OKRs (Objectives and Key Results) are a practice in which teams set business objectives which reflect their strategy, and then the team agrees on the Key Results that will measure progress against that objective. The idea is that each team member has good visibility about how their more local objectives feed into wider strategy and freedom to do whatever they need to do to reach those objectives. 

Key results can be leading indicators (where you measure your input or your efforts) or lagging indicators (where you measure an outcome).

For example:

Objective: Host a fun party for your friends

Leading Indicator Key Result: send out 100 invitations

Lagging Indicator Key Result: number of people that message me afterwards to tell me it was a fun party

A lagging indicator is often the hard one to influence. It's more uncertain, and more indicative of whether you're making progress against a certain objective. Looking at lagging indicators through an RPE lens, we could say that we make a prediction about whether the number is going in the right direction, and get a dopamine boost when it does or a dopamine suppression when the number goes the wrong way.

The question for us as agilists/professionals then is how to help the teams take that RPE effect upon viewing their favourable or unfavourable Key Results and turn that into learnings. Maybe a serving suggestion could be to look at the Key Results measurements at every Sprint Planning and Sprint Review. That way the team can get the feedback on what their work is achieving, in a timely way for changing the plan for the next sprint.

(if you want more of this kind of thing, sign up to my Substack where I'll be publishing in the future)

That's all for now! I'm curious which other examples come up for you reading this; comment on this post here or on LinkedIn with what you think of in your agile world. I'd love these ideas to be steps in a wider conversation and I definitely don't have all the answers. Until next time, stay dopaminergic, friends.


