Yale Psychiatry Grand Rounds: "Multiple Steps to the Precipice: Risk Aversion and Worry in Sequential Decision-Making"
January 19, 2024
"Multiple Steps to the Precipice: Risk Aversion and Worry in Sequential Decision-Making"
Peter Dayan, FRS, Director, Max Planck Institute for Biological Cybernetics
Transcript
- 00:14 And just swap your screen and then we'll be done. Exactly. We have this all nicely prepared, of course. That's OK. Perfect. Super.
- 00:25 OK, well, thank you very much indeed. Sorry about that hiccup; nothing is quite as smooth as you hope.
- 00:30 Thanks so much for that really generous introduction. It's a really great pleasure and honour to be here. I've followed Phil's work over many years as well, and really learned an awful lot from it. So it's really great to be here.
- 00:41 So the work I'm going to talk about is joint with a number of people: Chris Gagne, who was a postdoc in Tübingen and now works for a company called Hume in New York; two research assistants in Tübingen, Kevin Shen and Yannick Striker; and I might also talk about some work with two of my other colleagues in Tübingen, Kevin Lloyd and Shin Sui.
- 01:03 So to introduce this, imagine the following game. You're controlling this rather crude, refrigerator-like robot here, and your job is to get to this treasure chest. There's a reward for getting to the treasure chest, worth five points to our subjects, and there's a cost for falling into these things, which Chris loves to call lava pits; this is the Iceland version, with the volcanoes.
- 01:34 When you try to move north, south, east or west, there are some blockages, shown by these brick walls. And there's also a chance of an error, of an eighth, when you try to move: if you try to go north, there's an eighth chance you'll move in one of the other directions instead. And then we have a discount factor, to try and encourage you to get to the goal quickly.
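To make these dynamics concrete, here is a minimal sketch (my illustration, not code from the talk) of the transition rule just described: the intended direction succeeds with probability 7/8, and otherwise the move is replaced by one of the other directions; the wall representation and function names are hypothetical stand-ins.

```python
import random

ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
SLIP_PROB = 1 / 8  # the "error chance of an eighth" from the talk

def step(pos, action, walls, shape):
    """One move in the maze: the intended direction with probability 7/8,
    otherwise a uniformly random other direction; blocked moves stay put."""
    if random.random() < SLIP_PROB:
        action = random.choice([a for a in ACTIONS if a != action])
    r = pos[0] + ACTIONS[action][0]
    c = pos[1] + ACTIONS[action][1]
    if (pos, (r, c)) in walls or not (0 <= r < shape[0] and 0 <= c < shape[1]):
        return pos  # a brick wall or the edge of the grid blocks the move
    return (r, c)
```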
- 01:52 So the question we pose our subjects is: which route would you take, given this? There are three obvious routes, I think. There's the route that goes down here, through all the lava pits, to get to the reward: the most direct route. There's the intermediate route, which goes around here and then close to this lava pit, but not the main bulk of lava. And then there's the long route, which goes all the way around here, past the lava pit, and gets to the goal that way.
- 02:21 So we administered this to our subjects in the lab. I promised I wouldn't tell you who they are, because it's kind of revealing about your colleagues when you do this. And you can see that our subjects divided about a third, a third, a third, maybe a few fewer: some people took this very direct route to the goal; another group took the intermediate one, and you can see where they're being deviated off the route at these random spots; and then some other subjects went all the way around.
- 02:51 And so the question for this talk is: what goes on in terms of evaluating the risk associated with these paths, and how do you make these choices?
- 03:01 In this instance, we're very interested in the case where you're making not just a single choice: by committing to this path, you have successively to adjust yourself to the many steps of risk that you face. A lot of the work that we and other people have done in reinforcement learning concerns sequential decision problems, where you don't make only one choice, you make many choices. And when those choices are infected by risk, risk can accumulate along paths in rather interesting ways. And that really is the context of my talk: to think about what the consequences of that are, and how we should think about it as a whole.
- 03:38 Some of the original thinking about risk actually came from the Bernoullis, thinking about what became known as the Saint Petersburg problem. The way you pose this is: you're tossing a fair coin, and you look at the number of heads you get before you get a tail. If you get one head before a tail, you get €2, or two monetary units; if you get two heads, you get 4; three heads, 8; and so forth. And the question is how much you would be willing to pay me to give you an instance of this game.
- 04:08 The reason why it's a problem, or a paradox, is that the expected value, the mean value of this sequence of outcomes, of playing a game like this, is actually infinite: with probability a half you get €2, with probability a quarter you get €4, with probability an eighth you get €8, and so forth. So each of these possibilities is worth €1 in expectation, and the sum just goes off to infinity. So the expected value is infinity, but most people would only be willing to pay somewhere between four and eight euros, or four and eight dollars, to play a game like this. And that's the paradox: to try and understand why.
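A quick numerical sketch (mine, not from the talk) of both halves of the paradox: each payoff level contributes exactly €1 to the expectation, so the partial sums grow without bound, yet simulated plays are almost always worth only a few euros.

```python
import random

def st_petersburg():
    """One play: double a starting stake of 2 for every head before a tail."""
    payoff = 2
    while random.random() < 0.5:  # heads with probability 1/2
        payoff *= 2
    return payoff

# Level k pays 2^k with probability 2^-k, contributing exactly 1 per level:
print(sum((0.5 ** k) * (2 ** k) for k in range(1, 31)))  # 30.0, and growing

# Yet the typical realised payoff is small, roughly what people will pay:
print(sum(st_petersburg() for _ in range(10_000)) / 10_000)
```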
- 04:52 But I think the paradox becomes sharper, or at least the task becomes sharper, when you think of it in the sequential manner in which it could originally also have been posed. So here you're tossing the first coin, and at stake is €2. If you get a tail, that's what you're going to walk away with: just €2. On the other hand, if we're lucky, we get a head (this is the world's smallest gold coin, with Einstein on it; it's a Swiss coin), and that means the stake is now €4. And again you're tossing the coin and thinking about what's going to happen, a head or a tail. If I'm lucky, I'll get a head, and the stake becomes €8, and so forth; and then you get a tail, and in this instance you'd walk away with the €8. So you can imagine that essentially more and more money is at stake as you go.
- 05:42 I'm sure many of you are familiar with the Balloon Analogue Risk Task, the BART, which has something very similar: you're pumping up a balloon, and you know that at some point one more pump is going to make it burst and you lose everything. The question is when you quit. In the Saint Petersburg problem, you have to pay before you ever start.
- 06:01 OK. So the plan for the talk is to talk a bit about risk aversion in general, how it comes up; to talk about a measure of risk which I think is particularly useful for the sort of work that we do, and which I think also applies in animal cases (I'll give you a little example of that at the end of my talk, I hope, if I have time); then to talk about tail risk in sequential problems; and then to talk about risk-averse online behaviour, thinking about our subjects making their choices in that little maze with the robot and the lava pits and so forth.
- 06:36 Then I'll say a word about risk-averse offline planning. The idea is that if you're in an environment which is replete with risk, then maybe there are things you can do ahead of time to try and mitigate it; maybe that changes the way you go about thinking about aspects of the world, doing some offline planning to prepare yourself correctly; and we can think about what that looks like in the context of risk aversion and risk sensitivity.
- 07:02 And then, as I say, if I have a chance, I'll say a word about some modelling we've done of some lovely data on how mice do apparently risk-sensitive exploration, with data from Mitsuko Watabe-Uchida's work at Harvard.
- 07:20 OK, so: decision-making and risk. As you all know, risk is a very critical aspect of decision-making, and it comes up any time we have uncertain or probabilistic outcomes. Here, in Saint Petersburg, we're spinning a coin; in other contexts we have other sorts of ways of generating these probabilities. Obviously whole industries have been designed around it, things like insurance markets; this is a little picture of the famous Lloyd's of London.
- 07:50 And I think risk likely plays a very crucial role in many aspects of psychopathology, something that has been studied by very many groups, including, obviously, groups working in Yale too. Things like anxiety and mania are obviously about what might happen; you'd see it in OCD as well, again something that Phil has actually worked on too.
- 08:12 And you also have this notion of ruminative what-ifs. In the complex world we occupy, there are many risks involving very low-probability events: there could be cars swerving on the ice (it was very icy in Tübingen this morning), so you can imagine that when you're walking on the pavement, there is a chance that something nasty can happen. If you pay a lot of attention to these very low-probability outcomes, then of course that's going to be problematical for your expectations about what might happen.
- 08:46 And when you commit to a long series of choices, then, as I said, you have to worry about how risk accumulates along these paths.
- 08:56 Risk has been beautifully studied using single-shot gambling paradigms. Here's a classic example, where you have a choice of either a sure $5 or a 50/50 chance of $10; or, I'm sorry, a 50/50 chance of $16 in this case. There are so many paradigms: obviously Kahneman and Tversky did a lot of work on that, and in Yale, Ifat has done a lot of beautiful work along these lines too.
- 09:19 But what we want to look at is sequential problems, and not only single-shot gambles. And so we'll see how that comes out.
- 09:27 So in order to make progress, we have to define what measure of risk we're going to use. There are a number of measures that have been studied in the literature. Prospect theory, for instance, very famously gives us ways of thinking about how to combine utilities and probabilities in these risky cases. But there's also a lot of work from the insurance industry, which of course has been worried about many aspects of risk for a long time, and in a very quantitative way; and the mathematical side of that has come up with ideas about how to systematize risk.
- 10:04 And one of the systematic ways they think about it is in terms of tail events. So here we think of the distribution of possible returns as just some sort of histogram, and the risks that we worry about, the risks we care about, are the ones typically found in the lower tail: the nastiest things that can happen. For instance, many of you will know the Markowitz utilities, where you add to the mean some fraction of the variance; but the variance includes not only the lower tail but also the upper tail, so it reflects the whole structure of the distribution. Whereas the things we worry about are the tail risks, the nastiest things that could possibly happen. That's natural in medicine, finance, engineering, and maybe also things like predation in animals too.
- 10:51 So how does that work? Let me illustrate this with our very simple case, the Saint Petersburg problem. What I'm now showing you is all the outcomes, weighted by their probabilities: 50/50 for €2, and so on up, getting vanishingly small, with the average value of the outcome being worth infinity. If you think about the tail, what we might do is to choose, in this instance, say, the lower 7/8 of the distribution: that's just these three dark blue bars, and it cuts off the upper 1/8 of the distribution, which is all the other, much nicer outcomes you could possibly have. The value of the outcome at the boundary defined by this lower 7/8 tail is a quantile, just the 7/8 quantile of the distribution. That's a risk measure itself, called the Value at Risk, or VaR, shown here.
- 11:47 It turns out that the Value at Risk doesn't satisfy some of the nice qualities that we'd expect, from the insurance industry, nicely worked out by Artzner, Rockafellar and Uryasev, and many others as well. But a measure which also looks at the lower tail, and does satisfy these axioms, is called the Conditional Value at Risk (CVaR), which is simply the average value in that lower tail.
- 12:12 So the idea is: you say, I'm worried about the tail, and we have an alpha value saying which tail I'm worried about. If it's the 100% tail, the 1 tail, it's just the whole distribution. Here it's the 7/8 tail: I've cut off all the really nice outcomes, and I'm left only with the nastiest outcomes. And as alpha gets more extreme, I think about less and less of the distribution: more and more, only the nastiest things that can happen are the things I imagine happening. And that then defines the average value in that tail, this Conditional Value at Risk, the CVaR value itself.
- 12:44 So how does that look as we reduce alpha? At alpha equals one, we have the whole distribution: that's infinity. If alpha is 15/16, we just get these four bars; at 7/8, the three bars; at 3/4, these two bars; and at alpha equals 1/2, we just have this one bar left. So as alpha gets smaller, we're getting more and more risk averse: we're thinking about the lower tail of the outcomes we could possibly have.
- 13:10 So formally, you can write that down as the expected value in this lower tail, CVaR_alpha(Z) = E[Z | Z <= VaR_alpha(Z)]: the expected value underneath the alpha-quantile of the distribution.
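As a numerical sketch of that definition (my illustration, not the talk's code), here is the lower-tail computation on a truncated Saint Petersburg distribution: VaR_alpha is the alpha-quantile of the outcomes, and CVaR_alpha is the probability-weighted mean of the outcomes at or below it.

```python
import numpy as np

# Truncated Saint Petersburg distribution: payoff 2^k with probability 2^-k.
outcomes = np.array([2.0 ** k for k in range(1, 12)])
probs = np.array([0.5 ** k for k in range(1, 12)])
probs[-1] += 1.0 - probs.sum()  # fold the leftover mass into the last bar

def var_cvar(x, p, alpha):
    """alpha-quantile (VaR) and lower-tail mean (CVaR) of a discrete outcome
    distribution; a bar straddling the quantile contributes partial weight."""
    order = np.argsort(x)
    x, p = x[order], p[order]
    cum = np.cumsum(p)
    k = int(np.searchsorted(cum, alpha))
    w = np.minimum(p, np.maximum(alpha - (cum - p), 0.0))  # mass kept per bar
    return x[k], float(np.dot(w, x)) / alpha

print(var_cvar(outcomes, probs, alpha=7 / 8))  # -> (8.0, ~3.43): the three bars
```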
- 13:22 But there's another way of thinking about this, exactly the same calculation, almost like a dual view, which also relates to the way that prospect theory thinks about probabilities: having what they call a probability distortion function. So here I've now written down explicitly the probabilities of these outcomes: a half, 1/4, and so forth. What you do with probability distortion is to say: I'm allowed to change the probabilities of the nastier outcomes. I boost those probabilities, and I suppress the probabilities of the nicer ones; and the idea inside the Conditional Value at Risk is that there's a maximum amount of possible distortion. If my alpha value is 7/8, which means I'm interested in the bottom 7/8 of the distribution, it means I'm allowed to multiply my nastiest probabilities by 8/7, by 1 over alpha. And I just keep on doing that until I run out of road, until I run out of probability mass, because in the end it still has to be a probability distribution.
- 14:28 So in this instance, I multiply these outcomes by a weighting factor of 8/7 until I run out of road; and that leaves only these three bars contributing to my values. You can see that that's exactly equivalent to the three bars we had in terms of the Value at Risk. So these are equivalent ways of thinking about the effect of these tails, and they're both, I think, very useful constructs for thinking about these nasty possible outcomes.
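Continuing the sketch above (and reusing its `outcomes` and `probs`), a small check that the dual view agrees: greedily boosting the probabilities of the worst outcomes by up to 1/alpha, until the distorted mass sums to one, reproduces the same tail average.

```python
def cvar_by_distortion(x, p, alpha):
    """CVaR via maximal probability distortion: each probability may be
    multiplied by up to 1/alpha, spending the unit of mass worst-first."""
    order = np.argsort(x)
    q, mass = np.zeros_like(p), 1.0
    for i in order:
        q[i] = min(p[i] / alpha, mass)  # boost by 1/alpha until mass runs out
        mass -= q[i]
    return float(np.dot(q, x))

print(cvar_by_distortion(outcomes, probs, alpha=7 / 8))  # same ~3.43 as above
```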
- 15:06 OK, so just to summarise on CVaR: it's what's called a coherent risk measure, satisfying those axioms I was referring to that we want from insurance, which have to do with things like the risk decreasing if you diversify your assets, something the Value at Risk does not guarantee. It emphasises the lower tail, so we're always interested in the nasty things that can happen. If alpha is one, it's the regular mean: we just think about the overall mean of the distribution, which here was infinity. As alpha tends to zero, we only care about the worst possible case, the minimum that can happen. And we have this nice equivalence to the probability distortion measures, in which we favour the bad outcomes.
- 15:49 OK, so that's when we can see the whole distribution in front of us, as you do in a regular gambling case where everything is just specified. But the way we started thinking about this was the sequential case, where we spin the coin, get either a head or a tail, and then can spin the coin again. So how does that work in this domain? And you'll see a sort of surprise comes up that we then have to cope with.
- 16:14 So here we start off with the first flip of the coin: if we get the tail, we get €2; if we get the head, we get the chance to carry on and spin the coin again. Then, if we get a tail, we get €4; if we get the head, we get, excuse me, the chance to spin the coin again; spin again and get a tail, €8; and so forth, and it just carries on down and down and down.
- 16:42 So, as I mentioned, what we want to do when thinking about the risk is to distort our probabilities. We start at the beginning and say: OK, if alpha is 7/8, we get to distort the probabilities by at most 8/7. That means we make it more likely to get the tail and less likely to get the head: we make the left bar slightly higher and the right bar slightly lower. That's our distortion.
- 17:12 Our risk sensitivity has said: OK, even though it should really be 50/50, the real answer is 50/50, in our subjective evaluation of this we boost the nasty outcome and slightly suppress the nice one; and the amount we suppress it by is fixed by making sure the probabilities still sum to 1.
- 17:36very natural thing.
- 17:37Well,
- 17:37now we have another choice and
- 17:38we do the same distortion again,
- 17:40and then we do the same
- 17:41distortion again and so forth.
- 17:43But that does actually
- 17:46generate a a version of sebar,
- 17:48but it doesn't generate the
- 17:50version of sebar that we started
- 17:51off with thinking about.
- 17:52So here I say what you want to do is just
- 17:54look only at the lower possible tail.
- 17:56You can see that if we just
- 17:58keep on distorting by the same
- 17:59fraction every single time,
- 18:00then we're going to actually get instead of
- 18:03getting distorting the the tails like this,
- 18:06we're actually going to get a
- 18:07contribution from all the possible outcomes.
- 18:09But now each of the outcomes instead
- 18:12of instead of being boosted by,
- 18:14instead of being going down like
- 18:17one like a half 1/4 and so forth,
- 18:19it tends to go,
- 18:20it actually goes down like
- 18:213737 squared and so forth.
- 18:22There's a sort of technical reason for that.
- 18:24You can see that that doesn't
- 18:26have the property that I talked
- 18:27about in which we just sort
- 18:28of slice off this bottom,
- 18:30this bottom aspect of the distribution.
- 18:32It is a, it is a risk measure that
- 18:33we some that we could also use.
- 18:35And in fact in many cases it's a very,
- 18:38it's a very severe risk measure.
- 18:41It's a more severe risk measure.
- 18:42But the measure we wanted to talk
- 18:44about instead actually requires us to
- 18:46do a different sort of calculation,
- 18:47which I think is really important for
- 18:50thinking about how risk processing
- 18:51works in this this sequential way.
- 18:53So instead what happens is after we've,
- 18:57after we've boosted the, after we,
- 18:58we're lucky and we we got ahead.
- 19:00At this point, if you think about it,
- 19:02we're trying to accumulate the
- 19:04amount of luck that we can have
- 19:06over a whole sequence of choices.
- 19:07This is the sequential aspect.
- 19:09And if we start off and we're already lucky,
- 19:12it means we've already consumed
- 19:13some of our good luck.
- 19:14Which means that now we have to be a
- 19:16little bit more risk averse in the
- 19:18future in order that the total amount
- 19:20of luck that we're expecting to get or
- 19:22that good or bad luck we're expecting
- 19:24to get is pegged to right at the beginning.
- 19:27So that means that now
- 19:29having been this much risk,
- 19:30having been this lucky in this case,
- 19:32we got our first tail,
- 19:34we got Einstein first,
- 19:35we now have to be a more risk averse.
- 19:39So alpha started out at 7/8 and now it
- 19:42turns out that it has to be boosted.
- 19:44It has to be.
- 19:45The amount of risk aversion
- 19:46has to be boosted,
- 19:47which means that the alpha value
- 19:49decreases from being 7/8 to being 3/4.
- 19:52So now when we do our probability distortion,
- 19:55we're now we distort the we now make
- 19:58it even more likely now with Four
- 20:00Thirds more likely rather than rather
- 20:02than 8 sevenths more likely that we're
- 20:04going to get the unfortunate outcome,
- 20:06which is the the the the tail in this case,
- 20:10and we make it less likely that
- 20:11we're going to get the head.
- 20:13And now if we do get the head,
- 20:14we've been lucky again.
- 20:16We've consumed even more of our good luck.
- 20:18And so now the we become even
- 20:20more risk averse.
- 20:21The alpha value goes down further to 1/2.
- 20:25And so now when we do the distortion
- 20:26it turns out we do maximal distortion.
- 20:28So now the tail instead of being
- 20:31probably 5050 in our minds it's gone
- 20:34up to the probably has gone up to 1.
- 20:36The probably getting the head,
- 20:37the sorry the probably getting
- 20:38the head has gone to zero.
- 20:40And that is then means that we
- 20:41therefore can never get the,
- 20:42we never get any more further down the tree.
- 20:45 And so, in order to compute the CVaR in this way, when we think about a sequential problem, we have to keep on revaluing our alphas. If we're lucky, we become more risk averse, which means alpha gets lower. If we're unlucky, we can in fact become more risk seeking in the future, because we're trying to peg the total amount of risk we suffer along the whole path, all the way towards the end.
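Here is a minimal sketch (my reconstruction of the update described here, in the spirit of the pCVaR decomposition, not the speaker's code) of how alpha is revalued along the Saint Petersburg chain: the worst successor's probability is boosted by up to 1/alpha, and the alpha carried into each successor is the old alpha multiplied by that successor's distortion weight.

```python
def distort(p_bad, alpha):
    """Boost the bad outcome's probability by up to 1/alpha; return the
    distortion weights xi for (bad, good). The alpha carried into a successor
    is the current alpha times that successor's xi."""
    q_bad = min(p_bad / alpha, 1.0)
    xi_bad = q_bad / p_bad
    xi_good = (1.0 - q_bad) / (1.0 - p_bad)
    return xi_bad, xi_good

alpha = 7 / 8
for _ in range(3):  # three coin flips, each time we are lucky (a head)
    xi_bad, xi_good = distort(p_bad=0.5, alpha=alpha)
    print(f"alpha={alpha:.3f}: distorted P(tail)={0.5 * xi_bad:.3f}")
    alpha *= xi_good  # luck consumed, so a smaller alpha goes forward
# alpha goes 7/8 -> 3/4 -> 1/2 -> 0: three lucky heads exhaust the good luck
```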
- 21:12 So there's this notion here of precommitment. When we start the problem, we think about how much risk we're willing to endure; and then, as we are lucky or unlucky, we have to adjust the way we evaluate future outcomes. So in precommitted CVaR, pCVaR, we're privileging a start: we're saying, this is where we're defining risk from, and we're then revaluing our alpha, our risk aversion, in order to peg where we're going. You might think of that as being like a home or a nest for an animal, for instance.
- 21:47 And then we have to change alpha, and the way we change it is like a justified form of the gambler's fallacy. If you've been unlucky for a while, then in some sense you can be a little bit more risk seeking, meaning less risk averse. If you've been lucky, then you're expecting to be more unlucky in the future, so your alpha decreases, in order to peg the total amount of risk you have along a whole path.
- 22:13 Alpha equals zero and one are special. Alpha equals one is just the mean, and then you never revalue it: you just keep on with alpha equals one. Alpha equals 0 is the minimum, and you stick with that too, because you can never get more risk averse: you've run out of road, and you're always thinking about the worst possible outcome that can ever happen.
- 22:35 So in order to do this, you either have to monitor how much luck you've had along a path, or just think about changing the value of alpha as you go along; and you change it in the way I showed you for the Saint Petersburg problem, where we made alpha smaller and smaller, because we kept on being lucky every time we got the head, until we ran out of road, at the third outcome in this evaluation.
- 23:05 So how does that look in a more conventional sort of random walk? Here's a simple random walk, where we have an agent which can go left or right, or try to stay where it is. There are two rewards: one on the right-hand side, a small reward worth +1, and one on the left-hand side worth +2. And then here's one of Chris's lava pits, which is threatening; and again you have a small probability of an error in the choices.
- 23:31 So here, if you have a completely uniform choice (you go left, right, or try to stay where you are equally often), then, if this is our start state, this is the distribution of outcomes you would actually get, with a discount factor of 0.9; because in the end you get trapped by the lava pit, and that's the end of the game. And so here, from the start state, this is the distribution; and since we're thinking about CVaR, we're obviously thinking about the tails of this distribution.
- 24:02 So how can we evaluate the locations in this world? Well, take the uniform policy, and first an alpha value of 1: then we're just a regular reinforcement learner, thinking about the average value of each of the states. You can see that I've shown them in colour, from -10 up to +10. The ones on the right are relatively good, because you have this reward of one, and you tend to last a while before you end up in the lava pit, which means that that cost is discounted by a lot.
- 24:30 If alpha is 0, you always think that the worst possible thing that can happen will happen. The way I'm showing you that is with these grey arrows: inside the choices, it shows how frequently you try to go left, right, or stay where you are. The re-weighting system says: I'm going to think about the worst possible outcome, because my alpha is 0; and that puts all the weight on going left, because the nastiest thing that can happen is going left. And so here you can see that all the values are then much, much worse; indeed, you then just go left every time, and end up in the lava pit.
- 25:08 And then, for intermediate values of alpha, you can see how states get evaluated. And again you can see the effect I mentioned: if you are lucky, which in this instance means going right, because right states are better, then you tend to decrease your value of alpha; so these little grey arrows, outside the choices that you make, tend to point downwards. If you're unlucky, which in this instance means going left, then you become a bit less risk averse, which means that the arrows then point upwards. And so, as we become more and more risk averse in this alpha value, we have this very nice way of looking at how states change: how states on the right, for instance, go from being good to being bad.
- 25:58 So you don't only have to think about evaluation here; you can also optimise your policy based on your risk aversion. You try to find the policy which maximises this precommitted CVaR value for a given value of alpha.
- 26:16 So if your alpha is 1, then you're not risk averse at all; you're just thinking about the mean. We designed it such that, from the start state here, if alpha equals one, the best thing you can do is just to go left, and try to stay at the reward worth 2 for as long as you can; that's then the way of maximizing your reward.
- 26:41 If alpha equals zero, it actually doesn't matter at all what you try to do, because there's a chance that if you try to stay where you are, you'll nonetheless go left; and if you think about the worst outcome, it's always to go left. And so you can see that, at alpha equals 0, the optimal policy is just the same as the uniform policy, or any other policy as well: you'll always go left.
- 27:03 So in fact this is sort of a form of learned helplessness, where although you really have some control in this world, because you think about the worst thing that could happen, you sort of don't trust your own control; and therefore you think the worst thing that could happen will happen, and thereby it doesn't matter what you do: there's nothing you can do to mitigate that chance.
- 27:26 And then in the middle: remember, the precommitment is relative to a start state, and here our start state is this one, at alpha equals 0.3. You can see that now we have a policy where, in this particular domain, the optimal policy at that start state is to go right rather than to go left, because of the problems of the risk; and then you try to stay here as long as you can.
- 27:55 And so you can see, as you might expect, that everywhere else in this random walk, apart from alpha equals zero, you have a better outcome: all the values of the optimal policy are much better than the values of the uniform policy here, except at this nastiest possible degree of risk aversion, where you just think that whatever terrible thing can happen will happen, no matter what.
- 28:25 I should just say: there's also this other mechanism, nCVaR, which doesn't precommit to a value, but instead just sticks at a particular value of alpha the whole time. That's what I showed you in the Saint Petersburg paradox, where you just weighted the heads and tails the same way every single time. In this domain, for alpha equals one, it turns out to be the same as pCVaR, which is just the mean; for alpha equals 0, again, it just focuses on the minimum, the worst thing that can happen, so it also looks the same. But in between, for intermediate values, you can again get evaluations of states; and in this instance, it turns out that this nCVaR mechanism is generally more risk averse, so the values are worse than the values for pCVaR.
- 29:16 That's not true in the Saint Petersburg paradox, because in that problem the only way you get to carry on is by being lucky, whereas in this problem you can be lucky or unlucky as you carry on. In pCVaR, if you're unlucky, then you become less risk averse; whereas in the Saint Petersburg paradox, or in the BART task, every time you continue, you must have been lucky, and therefore you become more risk averse; so, relatively, there's a greater degree of risk aversion from pCVaR in the Saint Petersburg paradox. Whereas in these sorts of other problems, nCVaR is generally more risk averse; you see that here by these values all being more red than the other ones. And then you can work out the optimal policy, which has similar characteristics.
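For concreteness, a sketch of the fixed-alpha, nCVaR-style evaluation just described, on a hypothetical five-state chain (the rewards, pit position, slip probability, and the omission of the "stay" action are my stand-ins, not the exact domain on the slides): every backup distorts the successor distribution by boosting the worst successors' probabilities by up to 1/alpha, with the same alpha at every step.

```python
import numpy as np

GAMMA, ALPHA, SLIP = 0.9, 0.5, 1 / 8  # discount, risk level, action noise

def distorted_mean(values, probs, alpha):
    """Expectation under the maximally pessimistic distortion: the worst
    successors' probabilities are boosted by up to 1/alpha, worst-first."""
    order = np.argsort(values)
    q, mass = np.zeros(len(probs)), 1.0
    for i in order:
        q[i] = min(probs[i] / alpha, mass)
        mass -= q[i]
    return float(np.dot(q, values))

# Hypothetical chain: state 0 is a lava pit (-10, terminal), state 4 a reward (+2).
rewards, terminal, n = np.array([-10.0, 0, 0, 0, 2.0]), {0, 4}, 5
V = np.zeros(n)
for _ in range(200):  # value iteration with distorted (nCVaR-style) backups
    for s in range(1, n - 1):
        best = -np.inf
        for move in (-1, +1):  # intended step, with a slip to the other side
            succ = [s - 1, s + 1]
            p = [1 - SLIP, SLIP] if move == -1 else [SLIP, 1 - SLIP]
            nxt = np.array([rewards[t] + (0.0 if t in terminal else GAMMA * V[t])
                            for t in succ])
            best = max(best, distorted_mean(nxt, np.array(p), ALPHA))
        V[s] = best
print(V.round(2))  # interior state values under the fixed-alpha evaluation
```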
- 30:05 OK, so let's come back to our lava pits, where, excuse me, we gave our subjects this task: we showed them the maze and asked them how they would move. We designed this domain so that it would distinguish different values of alpha, different values of risk aversion, as a way of interrogating what subjects would be like in these cases.
- 30:30 So it turns out that the most direct path is associated with alpha equals one: if you are risk neutral, you would take this rather risky path. If your value of alpha is about 0.5, which means you just think about the bottom 50% of the distribution, then you tend to take this intermediate path. And if you're much more risk averse, caring about the bottom 15% of the outcomes, then you take this much more extreme, risk-averse route here.
- 31:01 And I think it's interesting, as one of these cases where it's very hard, when you see how somebody in your lab performs this: if you're a sort of alpha-0.4 person, it's very hard to imagine somebody who would be so risk seeking as to take the very short route; or, if you're the person who takes the very long path, you think: how could anybody take these short paths? So I think there are some interesting phenomena that come up with this.
- 31:26 So we administered 30 mazes like this to a group of subjects, and we designed them in order to look at things like how consistent an individual subject was in the way they were risk averse in these domains. And we saw a very nice degree of consistency.
- 31:48 So here you can see another of these mazes, where the start state is here and the goal is here. Again we have a path which the people who are pretty risk neutral would take, which gets close to these two lava pits; an intermediate path, which is longer, which is why it would be less favoured, but which only goes close to one of the lava pits; and then an even longer path, which goes all the way around here to get to the goal, and which really avoids the lava pits dramatically.
- 32:17 And so these are three individual subjects, and these choices were themselves associated with three different values of alpha: you know, like 0.2, 0.5 and 0.9 or so. And then, in another maze, you can see the behaviour of the same subjects.
- 32:34 So this one is a bit like a cliff: there are just two lava pits here, and the question is how far around them you go. One option is just to go directly from the start to the goal; that's the most risk neutral. Here's one which is a bit more risk averse. You can think about how far away from the cliff you would choose to be yourself. And again, it's very hard, if you're a sort of risk-neutral person, to think: isn't it crazy to go so far away from the goal?
- 33:03 We took the 30 mazes that we administered, looked at the first half and the second half, and inferred the values of alpha that our subjects had for those mazes by fitting the choices that they made. And you can see that we had a reasonable degree of consistency between the first 15 mazes and the second 15 mazes: this shows the peak, the maximum a posteriori alpha value, for the first and second half of the mazes. So we see that they are reasonably well pinned down, and indeed the means are fairly similar too.
- 33:36 And then we can look across all our subjects. This axis shows you the value of alpha; this is the posterior value of alpha across all the trials we have, from a hierarchical fit. We just ordered the subjects from the people with the smallest value of alpha to the people with the largest, and you can see that we nicely cover the range of possible alphas in this domain; for some people we can't infer alpha so well, as you can see just from these plots.
- 34:05 And then, in order to fit their behaviour, we have a couple of other statistics as well. They have a temperature, or an inverse temperature, which is how noisy their behaviour is generally; and then a lapse rate, which says that sometimes, although we imagine they might be trying to go north, perhaps they just, by mistake, go in a different direction. So these are very standard things you'd have in a model of their behaviour.
- 34:29 But the thing we're focusing on, indeed, is this risk sensitivity; this is just a histogram of the values that we can infer, and it's nicely arrayed across the different possible values of alpha, as you can see.
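For readers wanting these standard ingredients in one place, a minimal sketch (my own, with hypothetical names and numbers) of a choice likelihood combining an inverse temperature and a lapse rate; the risk sensitivity alpha would enter through the CVaR action values being softmaxed.

```python
import numpy as np

def choice_probs(action_values, beta, lapse):
    """Softmax over (risk-sensitive) action values with inverse temperature
    beta, mixed with a lapse rate of uniformly random responding."""
    z = beta * (action_values - np.max(action_values))  # stabilised softmax
    p = np.exp(z) / np.exp(z).sum()
    return (1 - lapse) * p + lapse / len(p)

# Hypothetical CVaR-based values for (north, south, east, west) at one state:
q = np.array([1.2, -0.4, 0.9, -2.0])
print(choice_probs(q, beta=3.0, lapse=0.05))
# Summing log choice_probs over a subject's moves gives the likelihood that
# is fitted (hierarchically, in the talk) to infer alpha, beta and the lapse.
```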
- 34:44 So we then tried to interrogate our mechanism for changing values of alpha, and here we had what was, to us, a bit of a surprise. What we're looking at is how alpha changed if, on one trial, one maze, you got a win or you got a loss. So what this shows is: if we infer the value of alpha on one maze, and you won on that maze, what happens to the next value of alpha? Are you more risk averse or more risk seeking in that case? From the pCVaR mechanism I talked about, what we would have expected is that if you were lucky on that maze, you'd become more risk averse next. What we actually saw was the opposite, interestingly: after a lava pit, after you got trapped in one maze, you in fact became a bit more risk averse in the next maze. And so we're contemplating why that might be.
- 35:43 We are also looking inside the choices you make within a single maze, because, if you remember, we have noisy actions, so sometimes you're lucky or unlucky inside a single maze; and there we do see a pCVaR-like effect, which is that if you've been lucky, then in the future you're a little bit more risk averse, and if you've been unlucky, a little bit less risk averse. So there's a conflict between the different time scales on which this is operating.
- 36:10 And that conflict also comes up a little when we look across the first and second half of these mazes, the first 15 versus the second 15. If you had more losses in the first half, we can ask: are you more risk averse or more risk seeking in the second half? And there's some small evidence that, on average, you are a bit more risk seeking in the second half if you've had more losses in the first half. So that suggests that this maze-to-maze effect may itself not completely generalise over the whole context of the mazes. So there are really some interesting things to investigate in this domain.
- 36:50 OK, an interim summary. What I've tried to show you is this sort of parametric risk-avoidant behaviour, which can come from this precommitted pCVaR. Precommitment is that you think: how much risk am I willing to endure? Which part of this distribution am I willing to think about, right from the beginning? And that requires you to have this gambler's fallacy, changing the value of alpha as you are lucky or unlucky. So obviously the inference is a little bit more complicated here; but in fact almost every way we have of thinking about risk in the sequential case is going to rely on a more complicated way of doing evaluation.
- 37:27 Because, for instance, if you have a nonlinear utility function, then if you think about your total utility on a path, you're going to have to monitor that total utility, so that you can then manipulate it in this nonlinear way; you see that in prospect theory, for instance, as well. If we have this nested version, what we sometimes call nCVaR, the one where we just fix the value of alpha and apply the same value as you go down and down, then in some cases you can get excessive risk aversion, as in the random walk that we saw; and then again, we can still think about that at different values of alpha itself.
- 38:12 We're now worrying about an indeterminacy between your prior expectation, for instance of getting caught in the maze by a lava pit, versus the degree of risk aversion. Those two work opposite to each other in terms of pCVaR: if you get caught, that increases your prior on the possibility of getting caught, but it also increases the value of alpha, making you a little bit less risk averse. And so those two things are fighting with each other, we think, in the context of these mazes.
- 38:43 And of course it would be interesting to look at ambiguity as well as risk. Here, everything I've talked about concerns cases where the probabilities are frankly expressed: subjects know exactly the probability of having a lapse in the way that they move in the maze, and they know the values of everything; we didn't make anything ambiguous.
- 39:03 But of course ambiguity, as a sort of second-order probability, gives you an extra aspect of probability that you don't know; and if you then think about the lower tail of those probabilities you don't know, that's a way of inducing ambiguity aversion, because of the extra, second-order uncertainty that you have in those cases too.
- 39:24 From a psychiatric point of view, what you can see here is a sort of aspect of pathological avoidance: the way you're evaluating what could be a relatively benign world is by thinking about all the nasty things that can happen; that's what becomes really critically important.
- 39:41 And then, if you're living in a stochastic environment, which of course we all do, then if you're really extremely risk averse, so alpha is really near to zero, that's a route to indifference or helplessness: because it doesn't matter what you try to do, you're always worried about the nastiest thing that can happen. So that makes life super complicated.
- 40:02 OK, so that's online behaviour; now we think about planning. We want to imagine what our subjects are doing as they're thinking about how to move in that maze. There we can do, as Phil mentioned at the beginning, something a bit like model-based reinforcement learning, where we have a model of the world and we're planning in that model, thinking about the risk that accumulates along these paths and changing these values of alpha as we go. But there's a lot of interest at the moment in also thinking about offline processing, which can happen during periods of, for instance, quiet wakefulness or sleep in animals, and also in inter-trial intervals in humans, which we've been looking at too.
- 40:42 And so the idea has been that there's hippocampal and cortical replay, themselves coordinated, which can be used to do aspects of offline planning. Which is to say: we normally think about a model of the world as something like a generative model of the environment. The inverse of that model is a policy: what should I do in the environment in order to optimise my return, or optimise my CVaR return? And in that case, the inverse of the model is something you can calculate offline, when you don't have to use the model to make your choices as you go.
- 41:19 And there's evidence, in both rodents and, in the last few years, in humans, typically using MEG, that subjects are actually engaging in offline processing which has an impact on their behaviour when it happens in the future.
- 41:35 In the reinforcement learning world, this has been closely associated with an idea from Rich Sutton in the 90s called Dyna, where he thought about offline, replay-like processing to enable exploration; and it then got embedded in advanced forms of reinforcement learning in AI, in replay buffers for things like the DQN, the deep Q-learning networks, and in the systems DeepMind used very successfully, such as AlphaGo winning at Go.
- 42:05 And then, slightly more recently, there's a lovely paper from Marcelo Mattar and Nathaniel Daw, which speculated that the replay we see in rodents might be optimised to improve the way those rodents plan in their environment.
- 42:22 So, given that they discover something about the world, say a reward they didn't know about, or maybe had forgotten, then they have to do some relearning. What Mattar and Daw suggested is that the sequence in which the animal engages in replay is chosen in order to optimize the way the animal will subsequently move through the world, using a simpler way of doing planning.
- 42:46 And they pointed out that you should choose to make updates to your model based on the product of two quantities: gain and need.
- 42:53 So the gain is this: if you were to do a replay at a particular location in the maze, maybe somewhere where you're not (you have this notion of distal replay in the hippocampal world), then the gain is how much you would change your policy if you made an update there. There's no point in making an update if it's not going to change your actions, because then it will have no impact on your final return. And the need is how frequently you're going to visit that state in the future, given your current policy.
- 43:21 So it turns out that the product of those two governs the sequencing you should apply to looking at states in the world. And so, if you discover something, this is how you should go about planning during these offline periods.
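A schematic sketch (my illustration of the Mattar and Daw idea, not their code) of the priority computation: gain measures the value improvement produced by the policy change that a candidate update would cause, need is the expected discounted future occupancy of the state, and replay targets the state with the largest product.

```python
import numpy as np

def replay_priorities(Q_old, Q_new, need):
    """Mattar & Daw-style priority = gain * need for each candidate update.
    Q_old/Q_new: action values at each state before/after the update;
    need: expected discounted future occupancy of each state (for example,
    a successor-representation row for the animal's current position)."""
    gain = np.empty(Q_old.shape[0])
    for s in range(Q_old.shape[0]):
        pi_old = np.eye(Q_old.shape[1])[Q_old[s].argmax()]  # greedy before
        pi_new = np.eye(Q_new.shape[1])[Q_new[s].argmax()]  # greedy after
        gain[s] = (pi_new - pi_old) @ Q_new[s]  # value gained by policy change
    return gain * need

# Hypothetical example, 3 states x 2 actions; the update flips state 0's choice:
Q_old = np.array([[1.0, 0.9], [0.2, 0.1], [0.0, 0.5]])
Q_new = np.array([[1.0, 1.4], [0.2, 0.1], [0.0, 0.5]])
need = np.array([0.6, 0.3, 0.1])
print(replay_priorities(Q_old, Q_new, need).argmax())  # replay state 0 first
```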
- 43:34 So we thought about: what does optimal planning look like for CVaR, if you're risk averse? So here, excuse me, we're showing another simple domain, where you have a start state, a single reward at this location here, and one of these lava pits here. What these numbers show is the following. All you know about is where you start; you have a model of the world, and you know about the lava pit and the reward, but you don't know how to plan, you haven't got a plan of what to do. We're thinking of replay, in the Mattar and Daw world, as constructing that plan for you, by essentially focusing on a state in the world and then doing a little Bellman update, just one step of reinforcement learning; and the order of the steps is shown by these numbers.
- 44:23 So it turns out that if you prioritise based on being risk neutral (and what I mean by prioritisation here is that you're thinking about which planning step has the most effect on the value of the start state, because that's where you're beginning), then, for some reason, you do one step at this location away from the lava pit, and then all the subsequent steps, in this case the subsequent six or seven steps, essentially plan backwards from the goal, from the reward back to the beginning.
- 45:03And this notion of backward sequencing,
- 45:06like reverse replay in the
- 45:09hippocampal world, is also seen
- 45:11in something called prioritised sweeping,
- 45:13which is an old idea in reinforcement
- 45:16learning from Andrew Moore, where
- 45:18you'd optimise the sequence of
- 45:21updates you would do. If you prioritise
- 45:23instead based on a value of alpha
- 45:25which is much lower,
- 45:26so much more risk averse,
- 45:28now you can see that,
- 45:30instead of planning how
- 45:32to get to the reward,
- 45:34you spend all your planning time
- 45:36thinking about the lava pit,
- 45:38thinking about how you would
- 45:39avoid the lava
- 45:40pit if you were there.
- 45:41So the first step is the same one,
- 45:43but then all the subsequent ones
- 45:44are all about avoiding the lava pit and
- 45:46have nothing to do with getting to
- 45:48the reward. So you can see how
- 45:50even the structure of thinking
- 45:52offline could get
- 45:54really dominated by
- 45:56the nasty things
- 45:58that could happen.
- 45:59And if alpha equals 0,
- 46:01there's no point in doing planning
- 46:02at all, because you can't mitigate
- 46:04the risk of getting
- 46:05to the lava pit.
- 46:06So you just sit there;
- 46:08you just can't help yourself.
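To give a feel for what this alpha does, here is a minimal sketch, in Python, of a CVaR-style evaluation of a single risky step. The outcome values and probabilities are toy assumptions loosely inspired by the gridworld, not the actual task parameters:

```python
import numpy as np

def cvar(values, probs, alpha):
    """CVaR_alpha: the mean of the worst alpha-fraction of outcomes.
    alpha = 1 recovers the ordinary expectation; alpha -> 0 is the worst case."""
    order = np.argsort(values)                 # worst outcomes first
    v = np.asarray(values, float)[order]
    p = np.asarray(probs, float)[order]
    taken, acc = 0.0, 0.0
    for vi, pi in zip(v, p):
        w = min(pi, alpha - taken)             # probability mass still to fill
        if w <= 0:
            break
        acc += w * vi
        taken += w
    return acc / alpha

# Toy outcome distribution for one risky move: an eighth chance of the lava
# pit (assumed value -20) versus reaching the reward (assumed value +5).
values, probs = [-20.0, 5.0], [0.125, 0.875]
for alpha in (1.0, 0.5, 0.125):
    print(f"alpha={alpha:5}: evaluated return = {cvar(values, probs, alpha):+.2f}")
```

At alpha = 1 the risky move looks attractive; by alpha = 0.125 its evaluated value is pinned entirely to the lava pit, which is why planning effort migrates to the pit, and in the alpha-near-0 limit every action inherits the worst reachable outcome, so no amount of planning helps.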
- 46:11So as I mentioned,
- 46:12this is not only for humans.
- 46:14So there's a lovely study that
- 46:15comes from Mitsuko Watabe-Uchida,
- 46:18in Uchida's lab,
- 46:19where she used a very simple task
- 46:22for mice.
- 46:24So here she had a simple arena,
- 46:27just an open
- 46:29field arena shown here.
- 46:30And then the mice were put
- 46:32in for a couple of days.
- 46:33There's nothing there.
- 46:34They had 25 minutes per
- 46:36session just to run around.
- 46:37And here's a path of
- 46:39one of the mice just
- 46:41running around this arena.
- 46:42Then on the third day, after this habituation,
- 46:45Mitsuko put in a novel object,
- 46:48just basically a bunch of Lego
- 46:49blocks near one corner of
- 46:52the environment, and then
- 46:54monitored what the animals
- 46:56did over the subsequent days,
- 46:58the subsequent 4 days, with
- 46:59this same novel object in the
- 47:01same location of the arena.
- 47:03And you can see, even just eyeballing
- 47:05the trajectories, that the
- 47:07animals have this really interesting
- 47:10mix of essentially neophobia and
- 47:12neophilia, with neophobia much
- 47:13more apparent here.
- 47:15So it really changes the structure
- 47:16of the movement
- 47:19through the environment.
- 47:20So for various reasons,
- 47:21Mitsuko characterized being within
- 47:237 centimetres of the object as
- 47:25a sort of critical distance, where
- 47:27the animal is
- 47:30inspecting the object.
- 47:31And then what she's showing
- 47:33here is how much per minute of
- 47:35these 25 minutes, in each of these
- 47:37sessions, the animals spend
- 47:39within 7 centimetres of the object.
- 47:41In the habituation days, this is just
- 47:42time within 7 centimetres of that circle,
- 47:44the circle shown here.
- 47:45And you see that, you know,
- 47:47the animals spent some time there,
- 47:48but there's nothing
- 47:49at those locations yet.
- 47:51When she puts in the novel object,
- 47:54you can see that then that
- 47:55really dramatically changes the
- 47:56structure of behaviour.
- 47:57And here she's ordered the animals,
- 48:00all 26 of them, by the amount of total
- 48:02time they spend near the object.
- 48:04So these animals,
- 48:05these early ones, spend barely
- 48:08any time near the object at all.
- 48:09These animals down here,
- 48:12they spend much more time near the
- 48:14object than the first ones do.
- 48:16And so there's a sense in which
- 48:17these are very risk averse animals:
- 48:19they have what we would think of
- 48:21as this low value of alpha,
- 48:22whereas these animals are
- 48:24much less risk averse;
- 48:25they're much more willing to
- 48:27get close to the object.
- 48:29And so you can see that the way that
- 48:31they approach the object also changes.
- 48:33So here, for the
- 48:34first days with the object,
- 48:36what she's done is
- 48:37use DeepLabCut, from the Mathises,
- 48:40to classify whether the animal has
- 48:42its nose pointing to the object
- 48:43or its tail pointing to the object.
- 48:45You see in the early days the animal only
- 48:47has what they call cautious approach,
- 48:49so it only approaches the object with
- 48:51its nose in front and its tail behind.
- 48:54Then over time some of the animals
- 48:55are more willing
- 48:56to just engage with the
- 48:58object, not protecting
- 48:59their tail in this particular way.
- 49:01Very appropriate for tail
- 49:02risk, as you can imagine.
- 49:04So if we look at the frequency of approach,
- 49:07the frequency per minute of
- 49:09approach with the tail behind,
- 49:11you can see that all
- 49:14the animals are here,
- 49:15again segmented
- 49:17into these sessions.
- 49:18So all the animals start
- 49:19off with their tail behind,
- 49:20this cautious approach. And then,
- 49:23again using the same sorting of the animals
- 49:25between 1 and 26,
- 49:27you can see that the animals who are timid,
- 49:29who barely approach the object
- 49:30or barely spend
- 49:31any time near the object,
- 49:33also never risk their tail:
- 49:35they spend no time
- 49:37with their tail exposed,
- 49:38whereas the
- 49:40brave animals, these ones down at the
- 49:42bottom, not only spend more time
- 49:44near the object, they also do it with
- 49:46their tail exposed in this way.
- 49:48But we were very struck by these huge
- 49:50individual differences in
- 49:51the way that these animals
- 49:53approach the object, and so we were
- 49:55interested in modelling that.
- 49:56So Akiti et al.
- 49:59characterize various aspects of the
- 50:01behaviour, such as the fraction of
- 50:04time the animals are close to the object.
- 50:06I showed you that already; here it's shown
- 50:07split into confident and cautious approach,
- 50:09so cautious in green, confident in blue.
- 50:12And again you can see, with the
- 50:13same sorting of the animals, that there's
- 50:14only green at the top, while there's
- 50:16some blue at the bottom.
- 50:17And this is only showing the days
- 50:18after the object
- 50:20has been introduced.
- 50:23You can look at how long they
- 50:24spend near the object, and again
- 50:26you can see that, shown
- 50:28by this colour:
- 50:29the brave ones spend a lot of time,
- 50:30the timid ones spend very little
- 50:32time. And you can look at how frequently
- 50:34they visit the object.
- 50:36Again, the brave ones visit frequently;
- 50:38the timid ones barely visit at all.
- 50:42So we built a model of this,
- 50:43but I'm not going to,
- 50:44I haven't got time to go through
- 50:45all the details of the model,
- 50:46but just to give you
- 50:47a hint of what's inside it.
- 50:51Well, that's Neophilia.
- 50:52They're interested.
- 50:52There's an exploration bonus we imagine
- 50:54which is associated with that and we
- 50:56imagine that this exploration bonus
- 50:58replenishes as if they don't know,
- 50:59they don't know that the object is not,
- 51:01is not,
- 51:01is not never actually gives them
- 51:03a real return, right.
- 51:05The object is just a bunch of Lego blocks.
- 51:06There's no food or anything
- 51:08positive associated with it
- 51:09and we imagine that when the animals
- 51:12have due confidence approach they
- 51:13they can stay enjoy more than
- 51:15they consume the reward faster.
- 51:17Then we have a hazard function.
- 51:19Why are they neophobic?
- 51:20Well, maybe at some
- 51:22point a predator or something is
- 51:24going to jump out from this object,
- 51:26or something nasty might happen,
- 51:27and we imagine that this hazard increases
- 51:29with time spent near the object.
- 51:31So the longer they spend near the object,
- 51:33the more they're worried
- 51:35about predation.
- 51:35And we imagine that this then
- 51:37resets when they move away from the object.
- 51:39And we imagine that it's less
- 51:41dangerous when they do a cautious
- 51:42approach than a confident approach,
- 51:43which is why they want to approach in
- 51:46this cautious way in the first place.
- 51:47And critical to this is that
- 51:50the uncertainty about
- 51:51whether there's
- 51:52a predator or not will only reduce
- 51:54if they actually visit the object.
- 51:56If they don't visit the object
- 51:57or don't spend time there,
- 51:58they're not going to find out that
- 51:59in fact the object is completely
- 52:01benign and never hurts them.
- 52:02And so we have this
- 52:04important path dependence whereby
- 52:06the timid animals don't visit for long,
- 52:08so they don't find out the object is
- 52:10safe, and therefore they carry
- 52:11on not visiting for long, because
- 52:13they haven't found out about
- 52:14this safety.
- 52:15And then we have this risk
- 52:17aversion too. And when we then
- 52:20build a model of their behaviour,
- 52:22here I've just characterised it
- 52:23somewhat abstracted away from
- 52:25the animal data themselves,
- 52:26you can see we capture
- 52:28the general trends in the animal data.
- 52:31With this abstraction you can
- 52:32see we do a really good job.
- 52:33We have quite a lot of parameters, I must say.
- 52:35We can do a really good job of
- 52:37fitting their data by essentially
- 52:38varying the degree to which
- 52:40they're risk averse,
- 52:42this pCVaR mechanism, and also
- 52:44their prior over what
- 52:49the object is like. And that prior
- 52:51is not influenced enough
- 52:53if they don't visit the object:
- 52:54they don't find out that
- 52:55it's safe, in the way that I described.
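As a toy illustration of the ingredients just described, here is a minimal simulation sketch in Python. The action set, the hazard and bonus dynamics, and every constant are our own assumptions for illustration; this is not the fitted model from the study:

```python
import numpy as np

def simulate(alpha, n_steps=300):
    """Toy neophilia/neophobia agent: each step it chooses to stay away,
    approach cautiously, or approach confidently. All constants are assumed."""
    bonus = 1.0          # neophilia: exploration bonus for contacting the object
    p_threat = 0.5       # prior belief that the object harbours a predator
    time_near = 0.0      # uninterrupted time spent near the object
    frac_near = 0.0
    catastrophe = -4.0   # assumed value of the feared (never realised) attack
    for _ in range(n_steps):
        # hazard belief grows with dwell time near the object
        hazard = p_threat * min(1.0, 0.2 + 0.3 * time_near)
        w = 1.0 / alpha  # pessimistic, CVaR-like weighting of the bad outcome
        v_away      = 0.0
        v_cautious  = 1.0 * bonus + w * catastrophe * 0.25 * hazard  # tail protected
        v_confident = 2.0 * bonus + w * catastrophe * 1.00 * hazard  # faster consumption
        choice = int(np.argmax([v_away, v_cautious, v_confident]))
        if choice == 0:
            time_near = 0.0                  # hazard clock resets away from object
            bonus = min(1.0, bonus + 0.02)   # novelty bonus replenishes
        else:
            time_near += 1.0
            frac_near += 1.0 / n_steps
            bonus *= 0.98 if choice == 1 else 0.90   # confident consumes faster
            p_threat *= 0.97   # safety is only learned by actually visiting
    return frac_near, p_threat

for alpha in (1.0, 0.2, 0.05):   # brave, intermediate, timid
    f, p = simulate(alpha)
    print(f"alpha={alpha:4}: time near object={f:.2f}, residual threat belief={p:.2f}")
```

The path dependence the model needs falls straight out of this: the low-alpha agent never approaches, so its threat belief never falls, and its timidity is self-sustaining.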
- 52:58OK.
- 52:59So because I'm running out of time,
- 53:00let me just go to the general
- 53:03discussion about all of this.
- 53:05So just to sum up on this risk aversion:
- 53:08I think it's
- 53:09nice to think, from a sort of
- 53:11computational psychiatric point of
- 53:12view, about
- 53:14the way that evaluation happens in
- 53:16the context of this risk aversion.
- 53:17So think of people who
- 53:20are highly risk averse: in some sense,
- 53:21maybe they're solving a different
- 53:23problem from others.
- 53:24And here we've shown
- 53:26that, optimally,
- 53:27if you have a really low value
- 53:28of alpha, or in some contexts
- 53:30this nested CVaR,
- 53:31nCVaR, then you'll see this
- 53:34dysfunctional avoidance,
- 53:34and also this rumination process,
- 53:36in the sense that you'll keep on
- 53:37worrying about all the nasty things
- 53:39that can happen. If alpha is near 0,
- 53:40you have action
- 53:41indifference and helplessness,
- 53:42and that's the correct answer.
- 53:44That's the right thing to do
- 53:45if your value of alpha is so low
- 53:47and you live in a stochastic world.
- 53:49How much rumination should you do?
- 53:51There's some sort of threshold:
- 53:52how much planning you want to do,
- 53:54how much improvement you need to have, is
- 53:56something which again is under your control.
- 53:58Maybe you want to really squeeze
- 54:00out all possibilities.
- 54:01Then you're going to have to do an
- 54:02awful lot of rumination to worry
- 54:04about all the really low probability
- 54:05outcomes that can happen.
- 54:07And then for humans we have this problem
- 54:08that we live in a very complicated world.
- 54:10We can always imagine another
- 54:12catastrophe around the corner.
- 54:13If you pay a lot of attention
- 54:15to low probability outcomes,
- 54:16then you can always invent nasty low
- 54:19probability outcomes that will cause
- 54:20you to have problems.
- 54:22And then, as in the
- 54:23case of the rodents,
- 54:24we can see there's an effect on this
- 54:27exploration-exploitation trade-off,
- 54:28in the sense that the animals that
- 54:29don't explore can't find out about
- 54:31safety, and therefore they
- 54:32will never be able to
- 54:35treat the object in its
- 54:37natural way. Another
- 54:39source of problems and risk in terms
- 54:41of evaluation is that maybe, when
- 54:43we're thinking about this rumination,
- 54:45there are some subjects
- 54:47who try to do this ruminative planning.
- 54:50They try to think: well, OK,
- 54:51if I'm near the aversive object,
- 54:52here's what I would do to get away from it.
- 54:54But it's so aversive to think about it
- 54:56that they never consummate that planning.
- 54:58They never stop doing that
- 55:00planning in this way.
- 55:00And that's an idea that Quentin
- 55:02Huys and I worked on a long,
- 55:04long time ago:
- 55:05this is a sort of internal behavioural
- 55:08inhibition associated with a
- 55:10thought,
- 55:10if you like,
- 55:11about a piece of planning.
- 55:12So maybe that leads you never
- 55:13to consummate the planning,
- 55:14which means you have to do
- 55:15it again and again and again,
- 55:17again leading to a sort
- 55:18of rumination itself.
- 55:19You can also imagine that you don't
- 55:21adjust for luck appropriately.
- 55:22So if you're unlucky, you don't
- 55:24think that
- 55:25you can now afford to be a bit more
- 55:27risk neutral again.
- 55:28So again you'll then have more
- 55:30negative evaluation
- 55:31than you should have. And then, in terms
- 55:33of the environment,
- 55:36maybe the way that you're evaluating risk is
- 55:38not appropriate to the environment you're in.
- 55:39I think one nice way to think
- 55:41about that is in terms of
- 55:43over-generalizing representations.
- 55:43This is something you see again in
- 55:45depression. I've shown you
- 55:46that this sort of evaluation infects states:
- 55:48if you think that something nasty
- 55:49might happen, then the value of that
- 55:51state gets associated with the nastiest
- 55:53thing that can possibly happen.
- 55:54So if you over-generalize
- 55:56your representations,
- 55:56you're putting nice states and
- 55:58nasty states together, and therefore
- 56:00the value of the nasty states
- 56:02infects the values of the nice
- 56:03states you could possibly have.
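To see the mechanics of that infection, here is a minimal sketch in Python, reusing the same kind of worst-tail (CVaR) evaluation as in the earlier sketch; the state values and the 50/50 aliasing are toy assumptions:

```python
import numpy as np

def cvar(values, probs, alpha):
    """Mean of the worst alpha-fraction of a discrete outcome distribution."""
    order = np.argsort(values)
    v = np.asarray(values, float)[order]
    p = np.asarray(probs, float)[order]
    taken, acc = 0.0, 0.0
    for vi, pi in zip(v, p):
        w = min(pi, alpha - taken)
        if w <= 0:
            break
        acc += w * vi
        taken += w
    return acc / alpha

nice, nasty = 5.0, -20.0   # toy values for two physically distinct states

# Fine-grained representation: the nice state is evaluated on its own.
print("nice state alone, alpha=0.1: ", cvar([nice], [1.0], 0.1))              # +5.0

# Over-generalised representation: nice and nasty states share one feature,
# so the feature's value distribution is a mixture of both states' outcomes.
print("aliased feature, alpha=1.0:  ", cvar([nice, nasty], [0.5, 0.5], 1.0))  # -7.5
print("aliased feature, alpha=0.1:  ", cvar([nice, nasty], [0.5, 0.5], 0.1))  # -20.0
```

Under the coarse representation, a risk-averse evaluation of the shared feature is dragged down to the nasty state's value, so the nice state inherits it too.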
- 56:04So lots of things to investigate about
- 56:06risk in the future,
- 56:08hopefully using these different
- 56:10aspects of sequential evaluation.
- 56:11So thank you very much.