# Yale Psychiatry Grand Rounds: "Multiple Steps to the Precipice: Risk Aversion and Worry in Sequential Decision-Making"

## Information

January 19, 2024

Peter Dayan, FRS, Director, Max Planck Institute for Biological Cybernetics

- 00:14And just swap your screen and then we'll be done. Exactly. We have this all nicely prepared, of course. That's OK. Perfect. Super.
- 00:25OK, well, thank you very much indeed. Sorry about that hiccup; nothing is quite as smooth as you hope. Thanks so much for that really generous introduction. It's a really great pleasure and honour to be here. I've followed Phil's work over many years and learned an awful lot from it. So it's really great to be here.
- 00:41So the work I'm going to talk about is joint with a number of people: Chris Gagne, who was a postdoc in Tübingen and now works for a company called Hume in New York; two research assistants in Tübingen, Kevin Shen and Yannick Striker; and I might also talk about some work with two of my other colleagues in Tübingen, Kevin Lloyd and Shin Sui.
- 01:03So to introduce this, imagine the following game. You're controlling this rather crude, refrigerator-like robot here, and your job is to get to this treasure chest, which is worth five points to our subjects. There's a cost for falling into these things, which Chris loves to call lava pits; this is the Iceland version, with the volcanoes. When you try to move north, south, east or west, there are some blockages shown by these brick walls.
- 01:39And there's also a one-eighth chance of an error when you try to move: if you try to go north, there's a one-eighth chance you'll move in one of the other directions instead. And then we have a discount factor to try and encourage you to get to the goal quickly.
- 01:52So the question we pose our subjects is: which route would you take? There are three obvious routes, I think. There's the route that goes down here past all the lava pits to get to the reward, the most direct route. There's an intermediate route which goes around here, passing close to this lava pit but not the main bulk of lava. And then there's this long route that goes all the way around and gets to the goal that way.
- 02:21So we administered this to our subjects in the lab. I promised I wouldn't tell you who they are, because it's kind of revealing about your colleagues when you do this. You can see that the subjects divided into roughly a third, a third, a third, maybe a few fewer in one group. Some people took the very direct route to the goal; another group took the intermediate one, and you can see where they were deviated off the route by the random transitions; and some other subjects went all the way around.
- 02:51And so the question for this talk is: what goes on in terms of evaluating the risk associated with these paths, and how do you make these choices? In this instance, we're very interested in the case where you're not making just a single choice: by committing to this path, you have to adjust yourself to the many steps of risk that you encounter.
- 03:15A lot of the work that we and other people have done in reinforcement learning concerns sequential decision problems, where you don't make only one choice, you make many choices. And when those choices are affected by risk, risk can accumulate along paths in rather interesting ways. That really is the context of my talk: to think about what the consequences of that are, and how we should think about them as a whole.
- 03:38So some of the original thinking about risk actually came from the Bernoullis, thinking about what became known as the Saint Petersburg problem. The way you pose this is: you're tossing a fair coin, and you look at the number of heads that you get before you get a tail. If you get one head before a tail, you get €2, or two monetary units; if you get two heads, you get €4; three heads, €8; and so forth. The question is how much you would be willing to pay me for an instance of this game.
- 04:08The reason why it's a problem, or a paradox, is that the expected value of this sequence of outcomes is actually infinite: with probability one half you get €2, with probability one quarter you get €4, with probability one eighth you get €8, and so forth. So each of these possibilities is worth €1 in expectation, and the sum just goes off to infinity. The expected value is infinity, but most people would be willing to pay only somewhere between €4 and €8 to play a game like this. And so the paradox is to try and understand why.
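The paradox is easy to verify numerically. Here is a minimal sketch (my own illustration, not from the talk): the truncated expected value grows by €1 per additional outcome, while a simulated single play almost always pays out very little.

```python
import random

def st_petersburg_payout(rng: random.Random) -> int:
    """Flip a fair coin repeatedly; the stake starts at 2 and doubles
    with each head, and you walk away with the stake at the first tail."""
    stake = 2
    while rng.random() < 0.5:  # head with probability 1/2
        stake *= 2
    return stake

# The expected value diverges: each possible outcome contributes exactly
# 1 monetary unit, so truncating after N outcomes gives expected value N.
partial_expectation = sum((0.5 ** n) * (2 ** n) for n in range(1, 41))
print(partial_expectation)  # 40.0 after 40 terms; it grows without bound

# Yet a typical single play pays out very little.
rng = random.Random(0)
payouts = sorted(st_petersburg_payout(rng) for _ in range(100_000))
print(payouts[len(payouts) // 2])  # the median payout is tiny (2 or 4)
```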
- 04:52But I think the paradox, or at least the task, becomes sharper when you think of it in the sequential manner in which it could also originally be posed. So here you're tossing the first coin, and at stake is €2. If you get a tail, you walk away with just the €2. On the other hand, if we're lucky, we get a head; this is the world's smallest gold coin, a Swiss coin with Einstein on it. Then the stake becomes €4, and again you're tossing the coin, thinking about what's going to happen: a head or a tail? If I'm lucky, I'll get a head, and then the stake becomes €8, and so forth; and then you get a tail, and in this instance you'd walk away with the €8. So you can imagine that essentially more and more money is at stake as you go.
- 05:42I'm sure many of you are familiar with the Balloon Analogue Risk Task, the BART, which has something very similar: you're pumping up a balloon, and you know that at some point one pump is going to make it burst and you lose everything, and the question is when you quit. In the Saint Petersburg problem, you have to pay before you ever start.
- 06:01OK. So the plan for the talk is to talk a bit about risk aversion in general and how it comes up; to talk about a measure of risk which I think is particularly useful for the sort of work that we do, and which I think also applies in animal cases (I'll give you a little example of that at the end of my talk, if I have time); to talk about tail risk in sequential problems; and then to talk about risk-averse online behaviour, thinking about our subjects making their choices in that little maze with the robot and the lava pits.
- 06:37I'll then say a word about risk-averse offline planning. The idea is that if you're in an environment which is replete with risk, then maybe there are things you can do ahead of time to try and mitigate it; maybe that changes the way you go about thinking about aspects of the world, doing some offline planning to prepare yourself correctly. We'll think about what that looks like in the context of risk aversion and risk sensitivity.
- 07:02And then, as I say, if I have a chance, I'll say a word about some modelling we've done of some lovely data on how mice do apparently risk-sensitive exploration, from Mitsuko Watabe-Uchida's work at Harvard.
- 07:20OK, so: decision-making and risk. As you all know, risk is a critical aspect of decision-making, and it comes up any time we have uncertain or probabilistic outcomes. Here, in Saint Petersburg, we're spinning a coin; in other contexts we have other ways of generating these probabilities.
- 07:41Obviously whole industries have been designed around it, things like insurance markets; this is a little picture of the famous Lloyd's of London. And I think risk likely plays a very crucial role in many aspects of psychopathology, something that has been studied by very many groups, including obviously people working in Yale too.
- 08:01So things like anxiety and mania are obviously about what might happen; you'd see it in OCD as well, something that Phil has actually worked on too; and you also have this notion of ruminative "what ifs". In the complex world that we occupy, there are many risks that come with very low-probability events. Cars swerve on the ice (it was very icy in Tübingen this morning), so you can imagine that when you're walking on the pavement there is a chance that something nasty can happen. If you pay a lot of attention to these very low-probability outcomes, then of course that's going to be problematical for your expectations about what might happen. And when you commit to a long series of choices then, as I said, you have to worry about how risk accumulates along those paths.
- 08:56Risk has been beautifully studied using single-shot gambling paradigms. Here's a classic example where you have a choice between a sure $5 and a 50:50 chance of $10, or in this case $16. There are many such paradigms; obviously a lot of that work has been done in Yale, and Ifat has done a lot of beautiful work along these lines too. But what we want to look at is sequential problems, not only single-shot gambles, and we'll see how that comes out.
- 09:27So in order to make progress, we have to define what measure of risk we're going to use. There are a number of measures that have been studied in the literature. Prospect theory, for instance, very famously gives us ways of thinking about how to combine utilities and probabilities in these risky cases. But there's also a lot of work from the insurance industry, which of course has been worried about many aspects of risk for a long time, and in a very quantitative way; the mathematical side of that field has come up with ideas about how to systematize risk.
- 10:04One of the systematic ways they think about it is in terms of tail events. Here we think of the distribution of possible returns as just some sort of histogram, and then the risks we worry about, the risks we care about, are those found in the lower tail: the nastiest things that can happen. Many of you will know, for instance, of Markowitz-style utilities, where you combine the mean with some fraction of the variance; but the variance includes not only the lower tail but also the upper tail, reflecting the whole structure of the distribution, whereas the things we worry about are the tail risks, the nastiest things that could possibly happen. That's natural in medicine, finance, and engineering, and maybe also in things like predation for animals.
- 10:51So how does that work? Let me illustrate with our very simple case, the Saint Petersburg problem. What I'm now showing you is all the outcomes, weighted by their probabilities: a probability of a half for €2, and so on down to vanishingly small, with the average outcome worth infinity. Thinking about the tail, what we might do is choose, in this instance, the lower 7/8 of the distribution: just these three dark blue bars. That cuts off the upper 1/8 of the distribution, which contains all the much nicer outcomes you could possibly have.
- 11:38The value of the outcome defined by this lower 7/8 tail is a quantile, just the 7/8 quantile of the distribution, and that is a risk measure itself, called the Value at Risk, or VaR, shown here. It turns out that the Value at Risk doesn't satisfy some of the nice properties that we expect from the insurance industry, nicely worked out by Artzner and colleagues, and by Rockafellar and Uryasev, among many others.
- 12:00But a measure which also looks at the lower tail and does satisfy these axioms is the Conditional Value at Risk, or CVaR, which is simply the average value in that lower tail. The idea is: I'm worried about the tail, and we have an alpha value saying which tail I'm worried about; here, the 7/8 tail. If it's the 100% tail, it's just the whole distribution. Here, with the 7/8 tail, I've cut off all the really nice outcomes and I'm left only with the nastiest ones. As alpha gets more extreme, I consider less and less of the distribution: more and more, only the nastiest things that can happen are the things I imagine happening. The average value in that tail defines the Conditional Value at Risk, the CVaR, itself.
- 12:44So how does that look as we reduce alpha? At alpha equals one we have the whole distribution: that's infinity. If alpha is 15/16, we keep just these four bars; at 7/8, three bars; at 3/4, these two bars; and at alpha equal to 1/2 we have just this one bar left. So as alpha gets smaller, we're getting more and more risk averse: we're thinking only about the lower tail of the outcomes we could possibly have.
- 13:10So formally, you can write CVaR down as the expected value in this lower tail: the expected value beneath the alpha-quantile of the distribution.
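As a concrete check on these numbers, here is a small sketch (my own illustration; `cvar_lower_tail` is a hypothetical helper name) that computes the lower-tail average for a truncated Saint Petersburg distribution. At alpha = 1/2 it returns €2, the single worst bar; at alpha = 7/8 it averages the three dark blue bars to 24/7, about €3.43.

```python
from fractions import Fraction

def cvar_lower_tail(outcomes, alpha):
    """CVaR_alpha: the expected value of the lowest-valued alpha
    fraction of a discrete distribution.

    `outcomes` is a list of (value, probability) pairs."""
    alpha = Fraction(alpha)
    remaining = alpha          # probability mass still to include
    total = Fraction(0)
    for value, prob in sorted(outcomes):        # worst outcomes first
        take = min(Fraction(prob), remaining)   # clip at the alpha-quantile
        total += take * value
        remaining -= take
        if remaining == 0:
            break
    return total / alpha       # average over the tail, not the whole

# Truncated Saint Petersburg distribution: €2 w.p. 1/2, €4 w.p. 1/4, ...
st_pete = [(2 ** n, Fraction(1, 2 ** n)) for n in range(1, 21)]

print(cvar_lower_tail(st_pete, Fraction(1, 2)))   # 2    (the worst bar only)
print(cvar_lower_tail(st_pete, Fraction(7, 8)))   # 24/7 (the three bars)
```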
- 13:22But there's another way of thinking about this, exactly the same calculation, almost like a dual view, which also relates to the way prospect theory handles probabilities: what's called a probability distortion function. Here I've also written down explicitly the probabilities of the outcomes: a half, a quarter, and so forth. What you do with probability distortion is to say: I'm allowed to change the probabilities, boosting those of the nastier outcomes and suppressing those of the nicer ones; and the idea inside the Conditional Value at Risk is that there's a maximum possible distortion.
- 14:08So if my alpha value is 7/8, which means I'm interested in the bottom 7/8 of the distribution, then I'm allowed to multiply the probabilities of my nastiest outcomes by 8/7, by 1 over alpha. I keep doing that until I run out of road, until I run out of probability mass, because in the end it still has to be a probability distribution. So in this instance I multiply these outcomes by a weighting factor of 8/7 until I run out of road, and that leaves only these three bars contributing to my values. You can see that this is exactly equivalent to the three bars we had for the tail average. So these are equivalent ways of thinking about the effect of these tails, and they're both, I think, very useful constructs for thinking about these nasty possible outcomes.
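The distortion view can be checked numerically too. A sketch under the same assumptions as before (illustrative helper name, truncated distribution): boost the probabilities of the worst outcomes by at most 1/alpha until the probability-mass budget runs out, then take the ordinary expectation; it reproduces the same 24/7 as the tail average.

```python
from fractions import Fraction

def distorted_expectation(outcomes, alpha):
    """Dual view of CVaR_alpha: reweight the worst outcomes' probabilities
    by at most a factor 1/alpha, spending the unit probability budget from
    the bottom up, then take the mean under the distorted distribution.

    `outcomes` is a list of (value, probability) pairs."""
    alpha = Fraction(alpha)
    budget = Fraction(1)       # distorted probabilities must still sum to 1
    total = Fraction(0)
    for value, prob in sorted(outcomes):              # worst outcomes first
        weight = min(Fraction(prob) / alpha, budget)  # boost by <= 1/alpha
        total += weight * value
        budget -= weight
        if budget == 0:        # ran out of road: nicer outcomes get weight 0
            break
    return total

st_pete = [(2 ** n, Fraction(1, 2 ** n)) for n in range(1, 21)]
print(distorted_expectation(st_pete, Fraction(7, 8)))  # 24/7, same as the tail average
```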
- 15:06OK, so just to summarise on CVaR: it's what's called a coherent risk measure; those are the axioms I was referring to that we want from insurance, which have to do with things like wanting the risk to decrease if you diversify your assets, a property the plain Value at Risk does not have. It emphasises the lower tail, so we're always interested in the nasty things that can happen. If alpha is one, it's the regular mean; we just think about the overall mean of the distribution, which was infinity here. As alpha tends to zero, we only care about the worst possible case, the minimum that can happen. And we have this nice equivalence to probability distortion measures in which we overweight the bad outcomes.
- 15:49OK,
- 15:49so that's when we can see the whole
- 15:51distribution in front of us like
- 15:52you have in a regular gambling case.
- 15:54You know if you're just specify
- 15:56that what happens if we the way
- 15:58we started thinking about this was
- 16:00to think about the sequential case
- 16:02where we spin the coin and then we
- 16:04either get it either get a head or
- 16:06tail and then we can spin the coin again.
- 16:08So how does that work in this in this domain?
- 16:10And you'll see a sort of surprise comes
- 16:12up that we then have to cope with.
- 16:14So here we started off with the first
- 16:17flip of the coin and so these you know
- 16:19if we get the tail we get to €2.00,
- 16:21we get the head, we get a chance
- 16:23to carry on to know and then we get
- 16:25to chances to spin the coin again.
- 16:27So and then if you spin the coin
- 16:29again you get to know again if
- 16:31you get a tail you get €4.00.
- 16:33If you get the head,
- 16:33you get, excuse me,
- 16:34the chance to spin the coin again,
- 16:36You spin the coin again,
- 16:37you get €8 and then and so forth and
- 16:40just carries on down and down and down.
- 16:42So, as I mentioned, what we want to do when thinking about the risk is to distort our probabilities. We start at the beginning and say: if alpha is 7/8, we get to distort the probabilities by at most 8/7. That means we make it more likely to get the tail and less likely to get the head: the left bar becomes slightly higher and the right bar slightly lower. That's our distortion. Our risk sensitivity says: even though the real answer is 50:50, in our subjective evaluation we boost the nasty outcome and slightly suppress the nice one, with the amount of suppression set so that the probabilities still sum to 1.
- 17:34So you might think the natural thing would be: now we have another choice, and we do the same distortion again, and then again, and so forth. That does actually generate a version of CVaR, but it doesn't generate the version of CVaR that we started off thinking about.
- 17:52Remember, what we said we wanted was to look only at the lowest tail. But if we just keep distorting by the same fraction every single time, then instead of slicing off the tail like that, we actually get a contribution from all the possible outcomes. Now, though, instead of the probabilities going down like 1/2, 1/4 and so forth, they go down like 3/7, (3/7) squared, and so forth; there's a sort of technical reason for that. You can see that this doesn't have the property I talked about, in which we just slice off the bottom of the distribution. It is a risk measure that we could also use, and in fact in many cases it's a more severe risk measure. But the measure we wanted instead requires us to do a different sort of calculation, which I think is really important for thinking about how risk processing works in this sequential setting.
- 18:50thinking about how risk processing
- 18:51works in this this sequential way.
- 18:53So instead what happens is after we've,
- 18:57after we've boosted the, after we,
- 18:58we're lucky and we we got ahead.
- 19:00At this point, if you think about it,
- 19:02we're trying to accumulate the
- 19:04amount of luck that we can have
- 19:06over a whole sequence of choices.
- 19:07This is the sequential aspect.
- 19:09And if we start off and we're already lucky,
- 19:12it means we've already consumed
- 19:13some of our good luck.
- 19:14Which means that now we have to be a
- 19:16little bit more risk averse in the
- 19:18future in order that the total amount
- 19:20of luck that we're expecting to get or
- 19:22that good or bad luck we're expecting
- 19:24to get is pegged to right at the beginning.
- 19:27So that means that now
- 19:29having been this much risk,
- 19:30having been this lucky in this case,
- 19:32we got our first tail,
- 19:34we got Einstein first,
- 19:35we now have to be a more risk averse.
- 19:39So alpha started out at 7/8 and now it
- 19:42turns out that it has to be boosted.
- 19:44It has to be.
- 19:45The amount of risk aversion
- 19:46has to be boosted,
- 19:47which means that the alpha value
- 19:49decreases from being 7/8 to being 3/4.
- 19:52So now when we do our probability distortion,
- 19:55we're now we distort the we now make
- 19:58it even more likely now with Four
- 20:00Thirds more likely rather than rather
- 20:02than 8 sevenths more likely that we're
- 20:04going to get the unfortunate outcome,
- 20:06which is the the the the tail in this case,
- 20:10and we make it less likely that
- 20:11we're going to get the head.
- 20:13And now if we do get the head,
- 20:14we've been lucky again.
- 20:16We've consumed even more of our good luck.
- 20:18And so now the we become even
- 20:20more risk averse.
- 20:21The alpha value goes down further to 1/2.
- 20:25And so now when we do the distortion
- 20:26it turns out we do maximal distortion.
- 20:28So now the tail instead of being
- 20:31probably 5050 in our minds it's gone
- 20:34up to the probably has gone up to 1.
- 20:36The probably getting the head,
- 20:37the sorry the probably getting
- 20:38the head has gone to zero.
- 20:40And that is then means that we
- 20:41therefore can never get the,
- 20:42we never get any more further down the tree.
- 20:45And so in order to compute the
- 20:48Sivar in this way,
- 20:50when we think about a sequential problem,
- 20:52we have to keep on revaluing our alphas.
- 20:55If we're lucky,
- 20:56it means we become more risk averse,
- 20:58which means alpha gets lower.
- 21:00If we're unlucky,
- 21:00it means in fact we can become
- 21:02more risk seeking in the future
- 21:04because we're sort of
- 21:05trying to peg the total amount of risk
- 21:07that we suffer along the whole path
- 21:09along the way towards towards the end.
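The alpha bookkeeping just described can be sketched in a few lines. This is an illustrative reconstruction, not the speaker's code: `head_alpha_after` is a hypothetical helper, and the update rule (the new alpha is the old alpha times the distortion weight the head received) is inferred from the worked example of 7/8 going to 3/4 and then 1/2.

```python
from fractions import Fraction

def head_alpha_after(alpha):
    """After a lucky head, the new risk level is the old alpha times the
    distortion weight given to the head. With a fair coin and the tail
    boosted by the maximal factor 1/alpha, the head's weight must keep
    the distorted probabilities summing to one:
        0.5 * (1/alpha) + 0.5 * w = 1   =>   w = 2 - 1/alpha."""
    alpha = Fraction(alpha)
    tail_weight = min(1 / alpha, Fraction(2))  # distorted tail prob is capped at 1
    head_weight = 2 - tail_weight
    return alpha * head_weight

# Starting at alpha = 7/8, each lucky head makes us more risk averse:
alphas = [Fraction(7, 8)]
for _ in range(3):
    alphas.append(head_alpha_after(alphas[-1]))
print(alphas)  # alpha shrinks 7/8 -> 3/4 -> 1/2 -> 0, as in the talk
```

At alpha = 1/2 the tail weight hits its cap of 2, the head's weight falls to zero, and alpha drops to 0: subjectively, the head can no longer happen, exactly the "running out of road" the talk describes.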
- 21:12So there's this notion here of pre-commitment. When we start the problem, we decide how much risk we're willing to endure; then, as we turn out to be lucky or unlucky, we have to adjust the way we evaluate future outcomes. In precommitted CVaR we're privileging a start state, saying: this is where we're defining risk from, because we're then revaluing our alpha, our risk aversion, in order to peg the total. You might think of that start as being like a home or a nest for an animal, for instance.
- 21:47And then we have to change alpha, and the way we change it is like a justified form of the gambler's fallacy. If you've been unlucky for a while, then in some sense you can be a little more risk seeking, meaning less risk averse. If you've been lucky, then you're expecting to be more unlucky in the future, so your alpha decreases, in order to peg the total amount of risk you have along a whole path.
- 22:13Alpha equals zero and one are special. Alpha equals one is just the mean, and then you never revalue it: you keep alpha equal to one throughout. Alpha equals zero is the minimum, and you stick with that too, because you've run out of road: you're always thinking about the worst possible outcome that can ever happen. So to do this you either have to monitor how much luck you've had along a path, or you just think about changing the value of alpha as you go along, the way I showed you for the Saint Petersburg problem, where we made alpha smaller and smaller because we kept on being lucky, every time we got a head, until in the end we ran out of road; in that evaluation, at the third outcome.
- 23:05So how does this look in a more conventional sort of random walk? Here's a simple random walk where we have an agent which can go left or right, or try to stay where it is. There are two rewards: one on the right-hand side, a small reward worth +1, and one on the left-hand side worth +2. And then here's one of Chris's lava pits, which is threatening, and again there's a small probability of an error in the choices.
- 23:31So here, if you have completely uniform choice, going left, right, or trying to stay where you are equally often, then from this start state, this is the distribution of outcomes you would actually get, with a discount factor of 0.9; in the end you get trapped by the lava pit, and that's the end of the game. And since we're thinking about CVaR, we're obviously thinking about the tails of this distribution.
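A rough simulation conveys the idea. Almost everything in this sketch is an assumption rather than something read off the slide: the chain length, the lava penalty of -10, the slip probability, and the convention that reward states pay out on each visit while only the lava pit ends the episode. Only the reward sizes (+1 and +2), the lava pit, and the 0.9 discount come from the talk.

```python
import random

LAVA = 0                     # absorbing lava pit at the far left (assumed)
REWARD = {1: 2.0, 6: 1.0}    # +2 near the lava, +1 at the far right (assumed)
N_STATES = 7
GAMMA = 0.9                  # discount factor from the talk
SLIP = 0.125                 # chance the chosen move is replaced at random (assumed)

def rollout(start, rng, max_steps=400):
    """Discounted return of the uniform policy until the lava pit
    absorbs the agent (or a step cap is reached)."""
    state, total = start, 0.0
    for t in range(max_steps):
        move = rng.choice([-1, 0, +1])        # uniform policy
        if rng.random() < SLIP:
            move = rng.choice([-1, 0, +1])    # occasional slip
        state = min(max(state + move, 0), N_STATES - 1)
        if state == LAVA:
            total += (GAMMA ** (t + 1)) * -10.0
            break
        total += (GAMMA ** (t + 1)) * REWARD.get(state, 0.0)
    return total

def empirical_cvar(samples, alpha):
    """Average of the worst alpha-fraction of sampled returns."""
    tail = sorted(samples)[: max(1, int(alpha * len(samples)))]
    return sum(tail) / len(tail)

rng = random.Random(1)
returns = [rollout(3, rng) for _ in range(20_000)]
mean = sum(returns) / len(returns)
# The lower-tail average can only be worse than the mean, and it
# worsens further as alpha shrinks toward the worst case.
print(mean >= empirical_cvar(returns, 0.5) >= empirical_cvar(returns, 0.1))  # True
```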
- 24:02So how can we evaluate the
- 24:03locations in this in this world?
- 24:05Well, if you have the this uniform
- 24:08policy and here our alpha value is 1.
- 24:11So we're just a regular reinforcement
- 24:13learner thinking about the average
- 24:14value of each of the states.
- 24:16So you can see that here I've shown
- 24:17them in colour from -10 up to plus 10.
- 24:19So the ones on the right are relatively
- 24:21good because you have this reward of
- 24:23one it you tend to a while before
- 24:24you you end up in the in the lavapia,
- 24:26which means that that value
- 24:28is discounted by a lot.
- 24:30If alpha is 0, you always think the worst
- 24:32possible thing that can happen will happen.
- 24:35The way I'm showing you that
- 24:36is with these grey arrows here.
- 24:40Inside the choices,
- 24:43it says how frequently you try to go left,
- 24:46right, or stay where you are.
- 24:49The reweighting system says,
- 24:51well, I'm going to think about the
- 24:52worst possible outcome,
- 24:54because my alpha is 0, and that puts all
- 24:56the weight on going left, because the
- 24:58nastiest thing that can happen is going left.
- 25:00And so here you can see that all
- 25:02the values are then much, much worse,
- 25:04and indeed you then just go left
- 25:05every time and end up in the lava pit.
- 25:08And then for intermediate
- 25:10values of alpha,
- 25:13you can see how states get evaluated.
- 25:15And again you can see this effect
- 25:17I mentioned: if you are lucky,
- 25:20which in this instance
- 25:21means you're going right,
- 25:22because right states are better,
- 25:24then you tend to decrease your
- 25:26value of alpha.
- 25:26So these little grey arrows,
- 25:29outside the choices that you make,
- 25:31tend to point downwards.
- 25:33If you're unlucky,
- 25:35which in this instance means going left,
- 25:37then you become a bit less risk averse,
- 25:40which means that the arrows
- 25:42then point upwards.
- 25:43And so you can see that as we become
- 25:45more and more risk averse, lowering this
- 25:46alpha value, we have this very nice
- 25:48way of looking at how states,
- 25:50for instance on the right,
- 25:52go from being good to being bad.
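The grey arrows describe a reweighting of the choice probabilities. One way to sketch that reweighting (an illustration of my own, not the authors' implementation) is via the dual form of CVaR for a discrete distribution: each outcome's probability can be inflated by a factor of at most 1/alpha, filling up from the worst outcome, so that as alpha shrinks all the weight piles onto the nastiest outcome. The three outcomes and their values below are hypothetical.

```python
import numpy as np

def cvar_weights(probs, values, alpha):
    """Distortion weights for discrete CVaR: multiply each outcome's
    probability by at most 1/alpha, filling up from the worst outcome,
    so the weighted mean equals the lower-tail expectation."""
    order = np.argsort(values)            # worst outcome first
    w = np.zeros_like(probs, dtype=float)
    mass = alpha                          # tail mass still to be covered
    for i in order:
        take = min(probs[i], mass)
        w[i] = take / (alpha * probs[i])  # multiplier on the original prob
        mass -= take
        if mass <= 0:
            break
    return w

# three moves -- left (nasty), stay, right -- each tried with probability 1/3
probs  = np.array([1/3, 1/3, 1/3])
values = np.array([-10.0, 0.0, 2.0])

w = cvar_weights(probs, values, alpha=0.25)
print(w)                           # all the weight piles onto the worst move
print(np.sum(w * probs * values))  # the CVaR itself: -10.0
```

At alpha = 1 the weights are all 1 (the plain expectation); at small alpha only the worst move matters, which is exactly the "arrows pointing at the lava" picture.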
- 25:58So you don't only have to
- 26:00think about evaluation here;
- 26:02you can also optimise your policy
- 26:04based on your risk aversion.
- 26:06You try to optimise, say, the
- 26:09policy which maximises this
- 26:11pre-committed CVaR (pCVaR) value
- 26:13with a given value of alpha.
- 26:16So if your alpha is 1,
- 26:24you're not risk averse at all;
- 26:25you're just thinking about the mean.
- 26:26We designed it such that,
- 26:28from the start state here,
- 26:29if alpha equals 1,
- 26:30the best thing you can do is just
- 26:32to go left and try to stay
- 26:35at the reward worth 2
- 26:37as long as you can, and that's then
- 26:39a way of maximizing your reward.
- 26:41If alpha equals zero,
- 26:44well, it actually doesn't
- 26:45matter at all what you try to do,
- 26:47because there's a chance that
- 26:49if you try to stay where you are,
- 26:51you'll nonetheless go left.
- 26:52If you think about the worst outcome,
- 26:54it's always to go left.
- 26:55And so you can see that at
- 26:57alpha equals 0,
- 26:58the optimum policy is just the
- 26:59same as the uniform policy, or
- 27:01any other policy as well:
- 27:02you'll always go left.
- 27:03So in fact this is sort of a form
- 27:05of learned helplessness, where
- 27:07although you really have some control
- 27:10in this world, because you think about
- 27:12the worst thing that could happen,
- 27:14you sort of don't trust your own control.
- 27:17And therefore you think the worst
- 27:18thing that could happen will happen,
- 27:20and therefore it doesn't
- 27:21matter what you do:
- 27:23there's nothing you can do to
- 27:25mitigate that chance.
- 27:26Then in the middle — the pre-commitment,
- 27:30remember, is relative to a start state —
- 27:32so here our start state is this one,
- 27:34at alpha equals 0.3, and you
- 27:36can see again that now we have a
- 27:39policy where, in
- 27:41this particular domain, the optimal
- 27:43policy at that start state is to
- 27:45go right rather than to go left,
- 27:47because of the problems of the risk.
- 27:48So that's what you try to do,
- 27:52and then you try to
- 27:54stay here as long as you can.
- 27:55And so you can see that,
- 27:56as you might expect, everywhere
- 27:59else in this random walk,
- 28:02apart from the value alpha equals zero,
- 28:04you have a better outcome:
- 28:09all the values of the optimum
- 28:10policy are much better than the
- 28:12values of the uniform policy here,
- 28:14except at this nastiest
- 28:16possible degree of risk aversion,
- 28:19where you just think whatever terrible
- 28:20thing can happen will happen no matter what.
- 28:25I should just say:
- 28:26there's also nCVaR,
- 28:28this other mechanism, which doesn't
- 28:30pre-commit to a value but instead
- 28:32just sticks at a particular
- 28:34value of alpha the whole time.
- 28:36That's what I showed you in
- 28:38the Saint Petersburg paradox, where
- 28:39you just weighted the heads and
- 28:42tails the same way every single time.
- 28:43So in this domain, it
- 28:45turns out that for alpha equals 1
- 28:47it's the same as pCVaR, which
- 28:49is just the mean; for alpha equals 0,
- 28:51again it just focuses on the minimum,
- 28:53the worst thing that can happen, and
- 28:55so it also looks the same. But
- 28:58for intermediate values,
- 28:59you can again
- 29:02get evaluations of states.
- 29:04And in this instance it turns out
- 29:06that this nCVaR mechanism
- 29:08is generally more risk averse,
- 29:12so the values are worse than
- 29:14the values for pCVaR.
- 29:16That's not true in the Saint
- 29:18Petersburg paradox, because in
- 29:19that problem the only way you get
- 29:20to carry on is by being lucky,
- 29:22whereas in this problem you can be
- 29:24lucky or unlucky as you carry on.
- 29:26And in pCVaR, if you're
- 29:28unlucky, then you become
- 29:30less risk averse.
- 29:32Whereas in the Saint Petersburg
- 29:34paradox, or in the balloon (BART) task,
- 29:37every time you continue you must
- 29:39have been lucky, and therefore you
- 29:41become more risk averse; so
- 29:43relatively, there's
- 29:46a greater degree of risk aversion
- 29:47in the Saint Petersburg paradox,
- 29:48whereas in these sorts of other problems,
- 29:50nCVaR is generally more risk averse.
- 29:53In these sorts of cases, you see
- 29:55that by these values all being more
- 29:58red than the other ones.
- 30:00And then you can work out that
- 30:02the optimal policy has
- 30:04similar characteristics.
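To make the nested recursion concrete, here is a toy sketch of nCVaR policy evaluation (my own construction, not the talk's actual random walk): the expectation in each Bellman backup is replaced by a CVaR at the same fixed alpha, so alpha = 1 gives ordinary policy evaluation and small alpha gives near worst-case values. The chain layout, rewards, and discount are hypothetical.

```python
import numpy as np

def cvar_discrete(probs, values, alpha):
    """Lower-tail expectation of a discrete distribution of outcomes."""
    order = np.argsort(values)          # worst outcome first
    mass, total = alpha, 0.0
    for i in order:
        take = min(probs[i], mass)
        total += take * values[i]
        mass -= take
        if mass <= 0:
            break
    return total / alpha

def ncvar_evaluate(alpha, gamma=0.9, n=5, iters=500):
    """Nested-CVaR evaluation of a uniform policy on a toy chain:
    state 0 is a terminal 'lava pit' worth -10, state n-1 pays +1
    per step, and left/stay/right are equally likely."""
    V = np.zeros(n)
    for _ in range(iters):
        new = V.copy()
        for s in range(1, n):           # state 0 is absorbing
            succ = [s - 1, s, min(s + 1, n - 1)]
            r = 1.0 if s == n - 1 else 0.0
            outcomes = np.array([-10.0 if s2 == 0 else r + gamma * V[s2]
                                 for s2 in succ])
            new[s] = cvar_discrete(np.full(3, 1/3), outcomes, alpha)
        V = new
    return V

print(ncvar_evaluate(alpha=1.0))    # ordinary expected values
print(ncvar_evaluate(alpha=0.01))   # near worst-case: everything looks dire
```

Because CVaR of a distribution is never larger than its mean, every state's nCVaR value at small alpha is at most its risk-neutral value; the state next to the lava pit collapses all the way to -10.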
- 30:05OK,
- 30:06so let's come back to our lava pits,
- 30:13where we gave
- 30:14our subjects this chance:
- 30:16we showed them this and
- 30:17asked them how they would move.
- 30:19And we designed this domain so
- 30:20that it would start to distinguish
- 30:22different values of alpha,
- 30:23so different values of risk aversion, as
- 30:25a way of interrogating what subjects
- 30:27would be like in these cases.
- 30:30So it turns out that this most direct
- 30:33path is associated with alpha equals 1.
- 30:36So if you are risk neutral, then you would
- 30:39take this rather risky path.
- 30:42If your value of alpha is about 0.5,
- 30:45which means you just think about the
- 30:47bottom 50% of that distribution,
- 30:48then you tend to take this
- 30:52intermediate path, and if you're
- 30:54much more risk averse —
- 30:55you care about the bottom 15%
- 30:56of the outcomes —
- 30:58then you take this much more
- 30:59extreme risk-averse path here.
- 31:01And I think it's interesting,
- 31:02as one of these cases
- 31:04where it's very hard when you see how
- 31:06somebody in your lab performs this.
- 31:08If you're a sort of 0.4-alpha person,
- 31:11it's very hard to imagine somebody
- 31:12who would be so risk
- 31:14seeking as to take the very short path.
- 31:16Or if you're the person who takes this
- 31:17very long path, you think,
- 31:19how could anybody take
- 31:20these short paths?
- 31:22So I think there are some interesting
- 31:24phenomena that come up with this.
- 31:26So we administered 30 mazes
- 31:30like this to a group of subjects,
- 31:33and we designed them in order
- 31:35to look at
- 31:36things like how consistent an
- 31:38individual subject was in the way that
- 31:40they would be risk averse
- 31:41in these domains.
- 31:43And we saw a very nice
- 31:45degree of consistency.
- 31:48So here
- 31:49you can see another of these
- 31:50mazes, where the start state is here,
- 31:52the goal is here.
- 31:53And so again we have a
- 31:56path which people who are
- 31:59pretty risk neutral would take, which
- 32:01gets close to these two lava pits.
- 32:03You have this intermediate path,
- 32:04which is longer,
- 32:05which is why it would be less favoured,
- 32:07but only goes close to one
- 32:08of these lava pits.
- 32:09And then we have an even
- 32:11longer path,
- 32:12which goes all the way around here
- 32:14to get to the goal, and which really
- 32:16avoids these lava pits dramatically.
- 32:17And so these are three individual
- 32:20subjects, and these choices
- 32:22were themselves associated with
- 32:23three different values of alpha,
- 32:26like 0.2, 0.5 and 0.9 or so.
- 32:29And then here's the behaviour of the same
- 32:33subjects in a different maze.
- 32:34So this is a bit like a cliff:
- 32:36there are just two lava pits here,
- 32:38and the question is how far
- 32:40around them do you go.
- 32:41So one option is just to go directly
- 32:43from the start here to
- 32:45the goal; that's the most risk neutral.
- 32:48Here's one which is a
- 32:49bit more risk averse.
- 32:50You can think about how far away from
- 32:52the cliff you would
- 32:54choose to be yourself.
- 32:55And again,
- 32:56it's very hard if you're a sort
- 32:57of risk neutral person to think,
- 32:59well,
- 32:59how crazy is it to go so far
- 33:01away from the goal?
- 33:03We took these 30 mazes that we administered,
- 33:06looked at the first half and
- 33:08the second half, and inferred the
- 33:09values of alpha that our subjects
- 33:11had for those mazes by
- 33:14fitting the choices that they made.
- 33:16And you can see that we had a reasonable
- 33:18degree of consistency between the
- 33:19first 15 mazes and the second 15 mazes.
- 33:21So this shows the MAP,
- 33:25the maximum likelihood
- 33:27alpha value, for the first
- 33:28and second half of mazes.
- 33:30So we see that they are reasonably
- 33:32well pinned down, and indeed the means
- 33:34are fairly similar. And then we can
- 33:36look across all our subjects.
- 33:39So now this axis shows
- 33:40you the value of alpha.
- 33:41This is the
- 33:45posterior value of alpha
- 33:46across all the trials we have,
- 33:48from a hierarchical fit.
- 33:50And then we just ordered the subjects
- 33:51from the people with the
- 33:53smallest value of alpha to the people
- 33:55with the largest value of alpha.
- 33:56And you can see that we nicely
- 33:58cover the range of possible alphas
- 33:59in this domain, and for some
- 34:01people we can't infer alpha so well
- 34:03just from these plots.
- 34:05And then, in
- 34:07order to fit their behaviour,
- 34:09we have a couple of other statistics as well.
- 34:12They have a temperature —
- 34:13or an inverse temperature —
- 34:14which is how noisy
- 34:16their behaviour is generally,
- 34:18and then a lapse rate, which says that
- 34:21we imagine they might try to go north,
- 34:22but perhaps they just,
- 34:23you know, by mistake,
- 34:24go in a different direction.
- 34:25So these are very standard things
- 34:27you'd have in a model of their behaviour.
- 34:29But the thing we're focusing on
- 34:31indeed is this risk sensitivity,
- 34:32which is shown as a histogram of the
- 34:34values that we can infer,
- 34:38nicely arrayed across the different
- 34:40possible values of alpha, as you can see.
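The behavioural model described here — a softmax over action values with an inverse temperature plus a lapse rate — can be sketched as follows. This is an illustrative stand-in (the numbers and function names are mine), and the real fit also infers alpha, hierarchically, through the CVaR-derived values.

```python
import numpy as np

def choice_probs(q_values, beta, lapse):
    """Softmax over action values with inverse temperature beta, mixed
    with a uniform lapse: with probability `lapse` the subject simply
    picks one of the actions at random."""
    z = beta * (q_values - np.max(q_values))   # stabilised logits
    p = np.exp(z) / np.sum(np.exp(z))
    return (1 - lapse) * p + lapse / len(q_values)

def neg_log_likelihood(q_per_trial, choices, beta, lapse):
    """Summed negative log-likelihood of the observed choices -- the
    quantity minimised when fitting beta and lapse (and, through the
    action values, the risk sensitivity alpha)."""
    nll = 0.0
    for q, c in zip(q_per_trial, choices):
        nll -= np.log(choice_probs(q, beta, lapse)[c])
    return nll

q = np.array([1.0, 0.0, -2.0])                 # hypothetical action values
print(choice_probs(q, beta=2.0, lapse=0.05))
print(neg_log_likelihood([q, q], [0, 2], beta=2.0, lapse=0.05))
```

The lapse term guarantees every action has probability at least `lapse / n_actions`, which keeps the likelihood from being destroyed by a single uncharacteristic move.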
- 34:44So we then try to interrogate our
- 34:47mechanism for changing values of alpha.
- 34:49And here we had what to us was a bit of
- 34:52a surprise in terms of what happened.
- 34:54So here what we're looking at is how
- 34:57alpha changed if on one trial,
- 35:00one maze,
- 35:01you got a win
- 35:04or a loss.
- 35:06So what this shows is:
- 35:08if we infer the
- 35:10value of alpha on one maze,
- 35:12and you won on that maze,
- 35:13what happens to the next value of alpha?
- 35:15Are you more risk averse or
- 35:17more risk seeking in that case?
- 35:19From the pCVaR
- 35:20mechanism I talked about,
- 35:21what we would have expected is, if you were
- 35:24lucky on a maze,
- 35:26you'd become more risk averse next.
- 35:27What we actually saw was
- 35:29the opposite, interestingly, which
- 35:30is that after a lava pit —
- 35:32so after you got trapped
- 35:35in one maze — in fact you became a
- 35:38bit more risk averse in the next maze.
- 35:42contemplating why that might be.
- 35:43We did see A and and and we are also
- 35:46looking inside the choices you make
- 35:48inside a single maze because if you
- 35:51remember we have noisy actions so
- 35:53sometimes you're lucky or unlucky
- 35:55inside a single maze and they do see
- 35:57APC bar like effect which is that if
- 35:58you've been lucky then in the future
- 36:00you're more a little bit more risk
- 36:02averse and if you've been unlucky
- 36:03you've been a little bit less risk averse.
- 36:05So there's a conflict between
- 36:07different time scales of how
- 36:09of how this is operating.
- 36:10And that conflict also comes up a little
- 36:13bit when we look across the the the
- 36:15first and second-half of these mazes,
- 36:17the 1st 15 mazes versus the 2nd 15 mazes.
- 36:20Whereby if you had the more losses,
- 36:23if you had more losses in the first half,
- 36:25we can ask are you more risk averse and
- 36:27more risk seeking in the second-half.
- 36:29And there's some small evidence that
- 36:30in on average or a bit more risk
- 36:32seeking in the second-half and you've
- 36:34had more losses in the first half.
- 36:35So that suggests that this phenomenon
- 36:37which is a trial like a maze to maze
- 36:40effect may itself not completely generalise
- 36:42over the whole context of the mazes.
- 36:44So really some interesting things to
- 36:47investigate in this in this domain.
- 36:50OK, here's an interim summary.
- 36:52So what I've tried to show you is this
- 36:54sort of parametric risk-avoidant
- 36:56behaviour, which can come from this
- 36:58pre-committed CVaR. Pre-commitment
- 37:00means that you think,
- 37:01well, how much risk am I willing to take?
- 37:04Which part of this distribution
- 37:05am I willing to think about, right
- 37:07from the beginning?
- 37:08And that requires you to have
- 37:09this gambler's-fallacy-like effect,
- 37:10changing the value of alpha as
- 37:13you are lucky or unlucky.
- 37:15So obviously the inference is a
- 37:16little bit more complicated here,
- 37:18but in fact almost every
- 37:20way that we have of thinking about
- 37:22risk in the sequential case is
- 37:24going to rely on a more complicated
- 37:26way of doing evaluation.
- 37:27For instance, if you
- 37:29have a non-linear utility function,
- 37:32and you think about your
- 37:33total utility on a path,
- 37:34then you're going to have to monitor
- 37:37that total utility so that you can
- 37:46manipulate it in this non-linear way.
- 37:48You also see that in prospect theory,
- 37:50for instance, as well.
- 37:51if we have this nested what
- 37:53we sometimes call NC bar,
- 37:55that's the one where we just fix
- 37:57the value alpha and just apply the
- 37:58same value as you go down and down,
- 38:00then in some cases you can
- 38:02get excessive risk aversion.
- 38:03So in the random walk that we saw
- 38:05there and then again we we can
- 38:07still think about that at different
- 38:10values of alpha itself.
- 38:12We're also now
- 38:15worrying about indeterminacy
- 38:17between your prior expectation —
- 38:19for instance, of getting caught in
- 38:20the maze by a lava pit — versus
- 38:22the degree of risk aversion.
- 38:24And those two work opposite to each other
- 38:26in terms of pCVaR.
- 38:29So you get caught:
- 38:30that increases your prior on the
- 38:32possibility of getting caught,
- 38:33but it also increases the value of alpha,
- 38:36makes you a little bit less risk averse.
- 38:38And so those two things are
- 38:39fighting with each other, we think,
- 38:41in the context of these mazes.
- 38:43And of course it would be interesting
- 38:44to look at ambiguity as well as risk.
- 38:45So here, all I talked about are
- 38:47cases where the probabilities
- 38:49are frankly expressed: subjects
- 38:50know exactly what the probability
- 38:54is of having a lapse in the
- 38:58way that they move in the maze.
- 39:00They know the values of everything;
- 39:01we didn't make it ambiguous.
- 39:03But of course ambiguity, as a sort
- 39:04of second-order probability,
- 39:06gives you an extra aspect of
- 39:09probability that you don't know.
- 39:10And so if you then think about the
- 39:13lower tail of those probabilities you
- 39:15don't know, that's a way of inducing
- 39:17ambiguity aversion, because of
- 39:19the extra, second-order
- 39:20uncertainty that you
- 39:22have in those cases too.
- 39:24From a psychiatric point of view,
- 39:27what you can see is a sort of an aspect
- 39:29of pathological avoidance right here.
- 39:32The way you're evaluating what
- 39:33could be a relatively benign world
- 39:35is by thinking about all the
- 39:36nasty things that can happen;
- 39:40that becomes really critically important.
- 39:41And then if you're living in
- 39:43a stochastic environment,
- 39:44which of course we all do,
- 39:46then if you're really extremely risk averse,
- 39:48so alpha is really near to zero,
- 39:50then that's a route to indifference
- 39:52or helplessness,
- 39:53because it doesn't matter what you try to do:
- 39:55you're always worried about the
- 39:56nastiest thing that can happen.
- 39:58So that makes life super complicated.
- 40:02OK, so that's online behaviour.
- 40:04So here we think about planning:
- 40:06we now imagine what our subjects are
- 40:08doing as they're thinking about how to move
- 40:10in that maze, with those choices.
- 40:12There we can do, as Phil
- 40:14mentioned at the beginning, sort
- 40:16of forms of something a bit like
- 40:18model-based reinforcement learning,
- 40:19where we have a model of the world
- 40:21and we're planning in that model.
- 40:22We're thinking about the risk that
- 40:24accumulates along these paths, and
- 40:26changing these values of alpha as we go.
- 40:28But there's a lot of interest at the
- 40:30moment in also thinking about offline
- 40:32processing that can happen during periods of,
- 40:34for instance,
- 40:35quiet wakefulness or sleep in animals,
- 40:37and also in inter-trial intervals in
- 40:40humans, which we've been looking at too.
- 40:42And so the idea has been that
- 40:46there's hippocampal and cortical
- 40:48replay, which are themselves coordinated,
- 40:50and which can be used to do
- 40:52aspects of offline planning.
- 40:54Which is to say that we normally
- 40:56think about a model of the world
- 40:58that's like a generative model
- 41:00of the environment.
- 41:01The inverse of that model is a policy:
- 41:04what should I do in the
- 41:06environment in order to optimise
- 41:08my return, or optimise my CVaR
- 41:10return? And in that case,
- 41:11the inverse of the model is something
- 41:13you can calculate offline, when you're
- 41:15not having to use the model to make your
- 41:17choices as you go.
- 41:19And there's evidence in both rodents
- 41:21and also in humans in the last few
- 41:25years, typically using MEG, that
- 41:28subjects are actually engaging in
- 41:30offline processing, which actually
- 41:31has an impact on their behaviour
- 41:33when it happens in the future.
- 41:35In the reinforcement learning world,
- 41:37this has been closely associated with
- 41:39an idea from Rich Sutton in the 90s
- 41:41called Dyna, where he thought about
- 41:43offline, replay-like
- 41:45processing to enable exploration;
- 41:47it then got embedded in
- 41:50advanced forms of reinforcement
- 41:51learning in AI, in the replay
- 41:54buffers for things like the DQN —
- 41:56deep Q-learning, the networks that, for
- 42:00instance, DeepMind used very successfully
- 42:02for things like AlphaGo to win at Go.
- 42:05And then slightly more recently,
- 42:06there's a lovely paper from Marcelo
- 42:09Mattar and Nathaniel Daw, which
- 42:12speculated that the replay
- 42:14that we see in rodents might be
- 42:16optimised to improve the way
- 42:18that these rodents are planning
- 42:20in the environment.
- 42:22So given that they discover something
- 42:23about the world — they discover, like,
- 42:25a reward they didn't know about, or
- 42:27maybe they've forgotten — then
- 42:28they have to do some relearning.
- 42:30What Mattar and Daw suggested is
- 42:33that the sequence in which the animal
- 42:36engages in replay
- 42:38is chosen in order to optimize the way
- 42:40that the animals will then subsequently
- 42:42move through the world, using a
- 42:44simpler way of doing planning.
- 42:46And they pointed out that you should
- 42:48choose to make updates to your model
- 42:50based on the product of two quantities,
- 42:52gain and need.
- 42:53So consider doing a replay
- 42:56at a particular location in the maze —
- 42:58maybe somewhere where you're not;
- 43:00you have this notion of distal
- 43:01replay in the hippocampal world.
- 43:03Then the gain is how much you would
- 43:06change your policy if you made an update.
- 43:08So there's no point in making an update
- 43:10if it is not going to change your
- 43:12actions, because then it will have no
- 43:14impact on your final return. And
- 43:15the need is how frequently you're
- 43:17going to visit that state in the
- 43:19future, given your current policy.
- 43:27And so if you think about,
- 43:28you know you discover something,
- 43:29how should you go about planning
- 43:31using during this offline,
- 43:32during these offline cases.
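A minimal sketch of the Mattar & Daw gain-times-need priority, under simplifying assumptions of my own: tabular Q-values; gain as the improvement in the greedy value at a state if one backed-up value were written in; and need as the discounted expected future occupancy of that state from the start, i.e. a row of the successor representation. All numbers are hypothetical.

```python
import numpy as np

def successor_need(P_pi, start, gamma=0.9):
    """'Need': discounted expected future occupancy of each state from
    the start state, under the current policy's transition matrix."""
    n = P_pi.shape[0]
    sr = np.linalg.inv(np.eye(n) - gamma * P_pi)  # successor representation
    return sr[start]

def gain_of_backup(Q, s, a, backed_up_value):
    """'Gain': how much the greedy value at s improves if the (s, a)
    entry were replaced by the backed-up value."""
    pi_old = np.argmax(Q[s])
    Q_new = Q[s].copy()
    Q_new[a] = backed_up_value
    pi_new = np.argmax(Q_new)
    return Q_new[pi_new] - Q_new[pi_old]

# two states, two actions (purely illustrative numbers)
Q = np.array([[1.0, 0.5], [0.2, 0.3]])
P_pi = np.array([[0.5, 0.5], [0.0, 1.0]])

need = successor_need(P_pi, start=0)
gain = gain_of_backup(Q, s=0, a=1, backed_up_value=2.0)
print(gain * need[0])   # the priority for replaying this (state, action)
```

An update that would not change the greedy choice has zero gain, so zero priority; a state never visited under the current policy has near-zero need, so it is also not worth replaying.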
- 43:34So we thought about, well,
- 43:37what does optimal planning
- 43:38look like for Seva?
- 43:40You have if you're risk risk averse.
- 43:42So here
- 43:44we're showing again another simple
- 43:46domain where you have a start state.
- 43:47There's just a single reward
- 43:49at this location here, and
- 43:50there's one of these
- 43:51lava pits here.
- 43:53What these numbers show is: suppose all
- 43:55you know about is where you start.
- 43:57You have a model of the world —
- 44:00you know about the lava pit and the reward —
- 44:02but you don't know how to plan;
- 44:03you haven't got a plan of what to do.
- 44:04We're thinking of the replay,
- 44:06in the Mattar and Daw world,
- 44:07as constructing that plan
- 44:10for you, by essentially focusing
- 44:12on a state in the world and then
- 44:15doing a little Bellman update.
- 44:19and the steps the the order of the
- 44:22steps is shown by these numbers.
- 44:23So it turns out that if you prioritise
- 44:26based on on being risk neutral and
- 44:29what I mean by prioritisation here is
- 44:31you're thinking about what planning
- 44:33should I do that has the most effect on
- 44:35the value of the start state because
- 44:37that's the value where you're you're
- 44:39where you're where you're beginning.
- 44:41So it turns out that in the if you
- 44:44prioritise based on this neutrality you
- 44:46for some reason you do one step at the
- 44:49this location away from the lava pit
- 44:52and then all the subsequent steps you do,
- 44:54in this case the subsequent 7 steps or
- 44:57seven six steps essentially plan in
- 44:59this instance backwards from the goal
- 45:01from the reward back to the beginning.
- 45:03And this notion of backward sequencing,
- 45:06like reverse replay in the
- 45:09hippocampal world, is also seen
- 45:11in something called prioritised sweeping,
- 45:13which is an old idea in reinforcement
- 45:16learning from Andrew Moore, where
- 45:18you'd optimise the sequence of
- 45:21updates you would do. If you prioritise
- 45:23instead based on a value of alpha
- 45:25which is much lower,
- 45:26so much more risk averse,
- 45:28Now you can see that you spend
- 45:30all your planning time instead of
- 45:32planning how to get to the reward.
- 45:34You spend all your planning time
- 45:36thinking about the about the lava pit,
- 45:38thinking about where you can.
- 45:39You know how to avoid the lava
- 45:40pit if you were there,
- 45:41so the first is the same one,
- 45:43but then all the subsequent ones
- 45:44are all avoiding the lava pit and
- 45:46have nothing to do with getting to
- 45:48the reward So you can see how you're
- 45:50even the structure of of thinking
- 45:52offline is going to be really could
- 45:54could get really dominated by the
- 45:56by these nasty things that could by
- 45:58the nasty things that could happen.
- 45:59And if alpha equals 0,
- 46:01there's no point in doing planning
- 46:02at all because you can't mitigate
- 46:04the child the the risk of getting
- 46:05to the log pit as well.
- 46:06So you just sit there and do
- 46:08you just can't help yourself.
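Prioritised sweeping, mentioned above, can be sketched like this (an illustrative toy for evaluating a fixed policy on a four-state corridor, not the talk's maze): value changes are propagated to predecessor states via a priority queue, which naturally produces the backward, reverse-replay-like ordering of updates.

```python
import heapq
import numpy as np

def prioritized_sweeping(R, T, gamma=0.9, theta=1e-4, max_updates=50):
    """Prioritised sweeping for policy evaluation: states whose backed-up
    value would change most are updated first, and a change at a state
    pushes its predecessors onto the queue -- a backward sweep."""
    n = len(R)
    V = np.zeros(n)
    # seed the queue with the states that carry reward
    pq = [(-abs(R[s]), s) for s in range(n) if R[s] != 0]
    heapq.heapify(pq)
    order = []                                  # record the update sequence
    while pq and len(order) < max_updates:
        _, s = heapq.heappop(pq)
        new_v = R[s] + gamma * T[s] @ V         # one Bellman backup at s
        delta, V[s] = abs(new_v - V[s]), new_v
        order.append(s)
        if delta > theta:                       # propagate to predecessors
            for pred in range(n):
                if T[pred][s] > 0:
                    heapq.heappush(pq, (-gamma * T[pred][s] * delta, pred))
    return V, order

# a four-state corridor: each state leads to the next, and state 3 is a
# rewarded terminal state (all numbers hypothetical)
R = np.array([0.0, 0.0, 0.0, 1.0])
T = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)

V, order = prioritized_sweeping(R, T)
print(order)   # [3, 2, 1, 0]: updates sweep backwards from the reward
print(V)       # [0.729, 0.81, 0.9, 1.0]
```

A risk-sensitive variant would instead rank candidate updates by their effect on the CVaR value of the start state, which is what shifts all the planning effort onto the lava pit as alpha shrinks.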
- 46:11So as I mentioned,
- 46:12this is not only for humans.
- 46:14There's a lovely study that
- 46:15comes from Mitsuko
- 46:18Watabe-Uchida's lab,
- 46:19where she has a very simple task
- 46:22for mice.
- 46:24So here she had a simple arena,
- 46:27just an open
- 46:29field arena, shown here.
- 46:30And the mice were put
- 46:32in for a couple of days.
- 46:33There's nothing there;
- 46:34they had 25 minutes per
- 46:36session just to run around.
- 46:37And here's a path of
- 46:39one of the mice just
- 46:41running around this arena.
- 46:42Then on the third day, after this habituation,
- 46:45Mitsuko put in a novel object,
- 46:48just basically a bunch of Lego
- 46:49blocks, near to one corner of
- 46:52the environment, and then
- 46:54monitored what the animals
- 46:56did over the subsequent days —
- 46:58the subsequent 4 days — with
- 46:59this same novel object in the
- 47:01same location of the arena.
- 47:03And you can see, even just eyeballing
- 47:05the trajectories, that the
- 47:07animals have this really interesting
- 47:10mix of essentially neophilia and
- 47:12neophobia, and neophobia is much
- 47:13more apparent here.
- 47:15So it really changes the structure
- 47:16of the movement
- 47:19through the environment.
- 47:20For various reasons,
- 47:21Mitsuko characterized being within
- 47:237 centimetres of the object as
- 47:25sort of a critical distance, where
- 47:27the animal is sort of
- 47:30inspecting this object.
- 47:31And what she's showing
- 47:33here is how much per minute of
- 47:35these 25 minutes, in each of these
- 47:37sessions, the animals spend
- 47:39within 7 centimetres of the object.
- 47:41So in the habituation days it's just
- 47:42within 7 centimetres of that circle —
- 47:44this circle shown here.
- 47:45And you see that, you know,
- 47:47the animals spent some time there,
- 47:48but there's nothing
- 47:49distinguishing those locations.
- 47:51When she puts in the novel object,
- 47:54you can see that that
- 47:55really dramatically changes the
- 47:56structure of behaviour.
- 47:57And here she's ordered the
- 48:0026 animals by the amount of total
- 48:02time they spend near the object.
- 48:04So these animals,
- 48:05these early animals, spend barely
- 48:08any time near the object at all.
- 48:09These animals, which are later here,
- 48:12spend much more time near the
- 48:14object than the first ones do.
- 48:16And so there's a sense in which
- 48:17these are very risk averse animals:
- 48:19they have what we would think of
- 48:21as this low value of alpha,
- 48:22whereas these animals are
- 48:24much less risk averse;
- 48:25they're much more willing to
- 48:27get close to the object.
- 48:29And so you can see that the way that
- 48:31they approach the object also changes.
- 48:33So here you can see the
- 48:34first day of the object.
- 48:36What she's done is use
- 48:37DeepLabCut, from the Mathis lab,
- 48:40to classify whether the animal has
- 48:42its nose pointing to the object
- 48:43or its tail pointing to the object.
- 48:45You see in the early days the animal only
- 48:47has what they call cautious approach,
- 48:49so only approaches the object with
- 48:51its nose in front and its tail behind.
- 48:54Then over time, some of the animals
- 48:56are more willing to just engage the
- 48:58object, so that they're not protecting
- 48:59their tail in this particular way —
- 49:01very appropriate for tail
- 49:02risk, as you can imagine.
- 49:04So if we look at the frequency of approach —
- 49:07the frequency per minute of
- 49:09approach with the tail behind —
- 49:11you can see that all
- 49:14the animals are here,
- 49:15again segmented
- 49:17into these sessions.
- 49:18So all the animals start
- 49:19off with their tail behind,
- 49:20so this is this cautious approach. And then,
- 49:23again using the same sorting of the animals
- 49:25between one and 26,
- 49:27you can see that the animals who are timid,
- 49:29who barely approach or
- 49:31spend any time near the object,
- 49:33also never risk their tail:
- 49:37they spend no time
- 49:38with their tail exposed. Whereas the
- 49:40brave animals, these ones down at the
- 49:42bottom, not only spend more time
- 49:44near the object, they also do it with
- 49:46their tail exposed in this way.
- 49:48But we were very struck by these huge
- 49:50individual differences in
- 49:51the way that these animals
- 49:53approach the object, and so we were
- 49:55interested in modelling that.
- 49:56So Akiti et al. characterized
- 49:59various aspects of the behaviour,
- 50:01such as the fraction of
- 50:04time they're close to the object.
- 50:06I showed you that already here showing
- 50:07with confident and cautious approach.
- 50:09So cautious in green, confident in blue.
- 50:12And again you can see, with the
- 50:13animals sorted, that there's
- 50:14only green at the top while there's
- 50:16some blue at the bottom.
- 50:17And this is only showing the days
- 50:20since the object has been introduced.
- 50:23You can look at how long they
- 50:24spend near the object, and again
- 50:26you can see that that's
- 50:28shown by this colour.
- 50:29So the brave ones spend a lot of time,
- 50:30the timid ones spend very little
- 50:32time, and how frequently they
- 50:34visit the object.
- 50:36And again the brave ones visit frequently;
- 50:38the timid ones barely visit at all.
- 50:42So we built a model of this,
- 50:44but I haven't got time to go through
- 50:45all the details of the model;
- 50:46just to give you a
- 50:47hint of what's inside it.
- 50:49So why do they visit the object at all?
- 50:51Well, that's neophilia.
- 50:52They're interested.
- 50:52We imagine there's an exploration bonus
- 50:54associated with that, and we
- 50:56imagine that this exploration bonus
- 50:58replenishes, as if they don't know
- 50:59that the object
- 51:01never actually gives them
- 51:03a real return, right.
- 51:05The object is just a bunch of Lego blocks.
- 51:06There's no food or anything
- 51:08positive associated with it,
- 51:09and we imagine that when the animals
- 51:12make a confident approach,
- 51:13they consume the reward faster.
- 51:17Then we have a hazard function.
- 51:19Why are they neophobic?
- 51:20Well, maybe at some
- 51:22point a predator or something is
- 51:24going to jump out from this object,
- 51:26or something nasty might happen,
- 51:27and we imagine that that increases
- 51:29over time spent near the object.
- 51:31So the longer they spend near the object,
- 51:33the more they're worried
- 51:35about predation.
- 51:35And we imagine that that then
- 51:37resets when they move away from the object.
- 51:39And we imagine that it's less
- 51:41dangerous when they do cautious
- 51:42approach than confident approach,
- 51:43which is why they want to approach in
- 51:46this cautious way in the first place.
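To make those ingredients concrete, here's a tiny sketch of such a hazard function (an illustration with made-up parameters, not the fitted model): the hazard grows with time spent near the object, cautious approach reduces it, and retreating resets it.

```python
def hazard_rate(time_near, base=0.02, growth=0.05, cautious=False):
    """Perceived per-second hazard near the object: grows with the time
    spent there; cautious (tail-behind) approach halves it.
    All parameters are hypothetical, purely for illustration."""
    h = base + growth * time_near
    return 0.5 * h if cautious else h

# Hazard climbs over one bout near the object...
confident_bout = [hazard_rate(t) for t in range(5)]
# ...a cautious approach is safer at every moment...
cautious_bout = [hazard_rate(t, cautious=True) for t in range(5)]
# ...and retreating resets the clock, so hazard drops back to baseline.
after_retreat = hazard_rate(0)
```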
- 51:47And critical to this is that
- 51:50the uncertainty about
- 51:51whether there's
- 51:52a predator or not will only reduce
- 51:54if they actually visit the object.
- 51:56If they don't visit the object
- 51:57or don't spend time there,
- 51:58they're not going to find out that
- 51:59in fact the object is completely
- 52:01benign and never hurts them.
- 52:02And so we have this
- 52:04important path dependence whereby
- 52:06the timid animals don't visit for long,
- 52:08they don't find out the object is
- 52:10safe, and therefore they carry
- 52:11on not visiting for long because
- 52:13they haven't found out about
- 52:14this safety itself.
- 52:15And then we have this risk
- 52:17aversion too, and then we
- 52:20build a model of their behaviour.
- 52:22So here I've just characterized that,
- 52:23sort of abstracted away from
- 52:25the animal data themselves.
- 52:26You can see we capture the
- 52:28general trends in the animal data.
- 52:31With this abstraction you can
- 52:32see we do a really good job.
- 52:33We have quite a lot of parameters, I must say.
- 52:35We can do a really good job of
- 52:37fitting their data by essentially
- 52:38varying the degree to which
- 52:40they're risk averse,
- 52:42this pCVaR mechanism, and also
- 52:44their prior over what
- 52:49the object is like, and that prior
- 52:51is not influenced enough
- 52:53if they don't visit the object:
- 52:54they don't discover that the object
- 52:55is safe in the way that I described.
- 52:58OK.
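That path dependence can be sketched as a toy Bayesian update (a simplification with made-up numbers, not the fitted model): the belief that the object is dangerous only falls with time actually spent near it, so a timid animal's fear never extinguishes.

```python
def update_danger_belief(prior, seconds_near, attack_rate=0.2):
    """Posterior P(object is dangerous) after spending `seconds_near`
    seconds next to it unharmed, assuming a dangerous object would
    attack at `attack_rate` per second. All numbers are made up."""
    survive_if_dangerous = (1.0 - attack_rate) ** seconds_near
    evidence = prior * survive_if_dangerous + (1.0 - prior)
    return prior * survive_if_dangerous / evidence

# A brave animal spends time near the object each day; a timid one never does.
brave = timid = 0.5                    # shared prior: 50% chance of danger
for day in range(5):
    brave = update_danger_belief(brave, seconds_near=10)
    timid = update_danger_belief(timid, seconds_near=0)
# brave's fear collapses toward zero; timid's stays stuck at the prior.
```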
- 52:59So because I'm running out of time,
- 53:00let me just go to the general discussion.
- 53:05So just to sum up then on this risk aversion,
- 53:08I think it's nice to think, from a sort of
- 53:11computational psychiatric point of view,
- 53:14about the way that evaluation happens in
- 53:16the context of this risk aversion.
- 53:17So you can think of people who
- 53:20are highly risk averse as, in some sense,
- 53:21maybe solving a different
- 53:23problem from others.
- 53:24And so here we've shown that, optimally,
- 53:26if you have a really low value
- 53:28of alpha, or in some contexts
- 53:30this nested CVaR,
- 53:31nCVaR, then you'll see this
- 53:34dysfunctional avoidance,
- 53:34and also this rumination process,
- 53:36in the sense that you'll keep on
- 53:37worrying about all the nasty things
- 53:39that can happen. If alpha is near 0,
- 53:40you have action
- 53:41indifference and helplessness,
- 53:42and that's the correct answer.
- 53:44That's the right thing to do
- 53:45if your value of alpha is so low
- 53:47and you live in a stochastic world.
- 53:49How much rumination should you do?
- 53:51There's some sort of threshold.
- 53:52How much planning you want to do,
- 53:54how much improvement you need to have, is
- 53:56something which again is under your control.
- 53:58Maybe you want to really squeeze
- 54:00out all possibilities.
- 54:01Then you're going to have to do an
- 54:02awful lot of rumination to worry
- 54:04about all the really low probability
- 54:05outcomes that can happen.
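To make the role of alpha concrete, here's a minimal sketch of CVaR evaluation (a simplification with hypothetical payoffs, not the task's actual values): at alpha = 1 you average over all outcomes and the risky option wins; as alpha shrinks toward 0, the evaluation is dominated by the worst outcomes, so a rare catastrophe takes over and avoidance wins.

```python
def cvar(outcomes, probs, alpha):
    """CVaR_alpha: expected return over the worst alpha-fraction of a
    discrete outcome distribution. Illustrative sketch only."""
    pairs = sorted(zip(outcomes, probs))       # worst outcomes first
    total, mass = 0.0, 0.0
    for outcome, p in pairs:
        take = min(p, alpha - mass)            # probability mass still needed
        if take <= 0:
            break
        total += outcome * take
        mass += take
    return total / alpha

# Hypothetical choice: approaching has the higher mean,
# but carries a rare catastrophe.
approach = ([-10.0, 5.0], [0.05, 0.95])
avoid = ([1.0], [1.0])

risk_neutral = cvar(*approach, alpha=1.0)      # the mean: approach beats avoid
risk_averse = cvar(*approach, alpha=0.05)      # worst 5% only: avoid wins
```

In the limit of very small alpha, every action's value approaches its single worst possible outcome, so actions become indistinguishable, matching the action indifference and helplessness just described.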
- 54:07And then for humans we have this problem
- 54:08that we live in a very complicated world.
- 54:10We can always imagine another
- 54:12catastrophe around the corner.
- 54:13If you pay a lot of attention
- 54:15to low probability outcomes,
- 54:16then we can always invent nasty low
- 54:19probability outcomes that will cause
- 54:20you to have problems.
- 54:22And then in the
- 54:23case of the rodents,
- 54:24we can see there's an effect on this
- 54:27exploration-exploitation trade-off,
- 54:28in the sense that the animals that
- 54:29don't explore can't find out about
- 54:31safety, and therefore they will
- 54:32never be able to
- 54:35essentially treat the object in its
- 54:37natural way. Another
- 54:39source of problems with risk in terms
- 54:41of evaluation is that, when
- 54:43we're thinking about this rumination,
- 54:45maybe there are some subjects
- 54:47who try to do this ruminative planning.
- 54:50They try to think, well, OK,
- 54:51if I'm at the negative object,
- 54:52here's what I would do to get away from it.
- 54:54But it's so aversive to think about it
- 54:56that they will never consummate that planning.
- 54:58They never stop doing that
- 54:59planning in this way.
- 55:00And so that's an idea that Quentin
- 55:02Huys and I worked on a long,
- 55:04long time ago:
- 55:05that this is a sort of internal behavioural
- 55:08inhibition associated with a thought,
- 55:10if you like,
- 55:11about a piece of planning.
- 55:12So maybe that leads you never
- 55:13to consummate the planning,
- 55:14which means you have to do
- 55:15it again and again and again,
- 55:17so again leading to a sort
- 55:18of rumination itself.
- 55:19You can imagine that you don't
- 55:21adjust for luck appropriately.
- 55:22So if you're unlucky, you don't
- 55:24think that you can now afford to be
- 55:27a bit more risk neutral again.
- 55:28So again you'll then have more
- 55:30negative evaluation than you should have.
- 55:31And then maybe
- 55:33the way that you're evaluating risk is
- 55:38not appropriate to the environment you have.
- 55:39I think one nice way to think
- 55:41about that is in terms of
- 55:43overgeneralizing representations,
- 55:43something again you see in
- 55:45depression. I've shown you
- 55:46that this risk sort of infects states:
- 55:48if you think that something nasty
- 55:49might happen, then the value of that
- 55:51state gets associated with the nastiest
- 55:53thing that can possibly happen.
- 55:54So if you overgeneralize
- 55:56your representations,
- 55:56you're putting nice states and
- 55:58nasty states together, and therefore
- 56:00the value of the nasty states
- 56:02infects the values of the nice
- 56:03states you could possibly have.
- 56:04So lots of things to investigate
- 56:06about risk in the future,
- 56:08using hopefully these different
- 56:10aspects of sequential evaluation.
- 56:11So thank you very much.