Saige Rutherford “Value-based Machine Learning: Optimizing for Utility Over Intelligence”
March 07, 2023
Transcript
- 00:06Sorry, the next speaker is Saige
- 00:07Rutherford from Donders.
- 00:08She's going to talk about value based
- 00:11machine learning and she's got a little
- 00:13bit of negativity in her talk too.
- 00:19OK, can people hear me?
- 00:23Please.
- 00:27OK. Good morning.
- 00:28I'm excited to be here talking to you
- 00:31guys about a similar but slightly,
- 00:34you know, different topic.
- 00:35I'll take a little bit more of a
- 00:38philosophical, big-picture view.
- 00:42So today I'm going to
- 00:43introduce a couple things.
- 00:44I hope to go over some of the
- 00:47goals and values of our field.
- 00:49I want to talk about kind of the
- 00:51current state of the field and use my
- 00:52own journey in the field as an example.
- 00:54I want to go over a few definitions.
- 00:57I'm using words that have a
- 00:58lot of different definitions,
- 00:59things like optimization,
- 01:01accuracy, and utility.
- 01:02So I want to make sure
- 01:03we're on the same page,
- 01:04then want to dive into what I mean
- 01:06by intelligence and accuracy.
- 01:07I want to talk about how we're measuring it,
- 01:09how we optimize for it,
- 01:11and what some of the limitations
- 01:12of this kind of
- 01:13framework that we've been focusing
- 01:14on are, then dive into utility,
- 01:16talk about again how we measure it,
- 01:18how we might optimize for it, and what
- 01:20I see as the benefits of thinking
- 01:22from this perspective, and then talk
- 01:24a little bit about what's next:
- 01:26what I see as some of the roadblocks,
- 01:28future directions, and take-home messages.
- 01:31So I'd like to begin with a warning.
- 01:33I already kind of mentioned this
- 01:34talk is going to be a little
- 01:36bit different than others.
- 01:37If you were reading it as a paper,
- 01:39it would probably be like an
- 01:40opinion piece or something.
- 01:41It's very philosophical,
- 01:42even though I come from, like, statistics
- 01:44and computer science. Over the last year,
- 01:46this has really been a thought
- 01:48experiment for me, and I've really
- 01:50been thinking about things from
- 01:51a very philosophical perspective.
- 01:53The second warning is that it
- 01:55might come off as a little bit
- 01:57provocative and critical of the field.
- 01:59I hope to ground that in examples of my
- 02:01own work, and I think as scientists
- 02:03we need to be critical of our own
- 02:05work so we will proceed with caution.
- 02:07The second note is I know that
- 02:08this is the Whistler Workshop on
- 02:10brain functional organization,
- 02:12connectivity and behavior.
- 02:13When I was thinking about all
- 02:15these ideas I was really thinking
- 02:16more from a machine learning,
- 02:18not necessarily the brain perspective.
- 02:20So when I use words like artificial
- 02:23intelligence, machine learning, accuracy,
- 02:25you can actually just replace them with
- 02:27brain behavior predictive modeling.
- 02:29So you can think about using all of these
- 02:31terms interchangeably.
- 02:32It's just a bit of a mouthful to
- 02:34keep saying brain behavior predictive modeling.
- 02:37OK.
- 02:37So to first talk a little bit about
- 02:39where we're at and the goals of our field.
- 02:42I think the goals are quite well defined.
- 02:45We have this you know maybe high dimensional,
- 02:47maybe low dimensional space,
- 02:48but we have axes representing
- 02:50things like biological measurements.
- 02:52This would include our neuroimaging data.
- 02:54We also have dimensions
- 02:56representing our behavioral tests,
- 02:58our self report measures,
- 03:00maybe environmental factors,
- 03:02lifestyle choices, and our goal is
- 03:04really to learn functions that map
- 03:07between these spaces, such that
- 03:09we hope to use one thing to predict another.
- 03:11Maybe we're using the brain to predict
- 03:13behavior, or vice versa.
- 03:14So I think our goals have been for
- 03:17a few years fairly well defined.
- 03:19However, if you move to our values,
- 03:22I think maybe you know,
- 03:24we might think that we know them.
- 03:25But I think especially when it comes
- 03:27to prioritizing them and ranking
- 03:28what values are most important to us.
- 03:30This is where it's a bit more complicated.
- 03:32So these include things like validity,
- 03:34reliability, explainability,
- 03:38fairness, accountability, usability,
- 03:41impact, and you might say, well,
- 03:44all of these things are important to me.
- 03:45But as you'll see throughout this talk,
- 03:47when you want to start optimizing
- 03:49for one of these,
- 03:50we really do need to rank them
- 03:51and have a priority and I think
- 03:53that ranking is a little bit less
- 03:54clear and it's really something
- 03:56we need to do to move us forward.
- 03:58So I hope to inspire you with
- 04:00that in this talk.
- 04:01I just want to cover a little
- 04:03bit of my journey in this field
- 04:04and how I got to this topic.
- 04:06I worked as a data scientist at the
- 04:08University of Michigan for five years.
- 04:09I was really working on a lot of the
- 04:11models Simon talked about like brain age,
- 04:13predicting cognition.
- 04:13I was here at Whistler in 2018 giving
- 04:16a talk called the developmental
- 04:17mega sample where my message was
- 04:20really we just need more big data.
- 04:22If we combine all of these samples
- 04:23and work with this low dimensional
- 04:25brain basis set, that will answer
- 04:26all of our questions, and I set out
- 04:28working on that.
- 04:30I then learned that our models
- 04:31look a lot like this,
- 04:32what I call the spaghetti plot of the brain,
- 04:34and we're not really learning anything.
- 04:36We're not even predicting things that well.
- 04:37So there's a problem there.
- 04:39I came back to Whistler in 2020,
- 04:41was on the deep learning
- 04:42bandwagon, thought, OK,
- 04:43maybe deep learning is the answer, you know?
- 04:46But it turned out to really just
- 04:47be a more complicated method
- 04:48to answer the same question.
- 04:50I then moved to the Netherlands
- 04:51before I started my PhD,
- 04:53and I've really been working on individual
- 04:55differences and training large models,
- 04:57transferring them to clinical data.
- 04:59So those models started to sort
- 05:01of be better than spaghetti plots.
- 05:04There was a bit more careful
- 05:05modeling of uncertainty and things.
- 05:07But here I am in 2023 talking
- 05:10about value based machine learning
- 05:11and that's because I still
- 05:12felt like there was something missing
- 05:14in the models that I was using.
- 05:16Maybe I'm learning a bit better of a model,
- 05:17but there's still something like
- 05:19missing that I care about in the
- 05:21work that is not being captured.
- 05:22And that's really what, over the last
- 05:24year, I've been thinking about:
- 05:26this value-based framework.
- 05:27And this is sort of represented
- 05:29in my journey.
- 05:29But just to make it a bit more general,
- 05:31I think if you embark on a brain
- 05:34behavior predictive modeling analysis,
- 05:35this is kind of the journey
- 05:37that you would take.
- 05:38You know, as a PhD student or data analyst,
- 05:41you begin by combining a bunch of
- 05:43different open datasets from all these
- 05:45amazing resources that we've shared.
- 05:47However, once you put them together,
- 05:49you realize there's not a
- 05:51lot of phenotypic overlap.
- 05:52Maybe if you're lucky you have age,
- 05:53sex, and cognition.
- 05:54You still go on to fit a bunch of different
- 05:57models, ranging from simple things
- 05:59like linear regression to more
- 06:01complex things like deep learning.
- 06:03You then realize that there's
- 06:04not a lot of signal in the data.
- 06:06You can barely predict age, you know,
- 06:08maybe within three to five years of error.
- 06:10You still are a scientist and have to
- 06:12go on to publish your results anyway,
- 06:14taking kind of two paths.
- 06:15One might be being slightly optimistic,
- 06:18you know, slightly overselling
- 06:20the interpretation or potential,
- 06:21or being a bit more honest,
- 06:23like Simon, sharing your honest viewpoint:
- 06:25maybe using MRI doesn't help that much.
- 06:28However,
- 06:28you have trouble finding a journal
- 06:30that will publish this perspective.
- 06:32You then repeat, or you leave,
- 06:33um, for the data science industry or a field
- 06:36where machine learning can have more impact.
- 06:39You know, psychiatry is our field,
- 06:40so I'm not giving up on it yet,
- 06:42but the cycle repeats again.
- 06:45If you're reading a brain behavior
- 06:46predictive modeling paper,
- 06:47you could probably, for any paper,
- 06:49fill out this template,
- 06:50and it would go something
- 06:53like this: fluid intelligence.
- 06:55Clinical potential one day.
- 06:57Reference to the Marek et al. Nature paper,
- 07:00even though they don't do predictions,
- 07:01just univariate associations.
- 07:06We need a bigger sample size.
- 07:09Using the HCP, ABCD, or UK Biobank data.
- 07:14Correlation of .28 between predicted and
- 07:17observed. For reliability, brain age.
- 07:20No confound correction, could be motion.
- 07:23These are, you know, Simon,
- 07:24you really illustrated this.
- 07:25And so over the last few slides I've
- 07:27just wanted to point out that it's
- 07:29not like we're doing things wrong.
- 07:30It's just we've sort of been stuck as a
- 07:32field, and we have this great question
- 07:34of wanting to relate brain to behavior,
- 07:36but we're sort of at a standstill.
- 07:38We're making these very tiny
- 07:39little improvements.
- 07:40And I think it's because we've
- 07:42really been focusing on the wrong
- 07:43thing and that's trying to get
- 07:44more and more accurate models.
- 07:46And then yes,
- 07:47of course that's important and useful,
- 07:48but it's not the only thing we
- 07:49care about, and we need to consider
- 07:51other values that we have in this work.
- 07:53So that really led me into thinking about
- 07:55what I call value based machine learning.
- 07:57I think most of us probably
- 07:58know what machine learning is.
- 07:59So I'll just explain the value
- 08:00based part a little bit more.
- 08:02I discovered this when looking at
- 08:04different models of healthcare.
- 08:05So in the US and I think in
- 08:07other parts of the world,
- 08:08we were kind of implementing this model
- 08:10of healthcare that was called fee for
- 08:13service and that model just optimized
- 08:15to lower cost and this led to things
- 08:17like patients receiving worse care,
- 08:19doctors were spending less
- 08:20and less time per patient, being forced to
- 08:21see more patients in one day.
- 08:23They then were like, this isn't what
- 08:25we want, and moved to this value-based
- 08:26healthcare, and that model,
- 08:28you know considered cost because
- 08:29that's obviously a factor but they
- 08:31also optimized to improve patient
- 08:33outcomes and that became just a
- 08:35much better model of healthcare.
- 08:37So I've really looked to other
- 08:39fields for inspiration, and I just
- 08:41want to go over a few other kinds of
- 08:42paradigm shifts that have been
- 08:44inspiring me. I just
- 08:46mentioned fee-for-service healthcare
- 08:48moving to value-based
- 08:50healthcare. Also, in psychiatry,
- 08:51it's not that we've completely
- 08:53abandoned the DSM,
- 08:54but we've moved from the DSM,
- 08:56which is categorical and binary,
- 08:58towards working within RDoC,
- 09:00at least in research,
- 09:01which is more dimensional and continuous.
- 09:03If you think about definitions of health,
- 09:05we've moved away from the
- 09:07biomedical definition of health,
- 09:08which really just focused on lack of illness.
- 09:11Or if we didn't have any data on you,
- 09:13meaning you didn't go to the hospital
- 09:14or you didn't go to the doctor's office,
- 09:16that meant that you were healthy,
- 09:17which wasn't a great measure
- 09:19because especially for different
- 09:20demographic groups that, you know,
- 09:21didn't represent health well.
- 09:23So we moved away from
- 09:24that towards the biopsychosocial definition,
- 09:25which again focused more on
- 09:27functioning and satisfaction,
- 09:29realizing that that was a
- 09:31better definition of health.
- 09:32Now in machine learning,
- 09:33we've really been stuck at this
- 09:35accuracy focus, which has led
- 09:36us to this tunnel vision focus.
- 09:38And I'm going to argue that we
- 09:39need to shift towards utility,
- 09:41which is really just a much bigger holistic
- 09:43perspective of why are we doing this,
- 09:45what do we need to be doing differently.
- 09:48So why do we need a paradigm shift?
- 09:51I think historically, like I
- 09:53mentioned in the current status slide,
- 09:55we've really overpromised solutions
- 09:57and underperformed in bringing
- 09:59these solutions into reality,
- 10:01even within the scientific field.
- 10:03And I think if we shift our priorities
- 10:05away from accuracy towards utility,
- 10:07it's going to allow us to make
- 10:09our goals more concrete and then
- 10:11you know we'll better communicate
- 10:13our results and where we're at.
- 10:15So I just want to move on
- 10:18to defining optimization.
- 10:19Most of you are probably familiar,
- 10:20but this is kind of the step when
- 10:22we're training machine learning models:
- 10:23we're going to have some type of
- 10:25function and we're going to minimize
- 10:26that or maximize that function.
- 10:28Maybe that's the mean squared error or
- 10:30accuracy and this is kind of just the
- 10:32step where we tell our models what is
- 10:34the right and wrong direction to be heading.
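
To make the optimization step concrete, here is a minimal sketch (my own illustration, not material from the talk) of minimizing a mean-squared-error loss by gradient descent; the data, feature count, and learning rate are all made up.

```python
import numpy as np

# Made-up data: e.g. brain-derived features predicting a behavioral score.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)

w = np.zeros(10)   # model parameters
lr = 0.01          # learning rate

for step in range(500):
    residual = X @ w - y
    loss = np.mean(residual ** 2)        # mean squared error: the quantity we minimize
    grad = 2 * X.T @ residual / len(y)   # gradient of the loss w.r.t. the parameters
    w -= lr * grad                       # step in the direction the loss calls "right"

print(f"final training MSE: {loss:.3f}")
```
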
- 10:37Within the machine learning lifecycle,
- 10:40optimization comes up at
- 10:41a few different points.
- 10:43And we kind of begin with defining
- 10:44our objectives.
- 10:45That's thinking about like what are our
- 10:46targets, what are we trying to predict.
- 10:48We then go through a phase of acquiring data,
- 10:50preparing it.
- 10:51We then move into training the model,
- 10:54testing the model.
- 10:55And maybe this is not so much in science,
- 10:57but if you're in industry,
- 10:58you move on to a phase of monitoring it,
- 11:01deploying it, making it accessible to people.
- 11:04Now, optimization comes up,
- 11:05mostly we think about it
- 11:07in this stage, you know,
- 11:08like when we have a loss function,
- 11:10we're training and testing the model
- 11:11and maybe when we're deploying
- 11:13it to make sure it's at least
- 11:15working properly over time.
- 11:16But I think the optimization
- 11:17actually comes up in most places
- 11:19throughout this life cycle,
- 11:20even though we don't think about it there.
- 11:22And I would argue that actually
- 11:24the most important place to think
- 11:25about it is that this initial phase
- 11:27when we're defining what we're
- 11:28doing and what our objectives are,
- 11:30and this is the place that we'll
- 11:32be able to input what our values
- 11:34are and what we care most about.
- 11:36So moving on to what accuracy is,
- 11:39it has a lot of, you know,
- 11:41obvious mathematical definitions
- 11:41but I really think about it as this
- 11:44quest for super high performance.
- 11:45And in this quest for super high performance,
- 11:47it has a very narrow objective of
- 11:49becoming more and more accurate
- 11:50and has a very immediate or
- 11:52short-term action plan for how
- 11:54to achieve this goal,
- 11:55and that's minimizing the loss
- 11:58function on a particular set of data.
- 12:00If you think about utility,
- 12:02however,
- 12:02utility I think is more closely
- 12:04aligned with the model's purpose
- 12:06and that's answering the research
- 12:08question and adding
- 12:10more real world value.
- 12:11It's going to look at the bigger picture
- 12:13and make very creative adjustments
- 12:15to align with the ultimate research
- 12:17goal and real world application,
- 12:19and I hope to provide some
- 12:21concrete examples of this.
- 12:22So bringing all of these different
- 12:24things together, I really feel it's
- 12:26an opportunity to sort of pause,
- 12:28reframe our research questions
- 12:29and ask why we're doing this,
- 12:31where we want to be and how we can
- 12:33get there, and sort of create
- 12:36some criteria for that journey.
- 12:38You know, in the process of setting
- 12:39up some of our optimization problems,
- 12:41I think we've all been so excited,
- 12:43like, this is so cool,
- 12:44we can predict behavior from the brain,
- 12:45but we've convinced ourselves
- 12:46that it really makes sense to
- 12:47just optimize for accuracy.
- 12:49But that's just because it's more
- 12:50easily mathematically formulated
- 12:52than utility.
- 12:53And if you consider the bigger picture again,
- 12:56you think about, you know, our
- 12:57goal isn't to become infinitely
- 12:59more accurate or intelligent.
- 13:01Really,
- 13:01our goal is to do useful things
- 13:03that make life easier for humans,
- 13:04and that would be closely
- 13:07aligned with utility.
- 13:08OK,
- 13:08so I just want to dive a little bit more
- 13:10into accuracy and some of the limitations.
- 13:13But first I have to
- 13:14mention how we measure it.
- 13:15I think it's become clear,
- 13:16but it's really just a single
- 13:18metric that represents the model's
- 13:19ability to predict some kind of
- 13:21observation in a test set of data.
- 13:23If you're working in a
- 13:24classification setting,
- 13:25this might actually be accuracy.
- 13:27If you're in a regression setting,
- 13:28this might be things like mean
- 13:30squared error or the correlation
- 13:32between predicted and observed.
- 13:34And this is again an extreme simplification
- 13:36of the model's performance.
- 13:38And it really doesn't capture
- 13:39any other values.
- 13:40Things like reliability, validity,
- 13:43complexity, fairness, usability.
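
As a rough illustration of the "single metric" evaluation described above, here is a short sketch (mine, not the speaker's) computing classification accuracy, mean squared error, and the predicted-observed correlation; the held-out values are invented, and nothing in the output reflects reliability, validity, fairness, or usability.

```python
import numpy as np
from scipy.stats import pearsonr

def evaluate_regression(y_true, y_pred):
    mse = np.mean((y_true - y_pred) ** 2)           # mean squared error
    r, _ = pearsonr(y_true, y_pred)                 # correlation of predicted vs. observed
    return {"mse": mse, "r": r}

def evaluate_classification(y_true, y_pred):
    return {"accuracy": np.mean(y_true == y_pred)}  # fraction correct

# Hypothetical held-out predictions, just to show the shape of the computation.
y_true = np.array([10.0, 12.0, 9.5, 11.0])
y_pred = np.array([10.5, 11.0, 9.0, 12.0])
print(evaluate_regression(y_true, y_pred))
# Note: no reliability, validity, complexity, fairness, or usability appears here.
```
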
- 13:47We can also think of it again
- 13:50in this loss function setting.
- 13:51It's worth pointing out that it's kind of the
- 13:53same one metric that we're using in
- 13:55our training and testing to say
- 13:57whether this is a good model or not,
- 13:58but then it's the same metric we
- 14:00use in this out-of-sample test set
- 14:03to determine the generalizability.
- 14:05We can also think of accuracy
- 14:07in terms of benchmarking.
- 14:09This is a little bit more abstract.
- 14:10It's when we go to compare one
- 14:12paper or one model, you know,
- 14:13my lab versus your lab,
- 14:15I have a better model than yours.
- 14:16And it's this, you know,
- 14:17comparison of 1 model to another.
- 14:19And we're searching for the very
- 14:21best model and that's determined by
- 14:23being the most accurate or having,
- 14:25you know,
- 14:26the best performance on this
- 14:28one single metric.
- 14:29And this kind of contributes to
- 14:31what I see as this very slow progress,
- 14:33just, you know,
- 14:34most models aren't even statistically
- 14:36significantly improving
- 14:37accuracy, it's like, you know,
- 14:39a 1% increase or something.
- 14:41So it's kind of led us to this:
- 14:42we feel like we're making progress,
- 14:44but it's really slow, and we're
- 14:45kind of actually not
- 14:46truly making any progress.
- 14:48We're just sort of inching along.
- 14:50So I think some of the limitations of
- 14:52accuracy have probably already become clear,
- 14:54but I'll just go over them a little bit.
- 14:57I think there is not a definition of success.
- 15:00And if we don't have a definition of success,
- 15:02you know our goal isn't to become
- 15:04infinitely more accurate or intelligent
- 15:05and without this clear definition
- 15:07of our goals and vision of success,
- 15:09we'll probably never know when we reach it.
- 15:11What does it even mean to become
- 15:14infinitely more intelligent or accurate?
- 15:15What purpose would it serve to
- 15:17be in a world full of agents,
- 15:19machines and humans that are super
- 15:21intelligent? And just to throw out
- 15:23Goodhart's law, which says when a
- 15:24measure becomes a target, it ceases
- 15:26to be a good measure.
- 15:30Sorry, for those of you who don't know
- 15:31Ted Lasso,
- 15:32it's one of my favorite TV
- 15:33shows, and I just wanted to,
- 15:34I think you can understand this example
- 15:37because it's really just about soccer,
- 15:39and this is just a very abstract
- 15:41illustration of the limitations of accuracy.
- 15:43If you consider the goals of the soccer
- 15:45team that's playing in the Premier League,
- 15:46their long term goal is to really,
- 15:48you know,
- 15:48keep winning and have healthy players that
- 15:50are getting along well with each other.
- 15:52And if you consider one player that's
- 15:54like a star player who's really just
- 15:56thinking about their own abilities,
- 15:57their own performance,
- 15:58that's Jamie Tartt from Ted Lasso.
- 16:02It might be good in the short term,
- 16:04like they're going to win the game,
- 16:05but every other player on the
- 16:06team is going to hate them.
- 16:07It's going to kind of contribute to this
- 16:10bit of a toxic culture, versus the team captain,
- 16:13right,
- 16:13Roy Kent,
- 16:14who's not just considering their own needs
- 16:16and is a bit more broadly focused;
- 16:18they're really better for the overall
- 16:20long-term goal of sustainability.
- 16:23And lastly, a few more
- 16:25concrete examples of why accuracy
- 16:27hasn't been working for us.
- 16:29We've obviously been chasing
- 16:31higher and higher accuracy,
- 16:32but having high accuracy does not
- 16:34imply that we have reproducibility,
- 16:37or that our models are meaningful,
- 16:38meaning that the features we're
- 16:40using are better than random.
- 16:41It doesn't come with any explainability,
- 16:44and having equal accuracy does not imply
- 16:46that two models learned in the same way.
- 16:52OK. So moving away from accuracy towards utility,
- 16:56which I think is where we need to be.
- 16:58Before I talk about measuring utility,
- 17:01it's really important to say that
- 17:03before you can measure utility,
- 17:05you really have to look at yourself
- 17:07and say what are our core values?
- 17:08And these are the values that I
- 17:10mentioned earlier and not only
- 17:12identify what the values are,
- 17:13but rank them in priority,
- 17:15what's the most important to you?
- 17:17And I also just want to point
- 17:18out that this is not going to be
- 17:20something that's consistent across
- 17:21all research questions, all models.
- 17:23It's very context-
- 17:24dependent, and even within your own
- 17:26work it might change depending on the day,
- 17:29week, or paper.
- 17:30So it's a very iterative process.
- 17:34For the purposes of measuring utility,
- 17:36we haven't really been doing
- 17:37it very much in our own field.
- 17:38So I sort of looked to a couple of other
- 17:41fields that have been doing this work more.
- 17:43And the first field that I found
- 17:45was actually the ethical AI field.
- 17:47And I found this really wonderful paper
- 17:49that started to sort of assess what the
- 17:51values were in the field.
- 17:53This was the paper that won best paper
- 17:55at the leading ethical AI conference,
- 17:58and it's called "The Values Encoded
- 18:00in Machine Learning Research."
- 18:01And what they did, this graph
- 18:02is a little bit hard to see,
- 18:04but the values at the front
- 18:06represent things like accuracy,
- 18:07even though the papers
- 18:08that were evaluated were ones associated
- 18:11with this ethical AI conference.
- 18:13So things like user rights and
- 18:15ethical principles,
- 18:15which are in red and purple,
- 18:17are much lower on the chart.
- 18:19And so this is not really solving the
- 18:21problem, but was sort of just saying,
- 18:23you know, what are our values?
- 18:24Are they represented in our work?
- 18:25So that's kind of the place
- 18:27that you have to start.
- 18:29At the same conference, kind of
- 18:31another group took it a step further
- 18:33and said we need to go a little bit
- 18:35beyond just accepting, like, what the
- 18:37existing values are in the literature.
- 18:39We need to design a framework for how we can,
- 18:42you know,
- 18:42assess them consistently going forward.
- 18:44So they introduced a paper called
- 18:46"Towards a Multi-stakeholder
- 18:48Value-based Assessment Framework
- 18:50for Algorithmic Systems," and I just
- 18:52want to briefly walk you through it
- 18:54because it has really inspired me.
- 18:56So you first have to start by kind
- 18:57of laying out what your values are
- 18:59and they decided to put this on the
- 19:02wheel because some of your values are
- 19:03going to conflict with each other.
- 19:05Now, all of these values
- 19:07might not work for our field,
- 19:08but you could think about updating
- 19:10them for our field.
- 19:11So here, the primary value that
- 19:13they cared about was privacy and
- 19:16privacy is kind of in conflict
- 19:18with transparency maybe.
- 19:18So values that are on opposite
- 19:20sides of the wheel,
- 19:22you know you can't prioritize
- 19:23them at the same time.
- 19:24So the first step is identifying:
- 19:26What's your top value?
- 19:27You then have to kind of go
- 19:29on to taking that value,
- 19:31identifying the criteria of
- 19:32that value and how it manifests.
- 19:34So in the example of privacy,
- 19:37some of the criteria might look like
- 19:39data protection or the right to erase the data,
- 19:41and some of the more specific manifestations
- 19:44might be things like purpose,
- 19:46statement of data collection,
- 19:47statement of how long the data is kept.
- 19:49So clearly this is a bit messier
- 19:51of a process
- 19:52than one simple formula for accuracy,
- 19:54but it's messy and necessary.
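
For readers who want a concrete handle on the value → criteria → manifestations breakdown just described, here is a minimal sketch of one way to write it down as a checklist structure; this is my own simplification, not the cited framework, and the privacy entries simply echo the examples above.

```python
from dataclasses import dataclass, field

@dataclass
class Value:
    name: str
    rank: int                                  # 1 = top priority for this project
    criteria: dict[str, list[str]] = field(default_factory=dict)  # criterion -> manifestations

# Privacy example from the talk, written as criteria and concrete manifestations.
privacy = Value(
    name="privacy",
    rank=1,
    criteria={
        "data protection": [
            "purpose statement for data collection",
            "statement of how long the data is kept",
        ],
        "right to erasure": [
            "documented procedure for deleting a participant's data",
        ],
    },
)

# A project would keep one ranked list like this and revisit it per question or paper.
project_values = [privacy]  # e.g. followed by transparency, fairness, ...
```
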
- 19:59After we kind of measured it and
- 20:00know what we're optimizing for,
- 20:02we can move on to actually optimizing it.
- 20:04I'm going to provide some concrete examples,
- 20:07again sort of from the ethical
- 20:09AI machine learning field,
- 20:11but I'll try to also bring it to our field.
- 20:13So in this example,
- 20:15fairness was a priority value.
- 20:16This is from a paper that was published,
- 20:18I think about two years ago in
- 20:21Nature Medicine, and they took
- 20:24knee MRIs and looked to predict
- 20:27the patient's pain score instead
- 20:29of the radiologist's diagnosis.
- 20:31And this is still a supervised
- 20:33learning problem,
- 20:34so they're predicting a score from an image,
- 20:37but the score that most people
- 20:39predict is this radiologist diagnosis,
- 20:41which is a standard measure of pain severity.
- 20:44But in the same data set they
- 20:46also had patient self reported
- 20:47pain, and what this simple,
- 20:49you know,
- 20:49switching up of the target led to was them
- 20:52discovering that, relative to the radiologist
- 20:54diagnosis,
- 20:54which only accounted for 9% of
- 20:57racial disparities in pain, using
- 20:59the self-reported pain accounted
- 21:01for 43%, or almost five times
- 21:03more, of the racial disparities in pain.
- 21:05So it's still a supervised learning problem,
- 21:07they're actually still optimizing for
- 21:08something like mean squared error,
- 21:10but just by switching this in some
- 21:12kind of a creative way,
- 21:13they were able to better align
- 21:15with what their value was,
- 21:16in this case fairness.
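
A minimal sketch, loosely inspired by the knee-MRI example above, of the point being made: the supervised pipeline and the error metric stay the same, and only the prediction target is swapped. The features, targets, and model choice (a ridge regression) are hypothetical placeholders, not the published study's method.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def train_and_eval(X_train, y_train, X_test, y_test):
    """Same supervised pipeline regardless of which target we choose."""
    model = Ridge(alpha=1.0).fit(X_train, y_train)
    return mean_squared_error(y_test, model.predict(X_test))

# Hypothetical arrays: image-derived features plus two candidate targets.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 50))
radiologist_grade = rng.normal(size=300)     # the conventional target
self_reported_pain = rng.normal(size=300)    # the value-aligned target

split = 200
for name, y in [("radiologist grade", radiologist_grade),
                ("self-reported pain", self_reported_pain)]:
    mse = train_and_eval(X[:split], y[:split], X[split:], y[split:])
    print(f"target = {name}: test MSE = {mse:.2f}")
```
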
- 21:19This is another example of fairness.
- 21:20I'll go through it quickly.
- 21:23They looked at equal opportunity
- 21:26and multi objective optimization.
- 21:28This is in the same FAccT conference
- 21:30on ethical AI, and this was again
- 21:32a supervised learning framework,
- 21:33and they wanted to not necessarily
- 21:35pay a trade-off in fairness for
- 21:37accuracy; they realized that both
- 21:39were really important.
- 21:40So they set up a really nice joint
- 21:43optimization problem where they
- 21:44could find the optimal solution that
- 21:46represented both values. It's a bit complicated;
- 21:49I have a lot more examples about
- 21:51this if you want to talk more later.
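
As an illustration of what a joint accuracy-plus-fairness objective can look like, here is a small sketch combining a standard prediction loss with a penalty on the equal opportunity gap. This is my own toy version rather than the cited paper's method; a real approach would use a differentiable relaxation or search the Pareto front, and all data below is invented.

```python
import numpy as np

def equal_opportunity_gap(y_true, y_score, group, thresh=0.5):
    """|TPR(group 0) - TPR(group 1)| among truly positive cases."""
    y_pred = (y_score >= thresh).astype(int)
    tprs = []
    for g in (0, 1):
        mask = (group == g) & (y_true == 1)
        tprs.append(y_pred[mask].mean() if mask.any() else 0.0)
    return abs(tprs[0] - tprs[1])

def joint_objective(y_true, y_score, group, lam=1.0):
    eps = 1e-9
    bce = -np.mean(y_true * np.log(y_score + eps)
                   + (1 - y_true) * np.log(1 - y_score + eps))  # accuracy-related term
    return bce + lam * equal_opportunity_gap(y_true, y_score, group)  # fairness term

# Usage with made-up scores for 8 subjects in two groups; lam trades off the two values.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.1, 0.3])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(joint_objective(y_true, y_score, group, lam=1.0))
```
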
- 21:53The last example I'm going
- 21:55to talk about is,
- 21:56other than fairness being a value
- 21:57that we want to optimize for,
- 21:59maybe things like efficiency and
- 22:01usability might be our priority value.
- 22:03I've been inspired by the human-computer
- 22:07interaction literature specifically.
- 22:08There's been a lot of work on learning to
- 22:10optimize for teamwork, recognizing that,
- 22:13and this kind of goes with the medical example,
- 22:15it's never going to be just AI alone.
- 22:17There's always going to be a
- 22:19person there that it's communicating with.
- 22:20And how can we optimize for
- 22:22that relationship? How can we
- 22:24not only get people like
- 22:26doctors to use the model,
- 22:27but how can we introduce the
- 22:29information to them at the right time?
- 22:31So there was a couple of papers
- 22:33from Microsoft that really tackled
- 22:34this problem deeply and came up
- 22:36with some really nice solutions.
- 22:38And just to share one example from
- 22:40the brain predictive modeling field,
- 22:42this comes from my own work where I've
- 22:45really focused on not just like sharing code,
- 22:47which is a nice step,
- 22:48but how do I actually make these things
- 22:51accessible to people without the
- 22:53same computational skills or resources.
- 22:56So my first two papers focused on sharing
- 22:58pre-trained normative models.
- 23:01I then took a step further to kind of
- 23:04write a protocol of how to do it, and
- 23:07even a step further, designing a website
- 23:10where people can just upload a CSV
- 23:12and drag and drop things. So again,
- 23:14all of these examples are very creative.
- 23:16Not as straightforward solutions as accuracy,
- 23:19but they really get us closer
- 23:21to what we truly care about.
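
To give a sense of what applying a shared pre-trained normative model involves, here is a minimal sketch; `predict_reference` is a hypothetical stand-in for a downloaded model (this is not the actual tooling or website described), and the coefficients are made up.

```python
def predict_reference(age, sex):
    """Hypothetical pre-trained normative model: returns (predicted mean,
    predictive sd) for one brain measure given the covariates."""
    mean = 3.5 - 0.01 * age + 0.05 * sex   # made-up coefficients
    sd = 0.3                                # made-up predictive uncertainty
    return mean, sd

def deviation_score(observed, age, sex):
    """Express an individual's measure as a z-score relative to the reference."""
    mean, sd = predict_reference(age, sex)
    return (observed - mean) / sd

# Usage with a made-up new participant (e.g. cortical thickness of one region):
print(deviation_score(observed=2.9, age=30, sex=1))
```
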
- 23:23Hopefully some of the benefits
- 23:25have come across already.
- 23:26But just to say this concretely,
- 23:28I think if we think more
- 23:30about optimizing for utility,
- 23:31it becomes more collaborative,
- 23:33efficient, and there's just
- 23:35a more well-defined purpose.
- 23:37It's functional and has real-world
- 23:39meaning, rather than the attractive,
- 23:41shallow, surface-level
- 23:42appeal that intelligence has.
- 23:44It's really an opportunity to
- 23:46think deeply and align our models
- 23:48with what we really care about.
- 23:50And creative thinking and problem
- 23:52solving, as I mentioned, are required.
- 23:54It's going to be more of a challenge,
- 23:56but that also means I think our
- 23:57solutions will be much more satisfying.
- 23:59OK,
- 24:00there's obviously some roadblocks
- 24:01that are involved in this process.
- 24:03We have a lot of cognitive biases that
- 24:06might make us focus on much simpler
- 24:08problems as problem complexity increases,
- 24:10or that tend to shift our responsibility,
- 24:13thinking along the lines of,
- 24:14this is somebody else's problem.
- 24:16Maybe especially in academia,
- 24:17we think this is more of an industry problem.
- 24:19But actually, especially in mental health,
- 24:22there's not that many people
- 24:23in industry doing this.
- 24:24It's mostly within academia.
- 24:26So I think it really,
- 24:28it is our responsibility.
- 24:29Obviously making the utility
- 24:31explicit is more difficult,
- 24:33but we're scientists,
- 24:34we love a challenge,
- 24:35so I challenge us.
- 24:37The next roadblock is kind of where
- 24:39things happen in development versus
- 24:41where they're really deployed
- 24:42and when they become useful.
- 24:44So during development,
- 24:45we have often a single decision maker.
- 24:47This is often the person,
- 24:49the machine learning engineer or
- 24:51the PhD student training the model.
- 24:53We're often in a situation where
- 24:54the data is stationary because we're
- 24:56working with secondary data analysis.
- 24:58So it's kind of fixed.
- 24:59It is the data.
- 25:00We can't necessarily change it,
- 25:02but really in the real world,
- 25:04there's going to be many people involved.
- 25:05There's going to be,
- 25:07as Simon mentioned, maybe,
- 25:08you know, a medical doctor,
- 25:10a patient involved.
- 25:11All of these people will have different
- 25:12values and different value priorities.
- 25:14So there's no way around it.
- 25:16It's going to be complicated and
- 25:18requires a lot of iteration and open
- 25:21discussion, and valuing diverse, you
- 25:23know, opinions, and mapping them out.
- 25:26Also, our data of course in the real
- 25:29world is often incomplete
- 25:31and is always evolving.
- 25:33And again another thing,
- 25:35I've kind of brought this up
- 25:37when I mentioned the multi
- 25:38objective optimization.
- 25:39Sometimes we think about our values
- 25:41as in conflict with one another.
- 25:44Like if I focus on fairness,
- 25:45I'll pay a price in accuracy.
- 25:47And then if I go to having, you know,
- 25:50more than just two values that I care about,
- 25:53it becomes infinitely harder and you
- 25:55can't optimize for everything at once.
- 25:56And I think this can be fixed with just
- 25:58really sitting down and, for each problem
- 26:00we're trying to solve, focusing on
- 26:02what is most important to us.
- 26:03And again, this is not always
- 26:06going to be the same value.
- 26:08And done a few future directions
- 26:11in this work.
- 26:12I've been really inspired
- 26:13by a lot of other fields.
- 26:14I've been thinking about utility,
- 26:16defining it and optimizing for it already
- 26:18and we really need to learn from them.
- 26:20I showed you 2 examples
- 26:22from machine learning,
- 26:23the ethical AI papers and the
- 26:25human computer interaction papers.
- 26:27There's also behavioral economics, which
- 26:29has really tackled mapping utility
- 26:31functions and modeling them mathematically.
- 26:34So I think we can look to them for
- 26:36inspiration, and then value-based
- 26:37healthcare has really tackled
- 26:39how to measure very complex outcomes,
- 26:41so we can also be inspired by them.
- 26:45And again,
- 26:46everything that we're talking about
- 26:47with utility and our value priorities
- 26:50is going to depend on the context.
- 26:52And we really need to keep open
- 26:54communication and guidelines
- 26:55about making these decisions.
- 26:56I think the field has been
- 26:58really great about, you know,
- 26:59setting up like standards for
- 27:01reporting methods or, you know,
- 27:02recognizing these things.
- 27:03But we haven't really worked on it in
- 27:05this post analysis machine learning space.
- 27:07So I think we're ready for it.
- 27:10And just some take home messages,
- 27:12we've really had too much of a
- 27:14tunnel vision focus on the accuracy
- 27:16of our predictive models,
- 27:17which has made us lose track of why
- 27:19we're doing this and it's sort of
- 27:21created this lack of model utility.
- 27:23I think it needs to be a priority
- 27:25to define our values,
- 27:26which is going to build a better plan
- 27:28for moving towards these goals and values.
- 27:30And optimizing for utility
- 27:31is really an abstract,
- 27:32creative process that requires
- 27:34diverse perspectives and input,
- 27:36and it's going to be a very iterative,
- 27:37ongoing process.
- 27:39And with that,
- 27:40I just want to acknowledge my team and
- 27:42everyone who's been supportive.
- 27:44Also the team I work with in Ann Arbor,
- 27:46Michigan, and my dog Charlie Mott,
- 27:48who has really accompanied me on
- 27:49all of these thought experiments.
- 27:59Any questions for Saige? Right.
- 28:03I quite like the message. It gave me a lot to think about.
- 28:10One thing, maybe I missed something,
- 28:14but [partially inaudible] where does explainability
- 28:16fit in, is it part of
- 28:19the utility component?
- 28:24Yeah. So the question was, like...
- 28:29OK. Yeah, lots to think about with this very,
- 28:32yeah, as I mentioned, philosophical talk,
- 28:34but the question was: I'm saying
- 28:36we need to optimize for utility,
- 28:38but then you brought up, like, explainability
- 28:40and how that fits in.
- 28:41Utility I sort of see as
- 28:44a manifestation of what our values
- 28:46are, and that's context-dependent.
- 28:48Explainability might be that value, it might
- 28:51actually be utility or usefulness, but I'm
- 28:54sort of using utility as this, like, you know,
- 28:57you can sort of think of it as encompassing
- 28:58our values.
- 28:59So explainability falls into it,
- 29:01but it might not always be explainability.
- 29:05For example, a lot of times these models
- 29:08[partially inaudible] can provide us
- 29:11predictions, but that doesn't
- 29:14necessarily provide insight.
- 29:16It doesn't really tell us whether we
- 29:18actually
- 29:20learned about
- 29:22the system from the particular model,
- 29:24and sort of why
- 29:25that works.
- 29:27Yeah, I think explainability is
- 29:29really important, and also accuracy
- 29:31could fall within, you know,
- 29:34utility; like, having an accurate model
- 29:36is really useful, and if it's completely,
- 29:38you know, always worse than chance,
- 29:40it's not that useful a model.
- 29:41So I'm not saying that accuracy is wrong,
- 29:44it's just that we've only been focusing
- 29:45on that, and there's a lot of other
- 29:47values that we have that I think we
- 29:49need to consider, and maybe they need
- 29:51to come first; like explainability,
- 29:53especially within neuroscience, might
- 29:55be one that, you know, takes priority.
- 29:58I think, to comment on that,
- 30:00there's a lot of that in
- 30:01medicine now, like there's tons of,
- 30:03you know they use drugs all the
- 30:04time on off label things and they
- 30:06don't really know why they work,
- 30:07but they work and the key
- 30:09thing is that they work right.
- 30:10And then, in terms of if you put
- 30:13this in terms of radiology,
- 30:16I guess depending on the application,
- 30:19right, whether you have a screening
- 30:21application or a diagnostic application,
- 30:22you'll accept, you know,
- 30:23different rates of false positives
- 30:25or false negatives.
- 30:26So I think that it is essential
- 30:28to keep in mind what the task is,
- 30:31what the goals are,
- 30:32and there's levels of explainability,
- 30:33right, you know?
- 30:36Probably the mechanistic insight
- 30:40[remainder partially inaudible].
- 30:45Yeah.
- 30:50Yeah, there's countless
- 30:52examples of that not being true.
- 30:55Well, I mean, do you think that,
- 30:58you know, stimulating some
- 31:00white matter tract will alleviate depression?
- 31:02You know, maybe it works some of the time.
- 31:05So hey, it works. And if it doesn't
- 31:06work we'll try something else.
- 31:08They can stick in a vagal nerve
- 31:10stimulator and stop epilepsy.
- 31:11Do they know how that works? No,
- 31:12but it works.
- 31:15But it leads to a much better...
- 31:18Yeah, sure.
- 31:19That's why we do the mechanistic work,
- 31:25or whatever. I don't
- 31:26want to take over, Saige.
- 31:29I just want to say also, I think, like, I
- 31:31think fairness has become a value.
- 31:32There's been a lot of you know,
- 31:34Todd's Nature papers are a great example.
- 31:35Simon brought up a couple of ones.
- 31:37But, you know, is it generalizability?
- 31:40It, you know, doesn't hold across a
- 31:42lot of different subgroups and things.
- 31:43We're not even measuring a
- 31:44lot of these constructs.
- 31:45Well, so yeah, I don't have...
- 31:48I only have examples, not answers.
- 31:50But my purpose is really to say we
- 31:52need to be talking and thinking
- 31:54about this before moving forward.
- 31:55Simon? Oh, and yeah, I'll repeat Simon's
- 31:57question so that we have it.
- 32:03Are we actually at a stage where we
- 32:06can make compromises between accuracy
- 32:08and utility if our accuracy for most
- 32:12clinical diagnosis is less than 80% or
- 32:16for behavioral predictions like R = .28?
- 32:19I mean, if we give up anything
- 32:21from already close to nothing,
- 32:23we are left with nothing?
- 32:26So the crux of the question is that,
- 32:28you know, should we not focus on accuracy
- 32:30if we're already really bad at accuracy?
- 32:32Again, I sort of started thinking about
- 32:34this not only in the neuroscience, like
- 32:36brain behavior predictive modeling, field.
- 32:38I really was thinking about it as machine
- 32:40learning as a whole and
- 32:42our hyper-focus on becoming super
- 32:44intelligent, on general intelligence.
- 32:46So I was thinking a bit broader than that
- 32:48and that's really where this is coming from.
- 32:50But again I think accuracy can still be
- 32:53a value, it's just not our only value.
- 32:57Well, as Breakspear pointed out,
- 32:59the bar is pretty low, so
- 33:03we can predict pain quite well.
- 33:05Or, I mean, maybe arguably.
- 33:10[inaudible crosstalk]
- 33:21Umm, would you recommend
- 33:25altering how you do it on the input side,
- 33:27like, you know, what kind of features you use,
- 33:29how you segregate groups or whatever,
- 33:31or would you advocate doing something
- 33:34on the implementation side,
- 33:36changing the algorithm?
- 33:38Like, what do you think is the... I think
- 33:40there's not one answer to it.
- 33:42I think going back to that machine
- 33:44learning lifecycle, it really has to
- 33:46start with defining the objective.
- 33:48Are we predicting the right
- 33:50thing, with the right features,
- 33:51with the right outcome?
- 33:52You know, and that radiologist example,
- 33:54all they did, they were still using
- 33:56the same input of knee MRI and all
- 33:58they did was change the target.
- 34:00So instead of predicting the radiologist
- 34:02diagnosis, they predicted pain,
- 34:03you know, self reported pain.
- 34:05But that's not something
- 34:07I can give an answer to always.
- 34:09It's always going to be dependent
- 34:11on what's available.
- 34:12What are you prioritizing for?
- 34:14It just really requires
- 34:16careful thought about,
- 34:17you know what,
- 34:18what are you trying to change
- 34:19and what do you value most?