DIGITAL GENETICS DRIVING PRECISION NEUROSCIENCE WITH AI AND BIOSENSORS
March 31, 2025
Information
- ID
- 12955
Transcript
- 00:00Our next speaker is a
- 00:02true Yalie. He really epitomizes
- 00:05the kind of rising star,
- 00:07that the Adam Center is
- 00:09looking
- 00:10forward to
- 00:12recruiting
- 00:12in the future. So
- 00:14if you are interested,
- 00:16shoot me an email.
- 00:18Jason Liu,
- 00:20has a bachelor's degree
- 00:22in applied mathematics from Yale
- 00:24and then received his PhD
- 00:26in computational biology
- 00:28and biomedical informatics,
- 00:30again from Yale, and is
- 00:32now a postdoctoral associate
- 00:34in Mark Gerstein's lab.
- 00:37And
- 00:38looking forward to your talk.
- 00:40Jason is
- 00:41connecting genomics,
- 00:44to,
- 00:44biosensors
- 00:45and really, opening a new
- 00:48area of genetics that
- 00:50I think of as digital
- 00:52genetics or biosensor genetics, and
- 00:54that sort of allows us
- 00:56to link
- 00:58this digital twin model,
- 01:01back to biosensors in live
- 01:03patients, which has enormous applications.
- 01:07Great. Thank you so much,
- 01:08doctor, for the introduction.
- 01:11And,
- 01:12I'm very honored to be
- 01:12here today to speak to
- 01:13you all, about some of
- 01:15our exciting work using AI
- 01:17and biosensors
- 01:18to drive precision medicine.
- 01:20So I'll just kind of
- 01:21give a quick introduction in
- 01:23terms of the terminology that
- 01:24I'll be using today.
- 01:26On the left here, you'll
- 01:26see,
- 01:27we'll refer to the clinical
- 01:29diagnosis as macrophenotype,
- 01:31this brown m here.
- 01:33And then, of course, we
- 01:34have the genotype here in
- 01:36the gray g. And a
- 01:37lot of the research here
- 01:39at the Adam Center and,
- 01:40many other people are interested
- 01:42in is linking together
- 01:43the macrophenotype
- 01:45to the genotype to understand
- 01:47the genetic architecture of disease.
- 01:49And, of course,
- 01:50this has been done quite
- 01:52successfully
- 01:53through many different case control
- 01:55GWAS.
- 01:56So just here as a
- 01:57couple of examples,
- 01:58Parkinson's GWAS,
- 02:00over fifty thousand cases,
- 02:02recently published in Nature Genetics
- 02:03and also
- 02:05neuropsychiatric,
- 02:06conditions. Here, ADHD has
- 02:08been widely studied by the
- 02:09PGC.
- 02:10So today, most of my
- 02:12talk is gonna focus on
- 02:14ADHD,
- 02:15not directly Parkinson's disease, but
- 02:17I,
- 02:18hope to kind of show
- 02:19that a lot of this
- 02:20work is,
- 02:22adaptable and applicable
- 02:23to the research going on
- 02:25with Parkinson's disease.
- 02:26So
- 02:27despite all this GWAS work that
- 02:29has been occurring, you know,
- 02:31there still is this
- 02:33question of missing heritability, which
- 02:35is that if we look
- 02:36at the heritability given by
- 02:38a twin study, it's very
- 02:39high, especially for things like
- 02:41Parkinson's, Alzheimer's, ADHD.
- 02:43But
- 02:44what we actually capture using
- 02:45these case and control GWAS
- 02:47is only a fraction of
- 02:48that. And that difference, the
- 02:50missing heritability, is kind of
- 02:51what we're after.
- 02:52And so the question is,
- 02:54are there ways that we
- 02:55can improve this, capture more
- 02:57of that missing heritability?
- 02:59And so one of the
- 03:00ways, that we wanna do
- 03:02so is through this concept
- 03:03of precision phenotyping
- 03:05or intermediate phenotypes.
- 03:07So from the previous speakers,
- 03:08we've kind of heard already
- 03:09a little bit how genomics
- 03:11is used, for example, with
- 03:12single cell,
- 03:14RNA. You can do eQTLs,
- 03:16to link to the genetics.
- 03:18But today, what I'm gonna
- 03:19mainly speak about is how
- 03:21we can use digital technology
- 03:23and, in particular, wearables and
- 03:25smartwatches.
- 03:26There's been kind of an
- 03:27emergence of this,
- 03:29the popularity of smartwatches.
- 03:30Many, many people have them.
- 03:31They're more accessible
- 03:33in terms of cost.
- 03:34And so the question is,
- 03:36if we use the information
- 03:38captured by the smartwatch,
- 03:40can we first, number one,
- 03:42link it to the
- 03:43macrophenotype or the disease, gain
- 03:45some sort of clinical insight?
- 03:46And then second of all,
- 03:48if we can do that
- 03:49successfully,
- 03:49can we then relink it
- 03:51back to the genotype
- 03:52to hopefully gain more statistical
- 03:54power in terms of our
- 03:56ability for genetic discovery?
- 04:01So for the rest of
- 04:02the talk, I'm gonna, kind
- 04:03of give an overview
- 04:04into one of our recent,
- 04:06projects.
- 04:07This is, digital phenotyping with
- 04:09wearables,
- 04:10for a cohort,
- 04:11known as the ABCD,
- 04:13Adolescent Brain Cognitive Development Study.
- 04:16And, it was recently published
- 04:17in Cell. And,
- 04:19so, just as kind of
- 04:20a high level overview of
- 04:22the data, it's a few
- 04:23thousand individuals with psychiatric diagnoses.
- 04:26They also have digital data.
- 04:27In this case, it's Fitbit
- 04:29smartwatches.
- 04:30And then finally, we have,
- 04:32genetic information for all of
- 04:33these individuals.
- 04:37And so oftentimes, the question
- 04:39that people ask is, well,
- 04:41what does that data look
- 04:42like? What does Fitbit data
- 04:43look like? So just on
- 04:45the left here, I'm
- 04:46showing, three individuals,
- 04:48their signal tracks across a
- 04:50variety of different modalities.
- 04:52So for these three individuals,
- 04:53we can see their heart
- 04:54rate, their calories, activity, steps,
- 04:56so on and so forth.
- 04:57These are things that,
- 04:59probably if you have a
- 05:00smartwatch, then you can also,
- 05:02look at as well.
- 05:03But, you know, this data
- 05:05is very noisy, and it
- 05:06has,
- 05:07problems with it in terms
- 05:09of downstream analysis. So
- 05:10how can we prepare or
- 05:11process this data in a
- 05:13way that makes it suitable
- 05:14for downstream analysis?
- 05:16Well, one of the, very
- 05:18straightforward ways to do so
- 05:19is to summarize the data.
- 05:21So if I have somebody's
- 05:22heart rate, it changes across
- 05:24the day, I can do
- 05:25something very simple, which is
- 05:26to say, just take
- 05:27the mean heart rate during
- 05:28the day or the mean
- 05:29heart rate at night. And
- 05:31we can look at a
- 05:32variety of different statistical measures
- 05:34and then summarize them into
- 05:35what we're calling the static
- 05:37features.
- 05:38We call them static features
- 05:39because,
- 05:41by nature of being a
- 05:42time series and then summarized,
- 05:43we lose some of that
- 05:45temporal resolution.
- 05:46But the good thing about
- 05:48the static features is they're
- 05:49very easy to understand. They're
- 05:51easy to work with. We
- 05:52can make this nice matrix
- 05:53of individuals by features,
- 05:56and then we can attach
- 05:57on a set of covariates
- 05:58that we would typically use
- 06:00for these types of studies.
- 06:01And so the static features
- 06:03are one of the
- 06:04ways that we're gonna process
- 06:05the data.
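
The summarization just described can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the function name `static_features`, the 08:00 to 20:00 "day" window, and the toy readings are all assumptions made for the example.

```python
from statistics import mean, pstdev

def static_features(samples, day_start=8, day_end=20):
    """Summarize (hour_of_day, heart_rate) samples into a few static features.

    The 08:00-20:00 'day' window is an illustrative assumption.
    """
    day = [hr for hour, hr in samples if day_start <= hour < day_end]
    night = [hr for hour, hr in samples if not (day_start <= hour < day_end)]
    return {
        "hr_mean_day": mean(day),
        "hr_mean_night": mean(night),
        "hr_sd_all": pstdev([hr for _, hr in samples]),
    }

# Toy data: one reading per hour for a single (hypothetical) individual.
samples = [(h, 60.0 if h < 8 or h >= 20 else 80.0) for h in range(24)]
features = static_features(samples)
```

Stacking one such dictionary per person gives the individuals-by-features matrix mentioned in the talk; the time ordering of the readings is discarded, which is why these are called static.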
- 06:06But as I just mentioned,
- 06:09the static features, we lose
- 06:10some of the temporal
- 06:12dynamics, meaning,
- 06:14how somebody is changing behaviorally
- 06:16or physiologically
- 06:17on a seconds or minute
- 06:18level. And so in order
- 06:20to preserve some of that,
- 06:22we still want to kind
- 06:23of,
- 06:24move to some sort of
- 06:25feature set that is temporally
- 06:27resolved. And, we're calling that
- 06:29temporally resolved feature set the
- 06:31dynamic features.
- 06:33And there's a variety of
- 06:34steps that we,
- 06:35perform on the raw data
- 06:37to achieve that,
- 06:38dynamic features, and I'll just
- 06:40briefly go over some of
- 06:41those steps now.
- 06:42So first of all, one
- 06:44of the challenges of this,
- 06:46wearable data is everyone's data
- 06:47looks very different. It's collected
- 06:49at different times. And so
- 06:50in the first step, what
- 06:52we're really interested in is
- 06:54aligning individuals
- 06:55across different days, seasonalities.
- 06:59And after we've aligned it,
- 07:00then we want to do
- 07:02optimal window selection, or take a
- 07:04slice of that data.
- 07:05Some individuals, they may have
- 07:07three weeks of data. Others
- 07:09might just have a few
- 07:10days of data. And
- 07:11so it's important that we're
- 07:13kind of making a fair
- 07:14comparison, and we have
- 07:16this empirical optimal window selection.
- 07:18And that results, on the
- 07:19right hand side, you can
- 07:20see, with about sixty seven
- 07:22percent of the data remaining,
- 07:25and for kind of over
- 07:26two thousand individuals. You can
- 07:28see that if we
- 07:29go to a higher threshold
- 07:30of inclusion,
- 07:31the sample size on the
- 07:32y axis drops quite significantly.
- 07:33So this was, kind of
- 07:35an empirically derived,
- 07:37optimal window selection.
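
The trade-off between inclusion threshold and sample size can be illustrated with a small sweep. This is a hedged sketch only: the talk does not give the selection procedure, so `retained_counts`, the per-person coverage values, and the candidate thresholds below are all hypothetical.

```python
def retained_counts(coverage_by_person, thresholds):
    """For each inclusion threshold, count the individuals whose fraction of
    observed (non-missing) data in the chosen window meets that threshold."""
    return {
        t: sum(1 for c in coverage_by_person.values() if c >= t)
        for t in thresholds
    }

# Hypothetical per-person coverage: fraction of the window actually observed.
coverage = {"p1": 0.95, "p2": 0.70, "p3": 0.67, "p4": 0.40, "p5": 0.88}
counts = retained_counts(coverage, [0.5, 0.7, 0.9])
```

Plotting such counts against the threshold gives a curve of the kind described on the slide: a stricter threshold keeps cleaner data but fewer individuals.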
- 07:40And then after we've done
- 07:41the optimal window selection,
- 07:43of course, there's still some
- 07:44small amounts of missing data,
- 07:46and we do a very
- 07:47simple linear imputation here. But
- 07:49in addition to the linear
- 07:50imputation,
- 07:51we're also tracking which variables
- 07:54at what time points were
- 07:55actually imputed and which ones
- 07:57are actually observed.
- 07:59And the point of that
- 08:00is to let downstream
- 08:02AI and machine learning models
- 08:04actually decide how reliable these
- 08:06imputed values are.
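
Linear imputation with a record of which points were observed might look like the following. It is a minimal sketch under stated assumptions: `impute_linear` is a hypothetical helper, and filling leading or trailing gaps with the nearest observed value is a choice made here, since the talk does not say how edges are handled.

```python
def impute_linear(series):
    """Fill None gaps by linear interpolation between the nearest observed
    neighbours; return (filled_values, observed_mask).

    Leading/trailing gaps take the nearest observed value (an assumption).
    """
    observed = [v is not None for v in series]
    idx = [i for i, ok in enumerate(observed) if ok]
    if not idx:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i in range(len(series)):
        if observed[i]:
            continue
        left = max((j for j in idx if j < i), default=None)
        right = min((j for j in idx if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled, observed

heart_rate = [70.0, None, None, 76.0, None]
filled, mask = impute_linear(heart_rate)
```

Passing `mask` to a downstream model as an extra channel alongside `filled` is one way to let the model weigh imputed values differently from observed ones.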
- 08:08And this kind of process
- 08:09in aggregate gives us what's
- 08:11known as the dynamic features.
- 08:13So we have static features
- 08:14and dynamic features. This is
- 08:15how we've processed smartwatch data
- 08:17to build kind of a
- 08:18digital phenotype.
- 08:20And the next question is,
- 08:21how do we use
- 08:21it in modeling?
- 08:24So for the static features,
- 08:25it's quite straightforward. You
- 08:27have a matrix of
- 08:28individuals by features,
- 08:30and we can use traditional
- 08:32machine learning,
- 08:33models here. In this case,
- 08:35we've tested XGBoost and random forest,
- 08:37which are quite popular
- 08:39machine
- 08:40learning models.
- 08:42And the idea is, can
- 08:43we use those static features
- 08:45to then,
- 08:46predict individuals with a particular,
- 08:49disease or if they're a
- 08:50healthy control.
- 08:52And one of the byproducts
- 08:53of doing machine learning like
- 08:54this is,
- 08:55you can see I've
- 08:56kind of marked here the,
- 08:58green d score, digital phenotyping
- 09:01score, or an AI generated
- 09:03risk score, is to
- 09:04move away from binary
- 09:06definitions
- 09:07of disease.
- 09:08We're very used to using
- 09:10zeros and ones to identify
- 09:12people who have or do
- 09:13not have a disease. But
- 09:14kind of one
- 09:15of the byproducts of using
- 09:16machine learning models is that we
- 09:18can move towards kind of
- 09:19a continuum or a spectrum,
- 09:21and that can be more
- 09:22inclusive and, more precise in
- 09:24terms of defining individuals.
- 09:28Now in terms of the
- 09:29dynamic features, those are
- 09:31a time series,
- 09:32and they're a little bit
- 09:34different. We can't use
- 09:35a traditional machine learning model
- 09:36to deal with them. And
- 09:38so instead, what we've adopted
- 09:40here is a convolutional neural
- 09:41net. So this is a
- 09:42deep learning model.
- 09:44And on the left side
- 09:45here, you'll see those dynamic
- 09:46features. We treat them
- 09:47actually very similar to an
- 09:49image.
- 09:50You have many different channels,
- 09:52across different time points.
- 09:55And, very similar to, like,
- 09:57an image classification task,
- 09:58we would want to use
- 10:00this stack of digital phenotypes
- 10:02to, again, predict the
- 10:04macrophenotype.
- 10:05Again, we're able to generate
- 10:06this
- 10:07continuous based risk score to
- 10:09hopefully more precisely characterize an
- 10:12individual.
- 10:13And one technical detail
- 10:15that I'll just highlight here
- 10:16is the use of a
- 10:18variable size convolutional filter here
- 10:20in the bottom. And
- 10:22the idea of the variable
- 10:23size convolutional filter is behavioral
- 10:26and physiological changes
- 10:28may occur on a minute
- 10:30or a very small time
- 10:31scale, and we're definitely interested
- 10:33in those changes.
- 10:34But we're also interested in
- 10:35more global changes that perhaps
- 10:37occur on the day or
- 10:38the weekly level.
- 10:40And so by using this
- 10:41variable size convolutional filter, we're
- 10:43actually able to capture more
- 10:45of those behavioral and physiological
- 10:47changes at different time scales.
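
The effect of filter width on time scale can be shown without a deep learning library. The sketch below is not the model from the talk: a real convolutional network learns its kernel weights, whereas fixed averaging kernels are assumed here purely to show how a short filter reacts to a brief event and a long filter smooths it away. The function names and the toy signal are hypothetical.

```python
def conv1d_valid(signal, kernel):
    """1-D 'valid' cross-correlation of a signal with a kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def multi_scale_features(signal, widths):
    """Apply one averaging filter per width and max-pool each response.

    In a real CNN the kernel weights are learned; fixed averaging kernels
    are used here only to show how filter width sets the time scale.
    """
    out = {}
    for w in widths:
        kernel = [1.0 / w] * w
        out[w] = max(conv1d_valid(signal, kernel))
    return out

# Toy minute-level signal: flat baseline with one brief spike.
signal = [0.0] * 10 + [6.0] + [0.0] * 10
features = multi_scale_features(signal, widths=[1, 3, 6])
```

Here the width-1 filter responds fully to the spike while the width-6 filter barely registers it; running several widths side by side is what lets a model see both minute-level and longer-range changes.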
- 10:50So what are the
- 10:52main results or kind of,
- 10:54findings of using these models
- 10:55in this data?
- 10:57So, again, we're
- 10:58not looking exactly at Parkinson's.
- 11:00We're looking at ADHD and
- 11:01anxiety disorder, but, again, many,
- 11:04applications and extensions
- 11:06to Parkinson's.
- 11:07So in the case
- 11:08of ADHD, the top row
- 11:09in blue,
- 11:11and anxiety disorder, in
- 11:13the first column here, we
- 11:14have three different box plots.
- 11:15And these are showing
- 11:16the accuracy of identifying individuals
- 11:19with or without the disease.
- 11:20And we can see that
- 11:24the baseline model, the
- 11:25first model, is without any
- 11:27wearable information.
- 11:28The second model is using
- 11:30the static features, and then
- 11:31the third model is using
- 11:32actually those deep learning based,
- 11:35features, the time series. And
- 11:37we can see
- 11:38kind of an increase in
- 11:39performance and accuracy of the
- 11:40model as we incorporate more
- 11:42of those temporally resolved features.
- 11:45But in addition to just
- 11:46improving the accuracy,
- 11:48we're also able to identify
- 11:49what are the the actual
- 11:51physiological features that drive that
- 11:53prediction.
- 11:54So
- 11:55for example, in ADHD,
- 11:57we find that heart rate
- 11:58is really kind of the
- 11:59the key driver,
- 12:00but in anxiety disorder, sleep
- 12:02quality is.
- 12:03And not only that, we're
- 12:05able to temporally resolve that
- 12:07importance.
- 12:08Meaning, in this curve
- 12:10here, we're showing where was
- 12:12heart rate important. Is heart
- 12:13rate important all the time
- 12:14or just some of the
- 12:15time? And we're able
- 12:16to temporally resolve the heart
- 12:17rate importance to kind of
- 12:18the early to late afternoon.
- 12:21And then, of course, for
- 12:22sleep, it's during the night.
- 12:23But in particular, you can
- 12:25see a peak there around
- 12:27five AM. And, again,
- 12:28these are adolescents with anxiety
- 12:30disorder, and so we're kind
- 12:31of showing that perhaps there's
- 12:32sleep disturbance
- 12:33as they're waking up going
- 12:35to school, and that's really
- 12:36driving
- 12:37potential,
- 12:38phenotypic traits.
- 12:41Okay. So that was kind
- 12:42of the,
- 12:43using this wearable data to
- 12:44make predictions,
- 12:46about phenotype.
- 12:47But then, of course, we're
- 12:48very interested in genetic discovery.
- 12:50So the question then becomes,
- 12:52if we move away from
- 12:53binary traits and go to
- 12:55these continuous based digital phenotypes,
- 12:57are we actually able to
- 12:58gain statistical power in terms
- 13:00of genetic discovery?
- 13:03So in order to evaluate
- 13:05that, the first thing we
- 13:06wanna do is establish kind
- 13:07of a baseline comparison, meaning
- 13:10let's use just the individuals
- 13:12we have and perform a
- 13:13traditional GWAS, meaning a case
- 13:15control study.
- 13:16And this would be using
- 13:17a zero or one oops.
- 13:19Sorry. A zero or one,
- 13:21for the disease and then
- 13:23the genotype here.
- 13:24And when we do such,
- 13:26analysis, this is using twelve
- 13:28hundred individuals.
- 13:29Perhaps unsurprisingly,
- 13:31we don't find any
- 13:32genetic loci at genome wide
- 13:34significance above this blue line.
- 13:36And twelve hundred individuals is
- 13:38really not that many
- 13:39individuals. Most GWAS studies might
- 13:41have hundreds of thousands or millions of
- 13:43individuals.
- 13:44So,
- 13:45again, this is not using
- 13:47any of the wearable data.
- 13:49Now instead,
- 13:51let's imagine we have this
- 13:52wearable data as kind of
- 13:53a multidimensional
- 13:54array, a digital phenotype,
- 13:57and this vector d.
- 13:59And instead, now let's model
- 14:00that, against the genotype.
- 14:03And we also include this,
- 14:05genotype by macrophenotype
- 14:07interaction term to ensure that
- 14:09any changes with the digital
- 14:10phenotype are actually tied to
- 14:12the disease itself.
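
One way to write down the model just described is as a per-feature regression. The exact specification is not given in the talk, so the linear form, the main effect of the macrophenotype, and the covariate term below are assumptions; only the digital phenotype as outcome, the genotype as predictor, and the genotype-by-macrophenotype interaction come from the talk.

```latex
D_i = \beta_0 + \beta_G G_i + \beta_M M_i
      + \beta_{G \times M}\,(G_i \cdot M_i)
      + \boldsymbol{\gamma}^{\top} \mathbf{c}_i + \varepsilon_i
```

Here $D_i$ is one digital phenotype feature for individual $i$, $G_i$ is the genotype at a given variant, $M_i$ is the macrophenotype (for example, 1 for cases and 0 for controls), and $\mathbf{c}_i$ is a vector of covariates. Under this reading, a nonzero $\beta_{G \times M}$ means the genotype's effect on the digital phenotype differs between cases and controls.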
- 14:14And what we find here
- 14:15in the same exact set
- 14:16of individuals that you previously
- 14:18saw, the twelve hundred,
- 14:19individuals, now we're able to
- 14:21see much more enrichment in
- 14:22statistical power. We identified two
- 14:24significant genetic loci.
- 14:26And just to highlight that
- 14:28a little bit, this
- 14:29particular locus
- 14:30is,
- 14:31related to sedentary time, and
- 14:33you can see that there's
- 14:34a clear difference between individuals
- 14:37that are healthy controls and
- 14:38those individuals that have ADHD.
- 14:40But, additionally,
- 14:42in the group of
- 14:43individuals with ADHD, you can
- 14:45see that sedentary time drops
- 14:47significantly
- 14:48as we go through, the
- 14:49different genotypes.
- 14:51And so, really, the
- 14:52result here is kind of
- 14:53establishing this relationship
- 14:55between the disease itself, the
- 14:57genetics,
- 14:58but also these digital phenotypes.
- 15:03And I'll present kind of
- 15:05one other way that we've
- 15:06tackled this problem,
- 15:08which is,
- 15:09if we recall from before,
- 15:11using those machine learning and
- 15:13deep learning models, we're able
- 15:14to generate these digital phenotype
- 15:16or AI generated risk scores.
- 15:18And and those scores really
- 15:20are aggregating all of the
- 15:21information from the smartwatch.
- 15:22Instead, now let's use those
- 15:24as a,
- 15:26target for the genetic discovery.
- 15:28So here, the d, the
- 15:29digital phenotyping score as a
- 15:31function of the genetics.
- 15:33And
- 15:34using this method, again, we
- 15:35we increase the statistical power
- 15:37even more. So now ten
- 15:38genetic loci.
- 15:40And many of these, the
- 15:41pink and the blue represent
- 15:43psychiatric associated genes that are
- 15:44known.
- 15:45Some are ADHD associated,
- 15:48but also we identify,
- 15:50an array of novel targets
- 15:52as well.
- 15:53And so just kind of
- 15:54in summary here,
- 15:56really, the idea for the
- 15:58digital phenotypes
- 15:59is, number one, we can
- 16:01use it to drive clinical
- 16:03discovery,
- 16:04clinical characterization of individuals and
- 16:06their subtypes,
- 16:07but also to really drive
- 16:09forward the,
- 16:11the genetics discovery.
- 16:13And I would just wanna
- 16:15emphasize that this,
- 16:17this
- 16:18result here of course, there's
- 16:19there are many questions about causality,
- 16:22and that's kind of ongoing
- 16:24work that we we wanna
- 16:25address and, of course, that
- 16:26this is just the starting
- 16:27point of using digital phenotypes.
- 16:29There's much more
- 16:31translational work that needs to
- 16:32be done, and validation as
- 16:34well.
- 16:36So,
- 16:37to kind of end and
- 16:38wrap up here, the
- 16:40future work,
- 16:41of course, is expanding to
- 16:43other modalities and diseases.
- 16:45The hope here is that
- 16:47this framework, this triangle here,
- 16:49can be used for things
- 16:50like Parkinson's disease and for
- 16:52other kind of, related movement
- 16:53disorders and other neurodegeneration.
- 16:56And the idea is to
- 16:57move away from directly associating
- 16:59the genotype to the
- 17:01macrophenotype, but instead
- 17:03using this intermediate phenotype to
- 17:05characterize
- 17:06both molecularly, physiologically, and behaviorally
- 17:09individuals
- 17:10better, provide those clinical sorry.
- 17:12Provide those clinical insights,
- 17:14and, of course, most importantly,
- 17:16is to retrace
- 17:17back to the genetic information
- 17:19and, link to potential molecular
- 17:22mechanisms.
- 17:23And there's a lot
- 17:24of ongoing work right now.
- 17:26Of course, we heard a
- 17:27little bit about the spatial
- 17:28data.
- 17:30I'm very excited about the
- 17:31digital health sphere using wearables,
- 17:34smartwatches, but also video data,
- 17:36video capture data,
- 17:37as well as brain imaging
- 17:38data.
- 17:39And so,
- 17:41yes, happy to collaborate
- 17:43and
- 17:44work on this with others
- 17:45more. So I'll end
- 17:46here just with the acknowledgment
- 17:48slide. In particular, I just
- 17:50wanna highlight Mark Gerstein,
- 17:52whose lab I'm in, and
- 17:53also Walter Roberts from Yale
- 17:55Psychiatry.
- 17:58They're the co-senior
- 18:00authors on the study.
- 18:01And then also I just
- 18:02wanna highlight Beatrice, who
- 18:04co-led this study with me
- 18:05and, Yun Yang, a very
- 18:06talented graduate student who's been
- 18:08working with me, on this
- 18:09project as well, as well
- 18:10as some of the collaborators,
- 18:12from Barcelona and California. So
- 18:14with that, thank you very
- 18:15much.
- 18:16Yep.