DIGITAL GENETICS DRIVING PRECISION NEUROSCIENCE WITH AI AND BIOSENSORS

March 31, 2025
ID
12955

Transcript

  • 00:00Our next speaker is a
  • 00:02true Yalie. He really epitomizes
  • 00:05the kind of rising star
  • 00:07that the Adams Center is
  • 00:09looking
  • 00:10forward to
  • 00:12recruiting
  • 00:12in the future. So
  • 00:14if you are interested,
  • 00:16shoot me an email.
  • 00:18Jason Liu
  • 00:20has a bachelor's degree
  • 00:22in applied mathematics from Yale
  • 00:24and then received his PhD
  • 00:26in computational biology
  • 00:28and biomedical informatics,
  • 00:30again from Yale, and is
  • 00:32now a postdoctoral associate
  • 00:34in Mark Gerstein's lab.
  • 00:37And
  • 00:38looking forward to your talk.
  • 00:40Jason is
  • 00:41connecting genomics
  • 00:44to
  • 00:44biosensors
  • 00:45and really opening a
  • 00:48new area of genetics that
  • 00:50I think of as digital
  • 00:52genetics or biosensor genetics, and
  • 00:54that sort of allows us
  • 00:56to link
  • 00:58this digital twin model
  • 01:01back to biosensors in live
  • 01:03patients, which has enormous applications.
  • 01:07Great. Thank you so much,
  • 01:08doctor, for the introduction.
  • 01:11And,
  • 01:12I'm very honored to be
  • 01:12here today to speak to
  • 01:13you all about some of
  • 01:15our exciting work using AI
  • 01:17and biosensors
  • 01:18to drive precision medicine.
  • 01:20So I'll just kind of
  • 01:21give a quick introduction in
  • 01:23terms of the terminology that
  • 01:24I'll be using today.
  • 01:26On the left here, you'll
  • 01:26see
  • 01:27we'll refer to the clinical
  • 01:29diagnosis as the macrophenotype,
  • 01:31this brown m here.
  • 01:33And then, of course, we
  • 01:34have the genotype here in
  • 01:36the gray g. And what a
  • 01:37lot of the research here
  • 01:39at the Adams Center and
  • 01:40many other people are interested
  • 01:42in is linking together
  • 01:43the macrophenotype
  • 01:45to the genotype to understand
  • 01:47the genetic architecture of disease.
  • 01:49And, of course,
  • 01:50this has been done quite
  • 01:52successfully
  • 01:53through many different case-control
  • 01:55GWAS.
  • 01:56So just here are a
  • 01:57couple of examples:
  • 01:58a Parkinson's GWAS,
  • 02:00over fifty thousand cases,
  • 02:02recently published in Nature Genetics,
  • 02:03and also
  • 02:05neuropsychiatric
  • 02:06conditions. Here, ADHD has
  • 02:08been widely studied by the
  • 02:09PGC.
  • 02:10So today, most of my
  • 02:12talk is gonna focus on
  • 02:14ADHD,
  • 02:15not directly Parkinson's disease, but
  • 02:17I,
  • 02:18hope to kind of show
  • 02:19that a lot of this
  • 02:20work is,
  • 02:22adaptable and applicable
  • 02:23to the research going on
  • 02:25with Parkinson's disease.
  • 02:26So
  • 02:27despite all this GWAS work that
  • 02:29has been occurring, you know,
  • 02:31there still is this
  • 02:33question of missing heritability, which
  • 02:35is that if we look
  • 02:36at the heritability given by
  • 02:38a twin study, it's very
  • 02:39high, especially for things like
  • 02:41Parkinson's, Alzheimer's, ADHD.
  • 02:43But
  • 02:44what we actually capture using
  • 02:45these case-control GWAS
  • 02:47is only a fraction of
  • 02:48that. And that difference, the
  • 02:50missing heritability, is kind of
  • 02:51what we're after.
  • 02:52And so the question is,
  • 02:54are there ways that we
  • 02:55can improve this, capture more
  • 02:57of that missing heritability?
  • 02:59And so one of the
  • 03:00ways, that we wanna do
  • 03:02so is through this concept
  • 03:03of precision phenotyping
  • 03:05or intermediate phenotypes.
  • 03:07So from the previous speakers,
  • 03:08we've kind of heard already
  • 03:09a little bit about how genomics
  • 03:11is used with, for example,
  • 03:12single-cell
  • 03:14RNA. You can do eQTLs
  • 03:16to link to the genetics.
  • 03:18But today, what I'm gonna
  • 03:19mainly speak about is how
  • 03:21we can use digital technology
  • 03:23and, in particular, wearables and
  • 03:25smartwatches.
  • 03:26There's been kind of an
  • 03:27emergence of this,
  • 03:29the popularity of smartwatches.
  • 03:30Many, many people have them.
  • 03:31They're more accessible
  • 03:33in terms of cost.
  • 03:34And so the question is,
  • 03:36if we use the information
  • 03:38captured by the smartwatch,
  • 03:40can we first, number one,
  • 03:42link it to the
  • 03:43macrophenotype, or the disease, and gain
  • 03:45some sort of clinical insight?
  • 03:46And then second of all,
  • 03:48if we can do that
  • 03:49successfully,
  • 03:49can we then relink it
  • 03:51back to the genotype
  • 03:52to hopefully gain more statistical
  • 03:54power in terms of our
  • 03:56ability for genetic discovery?
  • 04:01So for the rest of
  • 04:02the talk, I'm gonna kind
  • 04:03of give an overview
  • 04:04of one of our recent
  • 04:06projects.
  • 04:07This is digital phenotyping with
  • 04:09wearables
  • 04:10for a cohort
  • 04:11known as ABCD, the
  • 04:13Adolescent Brain Cognitive Development study.
  • 04:16And it was recently published
  • 04:17in Cell. And
  • 04:19so, just as kind of
  • 04:20a high-level overview of
  • 04:22the data, it's a few
  • 04:23thousand individuals with psychiatric diagnoses.
  • 04:26They also have digital data.
  • 04:27In this case, it's Fitbit
  • 04:29smartwatches.
  • 04:30And then finally, we have,
  • 04:32genetic information for all of
  • 04:33these individuals.
  • 04:37And so oftentimes, the question
  • 04:39that people ask is, well,
  • 04:41what does that data look
  • 04:42like? What does Fitbit data
  • 04:43look like? So just on
  • 04:45the left here, I'm
  • 04:46showing, three individuals,
  • 04:48their signal tracks across a
  • 04:50variety of different modalities.
  • 04:52So for these three individuals,
  • 04:53we can see their heart
  • 04:54rate, their calories, activity, steps,
  • 04:56so on and so forth.
  • 04:57These are things that,
  • 04:59if you have a
  • 05:00smartwatch, you can probably
  • 05:02look at as well.
  • 05:03But, you know, this data
  • 05:05is very noisy, and it
  • 05:06has
  • 05:07problems in terms
  • 05:09of downstream analysis. So
  • 05:10how can we prepare or
  • 05:11process this data in a
  • 05:13way that makes it suitable
  • 05:14for downstream analysis?
  • 05:16Well, one of the very
  • 05:18straightforward ways to do so
  • 05:19is to summarize the data.
  • 05:21So if I have somebody's
  • 05:22heart rate, which changes across
  • 05:24the day, I can do
  • 05:25something very simple, which is
  • 05:26to say, just take
  • 05:27the mean heart rate during
  • 05:28the day or the mean
  • 05:29heart rate at night. And
  • 05:31we can look at a
  • 05:32variety of different statistical measures
  • 05:34and then summarize them into
  • 05:35what we're calling the static
  • 05:37features.
  • 05:38We call them static features
  • 05:39because,
  • 05:41by nature of being a
  • 05:42time series and then summarized,
  • 05:43we lose some of that
  • 05:45temporal resolution.
  • 05:46But the good thing about
  • 05:48the static features is they're
  • 05:49very easy to understand. They're
  • 05:51easy to work with. We
  • 05:52can make this nice matrix
  • 05:53of individuals by features,
  • 05:56and then we can attach
  • 05:57on a set of covariates
  • 05:58that we would typically use
  • 06:00for these types of studies.
  • 06:01And so the static features
  • 06:03are one of the
  • 06:04ways that we're gonna process
  • 06:05the data.
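
A minimal sketch of the static-feature idea described above: one person's heart-rate track is collapsed into a few summary statistics that form one row of the individuals-by-features matrix. The function name, the 8:00 to 20:00 day/night cut-offs, and the sample readings are illustrative assumptions, not values from the study.

```python
from statistics import mean, pstdev

def static_features(samples, day_start=8, day_end=20):
    """Collapse one person's heart-rate series into summary numbers.

    samples: list of (hour_of_day, bpm) pairs, hours in [0, 24).
    day_start/day_end are assumed cut-offs, not values from the study.
    """
    day = [bpm for hour, bpm in samples if day_start <= hour < day_end]
    night = [bpm for hour, bpm in samples if not (day_start <= hour < day_end)]
    return {
        "hr_mean_day": mean(day),
        "hr_mean_night": mean(night),
        "hr_sd_all": pstdev([bpm for _, bpm in samples]),
    }

# Hypothetical readings: one (hour, bpm) pair per sample.
person = [(2, 58), (4, 60), (10, 82), (14, 90), (18, 86), (23, 62)]
features = static_features(person)

# One row of the individuals-by-features matrix, in a fixed column order.
row = [features[name] for name in sorted(features)]
```

Summaries like these are easy to tabulate, at the cost of the within-day timing information that the dynamic features are meant to keep.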
  • 06:06But as I just mentioned,
  • 06:09with the static features, we lose
  • 06:10some of the temporal
  • 06:12dynamics, meaning
  • 06:14how somebody is changing behaviorally
  • 06:16or physiologically
  • 06:17on a second or minute
  • 06:18level. And so in order
  • 06:20to preserve some of that,
  • 06:22we still want to kind
  • 06:23of
  • 06:24move to some sort of
  • 06:25feature set that is temporally
  • 06:27resolved. And we're calling that
  • 06:29temporally resolved feature set the
  • 06:31dynamic features.
  • 06:33And there's a variety of
  • 06:34steps that we
  • 06:35perform on the raw data
  • 06:37to get those
  • 06:38dynamic features, and I'll just
  • 06:40briefly go over some of
  • 06:41those steps now.
  • 06:42So first of all, one
  • 06:44of the challenges of this
  • 06:46wearable data is everyone's data
  • 06:47looks very different. It's collected
  • 06:49at different times. And so
  • 06:50in the first step, what
  • 06:52we're really interested in is
  • 06:54aligning individuals
  • 06:55across different days and seasonalities.
  • 06:59And after we've aligned it,
  • 07:00then we want to do
  • 07:02optimal window selection, taking a
  • 07:04slice of that data.
  • 07:05Some individuals, they may have
  • 07:07three weeks of data. Others
  • 07:09might just have a few
  • 07:10days of data. And
  • 07:11so it's important that we're
  • 07:13kind of making a fair
  • 07:14comparison, and we have
  • 07:16this empirical optimal window selection.
  • 07:18And that results, as you can
  • 07:19see on the right-hand
  • 07:20side, in about sixty-seven
  • 07:22percent of the data remaining,
  • 07:25for over
  • 07:26two thousand individuals. You can
  • 07:28see that if we
  • 07:29go to a higher threshold
  • 07:30of inclusion,
  • 07:31the sample size on the
  • 07:32y-axis drops quite significantly.
  • 07:33So this was kind of
  • 07:35an empirically derived
  • 07:37optimal window selection.
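
The talk does not spell out the selection rule, so the following is only a sketch of the trade-off it describes: as the inclusion threshold rises, fewer individuals qualify. The completeness values, the thresholds, and the function name are hypothetical.

```python
def sample_size_by_threshold(completeness, thresholds):
    """For each inclusion threshold, count the individuals whose fraction
    of observed data in the aligned window meets or exceeds it."""
    return {
        t: sum(1 for frac in completeness.values() if frac >= t)
        for t in thresholds
    }

# Hypothetical per-person completeness (fraction of the window with data).
completeness = {"p1": 0.95, "p2": 0.70, "p3": 0.67, "p4": 0.40, "p5": 0.88}

# Sample size retained at each candidate threshold.
curve = sample_size_by_threshold(completeness, [0.5, 0.67, 0.8, 0.9])
```

Plotting such a curve for real data is one way to pick a threshold empirically, keeping as much data per person as possible without losing most of the cohort.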
  • 07:40And then after we've done
  • 07:41the optimal window selection,
  • 07:43of course, there's still some
  • 07:44small amounts of missing data,
  • 07:46and we do a very
  • 07:47simple linear imputation here. But
  • 07:49in addition to the linear
  • 07:50imputation,
  • 07:51we're also tracking which variables
  • 07:54at what time points were
  • 07:55actually imputed and which ones
  • 07:57are actually observed.
  • 07:59And the point of that
  • 08:00is to let downstream
  • 08:02AI and machine learning models
  • 08:04actually decide how reliable these
  • 08:06imputed values are.
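
A minimal sketch of linear imputation that also records which time points were observed, so a downstream model can receive the mask as an extra channel. The handling of gaps at the edges (copying the nearest observed value) is an assumption; the talk does not say how those are treated.

```python
def impute_linear(series):
    """Fill None gaps by linear interpolation between observed neighbours.

    Returns (filled, observed_mask). Leading or trailing gaps copy the
    nearest observed value, which is an assumption made for this sketch.
    """
    observed = [value is not None for value in series]
    idx = [i for i, ok in enumerate(observed) if ok]
    if not idx:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i in range(len(series)):
        if observed[i]:
            continue
        left = max((j for j in idx if j < i), default=None)
        right = min((j for j in idx if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled, observed

# Hypothetical heart-rate track with two missing samples.
hr = [60, None, 64, 70, None]
filled, mask = impute_linear(hr)

# Two channels for a downstream model: the values and the observed flags.
channels = [filled, [float(flag) for flag in mask]]
```

Passing the mask alongside the values leaves it to the model to decide how much weight the imputed points deserve.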
  • 08:08And this kind of process
  • 08:09in aggregate gives us what's
  • 08:11known as the dynamic features.
  • 08:13So we have static features
  • 08:14and dynamic features. This is
  • 08:15how we've processed smartwatch data
  • 08:17to build kind of a
  • 08:18digital phenotype.
  • 08:20And the next question is,
  • 08:21how do we use
  • 08:21it in modeling?
  • 08:24So for the static features,
  • 08:25it's quite straightforward. You
  • 08:27have a matrix of
  • 08:28individuals by features,
  • 08:30and we can use traditional
  • 08:32machine learning
  • 08:33models here. In this case,
  • 08:35we've tested XGBoost and random forest,
  • 08:37which are quite popular
  • 08:39machine learning models.
  • 08:42And the idea is, can
  • 08:43we use those static features
  • 08:45to then
  • 08:46predict whether an individual has a particular
  • 08:49disease or is a
  • 08:50healthy control.
  • 08:52And one of the byproducts
  • 08:53of doing machine learning like
  • 08:54this,
  • 08:55you can see I've
  • 08:56kind of marked here the
  • 08:58green d score, a digital phenotyping
  • 09:01score or AI-generated
  • 09:03risk score, is that we can
  • 09:04move away from binary
  • 09:06definitions
  • 09:07of disease.
  • 09:08We're very used to using
  • 09:10zeros and ones to identify
  • 09:12people who have or do
  • 09:13not have a disease. But
  • 09:14this is kind of one
  • 09:15of the byproducts of using
  • 09:16machine learning models: we
  • 09:18can move towards kind of
  • 09:19a continuum or a spectrum,
  • 09:21and that can be more
  • 09:22inclusive and more precise in
  • 09:24terms of defining individuals.
  • 09:28Now in terms of the
  • 09:29dynamic features, those are
  • 09:31a time series,
  • 09:32and they're a little bit
  • 09:34different. We can't use
  • 09:35a traditional machine learning model
  • 09:36to deal with them. And
  • 09:38so instead, what we've adopted
  • 09:40here is a convolutional neural
  • 09:41net. So this is
  • 09:42a deep learning model.
  • 09:44And on the left side
  • 09:45here, you'll see those dynamic
  • 09:46features. We treat them
  • 09:47actually very similar to an
  • 09:49image.
  • 09:50You have many different channels
  • 09:52across different time points.
  • 09:55And, very similar to, like,
  • 09:57an image classification task,
  • 09:58we would want to use
  • 10:00this stack of digital phenotypes
  • 10:02to, again, predict the
  • 10:04macrophenotype.
  • 10:05Again, we're able to generate
  • 10:06this
  • 10:07continuous risk score to
  • 10:09hopefully more precisely characterize an
  • 10:12individual.
  • 10:13And one technical detail
  • 10:15that I'll just highlight here
  • 10:16is the use of a
  • 10:18variable-size convolutional filter here
  • 10:20at the bottom. And
  • 10:22the idea of the
  • 10:23variable-size convolutional filter is that behavioral
  • 10:26and physiological changes
  • 10:28may occur on a minute
  • 10:30or a very small time
  • 10:31scale, and we're definitely interested
  • 10:33in those changes.
  • 10:34But we're also interested in
  • 10:35more global changes that perhaps
  • 10:37occur on the day or
  • 10:38the weekly level.
  • 10:40And so by using this
  • 10:41variable-size convolutional filter, we're
  • 10:43actually able to capture more
  • 10:45of those behavioral and physiological
  • 10:47changes at different time scales.
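
To illustrate why filter width matters, here is a small sketch that slides filters of two widths over the same signal. A trained network would learn its kernel weights; the uniform averaging kernels, the widths, and the signal below are assumptions chosen only to keep the example deterministic.

```python
def conv1d_valid(signal, kernel):
    """Plain 1-D cross-correlation with no padding ('valid' mode)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def multi_scale_peaks(signal, widths=(2, 4)):
    """Apply one averaging filter per width and keep each filter's
    strongest response (a stand-in for max pooling)."""
    peaks = {}
    for width in widths:
        kernel = [1.0 / width] * width
        peaks[width] = max(conv1d_valid(signal, kernel))
    return peaks

# Hypothetical minute-level heart rate: a one-sample spike on a flat baseline.
signal = [60, 60, 60, 90, 60, 60, 60, 60]
peaks = multi_scale_peaks(signal)
```

The narrow filter responds more strongly to the brief spike, while the wide filter smooths it out and would instead track slower trends; using several widths together lets a model see both.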
  • 10:50So what are the
  • 10:52main results or kind of
  • 10:54findings of using these models
  • 10:55in this data?
  • 10:57So, again, here we're
  • 10:58not looking exactly at Parkinson's.
  • 11:00We're looking at ADHD and
  • 11:01anxiety disorder, but, again, there are many
  • 11:04applications and extensions
  • 11:06to Parkinson's.
  • 11:07So in the case
  • 11:08of ADHD, the top row
  • 11:09in blue,
  • 11:11and anxiety disorder, in
  • 11:13the first column here, we
  • 11:14have three different box plots.
  • 11:15And these are showing
  • 11:16the accuracy of identifying individuals
  • 11:19with or without the disease.
  • 11:20And we can see that
  • 11:24the baseline model, the
  • 11:25first model, is without any
  • 11:27wearable information.
  • 11:28The second model is using
  • 11:30the static features, and then
  • 11:31the third model is using
  • 11:32actually those deep-learning-based
  • 11:35features, the time series. And
  • 11:37we can see
  • 11:38kind of an increase in
  • 11:39performance and accuracy of the
  • 11:40model as we incorporate more
  • 11:42of those temporally resolved features.
  • 11:45But in addition to just
  • 11:46improving the accuracy,
  • 11:48we're also able to identify
  • 11:49what are the actual
  • 11:51physiological features that drive that
  • 11:53prediction.
  • 11:54So
  • 11:55for example, in ADHD,
  • 11:57we find that heart rate
  • 11:58is really kind of the
  • 11:59the key driver,
  • 12:00but in anxiety disorder, sleep
  • 12:02quality is.
  • 12:03And not only that, we're
  • 12:05able to temporally resolve that
  • 12:07importance.
  • 12:08Meaning, in this curve
  • 12:10here, we're showing when
  • 12:12heart rate was important. Is heart
  • 12:13rate important all the time
  • 12:14or just some of the
  • 12:15time? And we're able
  • 12:16to temporally resolve the heart
  • 12:17rate importance to kind of
  • 12:18the early to late afternoon.
  • 12:21And then, of course, for
  • 12:22sleep, it's during the night.
  • 12:23But in particular, you can
  • 12:25see a peak there around
  • 12:27five AM. And, again,
  • 12:28these are adolescents with anxiety
  • 12:30disorder, and so we're kind
  • 12:31of showing that perhaps there's
  • 12:32sleep disturbance
  • 12:33as they're waking up and going
  • 12:35to school, and that's really
  • 12:36driving
  • 12:37potential
  • 12:38phenotypic traits.
  • 12:41Okay. So that was kind
  • 12:42of the part on
  • 12:43using this wearable data to
  • 12:44make predictions
  • 12:46about phenotype.
  • 12:47But then, of course, we're
  • 12:48very interested in genetic discovery.
  • 12:50So the question then becomes,
  • 12:52if we move away from
  • 12:53binary traits and go to
  • 12:55these continuous digital phenotypes,
  • 12:57are we actually able to
  • 12:58gain statistical power in terms
  • 13:00of genetic discovery?
  • 13:03So in order to evaluate
  • 13:05that, the first thing we
  • 13:06wanna do is establish kind
  • 13:07of a baseline comparison, meaning
  • 13:10let's use just the individuals
  • 13:12we have and perform a
  • 13:13traditional GWAS, meaning a
  • 13:15case-control study.
  • 13:16And this would be using
  • 13:17a zero or one oops.
  • 13:19Sorry. A zero or one,
  • 13:21for the disease and then
  • 13:23the genotype here.
  • 13:24And when we do such an
  • 13:26analysis, this is using twelve
  • 13:28hundred individuals.
  • 13:29Perhaps unsurprisingly,
  • 13:31we don't find any
  • 13:32genetic loci at genome-wide
  • 13:34significance above this blue line.
  • 13:36And twelve hundred individuals is
  • 13:38really not that many
  • 13:39individuals. Most GWAS might
  • 13:41have hundreds of thousands or millions of
  • 13:43individuals.
  • 13:44So,
  • 13:45again, this is not using
  • 13:47any of the wearable data.
  • 13:49Now instead,
  • 13:51let's imagine we have this
  • 13:52wearable data as kind of
  • 13:53a multidimensional
  • 13:54array, a digital phenotype,
  • 13:57this vector d.
  • 13:59And instead, now let's model
  • 14:00that against the genotype.
  • 14:03And we also include this
  • 14:05genotype-by-macrophenotype
  • 14:07interaction term to ensure that
  • 14:09any changes with the digital
  • 14:10phenotype are actually tied to
  • 14:12the disease itself.
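
One plausible way to write down the two models being contrasted is sketched here. The talk only states that the baseline regresses the binary macrophenotype on genotype and that the second model relates the digital phenotype d to genotype with a genotype-by-macrophenotype interaction; the main-effect terms, the covariate vector c_i, and all coefficient names are assumptions added for illustration.

```latex
% Baseline case-control GWAS at variant j (m_i is 0 or 1, g_{ij} is the genotype):
\operatorname{logit} P(m_i = 1) = \alpha + \beta\, g_{ij} + \gamma^{\top} c_i

% Digital-phenotype model for feature k, with the interaction term:
d_{ik} = \alpha_k + \beta_{G}\, g_{ij} + \beta_{M}\, m_i
       + \beta_{G \times M}\,(g_{ij} \cdot m_i) + \gamma_k^{\top} c_i + \varepsilon_{ik}
```

Under this reading, a nonzero interaction coefficient means the genotype's effect on the digital feature differs between cases and controls, which is what ties the signal to the disease.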
  • 14:14And what we find here
  • 14:15in the same exact set
  • 14:16of individuals that you previously
  • 14:18saw, the twelve hundred,
  • 14:19individuals, now we're able to
  • 14:21see much more enrichment in
  • 14:22statistical power. We identified two
  • 14:24significant genetic loci.
  • 14:26And just to highlight that
  • 14:28a little bit, this
  • 14:29particular locus
  • 14:30is
  • 14:31related to sedentary time, and
  • 14:33you can see that there's
  • 14:34a clear difference between individuals
  • 14:37that are healthy controls and
  • 14:38those individuals that have ADHD.
  • 14:40But, additionally,
  • 14:42in the group of
  • 14:43individuals with ADHD, you can
  • 14:45see that sedentary time drops
  • 14:47significantly
  • 14:48as we go through the
  • 14:49different genotypes.
  • 14:51And so, really, the
  • 14:52result here is kind of
  • 14:53establishing this relationship
  • 14:55between the disease itself, the
  • 14:57genetics,
  • 14:58but also these digital phenotypes.
  • 15:03And I'll present kind of
  • 15:05one other way that we've
  • 15:06tackled this problem,
  • 15:08which is,
  • 15:09if we recall from before,
  • 15:11using those machine learning and
  • 15:13deep learning models, we're able
  • 15:14to generate these digital phenotype
  • 15:16or AI-generated risk scores.
  • 15:18And those scores really
  • 15:20are aggregating all of the
  • 15:21information from the smartwatch.
  • 15:22Instead, now let's use those
  • 15:24as a
  • 15:26target for the genetic discovery.
  • 15:28So here, we model d, the
  • 15:29digital phenotyping score, as a
  • 15:31function of the genetics.
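
As a rough sketch of using a continuous score as the association target, the snippet below regresses a score on genotype dosage at a single variant with ordinary least squares. It omits covariates and multiple-testing correction, and the data and function name are hypothetical; it is not the pipeline used in the study.

```python
def slope_and_t(genotypes, scores):
    """OLS slope of score on genotype dosage (0, 1 or 2) and its t statistic.

    No covariates are adjusted for in this sketch.
    """
    n = len(genotypes)
    g_mean = sum(genotypes) / n
    s_mean = sum(scores) / n
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    sxy = sum((g - g_mean) * (s - s_mean) for g, s in zip(genotypes, scores))
    beta = sxy / sxx
    intercept = s_mean - beta * g_mean
    rss = sum((s - (intercept + beta * g)) ** 2 for g, s in zip(genotypes, scores))
    std_err = (rss / (n - 2) / sxx) ** 0.5
    return beta, beta / std_err

# Hypothetical dosages at one variant and the matching risk scores.
genotypes = [0, 0, 1, 1, 2, 2]
scores = [0.1, 0.2, 0.4, 0.5, 0.7, 0.8]
beta, t_stat = slope_and_t(genotypes, scores)
```

A continuous target carries more information per person than a 0/1 label, which is the intuition behind the gain in statistical power described here.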
  • 15:33And
  • 15:34using this method, again, we
  • 15:35increase the statistical power
  • 15:37even more. So now we find ten
  • 15:38genetic loci.
  • 15:40And many of these, the
  • 15:41pink and the blue, represent
  • 15:43psychiatric-associated genes that are
  • 15:44known.
  • 15:45Some are ADHD-associated,
  • 15:48but we also identify
  • 15:50an array of novel targets
  • 15:52as well.
  • 15:53And so just kind of
  • 15:54in summary here,
  • 15:56really, the idea for the
  • 15:58digital phenotypes
  • 15:59is, number one, we can
  • 16:01use them to drive clinical
  • 16:03discovery,
  • 16:04clinical characterization of individuals and
  • 16:06their subtypes,
  • 16:07but also to really drive
  • 16:09forward the
  • 16:11genetic discovery.
  • 16:13And I would just wanna
  • 16:15emphasize that, with this
  • 16:18result here, of course, there
  • 16:19are many questions about causality,
  • 16:22and that's kind of ongoing
  • 16:24work that we wanna
  • 16:25address, and, of course, that
  • 16:26this is just the starting
  • 16:27point of using digital phenotypes.
  • 16:29There's much more
  • 16:31translational work that needs to
  • 16:32be done, and validation as
  • 16:34well.
  • 16:36So,
  • 16:37to kind of end and
  • 16:38wrap up here, the
  • 16:40future work,
  • 16:41of course, is expanding to
  • 16:43other modalities and diseases.
  • 16:45The hope here is that
  • 16:47this framework, this triangle here,
  • 16:49can be used for things
  • 16:50like Parkinson's disease and for
  • 16:52other kind of related movement
  • 16:53disorders and other neurodegenerative diseases.
  • 16:56And the idea is to
  • 16:57move away from directly associating
  • 16:59the genotype to the
  • 17:01macrophenotype, and instead
  • 17:03use this intermediate phenotype to
  • 17:05better characterize
  • 17:06individuals molecularly, physiologically, and behaviorally,
  • 17:12provide those clinical insights,
  • 17:14and, of course, most importantly,
  • 17:16retrace
  • 17:17back to the genetic information
  • 17:19and link to potential molecular
  • 17:22mechanisms.
  • 17:23And there's a lot
  • 17:24of ongoing work right now.
  • 17:26Of course, we heard a
  • 17:27little bit about the spatial
  • 17:28data.
  • 17:30I'm very excited about the
  • 17:31digital health sphere using wearables,
  • 17:34smartwatches, but also video data,
  • 17:36video capture data,
  • 17:37as well as brain imaging
  • 17:38data.
  • 17:39And so,
  • 17:41yes, happy to collaborate and
  • 17:44work on this with others
  • 17:45more. So I'll end
  • 17:46here just with the acknowledgment
  • 17:48slide. In particular, I just
  • 17:50wanna highlight Mark Gerstein,
  • 17:52whose lab I'm in, and
  • 17:53also Walter Roberts from Yale
  • 17:55Psychiatry.
  • 17:58They're the
  • 18:00co-senior authors on the study.
  • 18:01And then also I just
  • 18:02wanna highlight Beatrice, who
  • 18:04co-led this study with me,
  • 18:05and Yun Yang, a very
  • 18:06talented graduate student who's been
  • 18:08working with me on this
  • 18:09project as well, as well
  • 18:10as some of the collaborators
  • 18:12from Barcelona and California. So
  • 18:14with that, thank you very
  • 18:15much.
  • 18:16Yep.