Using Large-Scale Clinical Data for Discovery In Multiple Sclerosis And Epilepsy

April 30, 2021
  • 00:00It's my pleasure to now introduce our
  • 00:03next speaker, Doctor Chris Cotsapas.
  • 00:05Doctor Cotsapas
  • 00:07graduated from the Imperial College
  • 00:09London and earned his PhD at the
  • 00:12University of New South Wales in Sydney.
  • 00:15After postdoctoral training at
  • 00:16the Broad Institute and MGH,
  • 00:18where he undertook some of the
  • 00:21first genome-wide association
  • 00:22studies in autoimmune disease,
  • 00:24he joined Yale in 2010.
  • 00:26His laboratory uses genetics,
  • 00:28genomics and epidemiological
  • 00:29approaches to identify the
  • 00:31biology underlying autoimmune
  • 00:32and neurological diseases.
  • 00:33The floor is yours,
  • 00:35Doctor Cotsapas.
  • 00:39Thank you Nicole.
  • 00:40I'm afraid I can't start my video
  • 00:42so if the tech people would like
  • 00:44to start that, but it's fine.
  • 00:48There we go. Hello everyone,
  • 00:49I also have no disclosures and I
  • 00:51would like to take the next few
  • 00:53minutes to tell you a little bit
  • 00:55about some of the things we've
  • 00:57been thinking about in my lab.
  • 01:02Specifically, how to use
  • 01:04large-scale data, both clinical and
  • 01:07genetic, to make discoveries in
  • 01:10diseases that we're interested in,
  • 01:12specifically multiple sclerosis and epilepsy.
  • 01:16And I chose these two projects
  • 01:18to talk about because they are
  • 01:20quite early on in their inception,
  • 01:22so we're still not 100% sure
  • 01:25what the story is,
  • 01:26but I think it is instructive to
  • 01:29look at what we can do with data.
  • 01:33So the first story is multiple sclerosis.
  • 01:36This is a large-scale project
  • 01:39out of the EU,
  • 01:41primarily funded by one of
  • 01:44the EU Horizon programs.
  • 01:45It's led by colleagues at the
  • 01:48Karolinska Institute in Sweden,
  • 01:50and it covers 10 countries,
  • 01:52including a site at Yale and a site at UCSF,
  • 01:56and all the other partners are
  • 02:00in Europe.
  • 02:03Our main focus is on multiple sclerosis,
  • 02:08which is a predominantly autoimmune
  • 02:10disease of the brain where the immune
  • 02:14system basically decides that it does not
  • 02:17like the myelin sheath around white matter
  • 02:21neurons. You get
  • 02:23infiltration of immune cells,
  • 02:25stereotypically T cells, that
  • 02:27cause myelin stripping around
  • 02:30blood vessels in the brain, and
  • 02:32you get these large lesions in
  • 02:35the brain and progressively get
  • 02:37more and more lesions over time.
  • 02:39And that leads to a usually relapsing-remitting
  • 02:42mode of disease,
  • 02:43where there is both physical and
  • 02:45cognitive decline and eventually
  • 02:47this becomes permanent and
  • 02:49patients experience ongoing and
  • 02:51progressive disability.
  • 02:52It is a lifelong disease.
  • 02:54There are disease modifying therapies,
  • 02:56but there exists no cure and
  • 02:58it is one of the more common
  • 03:02neurological diseases out there.
  • 03:04And like most such diseases,
  • 03:06you can find some families where the
  • 03:09disease appears to run in the families,
  • 03:12but most cases are sporadic.
  • 03:14It looks like familial MS
  • 03:16is not a single-gene form of MS.
  • 03:19It is exactly like the sporadic
  • 03:21form: it is polygenic.
  • 03:23It is extremely complex.
  • 03:24We have at least 200 loci mapped from
  • 03:28large-scale genome-wide association studies.
  • 03:31We estimate there's probably another
  • 03:33800 to 1000 out there in the genome,
  • 03:36and a large effort now across the
  • 03:39world has been initiated to try
  • 03:42and figure out what those genes do,
  • 03:45but also to see how we can use
  • 03:47some of this information. One
  • 03:50of the problems has been that
  • 03:52this disease is quite common,
  • 03:54but it's not type 2 diabetes common,
  • 03:57so it's about one in 1000
  • 03:59in European populations,
  • 04:00so it's fairly common,
  • 04:01but no one really has a cohort of
  • 04:0420 or 30,000 patients who have all
  • 04:07been seen for a very long time in one
  • 04:10clinic where data have been collected
  • 04:13in the same way,
  • 04:15by the same people.
  • 04:16And so what you have to do is unite
  • 04:19data across many centers,
  • 04:22often with differing practices,
  • 04:24with differing EHRs,
  • 04:25or before that just paper records,
  • 04:29and try and put these data
  • 04:32together in some meaningful way
  • 04:36so you can make large-scale inferences.
  • 04:38This goes back to what Ira
  • 04:41said initially about how even as
  • 04:43a biobank we need to be one of
  • 04:45a network of biobanks, and this is
  • 04:48very much what we've been trying
  • 04:50to do in a disease focused way,
  • 04:52and so this project has been
  • 04:54aiming to do exactly that,
  • 04:56and then from these large scale data,
  • 04:58try and see if there are subsets
  • 05:00of patients who seem to respond
  • 05:02differently to therapy, who seem to have
  • 05:05different outcomes in a way that is predictable
  • 05:07and that might be mechanistic, because
  • 05:10the problem is, like most complex diseases,
  • 05:13MS is extremely heterogeneous. At diagnosis,
  • 05:16there is effectively no prognosis that
  • 05:18one can give to a patient, 'cause they may
  • 05:22be severely disabled within five years,
  • 05:25or they may be just fine 20
  • 05:28years down the line.
  • 05:30It's very hard
  • 05:32to tell a patient anything,
  • 05:35and that is a major issue.
  • 05:38And so what we've been doing is we have
  • 05:41been warehousing both clinical and
  • 05:43genetic data across these collections.
  • 05:46And what I'm showing you so far
  • 05:49is the progress
  • 05:52we've made with the genetic data,
  • 05:54which is about 45,000 MS patients across
  • 05:5810 centers and 26,000 controls to date.
  • 06:01And this by itself has been
  • 06:03a fairly major nightmare,
  • 06:05not least of which has been the paperwork,
  • 06:08because the GDPR, the privacy law
  • 06:10that has come into effect in Europe,
  • 06:13has really done a number
  • 06:16on this.
  • 06:19It took us a year and a half to unwind
  • 06:23the legal implications of that,
  • 06:26but these are real issues that
  • 06:28will have to be faced when we think
  • 06:32about federations of biobanks,
  • 06:34or of case-control cohorts across places,
  • 06:37and we're also trying to unite clinical data.
  • 06:41We have about 60,000 patients worth of
  • 06:44clinical data with different amounts
  • 06:46of data for different patients.
  • 06:49And we are still trying to resolve those,
  • 06:51and ultimately what we want to be
  • 06:54able to do is to build predictors
  • 06:56of outcomes which we have captured
  • 06:58in the clinical data, using both
  • 07:01other data that we have
  • 07:03on the patients and the genetic data.
  • 07:07Take just the genetic data,
  • 07:09which is a fairly standard platform.
  • 07:14The vast majority of this is
  • 07:16genotyping rather than sequencing.
  • 07:17There are different platforms
  • 07:19on which one can genotype,
  • 07:21but they're fairly standard.
  • 07:22It's a fairly homogeneous data type.
  • 07:24It has taken us about a year to put
  • 07:27these data together because there
  • 07:29is a pretty significant amount of
  • 07:31work involved in actually QC'ing
  • 07:34and processing data,
  • 07:35and so just that has been
  • 07:37a nontrivial challenge.
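As a rough illustration of the kind of quality control mentioned here, the sketch below filters variants by call rate and minor allele frequency, two common genotype QC steps. It is not the project's actual pipeline: the function names, the toy genotypes and both thresholds are assumptions chosen for the example.

```python
# Genotypes are coded as alternate-allele counts (0, 1, 2); None marks a missing call.
# The thresholds are illustrative assumptions, not values stated in the talk.


def variant_call_rate(calls):
    """Fraction of samples with a non-missing genotype at this variant."""
    observed = [g for g in calls if g is not None]
    return len(observed) / len(calls)


def minor_allele_frequency(calls):
    """Frequency of the rarer allele among non-missing genotypes."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq)


def passes_qc(calls, min_call_rate=0.98, min_maf=0.01):
    """Keep a variant only if it is well called and not (nearly) monomorphic."""
    return (variant_call_rate(calls) >= min_call_rate
            and minor_allele_frequency(calls) >= min_maf)


# Hypothetical toy data: three variants typed in ten samples.
genotypes = {
    "variant_a": [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],            # fully called, common
    "variant_b": [0, None, None, 1, 0, None, 0, 0, 1, 0],   # many missing calls
    "variant_c": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],            # monomorphic
}

kept = [name for name, calls in genotypes.items() if passes_qc(calls)]
print(kept)
```

In a multi-center collection the same filters would typically be run per genotyping platform before merging, since missingness patterns can differ between arrays.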
  • 07:39We have now overcome this.
  • 07:40We now have this unified collection.
  • 07:43Unlike most case-control cohorts where
  • 07:45we do genome-wide association studies,
  • 07:47we actually have deeper information
  • 07:49rather than just whether someone
  • 07:51is a case or a control,
  • 07:52and we're now trying to put these
  • 07:55data together, so the next couple
  • 07:57of years I think are going to be
  • 08:00very exciting here as we try and
  • 08:02figure out if there are predictors
  • 08:04for both outcomes and treatment
  • 08:08responses in these patients.
  • 08:10What we have so far in the clinical data,
  • 08:13I will show you very briefly.
  • 08:16These are all sorts of lifestyle and clinical
  • 08:19data that seem to segregate patients.
  • 08:21This is a principal components analysis
  • 08:23of our entire phenotype matrix,
  • 08:25and you can see that
  • 08:30phenotypes seem to correlate with age.
  • 08:32In the top left you can see that the
  • 08:36dominant trend in our patients is
  • 08:38actually age and that kind of makes sense.
  • 08:42It's a progressive disease.
  • 08:43It's a lifelong disease.
  • 08:45Older individuals tend to have more symptoms,
  • 08:48and you can definitely see things like that,
  • 08:51but that's an important confounder as well.
  • 08:54Age is an important aspect of disease
  • 08:57that we often don't talk about.
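To make the point about age concrete, here is a minimal sketch of a principal components analysis on a synthetic phenotype matrix. Everything in it is an assumption for illustration: the phenotype names, the simulated values and the use of power iteration are not taken from the project. Two of the three phenotypes are simulated to drift with age, and the first component ends up tracking age even though age itself is not in the matrix.

```python
import math
import random


def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
    return [(x - mean) / sd for x in column]


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)


def first_principal_component(columns, iterations=200):
    """Return (loadings, scores) for PC1 of the standardized columns."""
    z = [standardize(col) for col in columns]
    n, p = len(z[0]), len(z)
    # Correlation matrix of the phenotypes (columns are already standardized).
    corr_matrix = [[sum(z[i][k] * z[j][k] for k in range(n)) / (n - 1)
                    for j in range(p)] for i in range(p)]
    # Power iteration converges to the leading eigenvector.
    vec = [1.0] * p
    for _ in range(iterations):
        new = [sum(corr_matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    scores = [sum(vec[j] * z[j][k] for j in range(p)) for k in range(n)]
    return vec, scores


# Synthetic patients (hypothetical): two phenotypes drift with age, one does not.
rng = random.Random(0)
age = [rng.uniform(20, 70) for _ in range(300)]
disability = [a / 10 + rng.gauss(0, 0.5) for a in age]
symptom_count = [a / 20 + rng.gauss(0, 0.3) for a in age]
uv_exposure = [rng.gauss(0, 1) for _ in age]

loadings, pc1 = first_principal_component([disability, symptom_count, uv_exposure])
age_corr = correlation(pc1, age)
print("PC1 loadings:", [round(v, 2) for v in loadings])
print("correlation of PC1 with age:", round(age_corr, 2))
```

Because age drives the leading component here, any association found along that axis would need to be adjusted for age before it could be read as something specific to the disease.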
  • 09:00We see more interesting things
  • 09:02if you look at that second panel
  • 09:05from the left on the top.
  • 09:08There's a correlation with
  • 09:10natural UV exposure,
  • 09:11'cause it turns out that vitamin D is actually
  • 09:13an important component of MS pathology.
  • 09:16It is a risk factor.
  • 09:17It appears to be causal in ways
  • 09:19that we don't really understand.
  • 09:22But there are lifestyle exposures
  • 09:23like that as well and they are
  • 09:26definitely coming out of the data.
  • 09:28We also see smoking behaviors.
  • 09:30Gender is an important component and so on.
  • 09:32So as we start pulling all of
  • 09:34these clinical data together,
  • 09:36we are getting patterns even
  • 09:37from very simple
  • 09:40views of data. This is like a very
  • 09:44naive exploratory way to look at data,
  • 09:46but we're seeing patterns even in that way.
  • 09:49Just for the remainder of the time,
  • 09:52I'd like to switch for a second and tell
  • 09:55you about a different project that is
  • 09:58still quite similar in flavor I think,
  • 10:01which is about epilepsy.
  • 10:03Identifying predictors of
  • 10:04psychiatric disease in epilepsy.
  • 10:05This is funded by the NINDS and it
  • 10:08is a collaboration between Yale,
  • 10:10Aarhus University,
  • 10:11Helsinki University and the
  • 10:13Broad Institute. Epilepsy is
  • 10:18basically a disease
  • 10:20of seizures in the brain.
  • 10:22It is abnormal electrical activity
  • 10:26that is often repeated.
  • 10:27You see two broad types of seizures on EEG.
  • 10:31You see either a generalized seizure
  • 10:34pattern that takes up a large portion
  • 10:37of a hemisphere or the entire brain,
  • 10:40or you see very focal abnormal electrical
  • 10:44activity in one area of the brain.
  • 10:48Again,
  • 10:49it is a common neurological disease
  • 10:52about one in 26 people in the
  • 10:55US have a diagnosis of epilepsy.
  • 10:58It is a complex disease.
  • 11:00There exist certain single-gene forms
  • 11:03of it that explain about 14% of cases,
  • 11:07but the other 85 to 86% is this more common
  • 11:11complex form: again polygenic, many genes,
  • 11:14heritable,
  • 11:14but not simply heritable.
  • 11:19And what we've been doing has
  • 11:21been working with some colleagues
  • 11:23at Aarhus University in Denmark.
  • 11:26Like many of the Nordic countries,
  • 11:28Denmark has an integrated
  • 11:31health care system for which records
  • 11:35are completely available for research,
  • 11:37so the population of Denmark
  • 11:40is about 5 million people.
  • 11:42There are records roughly for
  • 11:44about 2 million people who have
  • 11:46interactions with the hospital system.
  • 11:48We tend to limit this to people
  • 11:52who've had interactions recently,
  • 11:55by which I mean after 1981,
  • 11:59because people born after 1981
  • 12:01also have blood spots stored
  • 12:04in the Statens Serum Institut
  • 12:06from which we can extract DNA.
  • 12:09So you can do population level
  • 12:11genetics based on the hospital
  • 12:13registers across the entire population.
  • 12:16And so we limited this to that group,
  • 12:19and one of the things that we
  • 12:23observed about four years ago now
  • 12:26is that if you look at individuals
  • 12:30with a diagnosis of epilepsy,
  • 12:34you find a strong overrepresentation
  • 12:38of mental illness diagnoses
  • 12:41in that population.
  • 12:42So if you look right at the top,
  • 12:45there's about 1.3 million people who
  • 12:47do not have a diagnosis of epilepsy,
  • 12:50and they are the reference.
  • 12:51And there's about 10 and a half
  • 12:54thousand people who do have a
  • 12:56diagnosis of epilepsy, and they have
  • 12:59somewhere between 1.4- and 1.6-fold
  • 13:01higher rates of psychiatric
  • 13:03illness diagnoses.
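A fold difference like this can be sketched as a simple risk ratio with a confidence interval. In the example below only the two group sizes (about 10,500 with epilepsy and about 1.3 million without) come from the talk. The numbers of psychiatric diagnoses are hypothetical, picked so the ratio lands at 1.5, and the real analysis may well have used a different estimator, such as a hazard ratio from time-to-event data.

```python
import math


def risk_ratio(cases_exposed, n_exposed, cases_reference, n_reference, z=1.96):
    """Risk ratio with a Wald confidence interval computed on the log scale."""
    risk_exposed = cases_exposed / n_exposed
    risk_reference = cases_reference / n_reference
    ratio = risk_exposed / risk_reference
    # Standard error of log(risk ratio).
    se_log = math.sqrt(1 / cases_exposed - 1 / n_exposed
                       + 1 / cases_reference - 1 / n_reference)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Group sizes are from the talk; the diagnosis counts are HYPOTHETICAL.
ratio, lower, upper = risk_ratio(
    cases_exposed=1_575, n_exposed=10_500,            # epilepsy group
    cases_reference=130_000, n_reference=1_300_000,   # reference group
)
print(f"risk ratio {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With a reference group this large, the width of the interval is driven almost entirely by the smaller epilepsy group.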
  • 13:03These are all diagnoses
  • 13:05from hospital registers.
  • 13:06They're not necessarily
  • 13:08strongly followed by an individual physician,
  • 13:10so this is not a cohort; these are
  • 13:13medical records, and that's worth
  • 13:16highlighting. However,
  • 13:19psychiatric illness is itself genetic.
  • 13:21Again, it is complex.
  • 13:23There have been many
  • 13:25genetic studies of it, and epilepsy is genetic too.
  • 13:28And so what we are trying to
  • 13:30figure out is if we can ask:
  • 13:33does the epilepsy cause psychiatric illness,
  • 13:36or are these either independent
  • 13:39effects or both effects of a
  • 13:43shared underlying pathology?
  • 13:45That's an interesting question,
  • 13:47because we think we can then develop
  • 13:49predictors: given
  • 13:54that you have a diagnosis of epilepsy,
  • 13:56what is the probability that
  • 13:59you actually develop
  • 14:00psychiatric illness post epilepsy?
  • 14:02We can see in the data
  • 14:07that not everyone is at equal risk,
  • 14:10but we do not yet understand
  • 14:12who is at higher risk,
  • 14:14and so we're taking everything
  • 14:16from school records which exist
  • 14:18in separate registers which can be
  • 14:20cross referenced to the hospital
  • 14:23registers, to genetic profiles,
  • 14:25to prescription reimbursements, to see
  • 14:28whether people have refractory disease,
  • 14:31or whether they've cycled through
  • 14:34many antiepileptic medications.
  • 14:37And we're trying to build these predictors,
  • 14:39because it seems rather important
  • 14:41to know who is at substantial
  • 14:43additional risk given a diagnosis
  • 14:46of epilepsy relative to the others.
  • 14:48I will just finish with a small
  • 14:51vignette of an almost accidental
  • 14:53finding that we've seen, again looking
  • 14:55in these registers, because we have
  • 14:58an entire population to look at.
  • 15:02We observe that people with epilepsy
  • 15:07are more likely to have a mother with
  • 15:11epilepsy than a father with epilepsy.
  • 15:15For reasons that we do not understand,
  • 15:18there exists this maternal effect.
  • 15:19If you just look at the red,
  • 15:21there's about a 1.5-fold
  • 15:24enrichment of
  • 15:26maternal epilepsy over paternal epilepsy.
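One simple way to ask whether such an imbalance could be chance is an exact binomial test: among people with epilepsy who have exactly one affected parent, the affected parent would be the mother about half the time if there were no parent-of-origin difference. The sketch below uses hypothetical counts chosen to give a 1.5-fold ratio. It is not the analysis behind the slide, and it ignores complications such as sex differences in epilepsy prevalence or in who becomes a parent.

```python
from math import comb


def one_sided_binomial_p(successes, trials):
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


# HYPOTHETICAL counts of people with epilepsy by which single parent is affected.
affected_mothers = 150
affected_fathers = 100

enrichment = affected_mothers / affected_fathers
p_value = one_sided_binomial_p(affected_mothers, affected_mothers + affected_fathers)
print(f"maternal:paternal ratio {enrichment:.2f}, one-sided p = {p_value:.4f}")
```

At register scale the counts are far larger than in this toy example, so an imbalance of this size would be very unlikely to arise by chance; the open question is what drives it.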
  • 15:29We're not sure if this is genetic
  • 15:31or if something else is going on.
  • 15:33This is a very unexpected finding.
  • 15:36It's been reported at least once before
  • 15:40in much smaller cohorts,
  • 15:41and there's been a long-standing dispute
  • 15:45in the field about whether this is true,
  • 15:48and across about 1.75 million
  • 15:50people in Denmark we can
  • 15:52unequivocally see there is a
  • 15:55fairly meaningful increase in this risk.
  • 15:57Quite what underlies this maternal effect,
  • 15:59we don't know, but it ties into our
  • 16:02interest in sex bias in disease,
  • 16:04and we're looking forward
  • 16:06to following this up.
  • 16:07So I want to just leave that there,
  • 16:10and I'll hand back to Nicole,
  • 16:12who I think will introduce the
  • 16:14next speaker or handle questions.