Biomedical Informatics and Data Science at Yale

March 04, 2024

Janeway Society First Friday Seminar Series

March 1, 2024

Topic: Biomedical Informatics and Data Science at Yale

Presenter: Lucila Ohno-Machado, MD, MBA, PhD, the Waldemar von Zedtwitz Professor of Medicine and Biomedical Informatics and Data Science; Deputy Dean for Biomedical Informatics; Chair, Section of Biomedical Informatics and Data Science

Transcript

  • 00:00 Thank you all for joining today.
  • 00:02 Today we have Lucila Ohno-Machado, who is Professor of Medicine and Professor of Biomedical Informatics and Data Science.
  • 00:09 She's also the Deputy Dean for Biomedical Informatics and Chair of the new Section of Biomedical Informatics and Data Science.
  • 00:16 Lucila came to us from the University of California San Diego, where she was Associate Dean and Chair of the Department of Biomedical Informatics. She is going to describe some of the initiatives that BIDS is running toward broadening biomedical and clinical data access and utilization at Yale.
  • 00:40 Thank you, Leslie.
  • 00:42 Thank you. Let me share the screen and make sure it's the right one.
  • 00:49 We can see it. It looks great. Excellent.
  • 00:52 So thank you. Because it's a small group, please feel free to interrupt. I might not always see your hand up, but I asked the organizers to help me if there is a question in the middle.
  • 01:05 Today, what I will present is mostly what's in it for you at Biomedical Informatics and Data Science. I'm not necessarily presenting our research or featuring many of the projects that are ongoing right now. It's essentially how you can take advantage of new things that are coming to enhance your own portfolio.
  • 01:33 So, as you know, informatics has been at Yale for a very long time. Since 1985 or 1986 there has been a training grant, so you can imagine that before that there was already activity in this area. There was the Yale Center for Medical Informatics as well as many activities in computational biology.
  • 01:59 But the faculty had appointments in different departments, and Yale was not perceived as having a large informatics academic unit.
  • 02:14 So we came to fulfill that need and created the section, the first standing Section of Biomedical Informatics and Data Science, starting January 1, 2023.
  • 02:28 So we've been here for a little over a year, with the mission of research, training, and services. In research, our faculty focus on AI in medicine, natural language processing including large language models, privacy-enhancing technology, and blockchain technology.
  • 02:52 Some large projects transferred to Yale. One is the Center of Excellence in Genome Science focused on admixed genomes. Another, in recruitment, is a portion of the All of Us Precision Medicine Program, which is now here at Yale, and we're in Puerto Rico as an associated entity.
  • 03:15 In training, we have curricula for master's and PhD students. We mentor postdocs, clinicians moving into this area, and biomedical data scientists. Starting this year, we'll have summer interns from North America and also from Africa.
  • 03:38 The new certificate program in medical software and medical AI has been advertised. The applicants have submitted their forms, and we are going to start very soon.
  • 03:54 In terms of infrastructure and services, we are mounting research computing and clinical and biomedical data services that may be of interest to you, and that's what I'll focus my presentation on. Of course, we also collaborate on grant proposals and on quality and safety research.
  • 04:14 We were fortunate to get about $17 million in grants this year. Since these are multi-year grants, we hope to have increased that in subsequent years. There were 19 grants and contracts and 27 new proposals by our faculty.
  • 04:34 So I'll talk mostly about the service portion, which is staffed by our faculty. It's the Yale Biomedical Informatics and Computing Center, or CORE. There we have a Research Information Officer, Dr. Daniella Meeker, and a Computational Infrastructure Officer who is also Assistant Dean for Informatics, [name inaudible].
  • 05:04 The principle here is that our section is one of several units that benefit from informatics infrastructure. The Department of Medicine is another, as are Surgery, various other departments, YCCI, YCC, and so on. We also operate with partner entities such as Yale New Haven Health System, Yale University, and so on, and we have to do that with compatible policies. Essentially, we're here to support research operations related to data and compute that tie these three units together.
  • 05:51 So, in terms of departments and centers, every department will have an informatics liaison who is the main point of contact. This person gathers and helps prioritize the needs of the department and brings them to us, identifies data sources that are needed, refers trainees, and partners in grant proposals or other activities.
  • 06:16 Here you see an initial set of Advisory Board members who help us prioritize what we need to do. Not all departments are represented yet, but we think we have large coverage here, and we plan to expand.
  • 06:38 Now, we ran an AI survey. You may have gotten this spam in your inboxes more than once. We wanted to survey the capabilities and interest in AI at the School of Medicine.
  • 06:54 We had 1,100 respondents from all categories, and there's high interest in AI and in AI training. On the right-hand side you see the number of respondents, and the line going across is the percentage of them who were interested in training. In every category it's more than half of the respondents. In general, there is great interest in ChatGPT, machine learning, language models, and so on.
  • 07:31 Now, about two months ago we organized an AI in Medicine symposium, the first one here at the School of Medicine. We wanted to showcase, and honestly to learn about, what kinds of AI are being done in various departments.
  • 07:53 One of the presentations was about the model we call Me-LLaMA, a medical large language model. LLaMA is a foundation model produced by Meta, formerly Facebook. You can do more training on top of an existing foundation model and create medical large language models. That was done by researchers from our team, and there are others as well. So this is one of them.
  • 08:27 Another one is called Meditron, also from a faculty member in BIDS. In our unit there's also Ophthalmology GPT, done by a faculty member who is with us and has a secondary appointment in ophthalmology. There is also GutGPT, another large language model, coordinated by Dennis Shung in GI.
  • 08:57 There are opportunities. Dr. Khera has put together this slide on engagement with healthcare. It shows parts of the process that we are currently not doing adequately to get to the endpoint of having people interact with the health system.
  • 09:17 He mentions that experts cannot consistently diagnose structural heart disorders. It's a hidden disease, and we often diagnose it only when it's at a more advanced stage. So there are apps and other AI-based techniques that help diagnose it earlier, before advanced treatment and evaluation.
  • 09:43 Here is a slide from his lab. On the left-hand side he shows the AI-ECG applications that he has already put out there. There's a wearable CarDS Plus application as well, and trials that have been done on top of machine learning models, again developed here at Yale.
  • 10:12 Dr. Shung in GI also has several applications. One of them, with several collaborators, is related to human-algorithm interaction, and it uses the GutGPT application I mentioned. So there are a lot of interesting applications here at Yale.
  • 10:38 Those two, by the way, have secondary appointments with us. They're primarily appointed in the Department of Medicine, in the sections of cardiology and GI respectively.
  • 10:49 In BIDS we now have 12 ladder-rank primary faculty and 13 non-ladder-rank faculty. We also have 40 people who are appointed in other departments and have secondary appointments with us; those are in 20 departments across five different schools.
  • 11:11 So what we see on the left-hand side is BIDS, with data workflow management, AI, and methods expertise. On the right-hand side are the departments and centers, with the data and the domain expertise for the particular application we are talking about, be that AI or other forms of informatics and data science.
  • 11:34 In terms of modalities, we have bioinformatics, imaging, and clinical informatics. Some are more populated than others at present, and our plan is to keep moving and getting these applications done.
  • 11:53 Again, on the department and center side, we find many people who have been doing informatics in a fragmented way. We try not only to acknowledge what they're doing but also to synergize and create computational infrastructure that can reach a higher level, because more people are using it.
  • 12:21 So our mission is to develop new computer-based methods and systems to improve human health, to train the next generation, and to help researchers make maximal use of data.
  • 12:35 I would expect that many in the audience today are working with data from the health system, from clinical trials, or from basic science. All of that work may benefit from data management and advanced methods.
  • 12:54 AI is of current interest to many people, and there are NIH requests for applications in many different areas and institutes; I put some here as examples.
  • 13:08 On the left-hand side, I selected some closely related topics, as well as projects we are currently involved in. One is AIM-AHEAD, on health equity and advancing diversity in the workforce. Another is Bridge2AI, among other Common Fund programs. And, as I mentioned before, there is the All of Us Precision Medicine Initiative.
  • 13:37 AI is a team sport. So again, I put both sides here: we can't exist on methods alone. We have to have important applications and the right questions to answer. That's where, regardless of the medical specialty, we need partnerships and help.
  • 13:59 So data, computational infrastructure, and training are what we're trying to provide. Here's a busy diagram of the computing infrastructure that we found needs to be developed at Yale. The goal is for everyone to be able to fulfill their application needs as well as develop new methods.
  • 14:25 Going from 1 to 4: SAFE is a computational infrastructure designed for HIPAA compliance and for protecting sensitive information such as electronic health records and clinical trial data. It has a certain amount of storage that needs to increase from what it is now. There is already a system in place, but it needs a lot of updates.
  • 14:58 The research Virtual Display Interface is how you use your own computer as a terminal to servers that are much more powerful and, again, secured for this operation. You use your computer to access data and to do analysis on this protected server. This, again, is being enhanced from an existing compute infrastructure.
  • 15:31 Then you have the cloud: Amazon Web Services Spinup Plus, which will be a more secure version of the capability that currently exists throughout the university. This will possibly be adequate for genome annotation, genome analysis, and image processing as well.
  • 15:58 Finally, on the right-hand side, is what we are calling Microscope High Performance Computing. This is a set of servers with GPU nodes, that is, graphics processing units, which are necessary for AI these days. It is something that is currently not available to our community.
  • 16:25 So why do we do all that? Well, because we want to use data. In the past I've created data networks across several institutions, including the national VA, so that we could use more data. However large our system is, other places hold important diversity of data. So, eventually, we want to participate in these clinical data networks with Yale electronic health records, clinical data, and so on. We're not there yet.
  • 17:04 There's a lot to be done in data curation and preparation, but we're working toward that goal. Also, in terms of precision medicine and PRS, which is polygenic risk scores, we think we need to pay attention. This is particularly important because current studies lack diversity in the population, and incorrect findings are being produced.
  • 17:33 So can we play a role in that, with the diversity of the local population participating in more studies and with these large networks as well? Can we do that without introducing more biases, and can we protect privacy? How do we study individuals of mixed ancestry, who are currently often discarded from analysis because the computation is harder to do with them?
  • 18:05 So we have a Center of Excellence in Genome Science based on admixture, the Center for Admixture Science and Technology. There we want to account for local ancestry, based on the specific chromosome and location, rather than global ancestry as is currently done in genetic studies. We know there is improved power if we include admixed populations in the studies.
  • 18:39 Just schematically, a trait is influenced by genetic determinants, exposures, and social determinants. Within genetic determinants there are variant effects and also ancestry effects, and we need to account for all of that. Unfortunately or fortunately, the way we do things, the data are sequestered in different compute enclaves. That is fortunate because it helps protect the security and privacy of the data. It is unfortunate because it makes the calculations much more difficult.
  • 19:22 But we have technology workarounds that allow us to compute with both sets of data. Together these will eventually be 2 million whole genome sequences that we can use for precision medicine.
  • 19:39 So we have work in the area of secure federated algorithms, and we are currently computing with All of Us data and Million Veteran Program data. New biobanks and new programs are appearing in other countries. If they follow a similar protocol, we believe we could eventually do this with other nations as well, which would greatly increase the power of all the analyses. Again, these are algorithms that allow all participant data to stay in their compute enclaves.
  • 20:24 The kinds of questions that precision medicine programs such as All of Us try to answer are, once again, of all sorts, and they span several different specialties.
  • 20:37 This slide from the program helps to disseminate and convey the importance of a cohort like this. It has electronic health record data, survey data, physical measurement data, biosamples for a small part, and wearables and digital app data as well.
  • 21:02 We are recruiting here in New Haven. We have a recruitment site at YCCI and also in a few other places. We also partner with the Puerto Rico Center for Clinical Investigation, where we have a recruitment site that adds to the program.
  • 21:32 Another thing I wanted to talk about, because it's being launched as we speak, is the Cosmos community data set. Cosmos is an initiative from Epic, the vendor of our electronic health records system. If a particular Epic customer such as us chooses to participate, you gain access to de-identified data for research. That includes not only our own data here at Yale New Haven Health System but also data from other participating institutions.
  • 22:14 Right now there are 230 million patients represented, with several encounters and face-to-face visits, from about 1,300 hospitals. That is quite a large group, and you can use this data for research.
  • 22:38 Of the analysis tools, the one available to everyone is SlicerDicer on this larger set. Of course, you already know SlicerDicer from the Epic electronic health record system here at the health system. Once this is launched, you can use SlicerDicer to do research or to produce aggregate graphics on the full population. That covers more than 200 health systems, so it is much larger than our own.
  • 23:24 Among the analysis tools, you can also use command-line R, Python, and SQL statements. That requires certification: you must be trained in the particular aspects of doing that.
  • 23:42 So what we're doing right now is starting with a class of one or more representatives from different departments. They will be trained on this and then start helping fulfill requests that come from their own departments.
  • 24:02 The data management tools are listed here, and the code libraries as well; some of you may recognize some of these libraries used for AI-type research. The good thing is that this whole computing environment already exists on the Epic side. So while we build ours, there is one that data analysts can already use.
  • 24:34 Now, the timeline. As I said, we are right around the corner from Cosmos Live being available, meaning the SlicerDicer portion of it. We have people completing the prerequisites for the course they will take on April 8 and 9 in order to be certified to use that platform. There will be 30 initial people trained, and then each month we will add a few more.
  • 25:12 So, in summary, here is what we're doing for the informatics research infrastructure that you all benefit from. We are catching up on hardware, cloud security, and cloud environments, and also on the data. On preparing the data for analysis, we have some initial work and policies, as well as the training of people, but we have much more to do in this area. Development of software, the launching of services, and overall training are also in the pipeline.
  • 25:48 But we decided to go one thing at a time because we are still a small group. As I mentioned before, we're recruiting faculty not only to do their own research and be users and testers of this environment but also to lead the development of these services. That has been an area of emphasis for us, along with research scientists and postdocs.
  • 26:18 We need PhD students, medical students, undergrads, the whole community of researchers, to benefit from this environment. But the environment is still coming up, which is why I wanted to present here. I can also answer any questions about requirements or interest in external data sets. That goes beyond the electronic health records from Epic: some researchers work with claims data licensed from other places, and many other sources exist.
  • 27:01 So, with that, I'll stop sharing and see whether there are any questions, comments, or suggestions on what you need most so we can move ahead.
  • 27:22 So I think one question I had: if there are faculty who are interested in working with BIDS, how can they reach out? Should they reach out to you, is there an online form, or how does that work?
  • 27:32 It depends.
  • 27:36 Typically, if they are informatics people who just happen to be in another department but want to affiliate with us, they talk to me. We set up a time to go over what they would like to benefit from as secondary faculty, and so on. So that covers the informatics faculty appointed elsewhere.
  • 28:00 For the informatics users who want, for example, data, there is the whole JDAT. Currently, the JDAT group is being revamped, renamed, and reorganized so that data requests can be more streamlined than they are right now. Requests will also be filtered, because some are not at the same level as others. That concerns how researchers have thought about the request and whether electronic health records can really answer the question they have in mind. Many times, the question you have is not one that electronic health records can help answer.
  • 28:52 So there is a lot of restructuring in that area. In terms of compute, again, some faculty, typically the informatics ones, have compute needs that we currently cannot fulfill, but we're designing the new structure in order to fulfill them. Security is also part of this: some people may have inherited data sets on local servers, laptops, or other places, and those need to be moved.
  • 29:32 That's the whole area Dr. Meeker, the Chief Research Information Officer, is working on. We have to move certain data sets into environments that are more up to date with regard to security.
  • 29:57 Yeah, I see Dr. Chalk.
  • 30:01 Thank you, Dr. Ohno-Machado, for a nice talk. I just have one very short question. You mentioned the All of Us precision medicine research program. Is that immediately available to work on? Can you speak a little bit about that? Thanks.
  • 30:22 Yeah. The data for the All of Us program are immediately available, and in fact several Yale researchers have already published based on data from that program. One reason it's very timely right now is that they also provide their own computing environment, so we don't need to provide ours. You essentially go online, and as long as you have an eRA Commons account, you can sign up for it.
  • 30:55 There are a few online trainings that you have to go through, so there are some prerequisites to access the environment. That applies to the environment similar to Cosmos, where you use the command line, select cohorts, and so on. If you just want to use a data browser, similar to SlicerDicer, you can do it immediately without registration.
  • 31:23 The very interesting portion is the genome sequencing, which is already at 350,000. That's one of the largest collections, currently smaller than UK Biobank, but in terms of diversity it is not even comparable. The whole recruitment process, the whole program, is predicated on eliminating the existing bias that genetic studies are based mostly on European populations.
  • 32:08 Yes, Doctor.
  • 32:10 Wonderful, wonderful summary. Thank you so much for sharing. I'm Kim Blenman. I am an assistant professor in Medical Oncology at the School of Medicine, as well as an assistant professor in Computer Science at the School of Engineering and Applied Science here.
  • 32:26 I was very interested in your Cosmos community data set. You mentioned that you will allow some of your faculty, such as myself, to be users and beta testers for some of the things you have moving forward. I was wondering how we follow up on that to get more information. That would mean deeper dives into what's actually in the data set, which we as faculty could jump into and query, to understand a bit more of how we can use it for our own research.
  • 33:01 Right. Because the seats were limited for the class of 30, we had asked the department chairs to nominate whom they wanted. We did not necessarily encourage faculty to be those people, because whoever is first has to serve the needs of a whole lot of others. We even recommended staff members who were highly skilled in data analysis. In some cases, there wasn't anyone with that description and they wanted faculty, and so we filled one seat per department.
  • 33:45 Moving forward, at least two to four people each month will undergo the training as well. The goal is that one day everyone who has the skills to do SQL queries plus R or Python would have access.
  • 34:12 It just takes some time because Epic does require the training. In fact, I like that, because it keeps the environment from being completely open to everyone; we all know that even de-identified data can easily be re-identified. So there is this filtering of people: not everyone is in it. Also, access is limited to individuals from the institutions that are contributing data, so there is control. They are employed by the institution, and if something is done the wrong way, such as trying to re-identify people, there could be consequences.
  • 34:58 So I think it's the first step toward, I'll say, opening the data a little bit. If that works, there will be other offerings, at the same time as we develop the computational infrastructure that we need.
  • 35:15 No, I totally agree with how you're opening this up to individuals who would really serve in that core role and have the bandwidth for it. I agree that it wouldn't be as appropriate to have the faculty do that. You mentioned that JDAT is being restructured, and you spoke a little about the offerings it will have moving forward. Do you have a sense of when that will be rolled out? When can we start to connect with JDAT again about pulling out information and things of that nature, so we can get things moving again?
  • 35:59Yeah, it hasn't stopped, by the way.
  • 36:03But what happens is, you know,
  • 36:07because the volume is so high,
  • 36:10it takes too long, and maybe even when it
  • 36:12comes, the people don't need
  • 36:15the data anymore, so that
  • 36:18causes even more waste of resources.
  • 36:20So what we're trying to do,
  • 36:24up front, moving forward,
  • 36:27when we launch this new era,
  • 36:30is to do a feasibility analysis:
  • 36:33are these data really, you know,
  • 36:37can we even solve that problem?
  • 36:39There are several trainee questions, right?
  • 36:42Medical students, postdocs,
  • 36:44graduate students, and currently
  • 36:48there isn't a prioritization.
  • 36:51Everyone enters the
  • 36:54queue, and I think that's another
  • 36:57area we will have to think
  • 37:01about more carefully. In other
  • 37:05institutions, the cost self-selects
  • 37:09who is putting in those requests or not.
  • 37:13Of course we don't want that.
  • 37:15We don't want to prevent good
  • 37:18ideas from moving forward.
  • 37:20But we also can't be without any
  • 37:24mechanism to determine that
  • 37:27this particular research question is not
  • 37:32well answered with one single system's,
  • 37:37you know, observational or
  • 37:40retrospective analysis.
  • 37:42It would just require much more,
  • 37:45which, again, is where having the opportunity
  • 37:47to use All of Us, and the opportunity
  • 37:51to use Epic Cosmos as well, comes in.
  • 37:55So we want to move towards a point
  • 37:59where we can decentralize somewhat
  • 38:02these requests, at least
  • 38:05for feasibility: "No,
  • 38:07we don't have enough patients with this,
  • 38:10this and this.
  • 38:12We're not going for that trial
  • 38:15because it's not feasible."
  • 38:18But that all requires a lot of work,
  • 38:22because the data haven't been
  • 38:25used as much, and you only encounter
  • 38:28data issues as you use the data, right?
  • 38:33Absolutely.
  • 38:34There will be a lot that we'll
  • 38:36have to do as we start opening
  • 38:39these new possibilities.
  • 38:43Now, what was
  • 38:44the new name of JDAT?
  • 38:45You said that it's changed names as well.
  • 38:48Oh, it's still under debate. OK, all
  • 38:53right. Thank you so much.
  • 38:54This is fantastic. I really,
  • 38:56you know, love what
  • 38:57you're doing and how you're
  • 38:59reorganizing it, you know,
  • 39:00to make it more streamlined
  • 39:02and more accessible to people
  • 39:03who don't have that
  • 39:05background and skill.
  • 39:06And I hope that those of us
  • 39:07such as myself who do have the
  • 39:09background and skill can, you know,
  • 39:10be a part of this movement as well.
  • 39:12Oh, exactly. And I want to thank
  • 39:14everyone for their patience because, yes,
  • 39:16we've been here a year and two months now, and
  • 39:23it is not as simple,
  • 39:29yeah, but it's doable.
  • 39:31Others have done it.
  • 39:32We can do it too; I did it at UCSD many,
  • 39:36many years ago. And so again,
  • 39:41if we can navigate
  • 39:46the cultural aspect of trust
  • 39:50and how we move this along,
  • 39:53I think we'll go
  • 39:55a long way; it's just taking a little long.
  • 40:18Do you have any more questions?
  • 40:26No? Thank you so much for coming.
  • 40:28Oh, Eugenia, do you have another question?
  • 40:31No, no. Thank you so much for
  • 40:34coming and presenting about BIDS.
  • 40:35I think everybody really enjoyed it and
  • 40:36it's exciting to see what there is to come.
  • 40:40Thank you so much. Bye, bye.