Biomedical Informatics and Data Science at Yale
March 04, 2024
Janeway Society First Friday Seminar Series
March 1, 2024
Topic: Biomedical Informatics and Data Science at Yale
Presenter: Lucila Ohno-Machado, MD, MBA, PhD, the Waldemar von Zedtwitz Professor of Medicine and Biomedical Informatics and Data Science; Deputy Dean for Biomedical Informatics; Chair, Section of Biomedical Informatics and Data Science
Information
- ID
- 11412
Transcript
- 00:00Thank you all for joining today.
- 00:02Today we have Lucila Ohno-Machado,
- 00:04who's the Professor of Medicine
- 00:06and Professor of Biomedical
- 00:07Informatics and Data Science.
- 00:09She's also the Deputy Dean for
- 00:11Biomedical Informatics and Chair
- 00:13of the new Section of Biomedical
- 00:14Informatics and Data Science.
- 00:16Lucila came to us from the
- 00:19University of California, San Diego,
- 00:21where she was the Associate Dean and Chair
- 00:23of the Department of
- 00:26Biomedical Informatics, and she
- 00:28is going to describe some of the
- 00:30initiatives that BIDS is running towards
- 00:33broadening biomedical and clinical
- 00:36data access and utilization at Yale.
- 00:40Thank you, Leslie.
- 00:42Thank you. Let me share the screen
- 00:47and make sure it's the right one.
- 00:49We can see it. Looks great. Excellent.
- 00:52So thank you, and because it's a small
- 00:54group please feel free to interrupt.
- 00:56I might not always see your hand up,
- 00:59but I asked the organizers to help
- 01:02me if there is any question in the
- 01:05middle. And today what I will present
- 01:08is mostly what's in it for you at
- 01:12Biomedical Informatics and Data Science.
- 01:13I'm not necessarily presenting
- 01:16our research or featuring a lot of
- 01:20projects that are ongoing right now,
- 01:23but it's essentially how you can
- 01:25take advantage of new things that are
- 01:28coming to enhance your own portfolio.
- 01:33So as you know, informatics has been
- 01:35at Yale for a very long time.
- 01:38Since 1985
- 01:43or '86 there has
- 01:45been a training grant.
- 01:47So you can imagine before that there
- 01:50was already activity in this area
- 01:52with the Yale Center for Medical
- 01:55Informatics, as well as many
- 01:58activities in computational biology.
- 01:59But the faculty had appointments
- 02:02in different departments, and
- 02:04Yale was not perceived as a
- 02:10large informatics academic unit.
- 02:14So we came to fulfill that
- 02:18need and created the section,
- 02:21the first standing section of
- 02:23Biomedical Informatics and Data
- 02:25Science starting January 1st of 2023.
- 02:28So we've been here for a little over
- 02:31a year, with the mission
- 02:34of research, training, and services,
- 02:36with the research by the faculty
- 02:38who are currently in the department
- 02:41focusing on AI in medicine,
- 02:44natural language processing
- 02:46including large language models,
- 02:48privacy-enhancing technology,
- 02:51and blockchain technology.
- 02:52And some examples of large projects
- 02:55that did transfer to Yale:
- 02:58the Center of Excellence in
- 03:00Genome Science, focused on admixed
- 03:03genomes, and also a recruitment
- 03:05portion of the All of Us precision
- 03:09medicine program is now here at
- 03:12Yale, with Puerto Rico as
- 03:15an associated entity. In training,
- 03:19we have curriculum for master's and PhD students.
- 03:24We do mentor postdocs,
- 03:26clinicians moving into this area,
- 03:29biomedical data scientists and we'll
- 03:32have summer interns from North America
- 03:35and also from Africa starting this year.
- 03:38The new certificate program in medical
- 03:41software and medical AI has been advertised.
- 03:45The applicants have submitted their forms
- 03:49and we are going to start very soon.
- 03:54In terms of infrastructure and services,
- 03:56we are mounting the research
- 03:58computing and clinical and biomedical
- 04:00data services that may be of interest
- 04:03to you, and that's what I'll focus
- 04:06my presentation on, and of course
- 04:09collaborating on grant proposals
- 04:11and quality and safety research.
- 04:14We were fortunate to get about
- 04:17$17 million in grants this year.
- 04:21Since these are multiple-year grants, we hope to
- 04:25increase that in subsequent years.
- 04:28There were 19 grants and contracts
- 04:31and 27 new proposals by our faculty.
- 04:34So I'll talk mostly about the service
- 04:38portion which is staffed by our faculty.
- 04:41It's the Yale Biomedical Informatics
- 04:44and Computing
- 04:48Center, or Core, in which we have
- 04:51a Research Information Officer,
- 04:53and that's Doctor Daniella Meeker, and a
- 04:59Computational Infrastructure Officer, with
- 05:01Assistant Dean for Informatics [inaudible].
- 05:04And the principle here is that our section
- 05:08is one of several units that
- 05:11benefit from
- 05:12informatics
- 05:13infrastructure. The Department of Medicine is another one,
- 05:16Surgery, various departments,
- 05:21YCCI, the YCC, and so on.
- 05:22And then we operate with partner entities
- 05:25such as the Yale New Haven Health System,
- 05:28Yale University, and so on.
- 05:31And we have to do that with compatible
- 05:36policies. Essentially we're here
- 05:40to support research operations
- 05:42related to data and compute that
- 05:46tie together these three units.
- 05:51So in terms of the departments and centers,
- 05:54every department will have
- 05:56an informatics liaison
- 05:58who is the main point of contact,
- 06:01who gathers and helps prioritize the needs
- 06:03of the department and brings them to us,
- 06:06identifies data sources that are needed,
- 06:09refers trainees,
- 06:10and partners in grant proposals
- 06:14or other activities.
- 06:16So let me get back here,
- 06:19where you see an initial set of
- 06:23Advisory Board members who help
- 06:26us prioritize what we need to do.
- 06:29Not all departments are represented yet,
- 06:31but we think we have a large
- 06:35coverage here and plan to expand.
- 06:38Now, we ran an AI survey. You may
- 06:42have gotten this spam in your inboxes
- 06:45more than once. We wanted to survey
- 06:48what the capabilities and interest
- 06:51in AI are at the School of Medicine.
- 06:54So we had 1100 respondents from
- 06:58all categories and there's high
- 07:00interest in AI and in AI training.
- 07:04So on the right hand side you will
- 07:06see the number of respondents, and
- 07:08then the line going across
- 07:11is the percentage of them
- 07:14who were interested in training.
- 07:17So you can see it's all more
- 07:19than half of the respondents,
- 07:21and in general great interest in
- 07:25ChatGPT, in machine learning,
- 07:28language models and so on.
- 07:31Now, in two months we organized
- 07:36an AI in Medicine symposium,
- 07:38the first one here at the School
- 07:41of Medicine, in which we wanted also
- 07:44to showcase and actually, honestly,
- 07:47learn about what kinds of AI are
- 07:49being done in various departments.
- 07:53One of the presentations was about
- 07:56this model we call Me-LLaMA,
- 07:59a medical large language model.
- 08:01LLaMA is a foundation model produced by Meta,
- 08:07formerly Facebook,
- 08:08in which you can add things on top of it,
- 08:10do more training on top of an existing
- 08:14foundation model, and then create
- 08:16medical large language models, and
- 08:18that was done by researchers from
- 08:22our team, and there are others as well.
- 08:25So this is one of them.
- 08:27Another one is called Meditron,
- 08:29also by a faculty member in BIDS.
- 08:32In our unit there's Ophthalmology
- 08:34GPT, and this is done by a faculty member
- 08:38who has a joint appointment with us,
- 08:40with a secondary appointment
- 08:43in ophthalmology.
- 08:44And also there is GutGPT, another
- 08:49large language model, and this one is
- 08:53coordinated by Dennis Shung in GI.
- 08:57There are opportunities.
- 08:58So Doctor Khera has mounted this
- 09:02slide in terms of engagement
- 09:05with healthcare, showing that there are
- 09:07parts of it that we are currently
- 09:10not doing adequately to get
- 09:12to the endpoint of having people
- 09:14interact with the health system.
- 09:17So he mentions experts cannot
- 09:20consistently diagnose structural heart
- 09:22disorders, and it's a hidden disease,
- 09:25and we often only diagnose when
- 09:27it's at a more advanced stage.
- 09:30So there are apps and there are
- 09:34other AI based techniques that
- 09:36help diagnose earlier before the
- 09:40advanced treatment and evaluation.
- 09:43So here is a slide from his lab
- 09:47in which he puts on the left hand
- 09:50side the AI-ECG applications that
- 09:52he has already put out there.
- 09:55There's a wearable CarDS-Plus
- 10:00application as well, and trials
- 10:02that have been done on top
- 10:04of machine learning models,
- 10:06again developed here at Yale.
- 10:12Doctor Shung in GI also has several
- 10:16applications, one of them related to
- 10:20human-algorithmic interaction with
- 10:23several collaborators, and that uses that
- 10:26GutGPT application that I mentioned.
- 10:29So a lot of interesting
- 10:32applications here at Yale.
- 10:38Those two, by the way, have secondary
- 10:41appointments with us and they're
- 10:43primarily appointed in the
- 10:45Department of Medicine, in the sections
- 10:47of cardiology and GI respectively.
- 10:49As for the BIDS faculty, we now have 12
- 10:53ladder-rank primary faculty and 13
- 10:56non-ladder-rank faculty, and then
- 10:59again 40 people who are appointed
- 11:02in other departments and have
- 11:05secondary appointments with us, and those are in 20
- 11:09departments in five different schools.
- 11:11So what we see is, on the left hand
- 11:16side, BIDS having data workflow management,
- 11:19AI and methods expertise, and
- 11:20the departments and centers,
- 11:22on the right hand side, having the data
- 11:24and the domain expertise for the
- 11:27particular application that we are
- 11:29talking about, be that AI or other
- 11:31forms of informatics and data science.
- 11:34So in terms of modalities,
- 11:36we do have bioinformatics,
- 11:39imaging and clinical informatics, and
- 11:42some are more populated than others at
- 11:45present, and our plan is to keep moving
- 11:49and having these applications done.
- 11:53Again, on the department and center
- 11:55side we find many people who
- 11:58have been doing informatics in a
- 12:00fragmented way, and we try to not
- 12:03only acknowledge what they're doing,
- 12:05but try to synergize and create
- 12:09the computational infrastructure
- 12:11that again can get to
- 12:15higher levels because there are more
- 12:18people using it at this point.
- 12:21So our goal of developing new computer
- 12:24based methods and systems to improve
- 12:26human health is part of our mission to
- 12:30train the next generation and then help
- 12:32researchers make maximal use of data.
- 12:35So I would expect many in
- 12:38the audience today are working with
- 12:40data from the health system or from
- 12:43clinical trials or from basic science,
- 12:46and in all of that, data management
- 12:49and advanced methods may be of benefit.
- 12:54So AI is the current interest of
- 12:58many people, and there are NIH
- 13:01requests for applications in many,
- 13:03many different areas and institutes;
- 13:06I put them here as examples.
- 13:08And then on the left hand side,
- 13:10I selected some topics that are very,
- 13:14very related to it, as well as projects
- 13:17that we are currently involved in.
- 13:20One is AIM-AHEAD, for health equity and
- 13:23advancing diversity in the workforce.
- 13:26The other one is Bridge2AI and
- 13:29other Common Fund programs, and,
- 13:32as I mentioned before, the All of
- 13:35Us Precision Medicine Initiative.
- 13:37AI is a team sport.
- 13:38So again I put both sides here so
- 13:42that we don't exist on methods only.
- 13:45We have to have, you know, important
- 13:48applications and have the right
- 13:50questions to answer, and that's where,
- 13:53regardless of the medical specialty,
- 13:55we do need partnerships and help.
- 13:59So data,
- 14:00computational infrastructure and training
- 14:02is what we're trying to do.
- 14:05And here's a busy diagram in
- 14:09terms of the computing infrastructure
- 14:11that we found
- 14:13needs to be developed at Yale so
- 14:16that we can have everyone fulfill
- 14:18their application needs as well
- 14:21as new development of methods.
- 14:25So going from #1 to #4,
- 14:28what safe is a computational
- 14:31infrastructure designed to have HIPAA
- 14:34compliance and to protect sensitive
- 14:38information such as electronic health
- 14:41records and clinical trial data, and
- 14:44that has a certain amount of storage
- 14:47that needs to increase from
- 14:50what it is.
- 14:51There is already a system there,
- 14:53but it needs a lot of updates for
- 14:58that. And the research Virtual Display
- 15:02Interface is how you use your own
- 15:07computer to serve as a terminal to
- 15:10servers that are much more powerful
- 15:12and again secured for this operation.
- 15:14So you use your computer to access data
- 15:19and to do analysis on this protected server.
- 15:23So this again is being enhanced from
- 15:27an existing seed compute infrastructure,
- 15:31and then you have the cloud,
- 15:35the Amazon Web Services Spinup Plus,
- 15:39which will be a more secure version of
- 15:43the current capability that exists
- 15:46throughout the university.
- 15:48This will be possibly adequate
- 15:51for genome annotation,
- 15:53genome analysis and image processing as well.
- 15:58And then finally on the right hand side,
- 16:01it's what we are calling Microscope
- 16:03High Performance Computing.
- 16:05And this is a set of servers with GPU nodes
- 16:09and these are graphics processing units,
- 16:13which are necessary to do AI these days.
- 16:17And it is something that currently is
- 16:20not available to our community.
- 16:25So why do we do all that?
- 16:26Well,
- 16:27because we want to use data.
- 16:28And in the past I've
- 16:30created data networks from
- 16:34several institutions,
- 16:35including the national VA,
- 16:38so that we could use more data because
- 16:41however large our system can be,
- 16:45there is important diversity
- 16:47of data in other areas.
- 16:49So that's something we want to do:
- 16:52eventually participate in these clinical
- 16:55data networks with participation of
- 16:58the Yale electronic health records,
- 17:01clinical data and so on.
- 17:02We're not there yet.
- 17:04There's a lot to be done in data
- 17:06curation and preparation, but we're
- 17:09working towards that goal. Also, in
- 17:11terms of precision medicine and
- 17:14PRS, which is polygenic risk scores,
- 17:17we do think we need to pay attention
- 17:21to that, particularly because the
- 17:24current studies are lacking diversity
- 17:28in the population and there are
- 17:31incorrect findings being produced.
- 17:33So can we play a role in that
- 17:37with, again, the diversity of the
- 17:40local population participating
- 17:41in more studies, and then
- 17:44doing these large networks as well?
- 17:46Can we do that without introducing
- 17:49more biases and can we protect privacy?
- 17:52How do we study individuals from mixed
- 17:56ancestries who are currently many
- 17:58times discarded from analysis because
- 18:02it's harder to do the computation with them?
- 18:05So we have a Center of Excellence
- 18:08in Genome Science based on admixture, the
- 18:10Center for Admixture Science and
- 18:13Technology, in which we want to
- 18:16account for particular ancestry
- 18:19based on the specific chromosome and
- 18:23location rather than global ancestry,
- 18:26as is currently done in genetic
- 18:29studies, because we know there's
- 18:32improved power if we include
- 18:35admixed populations in the studies.
- 18:39So just schematically, a trait is
- 18:43influenced by genetic determinants,
- 18:46exposures,
- 18:46social determinants, and within genetic
- 18:49determinants there are variant effects.
- 18:53There are also ancestry effects, and
- 18:57we need to account for all of
- 18:59that. And the way we do things,
- 19:01unfortunately or fortunately,
- 19:03is that the data are sequestered
- 19:06in different compute enclaves.
- 19:09Fortunately because that helps protect
- 19:12the security and privacy of the data,
- 19:16but unfortunately because it makes
- 19:18the calculations way more difficult
- 19:21to do.
- 19:22But we have technology workarounds
- 19:26that allow us to compute with both sets
- 19:29of data. This will eventually be
- 19:332 million whole genome sequences
- 19:35that we can use to do precision medicine.
- 19:39So we have work in this area of
- 19:42secure federated algorithms, in
- 19:45which currently we are
- 19:48computing with All of Us
- 19:50data and the Million Veteran Program data.
- 19:53But we believe that as new biobanks and
- 19:58new programs appear in other countries,
- 20:02if they follow a similar protocol,
- 20:05we could eventually do
- 20:07that with other nations as well.
- 20:10And that would greatly increase
- 20:12the power of all the analysis.
- 20:15And again,
- 20:16these are algorithms that allow
- 20:19all participant data to stay
- 20:21in their compute enclaves.
- 20:24The kinds of questions that
- 20:26precision medicine programs such as
- 20:29All of Us try to answer are, once again,
- 20:32all sorts of them, in several
- 20:35different specialties, right?
- 20:37So this was a slide from the
- 20:39program that helps to disseminate
- 20:42and try to convey the importance of
- 20:45doing a cohort like this where we
- 20:50have electronic health record data,
- 20:52survey data, physical measurement data,
- 20:55biosamples and, in a small part,
- 20:58wearables and digital app data as well.
- 21:02So we are recruiting here
- 21:05in New Haven. We have at
- 21:11YCCI a recruitment portion,
- 21:14and also in a few other places,
- 21:17and we partner with
- 21:22the Puerto Rico Center for
- 21:24Clinical Investigation as well.
- 21:26We have a recruitment site there,
- 21:29so adding to the program.
- 21:32And then another thing that I
- 21:35wanted to talk about, because it's
- 21:38being launched as we speak,
- 21:41is the Cosmos community data set.
- 21:44Cosmos is an initiative from Epic,
- 21:47the vendor of the electronic health record system,
- 21:50in which if a particular customer of
- 21:54Epic such as us wants to participate,
- 21:58then you would have access to
- 22:02de-identified data for research, not
- 22:05only of our own data here at the Yale
- 22:09New Haven Health System but also of
- 22:12other institutions participating.
- 22:14So right now there are 230 million
- 22:18patients represented with several
- 22:20encounters and face to face visits.
- 22:23So it's about 1300 hospitals,
- 22:27so quite a very large group
- 22:31of them, and you can use this
- 22:34data for research.
- 22:38Of the analysis tools, the one that
- 22:41is available for everyone is
- 22:43SlicerDicer on this larger set.
- 22:46Of course you already know of
- 22:49SlicerDicer for the electronic
- 22:51health record system, for the Epic
- 22:54system here at the health system.
- 22:56But as we get this launched,
- 23:01you can use SlicerDicer
- 23:04to do research or to do
- 23:09some aggregate graphics on the
- 23:13full population, which is about 200,
- 23:16it's above 200 such health systems
- 23:20represented, so much larger than our own.
- 23:24And then the analysis tools:
- 23:27you can also do command line R,
- 23:30Python, SQL statements,
- 23:33but that requires certification. It
- 23:36requires that you be trained in
- 23:39the particular aspects of doing that.
- 23:42So what we're doing right now is
- 23:46to start with a class of 1 or more
- 23:49representatives from different departments
- 23:51so that they can be trained on this,
- 23:54and then start helping fulfill requests
- 23:58that come from their own departments.
- 24:02The data management tools are listed here.
- 24:05And then code libraries as well.
- 24:07And some of you may recognize some of
- 24:11these libraries used for AI type research.
- 24:14The good thing too is that this whole
- 24:18compute environment already
- 24:21exists on the Epic side.
- 24:24So while we build ours,
- 24:27there is one that can already be
- 24:29used by data analysts.
- 24:34Now the timeline is, like I said,
- 24:36we are right around
- 24:38the corner from being available
- 24:42in terms of Cosmos Live, meaning
- 24:45this SlicerDicer portion of it.
- 24:48And then we have people doing
- 24:52the prerequisite completion for the
- 24:54course that they will take April 8
- 24:56and 9 in order to be certified
- 24:59to use that platform.
- 25:02There will be 30 initial people
- 25:05trained, and then monthly
- 25:07we will add a few more.
- 25:12So in summary, what we're doing for the
- 25:15informatics infrastructure for research
- 25:17that you all benefit from is catching
- 25:21up on hardware and cloud security,
- 25:25cloud environments, and also on the data,
- 25:28on preparing the data for
- 25:31analysis.
- 25:34We have some initial work on policies
- 25:36and the training of people;
- 25:38we have much more to do in this area.
- 25:41Development of software and then the
- 25:43launching of services and the overall
- 25:46training is also in the pipeline.
- 25:48But we decided to go one thing at a
- 25:54time because we are still a small group.
- 25:56As I mentioned before, for the researchers,
- 26:00we're recruiting faculty
- 26:02not only to do their own research
- 26:05and be users and testers of this
- 26:07environment but also those who lead
- 26:09the development of these services.
- 26:12So that has been an
- 26:14area of emphasis of ours,
- 26:17research scientists and postdocs.
- 26:18We need to have PhD students,
- 26:21medical students, undergrads,
- 26:22the whole community of researchers
- 26:25to benefit from this environment.
- 26:28But the environment is still coming up,
- 26:30which is why I wanted to present here
- 26:34and then also answer any questions
- 26:38related to requirements or you know,
- 26:43interest in external data sets as well.
- 26:46Not just the electronic health
- 26:48records from Epic,
- 26:49but some researchers work with claims
- 26:52data that are licensed from other places,
- 26:57and many other sources exist.
- 27:01So with that I'll stop sharing and see
- 27:04whether there are any questions, comments,
- 27:08or suggestions on what you need
- 27:10the most so we can move ahead.
- 27:22So I think one question I had was,
- 27:23you know, if there are faculty that are
- 27:25interested in working with BIDS,
- 27:27how can they reach out?
- 27:29I mean, should they reach out to you,
- 27:30is there an online form, or how
- 27:32does that work? It depends.
- 27:36Typically, if they are, like, those
- 27:39informatics people who happen to just
- 27:41be in another department but want to
- 27:44affiliate with us, then they talk
- 27:46to me, we set up a time, go over,
- 27:51as secondary faculty, what they
- 27:53would like to benefit from, and so on.
- 27:56So those are the informatics
- 27:59faculty appointed elsewhere.
- 28:00Then there are the informatics users who
- 28:03want, for example, data.
- 28:05The whole JDAT,
- 28:08currently,
- 28:08the JDAT group is being revamped and
- 28:13renamed and reorganized so that the
- 28:18data requests can be more streamlined
- 28:20than they are right now and actually
- 28:23filtered, because some of them are not,
- 28:28you know,
- 28:28at the same level as others
- 28:30in terms of how researchers have thought
- 28:35about the request and whether electronic
- 28:39health records can really answer the
- 28:41question that they have in mind.
- 28:43Because many times
- 28:45the question you have is not the
- 28:48question that electronic
- 28:50health records can help answer.
- 28:52So there is a lot of
- 28:55restructuring in that area. In terms
- 28:59of compute, again, some faculty,
- 29:02but those are typically more
- 29:05the informatics ones, have compute
- 29:07needs that we currently cannot fulfill.
- 29:10But we're designing the new structure
- 29:14in order to fulfill them. There is also anything
- 29:19related to security, of people who
- 29:23might somehow have inherited data
- 29:25sets in local servers or laptops or
- 29:29other things that need to be moved.
- 29:32That's the whole area in which the
- 29:35Chief Research Information Officer,
- 29:37Doctor Meeker, is working,
- 29:40because we do have to move certain data
- 29:44sets into environments that are more
- 29:49up to date with regard to security.
- 29:57Yeah, I have. I see a doctor
- 30:01chalk. Thank you, Doctor
- 30:04Ohno-Machado, for a nice talk.
- 30:07I just have one very short question.
- 30:10You mentioned the All of Us
- 30:12precision medicine research program.
- 30:14Is that immediately available to work on?
- 30:19Can you speak a little
- 30:20bit about that? Thanks.
- 30:22Yeah. So the data for the All
- 30:24of Us program is immediately
- 30:26available and in fact several Yale
- 30:28researchers have already published
- 30:31based on data from that program.
- 30:33And the reason it's very timely
- 30:36right now is one, they also provide
- 30:39their own compute environment.
- 30:41So we don't need to provide ours.
- 30:45You essentially go online and,
- 30:49as long as you have an eRA Commons
- 30:52account, you can sign up for it.
- 30:55There are a few
- 30:58online trainings that you
- 31:00have to go through.
- 31:01So there are some prerequisites in
- 31:04order to access the environment,
- 31:06and that's for the
- 31:09environment similar to Cosmos,
- 31:10the environment where you do command
- 31:12line, where you select cohorts and so on.
- 31:14If you just want to do a data
- 31:17browser, so similar to SlicerDicer,
- 31:19you can do it immediately
- 31:22without registration.
- 31:23The very interesting portion of
- 31:26it is the genome sequencing,
- 31:29which right now is already 350,000
- 31:34and that's already one of the
- 31:36largest collections, smaller
- 31:38than UK Biobank currently.
- 31:41But in terms of diversity,
- 31:43it is not even comparable because
- 31:46the whole recruitment process,
- 31:49the whole program is predicated on
- 31:52eliminating that bias that exists
- 31:55that genetic studies are based on
- 31:59European populations for the most part.
- 32:08Yes, Doctor,
- 32:10wonderful, wonderful summary.
- 32:11Thank you so much for sharing.
- 32:14So I'm Kim Blenman.
- 32:16I am an assistant professor here
- 32:18in Medical Oncology at the School
- 32:20of Medicine as well as an assistant
- 32:22professor in Computer Science at
- 32:23the School of Engineering and
- 32:24Applied Science here as well.
- 32:26And I was very much interested
- 32:28in your Cosmos community data set.
- 32:30And you mentioned that, you know, you
- 32:33will allow some of your faculty,
- 32:35such as myself, to be, you know,
- 32:37users and beta testers for some of the
- 32:39things that you have moving forward.
- 32:40And I was just wondering, you know,
- 32:42how do we follow up on that to try to
- 32:44get a little bit more information,
- 32:45you know,
- 32:47more deep dives into what's actually
- 32:48in the data set and things like that,
- 32:50that we could, you know,
- 32:51as faculty, you know,
- 32:52jump into and query and, you know,
- 32:56understand a bit more of how we
- 32:58can use it for our research,
- 33:00our personal research?
- 33:01Right. Because the seats were
- 33:04limited for the class of 30,
- 33:06we had asked the department chairs
- 33:09to nominate who they wanted.
- 33:13And we actually did not encourage
- 33:16faculty to be the people, necessarily,
- 33:19because whoever is first has to serve
- 33:22the needs of a whole lot of others.
- 33:25So we even recommended
- 33:27staff members who were highly
- 33:29skilled in data analysis,
- 33:31but in some cases, you know,
- 33:34there wasn't anyone with that description
- 33:37and they wanted faculty, and so we
- 33:41fulfilled one per department.
- 33:45So moving forward there will
- 33:48be at least two to four each month
- 33:51who will undergo the training as well.
- 33:54So I mean the goal is that one day
- 33:58everyone who has the skills
- 34:00to do SQL queries plus R or
- 34:05Python as a programming language
- 34:09would have access.
- 34:12It just takes some time because
- 34:15Epic does require the training, and in
- 34:18fact I like that because it becomes
- 34:20not a completely open environment
- 34:23for everyone, because we
- 34:26all know that even de-identified
- 34:29data can be easily re-identified.
- 34:32So there is this filtering of
- 34:37people, it's not everyone. And also
- 34:39it's only individuals from the
- 34:42institutions that are contributing data.
- 34:44So there is control:
- 34:46they are employed in this institution.
- 34:48If something is done the
- 34:51wrong way, to try to re-identify and so on,
- 34:55then there could be consequences and so on.
- 34:58So I think
- 35:00it's the first step towards, I'll say,
- 35:03opening the data a little bit, and if
- 35:07that works then there will be other
- 35:11offerings at the same time as
- 35:14we develop the computational
- 35:15infrastructure that we need. No,
- 35:18I totally agree with that,
- 35:19you know, how you're opening
- 35:20this up in terms of, you know,
- 35:23individuals who would really
- 35:25serve in that core role, you
- 35:27know, where they have the bandwidth.
- 35:28And yeah, I agree that it wouldn't be,
- 35:30you know, as appropriate to have the
- 35:31faculty do that. But, you
- 35:33know, you mentioned that
- 35:34JDAT is being restructured,
- 35:37and you
- 35:39spoke a little bit about, you know,
- 35:41the offerings that
- 35:42it will have moving forward.
- 35:44Do you have a thought in terms of when
- 35:46that will be rolled out and when we can,
- 35:48you know, start to connect with JDAT
- 35:51again about, you know, pulling out,
- 35:53you know, information and things
- 35:54of that nature, so that we can,
- 35:56you know, get things moving again?
- 35:59Yeah, it hasn't stopped, by the way.
- 36:03But what happens is, you know,
- 36:07because the volume is so high, it takes
- 36:10too long, and maybe even when it
- 36:12comes, the people don't need
- 36:15the data anymore, so it causes
- 36:18even more waste of resources.
- 36:20So what we're trying to do,
- 36:24upfront, moving forward,
- 36:27when we launch this new era,
- 36:30right, is do a feasibility analysis:
- 36:33are these data really, you know,
- 36:37can we even solve that problem?
- 36:39There are several trainee questions, right?
- 36:42Medical students, postdocs,
- 36:44graduate students, and currently
- 36:48there isn't a prioritization.
- 36:51It's, you know, everyone enters the
- 36:54queue, and I think that's another
- 36:57area which we will have to think
- 37:01about more carefully. In other
- 37:05institutions the cost self-selects
- 37:09who is putting in those requests or not.
- 37:13Of course we don't want that.
- 37:15We don't want to prevent good
- 37:18ideas from moving forward.
- 37:20But we also cannot go without any
- 37:24mechanism to determine that
- 37:27this particular research question is not
- 37:32well answered with one single system,
- 37:37you know, observational or
- 37:40retrospective analysis.
- 37:42It just would require much more,
- 37:45and then again, having the opportunity
- 37:47to use All of Us is an opportunity,
- 37:51and to use Epic Cosmos as well.
- 37:55So we want to move towards a point
- 37:59where we can decentralize somewhat
- 38:02these requests, because at least
- 38:05for the feasibility, we know:
- 38:07we don't have enough patients with this,
- 38:10this and this.
- 38:12We're not going for that trial
- 38:15because it's not feasible.
- 38:18But that all requires a lot of work,
- 38:22because the data haven't been
- 38:25used as much, and you only encounter
- 38:28data issues as you use the data, right?
- 38:33Absolutely.
- 38:34There will be a lot that we'll
- 38:36have to do as we start opening
- 38:39these new possibilities.
- 38:43Now, what was
- 38:44the new name of JDAT?
- 38:45You said that it's changed names as well.
- 38:48Oh, it's still under debate. OK. All
- 38:53right. Thank you so much.
- 38:54This is fantastic. I really,
- 38:56you know, love
- 38:57what you're doing and how you're
- 38:59reorganizing it, you know,
- 39:00to make it more streamlined
- 39:02and more accessible to people
- 39:03who don't have
- 39:05that background and skill.
- 39:06And I hope that those of us
- 39:07such as myself who do have the
- 39:09background and skill can, you know,
- 39:10be a part of this movement as well.
- 39:12Oh, exactly. And I want to thank
- 39:14everyone for the patience because yes,
- 39:16we've been here a year and two months now and
- 39:23it is not as simple,
- 39:29yeah, but it's doable.
- 39:31Others have done it.
- 39:32We can do it too. I did it at UCSD many,
- 39:36many years ago. And so again,
- 39:41if we can navigate
- 39:46the cultural aspect of trust
- 39:50and how we move this along,
- 39:53I think we'll go a long
- 39:56way. It's just taking a little long. Do
- 40:18you have any more questions?
- 40:26No, Thank you so much for coming.
- 40:28Oh, Eugenia, do you have another question?
- 40:31No, no. Thank you so much for
- 40:34coming and presenting about BIDS.
- 40:35I think everybody really enjoyed it and
- 40:36it's exciting to see what there is to come.
- 40:40Thank you so much. Bye, bye.