4-29-25 Workshop Session 1
June 04, 2025
Information
- ID: 13190
Transcript
- 00:10Good afternoon, everyone.
- 00:11Thank you for joining us
- 00:13today.
- 00:15First of all, apologies for
- 00:16any inconvenience.
- 00:18 We had a lot of registrants, you know, enough to overflow the room capacity. I was still getting people asking, okay, can I register till the last moment? So I might have told some of you we are out of food and things, but I think some people didn't show up. So if you wanna grab something, there is still food out there.
- 00:38 With that said, I guess these topics are popular. That is the reason why everyone is here.
- 00:45And this is the first
- 00:47YBIC seminar,
- 00:49 and recently people have been asking us what this place is. So this is basically the Department of Biomedical Informatics and Data Science. We just moved here a few weeks back, from 100 College, ninth floor, to 101 College, tenth floor. And, yeah, as I said, this is a seminar series, and this is the first one. On May twenty-ninth, in the morning, we are gonna have another one that is gonna be on MarketScan.
- 01:18So thanks again, for joining.
- 01:21I hope we meet your
- 01:23expectations
- 01:24and you have a great
- 01:25time learning.
- 01:26 I would first welcome Doctor Xu to give a brief introduction of what YBIC stands for and what it is all about.
- 01:35So, thanks. Vipina, you should
- 01:37introduce yourself.
- 01:39 Yeah. So thanks, Vipina, Sue, and all others for the organization of these events.
- 01:44Yeah.
- 01:46Oh,
- 01:47I'll just go ahead to
- 01:50so
- 01:51oh, does it move?
- 01:58 I guess for most of you it's the first time being in this building. Right?
- 02:03Okay.
- 02:05 So we actually moved here two weeks ago, like I said. My name is Hua Xu. I'm a professor and vice chair for research at the Department of Biomedical Informatics and Data Science.
- 02:18Today, this workshop is organized
- 02:20actually by, a office called
- 02:22YBIC,
- 02:23Yale Biomedical Informatics and Computing.
- 02:26 It's led by Doctor Lucila Ohno-Machado, deputy dean for biomedical informatics at the Yale School of Medicine.
- 02:34And, I don't know how
- 02:35many of you have actually
- 02:37accessed the YBIC website. Can
- 02:38you show me your hands
- 02:40if
- 02:41very few? So it's good
- 02:42opportunity for me to promote
- 02:44this.
- 02:45 So the reason, too, is that YBIC is really trying to make a central hub for folks looking for biomedical datasets, or looking for software tools to conduct clinical research.
- 03:00 Also, for a lot of medical AI stuff, you are looking for a more secure computing environment, right, to run the models.
- 03:09And we're also trying to
- 03:10provide training. And if you
- 03:11go to the website, you
- 03:12actually find a lot of
- 03:13information.
- 03:14And YBIC is really a
- 03:16collaboration
- 03:18among a number of different
- 03:19entities,
- 03:20 in partnership with YNHH, the hospital, their analytics team, also with YCRC, which many of you probably know, the Yale Center for Research Computing, and the health science ITS team. So we work together
- 03:38 really trying to provide all the resources to the folks who are working on biomedical informatics and clinical research that requires heavy computational resources, things like that. And right now, we actually have three different offices under Doctor Ohno-Machado. One is strategic initiatives; it's led by herself and is more on the initiatives and the strategic planning. And the second one is led by Doctor Daniella Meeker, in the middle.
- 04:09 I think many of you probably already use the Research Informatics Office, like the JDAT team, to help you retrieve data from the hospital EHR system. I think probably many of you have used the service, right? Can you raise your hand if you have? We have the JDAT team lead, Richard, here. So if you have more questions, you can also ask him. And then the third one is called the Research Computing Infrastructure. It's co-led by myself and Doctor Weiss.
- 04:40So
- 04:41this one is really trying
- 04:43to,
- 04:44provide
- 04:46all the secure computing environment,
- 04:48which I think many of
- 04:49you actually might be interested
- 04:50 because now, when we actually develop medical AI and large models, we're often looking for GPU resources. And you probably heard about the recent investment from Yale, fifty million on hardware such as GPUs, to facilitate AI research. And specifically, what we are working on is trying to provide a more secure computing environment. Because we may be working on clinical patient data with PHI, how can we protect that information while we do medical research?
- 05:22So right now, our office
- 05:23has been kind of working
- 05:25on four different platforms.
- 05:27 Some of you may have heard of some of the platforms; some of them are actually still coming. So the first one, which I think we're also gonna talk about today, is the VDI environment, so you can remotely access the environment within the hospital through a kind of remote desktop. The second one, also a focus of a lot of today's presentations, is the CHP SAFE environment, with a lot of CPUs and GPUs within the hospital in a HIPAA-compliant environment. And the third one is what we are building. You probably heard about Spinup, which is the AWS kind of self-service environment. Now we're actually making this into Spinup Plus; what we are doing is trying to make it a NIST 800-171 environment, so over there you can safely manage all the PHI data.
- 06:21 And the fourth one is named Hopper. It's a more secure computing GPU cluster that will be available in July this year. So far, Hopper already has about sixty NVIDIA H100s installed, and it's in a beta testing environment. The data center is hosted in Massachusetts, but it will be managed by a team at Yale. So I just have a call-out for this platform; it will come soon if you're looking for more GPU resources.
- 06:54 So for a lot of today's presentations, actually, we are focusing on the first two, the VDI and the CHP SAFE environment, which is sitting in the hospital environment. And the idea is really to try to answer all your questions, like: how can we access the data? How can we request computational resources within CHP SAFE? For example, on CHP SAFE you can also see we have some GPUs, like A100s and H100s, as well; how can you request them? And then the second half is really about building large language models. Under CHP SAFE, we have developed some available tools and APIs, so we show you how you can call those tools for different clinical applications.
- 07:41 I'll stop here. I'll let Vipina introduce the agenda, and we go from there. Thanks, everyone.
- 07:54 This is how we have structured today's presentation: session one from one to two thirty. We'll have a coffee break from two thirty to two forty-five; I've already started getting coffee, yes. And then session two from two forty-five to four, and following that, we'll have a networking reception in our kitchen right there.
- 08:15So
- 08:16 For session one, this is what we are gonna do. Rich is gonna talk about research data from JDAT: what, where, how, these things. And again, as Doctor Xu mentioned, we will dive deep into the CHP SAFE environment and what Camino is, with presentations by Nate and Al, and then Doctor Xu will do an overview of large language models.
- 08:40And after the coffee break,
- 08:43YuJa is gonna show you
- 08:45why we do annotation, some
- 08:46of the annotation tools that
- 08:48we have set up on
- 08:49this environment.
- 08:50And then I'll go deeply
- 08:51into one of the clinical
- 08:53information extraction pipelines that we
- 08:55have built.
- 08:57 And then Vincent is gonna do a short demo on Kiwi as an API service; we'll discuss those things. And finally, a bit of a programming-intense session that will be done by Lingfei, where we'll tell you how to develop customized LLMs for your specific task.
- 09:18 We will have a Q&A session after each speaker presents, so please hold your questions till then so that we stay on time. I'd like to welcome now Rich Hintz, Director of Clinical Research Data Services. Rich, please take over.
- 09:46Good afternoon.
- 09:50Flip this over a bit.
- 09:55Alright. Thanks for having me,
- 09:57everyone.
- 09:58I'm Richard. I'm part of
- 09:59the
- 10:00research informatics office, and I'm
- 10:02glad to be here today.
- 10:04 And I wanna give you a little background about our group. We're a team of eleven. We support Yale faculty, staff, residents, hospital employees, as well as medical students. We are primarily tasked with providing data for research and research data needs, and that includes providing
- 10:26Ah,
- 10:27thank you. Alright.
- 10:29No one minds if I'm
- 10:29quiet apparently, but that's fine.
- 10:31That's fine.
- 10:32No.
- 10:33 We typically respond to over six hundred data requests every year, and, you know, we provide queries and custom data extracts. We work with Epic Reporting, and we develop and assist with MyChart recruitment.
- 10:49So, hopefully, I'll be talking
- 10:50about today,
- 10:51how to obtain data through
- 10:52JDAT, giving a little background
- 10:54on that, touching upon some
- 10:55of the environments that Hua
- 10:57mentioned.
- 10:58And then, you know, you'll
- 10:59hear a lot of great
- 11:00information from Nate and Al
- 11:02coming up on the computational
- 11:04health platform and Camino.
- 11:06So over the last year,
- 11:07we've been working to focus
- 11:08on,
- 11:09really improving processes
- 11:11that will put data in
- 11:13the hands of our research
- 11:14community faster and easier. And,
- 11:16 primarily, we've been doing that by focusing on and promoting some of the self-service approaches, as well as, you know, data provisioning tools on the YNHH infrastructure.
- 11:33 So the data that's available, the clinical data and the research data, you know, they overlap a lot. It's a large network: the health system, Yale University, and the Northeast Medical Group, spanning all of Connecticut, down into New York and up into Rhode Island, touching upon more than a hundred and thirty practices, including the Smilow Cancer Center locations, all the YNHH hospitals, including the campuses up in Rhode Island, and also the Chapel Street and York Street campuses.
- 12:08All these are linked together
- 12:10through the Epic
- 12:11medical record system, which really
- 12:13gives us the foundation for
- 12:14providing the data.
- 12:20So we feel the data
- 12:21is very rich. There are
- 12:23more than four point three
- 12:24million patient records
- 12:26in Epic.
- 12:27The database dates back to
- 12:29the implementation of Epic within
- 12:31the health system approximately twelve
- 12:33 years ago, when it went live at all the different locations across the health system. I've listed a couple of things in there. Pretty much any data that is tracked clinically, we can extract out, including all the patient demographics, their vitals, comorbidities, surgical data, and all the labs. You know, there's a wealth of data, from newborns and deliveries up through geriatric care.
- 13:05 So the data sources: you know, this is kind of where we are involved. All the data really starts from Epic. Epic, as you already know, is the electronic health care record system. It is designed for patient care. Underneath the hood, however, is the Chronicles database, and that database is designed for very quick, real-time access to individual patient records and to support, you know, physicians and clinicians treating patients. However, that database is not as efficient at doing large-scale data extracts, reporting across all historical time, or working on data across large patient cohorts.
- 13:53So with that, the data,
- 13:55in Epic
- 13:57is extracted nightly into the
- 13:58Clarity platform. Clarity is a
- 14:00 much larger database. It has nearly everything that's in Chronicles, and it's structured in a very similar format, but it's a SQL database. Our team has access to it, and it allows for a lot of these larger reporting tools. However, it is not real time; you know, it's extracted daily, so it's a day behind. It's also a little more complex: there are approximately twenty thousand tables in the Clarity database. So it's massive, but it does contain a lot of rich data. To solve some of those issues with the speed and the complexity, Epic has created their Caboodle data model.
- 14:37 Caboodle is a smaller, more normalized database designed for more productionized reporting. There are on the order of six hundred tables. Queries run faster, but, you know, not everything is necessarily in there; it is easier to use and faster, though.
- 14:58 Moving on, one of the other advantages of Caboodle is that we can bring in data from outside of Epic as well. So that data has been expanded as new models or new data sources are made available. For example, there's an initiative right now to bring in the data from the tumor registry into Caboodle, so that the really accurate tumor and cancer staging can be linked to, you know, the clinical treatment practices. So that's one of the advantages of Caboodle.
- 15:31 And the last step here at Yale is that there's an additional transformation that happens to move to the OMOP database. OMOP is the Observational Medical Outcomes Partnership model. It's a common data model using open standards, something that can be used across institutions. And with that, the Caboodle data is moved into and transformed into the OMOP database, which has on the order of thirty-seven tables, a much more straightforward model to use. It has most of the data and, hopefully, a little bit less of a learning curve to access the data. And as we talk a little bit more about, you know, the data that we provide, for those of you who are gonna use CHP and have direct access to the data, it'll be from the OMOP tables in the OMOP format.
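As an illustrative aside, here is a minimal sketch of what querying the OMOP tables might look like in PySpark, the environment introduced later in the session. The table name omop.person and the Spark setup are assumptions following the public OMOP CDM, not the documented CHP SAFE interface.

```python
# Minimal illustrative sketch; "omop.person" and the Spark setup are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("omop-first-look").getOrCreate()

# person is a standard OMOP CDM table; one row per patient.
person = spark.table("omop.person")

# Count patients by gender concept as a quick sanity check on the dataset.
person.groupBy("gender_concept_id").count().show()
```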
- 16:27So as I mentioned, you
- 16:28know, one of our goals
- 16:29is to promote self-service access
- 16:31to data. And one of
- 16:32the things that we have
- 16:33been working on as a
- 16:35team,
- 16:36is to improve the experience
- 16:38and speed of access to
- 16:39data,
- 16:40through self-service tools. One of
- 16:42the main ways is through
- 16:43 our Researcher Basic Access. We've also worked on promoting the use of SlicerDicer as an analytics tool and allowing greater access to that.
- 16:57 Also, I'll show it to you later, but in our request form we have the JDAT report library. All of the reports and dashboards that have been put together by the operational and clinical Joint Data Analytics Team are available for search. They've been vetted by clinical practices and may provide an excellent start, or the data that you need, for your research projects.
- 17:24 Prep to research is an area that we'll talk about a little bit more, but that is where we'll give you quicker access to the OMOP data for self-service.
- 17:35 And two last things that we have added on an interactive basis. One is our JDAT office hours: you can request a consult with one of our team to help with, you know, understanding the data that's out there, or working on SlicerDicer. We're now opening up this time so that we can provide more hands-on support. And the last thing I'll just mention is that we've recently kicked off a Teams channel supporting OMOP and the OMOP data model, looking to work with you all as a community on questions and answers, a collaborative space, and access to some of our, you know, tips and tricks.
- 18:20 Researcher Basic Access, or, since we keep referring to a lot of things by acronyms, RBA, is a consolidation of security roles that researchers often need in order to access common systems and tools. We've noticed that there has historically been trouble getting access to many of the Epic systems or many of the data models and data roles, mostly because, due to multiple iterations, the information on how to apply for this isn't always available. So we've consolidated all of this into one role, which we can help administer.
- 18:55 It will help you with things like: if you don't have a Yale New Haven Health system ID or Epic ID, we can help provision that. It'll get you access to the Epic SlicerDicer tool with expanded data models. It will also allow you to get the basic security for the computational health platform and a Camino account, which is the first step in Camino team provisioning. And it also will provision for you the access to the VDI that we talked about, so that you have that secure computing environment, preloaded as a Windows environment with a number of common and hopefully helpful tools: Python, Microsoft Office, OneDrive, SQL, Visual Studio, and the list is growing.
- 19:46 So I just wanna touch on how to request Researcher Basic Access. It used to be a lengthy process; really, it is now a very, very quick one-stop shop. It replaces several requests. One thing I will note is that this is security access, so it needs to be submitted to the health system via your supervisor or PI. That means you need a YNHH ID to have access to this. If you have trouble accessing it, email me; I can assist with that as well. But the form is very easy to fill out. You just need basic information such as who you are and who needs it, and you need to be a member of the covered entity, typically the School of Medicine, or be sponsored by someone.
- 20:40 SlicerDicer: if you're in Epic, you may have seen this already, but SlicerDicer is Epic's data exploration and visualization tool. It is a powerful self-service tool, allowing you to do things such as define patient cohorts based on clinical criteria and really explore the data that is available to you. In some cases, it can be used as a tool for all of your analysis; in other cases, it will be a great tool to explore the data and then work with some of the other teams. SlicerDicer provides aggregate data. There's no PHI, so you don't need an IRB to explore the data within it. That's one of the advantages of using these self-service tools.
- 21:31 It's a great way to get an overall picture of your study cohort. You can add criteria in the middle; it's kinda small, I know. In this one, the blue boxes are the actual filters and criteria you can filter on, and the orange ones are the folders, which organize similar criteria. This example is defined to pick patients with diabetes who are not on prednisone, with a patient age of eighteen to a hundred and eight and an abnormal hemoglobin A1c. So you can build very complex clinical criteria. One of the big features, and hence the name, is that you can add slices, which allow you to break up the data by those criteria, or even define the ranges. So in this case, we've defined that we want to break out and graph by an age cohort, and we've set the age cohorts based on the stops in there. So that is showing us multiple ways to do that.
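As an illustrative aside, even though SlicerDicer itself requires no code, the criteria in this example roughly correspond to a query over the OMOP tables like the sketch below. The concept IDs, A1c threshold, and table names here are hypothetical placeholders, not the definitions SlicerDicer actually uses under the hood.

```python
# Illustrative sketch only: concept IDs and the A1c threshold are assumed placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cohort-sketch").getOrCreate()

cohort = spark.sql("""
    SELECT DISTINCT p.person_id
    FROM omop.person p
    JOIN omop.condition_occurrence c
      ON c.person_id = p.person_id
     AND c.condition_concept_id IN (201826)           -- diabetes (placeholder concept ID)
    JOIN omop.measurement m
      ON m.person_id = p.person_id
     AND m.measurement_concept_id = 3004410            -- hemoglobin A1c (placeholder)
     AND m.value_as_number > 6.5                       -- 'abnormal', assumed threshold
    LEFT JOIN omop.drug_exposure d
      ON d.person_id = p.person_id
     AND d.drug_concept_id IN (1551099)                -- prednisone (placeholder)
    WHERE d.person_id IS NULL                          -- not on prednisone
      AND YEAR(CURRENT_DATE) - p.year_of_birth BETWEEN 18 AND 108
""")
cohort.show()
```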
- 22:48 Additionally, as I mentioned, with the RBA you get twenty-five additional clinical data models that are not accessible to everyone.
- 22:58You can link these models.
- 23:00So if you develop a
- 23:00patient cohort and you need
- 23:02to see, detailed lab specifics,
- 23:05you can link to the
- 23:06lab model. So it really
- 23:07gives you the ability to,
- 23:10expand your queries. And also
- 23:12by using the slices and
- 23:13some of the top
- 23:14ten, top fifty features, you
- 23:16can drill down to see
- 23:17what are the
- 23:19categories of the data. You
- 23:20can use it to help
- 23:21define,
- 23:22some of your queries. So
- 23:23if you need to know
- 23:24what are the lab value
- 23:25ranges, what are the names
- 23:26of the labs, what are
- 23:28the diagnoses involved with this,
- 23:29you can do a lot
- 23:30of that exploration right through
- 23:31this tool,
- 23:33without having to write any
- 23:34queries at all.
- 23:40 So preparatory to research, or P2R as we keep calling it, is one of the ways that we can help put data in your hands a lot quicker. With preparatory to research, you fill out the form; I've got a little screenshot, and I have links to all of these request forms and applications at the end of the presentation, so Vipina can share them with you. This will allow you to get access to the OMOP limited dataset for ninety days, with the goal of using it. You have access to the dataset, which is, you know, the thirty-seven tables with direct identifiers removed, so that you can query the data, do some detailed analysis of your cohorts, and maybe even define the queries for your datasets, so that you can, you know, create datasets to use and develop your protocol. At the end of the ninety days, once you have your IRB protocol, you can convert that over, with the help of the JDAT team, into an IRB-approved project. We can help provision those datasets, essentially execute those queries, and give you exactly the data that you were looking for, along with identifiers.
- 25:00So
- 25:01this can be done by
- 25:03submitting a research data request,
- 25:05which I'll talk about next,
- 25:06and including this form
- 25:08and the statement that you're
- 25:09looking for prep to research
- 25:10because you are working on,
- 25:13you know, developing a protocol.
- 25:20 I think Nate's gonna talk about this more, but I just wanted to talk a little bit about the OMOP dataset, with a small graphic just to show you that it is not as complex of a data model as Clarity, certainly. But, you know, based on the standards, these are some of the tables that people are finding to be most useful.
- 25:42 You have direct access to query these, except for the MRNs and some of the direct identifiers, once you have prep to research. One thing: you'll also need to submit for RBA so that you have access to the platforms. But of these thirty-seven tables, these are some of the most frequently used. I have an asterisk next to the note table. The notes, I assume, if you're gonna work a lot with LLMs, are probably one of the key things you're looking for. Notes, because they're not easily de-identified, are not included as part of the limited dataset. However, once you've, you know, developed a patient cohort, JDAT can execute those queries and provide you the notes that map to your dataset, with linking values, so that you have the PHI readily available.
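As a purely illustrative aside, conceptually the note pull amounts to joining the OMOP note table against an approved cohort. The sketch below is an assumption about how that join might look, not JDAT's actual extraction code; the cohort table name is hypothetical, while the note columns follow the public OMOP CDM.

```python
# Illustrative sketch only: the cohort table name and Spark setup are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("note-join-sketch").getOrCreate()

# A previously approved cohort of person_ids (hypothetical location).
cohort = spark.table("team_shared.my_cohort").select("person_id")

# note is a standard OMOP CDM table; note_text holds the clinical narrative.
notes = spark.table("omop.note")

# Keep only notes for patients in the cohort, carrying the linking values along.
cohort_notes = notes.join(cohort, "person_id").select(
    "person_id", "note_id", "note_date", "note_text"
)
cohort_notes.show(5, truncate=80)
```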
- 26:44 Alright. So maybe the question a lot of people ask is, okay, how do I get this? How do I request data from JDAT? And what we're gonna say is, again, we encourage you to start with the self-service tools, such as SlicerDicer. Do some research, prepare your questions, and prepare as much of your cohort as you can. Use the SlicerDicer tools. And, you know, the starting point is really the YBIC website. So, thanks for showing that; I'll show you specifically where you can get it. If you can get to the YBIC website, you can make a data request through JDAT, you can get the link to Researcher Basic Access, and hopefully very quickly get to, you know, the starting point for what you need. A prerequisite for data requests, or at least to get the data, is Researcher Basic Access. That's gonna give you the tools and the security, so we need to submit that.
- 27:42And if you're looking for
- 27:43data, especially PHI,
- 27:45we will need you to
- 27:46have your compliance documents, your
- 27:48IRB protocol, the approval.
- 27:50 If you are planning to release the data, we have a worksheet that will help walk you through the questions to determine if you need a data use agreement or executive sponsors to sign off. Typically, that's only in the case of data above a certain threshold or if the data is being requested to leave the organization.
- 28:09 Now, we always encourage, if you're working with our team to extract the data, some things to make it go smoothly for you and for us: really define your inclusion criteria in as much detail as you can provide, and help us with definitions. SlicerDicer is a great tool to do that. So instead of telling us you need patients with diabetes, you might be able to tell us, I need these ICD-10 codes, I need these values of hemoglobin A1c. The more detail you can provide, the more accurately and the more quickly we can assist with that.
- 28:51 Alright. As I mentioned, this is the page on the YBIC website; it's under the YNHH data extracts. And, you know, for the first step, submitting your Researcher Basic Access, the orange button at the top will direct you right to that link in ServiceNow for the health system, to make that request directly. Again, if you have trouble, you can email me and I can assist with that. And then below it is how to submit a research data request. That will bring you to the Helix analytics portal, which is our system for creating a request.
- 29:29 So speaking of requests, this is the research data request in the Helix request portal. All requests, both research as well as clinical and operational, come through this. So there are request types; this one's prefilled as a research request type, as opposed to a regular JDAT data request. I would like to point out, up in the top left-hand corner, it's kinda small, but that's where you can search for the reports. By keying in a subject or some content, you'll get every report and dashboard that JDAT has published, so that you can get access and explore. So if you need inpatient data information, you key in inpatient, and you'll get a list of probably a hundred reports as a starting point.
- 30:17 The rest of the form is fairly self-explanatory. There are a couple of key fields. Again, the things I would point out: please, if you're submitting prep to research, attach your request form, you know, your signed document, and put a note in there saying, yes, I'm looking for prep to research. Fill out the description; there are sections there to put in the criteria that you're looking for. If you have complex criteria, go ahead and attach them. The button down at the bottom will allow you to attach forms, and we're also looking for you to attach your IRB protocol as well as your approval letter.
- 31:03 One last thing. When you submit the request, it's gonna give you a request number. That request number is also a project number. If you need to reach out to us, please include it; we are very project-number oriented. That's how we kinda link all these things together, so if you can send that, it will help us. Additionally, that's the request number that will show up in the My Requests link. That shows everything you submitted to our team, the status of it, and the comments that we have added to it as well. Additionally, you can also make changes by adding comments of your own and attaching documents. So if we reach out and say, by the way, we're looking for a particular compliance document, you can attach it here. This is a way to streamline and try to get some of these communications out of email.
- 31:53 And again, I guess this presentation will be shared, but here are all the links to the things I've talked about, hopefully for easy access and for further reference.
- 32:06And that's all I have
- 32:07for today. I'm happy to
- 32:08take any questions
- 32:10either now or via email
- 32:11as they come up.
- 32:23 Yes?
- 32:24 [Audience question, partially inaudible: if we have a de-identified dataset, can we disseminate it?]
- 32:49 Yeah, it's not as straightforward. That's a great question. You know, part of it is going to be
- 32:57where it's being disseminated to.
- 32:59Even if it's de identified,
- 33:00if it's leaving the organization,
- 33:01it's going to have some
- 33:03review by compliance.
- 33:05Sometimes,
- 33:06you know, again, is it
- 33:06de identified to safe harbor
- 33:08methods? What's in your data?
- 33:10What are you sharing? Sometimes
- 33:11the imaging data falls into
- 33:13that. So,
- 33:14I can help advise on
- 33:16those on a case by
- 33:17case level, but we still
- 33:18work with the IT office
- 33:19on that.
- 33:31 [Audience question, partially inaudible, about what kind of ID or account is needed.]
- 33:33 You'll need at least a YNHH ID. So if you don't already have that, you know, the Researcher Basic Access request will help; we'll get that for you.
- 33:49Thank
- 33:51you
- 33:53very
- 33:55much.
- 33:57Thanks, Rich.
- 34:00We'll move to the next
- 34:01speaker.
- 34:03 Nate is gonna talk about CHP SAFE, the computational health platform. Nate, if you could introduce yourself and then start.
- 34:10Thank you.
- 34:15Hi, everyone. I'm Nate Price.
- 34:17I've been with the Yale
- 34:19New Haven Health System
- 34:20since we were just a
- 34:22hospital.
- 34:23And,
- 34:24 back in the beginning, I was just an engineer working with a bunch of like-minded nerds in the Department of Lab Medicine. I joined Charley Torrey's Helix data science group about seven years ago now. And so we've done a lot of great work since then. We've developed some cool stuff, and we're enjoying our collaboration with our colleagues in BIDS and YBIC.
- 34:55Let's
- 34:58see.
- 35:00Yeah. First, we're working here.
- 35:03There we go.
- 35:07Before I get into a
- 35:08lot of tech stuff, I
- 35:09wanted to take you back
- 35:10nearly a century,
- 35:12and share with you,
- 35:14my favorite quotation of all
- 35:15time.
- 35:17I'm gonna I can't resist
- 35:18reading the entire thing.
- 35:20It is the, introduction
- 35:22to AA Mills'
- 35:23classic Winnie the Pooh.
- 35:25Here is Edward Bair coming
- 35:27downstairs now. Bump. Bump. Bump.
- 35:30On the back of his
- 35:30head behind Christopher Robin.
- 35:33It is, as far as
- 35:34he knows, the only way
- 35:36of coming downstairs, but sometimes
- 35:38he feels that there really
- 35:39is another way if only
- 35:41he could stop bumping for
- 35:42a moment and think of
- 35:43it.
- 35:45 And I don't know about you, but I feel like that's sort of the story of my life. I think we often get stuck in ways of doing things that are not ideal, but we're too much in the middle of it to step back and think of a better way. But the CHP SAFE platform is a way of helping us to keep from bumping our heads. Like, if you've been trying to do data science on your laptop and its compute power, that's kind of a bump. Trying to figure out where to get large datasets from could be a bump. Needing GPU compute power on your own, that's kind of a bump. And if you're trying to do stuff on your own in a compliant way, that's a huge bump.
- 36:32 So this is why we have CHP and SAFE. CHP, of course, stands for Computational Health Platform. SAFE is the secure, aligned, flexible environment, which, you know, really, they mean the same thing.
- 36:49There's a lot to unpack
- 36:51in this environment.
- 36:53Let me just
- 36:55wanna see if I can
- 36:56get,
- 36:59can I get the laser
- 36:59pointer working,
- 37:06Yeah? I don't see that.
- 37:07I don't see the cursor
- 37:08there. Oh, here's my oh,
- 37:10finally got my cursor. Okay.
- 37:11Thank you.
- 37:12Great. Yep. I'll
- 37:14get that done.
- 37:15Yes. There's a pointer. Okay.
- 37:17Thank you.
- 37:19 Alright. I'm gonna work from the bottom, and sort of go in not exactly the top-to-bottom way this diagram is organized. But, you know, we have a great deal of storage, which we'll detail shortly. We have both sort of hot SSD storage and a great deal of cold storage, via NetApp and the Komprise application. In terms of computation, we have a huge computational array by Nutanix that provides us all of the CPUs and the memory used to spin up the many, many VMs that comprise the computational health platform.
- 38:03 We also have a significant and growing number of GPUs, both NVIDIA and Tesla.
- 38:14We're gonna take a look
- 38:15at this little,
- 38:16ship's wheel symbol here. That
- 38:17is the symbol for Kubernetes,
- 38:19which is a Greek word
- 38:20that,
- 38:21originally means governor or helmsman.
- 38:25It is
- 38:27the helmsman that kind of
- 38:29steers a lot of what
- 38:30goes on in Chip.
- 38:31It is a platform for
- 38:33orchestrating
- 38:34container based applications, which is
- 38:36pretty much all of Chip.
- 38:39It makes applications scalable and
- 38:41fault tolerant.
- 38:42It allows,
- 38:44applications to add more compute
- 38:46power or take some away
- 38:48depending on what's needed.
- 38:52 And many of the other components you'll see in more detail in upcoming slides. And then Camino, up here, is of course a key part of Chip, and I'm gonna tell you a lot more about that shortly.
- 39:06Gonna take a quick look
- 39:07at our data assets over
- 39:09here.
- 39:10We do have genomic data
- 39:12in,
- 39:13in Chip in the form
- 39:14of,
- 39:15DCF data
- 39:17from,
- 39:18ACTX,
- 39:19from the act from ACTEX
- 39:20patients and the generations project.
- 39:24As Rich has described, we
- 39:26have the EHR,
- 39:28in OMOP format.
- 39:30 We have real-time EHR data: we have some HL7 feeds that are updating things like ADT information and lab results. We can actually have that stuff available faster than it's available in OMOP, depending on the application.
- 39:48 We have high-speed bed monitor and vent data. Like, I don't know, we've stopped counting the number of billions of data points per month that we're ingesting, but we're getting bed monitor, vent, and anesthesia data in real time into Chip. There's clinical data too; I won't go into too much detail about a lot of these things. I will mention that there is BYO capability: if you need compute power but you have a dataset of your own that you want to work on, it's possible to bring your own data to Chip.
- 40:21And then, of course, we
- 40:21have LLMs, which is really
- 40:23the primary reason we're all
- 40:24here this afternoon.
- 40:32In Hitchhiker's Guide to the
- 40:33Galaxy, Douglas Adams said,
- 40:36space is big.
- 40:38You just won't believe how
- 40:39hugely mind bogglingly big it
- 40:41is. I mean, you might
- 40:42think it's a long way
- 40:43down the road to the
- 40:44chemist,
- 40:45but that's just peanuts to
- 40:47space.
- 40:48Now we have not made
- 40:49our environment quite as big
- 40:51as space, but we've made
- 40:52it,
- 40:53pretty large, and it's continuing
- 40:55to grow as our user
- 40:57requirements and application requirements do
- 40:59too.
- 41:01 Just a few of these numbers. We have over fourteen hundred CPU cores available; about half of them are currently in use. Nearly thirteen terabytes of memory. In terms of storage, we've got two tiers of fast storage that total over four hundred, well, four hundred and sixteen, terabytes. We have a storage grid for colder storage that contains seven hundred terabytes. And there's a storage management application called Komprise that handles the transfer of data between hot and cold. GPUs, which of course everyone's very interested in: we have sixteen Tesla GPUs altogether, two Tesla cards of eight each. We currently have eight NVIDIA A100s and eight NVIDIA H100s, and I think that number is expected to grow as we progress.
- 42:07 I just wanted to look briefly at our major Chip data sources. You know, the first thing they tell
- 42:12you when you go to
- 42:14make a PowerPoint presentation is
- 42:15don't stand in front of
- 42:16a room and read your
- 42:17slide.
- 42:18But now I'm gonna stand
- 42:19in front of the room
- 42:20and read my slide.
- 42:23We've already mentioned,
- 42:24EHR data is in the
- 42:26OMOP common data model. We'll
- 42:27get into that in a
- 42:28little bit of detail.
- 42:30 Real-time HL7 data includes things like ADT, lab results, flowsheets, and Data Innovations orders and results. If you're interested in laboratory data and instrument data, some of the results and orders going to and from Data Innovations contain things that are not in the Beaker lab system or even in the Epic EHR.
- 42:56 We have clinical images that are generally not all stored in Chip, but they can be pulled on demand using our Camino data request process. So we have access to all of the clinical images that are in the vendor-neutral archive: CT scans, ultrasounds, X-rays, ophthalmological data, ECGs, you know, anything that's a clinical image that was done in the health system is in the VNA and can be pulled via our extraction process.
- 43:33 And as I mentioned before, we do have genomic data. We have thousands of clinical and research VCF files from patients who have undergone ACTX testing, and from Generations patients. And for Generations patients we have full exome data as well, so that's another thing that's available to researchers.
- 43:58Right. So here we are
- 43:59at the OMOP common data
- 44:00model again.
- 44:02Don't wanna dwell on this
- 44:03too much because Rich did
- 44:04a really good job of
- 44:05describing it. But,
- 44:08I just wanna, emphasize that
- 44:10common data models are really
- 44:11the key to collaborative research.
- 44:14Two different health systems may
- 44:15have very different and largely
- 44:17incompatible
- 44:18EHRs, but the data in
- 44:20their common data models,
- 44:22should be at least mostly
- 44:23compatible with each other.
- 44:25 And OMOP is maintained by OHDSI; I forget what the acronym expands to, but there's the URL for it. That's the preeminent common data model these days. And many of our researchers have already used OMOP data extracts to do collaborative research with other institutions.
- 44:52 OMOP does not contain everything that's in the EHR, but its fairly simple data model covers eighty to ninety percent of what's in Epic. And as Rich already mentioned, there's a gradual reduction in complexity as you get further away from the Epic Chronicles database. Epic has an ETL to Clarity, which has more than eighteen thousand tables, I guess close to twenty thousand. That in turn is extracted to Caboodle, which has something more like five hundred tables. And when you get to OMOP, it actually has a relatively small number, like a few dozen. It cuts down on data complexity without leaving out significant amounts of the data itself. That means queries are a lot easier to construct compared to Clarity and its eighteen thousand tables. You still have to understand your data, but the process is a lot simpler.
- 45:49 Our OMOP database is pseudo-relational, meaning it can behave like a SQL database, and you can actually construct queries in pure SQL, including joins, windows, and things like that, or you can use pure Spark syntax. Our computing environment lets you do that either way, depending on what you prefer. Either way, you can run queries on OMOP data many, many times faster than you could run queries on the equivalent data in a SQL Server.
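As an illustrative aside, here is a minimal sketch of the two equivalent styles just described, pure SQL versus Spark DataFrame syntax, assuming a Spark session and an OMOP measurement table registered under a hypothetical omop database; the exact setup on CHP SAFE may differ.

```python
# Illustrative sketch only: table names and Spark setup are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("omop-two-ways").getOrCreate()

# Option 1: pure SQL syntax against the registered OMOP measurement table.
labs_sql = spark.sql("""
    SELECT person_id, measurement_concept_id, AVG(value_as_number) AS mean_value
    FROM omop.measurement
    GROUP BY person_id, measurement_concept_id
""")

# Option 2: the same query in Spark DataFrame syntax.
labs_df = (
    spark.table("omop.measurement")
    .groupBy("person_id", "measurement_concept_id")
    .agg(F.avg("value_as_number").alias("mean_value"))
)

labs_sql.show(5)
labs_df.show(5)
```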
- 46:24 One of the things I didn't get into, that I didn't even touch on in our big system diagram, is DataRobot. We have a fully functioning DataRobot installation in Chip.
- 46:36 I've laid out kind of the basic features on my slide, but in my sort of limited experience with DataRobot, it's a really good tool for understanding a dataset even if you're not trying to develop a predictive model. But if you are, it's a relatively painless way to pull a dataset in and have DataRobot kind of analyze it. It will tell you stuff; it'll basically take care of a lot of your data cleaning and data engineering for you. If you wanna make a predictive model, it will run many, many competing models against each other and will tell you which one seems to be the best. You can easily tune hyperparameters. It's not a black box, because it's very good at actually showing you which parameters and which data elements matter.
- 47:31 It has some of what is called MLOps, machine learning ops, built into it, meaning that if you develop a predictive model, you can set up a method for sort of continuous monitoring of the quality of your predictions and for evaluating and tuning. And they've also put a lot of effort into generative AI integration, which, I have to say, I don't know exactly how that works, but suffice it to say that DataRobot is highly committed to GenAI as well. So I recommend, you know, DataRobot if you wanna try, or even just play around with, developing a predictive model.
- 48:10Now we're into Camino,
- 48:13which is kind of the
- 48:14core of Chip.
- 48:16It's the way that most
- 48:17of us will interact with
- 48:18the computational health platform.
- 48:20 Here, I'm showing you a link to a nice article that Fang Chi Lin and others published last summer. I recommend reading it. It has a slightly different take on Camino than what I'm presenting here, although I think the information and concepts overlap. And my colleague, Alpa Paselli, will be giving a proper demo of Camino in a bit, so stay tuned for that.
- 48:47 Here is the team architecture. I think Rich touched on prep-to-research teams, which are another kind of Camino team. But for most research purposes, you start with an IRB, a Helix data request, and a PI. If you've got those things established, and your request is submitted and granted, you get a Camino team whose name is built from the IRB, the Helix data request, and your PI. Every team gets a quota, which allows you a total number of CPUs, GPUs, and memory. Those quotas are flexible; the quota is assigned depending on the size of the team, the size of the data request, and your expected needs. Quotas can also be changed as necessary.
- 49:47 So teams contain users, and users have environments. Every user can spin up one or more of what we call environments, which are essentially fully provisioned Linux virtual machines. User environments are private: if you create an environment, you are the only one that accesses the compute power in that environment. Your colleague next to you will have their own environment, and they may define a slightly different environment than you: more or less memory, more or less CPUs, maybe with GPUs, maybe without. Data requests can be added to environments depending on, you know, the nature of your research and your IRB. Data is shared within the team. So this user, whom I'm showing as mounting three separate directories of data requests, can add those to their environment, but another user in that team can also access those data requests. So data requests are shared within a team, and then every team has a shared data folder that is automatically part of everyone's environment.
- 51:20Let's see.
- 51:24 Oh, yeah. So a little bit more on environments. An environment is a complete Linux virtual machine. You can request certain configurations of CPU, GPU, and memory, subject to availability and your team quotas. When you've defined an environment, what we're looking at here on the left is what an environment actually looks like in Camino. There are two items here in the active sessions box. One is a hyperlink; that's the link to the JupyterLab GUI, which gives you access to Jupyter notebooks, things like R, and so on. Those of you who are command line fans can also SSH directly from, say, your VDI session, and go straight to the command line of your environment. It's the same thing: if you're running a Python script, or any sort of script, from the command line, you're really accessing the same machine, the same data, the same directory structure that JupyterLab does.
- 52:47 See. And that's about all I've got to say for that slide. So I've already said that Camino environments have flexible computing power. They can do a lot of different things depending on circumstances. And I'm gonna apologize in advance to the R fans in the audience here, but I'm gonna leave R out of this slide and focus on Python and PySpark. You could have an environment with two CPUs and eight gigabytes of memory, or sixty-four CPUs and two hundred and fifty-six gigabytes of memory, with or without GPUs.
- 53:26 The key thing about PySpark, Python with Spark, is that it creates distributed computing. So if you're running a PySpark script, which looks very much like Python with SQL statements thrown in, you can distribute very large queries over many executors to immensely speed up your processing. Even though your environment itself is one little virtual machine, when you're running queries from JupyterLab you can actually be spinning up many, many executors to speed up your compute task.
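As an editorial illustration of the executor idea, here is a minimal sketch of how a PySpark session might request more parallelism. The configuration keys are standard Spark settings, but the specific values, and whether CHP SAFE lets users set them this way, are assumptions.

```python
# Minimal sketch; whether these settings are user-tunable on CHP SAFE is an assumption.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("distributed-query")
    # Standard Spark settings: request several executors, each with its own cores and memory.
    .config("spark.executor.instances", "8")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "16g")
    .getOrCreate()
)

# A query written once in Python/SQL gets split across the executors automatically.
result = spark.sql("""
    SELECT condition_concept_id, COUNT(*) AS n
    FROM omop.condition_occurrence
    GROUP BY condition_concept_id
    ORDER BY n DESC
""")
result.show(20)
```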
- 54:05We have flexible means of putting in data requests.
- 54:10Now, everything that I'm describing here is subject to your IRB and your Helix data request,
- 54:17and data requests made in Camino have to be approved by an admin.
- 54:23But, for instance, you have a few different ways of requesting data.
- 54:30I mentioned genomics, and genomics data is available on Chip,
- 54:35but it's not part of the GUI based data request mechanism.
- 54:43I'm showing one of the pathways here, for instance, for selecting image data.
- 54:51You start by picking imaging or OMOP data, or possibly genomic data eventually, once we have that in the GUI.
- 55:00Let's say that you're interested
- 55:01in imaging. You can select
- 55:03your clinical images
- 55:05based on a set of
- 55:06medical record numbers, a set
- 55:08of accession numbers,
- 55:09a bulk upload of IDs
- 55:11if you have a very
- 55:11large number, or a predefined cohort.
- 55:15And that's one of the
- 55:16really,
- 55:17powerful things we have in
- 55:18Camino.
- 55:19We have a number of
- 55:20cohorts defined based on computed
- 55:23phenotypes.
- 55:25And
- 55:26if there's a cohort phenotype you need that isn't in Chip, we can easily add it.
- 55:32And cohorts provide a sort of slicer-dicer style statistics page,
- 55:37so you can actually see what the salient characteristics of your cohort are.
- 55:48We've mentioned VDI.
- 55:51VDI
- 55:51complements
- 55:53Camino
- 55:53environments nicely, because VDI is a virtual Windows desktop.
- 56:00And I should emphasize: it's not Chip, but it's Chip adjacent.
- 56:03It works with Chip, but our group doesn't maintain it;
- 56:08this is something that desktop engineering maintains.
- 56:11All researchers
- 56:12get access
- 56:13to the Yale Research VDI.
- 56:16And, as I said, VDI and Camino environments complement each other.
- 56:22Camino environments have a lot of raw compute power, but they don't have a lot of nice GUI tools.
- 56:29Conversely, VDI with Windows doesn't have the data processing power of Camino and PySpark,
- 56:38but it has apps that are good for statistics and presentation on finished datasets,
- 56:44like Stata, like SAS; even RStudio is available.
- 56:50So the other key thing is that if you are a Camino user and a VDI user,
- 57:00the team shared directory right here can be accessed by your team members from VDI.
- 57:07So let's say you've done a great deal of data crunching on some multimodal thing,
- 57:14and you've got a certain number of images, or you've produced a dataset derived from OMOP using compute power in Camino.
- 57:24Any team member can drop that information into your shared folder in Camino,
- 57:31and then in VDI, you can bring up that folder as if it's a Windows shared folder
- 57:36and operate on it in your application of choice.
- 57:42So you get that flexibility
- 57:44of raw compute power and,
- 57:46you know, sort of nice
- 57:47GUI tools and presentation
- 57:49through the sharing of the
- 57:50team directory.
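A minimal sketch of that handoff, assuming a hypothetical `/team_share` mount for the team shared directory (the real path will differ): heavy crunching happens in Camino, a small derived table is written to the shared folder, and in VDI the same folder appears as a Windows share for Stata, SAS, or RStudio users.

```python
import pandas as pd

# Dummy summary standing in for the output of heavier PySpark work.
summary = pd.DataFrame({"group": ["A", "B"], "n_patients": [120, 80]})

# Placeholder path for your team's shared directory; in VDI the same folder
# shows up as a Windows share where GUI tools can open the file.
summary.to_csv("/team_share/derived_summary.csv", index=False)
```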
- 57:55Now let's get on to large language models.
- 58:01Secure computing at YNHH doesn't allow this, and many of you have probably already encountered it:
- 58:08you can get your own account with OpenAI, but you can't necessarily do that from within the hospital network,
- 58:16and you certainly can't access arbitrary cloud based large language models and computing resources with patient data.
- 58:26So
- 58:27we have,
- 58:28developed a pretty significant,
- 58:31library of large language models
- 58:33within Chip.
- 58:34Here's the list of what
- 58:35we've got right now,
- 58:37and, you know, we are
- 58:38adding to them on a
- 58:39regular basis. So if you,
- 58:43have an application that requires a particular LLM that is not here,
- 58:48you can ask us. There's a little bit of security review,
- 58:53but we'll be happy to include it in our library of large language models.
- 59:03So one method of accessing large language models in Camino
- 59:09is through a Camino environment with a dedicated GPU.
- 59:14If you reserve GPUs in Camino (there's a formal GPU request process),
- 59:21you can have GPUs allocated to you for a certain period of time.
- 59:27The obvious advantage is that you get a lot of flexibility,
- 59:32because you can say, well, I've got my GPU;
- 59:38I'm gonna operate on my dataset with two or three different LLMs over the course of a few days
- 59:47and see how they differ.
- 59:50You get maximum compute power
- 59:51because while you have the
- 59:52GPU, it is yours and
- 59:53yours alone.
- 59:56It's really good for multimodal studies. We know of at least a couple of research groups
- 01:00:01that are doing studies involving OMOP data, clinical image data, and possibly other things,
- 01:00:07and it's very useful to have the GPU available for that.
- 01:00:13Disadvantages
- 01:00:14are kind of what you'd
- 01:00:15expect, that there's a higher
- 01:00:16cost.
- 01:00:17Once you reserve a GPU
- 01:00:19and that reservation is accepted,
- 01:00:20the meter is running, and
- 01:00:21then you're gonna be responsible
- 01:00:23for paying for that resource.
- 01:00:26It's a resource bottleneck, because if you have a GPU reserved
- 01:00:32and you've spun up an environment with four H100s,
- 01:00:36those H100s are not available to anyone else.
- 01:00:39They are not a shared resource. They are tied to an environment,
- 01:00:41and they cannot be used by anybody else until your environment has stopped.
- 01:00:48Also, doing things this way requires more programming expertise.
- 01:00:54For some people, that might not be desirable for simpler use cases:
- 01:00:58you may simply want to exchange prompts and responses with an LLM
- 01:01:06the way we interact with OpenAI and ChatGPT.
- 01:01:15Let's see.
- 01:01:16We have a sample notebook and project for anybody who wants to try using a dedicated GPU.
- 01:01:25Vincent Zhang of Hua's team wrote a Jupyter notebook that does a demo of simple inference and classification,
- 01:01:36and I took that notebook and adapted it into a fully self contained repo in our GitHub Enterprise installation.
- 01:01:46If you have an environment with at least one GPU, you can run this Jupyter notebook and tailor it to your requirements.
- 01:01:53You can customize your prompts, you can change the data that's being fed in,
- 01:01:58and you can see how the LLM behaves.
- 01:02:01So that may be a very useful thing if you're looking to get started.
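This is not the notebook from the repo, just a hedged sketch of the same idea using the Hugging Face transformers library (which the team says it relies on for models); the model directory is a placeholder for wherever the predownloaded models live in your environment.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path; point it at one of the predownloaded models mounted in your
# environment. Requires an environment with at least one GPU.
model_dir = "/models/example-llm"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"
)

# Customize the prompt and the data you feed in to see how the LLM behaves.
prompt = "Classify the following clinical note as cardiology or oncology: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```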
- 01:02:09The more efficient way, or a more effective way, to get at LLMs and GPUs is via software as a service,
- 01:02:18which I know Vincent and other colleagues are gonna be talking about in a lot more detail this afternoon.
- 01:02:25But just from the diagram, you can see how it adds flexibility:
- 01:02:30because you have one container with a number of GPUs,
- 01:02:34multiple teams and multiple users can be sending queries to it at the same time.
- 01:02:41Containers like the Kiwi system can queue requests and queue results,
- 01:02:47so if you are issuing a query, you don't get a busy signal when there's a lot going on,
- 01:02:54but you may have to wait a little bit.
- 01:02:59It creates some obvious improvements in efficiency, because with many people querying a GPU,
- 01:03:06the GPU can be running continuously, rather than in a stop-start way,
- 01:03:12where somebody with a dedicated GPU runs a couple of things,
- 01:03:15then stops and waits a few hours or a few days while nobody else can use it.
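Purely as an illustration of the software-as-a-service pattern (not Kiwi's actual interface, which is covered in the afternoon session): the host, route, and payload below are hypothetical placeholders.

```python
import requests

# Everything here is hypothetical: the URL, route, payload shape, and response
# format are placeholders, not the real Kiwi interface.
resp = requests.post(
    "https://kiwi.example.internal/v1/generate",  # placeholder endpoint
    json={"model": "example-llm", "prompt": "Summarize this note: ..."},
    timeout=300,  # requests may sit in the queue behind other teams
)
resp.raise_for_status()
print(resp.json())
```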
- 01:03:20So we've just begun to roll out the Kiwi containerized application in Chip.
- 01:03:30I think we're gonna be doing some beta testing of it in the test environment, and soon we're rolling it out to production.
- 01:03:36And then I believe we have other SaaS versions of GPUs and LLMs
- 01:03:43that we'll be rolling out in Chip as well, to maximize people's access to those resources.
- 01:03:51And I think that is
- 01:03:53everything that I have to
- 01:03:55say about Chip and SAFE.
- 01:03:58So thank you for bearing
- 01:03:59with me, and,
- 01:04:00happy to take any questions
- 01:04:02for a couple of minutes
- 01:04:03if anyone has any.
- 01:04:12Thanks.
- 01:04:20If there are no more questions, I'll hand things over to our data science software engineer.
- 01:04:42Yeah. Good afternoon, everyone.
- 01:04:44I'm Al Pacelle.
- 01:04:46I'm probably the newest member of the team;
- 01:04:51I've been with Chip almost a year.
- 01:04:55It's very exciting for me. I came from UnitedHealthcare after twenty seven years,
- 01:05:02and the data science and informatics work has been really exciting to get into and start working with.
- 01:05:09So I wanted to start off with a couple of good reasons to use Chip.
- 01:05:16I think
- 01:05:17kind of a resounding message
- 01:05:19that everybody has spoken about
- 01:05:21so far has been
- 01:05:22the availability of data.
- 01:05:25I think Chip probably has one of the largest amounts of data available.
- 01:05:30There are three point one million individuals who have been to the hospital or in the system
- 01:05:40since twenty twelve, twenty thirteen.
- 01:05:45All that data is HIPAA
- 01:05:46compliant.
- 01:05:47So, you know, we have pretty much everything you could possibly want.
- 01:05:54It's data from Epic; I think we mentioned the path, basically, from Epic all the way to Caboodle.
- 01:06:05And we also have imaging
- 01:06:07data
- 01:06:08in the vendor neutral archive.
- 01:06:10So you can get images, and data derived from those images, as well.
- 01:06:19Nate mentioned that we have a bunch of preloaded large language models,
- 01:06:26and Nate showed that information as well.
- 01:06:28I was gonna demo that, but I'll hit the actual repository so you can see them live.
- 01:06:39And it's a high speed compute environment.
- 01:06:42I think Nate's numbers are probably better than mine, which are from a couple of years ago.
- 01:06:48But, you know, over seventeen hundred CPUs, twenty eight GPUs, a petabyte of storage and change.
- 01:06:57So all super good.
- 01:07:04So, Camino. Camino is a curated data broker. It's a front end to Chip.
- 01:07:11Users get their own custom compute environment.
- 01:07:14Like we mentioned, it's Linux.
- 01:07:19We also provide Jupyter Notebooks
- 01:07:20as an interface for coding.
- 01:07:22So you can code in
- 01:07:23Python
- 01:07:24three, PySpark,
- 01:07:26PyTorch,
- 01:07:28and Nate's favorite, R.
- 01:07:34Camino comes preloaded with Python packages. We have a pretty good number of common packages,
- 01:07:41as well as our analytics packages.
- 01:07:46There's also the ability to add additional packages.
- 01:07:50So if you have questions about whether your favorite package is available or not,
- 01:07:55I will show you in a few minutes how to actually ask me to add more packages.
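Before filing a request, a quick, standard-library way to check whether a package is already importable in your environment (the package names below are just examples):

```python
from importlib.util import find_spec

# Package names are illustrative; swap in whatever you plan to request.
for pkg in ("pandas", "seaborn", "lifelines"):
    status = "already available" if find_spec(pkg) else "not installed (worth requesting)"
    print(f"{pkg}: {status}")
```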
- 01:08:00And the same goes for LLMs.
- 01:08:03We have a bunch of
- 01:08:04preloaded LLMs.
- 01:08:06I think we have a
- 01:08:07pretty nice,
- 01:08:09cross section of them. But
- 01:08:11if you find something that
- 01:08:12you're interested in that
- 01:08:15you really want,
- 01:08:17you can put in a
- 01:08:18request and talk to me,
- 01:08:19and I will be happy
- 01:08:20to see if we can
- 01:08:21get that loaded for you.
- 01:08:24Also, there's more information in the Yale University Microsoft Teams instance,
- 01:08:36where we have a Camino Chip user group.
- 01:08:40In that Camino Chip user group, there's a welcome packet,
- 01:08:44which has all that information: the analytics packages, the LLMs, etcetera.
- 01:08:50And just a little fun
- 01:08:51fact,
- 01:08:52this picture here up on
- 01:08:53the wall,
- 01:08:55that is our data center.
- 01:08:56That picture came from our
- 01:08:58data center.
- 01:08:59I'm not sure which closet has the GPUs in it,
- 01:09:05but they're right there, literally, with all the heat coming off of them.
- 01:09:17Alright.
- 01:09:19So I'm gonna show you
- 01:09:20a few things right now.
- 01:09:21I'm gonna show you how
- 01:09:22to reserve a GPU. So
- 01:09:24Nate mentioned that,
- 01:09:26we do have GPUs available,
- 01:09:28and you can request
- 01:09:30a GPU.
- 01:09:31And, I will show you
- 01:09:32how to do that reservation.
- 01:09:35I'll also show you how to request additional analytics packages and models,
- 01:09:42if you have an interest in loading up additional models.
- 01:09:46I'll show you how to
- 01:09:47create an environment, and I'll
- 01:09:48show you an actual Jupyter
- 01:09:50notebook.
- 01:09:50I'm not gonna actually run through it, but I've already prerun it, and I will show you the results.
- 01:09:58So without further
- 01:10:02ado,
- 01:10:24And this is actually the OHDSI website that Nate was discussing earlier.
- 01:10:36Alright. So this is Camino.
- 01:10:38Sure.
- 01:10:44Screen's not sharing. Gotcha. Yeah.
- 01:10:51Yep.
- 01:10:56System share.
- 01:11:01One sharing, one stop.
- 01:11:15Yeah.
- 01:11:30Alright. Thanks, Juan.
- 01:11:33Alright. Sorry about that, folks.
- 01:11:38Alright. So the first thing
- 01:11:40I'm gonna show you is
- 01:11:40how to request a GPU.
- 01:11:42So,
- 01:11:44on everybody's
- 01:11:46account,
- 01:11:47there's a drop down
- 01:11:49that shows,
- 01:11:51GPU reservation in the list.
- 01:11:55K. And when you go
- 01:11:56to create a GPU reservation,
- 01:11:57you just click on the
- 01:11:58new GPU
- 01:11:59reservation.
- 01:12:02It'll prefill with your team
- 01:12:03name and your username.
- 01:12:06It'll also give you a
- 01:12:08drop down that will list
- 01:12:09all the available
- 01:12:11GPU models that we have
- 01:12:13as well as the sizes.
- 01:12:14So two GPU, four GPU,
- 01:12:16eight GPU.
- 01:12:19I'm gonna just pick one; everybody seems to like the H100, so I'll do two H100s.
- 01:12:29You can choose
- 01:12:30how long you wanna reserve
- 01:12:31them.
- 01:12:32I'll give you advance warning: typically, if you get a request in today, I may actually grant it to you today.
- 01:12:42The day I grant it, I usually give you the rest of the day for free,
- 01:12:48because I don't always know if you put it in at, like, eight AM, or whether I approved it at noon or twelve thirty or one o'clock,
- 01:12:57and I don't wanna shortchange you by half a day.
- 01:13:00So you get the rest of the day for free, and then you get the next day completely.
- 01:13:06So, I'm gonna put it
- 01:13:08in for one day.
- 01:13:10Whoops.
- 01:13:11Give me that.
- 01:13:14COA is the chart of
- 01:13:15accounts field.
- 01:13:16We're planning at some point to start billing for these.
- 01:13:23So if you have a chart of accounts available for your grant, you can put your COA number in here,
- 01:13:29which saves you from getting contacted by me to find out what your COA is before I approve it.
- 01:13:36And then you put something in for what the purpose of the GPU is, to help me understand what you're doing with it.
- 01:13:56K. Obviously, I can't type.
- 01:13:59Alright. When you submit it,
- 01:14:02you'll see it will show
- 01:14:03as pending.
- 01:14:05Okay? And then when somebody
- 01:14:06goes and approves that, it
- 01:14:08will change to approved.
- 01:14:10Usually, if I can't approve
- 01:14:11it, I'll reach out to
- 01:14:12you and find out what
- 01:14:13you'd like to do.
- 01:14:15Typically, the only reason I
- 01:14:17don't approve them is because
- 01:14:19I don't have one available
- 01:14:21because other people already are
- 01:14:22using them.
- 01:14:23So if I reach out
- 01:14:24to you, it's probably gonna
- 01:14:25be, can I do this
- 01:14:27for you in five days
- 01:14:28or four days, or
- 01:14:31is it okay if we
- 01:14:31do this next week?
- 01:14:33But for the most part, we've been pretty good about sharing the GPUs, so they've been going pretty fast.
- 01:14:42Alright. I did mention that I also wanted to show you how to ask for additional models if you want a different model.
- 01:14:50Also, if you have any analytics packages you wanna install,
- 01:14:56there is a "report an enhancement" link on the left hand side here.
- 01:15:03This will take you, hopefully, if I... yep.
- 01:15:07You may have to log in with your YNHH credentials before you do this, but it will take you to a form.
- 01:15:15It's a pretty simple form, and it'll prefill with all your information.
- 01:15:19All you need to do
- 01:15:19is put in a title
- 01:15:21and a description. So, in the title, you just put "requesting new large language model,"
- 01:15:28and in the description, tell me what you'd like, where it comes from, and how I can get it.
- 01:15:36We are heavily invested in Hugging Face,
- 01:15:40so we have a lot of opportunity to grab Hugging Face models if you use Hugging Face.
- 01:15:48Those are pretty much approved for use, so we can pull them from Hugging Face.
- 01:15:56For the analytics packages, basically, anything you can pip install, let me know.
- 01:16:01I try to do a juggling act of, do we wanna install something that we already have a similar analytics package for?
- 01:16:14So if there's something out there that's special for you,
- 01:16:18it helps for me to understand why it's different from something else that's out there.
- 01:16:23You know, why Seaborn when we could use matplotlib? We actually have both of those.
- 01:16:30But if you let me know what you're interested in and why,
- 01:16:36that'll help me make a decision to move forward with it or not.
- 01:16:46Alright. I'm gonna show you a little bit about the environments.
- 01:16:51So, they say, don't do live demos.
- 01:16:54Oh, a question?
- 01:16:56You talked about the rates. How much would it be?
- 01:17:01I'll just say it's very competitive with Hugging Face.
- 01:17:04Hugging Face charges about a hundred and twenty dollars a day;
- 01:17:09ours are about a hundred and twenty dollars for seven days. So it's pretty competitive.
- 01:17:16If I recall correctly, the H100s are forty cents per hour of reservation,
- 01:17:23the A100s are thirty cents per hour,
- 01:17:26and the V100s have not been charged for yet.
- 01:17:32So through the month of May, we are in a trial period.
- 01:17:53Yeah.
- 01:18:03Yeah. It's very competitive compared to other providers like Hugging Face.
- 01:18:11I'm gonna show you how
- 01:18:12to create a new environment.
- 01:18:14Alright. So I'm just gonna
- 01:18:15create an environment here.
- 01:18:20I think Rich mentioned we're kind of Helix number oriented,
- 01:18:25so a lot of my names tend to have Helix numbers in them.
- 01:18:30I'm gonna just make this one my favorite number.
- 01:18:50Alright. When you are selecting an environment,
- 01:18:58you need to make sure that your image matches whatever environment you're gonna build.
- 01:19:05So in this case, I'm gonna build an H100 environment,
- 01:19:10so I need to match it with the GPU and PyTorch image that uses A100s and H100s.
- 01:19:18The last one here will do either A100s or H100s,
- 01:19:22the one above it does only V100s,
- 01:19:25and the other one is for just Python and PySpark.
- 01:19:34Alright. So like I said, when you match them, the size of the environment,
- 01:19:38the GPUs you're gonna use, and the CPUs you're gonna use need to match with the image that you're using.
- 01:19:44So this will have to be an H100. I'll keep it at eight. Let's see.
- 01:19:58K. So two H100s should do it.
- 01:20:05K. And when you create
- 01:20:06an environment, you'll have an
- 01:20:07opportunity to put a data
- 01:20:10request in. So I'm gonna put in a couple of different data requests.
- 01:20:19The first data request I put in is gonna be this large language model access.
- 01:20:24When you request a GPU, I will automatically give you LLM access to go with it,
- 01:20:33so that you can access the predownloaded large language models.
- 01:20:38Additionally, I'm gonna throw in some data here:
- 01:20:45I have a limited dataset folder, which I'll throw in there as well.
- 01:20:58Now for the tricky part.
- 01:20:59If I start this up,
- 01:21:03I'm actually starting up a
- 01:21:05two GPU environment now. So,
- 01:21:08in theory, it will start
- 01:21:10up. In practice,
- 01:21:12right now, I think we
- 01:21:12have a lot of people
- 01:21:13using h one hundreds, so
- 01:21:14I may not actually get
- 01:21:15the h one hundred started.
- 01:21:17It may throw an error
- 01:21:18for me,
- 01:21:19but I did wanna just
- 01:21:20at least show it.
- 01:21:24H100s do take a little bit of time to spin up.
- 01:21:28The virtual CPUs, the regular CPUs, spin up pretty quickly; they take about a minute, two minutes.
- 01:21:37The GPUs, the H100s and A100s, take a little bit more time to spin up;
- 01:21:46they sometimes take, like, four or five minutes.
- 01:21:48But while we're waiting for that to spin up, we'll do something else as well.
- 01:21:53So
- 01:21:55while that's going,
- 01:22:01we'll go to an environment
- 01:22:02I already have running.
- 01:22:06K. So this environment is a two CPU, eight gigabyte of RAM environment.
- 01:22:15So no GPUs on this one; this is just plain PySpark.
- 01:22:24And
- 01:22:26clicking on this link will
- 01:22:27bring up the JupyterLab.
- 01:22:32So this JupyterLab is actually one that I set up a while back.
- 01:22:38I did some SQL testing in here just for fun, just to get used to the environment.
- 01:22:48So you can see here I have data. This is the data that Rich was talking about.
- 01:22:52These are literally the parquet files.
- 01:22:57In here are parquet files for each and every grouping of data,
- 01:23:10like the concept tables.
- 01:23:15Almost everybody deals with person tables.
- 01:23:17These parquet files are not a one to one relationship;
- 01:23:21you won't have three million files if you're looking for everybody.
- 01:23:24Each parquet file can contain thousands of records.
- 01:23:29So the fun part is reading them all together,
- 01:23:33which I've already done for us, because I didn't wanna spend too much time on this.
- 01:23:40So you can see here,
- 01:23:41this is the code that
- 01:23:42reads in the data.
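Roughly what that reading-in step looks like in PySpark: `spark.read.parquet` takes a directory and reads all the part files as one table. The path is a placeholder for wherever your data request is mounted.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-omop").getOrCreate()

# A directory of parquet part files is read back as a single table; the path is
# a placeholder for the data-request mount in your own environment.
person = spark.read.parquet("/data/my_request/omop/person")
print("rows:", person.count())

# Register it as a view so it can be queried with SQL later in the notebook.
person.createOrReplaceTempView("person")
```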
- 01:23:46Alright?
- 01:23:48Every now and then, you get little warnings from Jupyter because of things that come up.
- 01:23:56Usually, the warnings you can ignore, but errors you have to fix. Right?
- 01:24:02So this is where I read in the OMOP files.
- 01:24:07You can see here, I actually read in the OMOP files and got output.
- 01:24:13And this is only the
- 01:24:14first twenty five lines. There's
- 01:24:15about two thousand,
- 01:24:17if I recall correctly, in
- 01:24:18this file.
- 01:24:21So, basically, I read in information about a small cohort,
- 01:24:31and I split it up to do some light analytics:
- 01:24:37what portion were males versus females.
- 01:24:41So it's a graph of male versus female.
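A sketch of that light analytics step, assuming the `person` table was read in as in the earlier snippet; 8507 and 8532 are the standard OMOP gender concept IDs for male and female.

```python
import matplotlib.pyplot as plt

# Count males vs. females in the cohort and plot the split. Assumes `person`
# is the DataFrame registered earlier; 8507/8532 are OMOP gender concept IDs.
counts = (
    person.groupBy("gender_concept_id")
    .count()
    .toPandas()
    .set_index("gender_concept_id")["count"]
)
counts.rename(index={8507: "male", 8532: "female"}).plot(kind="bar", title="Cohort by gender")
plt.tight_layout()
plt.show()
```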
- 01:24:45I also spit out the
- 01:24:46data into my
- 01:24:50team directory.
- 01:24:57So the actual data is
- 01:24:59here.
- 01:25:02Actually, we have it open.
- 01:25:04So these were the results.
- 01:25:06So people
- 01:25:08who have
- 01:25:10chronic myeloid
- 01:25:12leukemia.
- 01:25:13Right?
- 01:25:15And I have it split up based on what their condition source code was.
- 01:25:20These source codes are ICD-9 codes.
- 01:25:29This is the SQL that
- 01:25:30ran.
- 01:25:31You can see from the SQL that, basically, you're looking at persons
- 01:25:37who have a specific set of source condition codes,
- 01:25:43whose birth date is before March first, two thousand five,
- 01:25:47and who were seen in the date range between December first, twenty twenty two, and January first, twenty twenty three.
- 01:25:59So when I say two thousand, that's two thousand people just in that one month time frame.
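A hedged reconstruction of the shape of that query, assuming the `person` and `condition_occurrence` views are registered as in the earlier snippets; the ICD-9 codes (205.1x for chronic myeloid leukemia) and column names are illustrative, so the query that actually ran may differ.

```python
# Sketch of the cohort query described above, joining OMOP person and
# condition_occurrence. Codes and columns are illustrative, not the real query.
cml_cohort = spark.sql("""
    SELECT DISTINCT p.person_id, c.condition_source_value
    FROM person p
    JOIN condition_occurrence c
      ON c.person_id = p.person_id
    WHERE c.condition_source_value IN ('205.10', '205.11', '205.12')
      AND p.birth_datetime < '2005-03-01'
      AND c.condition_start_date BETWEEN '2022-12-01' AND '2023-01-01'
""")
cml_cohort.show(25)
```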
- 01:26:06So
- 01:26:08let's see if that other
- 01:26:09environment started.
- 01:26:19Yeah. It's still pending. So,
- 01:26:22I'm not gonna wait for
- 01:26:23it to start, but I
- 01:26:23did wanna just mention that
- 01:26:26when it does start, it's a good day.
- 01:26:30When it doesn't start, you'll get an error message that shows you why it didn't start.
- 01:26:34So
- 01:26:36with that, are there any
- 01:26:38questions?
- 01:26:42Is there a way that you can delete your environment?
- 01:26:49Yeah. So we're working on the delete functionality, so we can just get rid of them.
- 01:26:56But, basically, if you don't start the environment, or if you shut it down, it's not taking up any resources,
- 01:27:04so it's not a major concern.
- 01:27:06I do ask people to
- 01:27:07shut down environments when they're
- 01:27:09done because, like everything else
- 01:27:11on this planet, it's finite.
- 01:27:12And, you know, if we shut down environments we're not using,
- 01:27:16there will be plenty of resources for other people to share.
- 01:27:22Yeah, good question. I was just wondering, within the steps, when do you create that? Before the environment?
- 01:27:29Yeah, so, good question. You can see here that right now I'm a member of chip admin.
- 01:27:39You request a team through the process that Rich showed earlier, where you put in a Helix request.
- 01:27:48So first, you request RBA.
- 01:27:52K. Once you have researcher basic access, you'll have a YNHH ID, and then you can request a team.
- 01:27:58When you request the team through the Helix request, you'll get a team,
- 01:28:07which will contain your IRB number (or P2R, if you're a P2R team),
- 01:28:13the Helix number, and the PI's last name.
- 01:28:19When you get the team, if you only have one team, you won't see this team drop down,
- 01:28:26because it only shows once you have multiple teams.
- 01:28:32You'll then be able to have a project.
- 01:28:35Typically, the JDAT team will put an actual project out here for you,
- 01:28:44and under the project is where that data information gets loaded.
- 01:28:49So, we do have self-service capability, but it does need to be approved.
- 01:28:58Right now, we're asking people to work with the JDAT team
- 01:29:02to make sure that they have their data and their cohort well identified before they go through this.
- 01:29:11And then once your data is here, when you create an environment under environments,
- 01:29:18you'll actually be able to load that data into that environment.
- 01:29:21That answer your question?
- 01:29:24Yeah. It's on a solution
- 01:29:25where to.
- 01:29:28I can.
- 01:29:56Alright. Yeah, you need to have a YNHH account to get here.
- 01:30:08Rich's portion of the presentation had the link to Helix, but this is the actual live website.
- 01:30:18And, yeah, it's pretty straightforward once you start entering the data in there.
- 01:30:30Question?
- 01:30:45Yeah, it does. I just didn't even think of it. You get a gold star. Yes.
- 01:31:03Great question. So if you have custom data, I'm gonna let Rich field this one.
- 01:31:08Actually, I'll field it, and then Rich can tell me where I'm wrong.
- 01:31:13If you have custom data, you'll be working with someone on the JDAT team.
- 01:31:18Somebody from Rich's team will work with you to load that data into Camino in Chip.
- 01:31:26That's probably the best answer.
- 01:31:27There are some things going on, as you're aware, to map directories from the VDI to Chip.
- 01:31:33That's not a simple process, so for the time being, I wouldn't rely on it.
- 01:31:41Right. Any other questions?
- 01:31:45Alright. Thank you.