
4-29-25 Workshop Session 1

June 04, 2025
ID: 13190

Transcript

  • 00:10 Good afternoon, everyone. Thank you for joining us today. First of all, apologies for any inconvenience. We had enough registrants to overflow the room capacity, and I was still getting people asking, "Can I register until the last moment?" So I might have told some of you we were out of food and things, but I think some people didn't show up. So if you want to grab something, there is still food out there.
  • 00:38 With that said, I guess these events are popular; that is the reason why everyone is here. And this is the first YBIC seminar, and recently people have been asking us what this place is. This is basically the Department of Biomedical Informatics and Data Science. We just moved here a few weeks back, from 100 College, ninth floor, to 101 College, tenth floor.
  • 01:06 And, yeah, as I said, this is a five-week seminar series, and this is the first one. On May 29, in the morning, we are going to have another one, on MarketScan.
  • 01:18 So thanks again for joining. I hope we meet your expectations and you have a great time learning.
  • 01:26 I would first welcome Dr. Hua Xu to give a brief introduction of what YBIC stands for and what it is all about.
  • 01:35 So, thanks. Vipina, you should introduce yourself.
  • 01:39 Yeah. So thanks, Vipina, Sue, and all the others for organizing these events. Yeah.
  • 01:46 Oh, I'll just go ahead. So... oh, does it move? I guess for most of you this is your first time in this building, right? Okay.
  • 02:05 So we actually moved here two weeks ago, as was said. My name is Hua Xu. I'm a professor and vice chair for research at the Department of Biomedical Informatics and Data Science. This workshop today is actually organized by an office called YBIC, Yale Biomedical Informatics and Computing. It's led by Dr. Lucila Ohno-Machado, deputy dean for biomedical informatics at the Yale School of Medicine.
  • 02:34 And I don't know how many of you have actually accessed the YBIC website. Can you show me your hands if you have? Very few. So it's a good opportunity for me to promote this.
  • 02:45 So the reason, too, is that YBIC is really trying to make a central hub for folks looking for biomedical datasets, or looking for software tools to conduct clinical research. Also, for a lot of medical AI work, maybe you are looking for a more secure computing environment to run your models. And we're also trying to provide training. If you go to the website, you'll actually find a lot of information.
  • 03:14 And YBIC is really a collaboration among a number of different entities: a partnership with the YNHH hospital analytics team, with YCRC (many of you probably know it, the Yale Center for Research Computing), and with the Health Sciences ITS team. So we work together to provide all these resources to folks working on biomedical informatics or clinical research that requires heavy computational resources, for example.
  • 03:51 And right now, we actually have three different offices under Dr. Ohno-Machado. One is Strategic Initiatives, led by herself, focused more on strategic planning. The second one, led by Dr. Daniela Meeker, in the middle, is the Research Informatics Office. I think many of you have probably already used it, for example the JDAT team, to help you retrieve data from the hospital EHR system. Can you raise your hand if you have? We have the JDAT team lead, Richard, here, so if you have more questions, you can also ask him. And the third one is the Research Computing Infrastructure office, co-led by myself and Dr. Weiss.
  • 04:40 So this one is really trying to provide all the secure computing environments, which I think many of you might actually be interested in, because now, when we develop medical AI and large models, we're often looking for GPU resources. You probably heard about the recent investment from Yale, fifty million dollars on hardware such as GPUs, to facilitate AI research. Specifically, what we are working on is trying to provide a more secure computing environment, because we may be working on clinical patient data with PHI: how can we protect that information while we do medical research?
  • 05:22 So right now, our office has been working on four different platforms. Some of you may have heard of some of them; some are actually still coming. The first one, which I think we're also going to talk about today, is the VDI environment, so you can remotely access the environment within the hospital through a kind of remote desktop. The second one, also a focus of a lot of today's presentations, is the CHP SAFE environment, with a lot of CPUs and GPUs within the hospital in a HIPAA-compliant environment. And the third one is what we are building: you've probably heard about SpinUp, which is the AWS self-service environment. Now we're actually making SpinUp Plus; what we are doing is trying to make it a NIST 800-171 environment.
  • 06:15 So over there, you can safely manage all the PHI data. And the fourth one is named Hopper. It's a more secure GPU computing cluster that will be available in July this year. So far, Hopper already has about sixty NVIDIA H100s installed, and it's in a beta testing environment. It is hosted in the Massachusetts data center, but it will be managed by a Yale team in the end. So I just have a callout for this platform: it will be coming soon if you're looking for more GPU resources.
  • 06:54 So a lot of today's presentations are actually focusing on the first two, the VDI and the CHP SAFE environment, which sit in the hospital environment. And the idea is really to answer all your questions: How can we access the data? How can we request computational resources within CHP SAFE? For example, on CHP SAFE you can also see we have some GPUs, like A100s and H100s as well; how can you request them? And then the second half is really about building large language models. Under CHP SAFE, we have developed some available tools and APIs, so we'll show you how you can call those tools for different clinical applications.
  • 07:41 I'll stop here and let Vipina introduce the agenda, and we'll go from there. Thanks, everyone.
  • 07:54 This is how we have structured today's presentation: session one from 1:00 to 2:30; a coffee break from 2:30 to 2:45 (I've already started getting coffee, yes); session two from 2:45 to 4:00; and, followed by that, a networking reception in our kitchen right there.
  • 08:15 So for session one, this is what we are going to do. Rich is going to talk about research data and JDAT: what, where, how, those things. And again, as Dr. Xu mentioned, we will dive deep into the CHP SAFE environment and what Camino is, in the presentations by Nate and Al, and then Dr. Xu will do an overview of large language models.
  • 08:40 And after the coffee break, YuJa is going to show you why we do annotation and some of the annotation tools that we have set up on this environment. Then I'll go deeply into one of the clinical information extraction pipelines that we have built. Then Vincent is going to do a short demo on Kiwi as an API service; we'll discuss those things. And finally, a programming-intensive session by Lingfei, where we'll tell you how to develop customized LLMs for your specific task.
  • 09:18 We will have a Q&A session after each speaker presents, so please hold your questions till then so that we stay on time. I'd like to welcome now Rich Hintz, Director of Clinical Research Data Services. Rich, please take over.
  • 09:46 Good afternoon. Let me flip this over a bit.
  • 09:55 Alright. Thanks for having me, everyone. I'm Richard. I'm part of the Research Informatics Office, and I'm glad to be here today. I want to give you a little background about our group. We're a team of eleven. We support Yale faculty, staff, residents, hospital employees, as well as medical students. We are primarily tasked with providing data for research and research data needs, which includes providing... ah, thank you. Alright. No one minds if I'm quiet, apparently, but that's fine.
  • 10:33 We typically respond to over six hundred data requests every year. We provide queries and custom data extracts, we work with Epic reporting and development, and we assist with MyChart recruitment. So, hopefully, I'll be talking today about how to obtain data through JDAT, giving a little background on that and touching upon some of the environments that Hua mentioned. And then you'll hear a lot of great information from Nate and Al, coming up, on the Computational Health Platform and Camino.
  • 11:06 So over the last year, we've been working to focus on really improving processes that will put data in the hands of our research community faster and easier. Primarily, we've been doing that by promoting some of the self-service approaches, as well as data provisioning tools on the YNHH infrastructure.
  • 11:33 So the data that's available, the clinical data and the research data, are very overlapped. It's a large network: the health system, Yale University, and the Northeast Medical Group, spanning all of Connecticut, down into New York, and up into Rhode Island, touching upon more than a hundred and thirty practices, including the Smilow Cancer Center and its locations, all the YNHH hospitals, including the campuses up in Rhode Island, and also the Chapel Street and York Street campuses. All these are linked together through the Epic medical record system, which really gives us the foundation for providing the data.
  • 12:20 So we feel the data is very rich. There are more than 4.3 million patient records in Epic. The database dates back to the implementation of Epic within the health system, approximately twelve years ago, when it went live at all the different locations across the health system. I've listed a couple of things in there. Pretty much any data that is tracked clinically, we can extract, including all the patient demographics, vitals, comorbidities, surgical data, and all the labs. There's a wealth of data, from newborns and deliveries up through geriatric care.
  • 13:05 So the data sources: this is kind of where we are involved. All the data really starts from Epic. Epic, as you already know, is the electronic health care record system. It is designed for patient care. Underneath the hood, however, is the Chronicles database, and that database is designed for very quick, real-time access to individual patient records, and to support physicians and clinicians treating patients. However, that database is not as efficient at doing large-scale data extracts, reporting across all historical time, or working on data across large patient cohorts.
  • 13:53 So with that, the data in Epic is extracted nightly into the Clarity platform. Clarity is a much larger database. It has nearly everything that's in Chronicles, structured in a very similar format, but it's a SQL database. Our team has access to it, and it allows for a lot of this larger reporting. However, it is not real time: it's extracted daily, so it's a day behind. It's also a little more complex; there are approximately twenty thousand tables in the Clarity database. So it's massive, but it does contain a lot of rich data.
  • 14:30 To solve some of those issues with the speed and the complexity, Epic has created their Caboodle data model. Caboodle is a smaller, more normalized database designed for more productionized reporting. There are on the order of six hundred tables. Queries run faster; not everything is necessarily in there, but it is easier to use and faster.
  • 14:58 Moving on, one of the other advantages of Caboodle is that we can bring in data from outside of Epic as well. So that data has been expanded as new models or new data sources are made available. For example, there's an initiative right now to bring the data from the tumor registry into Caboodle, so that really accurate tumor and cancer staging can be linked to the clinical treatment practices. So that's one of the advantages of Caboodle.
  • 15:31 And the last step here at Yale is an additional transformation that happens to move to the OMOP database. OMOP is the Observational Medical Outcomes Partnership model. It's a common data model using open standards, something that can be used across institutions. With that, the Caboodle data is moved into and transformed into the OMOP database, which has on the order of thirty-seven tables, a much more straightforward model to use. It has most of the data and, hopefully, a little bit less of a learning curve to access the data. And as we talk a little bit more about the data that we provide: for those of you who are going to use CHP and have direct access to the data, it will be from the OMOP tables, in the OMOP format.
  • 16:27 So as I mentioned, one of our goals is to promote self-service access to data. One of the things that we have been working on as a team is to improve the experience and speed of access to data through self-service tools. One of the main ways is through Researcher Basic Access. We've also worked on promoting the use of SlicerDicer as an analytics tool, and allowing greater access to it.
  • 16:57 Also, I'll show it to you later, but in our request form we have the JDAT report library. All of the reports and dashboards that have been put together by the operational and clinical Joint Data Analytics Team are available for search. They've been vetted by clinical practices and may provide an excellent start, or even the data that you need, for your research projects.
  • 17:26a an area that we'll
  • 17:28talk about a little bit
  • 17:29more, but that is where,
  • 17:30we'll give you quicker access
  • 17:32to the OMOP data for
  • 17:34self-service.
  • 17:35And two last things that
  • 17:36we have added,
  • 17:38on an interactive
  • 17:40basis is
  • 17:41our JADA office hours. You
  • 17:43can request a consult with
  • 17:44one of our one of
  • 17:45our team to help with,
  • 17:47you know, understanding the data
  • 17:48that's out there,
  • 17:50working on slicer, dicer, but
  • 17:52we're now opening up this
  • 17:53time so so that we
  • 17:54can provide more hands on
  • 17:55support.
  • 17:57And the last thing I
  • 17:58just mentioned is we've recently
  • 17:59kicked off a Teams channel,
  • 18:02supporting, OMOP and OMOP data
  • 18:04model. So looking
  • 18:05to work with you all
  • 18:06as a community to
  • 18:08work on questions and answers,
  • 18:09a collaborative
  • 18:10space, and access to some
  • 18:12of our, you know, tips
  • 18:14and tricks.
  • 18:20 Researcher Basic Access, or, since we keep referring to things by acronyms, RBA, is a consolidation of security roles that researchers often need in order to access common systems and tools. We've noticed that there has historically been trouble getting access to many of the Epic systems, or many of the data models and data roles, mostly because, due to multiple iterations, the information on how to apply isn't always available. So we've consolidated all of this into one role, which we can help administer. It will help you with things like, if you don't have a Yale NetID, a health system ID, or an Epic ID, we can help provision that. It will get you access to the Epic SlicerDicer tool with expanded data models. It will also give you the basic security for the Computational Health Platform and a Camino account, which is the first step in Camino team provisioning. And it will also provision for you the access to the VDI that we talked about, so that you have that secure computing environment, preloaded as a Windows environment with a number of common and hopefully helpful tools: Python, Microsoft Office, OneDrive, SQL, Visual Studio, and the list is growing.
  • 19:46 So I just want to touch on how to request Researcher Basic Access. It used to be a lengthy process; it is now a very, very quick one-stop shop that replaces several requests. One thing I will note is that this is security access, so it needs to be submitted to the health system via your supervisor or PI. That means you need a YNHH ID to have access to this. If you have trouble accessing it, email me; I can assist with that as well. The form is very easy to fill out. You just need basic information, such as who you are and who needs it, and you need to be a member of the covered entity (typically the School of Medicine) or sponsored by someone.
  • 20:40 SlicerDicer: if you've had a chance, if you're in Epic, you may have seen this already, but SlicerDicer is Epic's data exploration and visualization tool. It is a powerful self-service tool, allowing you to do things such as define patient cohorts based on clinical criteria and really explore the data that is available to you. In some cases, it can be used as a tool for all of your analysis; in other cases, it will be a great tool to explore the data and set up work with some of the other teams. SlicerDicer provides aggregate data. There's no PHI, so you don't need an IRB to explore the data within it. That's one of the advantages of using these self-service tools.
  • 21:31 It's a great way to get an overall picture of your study cohort. In the middle (it's kind of small, I know) you can add the criteria. In this one, the blue boxes are the actual filters and criteria you can filter on; the orange ones are the folders, which organize similar criteria. This example is defined to pick patients with diabetes who are not on prednisone, with a patient age of eighteen to a hundred and eight, and with an abnormal hemoglobin A1c. So you can build very complex clinical criteria. One of the big tools, and hence the name, is that you can add slices, which allow you to break up the data by those criteria, and even define the ranges. So in this case, we've defined that we want to break up and graph by an age cohort, with the age bands based on the stops we set in there. So that is showing multiple ways to do that.
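[Editor's note: for anyone who later moves from SlicerDicer to direct queries, the cohort above (diabetes, no prednisone, age 18 to 108, abnormal hemoglobin A1c) plus an age-band slice can be sketched in SQL. This is a toy, flattened schema, not the real SlicerDicer or OMOP model; the 6.5% A1c threshold and the age bands are illustrative assumptions.]

```python
import sqlite3

# Toy flattened patient table; real cohort logic would span several
# OMOP or Epic tables rather than one row per patient.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    person_id INTEGER, age INTEGER,
    has_diabetes INTEGER, on_prednisone INTEGER, hba1c REAL
);
INSERT INTO patient VALUES
 (1, 45, 1, 0, 8.2),
 (2, 70, 1, 0, 9.1),
 (3, 30, 1, 1, 8.5),  -- excluded: on prednisone
 (4, 52, 1, 0, 5.4);  -- excluded: normal HbA1c
""")

# Cohort filters mirror the SlicerDicer example; GROUP BY plays the
# role of a SlicerDicer "slice" with defined age stops.
rows = conn.execute("""
SELECT CASE WHEN age < 40 THEN '18-39'
            WHEN age < 65 THEN '40-64'
            ELSE '65+' END AS age_band,
       COUNT(*) AS n
FROM patient
WHERE has_diabetes = 1
  AND on_prednisone = 0
  AND hba1c >= 6.5            -- "abnormal" A1c threshold, illustrative
  AND age BETWEEN 18 AND 108
GROUP BY age_band
ORDER BY age_band
""").fetchall()
print(rows)  # [('40-64', 1), ('65+', 1)]
```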
  • 22:48 Additionally, as I mentioned, with the RBA you get twenty-five additional clinical data models that are not accessible to everyone. You can link these models: if you develop a patient cohort and you need to see detailed lab specifics, you can link to the lab model, so it really gives you the ability to expand your queries. And by using the slices and some of the top-ten and top-fifty features, you can drill down to see what the categories of the data are, and you can use that to help define some of your queries. So if you need to know what the lab value ranges are, what the names of the labs are, or what diagnoses are involved, you can do a lot of that exploration right through this tool, without having to write any queries at all.
  • 23:40 So Preparatory to Research, or, as we keep calling it, P2R, is one of the ways that we can help put data in your hands a lot quicker. With Preparatory to Research, you fill out the form (I've got a little screenshot, and I have links to all of these request forms and applications at the end of the presentation, so Vipina can share that with you). This will allow you to get access to the OMOP limited dataset for ninety days, with the goal of using it. You have access to the dataset, the thirty-seven tables with direct identifiers removed, so that you can query the data, do some detailed analysis of your cohorts, and maybe even define the queries for your datasets, so that you can create the datasets to use and develop your protocol. At the end of the ninety days, once you have your IRB protocol, you can convert that over, with the help of the JDAT team, into an IRB-approved project, and we can help provision those datasets: essentially execute those queries and give you exactly the data you were looking for, along with identifiers.
  • 25:00 So this can be done by submitting a research data request, which I'll talk about next, including this form and a statement that you're looking for prep to research because you are working on developing a protocol.
  • 25:20 I think Nate's going to talk about this more, but I just wanted to talk a little bit about the OMOP dataset, with a small graphic just to show you that it is not as complex a data model as Clarity. Based on the standards, these are some of the tables that people are finding to be most useful. You have direct access to query these, except for the MRNs and some of the direct identifiers, once you have prep to research. One thing: you'll also need to submit for RBA so that you have access to the platforms.
  • 26:02thirty seven tables,
  • 26:04these are some of the
  • 26:05most, frequent I have a
  • 26:06asterisk next to the note
  • 26:07table.
  • 26:08The notes, and I assume
  • 26:10if you're you're gonna work
  • 26:11a lot with LLMs, that's
  • 26:12probably one of the key
  • 26:13things you're looking for.
  • 26:16Notes are because they they're
  • 26:17not easily de identified, are
  • 26:19not included as part of
  • 26:20the limited dataset.
  • 26:22However, once you've, you know,
  • 26:24developed a patient cohort,
  • 26:26JDAT can
  • 26:28can execute those queries
  • 26:30and provide you the the
  • 26:32the notes that map to
  • 26:33your dataset and linking values,
  • 26:36so that you have the
  • 26:37PHI readily available.
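[Editor's note: once JDAT delivers a notes extract with linking values, joining it back to your cohort is a simple keyed merge. A minimal sketch, where the `person_id` linking key and the toy records are hypothetical:]

```python
# Hypothetical: cohort IDs from your P2R query, plus a JDAT notes
# extract delivered with the same linking key.
cohort_ids = {1, 2}
notes = [
    {"note_id": 100, "person_id": 1, "note_text": "Progress note ..."},
    {"note_id": 101, "person_id": 2, "note_text": "Discharge summary ..."},
    {"note_id": 102, "person_id": 3, "note_text": "Outside the cohort ..."},
]

# Keep only the notes whose linking key falls in the cohort.
cohort_notes = [n for n in notes if n["person_id"] in cohort_ids]
print([n["note_id"] for n in cohort_notes])  # [100, 101]
```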
  • 26:44 Alright. So maybe the question a lot of people ask is: okay, how do I get this? How do I request data from JDAT? And what we're going to say, again, is to encourage you to start with the self-service tools, such as SlicerDicer. Do some research: prepare your questions, and prepare as much of your cohort as you can, using the SlicerDicer tools.
  • 27:09 And, you know, the starting point is really the YBIC website. So, thanks for showing that; I'll show you specifically where you can get it. If you can get to the YBIC website, you can make a data request through JDAT, you can get the link to Researcher Basic Access, and, hopefully, very quickly get to the starting point for what you need.
  • 27:32Prerequisites
  • 27:34for data requests or before
  • 27:36at least to get the
  • 27:37data is research with basic
  • 27:38access. That's gonna give you
  • 27:39the tools and the security,
  • 27:40so we need to submit
  • 27:41that.
  • 27:42And if you're looking for
  • 27:43data, especially PHI,
  • 27:45we will need you to
  • 27:46have your compliance documents, your
  • 27:48IRB protocol, the approval.
  • 27:50If you are planning to release the data, we have a DUA worksheet that will help walk you through the questions to determine if you need a data use agreement or executive sponsors to sign off.
  • 28:00Typically, that's only in the case of above a certain threshold of data or if the data is being requested to leave the organization.
  • 28:09Now, we always encourage, if you're working with our team to extract the data, some things to make things go smoothly for you and for us: really define your inclusion criteria in as much detail as you can provide, and help us with definitions.
  • 28:25SlicerDicer is a great tool to do that. So instead of telling us you need patients with diabetes, you might be able to tell us, I need these ICD-10 codes, I need these values of hemoglobin A1c.
  • 28:39So the more detail you can provide, the more accurately and the more quickly we can assist with that.
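An explicit cohort definition like the one the speaker describes can be sketched in code. This is a minimal illustration only: the ICD-10 codes and the hemoglobin A1c threshold below are assumptions for the example, not the team's actual criteria.

```python
# A minimal sketch of an explicit inclusion-criteria definition, in the
# spirit of the speaker's advice: name the exact codes and lab thresholds
# instead of saying "patients with diabetes". All values are illustrative.

INCLUSION_CRITERIA = {
    "icd10_codes": {"E11.9", "E11.65"},   # example type 2 diabetes codes
    "hba1c_min_percent": 6.5,             # example HbA1c threshold
}

def matches_criteria(patient: dict) -> bool:
    """Check one patient record (a plain dict) against the example criteria."""
    has_dx = any(code in INCLUSION_CRITERIA["icd10_codes"]
                 for code in patient.get("diagnoses", []))
    hba1c = patient.get("hba1c")
    has_lab = (hba1c is not None
               and hba1c >= INCLUSION_CRITERIA["hba1c_min_percent"])
    return has_dx and has_lab
```

Handing the data team a definition at this level of precision is what makes the extraction fast and accurate.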
  • 28:51Alright. As I mentioned, this is the page on the YBIC website. It's under the YNHH data extracts.
  • 28:58And, you know, for the first step, submitting your research basic access request, the orange button at the top will direct you right to that link in the ServiceNow for the health system to make that request directly.
  • 29:10Again, if you have trouble, you can email me. I can assist with that.
  • 29:14And then below it is how to submit a research data request. That will bring you to the Helix analytics portal, which is our system for creating a request.
  • 29:29So speaking of requests, this is the research data request form in the Helix request portal.
  • 29:36All requests, both research as well as clinical and operational, come through this.
  • 29:41So there are request types. This one's prefilled as a research request type as opposed to a regular JDAT data request.
  • 29:49I would like to point out, up in the top left-hand corner, it's kinda small, but that's where you can search for the reports.
  • 29:55So by keying in a subject, an ID, some content, that's where you'll get every report that JDAT has published in the dashboards, so that you can get access to and explore them.
  • 30:06So if you need inpatient data information, you key in 'inpatient'. You'll get a list of probably a hundred reports as a starting point.
  • 30:17The rest of the form is fairly self-explanatory. There's a couple of key fields.
  • 30:23Again, the things I would point out: please, if you're submitting prep to research, attach your request form, you know, your signed document.
  • 30:33Put a note in there saying, yes, I'm looking for prep to research.
  • 30:37Fill out the description. There are sections there to put in the criteria that you're looking for.
  • 30:43If you have complex criteria, go ahead and attach it. The button down at the bottom will allow you to attach forms, and we're also looking for you to attach your IRB protocol as well as your approval letter.
  • 31:03One last thing. So when you submit the request, it's gonna give you a request number. That request number is also our project number.
  • 31:09If you need to reach
  • 31:10out to us, please include
  • 31:11that. We are very project
  • 31:13number oriented.
  • 31:14That's how we kinda link
  • 31:15all these things together. So
  • 31:17if you can send that,
  • 31:18that will help us.
  • 31:19Additionally, that's the request number that will show up in the My Requests link. That shows everything you submitted to our team, the status of it, and the comments that we have added to it as well.
  • 31:31But, additionally, you can also make changes by adding comments of your own and attaching documents.
  • 31:38So if we reach out and say, by the way, we're looking for a particular compliance document, you can attach it here.
  • 31:45This is a way to streamline and try and get some of these communications out of email.
  • 31:53And, again, I guess this presentation will be shared, but here are all the links to the things I've talked about, hopefully for easy access and for more reference.
  • 32:06And that's all I have
  • 32:07for today. I'm happy to
  • 32:08take any questions
  • 32:10either now or via email
  • 32:11as they come up.
  • 32:23[Audience question, largely inaudible: about what is required to disseminate a dataset that has been de-identified.]
  • 32:49Yeah, it's not as straightforward. That's a great question.
  • 32:54You know, part of it is going to be where it's being disseminated to. Even if it's de-identified, if it's leaving the organization, it's going to have some review by compliance.
  • 33:05Sometimes, you know, again: is it de-identified by safe harbor methods? What's in your data? What are you sharing? Sometimes the imaging data falls into that.
  • 33:14I can help advise on
  • 33:16those on a case by
  • 33:17case level, but we still
  • 33:18work with the IT office
  • 33:19on that.
  • 33:31[Audience question, partially inaudible: about what account or access will be needed.]
  • 33:33You'll need, at least a
  • 33:35YNHH ID.
  • 33:37So if you don't already have that, you know, the research basic access request will get that for you.
  • 33:49Thank
  • 33:51you
  • 33:53very
  • 33:55much.
  • 33:57Thanks, Rich.
  • 34:00We'll move to the next
  • 34:01speaker.
  • 34:03Nate is gonna talk about,
  • 34:05ChipSafe
  • 34:06computational health platform. Nate, if
  • 34:08you could introduce yourself and
  • 34:09then start.
  • 34:10Thank you.
  • 34:15Hi, everyone. I'm Nate Price.
  • 34:17I've been with the Yale
  • 34:19New Haven Health System
  • 34:20since we were just a
  • 34:22hospital.
  • 34:23And,
  • 34:24back in the back in
  • 34:25the beginning, I was just,
  • 34:28an engineer working with a
  • 34:29bunch of like minded nerds
  • 34:30in the department of lab
  • 34:31medicine.
  • 34:32I joined Charley Torrey's Helix
  • 34:35data science,
  • 34:37group about
  • 34:38seven years ago now. And,
  • 34:41so we've done a lot of great work since then. We've developed some cool stuff, and we're enjoying close collaboration with our colleagues in BIDS and YBIC.
  • 34:55Let's
  • 34:58see.
  • 35:00Yeah. First, we're working here.
  • 35:03There we go.
  • 35:07Before I get into a
  • 35:08lot of tech stuff, I
  • 35:09wanted to take you back
  • 35:10nearly a century,
  • 35:12and share with you,
  • 35:14my favorite quotation of all
  • 35:15time.
  • 35:17I'm gonna I can't resist
  • 35:18reading the entire thing.
  • 35:20It is the introduction to A. A. Milne's classic Winnie-the-Pooh.
  • 35:25Here is Edward Bear, coming downstairs now, bump, bump, bump, on the back of his head, behind Christopher Robin.
  • 35:33It is, as far as
  • 35:34he knows, the only way
  • 35:36of coming downstairs, but sometimes
  • 35:38he feels that there really
  • 35:39is another way if only
  • 35:41he could stop bumping for
  • 35:42a moment and think of
  • 35:43it.
  • 35:45And I feel,
  • 35:46I don't know about you,
  • 35:47but I feel I feel
  • 35:48like that's sort of the
  • 35:49story of my life. I
  • 35:50think we often get stuck
  • 35:52in ways of doing things
  • 35:53that are not ideal,
  • 35:55but we're too much in
  • 35:57the middle of it to
  • 35:57step back and think of
  • 35:58a of a better way.
  • 36:01But the ChipSafe platform is a way of helping us to keep from bumping our heads.
  • 36:07Like, if you've been trying to do data science on your laptop and its compute power, that's kind of a bump.
  • 36:15Trying to figure out where to get large datasets from could be a bump.
  • 36:20You need GPU compute power on your own? That's kind of a bump.
  • 36:26And if you're trying to do stuff on your own in a compliant way, that's a huge bump.
  • 36:32So this is why we
  • 36:34have CHIP and SAFE. CHIP,
  • 36:35of course, stands for computational
  • 36:37health platform.
  • 36:39SAFE is the secure, aligned, flexible environment, which, you know, really, they mean the same thing.
  • 36:49There's a lot to unpack
  • 36:51in this environment.
  • 36:53Let me just
  • 36:55wanna see if I can
  • 36:56get,
  • 36:59can I get the laser
  • 36:59pointer working,
  • 37:06Yeah? I don't see that.
  • 37:07I don't see the cursor
  • 37:08there. Oh, here's my oh,
  • 37:10finally got my cursor. Okay.
  • 37:11Thank you.
  • 37:12Great. Yep. I'll
  • 37:14get that done.
  • 37:15Yes. There's a pointer. Okay.
  • 37:17Thank you.
  • 37:19Alright. I'm gonna work from the bottom and sort of go in not exactly the top-to-bottom way this diagram is organized.
  • 37:27But, you know, we have a great deal of storage, which we'll detail shortly. We have both sort of hot SSD storage and a great deal of cold storage via NetApp and the Komprise application.
  • 37:47In terms of computation, we have a huge computational array by Nutanix that provides us all of the CPUs and the memory used to spin up the many, many VMs that comprise the computational health platform.
  • 38:03We also have a significant and growing number of GPUs, both NVIDIA and Tesla.
  • 38:14We're gonna take a look
  • 38:15at this little,
  • 38:16ship's wheel symbol here. That
  • 38:17is the symbol for Kubernetes,
  • 38:19which is a Greek word
  • 38:20that,
  • 38:21originally means governor or helmsman.
  • 38:25It is
  • 38:27the helmsman that kind of
  • 38:29steers a lot of what
  • 38:30goes on in Chip.
  • 38:31It is a platform for
  • 38:33orchestrating
  • 38:34container based applications, which is
  • 38:36pretty much all of Chip.
  • 38:39It makes applications scalable and
  • 38:41fault tolerant.
  • 38:42It allows,
  • 38:44applications to add more compute
  • 38:46power or take some away
  • 38:48depending on what's needed.
  • 38:52And many of the other components you'll see in more detail in upcoming slides.
  • 38:59And then Camino up here is, of course, a key part of Chip, and I'm gonna tell you a lot more about that shortly.
  • 39:06Gonna take a quick look
  • 39:07at our data assets over
  • 39:09here.
  • 39:10We do have genomic data in Chip in the form of VCF data from ACTX patients and the Generations project.
  • 39:24As Rich has described, we have the EHR in OMOP format.
  • 39:30We have real-time EHR data: we have some HL7 feeds that are updating things like ADT information and lab results. We can actually have that stuff available faster than it's available in OMOP, depending on the application.
  • 39:48We have high-speed bed monitor and vent data. We've stopped counting the number of billions of data points per month that we're ingesting, but we're getting bed monitors, vents, anesthesia data in real time into Chip.
  • 40:04There's clinical data; I won't go into too much detail about a lot of these things. I will mention that there is BYO capability: if you need compute power but you have a dataset of your own that you want to work on, it's possible to bring your own data to Chip.
  • 40:21And then, of course, we
  • 40:21have LLMs, which is really
  • 40:23the primary reason we're all
  • 40:24here this afternoon.
  • 40:32In Hitchhiker's Guide to the
  • 40:33Galaxy, Douglas Adams said,
  • 40:36space is big.
  • 40:38You just won't believe how
  • 40:39hugely mind bogglingly big it
  • 40:41is. I mean, you might
  • 40:42think it's a long way
  • 40:43down the road to the
  • 40:44chemist,
  • 40:45but that's just peanuts to
  • 40:47space.
  • 40:48Now we have not made
  • 40:49our environment quite as big
  • 40:51as space, but we've made
  • 40:52it,
  • 40:53pretty large, and it's continuing
  • 40:55to grow as our user
  • 40:57requirements and application requirements do
  • 40:59too.
  • 41:01Just a few of these numbers. We have over fourteen hundred CPU cores available.
  • 41:07About half of them are
  • 41:08currently in use.
  • 41:10Nearly thirteen terabytes of memory.
  • 41:14In terms of storage, we've got two tiers of fast storage that total four hundred and sixteen terabytes.
  • 41:24We have a storage grid
  • 41:25for colder storage that contains
  • 41:27seven hundred terabytes.
  • 41:29And there's a storage management application called Komprise that handles the transfer of data between hot and cold.
  • 41:38GPUs, which, of course, everyone's very interested in: we have sixteen Tesla GPUs altogether, two Tesla cards of eight each. And currently we have eight NVIDIA A100s and eight NVIDIA H100s.
  • 41:57And I think that number is expected to grow as we progress.
  • 42:07I just wanted to look
  • 42:08briefly at our major chip
  • 42:10data sources. You know, the
  • 42:11the first thing they tell
  • 42:12you when you go to
  • 42:14make a PowerPoint presentation is
  • 42:15don't stand in front of
  • 42:16a room and read your
  • 42:17slide.
  • 42:18But now I'm gonna stand
  • 42:19in front of the room
  • 42:20and read my slide.
  • 42:23We've already mentioned EHR data is in the OMOP common data model. We'll get into that in a little bit of detail.
  • 42:30Real-time HL7 data includes things like ADT, lab results, flowsheets, and Data Innovations orders and results. If you're interested in laboratory data and instrument data, some of the results and orders going to and from Data Innovations contain things that are not in the Beaker lab system or even in the Epic EHR.
  • 42:56We have clinical images that are not generally, they're not all stored in Chip, but they can be pulled on demand using our Camino data request process. So we have access to all of the clinical images that are in the vendor neutral archive.
  • 43:16CT scans, ultrasounds, X-rays, ophthalmological data, ECGs, you know, anything that's a clinical image that was done in the health system is in the VNA and can be pulled via our extraction process.
  • 43:33And as I mentioned before, we do have genomic data. We have thousands of clinical and research VCF files from patients who have undergone ACTX testing, and Generations patients. And for Generations patients, we have full exome data as well. So that's another thing that's available to researchers.
  • 43:58Right. So here we are
  • 43:59at the OMOP common data
  • 44:00model again.
  • 44:02Don't wanna dwell on this
  • 44:03too much because Rich did
  • 44:04a really good job of
  • 44:05describing it. But,
  • 44:08I just wanna, emphasize that
  • 44:10common data models are really
  • 44:11the key to collaborative research.
  • 44:14Two different health systems may
  • 44:15have very different and largely
  • 44:17incompatible
  • 44:18EHRs, but the data in
  • 44:20their common data models,
  • 44:22should be at least mostly
  • 44:23compatible with each other.
  • 44:25And OMOP, maintained by OHDSI, I forget what the acronym expands to, but there's the URL for it.
  • 44:37That's the preeminent common data model these days. And many of our researchers have already used OMOP data extracts to do collaborative research with other institutions.
  • 44:52OMOP does not contain everything that's in the EHR, but its fairly simple data model covers eighty to ninety percent of what's in Epic.
  • 45:01And as Rich already mentioned, there's a gradual reduction in complexity as you get further away from the Epic Chronicles database. Epic has an ETL to Clarity, which has more than eighteen thousand tables, I guess close to twenty thousand.
  • 45:18That in turn is extracted to Caboodle, which has something more like five hundred tables.
  • 45:24And when you get to OMOP, it actually has a relatively small number, like a few dozen. It cuts down on data complexity without leaving out significant amounts of the data itself.
  • 45:38That means queries are a lot easier to construct compared to Clarity and its eighteen thousand tables. You still have to understand your data, but the process is a lot simpler.
  • 45:49Our OMOP database is pseudo-relational, meaning it can behave like a SQL database, and you can actually construct queries in pure SQL, including joins, windows, and things like that, or you can use pure Spark syntax.
  • 46:06Our computing environment can let you do that either way, depending on what you prefer.
  • 46:12Either way, you can run queries on OMOP data many, many times faster than you could run queries on the equivalent data in a SQL server.
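To make the pure-SQL point concrete, here is a hedged sketch of the query shape. In Camino you would run this kind of SELECT through Spark (e.g., `spark.sql()`) against the OMOP tables; here a local SQLite table stands in for OMOP's `condition_occurrence` so the example is self-contained, and the concept IDs are illustrative assumptions.

```python
import sqlite3

# Local stand-in for OMOP's condition_occurrence table, just to show the
# query shape; in Camino the same SELECT would run through Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE condition_occurrence (
        person_id INTEGER,
        condition_concept_id INTEGER,
        condition_start_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
    [
        (1, 201826, "2023-01-15"),  # 201826: an illustrative concept ID
        (2, 201826, "2023-02-01"),
        (3, 316866, "2023-03-10"),  # a different illustrative condition
    ],
)

# Plain SQL with standard clauses: the same statement shape works in Spark.
rows = conn.execute("""
    SELECT person_id, condition_start_date
    FROM condition_occurrence
    WHERE condition_concept_id = 201826
    ORDER BY person_id
""").fetchall()
```

The point is simply that a few-dozen-table model lets you write queries like this directly, rather than navigating thousands of Clarity tables.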
  • 46:24One of the things I didn't get into, that I didn't even touch on in our big system diagram, was DataRobot.
  • 46:32We have a fully functioning DataRobot installation in Chip.
  • 46:36I've laid out kind of the basic features on my slide, but in my sort of limited experience with DataRobot, it's a really good tool for understanding a dataset even if you're not trying to develop a predictive model.
  • 46:51But if you are, it's a relatively painless way to pull a dataset in and have DataRobot kind of analyze it. It will tell you stuff. It'll basically take care of a lot of your data cleaning and data engineering for you.
  • 47:09If you wanna make a predictive model, it will run many, many competing models against each other and will tell you what the best one seems to be.
  • 47:19You can easily tune hyperparameters.
  • 47:21It's not a black box
  • 47:23because it's very good at
  • 47:24actually showing you
  • 47:25which parameters,
  • 47:27and and which data elements
  • 47:28matter.
  • 47:31It has some of what is called MLOps built into it, machine learning ops, meaning that if you develop a predictive model, you can set up a method for sort of continuous monitoring of the quality of your prediction and sort of evaluating tuning.
  • 47:47And they've also put a lot of effort into generative AI integration, which, I have to say, I don't know exactly how that works, but suffice it to say that DataRobot is highly committed to GenAI as well.
  • 48:01So I recommend, you know, DataRobot if you wanna try, even just play around with, developing a predictive model.
  • 48:10Now we're into Camino,
  • 48:13which is kind of the
  • 48:14core of Chip.
  • 48:16It's the way that most
  • 48:17of us will interact with
  • 48:18the computational health platform.
  • 48:20Here I'm showing you a link to a nice article that Fang Chi Lin and others published last summer.
  • 48:25I recommend reading it. It has a slightly different take on Camino than what I'm presenting here, although I think the information concepts overlap.
  • 48:36And my colleague, Alpa Paselli, will be giving a proper demo of Camino in a bit, so stay tuned for that.
  • 48:47Here is the team architecture.
  • 48:52I think Rich touched on prep to research teams, which are another kind of Camino team. But for most research purposes, you start with an IRB, a Helix data request, and a PI.
  • 49:10If you've got those things established, and your request is submitted and granted, you get a Camino team whose name is built from the IRB, the Helix data request, and your PI.
  • 49:24Every team gets a quota, which allows you a total number of CPUs, GPUs, and memory.
  • 49:33Those quotas are flexible. The quota is assigned depending on the size of the team, the size of the data request, and your expected needs. Quotas can also be changed as necessary.
  • 49:47So teams contain users, and users have environments.
  • 49:56Every user can spin up one or more of what we call environments, which are essentially fully provisioned Linux virtual machines.
  • 50:10User environments
  • 50:12are private. So if you
  • 50:13create an environment,
  • 50:15you are the only one
  • 50:16that accesses the compute power
  • 50:18in that environment.
  • 50:20So your colleague next to
  • 50:21you will have their own
  • 50:22environment,
  • 50:23and they may define a slightly different environment than you. More or less memory, more or less CPUs, maybe with GPUs, maybe without.
  • 50:36Data requests can be added to environments depending on, you know, the nature of your research and your IRB.
  • 50:47Data is shared within the team. So this user, I'm showing as mounting three separate directories of data requests. They can add those to their environment. But another user in that team can also access those data requests.
  • 51:06So data requests are shared
  • 51:08within a team, and then
  • 51:10every team has a shared
  • 51:11data folder that is automatically
  • 51:13part of everyone's
  • 51:15environment.
  • 51:20Let's see.
  • 51:24Oh, yeah. So a little bit more on environments.
  • 51:27An environment is a complete Linux virtual machine.
  • 51:31You can request certain configurations of CPU, GPU, and memory, subject to availability and your team quotas.
  • 51:45When you've defined an environment, what we're looking at here on the left is what an environment actually looks like in Camino.
  • 51:53And there are two items here in the active sessions box. One is a hyperlink. That's the link to the JupyterLab GUI, which allows you access to Jupyter notebooks, things like R.
  • 52:18And those of you who are command line fans can also SSH directly from, say, your VDI session, and you can go straight to the command line of your environment. It's the same thing.
  • 52:32If you're running a Python or any sort of script from the command line, you're really accessing the same machine, the same data, the same directory structure that JupyterLab does.
  • 52:47And I think that's about all I've got to say for that slide.
  • 52:54So I've already said that Camino environments have flexible computing power. They can do a lot of different things depending on circumstances.
  • 53:02And I'm gonna apologize in advance to our R fans in the audience here, but I'm gonna leave R out in this slide and focus on Python and PySpark.
  • 53:13You could have an environment
  • 53:14with two CPUs and eight
  • 53:16gigabytes of memory or sixty
  • 53:18four CPUs and two hundred
  • 53:20and fifty six gigabytes of
  • 53:21memory and with or without
  • 53:23GPUs.
  • 53:26The key thing about PySpark, Python with Spark, is that it creates distributed computing.
  • 53:33So if you're running a PySpark script, which looks very much like Python with SQL statements thrown in, you can distribute very large queries over many executors to immensely speed up your processing.
  • 53:49Even with your environment, you're running in one little virtual machine. But when you're running queries from JupyterLab, you can actually be spinning up many, many executors to speed up your compute task.
  • 54:05We have flexible means of putting in data requests. Now, everything that I'm describing here should be subject to your IRB and your Helix data request.
  • 54:17And the data requests made in Camino have to be approved by an admin.
  • 54:23But, for instance, you have, you know, a few different ways of requesting data.
  • 54:30I mentioned genomics, and genomics data is available on Chip, but it's not part of the sort of GUI-based data request mechanism.
  • 54:43But I'm showing one of the pathways, for instance, for selecting image data.
  • 54:51For instance, you start by picking imaging or OMOP data, or possibly genomic data eventually, once we have that in the GUI.
  • 55:00Let's say that you're interested
  • 55:01in imaging. You can select
  • 55:03your clinical images
  • 55:05based on a set of
  • 55:06medical record numbers, a set
  • 55:08of accession numbers,
  • 55:09a bulk upload of IDs if you have a very large number, or a predefined cohort.
  • 55:15And that's one of the
  • 55:16really,
  • 55:17powerful things we have in
  • 55:18Camino.
  • 55:19We have a number of cohorts defined based on computed phenotypes.
  • 55:26And if there's a cohort phenotype you need that isn't in CHIP, we can easily add it. And cohorts provide a sort of SlicerDicer-style statistics page so you can actually see what, you know, the salient characteristics of your cohort are.
  • 55:48We've mentioned VDI.
  • 55:51VDI complements Camino environments nicely, you know, VDI being a virtual Windows desktop.
  • 56:00And I should emphasize, it's not Chip, but it's Chip-adjacent. So it works with Chip, but our group doesn't maintain it. This is something that desktop engineering maintains.
  • 56:11All researchers
  • 56:12get access
  • 56:13to the Yale Research VDI.
  • 56:16And so VDI, as I said, and Camino environments complement each other. Camino environments have a lot of raw compute power, but they don't have a lot of nice GUI tools.
  • 56:29Conversely, VDI with Windows doesn't have the data processing power of Camino and PySpark, but it has apps that are good for statistics and presentation on finished datasets, like Stata, like SAS; even RStudio is available.
  • 56:50So the other key thing is that, if you are a Camino user and a VDI user, the team shared directory right here can be accessed by your team members from VDI.
  • 57:07So let's say you've done a great deal of data crunching on some multimodal thing, and you've got a certain number of images, or you've produced a dataset derived from OMOP using compute power in Camino. Any team member can drop that information into your shared folder in Camino.
  • 57:31And then in VDI, you can bring up that folder as if it's a Windows shared folder and then operate on it in, you know, the application of your choice.
  • 57:42So you get that flexibility
  • 57:44of raw compute power and,
  • 57:46you know, sort of nice
  • 57:47GUI tools and presentation
  • 57:49through the sharing of the
  • 57:50team directory.
  • 57:55 Now let's get on to large language models.
  • 58:01 Many of you have probably already encountered this with secure computing at YNHH: you can get your own account with OpenAI, but you can't necessarily use it from within the hospital network, and you certainly can't access arbitrary cloud-based large language models and computing resources with patient data.
  • 58:26 So we have developed a pretty significant library of large language models within Chip.
  • 58:34 Here's the list of what we've got right now, and we are adding to it on a regular basis.
  • 58:39 So if you have an application that requires a particular LLM that is not here, you can ask us. There's a little bit of security review, but we'll be happy to include it in our library of large language models.
  • 59:03 So one method of accessing large language models in Camino is through a Camino environment with a dedicated GPU.
  • 59:14 If you reserve GPUs in Camino (there's a formal GPU request process), you can have GPUs allocated to you for a certain period of time.
  • 59:27 The obvious advantage is flexibility: you can say, well, I've got my GPU; I'm gonna operate on my dataset with two or three different LLMs over the course of a few days and see how they differ.
  • 59:50 You get maximum compute power, because while you have the GPU, it is yours and yours alone.
  • 59:56 It's really good for multimodal studies: we know of at least a couple of research groups doing studies involving OMOP data, clinical image data, and possibly other things, and it's very useful to have the GPU available for that.
  • 01:00:13 Disadvantages are kind of what you'd expect. There's a higher cost: once you reserve a GPU and that reservation is accepted, the meter is running, and you're responsible for paying for that resource.
  • 01:00:26 It's also a resource bottleneck. If you have a GPU reserved and you've actually spun up an environment with four H100s, those H100s are not available to anyone else. They are not a shared resource; they are tied to an environment, and they cannot be used by anybody else until your environment has stopped.
  • 01:00:48 Also, doing things this way requires more programming expertise. For simpler use cases, that might not be desirable: you may simply want to exchange prompts and prompt responses with an LLM, in the way that we interact with OpenAI and ChatGPT.
  • 01:01:16 We have a sample notebook and project for anybody who wants to try using a dedicated GPU.
  • 01:01:25 Vincent Zhang of Hua's team wrote a Jupyter notebook that does a demo of simple inference and classification.
  • 01:01:36 And I took that notebook and adapted it into a fully self-contained repo in our GitHub Enterprise installation.
  • 01:01:46 If you have an environment with at least one GPU, you can run this Jupyter notebook and then tailor it to your requirements. You can customize your prompts, change the data that's being fed in, and see how the LLM behaves. So that may be a very useful thing if you're looking to get started.
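For readers who want the shape of that workflow, here is a minimal sketch. The `generate` function below is a stand-in stub (the actual notebook loads a model from the Chip LLM library onto the reserved GPU, e.g. via Hugging Face transformers), and the notes and prompt template are made up for illustration:

```python
# Sketch of a prompt-based inference/classification loop like the one in
# the sample notebook described above. `generate` is a stand-in stub:
# in the real notebook it would call a model from the Chip LLM library
# loaded onto the reserved GPU (e.g., via Hugging Face transformers).

def generate(prompt: str) -> str:
    """Stand-in for LLM inference; swap in a real model call."""
    # Trivial keyword heuristic so the sketch runs end to end.
    return "positive" if "improved" in prompt else "negative"

# Made-up input notes; in practice this is the data being fed in.
notes = [
    "Patient reports symptoms improved after treatment.",
    "Condition worsened; readmitted within 30 days.",
]

# The prompt template is the part you customize.
template = "Classify this note as positive or negative: {note}"

labels = [generate(template.format(note=n)) for n in notes]
print(labels)  # -> ['positive', 'negative']
```

Swapping the stub for a real model call leaves the surrounding loop unchanged, which is why the notebook is easy to tailor.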
  • 01:02:09 The more efficient, or more effective, way to get at LLMs and GPUs is via software as a service, which I know Vincent and other colleagues are gonna be talking about in a lot more detail this afternoon.
  • 01:02:25 Just from the diagram, you can see how it adds flexibility: with one container holding a number of GPUs, multiple teams and multiple users can be sending queries to it at the same time.
  • 01:02:41 Containers like the Kiwi system can queue requests and queue results, so that if you are issuing a query, you don't get a busy signal when there's a lot going on; you may just have to wait a little bit.
  • 01:02:59 It creates some obvious improvements in efficiency: with many people querying a GPU, the GPU can be running continuously, rather than in a stop-start way, with somebody holding a dedicated GPU, running a couple of things, then stopping and waiting a few hours or a few days while nobody else can use it.
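The queueing idea can be illustrated with a toy producer-consumer simulation in plain Python; nothing here is Kiwi-specific. A single worker thread plays the role of the shared GPU, and submitted prompts wait in line instead of getting a busy signal:

```python
# Toy simulation of the shared-GPU queueing model described above:
# many users enqueue requests, one worker (the "GPU") serves them
# continuously, and every request eventually gets an answer.
import queue
import threading

requests = queue.Queue()
results = {}

def gpu_worker():
    # The shared "GPU" runs continuously, draining the queue.
    while True:
        user, prompt = requests.get()
        if user is None:  # shutdown sentinel
            break
        results[user] = f"response to: {prompt}"
        requests.task_done()

worker = threading.Thread(target=gpu_worker)
worker.start()

# Several "users" submit queries at the same time; none is refused.
for user in ("team-a", "team-b", "team-c"):
    requests.put((user, f"{user} prompt"))

requests.put((None, None))  # tell the worker to stop
worker.join()
print(results)
```

The point of the design is the same as in the Kiwi diagram: the worker never idles while requests are pending, and no submitter is turned away.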
  • 01:03:20 So we've just begun to roll out the Kiwi containerized application in Chip.
  • 01:03:28 I think we're gonna be doing some beta testing of it in the test environment, and soon we're rolling it out to production.
  • 01:03:35 And then I believe we have other SaaS versions of GPUs and LLMs that we'll be rolling out in Chip as well, to maximize people's access to those resources.
  • 01:03:51 And I think that is everything I have to say about Chip and SAFE.
  • 01:03:58 So thank you for bearing with me, and I'm happy to take any questions for a couple of minutes if anyone has any.
  • 01:04:12Thanks.
  • 01:04:20 We have more time, so I'll hand it over to our data science software engineer.
  • 01:04:42 Yeah. Good afternoon, everyone. I'm Al Pacelle.
  • 01:04:46 I'm probably the newest member of the team, I think: almost a year with Chip.
  • 01:04:55 Very exciting for me. I came from UnitedHealthcare after twenty-seven years, and the data science work has been really exciting to get into and start to work with.
  • 01:05:09 So I wanted to start off with a couple of good reasons to use Chip.
  • 01:05:16 Kind of a resounding message that everybody has spoken about so far has been the availability of data.
  • 01:05:25 I think Chip has probably one of the best stores of data available: three point one million individuals who have been to the hospital or in the system since twenty twelve, twenty thirteen.
  • 01:05:45 All that data is HIPAA compliant. So we have pretty much everything you could possibly want.
  • 01:05:53 It's data from Epic; I think we mentioned the path, basically, from Epic all the way to Caboodle.
  • 01:06:05 And we also have imaging data in the vendor neutral archive, so you can get images, and data off those images, as well.
  • 01:06:20 Nate mentioned that we have a bunch of preloaded large language models, and he showed that information as well. I was gonna demo that, but I'll hit the actual repository so you can see them live.
  • 01:06:39 And it's a high-speed compute environment. I think Nate's numbers are probably better than mine, which are from a couple of years ago.
  • 01:06:50 But, you know, over seventeen hundred CPUs, twenty-eight GPUs, a petabyte of storage and change. So all super good.
  • 01:07:04 So, Camino. Camino is a curated data broker. It's a front end to Chip.
  • 01:07:11 Users get their own custom compute environment. Like we mentioned, it's Linux.
  • 01:07:19 We also provide Jupyter Notebooks as an interface for coding, so you can code in Python 3, PySpark, PyTorch, and Nate's favorite, R.
  • 01:07:34 Camino comes preloaded with Python packages. We have a pretty good number of common packages, as well as our analytics packages.
  • 01:07:46 There's also the ability to add additional packages. So if you have questions about whether your favorite package is available or not, I will show you in a few minutes how to ask me to add more packages.
  • 01:08:00 And the same goes for LLMs. We have a bunch of preloaded LLMs; I think we have a pretty nice cross section of them.
  • 01:08:11 But if you find something you're interested in that you really want, you can put in a request and talk to me, and I'll be happy to see if we can get that loaded for you.
  • 01:08:24 Also, there's more information in the Camino Chip user group. Sorry, that's not a website: in the Yale University Microsoft Teams instance, we have a Camino Chip user group.
  • 01:08:40 And in that Camino Chip user group, there's a welcome packet. It has all that information: the analytics packages, the LLMs, etcetera.
  • 01:08:50 And just a little fun fact: this picture here up on the wall came from our data center.
  • 01:08:59 I'm not sure which closet has the LLMs in it (sorry, the GPUs in it), but they're right there, literally. Yeah. With all the heat coming off of it.
  • 01:09:17Alright.
  • 01:09:19 So I'm gonna show you a few things right now.
  • 01:09:21 First, how to reserve a GPU. Nate mentioned that we do have GPUs available and that you can request one, and I will show you how to do that reservation.
  • 01:09:35 I'll also show you how to request additional analytics packages and models, if you have an interest in loading up additional models.
  • 01:09:46 I'll show you how to create an environment, and I'll show you an actual Jupyter notebook. I'm not gonna run through it live; I've already prerun it, and I will show you the results.
  • 01:09:58 So, without further ado.
  • 01:10:24 And this is actually the OHDSI ("Odyssey") website that Nate was discussing earlier.
  • 01:10:36Alright. So this is Camino.
  • 01:10:38Sure.
  • 01:10:44Screen's not sharing. Gotcha. Yeah.
  • 01:10:51Yep.
  • 01:10:56System share.
  • 01:11:01One sharing, one stop.
  • 01:11:15Yeah.
  • 01:11:30Alright. Thanks, Juan.
  • 01:11:33Alright. Sorry about that, folks.
  • 01:11:38 Alright. So the first thing I'm gonna show you is how to request a GPU.
  • 01:11:44 On everybody's account, there's a drop down that shows GPU reservation in the list.
  • 01:11:55 And when you go to create a GPU reservation, you just click on New GPU Reservation.
  • 01:12:02 It'll prefill with your team name and your username.
  • 01:12:06 It'll also give you a drop down listing all the available GPU models that we have, as well as the sizes: two GPU, four GPU, eight GPU.
  • 01:12:19 Everybody seems to like the H100, so I'll do two H100s.
  • 01:12:29 You can choose how long you wanna reserve them. I'll give you advance warning: typically, if you get a request in today, I may actually grant it to you today.
  • 01:12:42 The day I grant it, I usually give you the rest of the day for free, because I don't always know whether you put the request in at, like, eight AM, or whether I approved it at noon or twelve thirty or one o'clock, and I don't wanna shortchange you by half a day. So you get the rest of that day for free, and then you get the next day completely.
  • 01:13:06 So I'm gonna put it in for one day.
  • 01:13:10Whoops.
  • 01:13:11Give me that.
  • 01:13:14 COA is the chart of accounts field. We're planning at some point to start billing for these, so if you have a chart of accounts available for your grant, you can put your COA number in here. That way I don't have to contact you to find out what your COA is before I approve the request.
  • 01:13:35 And then put something in for what the purpose of the GPU is, to help us kinda understand what you're doing with it.
  • 01:13:56K. Obviously, I can't type.
  • 01:13:59 Alright. When you submit it, you'll see it show as pending. And then when somebody goes and approves it, it will change to approved.
  • 01:14:10 Usually, if I can't approve it, I'll reach out and find out what you'd like to do. Typically, the only reason I don't approve a request is that I don't have a GPU available, because other people are already using them.
  • 01:14:23 So if I reach out to you, it's probably gonna be: can I do this for you in four or five days, or is it okay if we do this next week?
  • 01:14:33 But for the most part, we've been pretty good about sharing the GPUs, so they've been going pretty fast.
  • 01:14:42 Alright. I mentioned I also wanted to show you how to ask for additional models if you want a different one, or if you have any analytics packages you wanna install. There is a "report an enhancement" link on the left-hand side here.
  • 01:15:07 You may have to log in with your YNHH credentials first, but it will take you to a form.
  • 01:15:15 It's a pretty simple form. It'll prefill with all your information; all you need to do is put in a title and a description.
  • 01:15:23 For the title, you can just put "requesting new large language model," and in the description, tell me what you'd like, where it comes from, and how we can get it.
  • 01:15:36 We are heavily invested in Hugging Face, so we have a lot of opportunity to grab Hugging Face models if you use Hugging Face. Those are pretty much approved for use, so we can pull them from Hugging Face.
  • 01:15:56 For the analytics packages: basically, anything you can pip install, let me know.
  • 01:16:01 I try to do a juggling act of whether we want to install something we already have a similar analytics package for. So if there's something out there that's special for you, it helps me to understand why it's different from something else that's out there. You know, why seaborn when we could use matplotlib? We actually have both of those. But if you let me know what you're interested in and why, that'll help me kinda make a decision to move forward with it or not.
  • 01:16:46 Alright. I'm gonna show you a little bit about the environments. So, as they say: don't do live demos.
  • 01:16:54 Oh, a question? [Audience asks about the rates and how much it would cost.]
  • 01:17:01 I'll just say it's very competitive with Hugging Face. Hugging Face charges about a hundred and twenty dollars a day; ours are about a hundred and twenty dollars for seven days. So it's pretty competitive.
  • 01:17:16 If I recall correctly, the H100s are forty cents per hour of reservation, the A100s are thirty cents per hour, and the V100s haven't been charged for yet.
  • 01:17:32 So you've got until the end of May, while we're in the trial period.
  • 01:17:53Yeah.
  • 01:18:03 Yeah, it's very competitive compared to other providers like Hugging Face.
  • 01:18:11 I'm gonna show you how to create a new environment. Alright, so I'm just gonna create an environment here.
  • 01:18:20 I think Rich mentioned we're kind of helix-number oriented, so a lot of my names tend to have helix numbers in them. I'm gonna just make this one my favorite number.
  • 01:18:56 When you are selecting an environment, you need to make sure that your image matches whatever environment you're gonna build.
  • 01:19:05 In this case, I'm gonna build an H100 environment, so I need to pick the GPU-and-PyTorch image that supports A100 and H100.
  • 01:19:18 The last one here will do either A100 or H100; the one above it, only V100s; and the other one is for just Python and PySpark.
  • 01:19:34 So like I said, the size of the environment, the GPUs and the CPUs you're gonna use, needs to match the image you're using. So this will have to be an H100.
  • 01:19:48 I'll keep it at eight. Let's see. Okay, so two H100s should do it.
  • 01:20:05 Okay. And when you create an environment, you'll have an opportunity to put a data request in. So I'm gonna put in a couple of different data requests.
  • 01:20:19 The first data request I put in is gonna be this large language model access. When you request use of a GPU, I will automatically give you LLM access to go with it, so you can access the predownloaded large language models.
  • 01:20:38 Additionally, I'm gonna throw in some data here. I have a limited dataset folder, which I'll throw in there as well.
  • 01:20:58 Now for the tricky part. If I start this up, I'm actually starting up a two-GPU environment. In theory, it will start up. In practice, right now I think we have a lot of people using H100s, so I may not actually get the H100 started; it may throw an error for me. But I did wanna at least show it.
  • 01:21:24 H100s do take a little bit of time to spin up. The regular virtual CPUs spin up pretty quickly; they take about a minute or two.
  • 01:21:37 The GPUs (the H100s, the A100s) take a little more time to spin up, sometimes four or five minutes. But while we're waiting for that spin-up, we'll do something else.
  • 01:21:53 So while that's going, we'll go to an environment I already have running.
  • 01:22:09 This environment is a two-CPU, eight-gigabytes-of-RAM environment. No GPUs on this one; this is just plain PySpark.
  • 01:22:26 And clicking on this link will bring up JupyterLab.
  • 01:22:32 This JupyterLab is actually one that I set up a while back. I did some SQL testing in here just for fun, just to get used to the environment.
  • 01:22:48 So you can see here I have data. This is the data that Rich was talking about; these are literally the parquet files.
  • 01:22:57 In here are parquet files for each and every grouping of data. This is like the concept tables.
  • 01:23:15 Almost everybody deals with the person tables. These parquet files are not a one-to-one relationship: you won't have three million files if you're looking for everybody. Each parquet file can contain thousands of records.
  • 01:23:29 So the fun part is reading them all together, which I've already done for us, because I didn't wanna spend too much time on this.
  • 01:23:40 You can see here, this is the code that reads in the data. Alright?
  • 01:23:48 Every now and then, you get little warnings from Jupyter because of things that come up. Usually the warnings you can ignore, but errors you have to fix.
  • 01:24:02 Right? So this is where I read in the OMOP files. You can see here, I actually read in the OMOP files and got output.
  • 01:24:13 And this is only the first twenty-five lines; there are about two thousand, if I recall correctly, in this file.
  • 01:24:21 So, basically, I read in information about a small cohort, and then I split it up to do some light analytics: what portion were males versus females. So it's a graph of male versus female.
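In Camino this step is PySpark (roughly, `spark.read.parquet` over the person file directory, then a group-by on gender), but the shape of the operation can be shown in plain Python. The records and gender codes below are fabricated for illustration:

```python
# Plain-Python stand-in for the PySpark step described above: several
# parquet files each hold a chunk of person rows; you read them all
# together, then aggregate. In Camino this would be roughly
# spark.read.parquet(<person dir>) followed by df.groupBy("gender").count().
from collections import Counter

# Pretend each list is the contents of one parquet file (fabricated rows).
file_1 = [{"person_id": 1, "gender": "M"}, {"person_id": 2, "gender": "F"}]
file_2 = [{"person_id": 3, "gender": "F"}, {"person_id": 4, "gender": "F"}]

# Reading the files "all together" into one dataset.
persons = file_1 + file_2

# Light analytics: male versus female counts, ready to graph.
counts = Counter(row["gender"] for row in persons)
print(counts["M"], counts["F"])  # -> 1 3
```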
  • 01:24:45 I also wrote the data out into my team directory. So the actual data is here; actually, we have it open.
  • 01:25:04 So these were the results: people who have chronic myeloid leukemia. And I have it split up based on what their condition source code was. These source codes are ICD-9 codes.
  • 01:25:29 This is the SQL that ran. You can see from the SQL that, basically, you're looking at persons who have a specific set of source condition codes, whose birth date is before March first, two thousand five, and who were seen in the date range between December first, twenty twenty-two, and January first, twenty twenty-three.
  • 01:25:58 So when I say two thousand, that's two thousand people just in that one-month time frame.
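The query he describes has roughly this shape. The sketch below runs against a tiny in-memory SQLite table with fabricated rows and illustrative ICD-9 codes, since the real OMOP tables and exact code list aren't shown; in Camino the equivalent SQL runs through Spark SQL over the parquet-backed tables:

```python
# Sketch of the cohort query described above, run against a tiny
# in-memory SQLite table. The table layout, ICD-9 codes, and rows are
# fabricated; in Camino the equivalent SQL runs via Spark SQL over the
# OMOP parquet tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE condition_occurrence (
        person_id INTEGER,
        condition_source_value TEXT,  -- ICD-9 code
        birth_date TEXT,
        visit_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (1, "205.10", "1960-05-01", "2022-12-15"),  # in cohort
        (2, "205.10", "2010-01-01", "2022-12-20"),  # born after cutoff
        (3, "250.00", "1955-03-03", "2022-12-10"),  # wrong code
        (4, "205.11", "1948-07-09", "2023-06-01"),  # outside date range
    ],
)

# Persons with the target condition codes, born before 2005-03-01,
# seen between 2022-12-01 and 2023-01-01.
rows = conn.execute("""
    SELECT DISTINCT person_id
    FROM condition_occurrence
    WHERE condition_source_value IN ('205.10', '205.11')
      AND birth_date < '2005-03-01'
      AND visit_date >= '2022-12-01'
      AND visit_date < '2023-01-01'
""").fetchall()
print(rows)  # -> [(1,)]
```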
  • 01:26:06 So let's see if that other environment started.
  • 01:26:19 Yeah, it's still pending. I'm not gonna wait for it to start, but I did wanna mention: when it does start, it's a good day. When it doesn't start, you'll get an error message that shows you why.
  • 01:26:36 So, with that, are there any questions?
  • 01:26:42 Is there a way that you can delete your environment?
  • 01:26:49 Yeah. So, we're working on the delete functionality, so we can just get rid of them. But basically, if you don't start the environment, or if you shut it down, it's not taking up any resources, so it's not a major concern.
  • 01:27:06 I do ask people to shut down environments when they're done because, like everything else on this planet, the resources are finite. If we shut down environments we're not using, there will be plenty of resources for other people to share.
  • 01:27:22 I was just wondering, when do you create the team: before the environment?
  • 01:27:29Yeah. So good question. So
  • 01:27:32so you can see here,
  • 01:27:34right now, I'm a member
  • 01:27:35of chip admin.
  • 01:27:37If you don't have more
  • 01:27:38if you have when you
  • 01:27:39request the team
  • 01:27:41through the process of that,
  • 01:27:45that Rich showed earlier where
  • 01:27:47you put in a a
  • 01:27:48helix request. So you first,
  • 01:27:49you request RBA.
  • 01:27:52K. Once you have researcher
  • 01:27:53basic access, you'll have a
  • 01:27:55YNHH ID,
  • 01:27:57then you can request a
  • 01:27:58team. When you request the
  • 01:28:00team
  • 01:28:01through the Helix request, you'll
  • 01:28:02get a team,
  • 01:28:06okay,
  • 01:28:07which will contain your IRB
  • 01:28:09number
  • 01:28:10or p two r, if
  • 01:28:11you're a p two r
  • 01:28:11team,
  • 01:28:13the helix number, and the
  • 01:28:15PI's last name.
  • 01:28:19When when you get the
  • 01:28:20team,
  • 01:28:22if you only have one
  • 01:28:23team, you won't see this
  • 01:28:24team drop down
  • 01:28:26because
  • 01:28:27you only show it once
  • 01:28:28you have multiple teams.
  • 01:28:32You'll then be able
  • 01:28:33to have a project.
  • 01:28:35Typically, the
  • 01:28:37JADAT RIO
  • 01:28:39team will put
  • 01:28:41an actual
  • 01:28:42project out here for you.
  • 01:28:44And under the project is
  • 01:28:46where
  • 01:28:47that data information gets loaded.
  • 01:28:49So,
  • 01:28:53we do have self-service capability,
  • 01:28:55but it does need to
  • 01:28:56be approved. So
  • 01:28:58but right now, we're asking
  • 01:29:00people to work with the
  • 01:29:01JADAT team
  • 01:29:02to make sure that they
  • 01:29:03have their data,
  • 01:29:04their their cohort
  • 01:29:06well identified before they go
  • 01:29:08through this.
  • 01:29:11And then once your
  • 01:29:12data is here,
  • 01:29:14then under environments, when you
  • 01:29:16create an environment,
  • 01:29:18you'll actually be able to
  • 01:29:19load that data into that
  • 01:29:20environment.
  • 01:29:21That answer your question?
  • 01:29:24Yeah. It's on a solution
  • 01:29:25where to.
  • 01:29:28I can.
  • 01:29:56Alright. That yeah. You need
  • 01:29:57to have a YNHH account
  • 01:29:59to get here.
  • 01:30:00This is
  • 01:30:01actually, is this
  • 01:30:05one of
  • 01:30:06the links on yours? Yeah.
  • 01:30:08Rich's portion of the
  • 01:30:10presentation had the link to
  • 01:30:12Helix,
  • 01:30:12but this is the actual
  • 01:30:14live website.
  • 01:30:18And, yeah,
  • 01:30:25it's pretty straightforward
  • 01:30:27once you start entering the
  • 01:30:28data in there.
  • 01:30:30Question?
  • 01:30:45Yeah, it does. I just
  • 01:30:52didn't even think of it.
  • 01:30:53You get a gold
  • 01:30:55star.
  • 01:30:57Yes.
  • 01:31:03Great question. So if you
  • 01:31:04have custom data, I'm
  • 01:31:05gonna let Rich field this
  • 01:31:07one.
  • 01:31:08Actually, I'll field it
  • 01:31:10myself, and then
  • 01:31:11you can tell me where
  • 01:31:12I'm wrong.
  • 01:31:13If you have custom data,
  • 01:31:15you'll be working with someone
  • 01:31:16on the JADAT team.
  • 01:31:18Somebody from Rich's team will
  • 01:31:19work with you
  • 01:31:20to load that data
  • 01:31:23into
  • 01:31:24Camino in the CHP. That's
  • 01:31:26probably the best answer. There
  • 01:31:27are some things going on,
  • 01:31:28as you're aware, to
  • 01:31:30map directories in the DDI,
  • 01:31:32the CHP.
  • 01:31:33That's not a single type
  • 01:31:34of process. So for some
  • 01:31:35time use, I would remove
  • 01:31:37the new point.
  • 01:31:41Right. Any other questions?
  • 01:31:45Alright. Thank you.