“Competency Assessment - Are Your Measures Reliable?”
March 26, 2026
Joseph Donroe MD, MPH, MHS - Yale School of Medicine
March 19, 2026
Yale GIM “Research in Progress” Meeting Presented by: Yale School of Medicine’s Department of Internal Medicine, Section of General Internal Medicine
Information
- ID
- 14007
Transcript
- 00:00Yeah.
- 00:04Alright. Well, welcome everybody.
- 00:07Thank you for coming to,
- 00:09noon conference, general medicine.
- 00:12Today's,
- 00:14CME code is five
- 00:16five nine zero one.
- 00:21Okay. So upcoming,
- 00:23retreats. Next one will be
- 00:25the education retreat on the
- 00:27West Campus.
- 00:28Stay tuned.
- 00:30Important
- 00:31but,
- 00:33familiar.
- 00:34FDAQ reminders, please be
- 00:35on the lookout for your
- 00:37next steps, which
- 00:38most likely will
- 00:40be including meeting with your
- 00:42mentors
- 00:43or your delegates.
- 00:48Research
- 00:48in progress,
- 00:51I guess the next one,
- 00:52March twenty sixth will be
- 00:53Nate Wood.
- 00:55No. That's not okay. Got
- 00:56it. Sorry.
- 00:57Grand Rounds, seven thirty AM
- 00:59with Nate Wood. Food insecurity
- 01:01and culinary medicine,
- 01:03followed by,
- 01:04noon conference,
- 01:06next week. Ben Amba,
- 01:08speaking about the impact of
- 01:10the current landscape
- 01:12on GME
- 01:13nationwide.
- 01:16Disclosure and accreditation.
- 01:20So I'm excited to introduce,
- 01:23Joe Donroe, who is going
- 01:25to be joining us. So
- 01:26the original plan for today
- 01:27was to have two thirty
- 01:29minute time slots. Our second,
- 01:31presenter is unfortunately unavailable at
- 01:33the last minute. So we
- 01:35have one thirty minute time
- 01:36slot, but we're really glad
- 01:37to have,
- 01:38Joe joining us. Joe started
- 01:41his training career at Tufts
- 01:42where he learned his MD
- 01:44earned his MD and MPH,
- 01:45spent two years
- 01:47in Peru,
- 01:48before heading back north to
- 01:50New Haven,
- 01:51to, complete his med peds,
- 01:53residency and chief resident. And
- 01:55he's here now on faculty
- 01:56where he focuses on
- 01:59teaching clinical skills,
- 02:01clinical education overall, and also
- 02:03taking care of patients with,
- 02:05addiction.
- 02:06He's done a lot of
- 02:07work on,
- 02:09teaching POCUS to our house
- 02:11staff and others. And he
- 02:13is going to be speaking
- 02:14about, more generally competency assessment
- 02:17and are your measures reliable.
- 02:19So thank you.
- 02:22Alright. Thanks for the opportunity
- 02:23to come and talk.
- 02:27It's,
- 02:28it's about competency assessment, but
- 02:30it's it's gonna just closely
- 02:32parallel a study that, that
- 02:34we
- 02:35did here and and recently,
- 02:37completed. So,
- 02:39actually,
- 02:40when I was approached to
- 02:41do this, it it it
- 02:42was a research in in
- 02:43progress.
- 02:45I'm happy to say now
- 02:46that it's,
- 02:48we've been published this week,
- 02:50with a group of
- 02:52authors from from here and
- 02:54and elsewhere.
- 02:55And, I can
- 02:57I can pause for a
- 02:57second to hit that QR
- 02:59code and download it and
- 03:00get the metrics up a
- 03:01little a little bit?
- 03:08I'll go through the the
- 03:09the background and the the
- 03:11methodology
- 03:11of of what we did,
- 03:14briefly,
- 03:15because we only have twenty
- 03:16five minutes or so.
- 03:18I really wanted to try
- 03:19and focus the
- 03:20the conversation
- 03:22around,
- 03:23a form of reliability
- 03:25testing that's called generalizability
- 03:27theory and, decision study,
- 03:31and give a brief overview
- 03:33of that and how we
- 03:34interpreted it in the in
- 03:36the context of our of
- 03:37our study. And my disclaimer
- 03:38is I am I am
- 03:40not an expert in these,
- 03:42this analytic,
- 03:44technique.
- 03:44And it was actually really
- 03:46hard to find, expertise
- 03:48to to move our our
- 03:49project forward, but I'll I'll
- 03:50circle back to that in
- 03:51a in a moment.
- 03:55So some,
- 03:56background information.
- 03:58My interest in this stems
- 03:59from the work I do
- 04:00in, in terms of leading
- 04:02the point of care ultrasound
- 04:04programs for internal medicine,
- 04:07training
- 04:07residents and and training faculty
- 04:10to use this as a
- 04:10tool in, in the clinical
- 04:12environment.
- 04:13Point of care ultrasound
- 04:15is the utilization
- 04:16of ultrasound at the point
- 04:18of care
- 04:19by the treating physician. And
- 04:21so we use it to
- 04:22help diagnose
- 04:23and manage,
- 04:25and in contrast to comprehensive
- 04:27ultrasound,
- 04:27this is used to really
- 04:29address very focused problems.
- 04:35What's
- 04:37what's been lagging
- 04:38as point of care ultrasound becomes
- 04:40more and more popular amongst
- 04:43medical schools, amongst residents, and
- 04:44amongst faculty,
- 04:46The utilization of POCUS
- 04:48is increasing,
- 04:50at a tremendous rate. However,
- 04:51our ability to understand, are
- 04:53people actually competent to use
- 04:55this tool?
- 04:56That is lagging way behind.
- 04:58We don't have very good
- 04:59measures,
- 05:00to be able to do
- 05:01that, especially at that top
- 05:03part of Miller's pyramid,
- 05:05where it's really,
- 05:07competency in action. You know,
- 05:09how is the learner actually
- 05:10performing in the clinical, clinical
- 05:12arena? And so
- 05:17we formed a research question
- 05:19around this this existing gap,
- 05:22and the question became,
- 05:24what is the validity evidence
- 05:25supporting the use of an
- 05:26entrustable professional activity
- 05:29framework
- 05:30to assess point of care
- 05:31ultrasound competency
- 05:33in internal medicine
- 05:35learners.
- 05:37Yeah.
- 05:39What is the state of
- 05:40care at
- 05:42The level of certification of
- 05:44what's required for someone to
- 05:45roll out the POCUS machine
- 05:47in their own practice, I
- 05:48guess? That's the level. Yeah.
- 05:50Yeah. It's,
- 05:51it's it's a little bit
- 05:52of, of the wild west.
- 05:56Right now,
- 05:57most,
- 05:59departments at Yale do not
- 06:00have a a privileging
- 06:02mechanism,
- 06:03for point of care ultrasound.
- 06:04There are a few that
- 06:05do. Emergency medicine,
- 06:07does.
- 06:08Surprisingly, you know, groups like
- 06:10Pulm Crit do not.
- 06:12Internal medicine does not.
- 06:14And so as these are
- 06:15being used more and more,
- 06:17they're being used in the
- 06:18absence of a privileging process.
- 06:20In the absence of a privileging
- 06:22process for the hospital, there's
- 06:23no formal
- 06:25credentialing process either for which
- 06:27to verify,
- 06:28competency.
- 06:29And so it's a lot
- 06:30of, sort of up to
- 06:32the professional to make a
- 06:33decision on whether or not
- 06:34they feel comfortable using that
- 06:37in the in the clinical
- 06:37arena.
- 06:38And as we as we
- 06:40know, clinicians are not always
- 06:41the best self assessors,
- 06:43which,
- 06:44you know, invites a problem,
- 06:46I think. But we're moving
- 06:47in that direction. So, actually,
- 06:48I I chair the committee
- 06:50for establishing a standard, process
- 06:52for privileging across the hospital
- 06:54and the delivery networks.
- 06:58That that committee has been
- 07:00in, together for about five
- 07:01years now,
- 07:03but I think we are
- 07:04close. I I I would
- 07:05expect that we there's probably
- 07:07privileging that that's gonna happen
- 07:08within the next six months
- 07:10or so. Now that I've
- 07:11said that, I've cursed it,
- 07:12but I think we are
- 07:13closer than than we ever
- 07:15have been. So there there
- 07:16should be a credentialing privileging
- 07:17process soon.
- 07:23The methodology
- 07:25for,
- 07:26the study that we that
- 07:27we did. So,
- 07:30we developed an EPA or
- 07:32entrustable professional activity
- 07:34framework
- 07:34and instrument to use. That
- 07:36process was guided by a
- 07:38panel of experts in point
- 07:40of care ultrasound
- 07:41and, medical education,
- 07:44and it followed a very
- 07:44standardized way to create,
- 07:47create an EPA.
- 07:48The tool we created, the
- 07:49instrument we created is online
- 07:52so learners access it on
- 07:53their phones,
- 07:54so it can be used
- 07:55in in real time in
- 07:56the workplace.
- 07:57Then we trained a group
- 07:59of,
- 08:00ultrasound experts to become assessors
- 08:02for us,
- 08:03so that they can do
- 08:04the assessments with our, with
- 08:06our learners
- 08:07at the bedside.
- 08:09And then we evaluated the
- 08:10framework and the and the
- 08:12instrument that we're using for
- 08:13sources of, evidence of validity,
- 08:15reliability, and and feasibility.
- 08:21The EPA that we that
- 08:22we came up with is
- 08:24this, assessing the acutely ill
- 08:26patient using point of care
- 08:27ultrasound,
- 08:28and the scale that we
- 08:30use as our,
- 08:32assessment
- 08:34scale is up
- 08:36there. So
- 08:37with entrustable professional activities,
- 08:40the the key cutoff is
- 08:42where is somebody
- 08:43the level at which somebody
- 08:44can be entrusted to perform
- 08:46the activity
- 08:47by themselves in an unsupervised
- 08:49way. In our on our
- 08:51scale, that is level four,
- 08:52allowed to practice the EPA
- 08:54unsupervised.
- 08:55And between level one to
- 08:57four, there's there's there's a
- 08:59gradation.
- 09:00What's nice about this tool
- 09:02is that at each level,
- 09:03it really directs the feedback
- 09:05that the learner needs to
- 09:07advance to the next step.
- 09:08So it it becomes an
- 09:10important way to to to
- 09:11track competency, but also
- 09:13to,
- 09:15to make sure that the
- 09:16feedback that's given is, is
- 09:18the right feedback for where
- 09:19the learner is on their
- 09:20competency pathway.
- 09:25So
- 09:26skipping some steps because I
- 09:27I just wanted to get
- 09:28to to really what's the
- 09:29focus of of today, which
- 09:31is reliability testing. So one
- 09:34source of validity evidence when
- 09:36we're thinking about,
- 09:38developing a tool is,
- 09:40is reliability. And when we
- 09:41think about reliability, what we're
- 09:43we're asking is are the
- 09:45measures consistent across,
- 09:47different workplace conditions and across
- 09:49different assessors and learners?
- 09:54Another way to think about
- 09:55re reliability testing is how
- 09:57close is the observed score
- 09:59to the true score. Right?
- 10:01How close
- 10:02is my observations
- 10:03of competence?
- 10:05How close is that to
- 10:06the learner's true competence?
- 10:09If you prefer to think
- 10:11about it in terms of a formula,
- 10:12you see the formula on
- 10:13the on the screen there,
- 10:14observed score equals true score
- 10:16plus some some error in
- 10:17our measurement. Right? We can
- 10:19never really get to the
- 10:20true, the true score. There's
- 10:22always some error that we
- 10:23wanna try and understand
- 10:25and minimize.
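The slide itself is not captured in this transcript. In standard classical test theory notation (the symbols below are assumed here, not taken from the slide), the relationship described is usually written with an observed score $X$, a true score $T$, and an error term $E$, and reliability is the share of observed-score variance that comes from true scores:

```latex
X = T + E, \qquad
\text{reliability} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```

The second equality assumes that true score and error are uncorrelated, which is the usual classical test theory assumption.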
- 10:29The classical approach to reliability
- 10:32testing,
- 10:35really looks at,
- 10:36or focuses on one source
- 10:38of errors. So studies are
- 10:39designed to look at things
- 10:40like interrater reliability or intercase
- 10:43reliability or internal consistency alpha
- 10:45sicknesses, the Cronbach alpha that
- 10:47you're probably familiar with.
- 10:52The challenge with that, though,
- 10:53is that in medical education
- 10:55and the assessments that we
- 10:56do, there's there's more than
- 10:58just one source of error
- 11:00that we have to worry
- 11:01about.
- 11:03There's multiple potential sources of
- 11:04error. And so in reality,
- 11:07we have to move from
- 11:09that,
- 11:10that classical
- 11:12formula.
- 11:13And we have to consider,
- 11:15you know, what is the
- 11:16error that we can attribute
- 11:17to
- 11:18the learner?
- 11:19What is the error that
- 11:20we can attribute to the
- 11:21raters? Some raters are more
- 11:22lenient. Some are more strict.
- 11:24Some know the,
- 11:26some know the learner and
- 11:27that influences the scores.
- 11:28We have to,
- 11:29think about error attributed to
- 11:32the clinical case. Are there
- 11:34differences in difficulty between the
- 11:36cases that the that the
- 11:37learners are being assessed on?
- 11:38And all of those factor
- 11:40into
- 11:40that error value.
- 11:42And so in reality, what
- 11:44we really need our formula
- 11:46to look like is this.
- 11:47So our observed score equals
- 11:49the true score plus multiple
- 11:51sources of error. How do
- 11:52we get to evaluating what
- 11:54those sources of of error
- 11:56are and what the relative
- 11:58contributions are to the overall
- 12:00error number.
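The expanded formula on the slide is not captured in the transcript. A common way to write it in generalizability theory, assuming a fully crossed design with learner $p$, rater $r$, and case $c$ (the notation and the crossed design are assumptions here, and the study's actual design may differ), is:

```latex
X_{prc} = \mu + \nu_p + \nu_r + \nu_c + \nu_{pr} + \nu_{pc} + \nu_{rc} + \nu_{prc,e}

\sigma^2(X_{prc}) = \sigma^2_p + \sigma^2_r + \sigma^2_c
                  + \sigma^2_{pr} + \sigma^2_{pc} + \sigma^2_{rc} + \sigma^2_{prc,e}
```

Here $\sigma^2_p$ is the learner variance that the assessment is meant to detect, and the remaining components are the separate sources of error whose relative sizes the analysis estimates.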
- 12:04And that's and we I
- 12:06was stuck there for a
- 12:07long time. We had collected
- 12:09our data,
- 12:11and I was, you know,
- 12:12really trying to move forward.
- 12:14And the problem was there
- 12:16just wasn't the expertise to
- 12:17to run the studies that
- 12:18we needed to run, at
- 12:20least that that I could
- 12:21find after,
- 12:22a lot of a lot
- 12:23of emails and communications around
- 12:25this, trying to find somebody
- 12:27to to run the studies
- 12:28that we needed to do
- 12:29to to get to this
- 12:31multiple sources of of error.
- 12:33And it's a type of
- 12:34of analysis that's called the
- 12:35generalizability
- 12:36theory.
- 12:39Fortunately,
- 12:40two things happen.
- 12:42One,
- 12:43Donna Windish in the department
- 12:45started,
- 12:46started the Department of Medicine
- 12:48educational
- 12:49grant.
- 12:50That came out about the
- 12:51same time as,
- 12:53as I was in the
- 12:53the struggle to to do
- 12:54this analysis.
- 12:56And I was introduced to
- 12:57Haidong Lu, who's who's here
- 13:00today as well. And,
- 13:02with the the funding support,
- 13:03I was able to connect
- 13:05with Haidong, and, and we
- 13:06were able to to plan
- 13:08together and and,
- 13:10he he became my my
- 13:12expert for for getting this
- 13:13done and was really the,
- 13:15the
- 13:16the the key piece to
- 13:18to be able to move
- 13:18this, this forward. So I'm
- 13:21extremely, extremely grateful, both for
- 13:23the educational research grant and
- 13:25for, for Haidong.
- 13:28And so what what he
- 13:29was able to do is
- 13:30this analysis called, generalizability
- 13:34theory.
- 13:35And what this does is
- 13:37it,
- 13:39it tries to,
- 13:41to distill down the various
- 13:43sources of error that could
- 13:45be contributing to our overall
- 13:46reliability
- 13:48and,
- 13:49figure out the the relative
- 13:51contributions of each. So within
- 13:53this,
- 13:53this framework of this analysis,
- 13:56we see that there are
- 13:57effects,
- 13:58otherwise known as as facets.
- 14:00These are the potential sources
- 14:02of error as we're,
- 14:04as we're performing our assessment.
- 14:05So we see things on
- 14:06there like the learner,
- 14:08the rater.
- 14:09The syndrome refers to,
- 14:12within our EPA, students are
- 14:14or learners are evaluating the
- 14:16dyspneic patient, the patient with
- 14:17abdominal distension, the patient with
- 14:19hypotension. So various syndromes that
- 14:21they're, they're evaluating.
- 14:23And there are
- 14:24interactions between these things as
- 14:26well. So there are interactions
- 14:27between the learner and the
- 14:29rater, the learner and the
- 14:29syndrome, the rater and and
- 14:31on and on and
- 14:33on. And the idea is
- 14:34to try and get to
- 14:35how much are each of
- 14:36these contributing to the overall
- 14:39error. And,
- 14:41we call that the percent
- 14:42variance. So if we think
- 14:44about,
- 14:44there is
- 14:46an absolute number that is
- 14:48that error, and within that
- 14:49absolute number, there are contributions
- 14:51from each one of these
- 14:52things. How much does each
- 14:53of these contribute to that
- 14:54error number? And it also
- 14:56gives us a measure of
- 14:58reliability, and we're gonna circle
- 15:00back to this,
- 15:01because one of
- 15:03the the powers of this,
- 15:05assessment technique is it allows
- 15:06us to do what's called
- 15:07the decision study
- 15:09where we can estimate,
- 15:11how many observations or how
- 15:13many raters do we need
- 15:14to achieve a certain level
- 15:15of reliability,
- 15:17which really helps us to
- 15:18optimize our processes
- 15:20of assessment moving forward.
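The percent variance idea can be sketched in a few lines. In generalizability theory the percentages are usually taken against the total variance, learner variance included. The component values below are hypothetical placeholders chosen for illustration; they are not the study's estimates.

```python
def percent_variance(components):
    """Convert variance components into percentages of the total variance."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}


# Hypothetical variance components, for illustration only.
HYPOTHETICAL_COMPONENTS = {
    "learner": 0.30,
    "rater": 0.20,
    "syndrome": 0.05,
    "interactions_and_residual": 0.45,
}

for name, share in percent_variance(HYPOTHETICAL_COMPONENTS).items():
    print(f"{name}: {share:.1f}% of total variance")
```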
- 15:22So we're just gonna take
- 15:22a quick peek at,
- 15:24at each of these and
- 15:25talk briefly about, some of
- 15:26the the the main effects.
- 15:29So we looked at learner
- 15:30variance. And for medical education
- 15:33studies, what you really wanna
- 15:34see is that the learner
- 15:35variance is high. You want
- 15:38the error attributable to differences
- 15:41in the learner,
- 15:42different skill sets, different, degrees
- 15:45of competence.
- 15:47A high
- 15:48learner variance tells you
- 15:50that you are able to
- 15:51accurately
- 15:52discriminate between differences in competency
- 15:55between your your your learners.
- 15:57And one of the things
- 15:58I had to kind of
- 15:59wrap my head around was,
- 16:00well, what is what is
- 16:01high?
- 16:03You know, this number, twenty
- 16:04seven point seven,
- 16:06felt low when it came
- 16:07out. As it turns out,
- 16:08that's actually,
- 16:09quite a robust number for
- 16:10this type of study. And
- 16:12so when we're looking at
- 16:13numbers above twenty five percent,
- 16:16that's actually considered,
- 16:17quite good for,
- 16:19for a medical education reliability
- 16:22study. So we're we're quite
- 16:23pleased with our learner variance.
- 16:29We looked at rater variance.
- 16:31So this is the idea
- 16:32of, can some of that
- 16:34error term or or how
- 16:35much of that error term
- 16:36is attributed to just difference
- 16:37in how the raters are
- 16:38scoring.
- 16:39And that could be, as
- 16:41we know, some some of
- 16:42us are very strict when
- 16:43we evaluate our learners. Some
- 16:45of us are very lenient
- 16:47when we evaluate our learners.
- 16:49There's also the element of
- 16:50we're using EPAs, and and
- 16:52that's that's a newer way
- 16:54of assessment. And so,
- 16:56you know, how well did
- 16:57our raters understand this tool
- 16:59that we're that we're using?
- 17:01We train them. We we
- 17:02would hope that they would
- 17:02understand it well, but,
- 17:04but did they?
- 17:05Ideally, we want this portion
- 17:07of the variance to be
- 17:08quite small. We don't want,
- 17:10the the, a large portion
- 17:12of the error being attributed
- 17:14to the raters. And for
- 17:15us, the number was sixteen
- 17:17point five percent.
- 17:19And,
- 17:20boy, I was happy because
- 17:21that seemed really low. But
- 17:23as it turns out, sixteen
- 17:25point five is it's not
- 17:26high or low. It's right
- 17:27in the middle. I would
- 17:28call it a modest contribution
- 17:30to,
- 17:31to the error value.
- 17:33And what's nice about this
- 17:34is it really points us
- 17:35in a direction to say,
- 17:37you know, where can we
- 17:38improve in our assessment methodology
- 17:40and gives us a target
- 17:42for that, perhaps more training
- 17:43of our of our raters.
- 17:45Carrie, did you have a
- 17:45question?
- 17:46In this data set Yeah.
- 17:49How many
- 17:50raters did each
- 17:51learner have? Yeah. It's a
- 17:53good question. There was a
- 17:54range. There was, six hundred
- 17:56and four assessments that were
- 17:58done by
- 17:59I think our final number
- 18:00was fifteen
- 18:02different
- 18:03raters.
- 18:04And there was variability in
- 18:06terms of
- 18:07how many,
- 18:08how many assessments were done
- 18:10by each rater. I don't
- 18:11have off the top of
- 18:11my head what the average
- 18:12number of
- 18:14assessments per rater was.
- 18:16But the the analysis,
- 18:19factors
- 18:20factors that in. How? I
- 18:22don't I'd have to ask.
- 18:23I don't agree with her
- 18:24to to get into go
- 18:25into the depths with them.
- 18:27Like, if I No. No.
- 18:29Each learner has,
- 18:31has encounters with multiple raters.
- 18:33Yeah. Yeah.
- 18:39And then the last, the
- 18:40last of the effects that
- 18:41I'll I'll highlight is, is
- 18:43case variance.
- 18:44And this is really looking
- 18:45at how much of the
- 18:46variance is due to differences
- 18:48in case
- 18:50variability or case difficulty.
- 18:52And, ideally, you want this
- 18:53to be to be quite
- 18:55low.
- 18:57That number,
- 18:59of one percent looks low
- 19:00and and is low, so
- 19:01we were actually quite happy
- 19:02with,
- 19:03with our our case variance.
- 19:05To be honest, I was
- 19:05I was a bit surprised
- 19:07because there's such a range
- 19:08of different clinical syndromes that
- 19:10the,
- 19:12that the the residents were
- 19:13were seeing. I have some
- 19:14theories around why it might
- 19:16be low,
- 19:17such as,
- 19:18it's really the the the
- 19:20difficulty is in the the
- 19:21use of the ultrasound, not
- 19:23in the approach to the
- 19:24to the patient. The the
- 19:25residents have a certain skill
- 19:26level with the the patients.
- 19:28The new skill is the
- 19:29ultrasound, and so residents of
- 19:30a certain level of competence
- 19:32with ultrasound are gonna score
- 19:34the same regardless of,
- 19:36of the patient that that's
- 19:37in front of them. And
- 19:38that's that's,
- 19:40my assessment of why that
- 19:41number is so low.
- 19:45As I mentioned before, one
- 19:46of the the powerful parts
- 19:48of the generalizability
- 19:50theory analysis is that it
- 19:52can,
- 19:53lead to what's called a
- 19:54decision study. And decision study
- 19:56allows us to predict
- 19:59the
- 20:00the reliability
- 20:01of the assessments
- 20:02for varying levels of effect
- 20:04or or facets. And so
- 20:05in this hypothetical,
- 20:08dataset here,
- 20:10we can say, how much
- 20:12does the reliability
- 20:15estimate change if we keep
- 20:16the number of raters the
- 20:18same,
- 20:19but we increase this is
- 20:21an OSCE, but we increase
- 20:22the number of stations within
- 20:24the OSCE. And we see
- 20:25that by increasing the number
- 20:27of stations, you actually get
- 20:28a a nice jump in
- 20:30your reliability. And our thresholds
- 20:32for reliability
- 20:33here,
- 20:34for most most clinical
- 20:37items, you want a reliability
- 20:38of point seven or point
- 20:39eight. And so by
- 20:40increasing the number of stations,
- 20:41we're able to get the
- 20:43the these authors were able
- 20:44to get the reliability up
- 20:45to over, over point eight.
- 20:48You might ask the questions,
- 20:49well, what happens if we
- 20:50increase the number of raters
- 20:51instead of increasing the number
- 20:52of stations? Can we improve
- 20:53our our reliability that way?
- 20:55And going from two raters
- 20:57to eight raters really didn't
- 20:58make a meaningful impact in
- 21:00reliability. And and so you
- 21:01can take this and you
- 21:02can say, alright. Well, if
- 21:03we're designing an assessment
- 21:04tool and assessment process, really,
- 21:06we wanna put our focus
- 21:08on,
- 21:09the number of observations or
- 21:10the number of stations. And
- 21:11and so that's just an
- 21:12example of sort of how
- 21:14decision study can be can
- 21:15be utilized. Yeah.
- 21:17About that.
- 21:18That would suggest to me
- 21:19that the variability is largely
- 21:21in a rater than across
- 21:23raters.
- 21:24Is that correct on you?
- 21:29In fact, it doesn't I
- 21:31mean, you would think it
- 21:32it there's a lot of
- 21:33variability among raters. Yeah.
- 21:35Some are really conservative. Right.
- 21:38Then you expect
- 21:39increasing the number of raters
- 21:41would have a substantial effect
- 21:42would have an impact. Averaging
- 21:43of that. Yeah. So I
- 21:45would agree with you. I
- 21:45would say that this in
- 21:47this particular this isn't my
- 21:48data. This is a hypothetical
- 21:49dataset
- 21:50that,
- 21:51there
- 21:52probably wasn't a lot of
- 21:53variability amongst the raters, and
- 21:54so adding more raters didn't
- 21:56make a didn't make a
- 21:57difference in terms of reliability.
- 21:59Well, but and
- 22:00is this consistent with the
- 22:01numbers you showed us before
- 22:02for the percentage of variability
- 22:03was attributable to the raters?
- 22:05No. And so I'll I'll
- 22:06show you what it looked
- 22:07like for our data. I've
- 22:09just this was just this
- 22:09is just a hypothetical just
- 22:11to make the point of
- 22:12what sort of what decision
- 22:13studies can do if we
- 22:14if we if we change
- 22:16the different elements.
- 22:20I just got a text.
- 22:21Please repeat the question. In
- 22:23general
- 22:24Oh, okay.
- 22:25I think the microphone's not
- 22:26working. Just when you get
- 22:28a question, just repeat it
- 22:30so online people can hear
- 22:31it. Okay. We'll we'll we'll
- 22:33do that moving forward.
- 22:36So this is,
- 22:37this is this is our
- 22:39our data.
- 22:41And this is the final
- 22:42product
- 22:43of our data, meaning this
- 22:45was the,
- 22:46this is the data that
- 22:48seemed to improve,
- 22:50reliability
- 22:52best.
- 22:53And
- 22:54what it what it came
- 22:55out with, it is really
- 22:57the number of observations
- 22:58made the biggest impact on
- 23:01moving our
- 23:03reliability,
- 23:05curve towards,
- 23:06towards that point eight. We
- 23:07chose the higher value point
- 23:09eight rather than point seven
- 23:10as our as our cutoff.
- 23:12And what it looked like
- 23:13is to get our you
- 23:15know, given the parameters, keeping
- 23:16everything else stable, and just
- 23:17changing the number of observations,
- 23:21getting our observations up to
- 23:22about ten gives us a
- 23:25reliability
- 23:26to a level point eight
- 23:27to understanding
- 23:28where our learners are with
- 23:30their POCUS competency. Doesn't mean
- 23:32ten observations and your learner
- 23:34is competent in POCUS. It
- 23:36means after ten observations,
- 23:38I can reliably
- 23:39understand
- 23:40what level that they are
- 23:42at.
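The question "how many observations do we need" can be sketched as a search over the number of observations. This is a simplified single-facet version that lumps every non-learner source into one error term averaged over n observations; the study's actual decision study is more detailed, and the variance values in the usage line are hypothetical.

```python
def observations_needed(var_learner, var_error, target=0.8, max_n=1000):
    """Smallest number of observations whose averaged score reaches the target reliability."""
    if var_learner <= 0 or var_error < 0:
        raise ValueError("var_learner must be positive and var_error non-negative")
    for n in range(1, max_n + 1):
        # Error variance of a mean over n observations shrinks by a factor of n.
        reliability = var_learner / (var_learner + var_error / n)
        if reliability >= target:
            return n
    return None  # target not reachable within max_n observations


# Hypothetical split: 35% learner variance, 65% everything else.
print(observations_needed(var_learner=0.35, var_error=0.65, target=0.8))
```

As in the talk, the result says how many observations are needed to locate a learner's level reliably, not how many are needed for the learner to be competent.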
- 23:44So that's that's quite useful
- 23:46for helping us to to
- 23:47understand sort of sort of
- 23:48next steps.
- 23:50The limitations of the of
- 23:51our of our work, I
- 23:53think the main limitation, we
- 23:54did it across three large
- 23:56academic hospitals. So it was,
- 23:58it was us, it was
- 23:59MGH, and it was, OHSU
- 24:01that were, part of it.
- 24:03Most of the observations came
- 24:04from us here at Yale.
- 24:07I think,
- 24:08I think that
- 24:10influences the generalizability
- 24:11of of what we're what
- 24:13we're putting out there. How,
- 24:14you know, how would a
- 24:15tool like this work at
- 24:16a smaller program, someplace that
- 24:18it does not have such,
- 24:19robust, point of care ultrasound,
- 24:22expertise,
- 24:24unclear.
- 24:25And it's mostly in an
- 24:26inpatient setting. How does this
- 24:27translate to an outpatient setting,
- 24:29where ultrasound is also being
- 24:31used, also unclear.
- 24:34So conclusions and and next
- 24:36steps,
- 24:37the,
- 24:38you know, the within the
- 24:40study, we were able to
- 24:41generate validity,
- 24:43and feasibility
- 24:44evidence to support,
- 24:45what is a a a
- 24:47very novel,
- 24:48approach to looking at point
- 24:49of care ultrasound,
- 24:51competency.
- 24:52We
- 24:53need to put more time,
- 24:54I think, into rater training,
- 24:56to make sure that raters
- 24:58are being consistent in their
- 25:00in their assessments,
- 25:01of of the learners, which
- 25:03probably means,
- 25:04both
- 25:05reorienting them to EPAs and
- 25:07making sure they feel comfortable
- 25:08with that and and probably
- 25:10doing some calibration training to
- 25:12make sure that my level
- 25:13three is the same as
- 25:14your level three, etcetera.
- 25:17When you find the outlier
- 25:18using these data, can you
- 25:19find the people that's
- 25:21find the raters who are
- 25:22giving everybody Yeah. Who yeah.
- 25:24We probably can. Yeah. We
- 25:26probably could probably jump in
- 25:27and and figure out, like,
- 25:28who, like, pinpoint who who
- 25:29really who really needs the
- 25:31help.
- 25:32But I guess it's, you
- 25:33know, it's it's challenging because
- 25:34you always gotta say, like,
- 25:35what's your like, who's the
- 25:36standard, I guess, that you
- 25:37would compare to. So,
- 25:40maybe it's me. Maybe me.
- 25:41I'm too lenient or too
- 25:42strict. I don't know. So
- 25:44it'd be interesting to to
- 25:45think about. That might be
- 25:45another another study that we
- 25:47look at. We'll group the
- 25:48standard. I mean, basically, what
- 25:50you you do is predict
- 25:52the score based on the
- 25:53rater identification.
- 25:54Interesting.
- 25:55People who have higher than
- 25:57average scores, you can do
- 25:58that. People who have lower
- 25:59than average scores, you can
- 26:00do that. Yeah. Nice. So
- 26:02the the the question for
- 26:03the, for the Zoom room,
- 26:06was about using the data
- 26:08to,
- 26:09to predict who who are
- 26:10the more lenient or the
- 26:11more strict,
- 26:12raters.
- 26:14And, doctor Justice was just
- 26:16giving some some tips on
- 26:17how we might, might design
- 26:18that.
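One simple way to start on the suggestion above is to compare each rater's average score with the overall average. This is only a rough sketch with hypothetical rater IDs and scores: raw averages mix rater leniency with the ability of the learners each rater happened to see, so a model that also accounts for the learner would be needed before drawing conclusions about any individual rater.

```python
from collections import defaultdict
from statistics import mean


def rater_leniency(assessments):
    """Map each rater to (their mean score minus the grand mean).

    assessments: list of (rater_id, score) pairs.
    Positive values suggest a more lenient rater, negative a stricter one.
    """
    scores_by_rater = defaultdict(list)
    for rater_id, score in assessments:
        scores_by_rater[rater_id].append(score)
    grand_mean = mean(score for _, score in assessments)
    return {
        rater_id: mean(scores) - grand_mean
        for rater_id, scores in scores_by_rater.items()
    }


# Hypothetical entrustment levels (1 to 4) from three made-up raters.
example = [("A", 4), ("A", 4), ("B", 2), ("B", 3), ("C", 3), ("C", 3)]
print(rater_leniency(example))
```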
- 26:21I think one of the
- 26:22things that that I'm interested
- 26:23in thinking about is, you
- 26:24know, particularly as we're working
- 26:25on this privileging process at
- 26:27the hospital for point of
- 26:28care ultrasound is is thinking
- 26:30about using this as a
- 26:31tool, for more summative level
- 26:33decision making,
- 26:35around,
- 26:36around the privileging,
- 26:37process,
- 26:38here at here at Yale.
- 26:43Some
- 26:44thank you. So,
- 26:46Janet and John, I did
- 26:47this work as part of
- 26:48my masters of health
- 26:50science, Donna in the department,
- 26:52for making the the grant
- 26:54available.
- 26:56David and and Jeanette just,
- 26:58just master
- 26:59mentors and and really encouraging
- 27:02and,
- 27:02and facilitating my interaction with,
- 27:05with Haidong Lu, which is
- 27:06really what what made this
- 27:07project move, move forward,
- 27:10and then, the team of,
- 27:12of researchers
- 27:13that I was able to
- 27:14work with.
- 27:15Alright. That's it. Matt.
- 27:17Yeah. That's really great. Thank
- 27:18you.
- 27:20So I had a couple
- 27:21questions. One was,
- 27:23are there other at the
- 27:24hospital level, in terms of
- 27:25privileging,
- 27:27is there anything analogous to
- 27:28this sort of
- 27:30level of really assessing like,
- 27:32a heart transplant is probably
- 27:33more competency assessment than a
- 27:35heart transplant,
- 27:37for surgeon, I would think.
- 27:39Yeah. So Matt's question
- 27:41one was, is is there
- 27:42something comparable to this,
- 27:44type of assessment in in
- 27:46other areas of privileging?
- 27:48And then do you have
- 27:49second question too? Second one
- 27:51was, I know it's very
- 27:52different, but you was there
- 27:53anything useful
- 27:54in the radiology world
- 27:57in terms of how competency
- 27:59is assessed
- 28:00for either technicians or ultrasonographer
- 28:03radiologists?
- 28:04Yeah. And then the second
- 28:05question was there is there
- 28:06anything comparable in the in
- 28:07the radiology world?
- 28:09So so the first question,
- 28:11there's nothing comparable that I'm
- 28:12aware of in within privileging.
- 28:15If you're, you know, for
- 28:16example,
- 28:18privileging for,
- 28:19you know,
- 28:20if you're a heart surgeon
- 28:21to do a heart transplant
- 28:22is really kind of number
- 28:23of cases that you've done
- 28:25in graduating from a, you
- 28:26know, an accredited program or
- 28:28you did your fellowship in
- 28:30x y or x y
- 28:31or z.
- 28:34A lot of training works
- 28:35that way. I think it's
- 28:36probably more comparable to sort
- 28:38of how we
- 28:40we privilege around procedures where
- 28:42it's like you have to
- 28:43do, you know, five
- 28:45central lines, and then you're
- 28:46you're magically competent in, in
- 28:49that,
- 28:50which which creates a real
- 28:51problem. So, you know, a
- 28:52lot of hospital systems
- 28:54use, like, a number based
- 28:56algorithm for deciding who's privileged
- 28:57or not. So you've done
- 28:59fifty cardiac studies. Now you're
- 29:01privileged. But the number
- 29:04definitely does not tell the
- 29:05story.
- 29:06I work with, with trainees.
- 29:08Some have done fifty cardiac
- 29:09studies, and they're great. And
- 29:11I work with others that
- 29:12have done fifty, and they
- 29:13really still stink. And so
- 29:14the number
- 29:15but there's always a feasibility
- 29:16element, you know, for the,
- 29:17you know, like, the credentialing
- 29:19committee where where they have
- 29:20to say, like,
- 29:22you know, if it gets
- 29:22too complicated,
- 29:23it it it gets unmanageable
- 29:25for them to do. So
- 29:26numbers make it very simple.
- 29:27Alright? I I can check
- 29:28the box. They've done x
- 29:30number and therefore you're you're
- 29:31privileged,
- 29:33which may work for for
- 29:34privileging. I I think if
- 29:36we run a true assessment
- 29:37of competency though, we we
- 29:39have to take a more
- 29:40holistic
- 29:41way of, of looking at
- 29:42that. And then,
- 29:45nothing
- 29:46from the radiology
- 29:47world.
- 29:49I think also because
- 29:51you you finish your residency
- 29:53in in radiology and and
- 29:55then you are privileged to
- 29:56to be a a radiologist.
- 29:57And and so I don't
- 29:59know that they're necessarily
- 30:00faced with this with this
- 30:02problem.
- 30:03And they have a whole
- 30:04residency
- 30:05to to learn this stuff,
- 30:06whereas we're trying to say
- 30:07how quickly can I get
- 30:08somebody from, you know, being
- 30:10a novice to an expert
- 30:11so they can start using
- 30:12this in clinical practice?
- 30:15I'll be mindful of the
- 30:16fact that you mentioned you
- 30:17had
- 30:19a obligation. So,
- 30:21Thank you.
- 30:22If you have questions, follow-up
- 30:23that. I'm gonna
- 30:25maybe I should tell people.
- 30:26Oh, great. Yeah. So,
- 30:28if there are more more
- 30:29questions,
- 30:31please feel free to to
- 30:32email me.
- 30:33Happy to to answer things
- 30:35over over email as well.
- 30:36I will. Great. Thanks.
- 30:42Great testaments to finding the
- 30:44interest.