“Competency Assessment - Are Your Measures Reliable?”

March 26, 2026

Joseph Donroe MD, MPH, MHS - Yale School of Medicine

March 19, 2026

Yale GIM “Research in Progress” Meeting Presented by: Yale School of Medicine’s Department of Internal Medicine, Section of General Internal Medicine

ID
14007

Transcript

  • 00:00Yeah.
  • 00:04Alright. Well, welcome everybody.
  • 00:07Thank you for coming to,
  • 00:09noon conference, general medicine.
  • 00:12Today's,
  • 00:14CME code is five
  • 00:16five nine zero one.
  • 00:21Okay. So upcoming,
  • 00:23retreats. Next one will be
  • 00:25the education retreat on the
  • 00:27West Campus.
  • 00:28Stay tuned.
  • 00:30Important
  • 00:31but,
  • 00:33familiar.
  • 00:34FDAQ reminders, please be
  • 00:35on the lookout for your
  • 00:37next steps, which
  • 00:38most likely will
  • 00:40be including meeting with your
  • 00:42mentors
  • 00:43or your delegates.
  • 00:48Research
  • 00:48in progress,
  • 00:51I guess the next one,
  • 00:52March twenty sixth will be
  • 00:53Nate Wood.
  • 00:55No. That's not okay. Got
  • 00:56it. Sorry.
  • 00:57Grand Rounds, seven thirty AM
  • 00:59with Nate Wood. Food insecurity
  • 01:01and culinary medicine,
  • 01:03followed by,
  • 01:04noon conference,
  • 01:06next week. Ben Amba,
  • 01:08speaking about the impact of
  • 01:10the current landscape
  • 01:12on GME
  • 01:13nationwide.
  • 01:16Disclosure and accreditation.
  • 01:20So I'm excited to introduce,
  • 01:23Joe Donroe, who is going
  • 01:25to be joining us. So
  • 01:26the original plan for today
  • 01:27was to have two thirty
  • 01:29minute time slots. Our second,
  • 01:31presenter is unfortunately unavailable at
  • 01:33the last minute. So we
  • 01:35have one thirty minute time
  • 01:36slot, but we're really glad
  • 01:37to have,
  • 01:38Joe joining us. Joe started
  • 01:41his training career at Tufts
  • 01:42where he
  • 01:44earned his MD and MPH,
  • 01:45spent two years
  • 01:47in Peru,
  • 01:48before heading back north to
  • 01:50New Haven,
  • 01:51to complete his med-peds
  • 01:53residency and chief residency. And
  • 01:55he's here now on faculty
  • 01:56where he focuses on
  • 01:59teaching clinical skills,
  • 02:01clinical education overall, and also
  • 02:03taking care of patients with,
  • 02:05addiction.
  • 02:06He's done a lot of
  • 02:07work on,
  • 02:09teaching POCUS to our house
  • 02:11staff and others. And he
  • 02:13is going to be speaking
  • 02:14about, more generally competency assessment
  • 02:17and are your measures reliable.
  • 02:19So thank you.
  • 02:22Alright. Thanks for the opportunity
  • 02:23to come and talk.
  • 02:27It's,
  • 02:28it's about competency assessment, but
  • 02:30it's it's gonna just closely
  • 02:32parallel a study that, that
  • 02:34we
  • 02:35did here and and recently,
  • 02:37completed. So,
  • 02:39actually,
  • 02:40when I was approached to
  • 02:41do this, it it it
  • 02:42was a research in in
  • 02:43progress.
  • 02:45I'm happy to say now
  • 02:46that it's,
  • 02:48we've been published this week,
  • 02:50with a group of
  • 02:52authors from from here and
  • 02:54and elsewhere.
  • 02:55And, I can
  • 02:57I can pause for a
  • 02:57second to hit that QR
  • 02:59code and download it and
  • 03:00get the metrics up a
  • 03:01little a little bit?
  • 03:08I'll go through the the
  • 03:09the background and the the
  • 03:11methodology
  • 03:11of of what we did,
  • 03:14briefly,
  • 03:15because we only have twenty
  • 03:16five minutes or so.
  • 03:18I really wanted to try
  • 03:19and focus the
  • 03:20the conversation
  • 03:22around,
  • 03:23a form of reliability
  • 03:25testing that's called generalizability
  • 03:27theory and, decision study,
  • 03:31and give a brief overview
  • 03:33of that and how we
  • 03:34interpreted it in the in
  • 03:36the context of our of
  • 03:37our study. And my disclaimer
  • 03:38is I am I am
  • 03:40not an expert in these,
  • 03:42this analytic,
  • 03:44technique.
  • 03:44And it was actually really
  • 03:46hard to find, expertise
  • 03:48to to move our our
  • 03:49project forward, but I'll I'll
  • 03:50circle back to that in
  • 03:51a in a moment.
  • 03:55So some,
  • 03:56background information.
  • 03:58My interest in this stems
  • 03:59from the work I do
  • 04:00in, in terms of leading
  • 04:02the point of care ultrasound
  • 04:04programs for internal medicine,
  • 04:07training
  • 04:07residents and and training faculty
  • 04:10to use this as a
  • 04:10tool in, in the clinical
  • 04:12environment.
  • 04:13Point of care ultrasound
  • 04:15is the utilization
  • 04:16of ultrasound at the point
  • 04:18of care
  • 04:19by the treating physician. And
  • 04:21so we use it to
  • 04:22help diagnose
  • 04:23and manage,
  • 04:25and in contrast to comprehensive
  • 04:27ultrasound,
  • 04:27this is used to really
  • 04:29address very focused problems.
  • 04:35What's
  • 04:37what's been lagging
  • 04:38as point-of-care ultrasound becomes
  • 04:40more and more popular amongst
  • 04:43medical schools, amongst residents, and
  • 04:44amongst faculty,
  • 04:46the utilization of POCUS
  • 04:48is increasing,
  • 04:50at a tremendous rate. However,
  • 04:51our ability to understand, are
  • 04:53people actually competent to use
  • 04:55this tool?
  • 04:56That is lagging way behind.
  • 04:58We don't have very good
  • 04:59measures,
  • 05:00to be able to do
  • 05:01that, especially at that top
  • 05:03part of Miller's pyramid,
  • 05:05where it's really,
  • 05:07competency in action. You know,
  • 05:09how is the learner actually
  • 05:10performing in the clinical, clinical
  • 05:12arena? And so
  • 05:17we formed a research question
  • 05:19around this this existing gap,
  • 05:22and the question became,
  • 05:24what is the validity evidence
  • 05:25supporting the use of an
  • 05:26entrustable professional activity
  • 05:29framework
  • 05:30to assess point of care
  • 05:31ultrasound competency
  • 05:33in internal medicine
  • 05:35learners.
  • 05:37Yeah.
  • 05:39What is the state of
  • 05:40care at
  • 05:42The level of certification of
  • 05:44what's required for someone to
  • 05:45roll out the POCUS machine
  • 05:47in their own practice, I
  • 05:48guess? That's the level. Yeah.
  • 05:50Yeah. It's,
  • 05:51it's it's a little bit
  • 05:52of, of the wild west.
  • 05:56Right now,
  • 05:57most,
  • 05:59departments at Yale do not
  • 06:00have a a privileging
  • 06:02mechanism,
  • 06:03for point of care ultrasound.
  • 06:04There are a few that
  • 06:05do. Emergency medicine,
  • 06:07does.
  • 06:08Surprisingly, you know, groups like
  • 06:10Pulm Crit do not.
  • 06:12Internal medicine does not.
  • 06:14And so as these are
  • 06:15being used more and more,
  • 06:17they're being used in the
  • 06:18absence of a privileging process.
  • 06:20In the absence of a privileging
  • 06:22process for the hospital, there's
  • 06:23no formal
  • 06:25credentialing process either for which
  • 06:27to verify,
  • 06:28competency.
  • 06:29And so it's a lot
  • 06:30of, sort of up to
  • 06:32the professional to make a
  • 06:33decision on whether or not
  • 06:34they feel comfortable using that
  • 06:37in the in the clinical
  • 06:37arena.
  • 06:38And as we as we
  • 06:40know, clinicians are not always
  • 06:41the best self assessors,
  • 06:43which,
  • 06:44you know, invites a problem,
  • 06:46I think. But we're moving
  • 06:47in that direction. So, actually,
  • 06:48I I chair the committee
  • 06:50for establishing a standard, process
  • 06:52for privileging across the hospital
  • 06:54and the delivery networks.
  • 06:58That that committee has been
  • 07:00in, together for about five
  • 07:01years now,
  • 07:03but I think we are
  • 07:04close. I I I would
  • 07:05expect that we there's probably
  • 07:07privileging that that's gonna happen
  • 07:08within the next six months
  • 07:10or so. Now that I've
  • 07:11said that, I've cursed it,
  • 07:12but I think we are
  • 07:13closer than than we ever
  • 07:15have been. So there there
  • 07:16should be a credentialing privileging
  • 07:17process soon.
  • 07:23The methodology
  • 07:25for,
  • 07:26the study that we that
  • 07:27we did. So,
  • 07:30we developed an EPA or
  • 07:32entrustable professional activity
  • 07:34framework
  • 07:34and instrument to use. That
  • 07:36process was guided by a
  • 07:38panel of experts in point
  • 07:40of care ultrasound
  • 07:41and, medical education,
  • 07:44and it followed a very
  • 07:44standardized way to create,
  • 07:47create an EPA.
  • 07:48The tool we created, the
  • 07:49instrument we created is online
  • 07:52so learners access it on
  • 07:53their phones,
  • 07:54so it can be used
  • 07:55in in real time in
  • 07:56the workplace.
  • 07:57Then we trained a group
  • 07:59of,
  • 08:00ultrasound experts to become assessors
  • 08:02for us,
  • 08:03so that they can do
  • 08:04the assessments with our, with
  • 08:06our learners
  • 08:07at the bedside.
  • 08:09And then we evaluated the
  • 08:10framework and the and the
  • 08:12instrument that we're using for
  • 08:13sources of, evidence of validity,
  • 08:15reliability, and and feasibility.
  • 08:21The EPA that we that
  • 08:22we came up with is
  • 08:24this, assessing the acutely ill
  • 08:26patient using point of care
  • 08:27ultrasound,
  • 08:28and the scale that we
  • 08:30use as our,
  • 08:32assessment
  • 08:34scale is up
  • 08:36there. So
  • 08:37with entrustable professional activities,
  • 08:40the the key cutoff is
  • 08:42where is somebody
  • 08:43the level at which somebody
  • 08:44can be entrusted to perform
  • 08:46the activity
  • 08:47by themselves in an unsupervised
  • 08:49way. In our on our
  • 08:51scale, that is level four,
  • 08:52allowed to practice the EPA
  • 08:54unsupervised.
  • 08:55And between level one to
  • 08:57four, there's there's there's a
  • 08:59gradation.
  • 09:00What's nice about this tool
  • 09:02is that at each level,
  • 09:03it really directs the feedback
  • 09:05that the learner needs to
  • 09:07advance to the next step.
  • 09:08So it it becomes an
  • 09:10important way to to to
  • 09:11track competency, but also
  • 09:13to,
  • 09:15to make sure that the
  • 09:16feedback that's given is, is
  • 09:18the right feedback for where
  • 09:19the learner is on their
  • 09:20competency pathway.
  • 09:25So
  • 09:26skipping some steps because I
  • 09:27I just wanted to get
  • 09:28to to really what's the
  • 09:29focus of of today, which
  • 09:31is reliability testing. So one
  • 09:34source of validity evidence when
  • 09:36we're thinking about,
  • 09:38developing a tool is,
  • 09:40is reliability. And when we
  • 09:41think about reliability, what we're
  • 09:43we're asking is are the
  • 09:45measures consistent across,
  • 09:47different workplace conditions and across
  • 09:49different assessors and learners?
  • 09:54Another way to think about
  • 09:55reliability testing is how
  • 09:57close is the observed score
  • 09:59to the true score. Right?
  • 10:01How close
  • 10:02is my observations
  • 10:03of competence?
  • 10:05How close is that to
  • 10:06the learner's true competence?
  • 10:09If you prefer to think
  • 10:11about it in terms of a formula,
  • 10:12you see the formula on
  • 10:13the on the screen there,
  • 10:14observed score equals true score
  • 10:16plus some some error in
  • 10:17our measurement. Right? We can
  • 10:19never really get to the
  • 10:20true, the true score. There's
  • 10:22always some error that we
  • 10:23wanna try and understand
  • 10:25and minimize.
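
The formula on the slide is the classical test theory decomposition. As a minimal sketch, with symbols that are assumptions here rather than taken from the slide ($X$ for the observed score, $T$ for the true score, $E$ for measurement error, and $T$ and $E$ assumed uncorrelated), it can be written as:

```latex
X = T + E,
\qquad
\text{reliability}
  = \frac{\sigma^2_T}{\sigma^2_X}
  = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```

Read this way, reliability is the share of observed-score variance that reflects real differences between learners, so shrinking the error variance pushes the ratio toward 1.
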
  • 10:29The classical approach to reliability
  • 10:32testing,
  • 10:35really looks at,
  • 10:36or focuses on one source
  • 10:38of error. So studies are
  • 10:39designed to look at things
  • 10:40like interrater reliability or intercase
  • 10:43reliability or internal consistency alpha
  • 10:45statistics, the Cronbach's alpha that
  • 10:47you're probably familiar with.
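
For readers less familiar with Cronbach's alpha, the sketch below shows one common way to compute it from a learners-by-items score table. The function name `cronbach_alpha` and the `example_scores` data are hypothetical illustrations and are not from the study described in the talk.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per learner, one column per item.

    Uses sample variances: alpha = k/(k-1) * (1 - sum(item variances) / variance(totals)).
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two learners and two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every learner needs a score on every item")

    item_columns = list(zip(*scores))            # transpose: one tuple per item
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical scores: 4 learners rated on 3 items (made up for illustration).
example_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]

if __name__ == "__main__":
    print(f"alpha = {cronbach_alpha(example_scores):.3f}")
```

Alpha only speaks to consistency across items; it says nothing about rater or case effects, which is the limitation the speaker turns to next.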
  • 10:52The challenge with that, though,
  • 10:53is that in medical education
  • 10:55and the assessments that we
  • 10:56do, there's there's more than
  • 10:58just one source of error
  • 11:00that we have to worry
  • 11:01about.
  • 11:03There's multiple potential sources of
  • 11:04error. And so in reality,
  • 11:07we have to move from
  • 11:09that,
  • 11:10that classical
  • 11:12formula.
  • 11:13And we have to consider,
  • 11:15you know, what is the
  • 11:16error that we can attribute
  • 11:17to
  • 11:18the learner?
  • 11:19What is the error that
  • 11:20we can attribute to the
  • 11:21raters? Some raters are more
  • 11:22lenient. Some are more strict.
  • 11:24Some know the,
  • 11:26some know the learner and
  • 11:27that influences the scores.
  • 11:28We have to,
  • 11:29think about error attributed to
  • 11:32the clinical case. Are there
  • 11:34differences in difficulty between the
  • 11:36cases that the that the
  • 11:37learners are being assessed on?
  • 11:38And all of those factor
  • 11:40into
  • 11:40that error value.
  • 11:42And so in reality, what
  • 11:44we really need our formula
  • 11:46to look like is this.
  • 11:47So our observed score equals
  • 11:49the true score plus multiple
  • 11:51sources of error. How do
  • 11:52we get to evaluating what
  • 11:54those sources of of error
  • 11:56are and what the relative
  • 11:58contributions are to the overall
  • 12:00error number.
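
One way to write the expanded formula is as a sum of effects, one per source of error. The sketch below assumes a fully crossed learner ($l$) by rater ($r$) by syndrome ($s$) design, which is an assumption made for illustration; the talk does not spell out the study's exact design, and the notation is not from the slide.

```latex
\begin{align*}
X_{lrs} &= \mu + \nu_l + \nu_r + \nu_s + \nu_{lr} + \nu_{ls} + \nu_{rs} + \nu_{lrs,e} \\
\sigma^2(X_{lrs}) &= \sigma^2_l + \sigma^2_r + \sigma^2_s
                   + \sigma^2_{lr} + \sigma^2_{ls} + \sigma^2_{rs} + \sigma^2_{lrs,e}
\end{align*}
```

Here $\mu + \nu_l$ plays the role of the learner's true score, and every other term is a separate contribution to error whose variance can be estimated on its own.
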
  • 12:04And that's and we I
  • 12:06was stuck there for a
  • 12:07long time. We had collected
  • 12:09our data,
  • 12:11and I was, you know,
  • 12:12really trying to move forward.
  • 12:14And the problem was there
  • 12:16just wasn't the expertise to
  • 12:17to run the studies that
  • 12:18we needed to run, at
  • 12:20least that that I could
  • 12:21find after,
  • 12:22a lot of a lot
  • 12:23of emails and communications around
  • 12:25this, trying to find somebody
  • 12:27to to run the studies
  • 12:28that we needed to do
  • 12:29to to get to this
  • 12:31multiple sources of of error.
  • 12:33And it's a type of
  • 12:34of analysis that's called the
  • 12:35generalizability
  • 12:36theory.
  • 12:39Fortunately,
  • 12:40two things happened.
  • 12:42One,
  • 12:43Donna Windish in the department
  • 12:45started,
  • 12:46started the Department of Medicine
  • 12:48educational
  • 12:49grant.
  • 12:50That came out about the
  • 12:51same time as,
  • 12:53as I was in the
  • 12:53the struggle to to do
  • 12:54this analysis.
  • 12:56And I was introduced to
  • 12:57Haidong Lu, who's who's here
  • 13:00today as well. And,
  • 13:02with the the funding support,
  • 13:03I was able to connect
  • 13:05with Haidong, and, and we
  • 13:06were able to to plan
  • 13:08together and and,
  • 13:10he he became my my
  • 13:12expert for for getting this
  • 13:13done and was really the,
  • 13:15the
  • 13:16the the key piece to
  • 13:18to be able to move
  • 13:18this, this forward. So I'm
  • 13:21extremely, extremely grateful, both for
  • 13:23the educational research grant and
  • 13:25for Haidong.
  • 13:28And so what what he
  • 13:29was able to do is
  • 13:30this analysis called, generalizability
  • 13:34theory.
  • 13:35And what this does is
  • 13:37it,
  • 13:39it tries to,
  • 13:41to distill down the various
  • 13:43sources of error that could
  • 13:45be contributing to our overall
  • 13:46reliability
  • 13:48and,
  • 13:49figure out the the relative
  • 13:51contributions of each. So within
  • 13:53this,
  • 13:53this framework of this analysis,
  • 13:56we see that there are
  • 13:57effects,
  • 13:58otherwise known as as facets.
  • 14:00These are the potential sources
  • 14:02of error as we're,
  • 14:04as we're performing our assessment.
  • 14:05So we see things on
  • 14:06there like the learner,
  • 14:08the rater.
  • 14:09The syndrome refers to,
  • 14:12within our EPA, students are
  • 14:14or learners are evaluating the
  • 14:16dyspneic patient, the patient with
  • 14:17abdominal distension, the patient with
  • 14:19hypotension. So various syndromes that
  • 14:21they're, they're evaluating.
  • 14:23And there are
  • 14:24interactions between these things as
  • 14:26well. So there are interactions
  • 14:27between the learner and the
  • 14:29rater, the learner and the
  • 14:29syndrome, the rater and and
  • 14:31on and on and
  • 14:33on. And the idea is
  • 14:34to try and get to
  • 14:35how much are each of
  • 14:36these contributing to the overall
  • 14:39error. And,
  • 14:41we call that the percent
  • 14:42variance. So if we think
  • 14:44about,
  • 14:44there is
  • 14:46an absolute number that is
  • 14:48that error, and within that
  • 14:49absolute number, there are contributions
  • 14:51from each one of these
  • 14:52things. How much does each
  • 14:53of these contribute to that
  • 14:54error number? And it also
  • 14:56gives us a measure of
  • 14:58reliability, and we're gonna circle
  • 15:00back to this,
  • 15:01because one of
  • 15:03the the powers of this,
  • 15:05assessment technique is it allows
  • 15:06us to do what's called
  • 15:07the decision study
  • 15:09where we can estimate,
  • 15:11how many observations or how
  • 15:13many raters do we need
  • 15:14to achieve a certain level
  • 15:15of reliability,
  • 15:17which really helps us to
  • 15:18optimize our processes
  • 15:20of assessment moving forward.
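
The "percent variance" idea can be sketched in a few lines: given an estimated variance component for each facet, divide each by the total. The function name `percent_variance` and the numbers in `example_components` are hypothetical and are not the study's estimates.

```python
def percent_variance(components):
    """Express each variance component as a percentage of the total variance."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: 100.0 * value / total for name, value in components.items()}


# Hypothetical variance components (made up for illustration only).
example_components = {
    "learner": 0.30,
    "rater": 0.15,
    "syndrome": 0.05,
    "interactions_and_residual": 0.50,
}

if __name__ == "__main__":
    for name, pct in percent_variance(example_components).items():
        print(f"{name}: {pct:.1f}%")
```

Estimating the components themselves requires fitting a variance-components model to the assessment data; this sketch only covers the final step of turning them into shares.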
  • 15:22So we're just gonna take
  • 15:22a quick peek at,
  • 15:24at each of these and
  • 15:25talk briefly about, some of
  • 15:26the the the main effects.
  • 15:29So we looked at learner
  • 15:30variance. And for medical education
  • 15:33studies, what you really wanna
  • 15:34see is that the learner
  • 15:35variance is high. You want
  • 15:38the error attributable to differences
  • 15:41in the learner,
  • 15:42different skill sets, different, degrees
  • 15:45of competence.
  • 15:47A high
  • 15:48learner variance tells you
  • 15:50that you are able to
  • 15:51accurately
  • 15:52discriminate between differences in competency
  • 15:55between your your your learners.
  • 15:57And one of the things
  • 15:58I had to kind of
  • 15:59wrap my head around was,
  • 16:00well, what is what is
  • 16:01high?
  • 16:03You know, this number, twenty
  • 16:04seven point seven,
  • 16:06felt low when it came
  • 16:07out. As it turns out,
  • 16:08that's actually,
  • 16:09quite a robust number for
  • 16:10this type of study. And
  • 16:12so when we're looking at
  • 16:13numbers above twenty five percent,
  • 16:16that's actually considered,
  • 16:17quite good for,
  • 16:19for a medical education reliability
  • 16:22study. So we're we're quite
  • 16:23pleased with our learner variance.
  • 16:29We looked at rater variance.
  • 16:31So this is the idea
  • 16:32of, can some of that
  • 16:34error term or or how
  • 16:35much of that error term
  • 16:36is attributable to just differences
  • 16:37in how the raters are
  • 16:38scoring.
  • 16:39And that could be, as
  • 16:41we know, some some of
  • 16:42us are very strict when
  • 16:43we evaluate our learners. Some
  • 16:45of us are very lenient
  • 16:47when we evaluate our learners.
  • 16:49There's also the element of
  • 16:50we're using EPAs, and and
  • 16:52that's that's a newer way
  • 16:54of assessment. And so,
  • 16:56you know, how well did
  • 16:57our raters understand this tool
  • 16:59that we're that we're using?
  • 17:01We train them. We we
  • 17:02would hope that they would
  • 17:02understand it well, but,
  • 17:04but did they?
  • 17:05Ideally, we want this portion
  • 17:07of the variance to be
  • 17:08quite small. We don't want,
  • 17:10the the, a large portion
  • 17:12of the error being attributed
  • 17:14to the raters. And for
  • 17:15us, the number was sixteen
  • 17:17point five percent.
  • 17:19And,
  • 17:20boy, I was happy because
  • 17:21that seemed really low. But
  • 17:23as it turns out, sixteen
  • 17:25point five is it's not
  • 17:26high or low. It's right
  • 17:27in the middle. I would
  • 17:28call it a modest contribution
  • 17:30to,
  • 17:31to the error value.
  • 17:33And what's nice about this
  • 17:34is it really points us
  • 17:35in a direction to say,
  • 17:37you know, where can we
  • 17:38improve in our assessment methodology
  • 17:40and gives us a target
  • 17:42for that, perhaps more training
  • 17:43of our of our raters.
  • 17:45Carrie, did you have a
  • 17:45question?
  • 17:46In this data set Yeah.
  • 17:49How many
  • 17:50raters did each
  • 17:51learner have? Yeah. It's a
  • 17:53good question. There was a
  • 17:54range. There was, six hundred
  • 17:56and four assessments that were
  • 17:58done by
  • 17:59I think our final number
  • 18:00was fifteen
  • 18:02different
  • 18:03raters.
  • 18:04And there was variability in
  • 18:06terms of
  • 18:07how many,
  • 18:08how many assessments were done
  • 18:10by each rater. I don't
  • 18:11have off the top of
  • 18:11my head what the average
  • 18:12number of
  • 18:14assessments per rater was.
  • 18:16But the the analysis,
  • 18:19factors
  • 18:20factors that in. How? I
  • 18:22don't I'd have to ask.
  • 18:23I don't agree with her
  • 18:24to to get into go
  • 18:25into the depths with them.
  • 18:27Like, if I No. No.
  • 18:29Each learner has,
  • 18:31has encounters with multiple raters.
  • 18:33Yeah. Yeah.
  • 18:39And then the last, the
  • 18:40last of the effects that
  • 18:41I'll I'll highlight is, is
  • 18:43case variance.
  • 18:44And this is really looking
  • 18:45at how much of the
  • 18:46variance is due to differences
  • 18:48in case
  • 18:50variability or case difficulty.
  • 18:52And, ideally, you want this
  • 18:53to be to be quite
  • 18:55low.
  • 18:57That number,
  • 18:59of one percent looks low
  • 19:00and and is low, so
  • 19:01we were actually quite happy
  • 19:02with,
  • 19:03with our our case variance.
  • 19:05To be honest, I was
  • 19:05I was a bit surprised
  • 19:07because there's such a range
  • 19:08of different clinical syndromes that
  • 19:10the,
  • 19:12that the the residents were
  • 19:13were seeing. I have some
  • 19:14theories around why it might
  • 19:16be low,
  • 19:17such as,
  • 19:18it's really the the the
  • 19:20difficulty is in the the
  • 19:21use of the ultrasound, not
  • 19:23in the approach to the
  • 19:24to the patient. The the
  • 19:25residents have a certain skill
  • 19:26level with the the patients.
  • 19:28The new skill is the
  • 19:29ultrasound, and so residents of
  • 19:30a certain level of competence
  • 19:32with ultrasound are gonna score
  • 19:34the same regardless of,
  • 19:36of the patient that that's
  • 19:37in front of them. And
  • 19:38that's that's,
  • 19:40my assessment of why that
  • 19:41number is so low.
  • 19:45As I mentioned before, one
  • 19:46of the the powerful parts
  • 19:48of the generalizability
  • 19:50theory analysis is that it
  • 19:52can,
  • 19:53lead to what's called a
  • 19:54decision study. And decision study
  • 19:56allows us to predict
  • 19:59the
  • 20:00the reliability
  • 20:01of the assessments
  • 20:02for varying levels of effect
  • 20:04or or facets. And so
  • 20:05in this hypothetical,
  • 20:08dataset here,
  • 20:10we can say, how much
  • 20:12does the reliability
  • 20:15estimate change if we keep
  • 20:16the number of raters the
  • 20:18same,
  • 20:19but we increase this is
  • 20:21an OSCE, but we increase
  • 20:22the number of stations within
  • 20:24the OSCE. And we see
  • 20:25that by increasing the number
  • 20:27of stations, you actually get
  • 20:28a a nice jump in
  • 20:30your reliability. And our thresholds
  • 20:32for reliability
  • 20:33here,
  • 20:34for most most clinical
  • 20:37items, you want a reliability
  • 20:38of point seven or point
  • 20:39eight. And so by
  • 20:40increasing the number of stations,
  • 20:41we're able to get the
  • 20:43the these authors were able
  • 20:44to get the reliability up
  • 20:45to over, over point eight.
  • 20:48You might ask the question,
  • 20:49well, what happens if we
  • 20:50increase the number of raters
  • 20:51instead of increasing the number
  • 20:52of stations? Can we improve
  • 20:53our our reliability that way?
  • 20:55And going from two raters
  • 20:57to eight raters really didn't
  • 20:58make a meaningful impact in
  • 21:00reliability. And and so you
  • 21:01can take this and you
  • 21:02can say, alright. Well, if
  • 21:03we're designing an assessment
  • 21:04tool and assessment process, really,
  • 21:06we wanna put our focus
  • 21:08on,
  • 21:09the number of observations or
  • 21:10the number of stations. And
  • 21:11and so that's just an
  • 21:12example of sort of how
  • 21:14decision study can be can
  • 21:15be utilized. Yeah.
  • 21:17About that.
  • 21:18That would suggest to me
  • 21:19that the variability is largely
  • 21:21within a rater rather than across
  • 21:23raters.
  • 21:24Is that correct?
  • 21:29In fact, it doesn't I
  • 21:31mean, you would think it
  • 21:32it there's a lot of
  • 21:33variability among raters. Yeah.
  • 21:35Some are really conservative. Right.
  • 21:38Then you expect
  • 21:39increasing the number of raters
  • 21:41would have a substantial effect
  • 21:42would have an impact. Averaging
  • 21:43of that. Yeah. So I
  • 21:45would agree with you. I
  • 21:45would say that this in
  • 21:47this particular this isn't my
  • 21:48data. This is a hypothetical
  • 21:49dataset
  • 21:50that,
  • 21:51there
  • 21:52probably wasn't a lot of
  • 21:53variability amongst the raters, and
  • 21:54so adding more raters didn't
  • 21:56make a didn't make a
  • 21:57difference in terms of reliability.
  • 21:59Well, but and
  • 22:00is this consistent with the
  • 22:01numbers you showed us before
  • 22:02for the percentage of variability
  • 22:03was attributable to the raters?
  • 22:05No. And so I'll I'll
  • 22:06show you what it looked
  • 22:07like for our data. I've
  • 22:09just this was just this
  • 22:09is just a hypothetical just
  • 22:11to make the point of
  • 22:12what sort of what decision
  • 22:13studies can do if we
  • 22:14if we if we change
  • 22:16the different elements.
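
The hypothetical OSCE example can be mimicked with a small decision-study sketch. It assumes a fully crossed learner by rater by station design and uses the standard relative G coefficient for that design; the variance components in `hypothetical` are invented so that station-related error dominates, and they are not the values behind the slide or the speaker's study.

```python
def g_coefficient(var, n_raters, n_stations):
    """Relative G coefficient for a crossed learner x rater x station design.

    Each error component is divided by the number of conditions it is averaged over.
    """
    if n_raters < 1 or n_stations < 1:
        raise ValueError("need at least one rater and one station")
    relative_error = (
        var["learner_x_rater"] / n_raters
        + var["learner_x_station"] / n_stations
        + var["residual"] / (n_raters * n_stations)
    )
    return var["learner"] / (var["learner"] + relative_error)


# Hypothetical variance components (invented for illustration only).
hypothetical = {
    "learner": 0.30,
    "learner_x_rater": 0.02,
    "learner_x_station": 0.40,
    "residual": 0.30,
}

if __name__ == "__main__":
    for n_raters, n_stations in [(2, 2), (2, 10), (8, 2)]:
        g = g_coefficient(hypothetical, n_raters, n_stations)
        print(f"raters={n_raters}, stations={n_stations}: G={g:.2f}")
```

With components like these, adding stations moves the coefficient much more than adding raters, because the rater-related component was small to begin with. That matches the reasoning in the exchange above: more raters only help when raters account for a meaningful share of the error.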
  • 22:20I just got a text.
  • 22:21Please repeat the question. In
  • 22:23general
  • 22:24Oh, okay.
  • 22:25I think the microphone's not
  • 22:26working. Just when you get
  • 22:28a question, just repeat it
  • 22:30so online people can hear
  • 22:31it. Okay. We'll we'll we'll
  • 22:33do that moving forward.
  • 22:36So this is,
  • 22:37this is this is our
  • 22:39our data.
  • 22:41And this is the final
  • 22:42product
  • 22:43of our data, meaning this
  • 22:45was the,
  • 22:46this is the data that
  • 22:48seemed to improve,
  • 22:50reliability
  • 22:52best.
  • 22:53And
  • 22:54what it what it came
  • 22:55out with, it is really
  • 22:57the number of observations
  • 22:58made the biggest impact on
  • 23:01moving our
  • 23:03reliability,
  • 23:05curve towards,
  • 23:06towards that point eight. We
  • 23:07chose the higher value point
  • 23:09eight rather than point seven
  • 23:10as our as our cutoff.
  • 23:12And what it looked like
  • 23:13is to get our you
  • 23:15know, given the parameters, keeping
  • 23:16everything else stable, and just
  • 23:17changing the number of observations,
  • 23:21getting our observations up to
  • 23:22about ten gives us a
  • 23:25reliability
  • 23:26at a level of point eight
  • 23:27for understanding
  • 23:28where our learners are with
  • 23:30their POCUS competency. Doesn't mean
  • 23:32ten observations and your learner
  • 23:34is competent in POCUS. It
  • 23:36means after ten observations,
  • 23:38I can reliably
  • 23:39understand
  • 23:40what level that they are
  • 23:42at.
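
To build intuition for how the number of observations drives reliability, here is a deliberately simplified single-facet sketch. It assumes the learner's share of variance acts as the reliability of one observation and that all remaining variance is error that averages out over independent observations. This is not the study's decision study, which modeled several facets; the function names are hypothetical.

```python
def projected_reliability(single_obs_reliability, n_observations):
    """Projected reliability of the mean of n observations (single-facet simplification)."""
    p = single_obs_reliability
    if not 0.0 < p < 1.0:
        raise ValueError("single-observation reliability must be between 0 and 1")
    if n_observations < 1:
        raise ValueError("need at least one observation")
    return p / (p + (1.0 - p) / n_observations)


def observations_needed(single_obs_reliability, target=0.8, max_n=1000):
    """Smallest number of observations whose projected reliability reaches the target."""
    for n in range(1, max_n + 1):
        if projected_reliability(single_obs_reliability, n) >= target:
            return n
    return None  # target not reachable within max_n observations


if __name__ == "__main__":
    # 0.277 is the learner variance share quoted in the talk, used here only
    # as a stand-in for single-observation reliability (an assumption).
    for n in (1, 5, 10, 15):
        print(n, round(projected_reliability(0.277, n), 3))
```

Under this crude assumption, the 27.7 percent learner variance quoted earlier projects to just under 0.8 at ten observations, which is in the same neighborhood as the speaker's figure. Treat that as a rough sanity check only, since the real analysis separates rater, syndrome, and interaction effects.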
  • 23:44So that's that's quite useful
  • 23:46for helping us to to
  • 23:47understand sort of sort of
  • 23:48next steps.
  • 23:50The limitations of the of
  • 23:51our of our work, I
  • 23:53think the main limitation, we
  • 23:54did it across three large
  • 23:56academic hospitals. So it was,
  • 23:58it was us, it was
  • 23:59MGH, and it was, OHSU
  • 24:01that were, part of it.
  • 24:03Most of the observations came
  • 24:04from us here at Yale.
  • 24:07I think,
  • 24:08I think that
  • 24:10influences the generalizability
  • 24:11of of what we're what
  • 24:13we're putting out there. How,
  • 24:14you know, how would a
  • 24:15tool like this work at
  • 24:16a smaller program, someplace that
  • 24:18does not have such,
  • 24:19robust, point of care ultrasound,
  • 24:22expertise,
  • 24:24unclear.
  • 24:25And it's mostly in an
  • 24:26inpatient setting. How does this
  • 24:27translate to an outpatient setting,
  • 24:29where ultrasound is also being
  • 24:31used, also unclear.
  • 24:34So conclusions and and next
  • 24:36steps,
  • 24:37the,
  • 24:38you know, the within the
  • 24:40study, we were able to
  • 24:41generate validity,
  • 24:43and feasibility
  • 24:44evidence to support,
  • 24:45what is a a a
  • 24:47very novel,
  • 24:48approach to looking at point
  • 24:49of care ultrasound,
  • 24:51competency.
  • 24:52We
  • 24:53need to put more time,
  • 24:54I think, into rater training,
  • 24:56to make sure that raters
  • 24:58are being consistent in their
  • 25:00in their assessments,
  • 25:01of of the learners, which
  • 25:03probably means,
  • 25:04both
  • 25:05reorienting them to EPAs and
  • 25:07making sure they feel comfortable
  • 25:08with that and and probably
  • 25:10doing some calibration training to
  • 25:12make sure that my level
  • 25:13three is the same as
  • 25:14your level three, etcetera.
  • 25:17When you find the outlier
  • 25:18using these data, can you
  • 25:19find the people that's
  • 25:21find the raters who are
  • 25:22giving everybody Yeah. Who yeah.
  • 25:24We probably can. Yeah. We
  • 25:26probably could probably jump in
  • 25:27and and figure out, like,
  • 25:28who, like, pinpoint who who
  • 25:29really who really needs the
  • 25:31help.
  • 25:32But I guess it's, you
  • 25:33know, it's it's challenging because
  • 25:34you always gotta say, like,
  • 25:35what's your like, who's the
  • 25:36standard, I guess, that you
  • 25:37would compare to. So,
  • 25:40maybe it's me. Maybe me.
  • 25:41I'm too lenient or too
  • 25:42strict. I don't know. So
  • 25:44it'd be interesting to to
  • 25:45think about. That might be
  • 25:45another another study that we
  • 25:47look at. We'll group the
  • 25:48standard. I mean, basically, what
  • 25:50you you do is predict
  • 25:52the score based on the
  • 25:53rater identification.
  • 25:54Interesting.
  • 25:55People who have higher than
  • 25:57average scores, you can do
  • 25:58that. People who have lower
  • 25:59than average scores, you can
  • 26:00do that. Yeah. Nice. So
  • 26:02the the the question for
  • 26:03the, for the Zoom room,
  • 26:06was about using the data
  • 26:08to,
  • 26:09to predict who who are
  • 26:10the more lenient or the
  • 26:11more strict,
  • 26:12raters.
  • 26:14And, Dr. Justice was just
  • 26:16giving some some tips on
  • 26:17how we might, might design
  • 26:18that.
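
A very simple version of the suggestion from the audience is to compare each rater's average score with the overall average. The sketch below does only that; it does not adjust for which learners each rater happened to see, so it is a first look rather than a fair comparison. The function name `rater_offsets` and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rater_offsets(assessments):
    """Return each rater's mean score minus the grand mean.

    assessments: iterable of (rater_id, score) pairs.
    Positive offsets suggest a more lenient rater, negative a stricter one.
    """
    records = list(assessments)
    if not records:
        raise ValueError("no assessments given")
    grand_mean = mean(score for _, score in records)

    scores_by_rater = defaultdict(list)
    for rater, score in records:
        scores_by_rater[rater].append(score)

    return {rater: mean(scores) - grand_mean for rater, scores in scores_by_rater.items()}


# Hypothetical records on a 1 to 4 entrustment scale (made up for illustration).
sample = [("A", 4), ("A", 4), ("B", 2), ("B", 2), ("C", 3), ("C", 3)]

if __name__ == "__main__":
    for rater, offset in sorted(rater_offsets(sample).items()):
        print(f"rater {rater}: {offset:+.2f}")
```

A model that predicts the score from both rater and learner would separate rater leniency from learner ability, which speaks to the speaker's concern about who counts as the standard.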
  • 26:21I think one of the
  • 26:22things that that I'm interested
  • 26:23in thinking about is, you
  • 26:24know, particularly as we're working
  • 26:25on this privileging process at
  • 26:27the hospital for point of
  • 26:28care ultrasound is is thinking
  • 26:30about using this as a
  • 26:31tool, for more summative level
  • 26:33decision making,
  • 26:35around,
  • 26:36around the privileging,
  • 26:37process,
  • 26:38here at here at Yale.
  • 26:43Some
  • 26:44thank you. So,
  • 26:46Janet and John, I did
  • 26:47this work as part of
  • 26:48my master's of health
  • 26:50science, Donna in the department,
  • 26:52for making the grant
  • 26:54available.
  • 26:56David and Jeanette, just
  • 26:58master
  • 26:59mentors and really encouraging
  • 27:02and
  • 27:02facilitating my interaction
  • 27:05with Haidong Liu, which is
  • 27:06really what made this
  • 27:07project move forward,
  • 27:10and then, the team of,
  • 27:12of researchers
  • 27:13that I was able to
  • 27:14work with.
  • 27:15Alright. That's it. Matt.
  • 27:17Yeah. That's really great. Thank
  • 27:18you.
  • 27:20So I had a couple
  • 27:21questions. One was,
  • 27:23are there other at the
  • 27:24hospital level, in terms of
  • 27:25privileging,
  • 27:27is there anything analogous to
  • 27:28this sort of
  • 27:30level of really assessing like,
  • 27:32a heart transplant is probably
  • 27:33more competency assessment than a
  • 27:35heart transplant,
  • 27:37for surgeon, I would think.
  • 27:39Yeah. So and ask question
  • 27:41one was, is is there
  • 27:42something comparable to this,
  • 27:44type of assessment in in
  • 27:46other areas of privileging?
  • 27:48And then did you have a
  • 27:49second question too? Second one
  • 27:51was, I know it's very
  • 27:52different, but you was there
  • 27:53anything useful
  • 27:54in the radiology world
  • 27:57in terms of how competency
  • 27:59is assessed
  • 28:00for either technicians or ultrasonographer
  • 28:03radiologists?
  • 28:04Yeah. And then the second
  • 28:05question was, is there
  • 28:06anything comparable in the
  • 28:07radiology world?
  • 28:09So the first question,
  • 28:11there's nothing comparable that I'm
  • 28:12aware of within privileging.
  • 28:15If you're, you know, for
  • 28:16example,
  • 28:18privileging for,
  • 28:19you know,
  • 28:20if you're a heart surgeon
  • 28:21to do a heart transplant
  • 28:22is really kind of number
  • 28:23of cases that you've done
  • 28:25in graduating from a, you
  • 28:26know, an accredited program or
  • 28:28you did your fellowship in
  • 28:30x, y,
  • 28:31or z.
  • 28:34A lot of training works
  • 28:35that way. I think it's
  • 28:36probably more comparable to sort
  • 28:38of how we
  • 28:40we privilege around procedures where
  • 28:42it's like you have to
  • 28:43do, you know, five
  • 28:45central lines, and then
  • 28:46you're magically competent in
  • 28:49that,
  • 28:50which which creates a real
  • 28:51problem. So, you know, a
  • 28:52lot of hospital systems
  • 28:54use, like, a number based
  • 28:56algorithm for deciding who's privileged
  • 28:57or not. So you've done
  • 28:59fifty cardiac studies. Now you're
  • 29:01privileged. But the number
  • 29:04definitely does not tell the
  • 29:05story.
  • 29:06I work with trainees.
  • 29:08Some have done fifty cardiac
  • 29:09studies, and they're great. And
  • 29:11I work with others that
  • 29:12have done fifty, and they
  • 29:13really still stink. And so
  • 29:14the number
  • 29:15but there's always a feasibility
  • 29:16element, you know, for the,
  • 29:17you know, like, the credentialing
  • 29:19committee where where they have
  • 29:20to say, like,
  • 29:22you know, if it gets
  • 29:22too complicated,
  • 29:23it it it gets unmanageable
  • 29:25for them to do. So
  • 29:26numbers make it very simple.
  • 29:27Alright? I I can check
  • 29:28the box. They've done x
  • 29:30number and therefore you're you're
  • 29:31privileged,
  • 29:33which may work for
  • 29:34privileging. I think if
  • 29:36we want a true assessment
  • 29:37of competency, though, we
  • 29:39have to take a more
  • 29:40holistic
  • 29:41way of, of looking at
  • 29:42that. And then,
  • 29:45nothing
  • 29:46from the radiology
  • 29:47world.
  • 29:49I think also because
  • 29:51you finish your residency
  • 29:53in radiology and
  • 29:55then you are privileged
  • 29:56to be a radiologist.
  • 29:57And so I don't
  • 29:59know that they're necessarily
  • 30:00faced with this
  • 30:02problem.
  • 30:03And they have a whole
  • 30:04residency
  • 30:05to to learn this stuff,
  • 30:06whereas we're trying to say
  • 30:07how quickly can I get
  • 30:08somebody from, you know, being
  • 30:10a novice to an expert
  • 30:11so they can start using
  • 30:12this in clinical practice?
  • 30:15I'll be mindful of the
  • 30:16fact that you mentioned you
  • 30:17had
  • 30:19an obligation. So,
  • 30:21Thank you.
  • 30:22If you have questions, follow-up
  • 30:23that. I'm gonna
  • 30:25maybe I should tell people.
  • 30:26Oh, great. Yeah. So,
  • 30:28if there are more more
  • 30:29questions,
  • 30:31please feel free to to
  • 30:32email me.
  • 30:33Happy to to answer things
  • 30:35over over email as well.
  • 30:36I will. Great. Thanks.
  • 30:42Great testaments to finding the
  • 30:44interest.