
BIS Seminar: Dealing with observed and unobserved effect moderators when estimating population average treatment effects

September 22, 2020
  • 00:00- Maybe one or two minutes and then,
  • 00:02I'll have you introduced.
  • 00:03- And it's about, and so I...
  • 00:05And it's gonna be more fun for me if it's a little
  • 00:07interactive, as much as we can make it.
  • 00:09So I won't be able to see all of you nodding and whatnot,
  • 00:12but please feel free to jump in.
  • 00:15And the talk's gonna be pretty non-technical.
  • 00:17My goal is mostly to sort of help
  • 00:19convey some of the concepts and ideas and so I will.
  • 00:23Hopefully it will be a reasonable topic to do via Zoom.
  • 00:30Great, so I think,
  • 00:33Frank basically gave this stuff that's relevant
  • 00:36on this slide.
  • 00:37I do also wanna apologize, those of you guys
  • 00:39who I was supposed to meet with this morning, we have a...
  • 00:41My husband broke his collarbone over the weekend.
  • 00:44So I've had to cancel things this morning,
  • 00:47but I'm glad I'm able to still do this seminar,
  • 00:51I didn't wanna,
  • 00:52have to cancel that.
  • 00:54So again,
  • 00:56the topic is gonna be sort of this idea of external
  • 00:59validity, which I think is a topic that people often
  • 01:01are interested in because it's the sort of thing
  • 01:04that we often think sort of qualitatively about,
  • 01:06but there hasn't been a lot of work thinking about it
  • 01:08quantitatively.
  • 01:09So again, my goal today will be to sort of help
  • 01:11give a framework for thinking about external validity
  • 01:15in sort of a more formal way.
  • 01:19So let's start out with the sorts of questions
  • 01:22that might be relevant when you're thinking about
  • 01:25external validity.
  • 01:27So it might be research questions like a health insurer
  • 01:30is deciding whether or not to approve some new treatment
  • 01:34for back pain.
  • 00:36They might be interested in predicting overall population
  • 01:39impacts of a broad public health media campaign.
  • 01:43A physician practice might be deciding whether training
  • 01:46providers in a new intervention would actually be cost
  • 01:49effective given the patient population that they have.
  • 01:53And then I felt like I needed to get some COVID
  • 01:55example in...
  • 01:57But, for example, a healthcare system,
  • 01:59might wanna know whether it's sort of giving convalescent
  • 02:02plasma to all of the individuals recently diagnosed
  • 02:06with COVID-19 in their system, whether that would
  • 02:08sort of lead to better outcomes overall.
  • 02:12So all of these...
  • 02:15What I'm distinguishing here or sort of trying to convey
  • 02:17is that all of these reflect what I will call a population
  • 02:20average treatment effect.
  • 02:22So across some well-defined population,
  • 02:25does some intervention work sort of on average.
  • 02:28The population might be pretty narrow.
  • 02:30Again, it might be the patients in one particular
  • 02:33physician practice, or might be quite broad.
  • 02:35It could be everyone in the State of Connecticut
  • 02:38or in the entire country.
  • 02:40But either way, it's a well-defined kind of population
  • 02:44and we'll come back to that.
  • 02:46What's really important,
  • 02:48and this will sort of underlie much of the talk
  • 02:50is that kind of the whole point is that there might
  • 02:52be underlying treatment effect heterogeneity.
  • 02:55So there might be some individuals
  • 02:57for whom this treatment of interest is actually
  • 02:59more effective than others.
  • 03:01But what I wanna be clear about, is the goal of inference
  • 03:04that I'm talking about today, is gonna be about
  • 03:07this overall population average.
  • 03:09So we're not trying to say like which people
  • 03:11are gonna benefit more or sort of to which people
  • 03:14should we give this treatment.
  • 03:16It's really more a question of sort of more population
  • 03:20level decisions, sort of if we have...
  • 03:22If we're making a decision, that's sort of a policy
  • 03:24kind of population level,
  • 03:25on average is this gonna be something that makes sense.
  • 03:28So I hope that distinction makes sense.
  • 03:30I'm happy to come back to that.
  • 03:35So again until I don't know, five or,
  • 03:38well maybe now more than 10 years ago,
  • 03:41there had been relatively little attention
  • 03:43to the question of how well results from
  • 03:46kind of well-designed studies like a randomized trial
  • 03:50might carry over to a relevant target population.
  • 03:53I think in much of statistics as well as fields
  • 03:56like education research, public policy, even healthcare,
  • 04:00there's really been a focus on randomized trials
  • 04:03and getting internal validity,
  • 04:05and I'll formalize this in a minute.
  • 04:07But in the past 10 or so years, there's been more and more
  • 04:10interest in this idea of how well can we take the results
  • 04:13from a particular study and then project them
  • 04:17to well-defined target population.
  • 04:20And again, so today I'm gonna try to give
  • 04:21sort of an overview of the thinking in this area,
  • 04:24along with some of the limitations and in particular,
  • 04:27the data limitations that we have in thinking about this.
  • 04:33One thing I do wanna be clear about is there's a lot
  • 04:36of reasons why results from randomized trials
  • 04:38might not generalize.
  • 04:40There's some classic examples in education
  • 04:42where there are scale-up problems.
  • 04:44The classic example is one I'm looking at,
  • 04:50class size.
  • 04:51And so, in Tennessee, they randomly assigned kids
  • 04:54to be in smaller versus larger classes
  • 04:57and found quite large effects of smaller classes.
  • 05:00But then, when the State of California tried to implement
  • 05:03this, the problem is that you need a lot more teachers
  • 05:06to kind of roll that out statewide.
  • 05:08And so, it led actually to a different pool of teachers
  • 05:11being hired.
  • 05:12And so, there's sort of scale-up problems
  • 05:14sometimes with the interventions and that might lead
  • 05:16to different contexts or different implementation.
  • 05:19Today, what I'm gonna be focusing on are differences
  • 05:21between a sample and a population.
  • 05:25That is, differences in sort of baseline characteristics,
  • 05:28that moderate treatment effects.
  • 05:29And again, I'll formalize this a little bit as we go along.
  • 05:33Just as a little bit of an aside,
  • 05:34but in case some of you know this field a little bit,
  • 05:37just to give you a little, just...
  • 05:39I wanna flag this.
  • 05:40Some people might use the term transportability.
  • 05:43So some of the literature in this field uses the term
  • 05:46transportability.
  • 05:47I tend to use generalizability.
  • 05:50There's some subtle differences between the two,
  • 05:52which we can come back to, but for all intents and purposes,
  • 05:55you basically can think of them interchangeably
  • 05:59for now.
  • 06:00I also wanna note, if any of you kind of come
  • 06:02from like a survey world, these debates about
  • 06:06kind of how well a particular sample reflects a target
  • 06:09population are exactly, not exactly the same,
  • 06:12but very similar to the debates happening in the survey
  • 06:15world around non-probability samples and sort of concerns
  • 06:19about,
  • 06:21the use of like say online surveys and things that might not
  • 06:25have a true formal sort of survey sampling design,
  • 06:28and sort of some of the concerns that arise about
  • 06:31generalizability.
  • 06:32So there's this whole parallel literature in the survey
  • 06:34world.
  • 06:35Andrew Mercer has a nice summary of that.
  • 06:37Again, I'm happy to talk more about that.
  • 06:41Okay, any questions before I keep going?
  • 06:49Okay.
  • 06:49So let me formalize kind of what we're talking about
  • 06:52a little bit.
  • 06:53This is...
  • 06:55This framework is now, 12 years old.
  • 06:59Time goes quickly.
  • 07:01But this is just to formalize what we're interested in.
  • 07:05The goal is to estimate, again, this what I'll call
  • 07:07a population average treatment effect or PATE.
  • 07:10And so here,
  • 07:12hopefully you're familiar with sort of potential outcomes
  • 07:14and causal inference.
  • 07:16But the idea is that we have some well-defined population
  • 07:19of size N.
  • 07:20And Y(1) are the potential outcomes if people
  • 07:24in that population receive the treatment condition
  • 07:28of interest.
  • 07:29Y(0) are the outcomes if they receive the control
  • 07:32or comparison condition of interest.
  • 07:34So here, we're just saying we're interested
  • 07:35in the average effect, basically sort of the difference
  • 07:40in potential outcomes, average across the population.
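To put that estimand in symbols (a sketch in the potential-outcomes notation just described, not necessarily the slide's exact notation):

```latex
% Population average treatment effect (PATE) over a well-defined
% target population of size N:
\mathrm{PATE} \;=\; \frac{1}{N}\sum_{i=1}^{N}\bigl(Y_i(1) - Y_i(0)\bigr)
```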
  • 07:46We could be doing this with risk ratios
  • 07:49or odds ratios or something.
  • 07:51Those are a little more complicated because the math
  • 07:53doesn't work as nicely.
  • 07:55So for now think about it more like risk differences
  • 07:57or something, if you have a binary outcome,
  • 08:00the same fundamental points hold.
  • 08:03So I'm not gonna tell you right now where
  • 08:05the data we have came from, but imagine that we just
  • 08:08have a simple estimate of this PATE,
  • 08:11as the difference in means of some outcome
  • 08:14between an observed treated group and an observed
  • 08:16control group.
  • 08:17So again, we see that there's a bunch of people
  • 08:20who got treated, a bunch of people who got control,
  • 08:22and we might estimate this PATE as just the simple
  • 08:25difference in means between again, the treatment group
  • 08:28and the control group.
  • 08:29So what I wanna talk through for the next couple of minutes,
  • 08:32is the bias in this sort of naive estimate of the PATE.
  • 08:36So we'll call that Delta.
  • 08:38So I'm being a little loose with notation here,
  • 08:40but for the bias, essentially,
  • 08:43think of it as sort of the difference between
  • 08:45the true population effect and our naive estimate of it.
  • 08:49And what this paper did with Gary King and Kosuke Imai,
  • 08:54was to sort of lay out how different choices of study designs
  • 08:58impact the size of this bias.
  • 09:01And in particular, we showed that sort of under
  • 09:03some simplifying assumptions,
  • 09:05for sort of mathematical simplicity,
  • 09:07you can decompose that overall bias into four pieces.
  • 09:11So the two Delta S terms are what are called,
  • 09:15what we call sample selection bias.
  • 09:17So basically, the bias that comes in if our data sample
  • 09:22is not representative of the target population
  • 09:25that we care about.
  • 09:27The Delta T terms are our typical sort of confounding bias.
  • 09:31So bias that comes in if our treatment group is dissimilar
  • 09:36from our control group.
  • 09:38The X refers to the variables we observe,
  • 09:40and the U refers to variables that we don't observe.
  • 09:45So what we then did in the paper,
  • 09:46and this is sort of what motivates a lot of this work
  • 09:49is to think through these, again, the trade offs
  • 09:51in these different designs.
  • 09:53And essentially what we're trying to sort of point out
  • 09:56is that...
  • 09:59Let's go to the second row of this table first actually,
  • 10:01a typical experiment.
  • 10:02So a typical experiment, I would say is one where
  • 10:06we kind of take whoever comes in the door,
  • 10:08we kind of try to recruit people for a randomized trial,
  • 10:11whether that's schools or patients or whatever it is.
  • 10:16And we randomized them to treatment and control groups.
  • 10:19So that is our typical randomized experiment.
  • 10:22The treatment selection bias in that case is zero.
  • 10:26In expectation, that's why we like randomized experiments.
  • 10:29In expectation, there is no confounding
  • 10:32and we get an unbiased treatment effect estimate
  • 10:34for the sample at hand.
  • 10:37The problem for population inference
  • 10:40is that the Delta S terms might be big,
  • 10:43because the people that agree to be in a randomized trial,
  • 10:46might be quite different from the overall population
  • 10:49that we care about.
  • 10:51So in this paper, we're trying to just sort of...
  • 10:53In some ways, be a little provocative and point this out
  • 10:56that our standard thinking about study designs
  • 10:59and sort of our prioritization of randomized trials,
  • 11:03implicitly prioritizes internal validity over external
  • 11:07validity.
  • 11:08And in particular, if we really care about
  • 11:12population effects, we really should be thinking about
  • 11:15these together and trying to sort of have small
  • 11:18sample selection bias and small treatment selection bias.
  • 11:22So an ideal experiment would be one where we can randomly
  • 11:25select people for our trial.
  • 11:28Let's say we have...
  • 11:30Well, actually, I'll come back to that in a second.
  • 11:31Randomly select people for our trial and then randomly
  • 11:34assign people to treatment or control groups.
  • 11:37And in expectation, we will have zero bias in our population
  • 11:41effect estimate.
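A compact way to see the trade-off being set up here (my summary of the logic, not the slide's actual table):

```latex
% In expectation, under the four-part decomposition above:
\text{random sampling from the population} \;\Rightarrow\; \Delta_{S_X} = \Delta_{S_U} = 0
\qquad
\text{random treatment assignment} \;\Rightarrow\; \Delta_{T_X} = \Delta_{T_U} = 0
% A typical experiment randomizes assignment but not selection, so the
% \Delta_T terms vanish while the \Delta_S terms can remain large; a
% well-done non-experimental study in representative data faces the
% reverse problem.
```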
  • 11:42But these other designs, and again,
  • 11:44like a typical experiment might end up having larger bias
  • 11:47overall, than a well designed non-experimental study,
  • 11:51where if we do a really good job like adjusting
  • 11:54for confounders,
  • 11:55it may be that well done non-experimental study
  • 11:59conducted using say the electronic health records
  • 12:02from a healthcare system might actually give us lower bias
  • 12:06for a population effect estimate
  • 12:08than does a non-representative small randomized trial.
  • 12:12Again, a little provocative,
  • 12:13but I think useful to be thinking about what is really our
  • 12:17target of inference and how do we get data that is most
  • 12:19relevant for that.
  • 12:22I will also just as a small aside,
  • 12:24maybe a little on the personal side,
  • 12:26but it's been striking to me in the past two days.
  • 12:28So my husband broke his collarbone over the weekend.
  • 12:31And it turns out the break is one where there's a little bit
  • 12:35of debate about whether you should have surgery or not.
  • 12:38Although kind of recent thinking is that
  • 12:39there should be surgery.
  • 12:40And I was doing a PubMed search as a good statistician
  • 12:44public health person whose family member
  • 12:47needs medical treatment.
  • 12:49And I found all these randomized trials that actually
  • 12:52randomized people to get surgery or not.
  • 12:55And then I came home...
  • 12:56Oh, no, I didn't come home, we were home all the time.
  • 12:59I asked my husband later, I was like,
  • 13:00would you ever agree to be randomized?
  • 13:02Like right now, we are trying to make this decision about,
  • 13:05should you have surgery or not.
  • 13:07And would we ever agree to be randomized?
  • 13:09And he's like, no, we wouldn't.
  • 13:11We're gonna go with what the physician recommends
  • 13:15and what we feel is comfortable.
  • 13:16And it really just hit home for me at this point that
  • 13:19the people who agree to be randomized or the context
  • 13:22under which we can sort of randomize
  • 13:26are sometimes fairly limited.
  • 13:28And again, so partly what this body of research is trying
  • 13:31to do is sort of think through what are the implications
  • 13:33of that when we do wanna make population inferences.
  • 13:38Make sense so far?
  • 13:39I can't see faces, so hopefully.
  • 13:43Okay.
  • 13:47So,
  • 13:48I will say a lot of my work in this area has actually,
  • 13:50in part been just helping or trying to raise awareness
  • 13:53of thinking about external validity bias.
  • 13:56So some of the research in this area has been trying
  • 14:00to understand how big of a problem is this.
  • 14:03If maybe people don't agree to be in randomized trials
  • 14:06very often,
  • 14:07but maybe that doesn't really cause bias in terms
  • 14:10of our population effect estimates.
  • 14:12So what I've done in a couple of papers,
  • 14:15cited on this slide, is basically try to formalize
  • 14:18this and it's pretty intuitive, but basically we show,
  • 14:22and I'm not showing you the formulas here.
  • 14:24But intuitively, there will be bias in a population effect
  • 14:28estimate essentially if participation in the trial
  • 14:33is associated with the size of the impacts.
  • 14:35So in particular,
  • 14:38what I'll call the external validity bias.
  • 14:39So,
  • 14:40those Delta S terms kind of the bias
  • 14:42due to the lack of representativeness
  • 14:45is a function of the variation of the probabilities
  • 14:48of participating in a trial,
  • 14:50variation and treatment effects,
  • 14:52and then the correlation between those things.
  • 14:54So if constant...
  • 14:56If we have constant treatment effects
  • 14:58or the treatment effect is zero
  • 14:59or is two for everyone, there's gonna be no external
  • 15:02validity bias.
  • 15:03It doesn't matter who is in our study.
  • 15:06Or if there...
  • 15:08If everyone has an equal probability of participating
  • 15:10in the study, we really do have a nice random selection,
  • 15:14then again, there's gonna be no external validity bias.
  • 15:17Or if the factors that influence whether or not you
  • 15:20participate in the study are independent of the factors
  • 15:23that moderate treatment effects,
  • 15:25again, there'll be no external validity bias.
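One stylized way to write the dependence just described (my shorthand under the simplification that each unit i has a trial-participation probability p_i and an individual treatment effect tau_i; the papers cited on the slide give the formal versions):

```latex
% Sketch of the external validity (sample selection) bias:
\Delta_S \;\approx\; \frac{\mathrm{Cov}(p_i,\tau_i)}{\bar{p}}
        \;=\; \frac{\rho(p,\tau)\,\sigma_p\,\sigma_\tau}{\bar{p}}
% This is zero if effects are constant (\sigma_\tau = 0), if everyone is
% equally likely to participate (\sigma_p = 0), or if participation and
% effect size are unrelated (\rho = 0), matching the three cases above.
```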
  • 15:29The problem is that we often have very limited information
  • 15:32about these pieces.
  • 15:34We, as a field, I think medicine, public health, education,
  • 15:38all the fields I worked in, there has not been much
  • 15:41attention paid to these processes of how we actually
  • 15:44enroll people in studies.
  • 15:46And so it's hard to know kind of what factors relate
  • 15:49to those and if those then also moderate treatment effects.
  • 15:53(phone ringing)
  • 15:54Oops, sorry.
  • 15:55Incoming phone call, which I will ignore.
  • 15:58So,
  • 15:59there has been...
  • 16:01Sorry.
  • 16:03There has been a little bit of work trying to document this
  • 16:05in real data and find empirical evidence on these sizes.
  • 16:11The problem, and sorry, some of the...
  • 16:13Some of you might...
  • 16:14If any of you are familiar with, like,
  • 16:16what's called the within-study comparison
  • 16:18literature.
  • 16:19So there's this whole literature on non-experimental studies
  • 16:23that sort of try to estimate the bias due to non-random
  • 16:28treatment assignment.
  • 16:30This is sort of analogous to that.
  • 16:32But the problem here is that what you need is you need
  • 16:34an accurate estimate of the impact in the population.
  • 16:37And then you also need sort of estimates of the impact
  • 16:40in samples that are sort of obtained in kind of typical
  • 16:44ways.
  • 16:45So that's actually really hard to do.
  • 16:47So I'll just briefly talk through two examples.
  • 16:49And if any of you have data examples that you think might
  • 16:52sort of be useful for generating evidence,
  • 16:55that would be incredibly useful.
  • 16:57So one of the examples is...
  • 17:00So let me back up for a second.
  • 17:02In the field of mental health research,
  • 17:03there's been a push recently, or actually not so much
  • 17:06recently in the past, like 10, 15 years
  • 17:08to do what I call or what are called pragmatic trials
  • 17:12with the idea of enrolling much more...
  • 17:16A much broader set of people, using a broader set of practices
  • 17:21or locations around the country.
  • 17:23And so what this Wisniewski et al people did was they took
  • 17:27the data from one of those large pragmatic trials.
  • 17:29And the idea they...
  • 17:30Again, the idea was that it should be more representative
  • 17:33of people in this case with depression
  • 17:35across the U.S.
  • 17:37And then, they said, well, what if,
  • 17:38in fact, we didn't have that?
  • 17:40What if we used sort of our normal study inclusion
  • 17:44and exclusion criteria? That is, they subset
  • 17:47this pragmatic trial data to the people that we think
  • 17:50would have been more typically included in a sort of more
  • 17:53standard randomized trial.
  • 17:55And sort of not surprisingly, they found that
  • 17:58the people in the sort of what they call
  • 17:59the efficacy sample, that sort of typical trial sample,
  • 18:03had better outcomes and larger treatment effects
  • 18:05than the overall pragmatic trial sample as a whole.
  • 18:10We did something similar sort of in education research where
  • 18:15it's a little bit in the weeds.
  • 18:16I don't really wanna get into the details,
  • 18:18but we essentially had a pretty reasonable regression
  • 18:22discontinuity design.
  • 18:23So we were able to get estimates of the effects of this
  • 18:26reading first intervention across a number of states.
  • 18:30And we then compared those state wide impact estimates
  • 18:34to the estimates you would get if we enrolled only
  • 18:38the sorts of schools and school districts that are typically
  • 18:41included in educational evaluations.
  • 18:44And there we found that this external validity bias
  • 18:48was about 0.1 standard deviations,
  • 18:50which in education world is fairly large.
  • 18:53Certainly people would be concerned about an internal
  • 18:56validity bias of that size.
  • 18:58So we were able to sort of use this to say, look,
  • 19:00if we really wanna be serious about external validity,
  • 19:03it might be as much of a problem as sort of typical internal
  • 19:06validity bias that people care about in that field.
  • 19:13So again, the problem though, is we don't usually
  • 19:15have these sorts of designs where we have a population
  • 19:17effect estimate, and then sample estimates,
  • 19:19and we can compare them.
  • 19:21And so instead we can sometimes try to get evidence on sort
  • 19:24of the pieces.
  • 19:25So, but again, we basically often have very little
  • 19:28information on why people end up participating in trials.
  • 19:31And we also are having,
  • 19:34I think there's growing numbers of methods,
  • 19:36but there's still limited information on treatment effect
  • 19:39heterogeneity.
  • 19:40Individual randomized trials are almost never powered
  • 19:43to detect subgroup effects.
  • 19:45Although, there is really growing research in this field
  • 19:48and that is maybe a topic for another day.
  • 19:52Okay.
  • 19:53But again, there is a little...
  • 19:55I think I'll go through this really quickly, but,
  • 19:58I will give credit to some fields which are trying to better
  • 20:01understand kind of who are the people that enroll in trials
  • 20:04and how they compare to policy populations of interest.
  • 20:08So a lot of that has been done in sort of the substance
  • 20:11use field.
  • 20:12And you can see a bunch of citations here
  • 20:14documenting that people who participate in randomized trials
  • 20:18of substance use treatment do actually differ quite
  • 20:22substantially from people seeking treatment for substance
  • 20:25use problems more generally.
  • 20:27So for example, the Okuda reference the eligibility criteria
  • 20:32in cannabis treatment RCTs would exclude about 80%
  • 20:36of patients across the U.S. seeking treatment
  • 20:38for cannabis use.
  • 20:40And so again, it's sort of there's indications
  • 20:43that the people that participate in trials
  • 20:45are not necessarily reflective of the people
  • 20:48for whom decisions are having to be made.
  • 20:54Okay, so hopefully that at least kind of gives some
  • 20:57motivation for why we want to think more carefully
  • 21:01about the population average treatment effect
  • 21:04and why we might wanna think about designing studies
  • 21:06or analyzing data in ways that help us estimate that.
  • 21:10Any questions before I move to, how do we do that?
  • 21:19Okay.
  • 21:20I will end...
  • 21:21I'm gonna hopefully end it at about 12:45, 12:50,
  • 21:24so we'll have time at the end, too.
  • 21:27So, as a statistician, I feel obligated to say,
  • 21:31and actually I have a quote on this at the very end
  • 21:32of the talk.
  • 21:33If we wanna be serious about estimating something,
  • 21:36it's better to incorporate that through the design
  • 21:38of our study, rather than trying to do it post talk
  • 21:41at the end.
  • 21:44So let's talk briefly about how we can improve external
  • 21:47validity through study or randomized trial design.
  • 21:52So again,
  • 21:53as I alluded to earlier with the sort of ideal experiment.
  • 21:56An ideal scenario is one where we can randomly sample
  • 21:59from a population and then randomly assign treatment
  • 22:02and control conditions.
  • 22:04Doing this will give us a formally unbiased treatment effect
  • 22:07estimate in the population of interest.
  • 22:10This is wonderful.
  • 22:11I know of about six examples of this type.
  • 22:17Most of the examples I know of are actually a federal
  • 22:19government programs where they are administered through
  • 22:23like centers or sites.
  • 22:25And the federal government was able to mandate participation
  • 22:28in an evaluation.
  • 22:29So classic example is the Head Start Impact Study,
  • 22:33where they were able to randomly select Head Start centers
  • 22:36to participate.
  • 22:37And then within each center,
  • 22:39they randomized kids to be able to get in off the wait list
  • 22:42versus not.
  • 22:44An Upward Bound evaluation had a very similar design.
  • 22:48It's funny, I was...
  • 22:50I gave a talk on this topic at Facebook and I was like,
  • 22:52why is Facebook gonna care about this?
  • 22:54Because you would think at a place like Facebook,
  • 22:56they have their user sample,
  • 22:59they should be able to do randomization within,
  • 23:02like they should be able to pick users randomly
  • 23:04and then do any sort of random assignment they want
  • 23:06within that.
  • 23:07It turns out it's more complicated than that, and so,
  • 23:10they were interested in this topic,
  • 23:12but I think that's another sort of example where people
  • 23:15should be thinking, could we do this?
  • 23:16Like,
  • 23:18in a health system.
  • 23:20I can imagine Geisinger or something implementing something
  • 23:22in their electronic health record where
  • 23:24it's about messaging or something.
  • 23:26And you could imagine actually picking people randomly
  • 23:29to then randomize.
  • 23:31But again, that's pretty rare.
  • 23:33There's an idea that's called purpose of sampling.
  • 23:35And this goes back to like the 1960s or 70s
  • 23:39and the idea is sort of picking subjects purposefully.
  • 23:44So one example here is like maybe we think
  • 23:47that this intervention might look different
  • 23:49or have different effects for large versus small
  • 23:52school districts.
  • 23:53So in our study, we just make an effort to enroll
  • 23:56both large and small districts.
  • 23:59This is sort of nice.
  • 24:00It kind of gives you some variability in the types of people
  • 24:05or subjects in the trial, but, it doesn't have the formal
  • 24:09representativeness and sort of the formal unbiasedness
  • 24:12of the random sampling I just talked about.
  • 24:15And then again, sort of similar is this idea and this push
  • 24:17in many fields towards pragmatic or practical clinical
  • 24:20trials, where the idea is just to sort of try to enroll
  • 24:24like kind of more representative sample
  • 24:27in sort of a hand wavy way like I'm doing now.
  • 24:29So it doesn't have this sort of formal statistical
  • 24:31underpinning, but at least it's trying to make sure
  • 24:35that it's not just patients from the Yale hospital
  • 24:38and the Hopkins hospital and whatever sort of large medical
  • 24:41centers, at least they might be trying to enroll patients
  • 24:45from a broader spectrum across the U.S.
  • 24:49Unfortunately, though, as much as I want to do things
  • 24:53for design often, we're in a case where there's a study
  • 24:56that's already been conducted and we are just
  • 25:00sort of stuck analyzing it.
  • 25:01And we wanna get a sense for how representative
  • 25:04the results might be for a population.
  • 25:09Sometimes people, when I talk about this,
  • 25:10people are like, well, isn't this what meta-analysis does?
  • 25:13Like meta-analysis enables you to combine multiple
  • 25:16randomized trials and come up with sort of an overall
  • 25:20effect estimate.
  • 25:23And my answer to that is sort of yes maybe, or no maybe.
  • 25:26Basically, the challenge with meta-analysis,
  • 25:30is that until recently, no one really had a potential target
  • 25:34population.
  • 25:35It was not very formal about what the target population is.
  • 25:38I think underlying that analysis is generally
  • 25:41sort of a belief that the effects are constant
  • 25:44and we're just trying to pool data.
  • 25:48And it...
  • 25:48And even just like, you can sort of see this,
  • 25:50like if all of the trials sampled the same
  • 25:52non-representative population,
  • 25:54combining them is not going to help you get towards
  • 25:57representativeness.
  • 25:59On that, I have a former postdoc, Hwanhee Hong,
  • 26:01who's now at Duke.
  • 26:03And she has been doing some work to try to bridge
  • 26:06these worlds and sort of really try to think through,
  • 26:08well, how can we better use multiple trials
  • 26:12to get to target population effects?
  • 26:16There's another field called cross-design
  • 26:18synthesis, or research synthesis.
  • 26:21This is sort of neat.
  • 26:22It's one where you kind of combine randomized trial data,
  • 26:26which might be not representative with non-experimental
  • 26:30study data.
  • 26:31So sort of explicitly trading off the internal and external
  • 26:34validity.
  • 26:36I'm not gonna get into the details,
  • 26:37there's some references here.
  • 26:38Ellie Kaizar at Ohio State, is one of the people
  • 26:41that's done a lot of work on this.
  • 26:45And part of the reason I'm not focused on this is that
  • 26:48I work in a lot of areas like education and public health,
  • 26:53sort of social science areas,
  • 26:54where we often don't have multiple studies.
  • 26:56So we often are stuck with just one study and we're trying
  • 27:00to use that to learn about target populations.
  • 27:04So I'm gonna briefly talk about an example
  • 27:07where we were trying to sort of do this.
  • 27:12And basically, the fundamental idea is to re-weight
  • 27:16the study sample to look like the target population.
  • 27:21This idea is related to post stratification
  • 27:25or, oh my gosh, I'm blanking now.
  • 27:27Raking adjustments in surveys.
  • 27:31So post stratification would be sort of at a simple level,
  • 27:33would be something like...
  • 27:35Well, if we know that males and females
  • 27:38have different effects, or let's say young and old
  • 27:41have different effects, let's estimate the effects
  • 27:44separately for young versus old.
  • 27:47And then re-weight those using the population proportions
  • 27:51of sort of young versus old.
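A minimal sketch of that post-stratification calculation in Python (the data, effect sizes, and population proportions here are made up purely for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical trial data: outcome y, treatment indicator t, and one
# categorical moderator (age group).
trial = pd.DataFrame({
    "age_group": ["young"] * 200 + ["old"] * 100,
    "t": rng.integers(0, 2, 300),
    "y": rng.normal(size=300),
})

# Assumed population proportions of the moderator (from, say, census data).
pop_props = {"young": 0.40, "old": 0.60}

# Estimate the effect within each stratum, then combine using the
# population proportions instead of the trial proportions.
pate_hat = 0.0
for group, weight in pop_props.items():
    sub = trial[trial["age_group"] == group]
    effect = sub.loc[sub.t == 1, "y"].mean() - sub.loc[sub.t == 0, "y"].mean()
    pate_hat += weight * effect

print(f"Post-stratified effect estimate: {pate_hat:.3f}")
```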
  • 27:54That sort of stratification doesn't work if you have more
  • 27:58than like one or two categorical effect moderators.
  • 28:02And so,
  • 28:03what I'm gonna show today is an approach where we use
  • 28:06weighting, where we fit a model,
  • 28:08predicting participation in the trial,
  • 28:10and then weight the trial sample to look like the target
  • 28:13population.
  • 28:14So similar idea to things like propensity score weights
  • 28:17or non-response adjustment weights in samples.
  • 28:21There is a different approach,
  • 28:23So what I'm gonna illustrate today is sort of this sample
  • 28:27selection weighting strategy.
  • 28:29You also can tackle this external validity problem
  • 28:32by trying to model the outcome very flexibly
  • 28:35and then project outcomes in the population.
  • 28:40In some work I did with Jennifer Hill and others,
  • 28:43we showed that BART, Bayesian Additive Regression Trees,
  • 28:46can actually work quite well for that purpose.
  • 28:49And more recently, Issa Dahabreh at Brown has done some
  • 28:53nice work sort of bridging these two and showing
  • 28:55basically a doubly robust kind of idea where we can use
  • 28:58both the sample membership model and the outcome model
  • 29:04to have better performance.
  • 29:06But today, I'm gonna just illustrate the weighting approach,
  • 29:08partly because it's a really nice sort of pedagogical
  • 29:11example and helps you kind of see what's going on
  • 29:14in the data.
  • 29:16Okay, any questions before I continue?
  • 29:21Okay.
  • 29:22So the example I'm gonna use is...
  • 29:26There was this, I mean, some of you probably know much more
  • 29:28about HIV treatment than I do, but the ACTG Trial,
  • 29:33which was now quite an old trial,
  • 29:36but it was one of the ones that basically showed that
  • 29:39HAART therapy, highly active antiretroviral therapy
  • 29:42was quite effective at reducing time to AIDS or death
  • 29:46compared to standard combination therapy at the time.
  • 29:49So it randomized about 1200 U.S. HIV positive adults
  • 29:54to treatment versus control.
  • 29:56And the intent-to-treat analysis in the trial
  • 29:59had a hazard ratio of 0.51.
  • 30:01So again, very effective at reducing time to AIDS or death.
  • 30:07So Steve Cole and I though kind of asked the question, well,
  • 30:10we don't necessarily just care about the people
  • 30:13in the trial.
  • 30:14This seems to be a very effective treatment.
  • 30:16What could we use this data to project out
  • 30:19sort of what the effects of the treatment would be
  • 30:22if it were implemented nationwide?
  • 30:25So we got estimates from CDC of the number of people
  • 30:28newly infected with HIV in 2006.
  • 30:32And basically, asked the question sort of if hypothetically,
  • 30:35everyone in that group were able to get HAART versus
  • 30:40standard combination therapy,
  • 30:42what would be the population impacts of this treatment?
  • 30:48In this case, because of sort of data availability,
  • 30:50we only had the joint distribution of age, sex and race
  • 30:55for the population.
  • 30:56So we made sort of a pseudo population, again,
  • 30:59sort of representing the U.S. population
  • 31:02of newly infected people.
  • 31:03But again, all we have is sex, race and age,
  • 31:06which I will come back to.
  • 31:08So this table documents the trial and the population.
  • 31:12So you can see for example,
  • 31:15that the trial tended to have more sort of 30 to 39 year
  • 31:20olds, many fewer people under 30.
  • 31:25The trial had more males and also had more whites
  • 31:29and fewer blacks, Hispanic was similar.
  • 31:32But I wanna flag and we'll come back to this in a minute
  • 31:35that, in what I'm gonna show,
  • 31:38we can adjust for the age, sex, race distribution.
  • 31:41But, there's a real limitation,
  • 31:43which is that the CD4 cell count as sort of a measure
  • 31:46of disease severity is not available in the population.
  • 31:50So this is a potential effect moderator,
  • 31:53which we don't observe in the population.
  • 31:56So in sort of projecting the impacts, we can say, well,
  • 31:59here is the predicted impact given the age, sex,
  • 32:03race distribution, but there's this unobserved
  • 32:06potential effect moderator that we sort of might be worried
  • 32:09about kind of in the back of our heads.
  • 32:15So again, I briefly mentioned this,
  • 32:17this is like the super basic description
  • 32:20of what can be done.
  • 32:22There are more nuances and I have some citations at the end
  • 32:24for sort of more details.
  • 32:26But basically, fundamentally, again,
  • 32:28we sort of think about it as we kind of stack
  • 32:30our data sets together.
  • 32:31So we put our trial sample and our population data set
  • 32:34together.
  • 32:35We have an indicator for whether someone is in the trial
  • 32:38versus the population.
  • 32:40And then, we're gonna weight the trial members
  • 32:43by their inverse probability of being in the trial
  • 32:46as a function of the observed covariates.
  • 32:48And again, very similar intuition and ideas
  • 32:51and theory underlying this as underlying things
  • 32:55like Horvitz-Thompson estimation in sample surveys
  • 32:58and inverse probability of treatment weighting
  • 33:01in non-experimental studies.
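A minimal sketch of that stacking-and-weighting step in Python (the covariates, variable names, and use of scikit-learn's logistic regression are my choices for illustration, not the actual ACTG analysis):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical covariate data: a trial sample and a target-population
# dataset with the same columns.
trial = pd.DataFrame({"age": rng.normal(35, 8, 300),
                      "male": rng.integers(0, 2, 300)})
pop = pd.DataFrame({"age": rng.normal(40, 12, 5000),
                    "male": rng.integers(0, 2, 5000)})

# Stack the datasets with an indicator S for trial membership.
stacked = pd.concat([trial.assign(S=1), pop.assign(S=0)], ignore_index=True)

# Model the probability of being in the trial given covariates.
model = LogisticRegression().fit(stacked[["age", "male"]], stacked["S"])
p_trial = model.predict_proba(trial[["age", "male"]])[:, 1]  # P(S=1 | X)

# Weight trial members by the inverse of that probability so the weighted
# trial sample resembles the target population; a common variant uses the
# inverse odds, (1 - p) / p.
weights = 1.0 / p_trial
```

These weights would then feed into a weighted outcome analysis, for example a weighted difference in means or a weighted survival model, as in the hazard ratio results discussed in a minute.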
  • 33:06So I showed you earlier that age, sex and race
  • 33:09are all related to participation in the trial.
  • 33:13What I'm not showing you the details of,
  • 33:15but just trust me is that those factors also moderate
  • 33:19effects in the trial.
  • 33:20So the trial showed the largest effects for those ages,
  • 33:2430 to 39, males and black individuals.
  • 33:28And so, this is exactly why we might think
  • 33:31that the overall trial estimate might not reflect
  • 33:34what we would see population-wide.
  • 33:39Ironically though, it turns out actually
  • 33:40it kind of all cancels out.
  • 33:41So this table shows the estimated population effects.
  • 33:45So the first row again, is just the sort of naive trial
  • 33:48results.
  • 33:50We can then sort of weight by each characteristic
  • 33:52separately, and then the bottom row is the combined
  • 33:56age, sex, race adjustments.
  • 33:58And you can see sort of actually the hazard ratio
  • 34:01was remarkably similar.
  • 34:03It's partly because like the age weighting
  • 34:05sort of makes the impact smaller,
  • 34:07but then the race weighting makes it bigger.
  • 34:10And so then it kind of just washes out.
  • 34:13But again, it's sort of a nice example,
  • 34:15cause you can sort of see how the patterns
  • 34:17evolve based on the size of the effects
  • 34:20and the sample selection.
  • 34:23I also wanna point out though that, of course,
  • 34:25the confidence interval is wider,
  • 34:27and that is sort of reflecting the fact that we are doing
  • 34:30this extrapolation from the trial sample to the population.
  • 34:33And so there's sort of a variance price we'll pay for that.
  • 34:39Okay.
  • 34:40So I haven't been super formal on the assumptions,
  • 34:44but I alluded to this.
  • 34:45So I wanna just take a few minutes to turn
  • 34:48to what about unobserved moderators?
  • 34:50Because again, we can interpret this 0.57
  • 34:54as the sort of overall population effect estimate
  • 34:58only under an assumption that there are no unobserved
  • 35:01moderators that differ between sample and population,
  • 35:06once we adjust for age, sex, race.
  • 35:11Okay, and in reality,
  • 35:14such unobserved effect moderators are likely the rule,
  • 35:17not the exception.
  • 35:18So again, sort of, as I just said,
  • 35:20the key assumption is that we've basically adjusted
  • 35:23for all of the effect moderators.
  • 35:26Very kind of comparable assumption to the assumption
  • 35:30of no unobserved confounding in a non-experimental study.
  • 35:35And one of the reasons this is an important assumption
  • 35:38to think about, is that, it is quite rare actually
  • 35:42to have extensive covariate data overlap
  • 35:46between the sample and the population.
  • 35:48I have been working in this area for...
  • 35:51How many years now?
  • 35:52At least 10 years.
  • 35:53And I've found time and time again,
  • 35:56across a number of content areas,
  • 35:58that it is quite rare to have a randomized trial sample
  • 36:01and the target population dataset
  • 36:03with very many comparable measures.
  • 36:06So in the Stuart and Rhodes paper,
  • 36:08this was in like an early childhood setting
  • 36:12and each data set, the trial and the population data
  • 36:15had like over 400 variables observed at baseline.
  • 36:19There were literally only seven that were measured
  • 36:22consistently between the two samples.
  • 36:25So essentially we have very limited ability then to adjust
  • 36:28for these factors because they just don't have much overlap.
  • 36:32So that then motivated us to create some sensitivity
  • 36:37analyses to basically probe and say, well,
  • 36:40what if there is an unobserved effect moderator,
  • 36:43how much would that change our population effect estimate?
  • 36:47Again, this is very comparable to sensitivity analysis
  • 36:51for unobserved confounding in non-experimental studies,
  • 36:54sort of adapted for this purpose of trial-to-population
  • 36:59generalizability.
  • 37:03I think I can skip this in the interest of time and not go
  • 37:06through all the details.
  • 37:07If anyone wants the slides by the way,
  • 37:08feel free to email me, I'm happy to send them.
  • 37:13I'm gonna skip this too cause I've already said
  • 37:15sort of the key assumption that is relevant for right now,
  • 37:19but basically what we propose is,
  • 37:24I'm gonna talk about two cases.
  • 37:26So the easier case is this one where we're gonna assume
  • 37:29that the randomized trial observes all of the effect
  • 37:32moderators.
  • 37:33And the issue is that our target population dataset
  • 37:36does not have some moderators observed.
  • 37:41I think this is fairly realistic because I at least
  • 37:43like to think that the people running the randomized trials
  • 37:47have enough scientific knowledge and expertise
  • 37:50that they sort of know what the likely effect moderators
  • 37:52are and that they measure them in the trial.
  • 37:55That is probably not fully realistic, but I'm...
  • 37:58I like to give them sort of the benefit of the doubt
  • 38:00on that.
  • 38:01And sort of that's what the ACTG example
  • 38:05was like: CD4 count would be an example of this,
  • 38:07where we have CD4 count in the trial,
  • 38:11but we just don't have it in the population.
  • 38:14So what we showed is that there's actually,
  • 38:16a couple of different ways you can implement
  • 38:18this sort of sensitivity analysis.
  • 38:22One is essentially kind of an outcome model based one
  • 38:25where you,
  • 38:28basically, we just sort of specify a range
  • 38:30for the unobserved moderator V in the population.
  • 38:34So we kind of say, well, we don't know
  • 38:36the distribution of this moderator in the population,
  • 38:40but we're gonna guess that it's in some range.
  • 38:43And then, we kind of project it out using data from the trial
  • 38:48to understand like the extent of the moderation
  • 38:51due to that variable.
  • 38:53There's another variation on this,
  • 38:55which is sort of the weighting variation
  • 38:58where you kind of adjust the weights,
  • 39:00essentially again for this unobserved moderator.
  • 39:03Again, either way you sort of basically just have to specify
  • 39:07a potential range for this V, the unobserved moderator
  • 39:11in the population.
  • 39:14So here's an example of that.
  • 39:16This is a different example, where we were looking
  • 39:18at the effects of a smoking cessation intervention
  • 39:21among people in substance use treatment.
  • 39:24And in the randomized trial, the mean addiction score
  • 39:31was four.
  • 39:33But we didn't have this addiction score,
  • 39:35in the target population of interest.
  • 39:37And so, what the sensitivity analysis allows us to do
  • 39:40is to say, well, let's imagine that range is anywhere
  • 39:44from three to five.
  • 39:45And how much does that change our population effect
  • 39:49estimates?
  • 39:51Essentially, how steep this line is, is gonna
  • 39:54sort of determine how much it matters.
  • 39:57And the steepness of the line basically
  • 39:59is how much of a moderator is it,
  • 40:02sort of how much effect heterogeneity is there in the trial
  • 40:05as a result of that variable.
  • 40:07But again, this is at least one way to sort of turn
  • 40:11this sort of worry about an unobserved moderator
  • 40:13into a more formal statement about how much
  • 40:16it really might matter.
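A toy version of what that line computes (assuming a linear moderation model estimated in the trial; all numbers and names here are hypothetical, not from the actual study):

```python
import numpy as np

# Assumed linear moderation model from the trial:
#   effect(v) = base_effect + interaction * (v - trial_mean_v)
base_effect = -0.20   # hypothetical effect at the trial mean of V
interaction = 0.08    # hypothetical moderation slope estimated in the trial
trial_mean_v = 4.0    # mean addiction score observed in the trial

# Sweep the assumed population mean of V over a plausible range (3 to 5)
# and see how the projected population effect moves.
for pop_mean_v in np.linspace(3.0, 5.0, 5):
    projected = base_effect + interaction * (pop_mean_v - trial_mean_v)
    print(f"assumed population mean of V = {pop_mean_v:.1f} "
          f"-> projected effect = {projected:+.3f}")
```

The steeper the assumed interaction, the more the projected effect moves across the assumed range, which is exactly the steepness of the line being described.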
  • 40:21I'm not gonna get into this partly,
  • 40:22so you might also be thinking, well,
  • 40:24what if the trial doesn't know what all the moderators are?
  • 40:27And what if there's some fully unobserved moderator
  • 40:31that we'll call U?
  • 40:34This is a much, much harder problem, basically,
  • 40:36if anyone wants to try to dig into it, that would be great.
  • 40:39Part of the reason it's harder is because you have to make
  • 40:42very strong assumptions about the distribution
  • 40:44of the observed covariates and U together.
  • 40:48We put out one approach,
  • 40:49but it is a fairly special case and not very general.
  • 40:53So again, hopefully we're not in this sort of scenario
  • 40:56very often.
  • 41:01This is a little bit of a technicality,
  • 41:03but often epidemiologists ask this question.
  • 41:05So I've laid stuff out again with respect to kind of a risk
  • 41:09difference or a difference in outcomes
  • 41:12and sort of like more of like an additive treatment scale.
  • 41:15There is this real complication that arises,
  • 41:17which is that if you have like a binary outcome,
  • 41:20the scale of the outcome matters in terms of effect
  • 41:25moderation.
  • 41:26And in particular, there might be sort of more apparent
  • 41:30effect heterogeneity on one scale versus another.
  • 41:33So I'm just kind of flagging this, that like this exists,
  • 41:37there are some people sort of looking at this in a more
  • 41:39formal way, but again for now sort of just think about a risk
  • 41:44difference kind of scale.
  • 41:47Okay, great.
  • 41:48So let me just conclude with a few kind of final thoughts.
  • 41:51So, I think all of us, not all of us,
  • 41:54but often we sort of want to assume that study results
  • 41:58generalize.
  • 41:58Often people write a discussion section in a paper,
  • 42:01where they kind of qualitatively have some sentences
  • 42:05about why they do or don't think that the results
  • 42:08in this paper kind of extend to other groups
  • 42:10or other populations.
  • 42:13But I think until the past again, sort of five or so years,
  • 42:16a lot of that discussion was very hand-wavy
  • 42:19and sort of qualitative.
  • 42:21I think that what we are seeing in epidemiology
  • 42:24and statistics and biostatistics
  • 42:26recently has been a push towards having more
  • 42:29ability to quantify this and make sort of more formal
  • 42:33statements.
  • 42:35So I think if we do wanna be serious though,
  • 42:37about assessing and enhancing external validity,
  • 42:41again, we really need these different pieces.
  • 42:43We need information on the factors that influence effect
  • 42:46heterogeneity, the moderators.
  • 42:49We need information on the factors that influence
  • 42:51participation in rigorous studies like randomized trials.
  • 42:55And we need data on all of those things,
  • 42:57in the trial and the population.
  • 43:00And then finally, we need statistical methods that allow us
  • 43:04to use that data to estimate population treatment effects.
  • 43:08I would argue that that last bullet is sort of much further
  • 43:12along than any of the others.
  • 43:13That in my experience,
  • 43:15the limiting factor is usually not the methods.
  • 43:19The limiting factor at this point in time is the data
  • 43:22and sort of the scientific knowledge
  • 43:25about these different factors.
  • 43:29And that's what this slide is.
  • 43:30So I think I've already said this, but that again,
  • 43:33is sort of one of the motivations for the sensitivity
  • 43:35analysis is just a recognition that it's often,
  • 43:39really quite hard to get data that
  • 43:42is consistently measured between a trial and a population.
  • 43:47So on that point, recommendations again,
  • 43:49if we wanna be serious about effect heterogeneity
  • 43:51or about estimating population treatment effects,
  • 43:55we need better information on treatment effect heterogeneity.
  • 43:59That might be better analysis of existing trials,
  • 44:02that might be meta-analysis of existing trials.
  • 44:05That might also be theoretical models for the interventions
  • 44:07to understand what the likely moderators are.
  • 44:12We also need better information on the factors
  • 44:14that influence participation in trials and more discussion
  • 44:17of how trial samples are selected.
  • 44:22We need to standardize measures.
  • 44:23So again, it's incredibly frustrating when you have trial
  • 44:26and population data, but the measures in them are not
  • 44:30consistent.
  • 44:31There are methods that can be used for this,
  • 44:33some data harmonization approaches,
  • 44:36but, they require assumptions.
  • 44:39It's better if we can be thoughtful and strategic about,
  • 44:42for example, common measures across studies.
  • 44:45I will say one of the frustrations too,
  • 44:47is that in some fields like the early childhood data
  • 44:51I talked about,
  • 44:52part of the problem was like the two data sets might
  • 44:55actually have the same measure,
  • 44:56but they didn't give the raw data,
  • 44:58and they, like, standardized scales differently.
  • 45:01Like they standardized them to their own population,
  • 45:03not sort of more generally.
  • 45:05And so they weren't sort of on the same scale in the end.
  • 45:10As a statistician, of course, I will say we do need more
  • 45:12research on the methods and understanding when they work
  • 45:15and when they don't.
  • 45:16There are some pretty strong assumptions
  • 45:19in these approaches.
  • 45:20But again, I think that sort of in some ways,
  • 45:24that is further along than some of the data situations.
  • 45:29So I just wanted to take one minute to flag some current
  • 45:32work, partly in case anyone wants to ask questions about
  • 45:34these.
  • 45:36One thing I'm kind of excited about,
  • 45:38especially in my education world is...
  • 45:42So what I've been talking about today has mostly been,
  • 45:44if we have a trial sample and we wanna project
  • 45:46to kind of a larger target population.
  • 45:49But there's an equally interesting question,
  • 45:51which is sort of how well can randomized trials inform
  • 45:54local decision making?
  • 45:56So if we have a randomized trial with 60 schools in it,
  • 46:01how well can the results from that trial be used to inform
  • 46:04individual school districts' decisions?
  • 46:07Turns out, not particularly well.
  • 46:09(laughs)
  • 46:10We can talk more about that.
  • 46:12I mentioned earlier, Issa Dahabreh, who's at Brown,
  • 46:15and he's really interested in developing sort of the formal
  • 46:18theories underlying different ways of estimating
  • 46:21these population effects, again, including some
  • 46:23doubly robust approaches.
  • 46:26Trang Nguyen, who works at Hopkins with me,
  • 46:29we are still looking at sort of the sensitivity analysis
  • 46:32for unobserved moderators.
  • 46:34I mentioned Hwanhee Hong already, who's now at Duke.
  • 46:37And she, again, sort of straddles the meta-analysis world
  • 46:40and this world, which has some really interesting
  • 46:43connections.
  • 46:45My former student now he's at Flatiron Health
  • 46:48as of a few months ago.
  • 46:50Ben Ackerman, did some work on sort of measurement error
  • 46:53and sort of partly how to deal with some of these
  • 46:55measurement challenges between the sample and population.
  • 47:00And then I'll just briefly mention Daniel Westreich at UNC,
  • 47:04who is really...
  • 47:05If you come from sort of more of an epidemiology world,
  • 47:09Daniel has some really nice papers that are sort of trying
  • 47:11to translate these ideas to epidemiology,
  • 47:14and this concept of what he calls target validity.
  • 47:17So sort of rather than thinking about internal and external
  • 47:20validity separately, and as potentially,
  • 47:23in kind of conflict with each other,
  • 47:26instead really think carefully about a target of inference
  • 47:29and then thinking of internal and external validity
  • 47:31sort of within that and not sort of trying to prioritize
  • 47:35one over the other.
  • 47:37And then just an aside, one thing,
  • 47:40I would love to do more in the coming years is thinking
  • 47:43about combining experimental and non-experimental evidence.
  • 47:46I think that is probably where it would be very beneficial
  • 47:49to go, in sort of more of that cross-design synthesis
  • 47:52kind of idea.
  • 47:55But again, I wanna conclude with this,
  • 47:57which gets us back to design and that again,
  • 48:01sort of what is often the limiting factor here is the data
  • 48:04and just sort of strong designs.
  • 48:07So Rubin, 2005: with better data, fewer assumptions
  • 48:10are needed. And then Light, Singer and Willett,
  • 48:13who are sort of big education methodologists:
  • 48:16"You can't fix by analysis what you've bungled by design."
  • 48:19So again, just wanna highlight that if we wanna be serious
  • 48:22about estimating population effects,
  • 48:24we need to be serious about that in our study designs,
  • 48:27both in terms of who we recruit,
  • 48:30but then also what variables we collect on them.
  • 48:32But if we do that,
  • 48:33I think that we can have the potential to really help guide
  • 48:37policy and practice by thinking more carefully
  • 48:39about the populations that we care about.
  • 48:43So for more...
  • 48:44Here's this, there's my email, if you wanna email me
  • 48:47for the slides.
  • 48:49And thanks to various funders, and then I'll leave this up
  • 48:53for a couple minutes,
  • 48:55which are, all in tiny font, some of the references,
  • 48:59but then I'll take that down in a minute so that we can see
  • 49:01each other more.
  • 49:02So thank you, and I'm very happy to take some questions.
  • 49:14I don't know if you all have a way to organize
  • 49:16or people just can
  • 49:19jump in.
  • 49:24- So maybe I'll ask the question.
  • 49:25Thanks Liz, for this very interesting and great talk.
  • 49:29So I noticed that you've talked about the target population
  • 49:34in this framework.
  • 49:35And I think there are situations where the population sample
  • 49:39is actually a survey from a larger population.
  • 49:43- Yeah.
  • 49:44- Cause we cannot really afford to observe the entire
  • 49:47actual population, which will contain
  • 49:49like millions of individuals.
  • 49:50And so in that situation, does the framework still apply
  • 49:55particularly in terms of the sensitivity analysis?
  • 49:58And is there any caveat that we should also know in dealing
  • 50:01with those data?
  • 50:03- Great question.
  • 50:05And actually, thank you for asking that because I forgot
  • 50:07to mention that Ben Ackerman's dissertation,
  • 50:10also looked at that.
  • 50:11So I mentioned his measurement error stuff.
  • 50:13But yes, actually, so Ben's second dissertation paper
  • 50:17did exactly that, where we sort of laid out the theory
  • 50:21for when these the target population data
  • 50:24comes from a complex survey itself.
  • 50:29Short answer is yes, it all still works.
  • 50:31Like you have to use the weights, there are some nuances,
  • 50:34but, and you're right, like essentially,
  • 50:36especially like in...
  • 50:38Like for representing the U.S. population, often, the data
  • 50:41we have is like the National Health Interview Survey
  • 50:44or the Add Health Survey of Adolescents,
  • 50:47which are these complex surveys.
  • 50:49So short answer is, yeah, it still can work.
  • 50:53Your question about the sensitivity analysis is actually
  • 50:55a really good one and we have not extended...
  • 50:58I'd have to think, I don't know, off hand, like,
  • 51:00I think it would be sort of straightforward to extend
  • 51:04the sensitivity analysis to that, but we haven't actually
  • 51:07done it.
  • 51:08- Thanks Liz.
  • 51:11The other short question is that I noticed that
  • 51:12in your slide, you first define PATE as the population ATE,
  • 51:16but then in one slide you have this TATE,
  • 51:19which I assume is the target ATE.
  • 51:21And so, I'm just really curious as to, like, whether there
  • 51:25are any differences or nuances in the choice of this
  • 51:27terminology?
  • 51:29- Good question.
  • 51:30And no, yeah, I'm not...
  • 51:31I wasn't very precise with that, but in my mind, no.
  • 51:35Over time I've been trying to use TATE,
  • 51:38but you can see that, kind of just by default,
  • 51:40I still sometimes use PATE.
  • 51:43Part of the reason I use TATE is because I think
  • 51:46"target" is just a slightly more general term.
  • 51:48Like people sometimes think,
  • 51:50if we say PATE, the population has to be like
  • 51:53the U.S. population or some very big,
  • 51:58very official population in some sense.
  • 52:01Whereas the target average treatment effect,
  • 52:04TATE, terminology I think reflects that sometimes
  • 52:06it's just a target group that's well-defined.
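(In symbols, and just as a sketch in generic potential-outcomes notation rather than anything from the slides: writing Y(1) and Y(0) for potential outcomes and T = 1 for membership in the target group,)

```latex
\mathrm{PATE} = \mathbb{E}\,[\,Y(1) - Y(0)\,]
\qquad \text{vs.} \qquad
\mathrm{TATE} = \mathbb{E}\,[\,Y(1) - Y(0) \mid T = 1\,],
```

so the two coincide once "the population" is simply taken to be whatever well-defined target group T indexes, which matches the point that the difference is terminological rather than substantive.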
  • 52:10- Gotcha.
  • 52:11Thanks, that's very helpful.
  • 52:12And I think we have a question coming from the chat as well.
  • 52:15- Yeah, I just saw that.
  • 52:16So I can read that.
  • 52:17We have theory for inference from a sample to a target
  • 52:20population using well-defined internal validity approaches;
  • 52:23what theory is there for connecting the internal validity
  • 52:25methods to external validity?
  • 52:29So I think, what you mean is sort of,
  • 52:33what is the formal theory for projecting the impact
  • 52:37to the target population?
  • 52:38That is exactly what some of those people that I referenced
  • 52:41sort of lay out.
  • 52:42Like I didn't...
  • 52:42For this talk, I didn't get into all the theoretical weeds,
  • 52:45but if you're interested in that stuff,
  • 52:46probably some of Issa Dahabreh's work would be the most
  • 52:49relevant to look at.
  • 52:51Cause he really lays out sort of the formal theory.
  • 52:54I mean, some of my early papers on this topic did it,
  • 52:58but his is like a little bit more formal and sort of makes
  • 53:01connections to the doubly robust literature
  • 53:04and things like that.
  • 53:04And so it's really...
  • 53:06Anyway, that's part of what this whole literature
  • 53:08is building: that theoretical base
  • 53:11for doing this.
  • 53:17Any other questions?
  • 53:28- [Ofer] Liz,
  • 53:29I'm Ofer Harel.
  • 53:30- Oh, hi Ofer?
  • 53:31- [Ofer] Hi.
  • 53:33(mumbles)
  • 53:34Just jumped on from the corridor, so it's great.
  • 53:39So in most of the studies that I work on,
  • 53:43they don't really have a great idea about
  • 53:46what the population really is and how to measure
  • 53:50it.
  • 53:51So it's great if I have some measure of the population,
  • 53:54but in most of the studies that I work on,
  • 53:57I have no real measurements on that population.
  • 54:02What happens then?
  • 54:03- Yeah, great question.
  • 54:04And in part, I meant to say this,
  • 54:06but that's one of the reasons why
  • 54:08the design strategies don't always work particularly
  • 54:10well: especially when you're just starting out
  • 54:13a study, right,
  • 54:14we don't really know the target population.
  • 54:17I think certainly to do any of these procedures,
  • 54:21you need eventually to have a well defined population.
  • 54:25But I think that's partly why some of the analysis
  • 54:27approaches are useful is that,
  • 54:29you might have multiple target populations.
  • 54:31Like we might have one trial,
  • 54:33and we might be interested in saying,
  • 54:35how well does this generalize to the State of New Hampshire
  • 54:39or the State of Vermont or the State of Connecticut?
  • 54:41And so, you could imagine one study that's used to inform
  • 54:45multiple target populations.
  • 54:48With different assumptions,
  • 54:49sort of you have to think through the assumptions
  • 54:50for each one.
  • 54:52If you don't even,
  • 54:54I guess I would say if you don't even know
  • 54:56who your population is, you shouldn't be using these methods
  • 54:59at all, cause like the whole premise is that there is some
  • 55:02well-defined target population and you do need data on it
  • 55:05or at least...
  • 55:07Yeah, the joint distribution of some covariates
  • 55:09or something.
  • 55:10Without that, you're kind of just,
  • 55:13I don't know what a good analogy is,
  • 55:15but you're kinda just guessing at everything.
  • 55:24(mumbles)
  • 55:26- No, go ahead.
  • 55:27Go ahead.
  • 55:29- Oh, Vinod, yeah.
  • 55:30All my friends are popping up, it's great.
  • 55:32(laughs)
  • 55:34- [Vinod] Can I go ahead?
  • 55:35I feel like I'm talking to someone.
  • 55:39- Yeah, go ahead Vinod.
  • 55:40- [Vinod] That was a great talk.
  • 55:42So I have a little ill-formulated question,
  • 55:44but it's cueing off just the last question
  • 55:47that was asked.
  • 55:49In clinical populations,
  • 55:55in some ways we're using these clinical samples
  • 55:58to learn about the population because unless they seek help,
  • 56:02we often don't know what they are in the wild, so to speak.
  • 56:05And so, each sampling of that clinical population
  • 56:09is maybe a biased sampling of that larger population
  • 56:13in the wild.
  • 56:14So I guess my question is, how do you get around this,
  • 56:18I guess, Rumsfeld problem, which is every time you sample
  • 56:22there are these unknown unknowns, but there's no way to get
  • 56:24at them because in some ways, your sampling relies on...
  • 56:27If we could say it relies on help seeking,
  • 56:30which is itself a process.
  • 56:33And if we could just stipulate, there's no way to get
  • 56:35around that.
  • 56:36How do you see this going forward?
  • 56:40- Yeah, good question.
  • 56:40I think, right, this is particularly relevant in mental health
  • 56:43research where there are a lot of people who are not seeking
  • 56:46treatment.
  • 56:47These methods are not gonna help with that in a sense
  • 56:50like again, they are gonna be sort of tuned to whatever
  • 56:53population you have.
  • 56:55I think, though, there are...
  • 56:57If you really wanna be thoughtful about that
  • 57:00problem, that's where some of the strategies come in
  • 57:03that were used in the Epidemiologic Catchment Area
  • 57:05Surveys, where they would go door to door, knock on doors,
  • 57:08and do diagnostic interviews.
  • 57:11Like if we wanna be really serious about trying to reach
  • 57:14everyone and get an estimate of the really sort of true
  • 57:17population, then we really have to tackle that
  • 57:20very creatively and with a lot of resources probably.
  • 57:25- [Vinod] Thanks.
  • 57:27- Welcome.
  • 57:29- Hi Liz?
  • 57:30Yeah, it's gonna be a quick question, and great talk
  • 57:33by the way.
  • 57:35I'm curious, you mentioned there could be a slight
  • 57:38difference between the terms transportability
  • 57:40and generalizability.
  • 57:41Yeah, I'm curious about that.
  • 57:43- Yeah, briefly, this is a little bit of a...
  • 57:48What's the word?
  • 57:48Simplification, but briefly, I think of generalizability
  • 57:51as one where the sample, like the trial sample,
  • 57:55is a proper subset of the population.
  • 57:57So we do a trial in New Hampshire,
  • 58:01and we're trying to generalize to new England.
  • 58:04Whereas transportability is one where it is not a proper
  • 58:08subset, so we do a trial in the United States
  • 58:10and we wanna transport to Europe.
  • 58:14Underlying both, the reason I don't worry too much about
  • 58:17the terms is because either way,
  • 58:19the assumption is essentially the same.
  • 58:21Like you still have to make this assumption about
  • 58:23no unobserved moderators.
  • 58:25It's just that it's probably gonna be a stronger assumption
  • 58:28and harder to believe,
  • 58:30when transporting rather than when generalizing.
  • 58:33Cause you sort of know that you're going from one place
  • 58:36to another in some sense.
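(A sketch of that distinction in symbols, under the speaker's stated simplification: let S = 1 denote being in the trial sample, X the observed covariates, and 𝒯 the target population.)

```latex
% Generalizability: the trial sample lies inside the target,
%   \{i : S_i = 1\} \subset \mathcal{T}  (e.g., New Hampshire -> New England).
% Transportability: it does not (e.g., a U.S. trial, a European target).
% Either way, the "no unobserved moderators" assumption has the same form:
\mathbb{E}\,[\,Y(1) - Y(0) \mid X,\ S = 1\,] \;=\; \mathbb{E}\,[\,Y(1) - Y(0) \mid X\,]
% i.e., given the observed X, the treatment effect is the same in the trial
% sample as in the target; it is just harder to believe when transporting
% than when generalizing.
```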
  • 58:39- Thanks, makes sense.
  • 58:41- Sure.
  • 58:43- I think there's another question in the chat.
  • 58:45- Yeah, so this is a great question.
  • 58:46I'm glad you sent it in.
  • 58:48I hope I got that.
  • 58:50It seems there are multiple ways to calculate the TATE,
  • 58:53from standardization to weighting to the outcome model.
  • 58:55Do you have comments on their performance under different
  • 58:57circumstances?
  • 58:58Great question, and I don't.
  • 59:01I mean, there has been...
  • 59:02This is an area where I think
  • 59:04it'd be great to have more research on this topic.
  • 59:06So I have this one paper with Holger Kern and Jennifer Hill
  • 59:09where we sort of did try to kind of explore that.
  • 59:14And honestly, what we found not surprisingly
  • 59:16is that if that no unmeasured moderator assumption holds,
  • 59:20all the different methods are pretty good and fine.
  • 59:23And like, we didn't see much difference in them.
  • 59:25If that no unobserved moderator assumption doesn't hold
  • 59:28then of course, none of them are good.
  • 59:29So it sort of is like similar to propensity score world.
  • 59:33Like, the data you have is more important than what you do
  • 59:35with the data in a sense.
  • 59:38But anyway, I think that that is something that like,
  • 59:40we need a lot more work on.
  • 59:42One thing, for example: I do have a student working on this.
  • 59:45Like, we're trying to see, if your sample
  • 59:47is a tiny proportion of the population, like how...
  • 59:51Cause there's different data scenarios.
  • 59:52That's one where weighting might not work as well,
  • 59:54actually, who knows.
  • 59:56Anyways, so like all of these different data scenarios,
  • 59:58I think need a lot more investigation to have better
  • 01:00:01guidance on when the different methods work well.
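(As a companion to the weighting sketch earlier, here is a minimal sketch of the outcome-model, or standardization, route the chat question mentions: fit an outcome regression with treatment-covariate interactions in the trial, then average the predicted effects over the target population's covariate rows. Again a simplified illustration with hypothetical column names, not a method endorsed in the talk.)

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def outcome_model_tate(trial, target, covariates):
    """Standardization / g-computation estimate of the TATE.

    `trial` has the covariate columns plus "treatment" and "outcome";
    `target` has the covariate columns for the target population.
    """
    X = trial[covariates].to_numpy(dtype=float)
    a = trial["treatment"].to_numpy(dtype=float).reshape(-1, 1)
    # Main effects plus treatment and treatment-by-covariate interactions,
    # so that effect moderation by X is explicitly modeled.
    design = np.hstack([X, a, a * X])
    fit = LinearRegression().fit(design, trial["outcome"])

    Xt = target[covariates].to_numpy(dtype=float)
    ones = np.ones((len(Xt), 1))
    zeros = np.zeros((len(Xt), 1))
    mu1 = fit.predict(np.hstack([Xt, ones, Xt]))         # everyone treated
    mu0 = fit.predict(np.hstack([Xt, zeros, 0.0 * Xt]))  # everyone control
    return float(np.mean(mu1 - mu0))
```

(If the target rows come from a complex survey, that final mean would be taken with the survey weights, echoing the earlier answer about complex survey data.)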
  • 01:00:09Anything else or maybe we're out of time?
  • 01:00:11I don't know how tight you are at one o'clock.
  • 01:00:20- I think we're at an hour, so let's...