
BIS Seminar: Causal inference, competing events and mechanism

March 24, 2021
  • 00:10Fan (00:00:03):
  • 00:19Welcome, everyone. It's my pleasure to introduce our speaker today, Dr. Jessica Young. Dr. Young is an assistant professor in the Department of Population Medicine at Harvard Medical School. Her research focuses on the development and application of statistical methods for making valid causal inferences in longitudinal studies with complications, such as time-varying confounding, competing events, and censoring. I read a lot of her work personally and I think very highly of it. We're fortunate to have her with us today to share her work on addressing complications in causal inference with competing events. Welcome Jessica. The floor is yours.
  • 00:29Jessica Young (00:00:42):
  • 00:38Thank you. Thank you, Fan, for the invitation. And I am officially an associate professor now, I'm proud to announce.
  • 00:48Fan (00:00:56):
  • 00:57Well, congratulations. [crosstalk 00:00:56]-
  • 01:07Jessica Young (00:00:56):
  • 01:07Very recent. Very recent. So I'm going to be speaking broadly about causal inference, competing events, and mechanism. I gave a very similar talk last week at ENAR. If you heard that, you'll be hearing a lot of the same thing again, but we have more time this time, which is nice, so I can get into more details. So what are competing events? In failure time settings, we talk about competing risks, or I prefer to call these competing events, and hopefully it will be clear why by the end of this talk.
  • 01:07This is any event that makes it basically impossible for the event we care about to subsequently occur. So as a running example in an early trial, investigators were interested in the causal effect of estrogen therapy versus placebo on specifically prostate cancer death in men who were recently diagnosed with prostate cancer, but some men in both study arms died of other causes, for example, heart attacks.
  • 01:07So in this example, death from other causes is a competing event for the event that they cared about because clearly an individual can't die of prostate cancer once he's died of something else. And importantly, competing events can occur in any kind of study design ranging from the perfectly executed randomized trial with perfect adherence, no loss to follow up, all the way over to an observational study where we don't intervene at all as investigators.
  • 01:07Many of you may be aware that across the spectrum of the statistics, epidemiology, and medical literatures, you can find probably hundreds of tutorials on how to analyze data with competing events and debates and recommendations on this is right for this, this is wrong for this, and so on and so forth. But the problem is that these discussions have paid limited, if any, attention to the role of the causal question in determining the analysis. And one thing that complicates this choice of causal question is that there are actually many ways that we can define a causal effect when competing events are present. And our usual go-to way of defining a causal effect when we don't have competing events creates some interpretation problems in this setting.
  • 01:07So different effects that we can come up with will rely on different assumptions for identification, ranging from potentially very weak assumptions to incredibly strong assumptions in any given study. In turn, these things together, our choice of effect and then the assumptions we make to claim we can identify that effect in a given study, are what determine our data analysis. So this consideration of the question and assumptions is really what we have to think about before we can talk about data analysis.
  • 01:07So in this talk, I'm basically going to review a number of widely considered targets of analysis when interest is either implicitly or explicitly in a causal treatment effect and competing events exist in the data. I would say the targets of analysis that I would classify as more implicit are counterfactual versions, or contrasts of counterfactual versions, of classical estimands from the survival analysis literature, where there has been less explicit discussion of causal reasoning and causal models.
  • 01:07These include the cause-specific cumulative incidence function, the cause-specific hazard, and the marginal cumulative incidence. Those are the ones I'm going to focus on. And then there are alternative targets that stem from the causal inference literature, which are much more explicit in terms of referring to causal effects, specifically the survivor average causal effect and the natural effects. But what I'm going to argue is that all of these historical targets may have severe interpretation limitations when competing events exist. And I'm going to hopefully have time to discuss some new notions of causal effect that I think overcome these limitations, the separable effects. This work has been led by Mats Stensrud.
  • 01:07So I'm going to go on with this example of a trial and consider the data structure that we have. So what do we measure in the study? I'm going to consider the best possible case of this, not because we can't extend these ideas to a case where we have other complications, but really just to drill down on the idea that even when every other complication is gone, competing events create a problem in terms of coming up with a meaningful causal estimand.
  • 01:07So I'm going to consider an idealized version of this estrogen therapy trial where we just flip a coin to assign people: heads, you go to the estrogen therapy arm; tails, you go to the placebo arm. And we're able to know on every day over the course of, say, five years (let's say five years is the follow-up of interest) whether someone died of prostate cancer by a particular day K, so that's this Y_K indicator, and we also know whether they died of another cause, a competing event, by day K, so that's D_K. I call this ideal because no one is lost to follow-up, in the sense that I know this entire event process for everyone who's randomized at the start of the study all the way up to five years. And we could also just presume that everybody adheres to what they were assigned to. So whether we wanted an intention-to-treat effect or a per-protocol effect is irrelevant. There's no non-adherence in the trial.
  • 01:07So a key feature of this data structure that makes it different from other types of data structures, and why we're going to call it a competing events data structure, is that if an individual is known to experience the competing event by some day K over the follow-up without having failed from what we care about (and I'll denote this type of person as: their Y_{K-1} is zero, but their D_K equals one), then we actually know in the real world, the factual world, what their whole future event process is for the event we care about. It's deterministically zero.
  • 01:07In other words, in English, if you die of a heart attack by day three, you will never be dead of prostate cancer by day four or five, six, seven, eight, nine. It's impossible. So this is an explicit way of making clear that this is not a missing data problem. In the factual world, we know the full event process history. The problem is this determinism is not interesting and not what we want. So we're going to delve into that a little bit more.
  • 01:07And just to convince you that the way I defined this data structure is not crazy, or out of step with how the competing risks data structure has been formulated even in the classical literature: for small enough intervals (I'm thinking of these as days), this data structure completely coincides with the classical competing risks data structure that Fine and Gray formulated in the early papers, where we just summarize this data structure in terms of a failure time T, which is the time to first failure from any cause.
  • 01:07C, our censoring time (in this case, the only form of censoring is administrative censoring because nobody drops out). And then this indicator, which we'll just call epsilon, which is zero if you don't fail from anything over the five years, one if you fail due to the cause they care about, prostate cancer death, and two if you fail due to something else. If I know this information, I know everything that I've just described in the observed data structure in terms of these event processes.
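The correspondence between the summarized form (T, epsilon) and the daily event processes can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `to_event_processes`, the integer-day time scale, and the list representation are all assumptions made for the example.

```python
from typing import List, Tuple


def to_event_processes(t: int, eps: int, n_days: int) -> Tuple[List[int], List[int]]:
    """Expand a (failure day, cause) summary into daily indicators.

    t      : day of first failure from any cause (ignored when eps == 0)
    eps    : 0 = no failure during follow-up,
             1 = event of interest (e.g. prostate cancer death),
             2 = competing event (e.g. death from another cause)
    n_days : length of follow-up in days

    Returns (y, d), where y[k-1] indicates failure from the event of
    interest by day k and d[k-1] indicates the competing event by day k.
    """
    if eps not in (0, 1, 2):
        raise ValueError("eps must be 0, 1, or 2")
    days = range(1, n_days + 1)
    y = [1 if eps == 1 and k >= t else 0 for k in days]
    d = [1 if eps == 2 and k >= t else 0 for k in days]
    return y, d


# Someone who dies of another cause on day 3 of a 6-day follow-up:
# the competing-event indicator switches on at day 3, and the indicator
# for the event of interest is zero on every day, before and after.
y, d = to_event_processes(t=3, eps=2, n_days=6)
print(y)
print(d)
```

The point of the sketch is that nothing in `y` is missing after the competing event: it is known, and it is zero.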
  • 01:07But the problem is with this summarized version of the data structure. For one, it makes some additional assumptions that I don't make, in particular the simultaneous existence of a censoring time and a failure time for the same individual, even an individual who dies, which is not necessary. But two, this structure is too highly summarized to help us reason about defining causal effects, interpreting them, and, importantly, identifying them, including the role of common causes of event processes over time. So this is why I'm actively choosing not to describe it this way, but these are completely consistent ways of defining the data structure.
  • 01:07And finally, the last part that's going to make the way I'm viewing this quite out of step with perhaps the traditional way of thinking about competing events: competing events are often referred to as a type of censoring. This is, I believe, an oversimplification that has led to some confusion. So competing events, at least as I define them (again, consistent, not inconsistent, with the way, say, Fine and Gray define them), prevent the outcome of interest from happening in the factual world. This is just simply a feature of the observed data structure.
  • 01:07By contrast, censoring events are events that make the value of an outcome we care about missing, unobservable, whether that outcome of interest is factual or counterfactual. And when we're interested in causal effects, usually the outcomes we care about are counterfactual. So as we'll see, for certain types of causal effects (again, causal effects are contrasts in particular counterfactual outcomes) competing events will coincide with censoring events. But for many other types of causal effects they will not coincide with censoring events, in the sense that they are not events that make the outcomes we care about missing. They will, just like in the real world, in some settings determine that in that counterfactual world the event just simply cannot happen. So we'll go through some examples of this.
  • 01:07There's one more thing where we'll approach things in a different way. This is coming up a lot in our talks on this and so I think it's worth pointing out. Everything that we've discussed so far and that I will discuss applies equally whether, in the running example, Y is prostate cancer death or prostate cancer diagnosis. And the latter would often be characterized as a so-called semi-competing risks setting.
  • 01:07But importantly, the considerations that we need to make (and hopefully this talk will make this clear) for defining, interpreting, identifying, and estimating a treatment effect on either one of these outcomes are absolutely identical. There is no meaningful distinction between the case where the outcome we care about is a terminal event or a non-terminal event. The issue is whether that event we care about is subject to a competing event. And so this classification of competing risks versus semi-competing risks, when we actually start with a causal question, we quickly see is an artificial distinction. It's not an important distinction. And we'll discuss this in a forthcoming commentary in Biometrics.
  • 01:07So now we've defined the data structure and now we're going to consider options for defining a causal effect that might be of interest to these investigators who are claiming to care about the effect of estrogen therapy on specifically prostate cancer death. And I'm going to classify this as the situation where the outcome of interest is subject to competing events. This is another thing I'm just pointing out because it sounds foreign to people who are more familiar with the classical competing risks literature. Precisely what I mean by "subject to" is, again, as I defined a competing event, that there exist events that ensure the outcome we care about cannot subsequently occur. Again, this refers to something about the data structure, not the question.
  • 01:07Now we're going into, how can we define a meaningful causal question in this type of setting and then identify it with this type of data structure? So, the go-to type of causal effect: without competing events, we really wouldn't be agonizing over how to define a treatment effect, we would just go straight to an average treatment effect. The comparison of counterfactual outcomes had we, say, given everybody estrogen therapy versus placebo would be fairly noncontroversial. This is slightly more controversial when the outcomes are subject to competing events.
  • 01:07So we can define the counterfactual value of the indicator of the event of interest by K plus one had, possibly contrary to fact, we given him treatment level A equals a, where a equals one is estrogen therapy and a equals zero is placebo. Then the contrast of, say, the probability of failing from the event we care about by time K plus one under a equals one versus a equals zero is a counterfactual contrast, in this case a discrete-time version. But as we make the intervals infinitely small, this would coincide with what the classical literature has defined as the cause-specific cumulative incidence function.
  • 01:07This is a causal effect because it compares distributions of counterfactual outcomes under different treatment interventions, but in the same people. Any difference between these probabilities would have to be due to the treatment. There's no other explanation. In our ideal study, most of us know that because we randomized A, because there's no loss to follow up, because really that's all we need, then by the design of the trial, we have that this thing is simply identified by a contrast in the proportion of those who fail from prostate cancer by K plus one in the treatment arm versus the placebo arm.
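In symbols, the total effect and its identification in the idealized trial can be sketched as follows. The superscript notation for counterfactuals and the choice of a risk difference (rather than, say, a risk ratio) are assumptions made for illustration; the talk describes the contrast only verbally.

```latex
% Total effect of treatment on the event of interest by interval k+1
\[
  \Pr\left[Y_{k+1}^{a=1} = 1\right] \;-\; \Pr\left[Y_{k+1}^{a=0} = 1\right]
\]

% In the idealized trial (A randomized, no loss to follow-up) this equals
\[
  \Pr\left[Y_{k+1} = 1 \mid A = 1\right] \;-\; \Pr\left[Y_{k+1} = 1 \mid A = 0\right]
\]
```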
  • 01:07So this sounds very simple, but the problem is that this is a total effect. An average treatment effect is always a total effect, capturing all causal paths by which the treatment affects this outcome we care about. And by the nature of competing events, some of those paths are not interesting and can actually throw off our interpretation of the total effect. So here, this is a causal directed acyclic graph representing an assumption on the underlying data generating process in our trial.
  • 01:07So here, we draw no common causes between A and other event processes because we just flipped a coin to assign A, but for everything else in the study, we did not flip a coin. So it's reasonable to assume that there are common causes of these event processes over time. So this allows, for example, a common cause Z that could be a common cause of different event processes at different times, and also the same event processes over time. And I'm also just going to point out that we could have common causes that are unique to particular event processes. So early or late failure from prostate cancer death, or early or late failure from cardiovascular death, for example.
  • 01:07So this is just giving a general... Sorry, I don't know how to get rid of this guy here. Okay, here we go. So these paths from A to Y_{K+1} that include the blue arrows are going to be present when the treatment can affect competing events, because these blue arrows are always there. They represent just the nature of a competing event, the fact that if you die of a particular cause, you can't later die of something else. You can't be diagnosed for the first time with something else.
  • 01:07So I like to call this pathological mediation. That's how I think about it. The paths through the blue arrows, I say, make the total effect hard to interpret. I should probably qualify that: they can make the total effect very hard to interpret. It really depends on the situation. In this situation, this is the case. So in this example, what the investigators found on the scale of the cause-specific cumulative incidence, in other words, their estimates of the total effect, was that estrogen therapy protects against prostate cancer death.
  • 01:07There was a higher risk of prostate cancer death, cause-specific cumulative incidence, in the placebo arm compared to the treatment arm. But the reverse was true for other deaths. So the problem is that, just looking at the total effect, we have no idea what this protection of estrogen therapy against prostate cancer death is due to. Is it due to something positive, like an action of estrogen that prohibits proliferation of cancer cells? Or could it only be due to an action of estrogen that is damaging the heart, or causing some other pathological biological process, that is protecting you against prostate cancer death by killing you from other causes? Or is it a combination?
  • 01:07So we can loosely reason about this, but looking at the total effect doesn't tell us. It doesn't answer this question. So certainly estimating the total treatment effect on the competing event is useful in the sense that it could lend support to whether we believe that arrow from A to the Ds is there. But it doesn't solve this problem. It doesn't answer the question as to why we're seeing a protective effect of estrogen therapy. And what this really suggests is that the underlying question here is not a total effect. It's about some direct effect that avoids capturing paths that contain these blue arrows.
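The interpretation problem can be illustrated with a toy calculation. The sketch below assumes a deliberately simple model that is not from the talk: homogeneous individuals, constant per-interval hazards, the competing event ordered before the event of interest within each interval, and made-up hazard values. The function name `cause_specific_risk` is hypothetical.

```python
def cause_specific_risk(h_y: float, h_d: float, n_intervals: int) -> float:
    """Risk of the event of interest by the end of follow-up.

    h_y : per-interval hazard of the event of interest among those
          still free of both events (assumed constant)
    h_d : per-interval hazard of the competing event among those
          still free of both events (assumed constant)
    Within each interval the competing event is assumed to act first.
    """
    event_free = 1.0  # probability of having had neither event so far
    risk = 0.0
    for _ in range(n_intervals):
        # Fail from the event of interest: avoid the competing event, then fail.
        risk += event_free * (1.0 - h_d) * h_y
        # Remain free of both events.
        event_free *= (1.0 - h_d) * (1.0 - h_y)
    return risk


# Same hazard of the event of interest in both arms, but treatment
# raises the hazard of the competing event (illustrative numbers).
H_Y = 0.01
risk_treated = cause_specific_risk(H_Y, h_d=0.05, n_intervals=60)
risk_placebo = cause_specific_risk(H_Y, h_d=0.01, n_intervals=60)
print(risk_treated, risk_placebo)
```

In this toy model the treated arm ends up with the lower risk of the event of interest even though treatment does nothing to the event of interest except through the competing event, which is the kind of apparent protection the total effect alone cannot distinguish from a beneficial mechanism.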
  • 01:26Speaker 3 (00:20:44):
  • 01:35Is it possible for me to ask a question, Fan or would you rather we wait till the end?
  • 01:45Fan (00:20:50):
  • 01:54Yes, now I think Jessica is open to that.
  • 02:04Jessica Young (00:20:52):
  • 02:13Yeah, that's fine.
  • 02:23Speaker 3 (00:20:53):
  • 02:32All right, thanks. So Jessica, thanks. This is a really important issue and it's great to see that you're working on it. I know it's a very challenging problem. But I thought that, I guess you're talking about just defining causal parameters, which I don't think has been done very well, so it's great that you're doing it. But I thought that the Fine and Gray estimator actually does answer the question, the first one in your bullet points. So in other words, it says, if there were no competing risks of the kind that you're mentioning that prohibit the event of interest from happening, then what's the impact of this intervention? So again, I know the Fine and Gray is an estimator, it's not a causal parameter. I thought that its interpretation is maybe what you're calling the direct effect.
  • 02:42Jessica Young (00:21:50):
  • 02:46Actually, the Fine and Gray estimator is for a hazard ratio, and it actually is not a direct effect. I am going to talk specifically about options for defining a direct effect. The Fine and Gray estimator is basically an estimator of the coefficient of a proportional hazards model for the so-called subdistribution hazard ratio. Unfortunately, I wasn't planning on talking about the subdistribution hazard ratio, but really its flavor is that it can be linked to a total effect under the proportional hazards assumption.
  • 02:50I'm not really getting into assumptions like proportional hazards because to me, those are convenience assumptions that we make at the end. They're not, as you're saying... Really what we need to drill down on is what is it that we want to know? Once we start with that, we can always go down the line and then come up with some assumptions that might justify this estimator or that estimator under some additional restrictions. But generally the Fine and Gray approach is actually not going after a... And by Fine and Gray, I mean, I'm talking about their subdistribution hazard ratio models.
  • 02:54Jason Fine has done a lot of different work on competing risks. And in fact, he's been advocating for something else that I will note. So I think I'll just leave it at that, but I'm also happy to answer more questions at the end. But for now, I guess what I'm asking you to do, which is hard, is to almost let go of anything about estimators and just think for the moment about what are meaningful questions. And then I will talk a bit about how sometimes certain questions will link to familiar estimation methods. Okay [crosstalk 00:23:55].
  • 03:01Speaker 3 (00:23:56):
  • 03:10Cool. Fine and Gray is also used to estimate the cumulative incidence, but we can move on.
  • 03:20Jessica Young (00:24:01):
  • 03:20Yeah, no, it is. And the cumulative, it is. And so the cause-specific cumulative incidence is a total effect because it captures those paths. Okay. So what are our options for a measure of direct effect? And this is really the million dollar question. So there is now a recommendation that has been made by Fine and others, where the recommendation is that when interest is in, quote, etiology, investigators should report so-called cause-specific hazard ratios. So this is now a recommendation that I'm seeing throughout the literature.
  • 03:20But the problem is that, as I'm going to now argue, cause-specific hazard ratios are not causal effects. So it's possible that under particular assumptions, you could equate a cause-specific hazard ratio to some direct effect. But inherently cause-specific hazard ratios are not causal effects and I'll explain why. And just to go back, before we argued that this guy, this contrast in cause-specific cumulative incidence functions, is a causal effect because it compares counterfactual outcome distributions under different treatments, but in the same people.
  • 03:20And in general, hazard ratios of failure compare counterfactual outcomes under different treatments, in different people. Why is that? Well, what is a cause-specific hazard ratio? It's easier to understand these things in discrete time. And you can just think of the continuous time versions as taking limits as the interval length goes to zero. So we can simply understand the cause-specific hazard as the chance of failing from prostate cancer death. In particular, the chance that this indicator is equal to one by some particular time K plus one, among those who survived all causes of failure up to that point.
  • 03:20So the counterfactual hazard ratio is just comparing this in a world had we given everybody estrogen therapy versus, instead, had we given everybody placebo. So this is a counterfactual contrast, and the immediate way to think about this is, "Oh, this must be a causal effect." And in fact, this is completely identified as well in our trial, by the fact that we just flipped a coin to assign treatment and we measure the whole Y process and the whole D process. The problem is that if treatment affects either of these event processes, then by definition, those who survive under estrogen therapy are not guaranteed to be the same people who survive under placebo. So this is comparing outcomes in two different sets of individuals.
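Written out, the discrete-time counterfactual cause-specific hazard and the corresponding hazard ratio look roughly as follows. This is a sketch under an assumed within-interval ordering (competing event status in interval k+1 is determined before the event of interest); the exact conditioning event depends on that convention.

```latex
% Counterfactual cause-specific hazard at interval k+1 under treatment a
\[
  \Pr\left[Y_{k+1}^{a} = 1 \mid D_{k+1}^{a} = 0,\; Y_{k}^{a} = 0\right]
\]

% Counterfactual cause-specific hazard ratio
\[
  \frac{\Pr\left[Y_{k+1}^{a=1} = 1 \mid D_{k+1}^{a=1} = 0,\; Y_{k}^{a=1} = 0\right]}
       {\Pr\left[Y_{k+1}^{a=0} = 1 \mid D_{k+1}^{a=0} = 0,\; Y_{k}^{a=0} = 0\right]}
\]
```

The numerator conditions on surviving under a = 1 and the denominator on surviving under a = 0. When treatment affects either event process, these conditioning events pick out different sets of people, which is why the ratio does not compare outcomes in the same individuals.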
  • 03:20So it's possible we could make assumptions where we could claim that this equals some counterfactual contrast in the same individuals. One simple assumption is that the treatment doesn't affect anything: under the null, these groups are by definition the same because we can remove the label. But in general, this is not the case. Then our question is what direct effect are we actually answering in that case? So in general, this is not a causal effect. We can sometimes make assumptions to equate it, one simple one being the null. But more generally when the null is not true, when there is an arrow from A into D or even A into Y, even without competing events, then we generally cannot interpret a counterfactual contrast like that as a causal effect when we have these U's here, either one of these U's here.
  • 03:20And the reason I note that is because any definition of a direct effect that I'm now going to review, which again is a causal effect because it compares counterfactual outcomes in the same people, none of those notions of direct effect, regardless of any limitations I'm going to point out about them, require measurement of these U's to be identified or to be interpreted as direct effects. So these U's can hang out there and we don't need to know anything about them. For direct effects, we only need to worry about common causes of different event processes.
  • 03:20So this Z is going to need to be measured for any notion of direct effect, but for the U's, it's not necessary. For the cause-specific hazard ratio to be interpreted as a causal effect, we would need to know all of these. So that just gives you a sense of the strength of assumptions you would need to start from that place. This is why I think it's better to start with the notion of causal effect. And then if you end up landing on a reasonable approximation with a hazards analysis, that's fine, but you have to think about how to justify it.
  • 03:20So one sort of very simple notion of direct effect that's been considered for a long time, and that coincides with common ways of thinking about survival analysis but also has a long history in the causal inference literature, is the so-called controlled direct effect. So here, instead of defining Y^a (and I actually accidentally put the capital K here, but this could be any arbitrary follow-up time k), we can think about a counterfactual outcome had we assigned an individual a and also somehow prevented or eliminated competing events.
  • 03:20So in this case, we now have this different causal effect. This is a causal effect because it contrasts counterfactual outcomes under different interventions, because this is a equals one, a equals zero, but in the same people. This also happens to coincide with a contrast of what in the sort of classical survival analysis literature is called the marginal cumulative incidence, in contrast to the cause-specific cumulative incidence. So this is a direct effect. In this world, if we eliminated competing events, there's no way that this effect captures those types of blue paths because we've gotten rid of competing events somehow.
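As a sketch in symbols, the controlled direct effect contrasts risks under joint interventions on treatment and on the competing event. The overbar notation for "competing events eliminated throughout follow-up" and the risk-difference scale are assumptions made for illustration.

```latex
% Controlled direct effect by interval k+1:
% set treatment to a and eliminate competing events (\bar{d} = 0)
\[
  \Pr\left[Y_{k+1}^{a=1,\,\bar{d}=0} = 1\right]
  \;-\;
  \Pr\left[Y_{k+1}^{a=0,\,\bar{d}=0} = 1\right]
\]
```

Each term is a marginal cumulative incidence in the classical sense: the risk of the event of interest in a world where the competing event cannot occur.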
  • 03:20So this doesn't have that problem of the total effect. A downside, one downside, is that this is not guaranteed identified even in our perfect trial because of those common causes Z. So in our trial, we did not control how D is assigned and so we have to worry about common causes of D and the outcome, which is Y. So if we wanted something like this, to claim we can identify it, we have to plan to measure adjustment covariates that would give us identifiability in a way that we don't need for the total effect.
  • 03:20One benefit, or sort of appealing feature: provided we did measure those Z's, or approximately measured them, this estimand leads us to very familiar estimation procedures. So this is getting into estimation. For example, this guy could be estimated by just contrasting the complement of a weighted Kaplan-Meier survival estimator, where you can use, say, inverse probability of censoring weighting, where you incorporate Z into the weights. But the problem is that even beyond the issue of identification here and needing to measure covariates and make those types of assumptions that aren't guaranteed, or in other words stronger identifying assumptions than we need for the total effect, even if we felt confident that we had estimated this thing well, it's not really clear what it means.
  • 03:20How does claiming that we know this treatment effect in a world where somehow we force nobody to die of anything other than prostate cancer, how does that inform any action or decision or policy. It's really not clear. So it gets rid of this quote, pathological mediation problem, but it's a very strange thing to try to interpret relative to a real world sort of clinical problem or policy problem.
  • 03:20So an alternative to the controlled direct effect that has sort of been argued as a way to overcome the problem of thinking about estimands where somehow we eliminate death, or sort of generally implausible interventions, is the so-called survivor average causal effect. So this we can think of as just a total effect, but in the subset of the original population who would never experience the competing event, regardless of which level of treatment we gave them.
  • 03:20So, as we were discussing before, in the case where treatment doesn't affect the competing event, this is an identifiable group of people, but in the case where it does, this is generally not something we can ever observe. We can't observe who in the study population would survive under estrogen therapy as well as under placebo, because either we give them placebo or we give them estrogen therapy. We only get to observe what their competing event status in the real world is under one treatment level.
  • 03:20So like the controlled direct effect, this is going to rely on stronger assumptions than the total effect. We'll have to measure those common causes Z, and we would account for them differently analytically than we would for the controlled direct effect. I'm not going to go through details of estimation. I'm focusing more on these interpretational issues. And again, this has the advantage that it avoids this sort of pathological mediation problem. But the problem is that even if we could argue that, hey, we think we've estimated this, we can't identify who these people are. And in fact, this subset may not exist, particularly in a setting where the treatment has a very strong effect on the competing events. So it's, again, not really clear how this informs an action or a decision or a policy.
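The survivor average causal effect described above can be sketched in symbols as a total effect restricted to a subpopulation defined by counterfactual competing event status. The notation and the risk-difference scale are assumptions made for illustration.

```latex
% Survivor average causal effect by interval k+1:
% a total effect among those who would not experience the competing
% event by k+1 under either treatment level
\[
  \Pr\left[Y_{k+1}^{a=1} = 1 \mid D_{k+1}^{a=1} = D_{k+1}^{a=0} = 0\right]
  \;-\;
  \Pr\left[Y_{k+1}^{a=0} = 1 \mid D_{k+1}^{a=1} = D_{k+1}^{a=0} = 0\right]
\]
```

The conditioning event involves competing event status under both treatment levels at once, which is never jointly observed for any individual. That is why membership in this subset cannot be determined from the data.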
  • 03:20So now another proposal that's been considered widely in the causal inference literature is the so-called natural or pure effects. So Robins and Greenland first defined the pure direct effect and an indirect analog, which is again a contrast of counterfactual outcomes in the same people, but under different interventions. But unlike the controlled direct effect, which considers interventions on D that simply set them to zero, these interventions in this case would conceptualize an intervention where we assigned the status of competing events to what it would be under the reference treatment.
  • 03:20So obviously, this left-hand guy is not an observable quantity in any real-world trial or any real-world intervention because once we give somebody estrogen, we don't know what their... We can't observe what their competing event status would be if we gave them placebo, for example. But again, like the other notions of direct effect, this avoids that pathological mediation. It will rely on strong assumptions that require measurement of Z, but similarly, it's not actionable. And in fact, Robins and Greenland rejected this pretty much immediately for this reason, but these effects were later reintroduced and given a different name by Judea Pearl, the natural effects, and he has strongly advocated their utility.
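A sketch of the pure (natural) direct effect in symbols, adapted to the competing events notation used so far. The overbar on D to denote the competing event process through interval k+1, and the choice of a = 0 as the reference treatment, are assumptions made for illustration.

```latex
% Pure / natural direct effect by interval k+1:
% vary treatment while holding the competing event process at the
% value it would take under the reference treatment a = 0
\[
  \Pr\left[Y_{k+1}^{a=1,\;\bar{D}_{k+1}^{a=0}} = 1\right]
  \;-\;
  \Pr\left[Y_{k+1}^{a=0,\;\bar{D}_{k+1}^{a=0}} = 1\right]
\]
```

The left-hand term is the cross-world quantity: it requires giving an individual estrogen therapy while assigning them the competing event status they would have had under placebo.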
  • 03:20So what's very interesting is that in 2010 Robins and Richardson pointed out an interesting contradiction in a story that Pearl told in order to argue the utility of the natural effects for actionable decision-making. And the story that he told was about the effect of a modified version of the study treatment under assumptions that this modified treatment operates just like the study treatment, but with certain mechanisms removed.
  • 03:20So the example he gave was the original discussion was the natural direct and indirect effects of smoking, I believe it was on cardiovascular disease and perhaps the mediator. It was not a competing event setting, but the mediator was something like blood pressure. And so, because we can't directly intervene on blood pressure, his argument was, well, the natural effects are informative for real-world clinical decision-making or something real world because I could test these in a study where I created a modified cigarette with either nicotine or tar removed. Then he gives some assumptions on how nicotine and tar independently operate on cardiovascular disease, through blood pressure and outside of blood pressure.
  • 03:20So this was a very interesting insight and Robins and Richardson then said, "What you've just described is not a natural effect, but it is an actionable notion of direct effect that is very interesting." And they considered some assumptions under which this essentially modified treatment effect would quantify both direct and indirect effects of the current study treatment, and also could be identified by the data in the current trial, which only had measures of values of the current study treatment. So this original proposal is in this 2010 paper and I have a number of other citations of BN.
  • 03:20So later, Vanessa Didelez extended this idea to settings with survival outcomes and a time-varying mediator. But what's most relevant to our discussion here is that Mats Stensrud identified that these modified treatment effects are really the solution to coming up with meaningful notions of causal effect in competing events settings, when the total effect is not what we want, as in this estrogen therapy example. So we now refer to this class of estimands as the separable effects. Like all prior notions of effects that quantify mechanism that we just reviewed, the SACE, the controlled direct effect, the natural effects,
  • 03:20this will always rely on stronger assumptions for identification than the total effect, which again, in a trial like the perfect one we just talked about, is guaranteed to be identified. But the distinction is that these effects, even though they require more assumptions, may be of more direct interest to investigators and they actually have actionable implications. So I'll spend the rest just going through this idea. Unless there are any questions, I'll just keep going.
  • 03:20So the separable effects rest on the idea of thinking about a modified treatment. So to define the separable effects, we can conceptualize a new study where instead of assigning the treatment A, for example, estrogen versus placebo, we assign combinations of two new treatments and I'm just going to call them AY and AD. And to interpret and identify the separable effects, interpret them in a way that relates to a direct effect of the current study treatment.
  • 03:20And then to identify them in data where we've only measured the current study treatment, one foundational assumption we have to make is what we call the modified treatment assumption, which is that jointly assigning AY and AD to the same value a, so in our binary example this would either be one or zero, would lead to exactly the same values of the event process of interest and the competing event process as had we assigned the study treatment A to the level a.
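Written out, the modified treatment assumption could look like the following sketch. The discrete time index k = 0, ..., K and the superscript notation for counterfactuals are assumptions made here for illustration; they are not read off the slides.

$$
Y_{k+1}^{a_Y = a,\, a_D = a} = Y_{k+1}^{a}, \qquad D_{k+1}^{a_Y = a,\, a_D = a} = D_{k+1}^{a}, \qquad a \in \{0, 1\},\ k = 0, \dots, K.
$$

In words: setting both components to the same level a reproduces the event of interest and the competing event processes that the study treatment at level a would have produced.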
  • 03:20So again, we're going to need this assumption, both for identifying separable effects, but also to interpret them as capturing either a direct, indirect, or, we can even be more flexible, some particular path-specific effect of A. So this assumption, again, is generally going to be a strong assumption, but in principle, it could be tested in a future trial where we actually come up with and assign these modified treatments that we're thinking about, along with levels of A.
  • 03:20So an example where this would hold is when AY and AD are a physical decomposition of A, like in Pearl's example. So think of a cigarette decomposition into nicotine and tar. But in fact, a decomposition is not required. We could just come up with completely different treatments, as long as they satisfy this modified treatment assumption in terms of how they would operate together with respect to the study treatment.
  • 03:20So this is just showing the causal DAG that we had before. So this represents an assumption on how the data in our current study is generated, agnostic to the idea of the modified treatment assumption. And for now I'm going to be agnostic as to whether Z is actually measured or not, because the first stage of the reasoning with the modified treatment assumption is about interpretation. So I'm just thinking that I need to worry about Z and features around it in terms of interpretation, and then later I'm going to need to worry about whether I measured it to think about identification.
  • 03:20So this is an extended DAG that was proposed in that 2010 paper by Robins and Richardson, which considers a scenario like Pearl's where the modified treatment assumption holds through a treatment decomposition. So these bold arrows reflect the assumption that if AY and AD are a decomposition of A, then in the actual data that we have, we do have measures of AY and AD, but we never have discordant measures of them.
  • 03:20So if AY and AD are a physical decomposition of estrogen therapy, then if you got A equals one, you got AY and AD equals one. And if you got placebo, A equals zero, then you got AY and AD equals zero. So this is just a way of reflecting that assumption. And then it'll also allow us to make mechanistic assumptions about how specifically these components, AY and AD, operate on the competing event and the event we care about, which will be important for interpretation.
  • 03:20So the separable effects are then, again, just another example of a contrast of counterfactual outcomes in the same people, but they are defined with respect to these new modified treatments as opposed to the treatment A. So the separable effect of AY evaluated at some fixed AD is just the contrast of outcomes had we assigned AY equals one versus AY equals zero, with AD fixed. And then similarly, the effect of AD is the counterfactual contrast under AD equals one versus AD equals zero, had we fixed AY.
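The two contrasts just described can be sketched in symbols as follows. The notation (time index k, counterfactual superscripts) is assumed here for illustration.

$$
\text{Separable effect of } A_Y \text{ at fixed } a_D: \quad \Pr\left(Y_{k+1}^{a_Y = 1,\, a_D} = 1\right) \ \text{vs.} \ \Pr\left(Y_{k+1}^{a_Y = 0,\, a_D} = 1\right), \quad a_D \in \{0, 1\}
$$

$$
\text{Separable effect of } A_D \text{ at fixed } a_Y: \quad \Pr\left(Y_{k+1}^{a_Y,\, a_D = 1} = 1\right) \ \text{vs.} \ \Pr\left(Y_{k+1}^{a_Y,\, a_D = 0} = 1\right), \quad a_Y \in \{0, 1\}
$$

If the modified treatment assumption holds, so that assigning both components to level a reproduces the study treatment at level a, the total effect on the risk difference scale splits into one effect of each type:

$$
\Pr\left(Y_{k+1}^{a=1} = 1\right) - \Pr\left(Y_{k+1}^{a=0} = 1\right)
= \left[\Pr\left(Y_{k+1}^{1,1} = 1\right) - \Pr\left(Y_{k+1}^{0,1} = 1\right)\right]
+ \left[\Pr\left(Y_{k+1}^{0,1} = 1\right) - \Pr\left(Y_{k+1}^{0,0} = 1\right)\right],
$$

where the superscript pair is (a_Y, a_D). The first bracket is an A_Y separable effect at a_D = 1 and the second is an A_D separable effect at a_Y = 0.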
  • 03:20So if we conducted a trial where we randomized all combinations of AY and AD, we could identify. We'd be guaranteed to identify any one of these separable effects. The issue though, that we're interested in is how to interpret them in terms of mechanism, the types of paths that they capture. And so then this is why we need additional assumptions about them. And also we need additional assumptions to be able to link them to say something about whether they quantify a particular mechanistic effect of A, the study treatment.
  • 03:20So basically this is just a repeat of what I said, the separable effects. They can be defined the same way regardless of how we interpret them, but the extended graph allows us to represent assumptions about how they would quantify particular mechanisms. So the paths that these effects actually capture rely on additional assumptions that we refer to as isolation assumptions or isolation conditions. So the extended graph that I just showed illustrates the assumption of full isolation, which leads to the purest interpretation of the separable effects.
  • 03:20So formally, full isolation is satisfied if, relative to the extended DAG, (one) the only causal paths from AY to D at any time are intersected by Y, and (two) the only causal paths from AD to Y at any time are intersected by D. So in this particular case, the separable effects could be interpreted as direct and indirect effects. So going back to the graph, we can see that all the paths flowing out of AY into D at any time are intersected by Y, in other words, they don't contain any blue arrows. Whereas all of the paths flowing out of AD into Y are paths containing the blue arrows, in other words, intersected by D.
  • 03:20But we can weaken this assumption. So we can consider weaker isolation conditions, for example what we call AY partial isolation. This holds when condition one holds, but two could fail. And similarly, we call it AD partial isolation when two holds, but one can fail. So just to show you an example, these would show up when we have a scenario, which is not uncommon, where the treatment affects Z, common causes of the competing event and the event of interest. So this is an example of AY partial isolation, which allows that the AD component could affect Z, but it does make the constraint that AY does not.
  • 03:20So in this case, the AY separable effect still only contains paths that do not have blue arrows. In other words, it only includes paths from AY to D that are intersected by Y. But AD is not uniquely capturing paths with the blue arrows. It contains this path AD to Z to Y two, and also to Y four. So in this example, AY, the separable AY effect is a direct effect, but it doesn't contain all the direct effects because the AD effect contains some of these.
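The two isolation conditions are statements about directed paths in the extended DAG, so they can be checked mechanically on a toy graph. The sketch below is illustrative only: the two-interval graph, the node names (`A_Y`, `A_D`, `Z`, `D1`, `Y1`, `D2`, `Y2`) and the helper functions are hypothetical and are not the DAG from the slides.

```python
def directed_paths(graph, source, target, prefix=()):
    """Yield every directed path from source to target in a DAG {node: [children]}."""
    path = prefix + (source,)
    if source == target:
        yield path
        return
    for child in graph.get(source, ()):
        yield from directed_paths(graph, child, target, path)


def all_paths_intersected(graph, source, targets, blockers):
    """True if every directed path from source to any target has an interior node in blockers."""
    for target in targets:
        for path in directed_paths(graph, source, target):
            interior = path[1:-1]
            if not any(node in blockers for node in interior):
                return False
    return True


Y_NODES = {"Y1", "Y2"}  # event of interest at times 1 and 2
D_NODES = {"D1", "D2"}  # competing event at times 1 and 2


def isolation_conditions(graph):
    """Return (condition one, condition two) as described in the talk."""
    cond1 = all_paths_intersected(graph, "A_Y", D_NODES, Y_NODES)  # A_Y -> D only through Y
    cond2 = all_paths_intersected(graph, "A_D", Y_NODES, D_NODES)  # A_D -> Y only through D
    return cond1, cond2


# Hypothetical extended DAG: Z is a common cause that neither component affects.
full = {
    "A_Y": ["Y1", "Y2"],
    "A_D": ["D1", "D2"],
    "Z": ["D1", "Y1", "D2", "Y2"],
    "D1": ["Y1", "D2"],
    "Y1": ["D2", "Y2"],
    "D2": ["Y2"],
}

# Same graph, but now the A_D component also affects Z.
partial = dict(full, A_D=["D1", "D2", "Z"])

print(isolation_conditions(full))     # (True, True): full isolation
print(isolation_conditions(partial))  # (True, False): A_Y partial isolation
```

In the second graph the path `A_D -> Z -> Y1` reaches the event of interest without passing through the competing event, so condition two fails while condition one still holds.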
  • 03:20So this is just a way to think about interpretation when we make these different assumptions. But this could represent a scenario where, if the AD component is increasing the heart attack deaths, then the separable AY effect would be the effect of an improved treatment that maybe only contains the positive actions of estrogen therapy, that maybe stop the proliferation of cancer cells, without this harmful component of estrogen, which we actually know from trials could be harmful.
  • 03:20So I don't know if I have to stop, but some other conditions are then needed. So this is about interpretation, but then we have to, of course, worry about identification, which then leads us to solutions for estimation. So to identify the separable effects, we need all the usual assumptions for the total effect, which would hold by design in our trial, but we're going to, not surprisingly, need some additional assumptions. And this would require assumptions about Z, which now I'm going to call L, as something that I assume is measured.
  • 03:20So I'll just skip ahead. So I'm happy to share these slides and I have some examples of when these conditions would hold or fail. But basically at the end, like in any causal inference problem, we've stated our estimand, we've reasoned through our identifying assumptions, we now have some clear understanding of interpretation, and we land on a function of the data. We can think of it as like a g-formula.
  • 03:20In fact this coincides with the identification formula for Pearl's mediation parameter that defines the natural effects. In this particular case, it does. This would be the identification formula that coincides with full isolation. It gets a little more complicated when you allow partial isolation, but then we just have a number of ways we can estimate this: parametric estimators, semiparametric, doubly robust, efficient-influence-function-derived estimators and so on, all of that we can do.
  • 03:20And I'll just end by stealing a quote from Thomas Richardson that I really like. So to define and interpret the separable effects, we're forced to be really transparent about what we do and don't understand about how our study treatment works or how it might work. And this is often a really intimidating idea relative to what we're used to in causal inference. But I liked Thomas Richardson's phrase that this is actually not a bug of the separable effects, we can think of it as a feature. Because if we're moving forward with something like the SACE or a controlled direct effect because we just can't handle thinking about the separable effects, that's basically saying we don't really know how to interpret what we're doing here.
  • 03:20And if we really feel so unsure about this, then maybe we just stick with the total effect and we have to be honest about its interpretational limitations. Then I was just going to go back to this idea of censoring, which I promised I would. So we now have gone through a whole range of options for direct effects. I claimed before that sometimes competing events are censoring events, in other words, they create missingness, and sometimes they're not. So the only estimand that we considered where competing events are censoring events is the controlled direct effect. Because in this case, the outcomes we care about are outcomes in a world where nobody experiences the competing event.
  • 03:20So clearly, for an individual who experiences a competing event, we don't get to observe what their outcome of interest would be had they not experienced the competing event. So this creates a source of missingness. But for none of the other counterfactual outcomes that we considered would it be the case that competing events create missingness. They would operate just like they operate in the real world: they ensure that you don't get the event. But even in that world, as in the case of the separable effects, we can still quantify mechanism and isolate out paths that are of interest. So these are just some references and that's it.
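As a rough sketch of what such a g-formula can look like under full isolation, with only a baseline covariate L, discrete time and no loss to follow-up, one form is given below. The notation is assumed here and the slide itself is not reproduced, so the exact statement and its conditions should be checked against the papers cited in the talk. The convention Y_0 = D_0 = 0 is used, and D_{k+1} is taken to precede Y_{k+1} within each interval.

$$
\Pr\left(Y_{K+1}^{a_Y,\, a_D} = 1\right) = \sum_{l} \Pr(L = l) \sum_{s=0}^{K} \Pr\left(Y_{s+1} = 1 \mid D_{s+1} = Y_s = 0,\ L = l,\ A = a_Y\right)
\prod_{j=0}^{s} \Big[ \Pr\left(D_{j+1} = 0 \mid D_j = Y_j = 0,\ L = l,\ A = a_D\right) \Pr\left(Y_j = 0 \mid D_j = Y_{j-1} = 0,\ L = l,\ A = a_Y\right) \Big]
$$

The hazards of the event of interest are taken from the arm A = a_Y and the hazards of the competing event from the arm A = a_D. Setting a_Y = a_D = a recovers the usual expression for the cumulative incidence under A = a.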
  • 03:39Fan (00:52:42):
  • 03:48Very nice talk, Jessica. Thank you so much. And let's see if the audience has any questions for you first. If anyone has any questions, please unmute yourself and then speak up. And if not, I can start with my first. So I've got two questions. The first one is that we are assuming a physical decomposition of the intervention, one component for the event of interest and one for the competing event. And so how granular should we go in practice? There are cases where we might have multiple components affecting each of these events of interest. Does this framework handle that easily, or is there a point to stop-
  • 03:58Jessica Young (00:53:37):
  • 04:00It really can, there's really no limit. So this is where I think this gets very interesting and philosophical. So for the theory and how to apply it in practice, I think it can still use a lot of guidance, and we're coming up with more applications, particularly right now in dementia, that I'm hoping will be useful. But the way I think about it is that you just want to think about what is meaningful: why are these investigators asking this question?
  • 04:02Maybe they have some very clear mechanistic ideas about components. For example, in that Pearl example, it was very clean. You have nicotine and tar, we know what makes up a cigarette. We can come up with some assumptions about this. Maybe implicitly what Pearl was thinking about was this full isolation scenario. Maybe we could push back and say, no, based on what we know about the mechanisms here with our subject matter experts, the best we can hope for is partial isolation or something like that.
  • 04:04But there also may be settings where there are a lot of chemicals and things that are inactive, so you really only have to think about particular active chemicals. But really, I think the goal here is to inform whether there is something harmful in the current study treatment, which you would think that patients and doctors would be interested in. And then there's also the idea of how we can improve future treatments.
  • 04:06So this is really just something you have to think through with a subject matter expert. So in Mats's papers... He's actually an MD by training, so he's really good at coming up with these. As a statistician I'm not as good because I just don't know the biology. So it really does require working with a subject matter expert. But if you look at, for example, the published paper, that really only considers the full isolation case, and now we have a second paper that goes into these more general things and allows partial isolation or, more generally, common causes affected by treatment. This is under a second review now at Lifetime Data Analysis.
  • 04:08We have several examples where they walk through the reasoning process, so you can take a look at those. It's a little overwhelming because there's an infinite number of separable effects that you could potentially consider, and every single one is going to result in a different estimator. So this just pulls the hood off: causal inference with competing events is not straightforward. It's very easy to rely on total effects. And in fact, I strongly advocate for always starting with the total effect because it relies on the fewest assumptions. It's the closest thing to the real world that we have, but then you have to think about how to interpret it. And in fact, what's so interesting is how angry people get or how strongly people are tied to what they think is the right parameter.
  • 04:10So what I've noticed is that the statistics community tends to heavily favor what I call the total effect. Whereas if you go to the epidemiology community, where people are a lot more tied into subject matter, they completely reject that that would be a useful parameter. But they also don't realize how complicated it is to even define these effects, much less identify them, and what you would need. So I think we have to think through lots of examples and see where... Maybe present some guidelines that people can copy. But you can consider a multi-component decomposition. You can think of AY and AD as just, right now I don't exactly have candidates for these, but I'm posing a hypothetical treatment that maybe we can target in development to have these properties.
  • 04:12And the last thing I'll say is that we don't need to imagine a decomposition of the treatment. We can also think about just combinations of completely different treatments that we think would operate like this. So in this JASA paper, Mats talks about castration, for example, which, coupled with other treatments, might operate just like the mechanisms of estrogen therapy that we're interested in but would not have these harmful effects on heart disease or things like that. So we can reason much more broadly than the decomposition.
  • 04:17Fan (00:58:53):
  • 04:26Thanks. My last question is this: I know that this all sounds really nice, and we have that general formula to identify a valid causal effect, but is any of this implemented in software already? I know that when selling something to the biomedical community, usually they would worry: how can I actually do any of this, which is pretty complicated?
  • 04:36Jessica Young (00:59:18):
  • 04:40So actually, I'm writing a grant now that will involve creating some R packages, but we do have R code associated with all these papers. And it's actually not very hard to... Like anything else in causal inference, you have easy-to-implement versions of estimators, like an IP-weighted one. So this could be estimated by just piecing together some weighted non-parametric hazard functions. Or you could use a g-computation estimator where you model all these things.
  • 04:44So in this case, there are no time-varying Ls. It's just a baseline L. So you could just fit models for each of these things, predict, piece them together, and then average, the same way you would with a g-computation estimator, or you could do an IP-weighted version that would have some models for these guys, but in a weighted form. Or we have... We don't have this written down, but for the... We actually have results on what we call the conditional separable effects, which are... So one thing I didn't... And there we have the efficient influence function and DR estimators implemented, and the ideas would naturally extend here and we're going to write more about that.
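To make the "fit the pieces, piece them together" idea concrete, here is a minimal plug-in sketch in Python. It is not the authors' R code. It assumes a randomized trial with no covariates, discrete time, no loss to follow-up, and full isolation, and it takes the arm-specific discrete-time hazards as already estimated. The function name, argument layout and the numbers are all hypothetical.

```python
def separable_cumulative_incidence(haz_y, haz_d, a_y, a_d):
    """Plug-in cumulative incidence of the event of interest under (a_y, a_d).

    haz_y[a][k]: estimated Pr(Y_{k+1}=1 | D_{k+1}=0, Y_k=0, A=a)
    haz_d[a][k]: estimated Pr(D_{k+1}=1 | D_k=0, Y_k=0, A=a)
    Hazards of Y come from arm a_y, hazards of D from arm a_d.
    """
    event_free = 1.0  # probability of having had neither event before interval k
    cum_inc = 0.0
    for h_y, h_d in zip(haz_y[a_y], haz_d[a_d]):
        # Survive the competing event in this interval, then have the event of interest.
        cum_inc += event_free * (1.0 - h_d) * h_y
        # Remain free of both events going into the next interval.
        event_free *= (1.0 - h_d) * (1.0 - h_y)
    return cum_inc


# Made-up hazards for two intervals, indexed by arm (0 = placebo, 1 = treated).
haz_y = {0: [0.10, 0.10], 1: [0.05, 0.05]}
haz_d = {0: [0.02, 0.02], 1: [0.06, 0.06]}


def ci(a_y, a_d):
    return separable_cumulative_incidence(haz_y, haz_d, a_y, a_d)


total_effect = ci(1, 1) - ci(0, 0)
effect_of_a_y = ci(1, 0) - ci(0, 0)  # separable effect of A_Y at a_D = 0
effect_of_a_d = ci(1, 1) - ci(1, 0)  # separable effect of A_D at a_Y = 1

print(total_effect, effect_of_a_y, effect_of_a_d)
```

With baseline covariates L, the same calculation would be done within levels of L (or with modeled hazards evaluated at each person's L) and then averaged, which is the "predict, piece together, average" step described above.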
  • 04:48So I don't think that the implementation is any more of a hurdle than any causal inference method with time-varying treatments and confounders, which of course, we still have hurdles with getting people comfortable with those methods in practice.