
Let $Y$ be the outcome of interest, $A$ be the (binary, coded as 0 for control and 1 for treatment) treatment, $S$ be the indicator for study participation (so that $S = 1$ means that the subject is in the population of the original study, while $S = 0$ means that the subject is in the target population), $\mathbf{L}$ be covariates to control for confounding in the original study, and $\mathbf{E}$ be effect modifiers. Let $Y^0$ and $Y^1$ be the counterfactual outcomes associated with control and treatment, respectively. The primary objective of transportability analysis is to estimate the ATE in the target population:

$$ATE = \mathrm{E}[Y^1 - Y^0 \mid S = 0].$$

Simply taking the difference in sample means in the original study data yields an unbiased estimate only of the quantity

$$\mathrm{E}[Y \mid A = 1, S = 1] - \mathrm{E}[Y \mid A = 0, S = 1],$$

which generally differs from the target ATE because of confounding in the original study and because the distribution of effect modifiers may differ between the study and target populations.
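A small simulation makes the gap concrete. The sketch below is purely illustrative and does not use any function from the package this vignette documents. Its data-generating process is an assumption: a binary confounder `L` that drives treatment in the study, a binary effect modifier `E` that is more common in the target population, and the outcome model $Y^a = 1 - 2L + a(1 + 2E) + \varepsilon$. Because both counterfactuals are simulated, the target ATE can be computed directly and compared with the naive contrast in the study.

```python
import random
from statistics import mean

random.seed(2024)


def simulate(n, p_e, study):
    """Simulate n records (L, E, A, Y0, Y1, Y) from a hypothetical process.

    - L: binary confounder; E: binary effect modifier with P(E = 1) = p_e
    - study (S = 1): treatment depends on L, so L confounds A and Y
    - target (S = 0): A is drawn at random (it is not used below)
    - Y^a = 1 - 2*L + a*(1 + 2*E) + noise, so Y^1 - Y^0 = 1 + 2*E
    """
    rows = []
    for _ in range(n):
        l = int(random.random() < 0.5)
        e = int(random.random() < p_e)
        p_treat = (0.8 if l else 0.2) if study else 0.5
        a = int(random.random() < p_treat)
        y0 = 1 - 2 * l + random.gauss(0.0, 1.0)
        y1 = y0 + 1 + 2 * e
        y = y1 if a else y0  # consistency: the observed outcome is Y^A
        rows.append((l, e, a, y0, y1, y))
    return rows


study = simulate(20_000, p_e=0.2, study=True)
target = simulate(20_000, p_e=0.7, study=False)

# Estimand ATE = E[Y^1 - Y^0 | S = 0]; computable here only because
# both counterfactual outcomes were simulated.
true_target_ate = mean(y1 - y0 for _, _, _, y0, y1, _ in target)

# Naive contrast E[Y | A = 1, S = 1] - E[Y | A = 0, S = 1] from study data.
naive = (mean(y for _, _, a, _, _, y in study if a == 1)
         - mean(y for _, _, a, _, _, y in study if a == 0))

print(f"target ATE: {true_target_ate:.2f}, naive study contrast: {naive:.2f}")
```

Under this setup the target ATE is about $1 + 2 \times 0.7 = 2.4$, while the naive study contrast is about 0.2: roughly 1.0 of the gap comes from the lower prevalence of `E` in the study (whose own ATE is $1 + 2 \times 0.2 = 1.4$) and the rest from confounding by `L`.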

Under the assumptions specified in the Assumptions vignette, we have

$$
\begin{align*}
\mathrm{E}[Y^a \mid S = 0] &= \mathrm{E}_{S = 0}\,\mathrm{E}[Y^a \mid \mathbf{L}, \mathbf{E}] & \\
&= \mathrm{E}_{S = 0}\,\mathrm{E}[Y^a \mid A = a, \mathbf{L}, \mathbf{E}] & \textrm{conditional exchangeability wrt treatment assignment} \\
&= \mathrm{E}_{S = 0}\,\mathrm{E}[Y^a \mid A = a, \mathbf{L}, \mathbf{E}, S = 1] & \textrm{conditional exchangeability wrt study participation} \\
&= \mathrm{E}_{S = 0}\,\mathrm{E}[Y \mid A = a, \mathbf{L}, \mathbf{E}, S = 1] & \textrm{consistency}.
\end{align*}
$$
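When $\mathbf{L}$ and $\mathbf{E}$ are discrete, the outer expectation $\mathrm{E}_{S=0}$ in the last line of this derivation is a weighted sum over covariate strata: each study-population stratum mean $\mathrm{E}[Y \mid A = a, \mathbf{L} = \mathbf{l}, \mathbf{E} = \mathbf{e}, S = 1]$ is weighted by the target-population probability $P(\mathbf{L} = \mathbf{l}, \mathbf{E} = \mathbf{e} \mid S = 0)$. The sketch below assumes a single binary $L$ and a single binary $E$; all probabilities and stratum means are made-up numbers for illustration, and the helper `standardized_mean` is hypothetical.

```python
# Hypothetical target-population distribution P(L = l, E = e | S = 0), keyed by (l, e)
target_dist = {(0, 0): 0.125, (0, 1): 0.375, (1, 0): 0.125, (1, 1): 0.375}

# Hypothetical study-population stratum means E[Y | A = a, L = l, E = e, S = 1],
# keyed by (a, l, e)
study_means = {
    (0, 0, 0): 1.0, (1, 0, 0): 2.0,
    (0, 0, 1): 1.0, (1, 0, 1): 4.0,
    (0, 1, 0): -1.0, (1, 1, 0): 0.0,
    (0, 1, 1): -1.0, (1, 1, 1): 2.0,
}


def standardized_mean(a, target_dist, study_means):
    """Compute E_{S=0} E[Y | A = a, L, E, S = 1] for discrete L and E.

    Study-population stratum means are weighted by the target-population
    probability of each (L, E) stratum.
    """
    return sum(p * study_means[(a, l, e)] for (l, e), p in target_dist.items())


mu1 = standardized_mean(1, target_dist, study_means)  # E[Y^1 | S = 0] under the assumptions
mu0 = standardized_mean(0, target_dist, study_means)  # E[Y^0 | S = 0] under the assumptions
print(f"E[Y^1 | S=0] = {mu1:.2f}, E[Y^0 | S=0] = {mu0:.2f}, ATE = {mu1 - mu0:.2f}")
# prints: E[Y^1 | S=0] = 2.50, E[Y^0 | S=0] = 0.00, ATE = 2.50
```

By hand, $0.125 \cdot 2 + 0.375 \cdot 4 + 0.125 \cdot 0 + 0.375 \cdot 2 = 2.5$ for $a = 1$ and $0.125 + 0.375 - 0.125 - 0.375 = 0$ for $a = 0$, so the transported ATE in this toy example is 2.5.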

Therefore, given this identification result, we can estimate $\mathrm{E}[Y^a \mid S = 0]$ unbiasedly by averaging samples of $\mathrm{E}[Y \mid A = a, \mathbf{L}, \mathbf{E}, S = 1]$ over the distribution of $\mathbf{L}$ and $\mathbf{E}$ in the target data. We can obtain such samples by first fitting a regression model of $Y$ on $A$, $\mathbf{L}$ and $\mathbf{E}$ using the original study data, which amounts to modeling $\mathrm{E}[Y \mid A = a, \mathbf{L}, \mathbf{E}, S = 1]$. This model is then used to calculate fitted values of $Y$ at the observed values of $\mathbf{L}$ and $\mathbf{E}$ in the target data, with $A$ set to $a$. These fitted values may be seen as “samples” of $\mathrm{E}[Y \mid A = a, \mathbf{L}, \mathbf{E}, S = 1]$ in which $\mathbf{L}$ and $\mathbf{E}$ are drawn empirically from the target data, so averaging them gives an estimate of the counterfactual mean above. Doing this for $a = 1$ and $a = 0$ and taking the difference yields an estimate of the target ATE, which is unbiased provided that the outcome model is correctly specified.
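The two-step procedure can be sketched in plain Python using only the standard library. This is not the package's own implementation, and everything in it is an assumption made for illustration: the simulated data, the hypothetical helper names (`simulate`, `fit_ols`, `counterfactual_mean`), and a linear outcome model with an $A \times E$ interaction that happens to be correctly specified for the simulated data. Least squares is solved through the normal equations to avoid external dependencies; a QR-based solver would usually be more numerically robust.

```python
import random

random.seed(1)


def simulate(n, p_e, study):
    """Simulate n records (A, L, E, Y) from a hypothetical data-generating process.

    L is a binary confounder, E a binary effect modifier with P(E = 1) = p_e,
    and Y = 1 - 2*L + A*(1 + 2*E) + noise. In the study, treatment depends on L.
    """
    data = []
    for _ in range(n):
        l = int(random.random() < 0.5)
        e = int(random.random() < p_e)
        p_treat = (0.8 if l else 0.2) if study else 0.5
        a = int(random.random() < p_treat)
        y = 1 - 2 * l + a * (1 + 2 * e) + random.gauss(0.0, 1.0)
        data.append((a, l, e, y))
    return data


def design_row(a, l, e):
    """Outcome-model covariates: intercept, A, L, E and the A x E interaction."""
    return [1.0, a, l, e, a * e]


def solve(m, v):
    """Solve the linear system m x = v by Gauss-Jordan elimination with partial pivoting."""
    k = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, k + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def fit_ols(rows, ys):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


study = simulate(10_000, p_e=0.2, study=True)    # original study (S = 1)
target = simulate(5_000, p_e=0.7, study=False)   # target data (S = 0); only L and E are used

# Step 1: model E[Y | A, L, E, S = 1] with the original study data.
beta = fit_ols([design_row(a, l, e) for a, l, e, _ in study],
               [y for _, _, _, y in study])


def counterfactual_mean(a):
    """Average the fitted values over the target covariates with A set to a."""
    preds = [sum(b * x for b, x in zip(beta, design_row(a, l, e)))
             for _, l, e, _ in target]
    return sum(preds) / len(preds)


# Step 2: set A = 1 and A = 0 at the target covariates, average, and contrast.
mu1 = counterfactual_mean(1)
mu0 = counterfactual_mean(0)
ate_hat = mu1 - mu0
print(f"g-computation estimate of the target ATE: {ate_hat:.3f}")
```

With this linear model the estimate reduces to $\hat\beta_A + \hat\beta_{AE}\,\bar{E}_{S=0}$, where $\bar{E}_{S=0}$ is the proportion of the target sample with $E = 1$. Because the model matches the simulated data-generating process, the estimate should land near the true target ATE of $1 + 2 \times 0.7 = 2.4$; a misspecified model (for example, one that omits the $A \times E$ interaction) would not, in general, recover it.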

For more information, check out the “What If” book on causal inference (Hernán and Robins 2024), which includes a discussion of g-computation for confounding adjustment.

References

Hernán, MA, and JM Robins. 2024. Causal Inference: What If? Boca Raton: Chapman & Hall/CRC.