Theory of target aggregate data adjustment transportability analysis

Let $Y$ be the outcome of interest, $A$ be the binary treatment (coded as 0 for control and 1 for treatment). $S$ be the indicator for study participation (so that $S=1$ means that the subject is in the population of the original study, while $S = 0$ means that the subject is in the target population). $\mathbf{L}$ be covariates to control for confounding in the original study and $\mathbf{E}$ be effect modifiers. Let $Y^0$ and $Y^1$ be counterfactual outcomes associated with control and treatment, respectively. The primary objective of transportability analysis is to estimate the average treatment effect (ATE) in the target population: $ATE = \mathrm{E}[Y^1 - Y^0 \,|\,S = 0].$

Simply taking the difference in sample means using the original study data will only unbiasedly estimate the quantity $\mathrm{E}[Y \,|\,A = 1, S = 1] - \mathrm{E}[Y \,|\,A = 0, S = 1],$ which is different from the target ATE due to confounding and the different distributions of effect modifiers.

Let $w_1 = \begin{cases}\frac{1}{P(A = 1 \,|\,\mathbf{L}, S = 1)} & \textrm{if } A = 1 \\ \frac{1}{P(A = 0 \,|\,\mathbf{L}, S = 1)} & \textrm{if } A = 0\end{cases}$ and $w_2 = \frac{P(S = 0 \,|\,\mathbf{E})}{P(S = 1 \,|\,\mathbf{E})}.$ To control for confounding, the estimator $\frac{1}{\sum_{i=1}^n w_{1,i}I(A_i = a)}\sum_{i=1}^n w_{1,i}YI(A_i = a)$ will unbiasedly estimate the quantity $\mathrm{E}[Y^a \,|\,S = 1],$ which uses the first set of weights $w_1$ and is the IP weighting approach in causal inference. However, to estimate the target ATE, the estimator $\frac{1}{\sum_{i=1}^n w_{1,i}w_{2,i}I(A_i = a)}\sum_{i=1}^n w_{1,i}w_{2,i}YI(A_i = a)$ should be used instead, which incorporates the second set of weights $w_2$ to unbiasedly estimate the target ATE. This is extended to estimate the coefficients of any marginal structural model in the target population in the same manner as IP weighting: more specifically, the marginal structural model coefficients are estimated by fitting regression models on the original study data with the weights $w_1w_2$ .

Assume that there is individual patient-level data (IPD) available for the study data, but only aggregate-level data (AgD) is available for the target data. The first set of weights, $w_1$ can be obtained using logistic regression on the study data. However, this is not possible for $w_2$ , so we use a method of moments (MoM) approach to obtain it instead. Let $\mathbf{X}$ denote effect modifiers that we wish to adjust for. Let $\mathbf{X}_{i}$ be the value of the effect modifiers for each individual patient $i$ in study sample. Let $\overline{\mathbf{X}}$ be the mean of $\mathbf{X}$ in the target sample.

We assume that the weight for individual patient $i$ is $w_i^\textbf{MoM} = \exp(\alpha + X_{i}\cdot \mathbf{k}).$ These weights implicitly mimic the inverse odds of participation weighting approach when IPD for target data is available. Instead of logistic regression, the coefficients $\alpha$ and $\mathbf{k}$ in the weight will be estimated to satisfy the equality constraint: $\frac{\sum_{i=1}^{N} w_2\mathbf{X}_{i}}{\sum_{i=1}^{N}w_2} = \overline{\mathbf{X}}.$

Once $w_2$ is obtained, the remainder of the analysis proceeds as for inverse odds of participation weighting. For more information, check out the “What If” book on causal inference (Hernán and Robins 2024) and the introduction of method of moments by Phillippo et al. (Phillippo et al. 2018) originally in indirect treatment comparison (ITC) application.

References

Hernán, MA, and JM Robins. 2024. Causal Inference: What If? Boca Raton: Chapman & Hall/CRC.

Phillippo, David M, Anthony E Ades, Sofia Dias, Stephen Palmer, Keith R Abrams, and Nicky J Welton. 2018. “Methods for Population-Adjusted Indirect Comparisons in Health Technology Appraisal.” Medical Decision Making 38 (2): 200–211.

Core Clinical Sciences

References