Chapter 7 Inverse Probability of Treatment Weighting (IPTW)
7.1 Estimation of the Average total effect
7.1.1 IPTW for the ATE
If the average total effect (ATE) is identifiable, \(\Psi_{ATE} = \mathbb{E}(Y_{A=1}) - \mathbb{E}(Y_{A=0})\) can be expressed using Inverse probability of treatment weighting (IPTW), denoting \(\mathbb{P}(A=a \mid L(0)) = g(A=a \mid L(0))\): \[\begin{equation} \Psi_{ATE} = \mathbb{E}\left( \frac{\mathbb{I}(A=1)}{g(A=1 \mid L(0))} Y \right) - \mathbb{E}\left( \frac{\mathbb{I}(A=0)}{g(A=0 \mid L(0))} Y \right) \end{equation}\]
The following steps describe the implementation of the IPTW estimator
Estimate the treatment mechanism \(g(A=1 \mid L(0))\)
Predict each individual’s probability of being exposed to her own exposure
Apply weights corresponding to the inverse of the predicted probability \(w_i = \frac{1}{\hat{g}(A = a_i \mid L(0)_i)}\)
Use the empirical mean of the weighted outcome \(Y\): \(\hat{\mathbb{E}}(Y_a) = \frac{1}{n} \sum_{i=1}^n \frac{\mathbb{I}(A_i=a)}{\hat{g}(A=a_i \mid L(0)_i)} Y_i\)
rm(list=ls())
df2_int <- read.csv(file = "data/df2_int.csv")
## 1. Estimate g
g.L <- glm(A0_PM2.5 ~ L0_male + L0_soc_env,
family = "binomial", data = df2_int)
## 2. Predict each individual's probability of being exposed to her own exposure
# predict the probabilities P(A0_PM2.5=1|L(0)) & P(A0_PM2.5=0|L(0))
pred.g1.L <- predict(g.L, type="response")
pred.g0.L <- 1 - pred.g1.L
# the predicted probability of the observed treatment A=a_i is :
gA.L <- rep(NA, nrow(df2_int))
gA.L[df2_int$A0_PM2.5 == 1] <- pred.g1.L[df2_int$A0_PM2.5 == 1]
gA.L[df2_int$A0_PM2.5 == 0] <- pred.g0.L[df2_int$A0_PM2.5 == 0]
## 3. Apply weights corresponding to the inverse of the predicted probability
wt <- 1 / gA.L
## 4. Use the empirical mean of the weighted outcome
# point estimates:
IPTW.death <- mean(wt * as.numeric(df2_int$A0_PM2.5 == 1) * df2_int$Y_death) -
mean(wt * as.numeric(df2_int$A0_PM2.5 == 0) * df2_int$Y_death)
IPTW.death
# [1] 0.08224947
IPTW.qol <- mean(wt * as.numeric(df2_int$A0_PM2.5 == 1) * df2_int$Y_qol) -
mean(wt * as.numeric(df2_int$A0_PM2.5 == 0) * df2_int$Y_qol)
IPTW.qol
# [1] -8.436797
The ATE estimates using IPTW for death probability and mean quality of life are respectively +8.2% and -8.44.
7.1.2 Stabilized IPTW for the ATE
If the average total effect (ATE) is identifiable, \(\Psi_{ATE}\) can be estimated using a stabilized IPTW estimator: \[\begin{equation} \hat{\mathbb{E}}(Y_1) - \hat{\mathbb{E}}(Y_0) = \frac{\frac{1}{n} \sum_{i=1}^n \frac{\mathbb{I}(A_i=1)\hat{g}^*(A_i=1)}{\hat{g}(A_i=1 \mid L(0)_i)} Y_i}{ \frac{1}{n} \sum_{i=1}^n \frac{\mathbb{I}(A_i=1)\hat{g}^*(A_i=1)}{\hat{g}(A_i=1 \mid L(0)_i)}} - \frac{\frac{1}{n} \sum_{i=1}^n \frac{\mathbb{I}(A_i=0)\hat{g}^*(A_i=0)}{\hat{g}(A_i=0 \mid L(0)_i)} Y_i}{ \frac{1}{n} \sum_{i=1}^n \frac{\mathbb{I}(A_i=0)\hat{g}^*(A_i=0)}{\hat{g}(A_i=0 \mid L(0)_i)}} \end{equation}\] The estimation algorithm is the same as for IPTW, but taking into account any non-null function of \(A\) (\(g^*(A_i=a)\)) in the denominator of the weight in step 3, and applying the stabilized estimator in step 4.
## 3. For example, applying g^*(A) = 1
## 4. Applying the stabilized estimator
# point estimates:
s.IPTW.death <- (mean(wt * as.numeric(df2_int$A0_PM2.5 == 1) * df2_int$Y_death) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 1))) -
(mean(wt * as.numeric(df2_int$A0_PM2.5 == 0) * df2_int$Y_death) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 0)))
s.IPTW.death
# [1] 0.08294185
s.IPTW.qol <- (mean(wt * as.numeric(df2_int$A0_PM2.5 == 1) * df2_int$Y_qol) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 1))) -
(mean(wt * as.numeric(df2_int$A0_PM2.5 == 0) * df2_int$Y_qol) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 0)))
s.IPTW.qol
# [1] -8.291992
The ATE estimates using stabilized IPTW for death probability and mean quality of life are respectively +8.3% and -8.29.
7.2 Estimation of the Controlled direct effect (CDE)
7.2.1 IPTW for the CDE
If the controlled direct effect (CDE) is identifiable, \(\Psi^{\text{CDE}_m} = \mathbb{E}(Y_{A=1,M=m}) - \mathbb{E}(Y_{A=0,M=m})\) can be expressed by the basic Horvitz Thompson estimator (using Inverse probability of treatment weighting (IPTW)), denoting \(\mathbb{P}\left(A=a \mid L(0)) = g(A=a \mid L(0)\right)\) and \(\mathbb{P}\left(M=m \mid L(0),A,L(1)) = g(M=m \mid L(0),A,L(1)\right)\): \[\begin{equation} \small \Psi^{\text{CDE}_m} = \mathbb{E}\left[ \frac{\mathbb{I}(A=1 \cap M=m)}{g(A=1 \mid L(0)) \times g(M=m \mid L(0),A,L(1))} Y \right] - \mathbb{E}\left[ \frac{\mathbb{I}(A=0 \cap M=m)}{g(A=0 \mid L(0)) \times g(M=m \mid L(0),A,L(1))} Y \right] \end{equation}\]
The following steps describe the implementation of the IPTW estimator
Estimate the treatment mechanisms \(g\left(A=1 \mid L(0)\right)\) and \(g\left(M=1 \mid L(0),A,L(1)\right)\)
Predict each individual’s probability of being exposed to her own exposure
Apply weights corresponding to the inverse of the predicted probability \(w_{A_i} = \frac{1}{\hat{g}(A = a_i \mid L(0)_i)}\) and \(w_{M_i} = \frac{1}{\hat{g}(M = m_i \mid L(0)_i,A_i,L(1)_i)}\)
Use the empirical mean of the weighted outcome \(Y\): \(\hat{\mathbb{E}}(Y_{a,m}) = \frac{1}{n} \sum_{i=1}^n \frac{\mathbb{I}(A_i=a \cap M_i=m)}{\hat{g}(A=a_i \mid L(0)_i) \times \hat{g}(M=m_i \mid L(0)_i,A_i,L(1)_i)} Y_i\)
rm(list=ls())
df2_int <- read.csv(file = "data/df2_int.csv")
## 1. Estimate gA and gM
gA.L <- glm(A0_PM2.5 ~ L0_male + L0_soc_env,
family = "binomial", data = df2_int)
gM.L <- glm(M_diabetes ~ L0_male + L0_soc_env + A0_PM2.5 + L1,
family = "binomial", data = df2_int)
## 2. Predict each individual's probability of being exposed to her own exposure
# predict the probabilities P(A0_PM2.5=1|L(0)) & P(A0_PM2.5=0|L(0))
pred.gA1.L <- predict(gA.L, type = "response")
pred.gA0.L <- 1 - pred.gA1.L
# the predicted probability of the observed treatment A_i=a is :
gAobs.L <- rep(NA, nrow(df2_int))
gAobs.L[df2_int$A0_PM2.5 == 1] <- pred.gA1.L[df2_int$A0_PM2.5 == 1]
gAobs.L[df2_int$A0_PM2.5 == 0] <- pred.gA0.L[df2_int$A0_PM2.5 == 0]
# predict the probabilities P(M=1|L(0),A,L(1)) & P(M=0|L(0),A,L(1))
pred.gM1.L <- predict(gM.L, type = "response")
pred.gM0.L <- 1 - pred.gM1.L
# the predicted probability of the observed treatment M_i=m is :
gMobs.L <- rep(NA, nrow(df2_int))
gMobs.L[df2_int$M_diabetes == 1] <- pred.gM1.L[df2_int$M_diabetes == 1]
gMobs.L[df2_int$M_diabetes == 0] <- pred.gM0.L[df2_int$M_diabetes == 0]
## 3. Apply weights corresponding to the inverse of the predicted probability
wt_A <- 1 / gAobs.L
wt_M <- 1 / gMobs.L
wt <- wt_A * wt_M
## 4. Use the empirical mean of the weighted outcome
# point estimates of CDE, setting M=0
CDE_IPTW_m0_death <- (mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 0) *
df2_int$Y_death) -
mean(wt * as.numeric(df2_int$A0_PM2.5==0 &
df2_int$M_diabetes == 0) *
df2_int$Y_death))
CDE_IPTW_m0_death
# [1] 0.05874684
CDE_IPTW_m0_qol <- (mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 0) *
df2_int$Y_qol) -
mean(wt * as.numeric(df2_int$A0_PM2.5==0 &
df2_int$M_diabetes == 0) *
df2_int$Y_qol))
CDE_IPTW_m0_qol
# [1] -5.341138
# point estimates of CDE, setting M=1
CDE_IPTW_m1_death <- (mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 1) *
df2_int$Y_death) -
mean(wt * as.numeric(df2_int$A0_PM2.5==0 &
df2_int$M_diabetes == 1) *
df2_int$Y_death))
CDE_IPTW_m1_death
# [1] 0.101733
CDE_IPTW_m1_qol <- (mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 1) *
df2_int$Y_qol) -
mean(wt * as.numeric(df2_int$A0_PM2.5==0 &
df2_int$M_diabetes == 1) *
df2_int$Y_qol))
CDE_IPTW_m1_qol
# [1] -8.185866
7.2.2 Stabilized IPTW for the CDE
If the controlled dired effect (CDE) is identifiable, \(\Psi^{CDE}\) can be estimated using a stabilized IPTW estimator (modified Horvitz Thompson estimator): \[\begin{equation} \small \hat{\mathbb{E}}(Y_{1,m}) - \hat{\mathbb{E}}(Y_{0,m}) = \frac{ \sum_{i=1}^n \frac{\mathbb{I}(A_i=1 \cap M_i=m)}{\hat{g}(A_i=1 \mid L(0)_i) \times \hat{g}(M_i=m \mid L(0)_i,A_i,L(1)_i)} Y_i}{\sum_{i=1}^n \frac{\mathbb{I}(A_i=1 \cap M_i=m)}{\hat{g}(A_i=1 \mid L(0)_i) \hat{g}(M_i=m \mid L(0)_i,A_i,L(1)_i)}} - \frac{ \sum_{i=1}^n \frac{\mathbb{I}(A_i=0 \cap M_i=m)}{\hat{g}(A_i=0 \mid L(0)_i) \times \hat{g}(M_i=m \mid L(0)_i,A_i,L(1)_i)} Y_i}{ \sum_{i=1}^n \frac{\mathbb{I}(A_i=0 \cap M_i=m)}{\hat{g}(A_i=0 \mid L(0)_i) \times \hat{g}(M_i=m \mid L(0)_i,A_i,L(1)_i)}} \end{equation}\] The estimation algorithm is the same as for IPTW, but applying the stabilized estimator in step 4.
## 4. Applying the stabilized estimator
# point estimates of CDE, setting M=0:
CDE_sIPTW_m0_death <- (mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 0) *
df2_int$Y_death) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 0))) -
(mean(wt * as.numeric(df2_int$A0_PM2.5 == 0 &
df2_int$M_diabetes == 0) *
df2_int$Y_death) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 0 &
df2_int$M_diabetes == 0)))
CDE_sIPTW_m0_death
# [1] 0.0601292
CDE_sIPTW_m0_qol <- (mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 0) *
df2_int$Y_qol) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 0))) -
(mean(wt * as.numeric(df2_int$A0_PM2.5 == 0 &
df2_int$M_diabetes == 0) *
df2_int$Y_qol) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 0 &
df2_int$M_diabetes == 0)))
CDE_sIPTW_m0_qol
# [1] -4.966328
# point estimates of CDE, setting M=1:
CDE_sIPTW_m1_death <- (mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 1) *
df2_int$Y_death) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 1))) -
(mean(wt * as.numeric(df2_int$A0_PM2.5 == 0 &
df2_int$M_diabetes == 1) *
df2_int$Y_death) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 0 &
df2_int$M_diabetes == 1)))
CDE_sIPTW_m1_death
# [1] 0.09030186
CDE_sIPTW_m1_qol <- (mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 1) *
df2_int$Y_qol) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 1 &
df2_int$M_diabetes == 1))) -
(mean(wt * as.numeric(df2_int$A0_PM2.5 == 0 &
df2_int$M_diabetes == 1) *
df2_int$Y_qol) /
mean(wt * as.numeric(df2_int$A0_PM2.5 == 0 &
df2_int$M_diabetes == 1)))
CDE_sIPTW_m1_qol
# [1] -10.03045
7.3 Estimation of the PNDE and TNIE by Inverse Odds Ratio Weigthing
(Tchetgen Tchetgen 2013) described the estimation of the Pure Natural Direct Effect (PNDE) and Total Natural Indirect Effect (TNIE) using Inverse Oddes Ratio Weighting (IORW), when the the two effects are identifiable (Causal model 1, Figure 3.1).
A practical guidance is also given in (Nguyen et al. 2015).
The approach is particularly useful with multiple mediators, as we don’t need to estimate a model for each mediator of interest. We also don’t need to make any assumptions regarding the exposure-mediator interaction effects on the outcome.
The IORW approach relies on :
- a single model of the exposure conditional on the set of mediators and baseline confounders,
- and 2 models of the outcome, circonventing the need to specify the possible interactions between the exposure and mediators
The analysis relies on the following steps:
- Fit a standard multiple logistic regression model for the exposure \(A\), conditional on mediators \(M\) and baseline confounders \(L(0)\) and \(L(1)\)
- Compute an IORW weight by taking the inverse of the predicted odds ratio from step 1 for each observation in the exposed group (the unexposed group weight equals 1)
- Estimate the direct effect of the exposure via a glm of the regression of the outcome \(Y\) on the exposure \(A\) and baseline confounders \(L(0)\) and \(L(1)\), with the appropriate link function and the weights from step 2
- Estimate the total effect of the exposure \(A\) on the outcome \(Y\) using a standard glm with the appropriate link function (adjusted for baseline confounders)
- Calculate indirect effects by substracting the direct effect from the total effect
- Bootstrap effect estimates to get 95% confidence intervals
Note: Examples of sensitivity analysis to test the unmeasured confounding assumptions are also described in (Nguyen et al. 2015) (not shown below).
rm(list=ls())
df1_int <- read.csv(file = "data/df1_int.csv")
## 1) Fit a standard multiple logistic regression model for the exposure A,
## conditional on mediators M and baseline confounders L(0) and L(1)
g.A.L0 <- glm(A0_PM2.5 ~ M_diabetes + L0_male + L0_soc_env + L1,
family = "binomial",
data = df1_int)
summary(g.A.L0)
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -2.883345 0.080069 -36.011 < 2e-16 ***
# M_diabetes 0.562935 0.065539 8.589 < 2e-16 ***
# L0_male 0.374904 0.064828 5.783 7.34e-09 ***
# L0_soc_env 0.600956 0.073872 8.135 4.12e-16 ***
# L1 -0.007768 0.070141 -0.111 0.912
## 2) Compute an IORW weight by taking the inverse of the predicted odds ratio
## from step 1 for each observation in the exposed group (the unexposed
## group weight equals 1)
p <- predict(g.A.L0, type = "response")
iorw <- rep(NA, nrow(df1_int))
iorw[df1_int$A0_PM2.5 == 0] <- 1
iorw[df1_int$A0_PM2.5 == 1] <- ((1 - p[df1_int$A0_PM2.5 == 1]) /
p[df1_int$A0_PM2.5 == 1])
## 3) Estimate the direct effect of the exposure via a glm of the regression of
## the outcome Y on the exposure A and baseline confounders L(0) and L(1),
## with the appropriate link function and the weights from step 2
Dir.Y.model.death <- glm(Y_death ~ A0_PM2.5 + L0_male + L0_soc_env + L1,
weights = iorw,
family = "gaussian", # to get risk differences
data = df1_int)
summary(Dir.Y.model.death)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.102447 0.009481 10.805 < 2e-16 ***
# A0_PM2.5 0.063862 0.008313 7.682 1.71e-14 ***
# L0_male 0.066841 0.008318 8.035 1.04e-15 ***
# L0_soc_env 0.056381 0.008625 6.537 6.57e-11 ***
# L1 0.091272 0.009103 10.026 < 2e-16 ***
PNDE <- coef(Dir.Y.model.death)["A0_PM2.5"]
# 0.06386221
## 4) Estimate the total effect of the exposure A on the outcome Y using a standard
## glm with the appropriate link function (adjusted for baseline confounders)
Tot.Y.model.death <- glm(Y_death ~ A0_PM2.5 + L0_male + L0_soc_env + L1,
family = "gaussian", # to get risk differences
data = df1_int)
summary(Tot.Y.model.death)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.111516 0.008328 13.390 < 2e-16 ***
# A0_PM2.5 0.076673 0.012777 6.001 2.03e-09 ***
# L0_male 0.050049 0.008042 6.224 5.05e-10 ***
# L0_soc_env 0.060178 0.008413 7.153 9.07e-13 ***
# L1 0.080243 0.008813 9.106 < 2e-16 ***
# the total effect is 0.076673
## 5) Calculate indirect effects by substracting the direct effect from the total
## effect
TNIE <- coef(Tot.Y.model.death)["A0_PM2.5"] - PNDE
# 0.01281078
## 6) bootstrap effect estimates to get 95% confidence intervals
We can use the CMAverse
package to obtain those estimations by IORW.
rm(list=ls())
df1_int <- read.csv(file = "data/df1_int.csv")
library(CMAverse)
## Using the CMAverse to estimate PNDE and TNIE by IORW
res_msm_df1.death <- cmest(data = df1_int,
model = "iorw",
outcome = "Y_death",
exposure = "A0_PM2.5",
mediator = "M_diabetes",
basec = c("L0_male", "L0_soc_env","L1"),
postc = NULL,
# EMint = TRUE, # not needed for IORW
ereg = "logistic", # exposure regression model g(A=1|L(0))
yreg = "linear", # to get risk difference
# mreg = list("logistic"), # not needed for IORW
# wmnomreg = list("logistic"), # not needed for IORW
# wmdenomreg = list("logistic"), # not needed for IORW
astar = 0, #E(Y_{A=0,M=1})
a = 1, #E(Y_{A=1,M=1})
# mval = list(0), # not needed for IORW
estimation = "imputation",
inference = "bootstrap",
nboot = 2)
summary(res_msm_df1.death)
# Causal Mediation Analysis
#
# # Outcome regression for the total effect:
# Call:
# glm(formula = Y_death ~ A0_PM2.5 + L0_male + L0_soc_env + L1,
# family = gaussian(), data = getCall(x$reg.output$yregTot)$data,
# weights = getCall(x$reg.output$yregTot)$weights)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.111516 0.008328 13.390 < 2e-16 ***
# A0_PM2.5 0.076673 0.012777 6.001 2.03e-09 ***
# L0_male 0.050049 0.008042 6.224 5.05e-10 ***
# L0_soc_env 0.060178 0.008413 7.153 9.07e-13 ***
# L1 0.080243 0.008813 9.106 < 2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# # Outcome regression for the direct effect:
# Call:
# glm(formula = Y_death ~ A0_PM2.5 + L0_male + L0_soc_env + L1,
# family = gaussian(), data = getCall(x$reg.output$yregDir)$data,
# weights = getCall(x$reg.output$yregDir)$weights)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.102447 0.009481 10.805 < 2e-16 ***
# A0_PM2.5 0.063862 0.008313 7.682 1.71e-14 ***
# L0_male 0.066841 0.008318 8.035 1.04e-15 ***
# L0_soc_env 0.056381 0.008625 6.537 6.57e-11 ***
# L1 0.091272 0.009103 10.026 < 2e-16 ***
#
# # Exposure regression for weighting:
# Call:
# glm(formula = A0_PM2.5 ~ M_diabetes + L0_male + L0_soc_env +
# L1, family = binomial(), data = getCall(x$reg.output$ereg)$data,
# weights = getCall(x$reg.output$ereg)$weights)
#
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -2.883345 0.080069 -36.011 < 2e-16 ***
# M_diabetes 0.562935 0.065539 8.589 < 2e-16 ***
# L0_male 0.374904 0.064828 5.783 7.34e-09 ***
# L0_soc_env 0.600956 0.073872 8.135 4.12e-16 ***
# L1 -0.007768 0.070141 -0.111 0.912
#
# # Effect decomposition on the mean difference scale via the inverse odds ratio weighting approach
#
# Direct counterfactual imputation estimation with
# bootstrap standard errors, percentile confidence intervals and p-values
#
# Estimate Std.error 95% CIL 95% CIU P.val
# te 0.076673 0.009812 0.072033 0.085 <2e-16 ***
# pnde 0.063862 0.003280 0.056996 0.061 <2e-16 ***
# tnie 0.012811 0.006532 0.015038 0.024 <2e-16 ***
# pm 0.167083 0.052652 0.208409 0.279 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (te: total effect;
# pnde: pure natural direct effect;
# tnie: total natural indirect effect;
# pm: proportion mediated)
7.4 Estimation of “Conditional” Randomized/Interventional Natural Direct (CRDE) and Indirect Effects (CRIE)
7.4.1 IPTW for the CRDE and CRIE
Zheng and van der Laan (Zheng and van der Laan 2017) described “Conditional” Randomized Direct and Indirect Effects, using random draws from the counterfactual distribution of the mediator, conditional on both \(L(0)\) and \(L(1)\). The Average Total Effect (ATE) can be decomposed into the sum of :
- a Conditional Randomized Natural Direct Effect (CRDE): \(\text{CRDE}=\mathbb{E}(Y_{1,\Gamma_{0} \mid L(0),L(1)})-\mathbb{E}(Y_{0,\Gamma_{0} \mid L(0), L(1)})\),
- and Conditional Randomized Natural Indirect Effect (CRIE) \(\text{CRIE}=\mathbb{E}(Y_{1,\Gamma_{1} \mid L(0), L(1)})-\mathbb{E}(Y_{1,\Gamma_{0} \mid L(0), L(1)})\).
Under the identifiability conditions, the quantity of \(\mathbb{E}(Y_{a,\Gamma_{a^\prime} \mid L(0), L(1)})\) can be estimated by IPTW: \[\begin{equation*} \Psi^{a,a^\prime}_\text{IPTW} = \mathbb{E}\left[Y_{a,\Gamma_{a^\prime} \mid L(0), L(1)} \right] = \frac{1}{n} \sum_{i=1}^n D_\text{IPTW}^{a,a^\prime} \end{equation*}\] where \[\begin{equation*} D_\text{IPTW}^{a,a^\prime} = Y \frac{I(A=a)}{p_A(A=a\mid L(0))} \frac{p_M(M \mid A=a^\prime,L(1),L(0))}{p_M(M \mid A=a,L(1),L(0))} \end{equation*}\] and the variance of the estimator \(\Psi^{a,a^\prime}_\text{IPTW}\) is \(\text{var}\left(\Psi^{a,a^\prime}_\text{IPTW}\right) = \frac{\text{var}\left(D_\text{IPTW}^{a,a^\prime}\right)}{n}\), which can be used to estimate 90% confidence intervals.
rm(list=ls())
df2_int <- read.csv(file = "./data/df2_int.csv")
## In order to estimate Psi^{a,a'}
## 1. Estimate gA and gM
gA <- glm(A0_PM2.5 ~ L0_male + L0_soc_env,
family = "binomial", data = df2_int)
gM <- glm(M_diabetes ~ L0_male + L0_soc_env + A0_PM2.5 + L1,
family = "binomial", data = df2_int)
## 2. Predict each individual's probability of being exposed to her own exposure
# predict the probabilities P(A0_PM2.5=1|L(0)) & P(A0_PM2.5=0|L(0))
pred.gA1 <- predict(gA, type = "response")
pred.gA0 <- 1 - pred.gA1
# the predicted probability of the observed treatment A_i=a is :
gAobs <- rep(NA, nrow(df2_int))
gAobs[df2_int$A0_PM2.5 == 1] <- pred.gA1[df2_int$A0_PM2.5 == 1]
gAobs[df2_int$A0_PM2.5 == 0] <- pred.gA0[df2_int$A0_PM2.5 == 0]
# predict the probabilities
# P(M=1|L(0),A=1,L(1)) & P(M=0|L(0),A=1,L(1)) setting A = 1
# and P(M=1|L(0),A=0,L(1)) & P(M=0|L(0),A=0,L(1)) setting A = 0
data.Ais0 <- data.Ais1 <- df2_int
data.Ais0$A0_PM2.5 <- 0
data.Ais1$A0_PM2.5 <- 1
pred.gM1.Ais1 <- predict(gM, newdata = data.Ais1, type = "response")
pred.gM0.Ais1 <- 1 - pred.gM1.Ais1
# the predicted probability of the observed treatment M_i=m is :
gMobs.Ais1 <- rep(NA, nrow(df2_int))
gMobs.Ais1[df2_int$M_diabetes == 1] <- pred.gM1.Ais1[df2_int$M_diabetes == 1]
gMobs.Ais1[df2_int$M_diabetes == 0] <- pred.gM0.Ais1[df2_int$M_diabetes == 0]
pred.gM1.Ais0 <- predict(gM, newdata = data.Ais0, type = "response")
pred.gM0.Ais0 <- 1 - pred.gM1.Ais0
# the predicted probability of the observed treatment M_i=m is :
gMobs.Ais0 <- rep(NA, nrow(df2_int))
gMobs.Ais0[df2_int$M_diabetes == 1] <- pred.gM1.Ais0[df2_int$M_diabetes == 1]
gMobs.Ais0[df2_int$M_diabetes == 0] <- pred.gM0.Ais0[df2_int$M_diabetes == 0]
## 3. Calculate D^{a,a'} - influence curve of the IPTW estimator
D.death.11 <- (df2_int$Y_death * (I(df2_int$A0_PM2.5 == 1) / gAobs) *
(gMobs.Ais1 / gMobs.Ais1))
D.death.10 <- (df2_int$Y_death * (I(df2_int$A0_PM2.5 == 1) / gAobs) *
(gMobs.Ais0 / gMobs.Ais1))
D.death.00 <- (df2_int$Y_death * (I(df2_int$A0_PM2.5 == 0) / gAobs) *
(gMobs.Ais0 / gMobs.Ais0))
D.qol.11 <- (df2_int$Y_qol * (I(df2_int$A0_PM2.5 == 1) / gAobs) *
(gMobs.Ais1 / gMobs.Ais1))
D.qol.10 <- (df2_int$Y_qol * (I(df2_int$A0_PM2.5 == 1) / gAobs) *
(gMobs.Ais0 / gMobs.Ais1))
D.qol.00 <- (df2_int$Y_qol * (I(df2_int$A0_PM2.5 == 0) / gAobs) *
(gMobs.Ais0 / gMobs.Ais0))
## 4. Calculate CRDE and CRIE
## For death
CRDE.death <- mean(D.death.10) - mean(D.death.00)
CRDE.death
# [1] 0.07405068
CRIE.death <- mean(D.death.11) - mean(D.death.10)
CRIE.death
# [1] 0.00819879
## For quality of life
CRDE.qol <- mean(D.qol.10) - mean(D.qol.00)
CRDE.qol
# [1] -7.563116
CRIE.qol <- mean(D.qol.11) - mean(D.qol.10)
CRIE.qol
# [1] -0.8736813
## 5. Calculate 95% CI based on the influence curve D^{a,a'}
# the variance of the estimator Psi^{a,a'} = mean(D^{a,a'}) is var(D^{a,a'}) / n)
se.CRDE.death <- sqrt(var(D.death.10 - D.death.00) / nrow(df2_int))
c(CRDE.death - qnorm(0.975) * se.CRDE.death,
CRDE.death + qnorm(0.975) * se.CRDE.death)
# [1] 0.04141535 0.10668602
se.CRIE.death <- sqrt(var(D.death.11 - D.death.10) / nrow(df2_int))
c(CRIE.death - qnorm(0.975) * se.CRIE.death,
CRIE.death + qnorm(0.975) * se.CRIE.death)
# [1] 0.003380238 0.013017342
se.CRDE.qol <- sqrt(var(D.qol.10 - D.qol.00) / nrow(df2_int))
c(CRDE.qol - qnorm(0.975) * se.CRDE.qol,
CRDE.qol + qnorm(0.975) * se.CRDE.qol)
# [1] -11.748095 -3.378137
se.CRIE.qol <- sqrt(var(D.qol.11 - D.qol.10) / nrow(df2_int))
c(CRIE.qol - qnorm(0.975) * se.CRIE.qol,
CRIE.qol + qnorm(0.975) * se.CRIE.qol)
# [1] -1.4111404 -0.3362223
Results are close to the estimations obtained by g-computation. IPTW estimations are know to be more sensitive to positivity issues, with larger confidence intervals (they can be too conservative with more than 95% coverage).
- the conditional “randomized” Natural Indirect effect is \(\text{CRIE} \approx +0.8\%, \quad 95\%CI=[0.3\%,1.3\%]\) on death and \(\text{CRIE} \approx -0.9, \quad 95\%CI=[-1.4,-0.3]\) on quality of life. This indirect effect corresponds to the specific path \(A \rightarrow M \rightarrow Y\) (and can also contain the mediated interactive effect of due to the \(A \ast M\) interaction on \(Y\)).
- the conditional “randomized” Natural Direct effect is \(\text{CRDE} \approx +7.4\, \quad 95\%CI=[4.1\%,10.7\%]\) on death and \(\text{CRDE} \approx -7.6, \quad 95\%CI=[-11.7,-3.4]\) on quality of life. This direct effect corresponds to the combination of the paths \(A \rightarrow Y\), \(A \rightarrow L(1) \rightarrow Y\) and \(A \rightarrow L(1) \rightarrow M \rightarrow Y\).
95% confidence intervals can be calculated by bootstrap.