Chapter 3 Applying the CMAverse
If we use the CMAverse
package with the df
data set, we can estimate the Average Total Effect (\(ATE\)) and the Controlled direct effects (setting the mediator to 0) (\(CDE(M=0)\)).
\[\begin{align*} ATE &= \mathbb{E}(Y_{A=1}) - \mathbb{E}(Y_{A=0}) \\ CDE(M=0) &= \mathbb{E}(Y_{A=1,M=0}) - \mathbb{E}(Y_{A=0,M=0}) \end{align*}\]
In case of intermediate confounders affected by the exposure, the estimation based on regression coefficients cannot be used to estimate CDE and other direct and indirect effects. We need to use a g-computation, an IPTW estimator or a double-robust estimator.
3.1 Estimation by “parametric” g-computation
For example, we can estimate the two estimands ATE and CDE(M=0) on a risk difference scale as described below. In order to get estimates on the risk difference scale, the yreg
argument needs to be set to "linear"
.
In order to estimate CDE(M=0) by parametric g-computation, we need:
- a model of the outcome (note that the
exposure*mediator
interaction is correctly included) - a model of each of the intermediate confounder (2 models in our example).
Using those 3 models, we can simulated counterfactual values (under \(\{A=1,M=0\}\) and \(\{A=0,M=0\}\)) exactly as what we did in the introduction chapter to simulate the data set df
.
In the results, the model of the mediator is not needed for the CDE, but it is needed to estimate the interventional (stochastic) direct and indirect effects.
Note that for the model of the intermediate confounder occupation
, physical activity (phys
) was not included as a predictor whereas it was present in the corresponding equation of the data-generating system: Can this “misspecification” result in some bias? (I don’t know the answer, it might be interesting to test on simulations).
library(CMAverse)
set.seed(1234)
gformula_death_RD <- cmest(data = df,
model = "gformula", # for parametric g-computation
outcome = "death", # outcome variable
exposure = "edu", # exposure variable
mediator = "smoking", # mediator
basec = c("sex",
"low_par_edu"), # baseline confounders
postc = c("phys",
"occupation"), # intermediate confounders
EMint = TRUE, # exposures*mediator interaction
mreg = list("logistic"), # g(M=1|L1,A,L0)
yreg = "linear",# Qbar.L2 = P(Y=1|M,L1,A,L0)
postcreg = list("logistic", "logistic"),
astar = 0,
a = 1,
mval = list(0), # do(M=0) to estimate CDE(M=0)
estimation = "imputation", # if model= gformula
inference = "bootstrap",
boot.ci.type = "per", # percentiles, other option: "bca"
nboot = 2) # use a large number of bootstrap samples
## | | | 0% | |=================================== | 50% | |======================================================================| 100%
## Causal Mediation Analysis
##
## # Outcome regression:
##
## Call:
## glm(formula = death ~ edu + smoking + edu * smoking + sex + low_par_edu +
## phys + occupation, family = gaussian(), data = getCall(x$reg.output$yreg)$data,
## weights = getCall(x$reg.output$yreg)$weights)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0006107 0.0128884 -0.047 0.9622
## edu 0.0585572 0.0128876 4.544 5.59e-06 ***
## smoking 0.1122164 0.0130604 8.592 < 2e-16 ***
## sex 0.0633703 0.0084086 7.536 5.25e-14 ***
## low_par_edu 0.0652140 0.0094112 6.929 4.49e-12 ***
## phys -0.0168476 0.0084636 -1.991 0.0466 *
## occupation 0.0413486 0.0097641 4.235 2.31e-05 ***
## edu:smoking 0.1484815 0.0170875 8.689 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1671299)
##
## Null deviance: 1905.1 on 9999 degrees of freedom
## Residual deviance: 1670.0 on 9992 degrees of freedom
## AIC: 10499
##
## Number of Fisher Scoring iterations: 2
##
##
## # Mediator regressions:
##
## Call:
## glm(formula = smoking ~ edu + sex + low_par_edu + phys + occupation,
## family = binomial(), data = getCall(x$reg.output$mreg[[1L]])$data,
## weights = getCall(x$reg.output$mreg[[1L]])$weights)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.85996 0.06143 -13.998 <2e-16 ***
## edu 0.70047 0.04402 15.911 <2e-16 ***
## sex 0.62499 0.04365 14.318 <2e-16 ***
## low_par_edu 0.41552 0.04772 8.707 <2e-16 ***
## phys -0.38703 0.04418 -8.761 <2e-16 ***
## occupation 0.59325 0.04946 11.993 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 13578 on 9999 degrees of freedom
## Residual deviance: 12685 on 9994 degrees of freedom
## AIC: 12697
##
## Number of Fisher Scoring iterations: 4
##
##
## # Regressions for mediator-outcome confounders affected by the exposure:
##
## Call:
## glm(formula = phys ~ edu + sex + low_par_edu, family = binomial(),
## data = getCall(x$reg.output$postcreg[[1L]])$data, weights = getCall(x$reg.output$postcreg[[1L]])$weights)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.38815 0.04765 8.145 3.78e-16 ***
## edu -0.34545 0.04205 -8.215 < 2e-16 ***
## sex 0.43714 0.04124 10.599 < 2e-16 ***
## low_par_edu -0.19185 0.04689 -4.092 4.28e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 13729 on 9999 degrees of freedom
## Residual deviance: 13519 on 9996 degrees of freedom
## AIC: 13527
##
## Number of Fisher Scoring iterations: 4
##
##
##
## Call:
## glm(formula = occupation ~ edu + sex + low_par_edu, family = binomial(),
## data = getCall(x$reg.output$postcreg[[2L]])$data, weights = getCall(x$reg.output$postcreg[[2L]])$weights)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.37498 0.05083 7.378 1.61e-13 ***
## edu 0.85924 0.04734 18.150 < 2e-16 ***
## sex 0.36173 0.04783 7.562 3.97e-14 ***
## low_par_edu 0.09333 0.05217 1.789 0.0736 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 11377 on 9999 degrees of freedom
## Residual deviance: 10977 on 9996 degrees of freedom
## AIC: 10985
##
## Number of Fisher Scoring iterations: 4
##
##
## # Effect decomposition on the mean difference scale via the g-formula approach
##
## Direct counterfactual imputation estimation with
## bootstrap standard errors, percentile confidence intervals and p-values
##
## Estimate Std.error 95% CIL 95% CIU P.val
## cde 0.0715898 0.0102149 0.0648080 0.079 <2e-16 ***
## rpnde 0.1415395 0.0124769 0.1284949 0.145 <2e-16 ***
## rtnde 0.1735670 0.0166478 0.1519051 0.174 <2e-16 ***
## rpnie 0.0242051 0.0002929 0.0222740 0.023 <2e-16 ***
## rtnie 0.0562325 0.0038779 0.0460777 0.051 <2e-16 ***
## te 0.1977720 0.0163549 0.1745726 0.197 <2e-16 ***
## rintref 0.0699497 0.0022620 0.0636869 0.067 <2e-16 ***
## rintmed 0.0320275 0.0041709 0.0234102 0.029 <2e-16 ***
## cde(prop) 0.3619816 0.0210889 0.3711416 0.399 <2e-16 ***
## rintref(prop) 0.3536883 0.0188551 0.3395706 0.365 <2e-16 ***
## rintmed(prop) 0.1619413 0.0100660 0.1340541 0.148 <2e-16 ***
## rpnie(prop) 0.1223887 0.0122998 0.1133772 0.130 <2e-16 ***
## rpm 0.2843301 0.0022338 0.2609549 0.264 <2e-16 ***
## rint 0.5156296 0.0087890 0.4871483 0.499 <2e-16 ***
## rpe 0.6380184 0.0210889 0.6005255 0.629 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (cde: controlled direct effect; rpnde: randomized analogue of pure natural direct effect; rtnde: randomized analogue of total natural direct effect; rpnie: randomized analogue of pure natural indirect effect; rtnie: randomized analogue of total natural indirect effect; te: total effect; rintref: randomized analogue of reference interaction; rintmed: randomized analogue of mediated interaction; cde(prop): proportion cde; rintref(prop): proportion rintref; rintmed(prop): proportion rintmed; rpnie(prop): proportion rpnie; rpm: randomized analogue of overall proportion mediated; rint: randomized analogue of overall proportion attributable to interaction; rpe: randomized analogue of overall proportion eliminated)
##
## Relevant variable values:
## $a
## [1] 1
##
## $astar
## [1] 0
##
## $mval
## $mval[[1]]
## [1] 0
The ATE = 0.1978 and the CDE(M=0) = 0.0716.
3.2 Estimation by MSM estimated by IPTW
The CDE(M=0) can also by estimated using Marginal structural models (MSM) estimated by IPTW.
In order to get estimations by IPTW, the model
argument needs to be set to msm
.
The cmest
function will estimate:
- a MSM for the outcome (depending only on the exposure and the mediator). Confounding is handled by weighting. This MSM can be used to estimate the controlled direct effect.
\[\begin{align*} \mathbb{E}(Y_{A=a,M=m}) &= \beta_0 + \beta_{A_{educ}} \times a + \beta_{M_{smoking}} \times m + \beta_{A \star M} \times (a \times m) \\ CDE(M=m) &= \mathbb{E}(Y_{A=1,M=m}) - \mathbb{E}(Y_{A=0,M=m}) = \beta_{A_{educ}} + \beta_{A \star M} \times m \end{align*}\]
- the weights \(sw = sw_A \times sw_M\) is needed to handle confounding in the MSM of the outcome, where \(sw_A\) is a weight balancing parents of the exposure \(A\) and \(sw_M\) is a weight balancing parents of the mediator. To calculate those weights, we need to specify the numerator and denominator for the mediator’s weight with the
wmnomreg
andwmdenomreg
arguments. The denominator for the exposure’s weight is specified with theereg
argument.
\[\begin{align*} sw = sw_A \times sw_M = \frac{P(A_i)}{P(A_i \mid L(0))} \times \frac{P(M_i \mid A)}{P(M_i \mid L(0),A,L(1))} \end{align*}\]
An MSM of the mediator is also estimated (depending only on the exposure, confounding is handled by weighting). This MSM is not useful to estimate the CDE, but is needed to estimated the “interventional” or “stochastic” natural direct and indirect effects.
set.seed(1234)
iptw_death_RD <- cmest(data = df,
model = "msm", # using MSM estimated by IPTW
outcome = "death", # outcome variable
exposure = "edu", # exposure variable
mediator = "smoking", # mediator
basec = c("sex",
"low_par_edu"), # baseline confounders
postc = c("phys",
"occupation"), # intermediate confounders
EMint = TRUE, # exposures*mediator interaction
ereg = "logistic", # exposure regression model g(A=1|L(0))
mreg = list("logistic"), # g(M=1|L1,A,L0)
yreg = "linear",# Qbar.L2 = P(Y=1|M,L1,A,L0)
postcreg = list("logistic", "logistic"), # Qbar.L1 = P(L1=1|A,L0)
wmnomreg = list("logistic"), #g(M=1|A) wgt nominator
wmdenomreg = list("logistic"), # g(M=1|L1,A,L(0)) wgt denominator
astar = 0,
a = 1,
mval = list(0), # do(M=0) to estimate CDE_m
estimation = "imputation", # if model= gformula
inference = "bootstrap",
boot.ci.type = "per", # for percentile, other option: "bca"
nboot = 2) # we should use a large number of bootstrap samples
## | | | 0%
## | |=================================== | 50%
## | |======================================================================| 100%
## Causal Mediation Analysis
##
## # Outcome regression:
##
## Call:
## glm(formula = death ~ edu + smoking + edu * smoking, family = gaussian(),
## data = getCall(x$reg.output$yreg)$data, weights = getCall(x$reg.output$yreg)$weights)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.082933 0.008865 9.355 < 2e-16 ***
## edu 0.070447 0.012775 5.514 3.59e-08 ***
## smoking 0.119740 0.012960 9.239 < 2e-16 ***
## edu:smoking 0.142064 0.017184 8.267 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1690611)
##
## Null deviance: 1880.9 on 9999 degrees of freedom
## Residual deviance: 1689.9 on 9996 degrees of freedom
## AIC: 10969
##
## Number of Fisher Scoring iterations: 2
##
##
## # Mediator regressions:
##
## Call:
## glm(formula = smoking ~ edu, family = binomial(), data = getCall(x$reg.output$mreg[[1L]])$data,
## weights = getCall(x$reg.output$mreg[[1L]])$weights)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.11692 0.03149 -3.713 0.000205 ***
## edu 0.78869 0.04174 18.895 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 13576 on 9999 degrees of freedom
## Residual deviance: 13214 on 9998 degrees of freedom
## AIC: 13197
##
## Number of Fisher Scoring iterations: 4
##
##
## # Mediator regressions for weighting (denominator):
##
## Call:
## glm(formula = smoking ~ edu + sex + low_par_edu + phys + occupation,
## family = binomial(), data = getCall(x$reg.output$wmdenomreg[[1L]])$data,
## weights = getCall(x$reg.output$wmdenomreg[[1L]])$weights)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.85996 0.06143 -13.998 <2e-16 ***
## edu 0.70047 0.04402 15.911 <2e-16 ***
## sex 0.62499 0.04365 14.318 <2e-16 ***
## low_par_edu 0.41552 0.04772 8.707 <2e-16 ***
## phys -0.38703 0.04418 -8.761 <2e-16 ***
## occupation 0.59325 0.04946 11.993 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 13578 on 9999 degrees of freedom
## Residual deviance: 12685 on 9994 degrees of freedom
## AIC: 12697
##
## Number of Fisher Scoring iterations: 4
##
##
## # Mediator regressions for weighting (nominator):
##
## Call:
## glm(formula = smoking ~ edu, family = binomial(), data = getCall(x$reg.output$wmnomreg[[1L]])$data,
## weights = getCall(x$reg.output$wmnomreg[[1L]])$weights)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.13313 0.03151 -4.225 2.39e-05 ***
## edu 0.81446 0.04178 19.493 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 13578 on 9999 degrees of freedom
## Residual deviance: 13192 on 9998 degrees of freedom
## AIC: 13196
##
## Number of Fisher Scoring iterations: 4
##
##
## # Exposure regression for weighting:
##
## Call:
## glm(formula = edu ~ sex + low_par_edu, family = binomial(), data = getCall(x$reg.output$ereg)$data,
## weights = getCall(x$reg.output$ereg)$weights)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.02910 0.04186 0.695 0.487
## sex -0.23178 0.04154 -5.580 2.41e-08 ***
## low_par_edu 0.63793 0.04599 13.871 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 13497 on 9999 degrees of freedom
## Residual deviance: 13285 on 9997 degrees of freedom
## AIC: 13291
##
## Number of Fisher Scoring iterations: 4
##
##
## # Effect decomposition on the mean difference scale via the marginal structural model
##
## Direct counterfactual imputation estimation with
## bootstrap standard errors, percentile confidence intervals and p-values
##
## Estimate Std.error 95% CIL 95% CIU P.val
## cde 0.070447 0.007889 0.067239 0.078 <2e-16 ***
## rpnde 0.137886 0.005815 0.137662 0.145 <2e-16 ***
## rtnde 0.164690 0.003025 0.168127 0.172 <2e-16 ***
## rpnie 0.022592 0.002277 0.020573 0.024 <2e-16 ***
## rtnie 0.049395 0.000512 0.050350 0.051 <2e-16 ***
## te 0.187281 0.005303 0.188700 0.196 <2e-16 ***
## rintref 0.067440 0.002075 0.067636 0.070 <2e-16 ***
## rintmed 0.026803 0.002789 0.026718 0.030 <2e-16 ***
## cde(prop) 0.376155 0.030641 0.356284 0.397 <2e-16 ***
## rintref(prop) 0.360098 0.020701 0.345418 0.373 <2e-16 ***
## rintmed(prop) 0.143118 0.018617 0.136462 0.161 <2e-16 ***
## rpnie(prop) 0.120629 0.008677 0.109013 0.121 <2e-16 ***
## rpm 0.263747 0.009939 0.257133 0.270 <2e-16 ***
## rint 0.503216 0.039318 0.481879 0.535 <2e-16 ***
## rpe 0.623845 0.030641 0.602550 0.644 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (cde: controlled direct effect; rpnde: randomized analogue of pure natural direct effect; rtnde: randomized analogue of total natural direct effect; rpnie: randomized analogue of pure natural indirect effect; rtnie: randomized analogue of total natural indirect effect; te: total effect; rintref: randomized analogue of reference interaction; rintmed: randomized analogue of mediated interaction; cde(prop): proportion cde; rintref(prop): proportion rintref; rintmed(prop): proportion rintmed; rpnie(prop): proportion rpnie; rpm: randomized analogue of overall proportion mediated; rint: randomized analogue of overall proportion attributable to interaction; rpe: randomized analogue of overall proportion eliminated)
##
## Relevant variable values:
## $a
## [1] 1
##
## $astar
## [1] 0
##
## $mval
## $mval[[1]]
## [1] 0
Applying an IPTW estimator using the CMAverse
, the ATE = 0.1873 and the CDE(M=0) = 0.0704.