The R package mlma is created for linear and nonlinear mediation analysis with multilevel data using multilevel additive models Yu and Li (2022). The vignette is composed of three parts. We first generate a simulated dataset. Based on the simulation, part I focuses on how to transform variables and prepare data for the mediation analysis. Part II walks through functions of the multilevel mediation analysis and Part III explains how to make inferences on multilevel mediation effects.

To use the R package mlma, we first install the package in R (`install.packages("mlma")`

) and load it.

We generate a dataset with two levels. In the simulation, there are 1 level one exposure that is binary and 1 level two exposure that is continuous. There are also two mediators, one at each level. The level one mediator is continuous while the level two mediator is binary. The variables are generated by the following code:

```
set.seed(1)
=20 # the number of observations in each group
n<-600/n # there are 30 groups
J=rep(1:J,each=n)
level<-0.8 #covariates coefficients
alpha_211<-0.8
alpha_1111<-0.8
alpha_2111<-0.4
beta_1<-0.4
beta_2<-0.4
beta_3<-0.4
beta_4<-0.4
beta_5=5 #the level 1 variance
v1=v1/5 #the level 2 variance
v2
#The exposure variables
<-rbinom(600,1,0.5) #binary level 1 exposure, xij
x1<-rep(rnorm(J),each=n) #continuous level 2 exposure
x2
#The mediators
<-rep(rbinom(J,1,exp(alpha_211*unique(x2^2))/(1+exp(alpha_211*unique(x2^2)))),each=n) #level 2 binary mediator
m2<-rep(rnorm(J,0,0.5),each=n) #level 2 variance for mij
u1<-rnorm(n*J) #level 1 variance for mij
e1<-u1+alpha_1111*x1+alpha_2111*x2+e1 #level 1 continuous mediator
m1
#The response variable
<-rep(rnorm(J,0,v2),each=n)
u0<-rnorm(n*J,0,v1)
e0<-u0+beta_1*x1+beta_2*x2+beta_3*ifelse(x2<=0,0,log(1+x2))+beta_4*m1+beta_5*m2+e0 y
```

The \(data.org\) function is used to do the transformation before the mediation analysis. In the function, the exposure variable(s) (\(x\)) and the mediator(s) (\(m\)) are required to input. The response variable (\(y\)) is required only when its level (\(levely\)) is not given. The argument \(levelx\) is to identify the levels of the exposure variable. \(levelx\) does not need to be provided. The function can automatically decide the level of the exposure variable(s). If any of the exposure variable is binary or categorical, \(xref\) is used to identify the reference group of the exposure variable.

Note that the name for one mediator in \(m\) should **not** be the subset of the name of another mediator.

The arguments \(l1\) and \(l2\) specify the column numbers in \(m\) the continuous mediators at level one or level two respectively. \(c1\) and \(c2\) refers to the categorical mediators where \(c1r\) and \(c2r\) specify the reference group respectively. \(l1, l2, c1\), and \(c2\) does not have to be provided. If not provided, the function \(data.org\) checks each column of \(m\) and decide whether each variable belongs to level 1 or 2, and be continuous or categorical.

\(level\) is a vector that record the group number for each observation. \(weight\) defines the weight of each observation.

\(f01y\) and \(f10y\) specify the desired transformation of exposures at level two or level one respectively in explaining the response variable. \(f01y\) and \(f10y\) are lists with the first item identify the column number of the exposure variable in \(x\) that needs to be transformed, and then in that order, each of the rest items list the transformation functional expressions for each exposure. For example, `f10y=list(1,c("x^2","log(x)"))`

means that column 1 of \(x\) is a level 1 exposure. It needs to be transformed to its square form and natural log form to predict the response variable. If not specified in \(f01y\) or \(f10y\), the exposure will keep its original format without transformation. In our simulation data, the level two exposure is transformed to itself, \(x_{.j}\) and \(I(x_{.j}>0)\times log(x_{.j}+1)\). Therefore, we define `f01y=list(2,c("x","ifelse(x>0,log(x+1),0)"))`

. Similarly, \(f02ky\) and \(f20ky\) defines the transformation of level two and level one mediators respectively in explaining \(y\).

\(f01km1\) and \(f01km2\) are arguments that define transformation of level two exposures in explaining level one or level two mediator(s) respectively. Since only higher or equivalent level variables can be used as predictors, level one exposures can only be predictors for level one mediators. \(f10km\) defines the transformation of level one exposure(s) in explaining level one mediator(s). In addition, when there are level two mediators but not level two exposure variable, the level one exposure variable(s) will be aggregated at level two to form the level two exposure variable(s). The first item of the the \(f01km1\), \(f01km2\) and \(f10km\) is a matrix of two columns, where the first column indicates the column number of the mediator in \(m\). The second column indicates the column number of the exposure in \(x\). By the order of the rows of the first item, each of the rest items of \(f01km1, f01km2\) and \(f10km\) list the transformation functional expressions for the exposure (identified by column 2) in explaining each mediator (identified by column 1). In our example, level two mediator \(m2\) is explained by the level two exposure \(x2\) in the form of \(x2^2\). Therefore, we set the argument `f01km2=list(matrix(c(2,2),1,2),"x^2")`

.

The following codes prepare for the data and perform the transformations. Note that the transformation functions can be set in different ways. Besides those in the example, we can also use the natural spline bases (e.g. `ns(x,df=5)`

) and piecewise functions (e.g. `ifelse(x>0,0,sqrt(x))`

).

```
<-data.org(x=cbind(x1=x1,x2=x2), m=cbind(m1=m1,m2=m2),
example1f01y=list(2,c("x","ifelse(x>0,log(x+1),0)")),
level=level,
f01km2=list(matrix(c(2,2),1,2),"x^2"))
```

The function \(mlma\) can be executed based on the results from \(data.org\) or on the original arguments of \(data.org\). In addition, the response variable needs to be set up by \(y\). If the response variable is categorical, \(yref\) is used to specify the reference group. The \(random\) argument is to set up the random effect part for the response variable and \(random.m1\) is for the mediators.

The argument \(covariates\) includes the data frame of all covariates for the response variable and/or mediators. For the response variable, covariates are defined as those variables used to explain \(y\), but are not related or caused by the exposure variable(s). Arguments \(cy1\) and \(cy2\) specify the column numbers of level one and two covariates respectively in \(covariates\). \(cm\) specifies the covariates for mediators.

If the joint effect of a group of mediators are of interest, the group can be set up with the \(joint\) argument. Finally, if users are interested in the medation effects on a new set of exposure and mediators. The new sets can also be set. Please read the menu of the package.

```
<-mlma(y=y,data1=example1,intercept=F)
mlma.e1
mlma.e1#> Level 2 Third Variable Effects:
#> TE DE m2.1 m1
#> x2 0.76 0.62 0.02 0.11
#> Level 1 Third Variable Effects:
#> TE DE m1
#> x1 0.65 0.5 0.14
```

The result of mediation effect analysis shows the mediation effect from different levels. The direct effect, indirect effects and total effect are shown for each exposure-response pair of variables. For the above example, the level 1 total effect from \(x1\) to \(y\) is \(1.04\), of which direct effect is \(0.5\), indirect effect from \(m1\) is \(0.54\). The level 2 total effect between \(x2\) and \(y\) is \(1.05\), in which the direct effect is \(0.62\), the indirect effect from \(m2\) is \(0.01\), and from \(m1\) is \(0.43\).

The \(summary\) function provides the ANOVA tests of the exposure variables and mediators in the full model to estimate the response variable. It also provides the ANOVA tests of the exposure variable(s) in predicting each mediator. Using the results, users can decide which variables should be included as mediators and which ones should be used as covariates and rerun the multilevel mediation analysis.

```
summary(mlma.e1)
#> 1. Anova on the Full Model:
#> Analysis of Deviance Table (Type III Wald chisquare tests)
#>
#> Response: y
#> Chisq Df Pr(>Chisq)
#> x1 1.3500 1 0.245285
#> x2.1 0.3080 1 0.578897
#> x2.2 0.0998 1 0.752076
#> m2.2 0.0403 1 0.840988
#> m1 7.5740 1 0.005922 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#> 2. Anova on models for Level 1 mediators:
#> $m1
#> Analysis of Deviance Table (Type III Wald chisquare tests)
#>
#> Response: y
#> Chisq Df Pr(>Chisq)
#> x2 51.817 1 6.092e-13 ***
#> x1 111.038 1 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#>
#>
#> 3. Anova on models for Level 2 mediators:
#> $m2
#> Analysis of Deviance Table (Type III tests)
#>
#> Response: y
#> LR Chisq Df Pr(>Chisq)
#> x2.2.1 9.5608 1 0.001988 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

To check the actual coefficients for each variable in the full model or in the model to predict level one or level two mediators, we can check the results from \(mlma\) directly.

```
$f1 #the full model
mlma.e1#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ x1 + x2.1 + x2.2 + m2.2 + m1 - 1 + (1 | level)
#> Data: data.frame(temp.data)
#> REML criterion at convergence: 3660.149
#> Random effects:
#> Groups Name Std.Dev.
#> level (Intercept) 1.182
#> Residual 5.040
#> Number of obs: 600, groups: level, 30
#> Fixed Effects:
#> x1 x2.1 x2.2 m2.2 m1
#> 0.5021 0.4120 0.5781 0.1328 0.4979
$fm1 #models for level 1 mediators
mlma.e1#> [[1]]
#> [1] 1
#>
#> [[2]]
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: y ~ x2 + x1 - 1 + (1 | level)
#> Data: data.frame(temp.data)
#> REML criterion at convergence: 1863.169
#> Random effects:
#> Groups Name Std.Dev.
#> level (Intercept) 0.5675
#> Residual 1.0878
#> Number of obs: 600, groups: level, 30
#> Fixed Effects:
#> x2 x1
#> 0.8537 0.8955
$fm2 #models for level 2 mediators
mlma.e1#> [[1]]
#> [1] 2
#>
#> [[2]]
#>
#> Call: glm(formula = frml.m, family = binomial(link = "logit"), data = data.frame(temp.data))
#>
#> Coefficients:
#> x2.2.1
#> 1.277
#>
#> Degrees of Freedom: 30 Total (i.e. Null); 29 Residual
#> Null Deviance: 41.59
#> Residual Deviance: 32.03 AIC: 34.03
```

The \(plot\) function help depict the directions of mediaiton effects. Without specifying the mediator (by \(var\)), the \(plot\) function plots the overall medation effects.

```
plot(mlma.e1)
#> Error in plot.new(): figure margins too large
```

By specifying the mediator, the \(plot\) function shows the indirect effect of the mediator, and its marginal relationship with the response variable and with the exposure variable at each level.

```
plot(mlma.e1,var="m1")
#> Error in plot.new(): figure margins too large
plot(mlma.e1,var="m2")
#> Error in plot.new(): figure margins too large
```

Finally, the \(boot.mlma\) function uses the bootstrap method to estimate mediation effects and the estimated variances and confidence intervals. Again, the analysis can be built on the results from \(data.org\). The default number of bootstrap samples is \(100\), which can be changed to other desired numbers. The \(summary\) function for the output of \(boot.mlma\) gives the inference results for all mediation effects. Two confidence intervals are built up for the estimated mediation effects. (lwbd, upbd) is based on the normal approximation and (lwbd_quan, upbd.quan) is built by the quantiles of the bootstrap results.

```
<-boot.mlma(y=y,data1=example1,echo=F,intercept = F)
boot.e1summary(boot.e1)
#> MLMA Analysis: Estimated Effects at level 1:
#> $x1
#> te de m1
#> est 0.6468 0.5021 0.1447
#> mean 0.5176 0.4909 0.0268
#> sd 0.3691 0.3782 0.0975
#> upbd 1.2410 1.2322 0.2179
#> lwbd -0.2058 -0.2505 -0.1644
#> upbd.quan 1.1380 1.1508 0.2597
#> lwbd.quan -0.2448 -0.2705 -0.1395
#>
#> MLMA Analysis: Estimated Effects at level 2:
#> $x2
#> te de m2.1 m1
#> est 0.7554 0.6183 0.0238 0.1133
#> mean 0.7611 0.6574 0.0238 0.0798
#> sd 0.4362 0.2847 0.0084 0.3579
#> upbd 1.6159 1.2155 0.0403 0.7813
#> lwbd -0.0938 0.0993 0.0074 -0.6216
#> upbd.quan 1.5272 1.2840 0.0370 0.7615
#> lwbd.quan 0.0240 0.1685 0.0093 -0.6276
```

The \(plot\) function for the \(boot.mlma\) objects works similarly for the \(mlma\) objects but confidence intervals for estimations are added.

```
plot(boot.e1)
#> Error in plot.new(): figure margins too large
plot(boot.e1,var="m1")
#> Error in plot.new(): figure margins too large
plot(boot.e1,var="m2")
#> Error in plot.new(): figure margins too large
```

Yu, Qingzhao, and Bin Li. 2020. “Third-Variable Effect Analysis with Multilevel Additive Models.” *PLoS ONE* 15 (10). https://doi.org/https://doi.org/10.1371/journal.pone.0241072.

———. 2022. *Statistical Methods for Mediation, Confounding and Moderation Analysis Using r and SAS*. Chapman & Hall/CRC. https://www.routledge.com/Statistical-Methods-for-Mediation-Confounding-and-Moderation-Analysis/Yu-Li/p/book/9780367365479.

Yu, Qingzhao, Mandi Yu, J Zou, X Wu, SL Gomez, and Bin Li. 2021. “Multilevel Mediation Analysis on Time-to-Event Outcomes - Exploring Racial/Ethnic Disparities in Breast Cancer Survival in California.” *Research Methods in Medicine & Health Sciences*. https://doi.org/https://doi.org/10.1177/26320843211061292.