# Time Varying Mediation Function: Continuous Outcome and Two Treatment Groups

yajnaseni.chakraborti@temple.edu dcoffman@temple.edu harryzobeljr@gmail.com

## Introduction

The purpose of this vignette is to provide users with a step-by-step guide for performing and interpreting the results of a time varying mediation analysis with 2 treatment (exposure) groups and a continuous outcome. Please note, that this package has been built considering the structure of panel data, where each subject/participant has repeated responses (collected over time) for the outcome and mediator variables. However, we do not address dynamic treatment regimens in this package. Therefore, we assume the scenario where the treatment (exposure) is time-invariant (i.e., does not change over time). For more details, please refer to Cai et al., 2020 (the paper can be obtained at https://arxiv.org/abs/2008.11797).

### Data

For illustration, we rely on an example dataset, simulated based on the Wisconsin Smokers’ Health Study 2 data (Baker et. al., 2016) which includes 1086 individuals assigned to one of three treatment conditions. One-third of participants received only a nicotine patch; another one-third received varenicline, and the final third of participants received a combination nicotine replacement therapy (NRT) which included nicotine patch + nicotine mini-lozenge. The outcome of interest is cessation fatigue; that is, how tired a participant felt of trying to quit smoking (7-point Likert scale). In addition, mediator variables were measured by asking participants if they felt a negative mood in the last fifteen minutes, and whether they wanted to smoke in the last 15 minutes, also recorded on a 7-point Likert scale. Both the outcome and mediator variables were assessed two times per day for the first two weeks post-quit day (rendering 30 time points of response since assessments start on day 0 post target quit day), and every other day (2x per day) for weeks 3 and 4 (rendering 14 time points of response).

A traditional approach to analyzing this type of data would be to use mediation analysis in which the effects are assumed to not vary as a function of time. First, a single (i.e., time-invariant) direct effect would be calculated by regressing the outcome on the treatment condition and mediator. Next, a time-invariant indirect effect would be computed by multiplying the effect of treatment condition on the mediator by the effect of the mediator on the outcome. However, this method potentially misses important information about the dynamic effect that a mediator may have over time. Specifically, we hypothesize that mood changes across and within days and thus, its mediating effect on one’s success of quitting smoking is likely to vary over time. We therefore propose a time varying mediation analysis which estimates the mediation effect as a function that varies over time.

The tvma function was developed for estimating the mediation effect of two treatment (exposure) groups. To illustrate the analysis, we have considered the scenario of varenicline vs. not varenicline i.e., the subjects taking nicotine patch or combination NRT were combined into a single group.

## Getting started

To use the time varying mediation analysis package in R, you must first install the package and load it. Before that, make sure you have R version 4.0.3 or greater. There are two ways to install the package from the CRAN (Comprehensive R Archive Network) repository, by using install.packages or the devtools function.

install.packages("tvmediation", dependencies = TRUE)
library(tvmediation)

The equivalent code using devtools is:

devtools::install_cran("tvmediation", dependencies = TRUE)
library(tvmediation)

If you do not have devtools installed and loaded, you can do so using the following code:

install.packages("devtools", dependencies = TRUE)
library(devtools)

Alternatively, if you want to install the package directly from the GitHub repository to access new or revised functions in development, use the following code:

devtools::install_github("dcoffman/tvmediation", dependencies = TRUE)
library(tvmediation)

## Formatting your data before calling the tvma function

Once installed, you can type ?tvmediation in the console to view the package documentation, as well as links to the important functions and data included in the package. The time-varying mediation analysis for continuous outcomes and two exposure groups, relies on two user functions, tvma and LongToWide, as well as a number of internal functions of the tvmediation package.

The tvma function has four required and five optional arguments.

1. treatment A binary vector indicating the treatment group
2. t.seq A numeric vector of the time sequence of the measures
3. mediator The matrix of mediator values in wide format
4. outcome The matrix of outcome values in wide format

The optional arguments are:

1. t.est The time sequence for estimation. This is by default equal to t.seq.
2. plot TRUE or FALSE for plotting the mediation effect. The default value is “FALSE”.
3. CI “none” or “boot” method of deriving confidence intervals (CIs). The default value is “boot”.
4. replicates Number of replicates for bootstrapping CIs. The default value is 1000.
5. verbose TRUE or FALSE for printing results to screen. The default value is “FALSE”.

The dataset we will use for our illustration is named smoker and is included in the package.

To load the simulated dataset smoker.rda, type:

data(smoker)

As discussed earlier, the tvma function only supports two treatment groups. A separate function tvma_3trt for three treatment groups is also available.

The smoker data is organized in long format with SubjectID repeating over multiple rows for each participant. The tvma function requires that the data be in wide format to estimate the time varying coefficients. The tvmediation package includes a useful function LongToWide to help users properly format their data for analysis.

LongToWide has three required arguments and a fourth optional argument.

1. subject.id specifies the column of subject identifiers.
2. time.sequences specifies the column of measurement times.
3. outcome specifies the column for the variable (either the outcome or the mediator) that will be transposed.
4. verbose is an optional argument that if “TRUE” prints the output of LongToWide to the console. The default value is “FALSE”.

The output of LongToWide is a matrix of data in wide format where columns represent the subjects and rows represent the time sequence. Thus, each cell contains the j-th subject’s response at the i-th time point.

The tvma function requires two matrices, one for the mediator, and one for the outcome. Thus, we use the LongToWide function twice as illustrated below:

mediator <- LongToWide(smoker$SubjectID, smoker$timeseq, smoker$NegMoodLst15min) outcome <- LongToWide(smoker$SubjectID, smoker$timeseq, smoker$cessFatig)
class(mediator)
## [1] "matrix" "array"
mediator[1:16, 1:10]
##     1 2  3  4  5  6  7  8  9 10
## 0   7 1  1  1  1 NA  4  1  1  3
## 0.5 2 1  1  6  1 NA NA  1  1  1
## 1   1 1  2 NA NA  1  3  1  1  1
## 1.5 1 1  1  1  1  1  2  1 NA  1
## 2   1 1  1  1 NA  1  1  1  1  1
## 2.5 3 1  1  1  1  3  1  1  1  1
## 3   1 6 NA  1  1  1  1  2 NA  1
## 3.5 1 1  2  6  3  7  1  6  1  1
## 4   1 1  1  1  1  1  1  1  2  4
## 4.5 1 1  1  5  1  5  2 NA  1  3
## 5   4 1  2  1 NA  5  2  1  1  1
## 5.5 1 1  2  1  1  2 NA  1  1  2
## 6   2 1  1  4  1  7  1 NA  2  3
## 6.5 1 1  1  1  1  1  1  5  3  1
## 7   1 2  1 NA  1  1 NA  1  1  4
## 7.5 2 2  1  1  1 NA  1  1  1  6
class(outcome)
## [1] "matrix" "array"
outcome[1:16, 1:10]
##     1 2  3  4  5  6  7  8  9 10
## 0   1 6  1  2  1 NA  1  6  7  3
## 0.5 1 3  1  1  1 NA NA  1  7  1
## 1   1 1  1 NA NA  1  1  7  7  1
## 1.5 6 7  6  1  1  6  6  4 NA  7
## 2   7 7  1  1 NA  1  7  1  1  1
## 2.5 1 2  1  7  1  6  1  2  1  7
## 3   4 1 NA  1  7  2  1  4 NA  5
## 3.5 5 3  7  7  6  1  1  7  5  1
## 4   6 3  5  4  7  2  7  5  7  5
## 4.5 7 7  1  7  7  1  7 NA  4  1
## 5   7 7  1  1 NA  1  1  7  1  1
## 5.5 6 5  1  1  1  3 NA  5  7  1
## 6   1 1  1  7  5  1  7 NA  2  1
## 6.5 5 7  5  7  7  1  1  1  7  7
## 7   1 1  1 NA  7  1 NA  1  7  7
## 7.5 5 1  2  1  1 NA  1  1  1  7

If your data are already in wide format, there is no need to use the LongToWide function and you can simply subset your dataset. However, mediator and outcome must be of class matrix; hence make sure you convert the class of the subsetted mediator and outcome objects to matrix before proceeding. This can be done using the R function as.matrix.

The tvma function requires two more variables that we have not yet created:

1. treatment A binary numeric vector indicating the treatment group
2. t.seq A numeric vector of the time sequence of the measures

We can create these variables as follows. Because treatment is time-invariant, we need only one instance of each subject’s response for varenicline. We then convert it to a numeric value and subtract 1 to yield a vector of zeros and ones, as shown below.

# Step 1: Since each subject has multiple rows of data, extract the unique response of each subject to receiving varenicline. The data is still in dataframe format.
trt1 <- unique(smoker[ , c("SubjectID","varenicline")])
trt1[1:10,]
##    SubjectID varenicline
## 1          1          No
## 2          2         Yes
## 3          3         Yes
## 4          4          No
## 5          5         Yes
## 6          6         Yes
## 7          7         Yes
## 8          8         Yes
## 9          9         Yes
## 10        10         Yes
# Step 2: 2 is assigned to those subjects who received varenicline and 1 to the rest. The data is now in vector format.
trt2 <- as.numeric(trt1[ , 2])
trt2[1:10]
##  [1] 2 1 1 2 1 1 1 1 1 1
# Step 3: subtract 1 from these numeric responses to procure a vector of zeros and ones
treatment <- trt2 -1
treatment[1:10]
##  [1] 1 0 0 1 0 0 0 0 0 0

This steps can alternatively be collated into a single step and written as follows:

treatment <- as.numeric(unique(smoker[ , c("SubjectID","varenicline")])[, 2])-1
treatment[1:10]
##  [1] 1 0 0 1 0 0 0 0 0 0

As discussed earlier, the goal in this example is estimating the time varying effect of varenicline compared to not varenicline on cessation fatigue, as mediated via negative mood in the last fifteen minutes. We could also look at the effect of varenicline vs. nicotine patch only or combination NRT vs. varenicline with an additional step of excluding the subjects belonging to the treatment group that is not of interest, and then following the steps described above. Please refer to the vignette for the tvmb function for an illustration of the steps involved when considering varenicline vs. nicotine patch only or combination NRT vs. varenicline or combination NRT vs. nicotine patch only.

To generate t.seq we found the unique instance of each time point and then sorted from smallest to largest. There are 44 unique time points in the dataset where 0 after the decimal indicates the morning measurement and 5 after the decimal indicates the evening measurement recorded for that day.

t.seq <- sort(unique(smoker$timeseq)) t.seq ## [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 ## [16] 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 ## [31] 16.0 16.5 18.0 18.5 20.0 20.5 22.0 22.5 24.0 24.5 26.0 26.5 28.0 28.5 We are now ready to perform the time varying mediation analysis. ## Calling the tvma function As discussed earlier, the tvma function has four required and five optional arguments. 1. treatment A binary vector indicating the treatment group 2. t.seq A numeric vector of the time sequence of the measures 3. mediator The matrix of mediator values in wide format 4. outcome The matrix of outcome values in wide format The optional arguments are: 1. t.est The time sequence for estimation. This is by default equal to t.seq. 2. plot TRUE or FALSE for plotting the mediation effect. The default value is “FALSE”. 3. CI “none” or “boot” method of deriving CIs. The default value is “boot”. 4. replicates Number of replicates for bootstrapping CIs. The default value is 1000. 5. verbose TRUE or FALSE for printing results to screen. The default value is “FALSE”. We will call the function with the optional arguments plot=TRUE and replicates = 250. We decreased the number of bootstrap replicates so that this vignette compiles faster but we suggest using at least 500 replicates in an actual analysis. The remaining optional arguments are left to their respective default values. results <- tvma(treatment, t.seq, mediator, outcome, plot = TRUE, replicates = 250) ## Results The tvma function returns a list of results that include: 1. hat.alpha estimated time-varying treatment effect on the mediator 2. CI.lower.alpha CI lower limit for hat.alpha 3. CI.upper.alpha CI upper limit for hat.alpha 4. hat.gamma estimated time-varying direct effect of the treatment on the outcome 5. CI.lower.gamma CI lower limit for hat.gamma 6. CI.upper.gamma CI upper limit for hat.gamma 7. hat.beta estimated time-varying effect of the mediator on the outcome 8. CI.lower.beta CI lower limit for hat.beta 9. CI.upper.beta CI upper limit for hat.beta 10. hat.tau estimated time-varying total treatment effect on the outcome 11. CI.lower.tau CI lower limit for hat.tau 12. CI.upper.tau CI upper limit for hat.tau 13. est.M the time-varying mediation effect of treatment on the outcome Optional returns based on argument CI = "boot" include: 1. boot.se.m estimated standard error for the time-varying mediation effect, est.M 2. CI.lower CI lower limit for the time-varying mediation effect, est.M 3. CI.upper CI upper limit for the time-varying mediation effect, est.M The above estimates are compiled in a single dataframe which can be accessed using nameOfStoredResultsObj$Estimates. For example, the following line of code displays the estimates for the first 6 time-points.

head(results$Estimates) ## t.est hat.alpha CI.lower.alpha CI.upper.alpha hat.gamma CI.lower.gamma ## 1 0.0 -0.5750034 -0.8404865 -0.2511151 0.3803577 0.01548155 ## 2 0.5 -0.5819098 -0.7720232 -0.3478521 0.4592414 0.18954246 ## 3 1.0 -0.5880155 -0.7196653 -0.3969861 0.5237820 0.32913063 ## 4 1.5 -0.5932360 -0.6933294 -0.4595194 0.5710753 0.44069711 ## 5 2.0 -0.5976067 -0.6824344 -0.5018488 0.6003423 0.48451702 ## 6 2.5 -0.6012658 -0.6718323 -0.5310321 0.6125481 0.49363637 ## CI.upper.gamma hat.beta CI.lower.beta CI.upper.beta hat.tau CI.lower.tau ## 1 0.7137066 0.2880372 0.1934518 0.3828504 -0.17768293 -0.6099129 ## 2 0.6915342 0.3033509 0.2395912 0.3694337 -0.10302673 -0.4067603 ## 3 0.6854935 0.3128918 0.2662036 0.3603184 -0.04859090 -0.2455066 ## 4 0.6996254 0.3192837 0.2826496 0.3604350 -0.01368227 -0.1490691 ## 5 0.7167229 0.3243138 0.2910140 0.3630136 0.00357346 -0.1087448 ## 6 0.7119399 0.3291172 0.2986037 0.3677783 0.00584576 -0.1047466 ## CI.upper.tau est.M boot.se.m CI.lower CI.upper ## 1 0.1408753 -0.1656224 0.04414036 -0.2548170 -0.08372314 ## 2 0.1366287 -0.1765229 0.03345352 -0.2439625 -0.11290021 ## 3 0.1138763 -0.1839852 0.02614798 -0.2316125 -0.13036852 ## 4 0.1210646 -0.1894106 0.02091074 -0.2258889 -0.14658637 ## 5 0.1236913 -0.1938121 0.01711894 -0.2254171 -0.16361938 ## 6 0.1128760 -0.1978869 0.01461271 -0.2266726 -0.17191718 At each time point of interest, t.est, which in this case is equal to t.seq, the time-varying effects of the treatment on the mediator, the treatment on the outcome (adjusted and not adjusted for mediator), and the mediator on the outcome are estimated along with the respective 95% CIs. The CIs are computed via a non-parametric bootstrap method (Efron and Tibshirani, 1986), drawing samples of size 1086 from the original sample with replacement, estimating the sample mean and then applying the percentile method to compute the 95% CIs. Note that the CIs for the alpha, gamma, beta and tau coefficients (hat.alpha, hat.gamma, hat.beta, hat.tau) are computed regardless of the value of CI argument in the function. est.M is the estimated mediation effect of varenicline compared to the other two treatment groups, that varies over t.est. For CI = "boot" (which is the default option unless the user chooses otherwise) the standard error of the estimated mediation effect and 95% CI is estimated via a similar bootstrapping technique described earlier for the other coefficients. If plot = TRUE, the results will also include the following figures: 1. Alpha_CI plot for hat.alpha with 95% CIs across timeseq 2. Gamma_CI plot for hat.gamma with 95% CIs across timeseq 3. Tau_CI plot for hat.tau with 95% CIs across timeseq 4. Beta_CI plot for hat.beta with 95% CIs across timeseq 5. MedEff plot for est.M across timeseq 6. MedEff_CI plot for est.M with 95% CIs across timeseq We recommend using the plots to interpret the estimates as it may be difficult to derive meaning from the numerical values alone. To display the plots, use nameOfStoredResultsObj$ followed by the name of the plot to access the required plot accordingly. For example:

results$Alpha_CI In the above plot, the effect of varenicline on subjects’ feeling of negative mood in the last fifteen minutes is negative compared to the other two treatment groups and mostly stable with a slight decreasing trend over time. results$Gamma_CI

The above plot shows the time-varying direct effect of varenicline on the outcome cessation fatigue that is positive in the varenicline group compared to the other two treatment groups. That is, those assigned to varenicline have greater cessation fatigue than the other two treatment groups. The estimated 95% CI does not cover 0 over time, indicating that the effect is statistically significant.

results$Beta_CI In the above plot, the effect of subjects’ negative mood in the last fifteen minutes on the outcome cessation fatigue is increases initially, albeit with a slight decrease around day 6-7 (end of week 1), and then decreases beginning around day 20-21 (end of week 3). The effect is positive such that an increase in negative mood is related to an increase in cessation fatigue. results$Tau_CI

The above plot shows the time-varying total effect of varenicline on the outcome cessation fatigue. The estimated 95% CI covers 0 (no effect) throughout the time period and thus, the effect is not statistically significant.

results$MedEff results$MedEff_CI

In the above plot, the effect of varenicline on cessation fatigue that is mediated by negative mood in the last fifteen minutes is negative compared to the patch only or combination NRT group. That is, varenicline reduces cessation fatigue due to a decrease in negative mood. This effect decreases (i.e., becomes stronger) during the first week and then stabilizes after day 6. Beyond week 2 the effect begins to decrease again, and then increases somewhat after week 3. Given that the CIs do not include zero, we can conclude that this effect is significant.

As discussed earlier, tvma accepts additional input arguments that allow the user to modify the analyses. For example, specifying t.est can be specified to select only certain time points at which to make the estimations. The following code will produce estimates at time points 0.2, 0.4, 0.6, and 0.8 using 250 bootstrap replicates.

results_new <- tvma(treatment, t.seq, mediator, outcome,
t.est = c(0.2, 0.4, 0.6, 0.8), replicates = 250)
results_new
## \$Estimates
##   t.est  hat.alpha CI.lower.alpha CI.upper.alpha hat.gamma CI.lower.gamma
## 1   0.2 -0.5778515     -0.8413740     -0.2943108 0.4133649      0.1338834
## 2   0.4 -0.5805877     -0.8207642     -0.3397772 0.4444837      0.2242638
## 3   0.6 -0.5831995     -0.7886388     -0.3686192 0.4734197      0.2760063
## 4   0.8 -0.5856775     -0.7577821     -0.3939715 0.4999199      0.3107156
##   CI.upper.gamma  hat.beta CI.lower.beta CI.upper.beta     hat.tau CI.lower.tau
## 1      0.7312327 0.2950568     0.2154036     0.3620688 -0.14540866   -0.4745251
## 2      0.7359234 0.3008484     0.2334724     0.3586299 -0.11634281   -0.3815561
## 3      0.7340905 0.3056249     0.2472812     0.3592557 -0.09052425   -0.3137884
## 4      0.7221786 0.3095812     0.2589868     0.3587922 -0.06795276   -0.2660459
##   CI.upper.tau      est.M  boot.se.m   CI.lower    CI.upper
## 1    0.1484149 -0.1704990 0.04296317 -0.2550429 -0.08633100
## 2    0.1397599 -0.1746689 0.03856799 -0.2467570 -0.09838304
## 3    0.1347972 -0.1782403 0.03462418 -0.2389332 -0.10634893
## 4    0.1380229 -0.1813147 0.03109166 -0.2340952 -0.11666464

The tvma function computes bootstrap confidence intervals by default. Therefore, if the user decides to not bootstrap CIs for the mediation effect by specifying CI = "none", but by mistake also specifies replicates = 250, the function will not display an error, but will simply execute without computing the CIs for mediation effect. Note that the CIs for the effects of the treatment on the mediator and the mediator on the outcome, and for the direct and total effects are computed even if the user passes the argument CI = "none".

## Summary

The tvmediation package provides a set of functions for estimating mediation effects that vary over time for both binary and continuous time-varying outcomes. Currently, the package only allows for a time-invariant treatment. The mediator and outcome are assumed to be time-varying, such as intensive longitudinal measurements obtained via Ecological Momentary Assessment or via wearable and mobile devices. The development of this tool has widespread application for use in human behavior research, clinical trials, addiction research, and others by allowing specification of mediation effects that vary as a function of time.

## References

1. Cai, X., Piper, M. E., Li, R., & Coffman, D. L. (2020). Estimation and inference for the mediation effect in a time-varying mediation model. https://arxiv.org/abs/2008.11797

2. Baker, T. B., Piper, M. E., Stein, J. H., Smith, S. S., Bolt, D. M., Fraser, D. L., & Fiore, M. C. (2016). Effects of nicotine patch vs varenicline vs combination nicotine replacement therapy on smoking cessation at 26 weeks: A randomized clinical trial. JAMA, 315(4), 371-379.}

3. Efron, B. & Tibshirani, R. Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy. Statist. Sci. 1 (1986), no. 1, 54–75.}