rtrim for TRIM users

Patrick Bogaart, Mark van der Loo and Jeroen Pannekoek

2016-11-28

Introduction

The rtrim package is an complete reimplementation of the original TRIM software developed by Jeroen Pannekoek and Arco van Strien from the 1990’s onwards. This vignette provides a quick getting started manual that demonstrates the R-based workflow for computing TRIM models.

TRIM was developed to estimate animal populations, based on repeated counts at various sites while counts may be missing for certain sites at certain times. Estimation is based on a model-based imputation method.

We assume that the reader is already familiar with the methodology behind TRIM but in short, TRIM estimates a piecewise loglinear growth model to compute imputations. There are three variants of this model which differ by their basic assumptions.

Note that both Model 1 and Model 3 can be seen as special cases of Model 2 (Model 1 is equivalent with Model 2 when where time effects or growth rate is set to zero; Model 3 is equivalent with Model 2 when growth rates are assumed to change every time point).

For each variant it is possible to include categorical covariates in the model, or to weight sites. Certain simplifying assumptions are made to keep computations tractable. A detailed description of the methodology can be found in the original TRIM3 manual.

Computing TRIM models

We are going to use the skylark dataset, which is included with the package.

library(rtrim)
data(skylark)
# inspect the dataset
head(skylark,3)
##   site time count Habitat Deposition
## 1    1    1    11       2          2
## 2    1    2     8       2          2
## 3    1    3     5       2          2

Here, skylark is a regular R data.frame.

The central function for computing TRIM models is called trim. Calling this function is very similar to calling basic R modeling functions like lm. Here, we compute TRIM model 2.

m1 <- trim(count ~ site + time, data=skylark,model=2)

Note Since site and time are treated differently by the model, the order matters (see also model specification). Alternatively, one can do this:

m1 <- trim(skylark, count.id="count", site.id="site", time.id="time", model=2)

The result is an object of class trim. Just like with objects of class lm, its various components can be extracted using specialized functions. Here are some examples.

# summarize the model
summary(m1)
## Call:
## tools::buildVignettes(dir = ".", tangle = TRUE)
## 
## Model  : 2
## Method : ML (Convergence reached after 3 iterations)
## 
## Coefficients:
##   from upto        add     se_add      mul     se_mul
## 1    1    8 0.05482546 0.01043636 1.056356 0.01102452
## 
## 
## Goodness of fit:
##               Chi-square = 210.53, df=146, p=0.0004
##         Likelihood Ratio = 204.63, df=146, p=0.0010
##   AIC (up to a constant) = -87.37

Return time-totals

totals(m1)
##   time imputed se_imp
## 1    1     438     21
## 2    2     392     20
## 3    3     432     21
## 4    4     433     21
## 5    5     474     22
## 6    6     521     23
## 7    7     556     25
## 8    8     591     28

Retrieve goodness-of-fit

gof(m1)
## Goodness of fit:
##               Chi-square = 210.53, df=146, p=0.0004
##         Likelihood Ratio = 204.63, df=146, p=0.0010
##   AIC (up to a constant) = -87.37

Extract the coefficients

coefficients(m1)
##   from upto        add     se_add      mul     se_mul
## 1    1    8 0.05482546 0.01043636 1.056356 0.01102452

Plot with overall slope

plot(overall(m1))

These are just a few of of the functions that can be used to analyse the model. See any of their help files for a complete list of links to all analyses functions.

Model specification

The names of variables in the dataset are not important and neither is their order. However, since TRIM models are designed to estimate the number of counts at counting sites, the formula specifying the model has to satisfy certain rules.

For example, to use the variable Habitat as covariate when analysing the skylark dataset (under model 2) one does the following.

m2 <- trim(count ~ site + time + Habitat, data=skylark, model=2)

It is also possible to specify weights by passing trim a weights argument. The TRIM options overdisp (for overdispersion) and serialcor (for serial correlation), are simple TRUE/FALSE toggles. The breaks of the piecewise loglinear model can be specified with the changepoints option. The trim function will give an error when too little observations are present in a time segment, except when the autodelete option is set to TRUE. In that case time segments are combined until enough observations are present for a model to be estimated. See ?trim for a precise description of all options. Below is an example where we specify the maximum number of changepoints and let trim delete change points where necessary.

m3 <- trim(count ~ site + time + Habitat, data=skylark, model=2
     , overdisp = TRUE, serialcor = TRUE, changepoints=1:7, autodelete=TRUE)
m3$changepoints
## [1] 1 2 3 4 5 6 7

In this case, no change points are deleted.

In this example, the data sets consists of 8 time points, so time points 1 to 7 are explicitly specified as change point. This notation, which requires the prior identification of the number of time points present within the data, can be replaced by the more convenient expression changepoints="all".

Alternatively the stepwise algorithm can be used. This algorithm removes changepoints when the slope does not change significantly from before to after a changepoint, yielding a simpler (more robust) model.

m4 <- trim(count ~ site + time + Habitat, data=skylark, model=2
     , overdisp = TRUE, serialcor = TRUE, changepoints=1:7, stepwise = TRUE)
m4$changepoints
## [1] 1 2

Again, the explicit setting of initial changepoints can be replaced by the more convenient changepoints="auto", which combines changepoints="all" with stepwise=TRUE.

Utility functions

An overview of count data can be obtained with the function count_summary

data(skylark)
count_summary(skylark)
## Total number of sites                   55
## Sites without positive counts (0): 
## Number of observed zero counts            0
## Number of observed positive counts      202
## Total number of observed counts         202
## Number of missing counts                238
## Total number of counts                  440

The result is an overview similar to the one that used to be printed at the start of TRIM output files.

The TRIM model can only be computed when sufficient data is present. With the function check_observations one can check if a certain model can be computed.

check_observations(skylark, model=2, changepoints=c(1,4))
## $sufficient
## [1] TRUE
## 
## $errors
## $errors$changepoint
## numeric(0)

The result is a list with booean element sufficient. If sufficient==FALSE, the element errors contains a data.frame with the sites/times/covariates with insuffiecient counts.