GGIR is an R-package to process multi-day raw accelerometer data for physical activity and sleep research. The term raw refers to data being expressed in m/s² or gravitational acceleration, as opposed to previous-generation accelerometers, which stored data in accelerometer brand specific units. The signal processing includes automatic calibration, detection of sustained abnormally high values, detection of non-wear, and calculation of average magnitude of dynamic acceleration based on a variety of metrics. Next, GGIR uses this information to describe the data per recording, per day of measurement, and (optionally) per segment of a day of measurement, including estimates of physical activity, inactivity and sleep. We published an overview paper of GGIR in 2019.
This vignette provides a general introduction on how to use GGIR and interpret the output. Additionally, you can find an introduction video and a mini-tutorial on YouTube. If you want to use your own algorithms for raw data then GGIR facilitates this with its external function embedding feature, documented in a separate vignette: Embedding external functions in GGIR. GGIR is increasingly being used by research groups across the world. A non-exhaustive overview of academic publications related to GGIR can be found here. R package GGIR would not have been possible without the support of the contributors listed in the author list at GGIR, with specific code contributions over time since April 2016 (when GGIR development moved to GitHub) shown here.
Cite GGIR:
When you use GGIR in publications do not forget to cite it properly, as that makes your research more reproducible and gives credit to its developers. See paragraph on Citing GGIR for details.
How to contribute to the code?
The development version of GGIR can be found on GitHub, which is also where you will find guidance on how to contribute.
How can I get service and support?
GGIR is open source software and does not come with service or support guarantees. However, as a user community you can help each other via the GGIR Google group or the GitHub issue tracker. Please use these public platforms rather than private e-mails so that other users can learn from the conversations.
If you need dedicated support with the use of GGIR or need someone to adapt GGIR to your needs then Vincent van Hees is available as independent consultant.
Change log
Our log of main changes to GGIR over time can be found here.
Download and install RStudio (optional, but recommended)
Install GGIR with its dependencies from CRAN. You can do this with one command from the console command line:
install.packages("GGIR", dependencies = TRUE)
Alternatively, to install the latest development version with the latest bug fixes use instead:
install.packages("remotes")
remotes::install_github("wadpac/GGIR")
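To confirm that the installation worked, a minimal check along the following lines can be run. It uses only base R functions (requireNamespace and packageVersion); the variable name ggir_available is ours and purely illustrative.

```r
# Check whether GGIR is installed without attaching it
ggir_available <- requireNamespace("GGIR", quietly = TRUE)

if (ggir_available) {
  # Report the installed version, which is useful when asking for help
  message("GGIR version: ", as.character(utils::packageVersion("GGIR")))
} else {
  message("GGIR is not installed; see the install.packages() command above")
}
```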
read.myacc.csv and argument rmc.noise in the GGIR function documentation (pdf).
GGIR comes with a large number of functions and optional settings (arguments) per function. To ease interacting with GGIR there is one central function, named GGIR, to talk to all the other functions. In the past this function was called g.shell.GGIR, but we decided to shorten it to GGIR for convenience. You can still use g.shell.GGIR because g.shell.GGIR has become a wrapper function around GGIR, passing on all arguments to GGIR and by that providing identical functionality.
In this paragraph we will guide you through the main arguments to GGIR relevant for 99% of research. First of all, it is important to understand that the GGIR package is structured in two ways.
Firstly, it has a computational structure of five parts which are applied sequentially to the data, and function GGIR controls each of these parts.
The reason why it is split up in parts is that this avoids having to redo the entire analysis if you only want to make a small change in the more downstream parts. The specific order and content of the parts has grown for historical and computational reasons.
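Secondly, the function arguments, which we will refer to as input parameters, are structured thematically, independently of the five parts they are used in, into the following parameter objects: params_general, params_rawdata, params_metrics, params_cleaning, params_247, params_phyact, params_sleep, and params_output.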
This structure was introduced in GGIR version 2.5-6 to make the GGIR code and documentation easier to navigate.
To see the parameters in each parameter category and their default values do:
library(GGIR)
print(load_params())
If you are only interested in one specific category like sleep:
library(GGIR)
print(load_params()$params_sleep)
If you are only interested in parameter "HASPT.algo" from the params_sleep object:
library(GGIR)
print(load_params()$params_sleep[["HASPT.algo"]])
Documentation for these parameter objects can be found in the GGIR function documentation (pdf). All of these parameters are accepted as arguments to function GGIR, because GGIR is a shell around all GGIR functionality. However, the params_ objects themselves cannot be provided as input to GGIR.
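Because load_params() returns a list of parameter objects, finding out which object a given argument belongs to comes down to searching a nested list. The sketch below illustrates this with a hypothetical helper, find_param_object, which is not part of GGIR. It is demonstrated on a small mock list with illustrative values, and only touches the real objects when GGIR is installed.

```r
# Hypothetical helper (not part of GGIR): return the name(s) of the
# parameter object(s) that contain a given argument name.
find_param_object <- function(params, arg_name) {
  hits <- vapply(params, function(obj) arg_name %in% names(obj), logical(1))
  names(params)[hits]
}

# Mock list that mimics the nested structure; the values are illustrative only
mock_params <- list(
  params_sleep = list(anglethreshold = 5, timethreshold = 5),
  params_general = list(desiredtz = "", overwrite = FALSE)
)
print(find_param_object(mock_params, "overwrite"))

# With GGIR installed, the same helper can be applied to the real objects
if (requireNamespace("GGIR", quietly = TRUE)) {
  print(find_param_object(GGIR::load_params(), "HASPT.algo"))
}
```

An argument that is not found in any parameter object yields an empty character vector.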
You will probably never need to think about most of the arguments listed above, because a lot of arguments are only included to facilitate methodological studies where researchers want to have control over every little detail. See previous paragraph for links to the documentation and how to find the default value of each parameter.
The bare minimum input needed for GGIR is:
library(GGIR)
GGIR(datadir="C:/mystudy/mydata",
outputdir="D:/myresults")
Argument datadir allows you to specify where you have stored your accelerometer data and outputdir allows you to specify where you would like the output of the analyses to be stored. This cannot be equal to datadir. If you copy and paste the above code into a new R script (file ending with .R) and Source it in R(Studio) then the dataset will be processed and the output will be stored in the specified output directory.
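Since outputdir cannot be equal to datadir, it can help to check this before starting a long run. The following is a minimal sketch of such a check. The helper check_ggir_dirs is hypothetical and not part of GGIR, and it only compares the two paths after normalising them.

```r
# Hypothetical helper (not part of GGIR): stop early when the data directory
# and the output directory point to the same location.
check_ggir_dirs <- function(datadir, outputdir) {
  same <- normalizePath(datadir, mustWork = FALSE) ==
    normalizePath(outputdir, mustWork = FALSE)
  if (same) {
    stop("outputdir must differ from datadir")
  }
  invisible(TRUE)
}

# Passes silently because the two paths differ
check_ggir_dirs("C:/mystudy/mydata", "D:/myresults")
```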
Below we have highlighted the key arguments you may want to be aware of. We are not giving a detailed explanation here; please see the package manual for that.
- mode - which part of GGIR to run. GGIR is constructed in five parts.
- overwrite - whether to overwrite previously produced milestone output. Between each GGIR part, GGIR stores milestone output to ease re-running parts of the pipeline.
- idloc - tells GGIR where to find the participant ID (default: inside file header).
- strategy - informs GGIR how to consider the design of the experiment.
  - If strategy is set to value 1, then check out arguments hrs.del.start and hrs.del.end.
  - If strategy is set to value 3, then check out argument ndayswindow.
- maxdur - maximum number of days you expect in a data file based on the study protocol.
- desiredtz - time zone of the experiment.
- chunksize - a way to tell GGIR to use less memory, which can be useful on machines with limited memory.
- includedaycrit - tells GGIR how many hours of valid data per day (midnight-midnight) is acceptable.
- includenightcrit - tells GGIR how many hours of a valid night (noon-noon) is acceptable.
- qwindow - argument to tell GGIR whether and how to segment the day for day-segment specific analysis.
- mvpathreshold and boutcriter - acceleration threshold and bout criteria used for calculating time spent in MVPA (only used in GGIR part 2).
- epochvalues2csv - to export epoch level magnitude of acceleration to a csv file (in addition to already being stored as RData file).
- dayborder - to decide whether the edge of a day should be other than midnight.
- iglevels - argument related to the intensity gradient method proposed by A. Rowlands.
- do.report - specify reports that need to be generated.
- viewingwindow and visualreport - to create a visual report. This only works when all five parts of GGIR have successfully run.
The table below shows all GGIR input arguments, the GGIR part (1, 2, 3, 4 and/or 5) they are used in, and the parameter object they belong to. As you will see, a few parameters are not part of any parameter object. Their default values can be found in the GGIR function documentation (pdf).
Argument (parameter) | Used in GGIR part | Parameter object |
---|---|---|
datadir | 1, 2, 4, 5 | not in parameter objects |
f0 | 1, 2, 3, 4, 5 | not in parameter objects |
f1 | 1, 2, 3, 4, 5 | not in parameter objects |
windowsizes | 1, 5 | params_general |
desiredtz | 1, 2, 3, 4, 5 | params_general |
overwrite | 1, 2, 3, 4, 5 | params_general |
do.parallel | 1, 2, 3, 5 | params_general |
maxNcores | 1, 2, 3, 5 | params_general |
myfun | 1, 2, 3 | not in parameter objects |
outputdir | 1 | not in parameter objects |
studyname | 1 | not in parameter objects |
chunksize | 1 | params_rawdata |
do.enmo | 1 | params_metrics |
do.lfenmo | 1 | params_metrics |
do.en | 1 | params_metrics |
do.bfen | 1 | params_metrics |
do.hfen | 1 | params_metrics |
do.hfenplus | 1 | params_metrics |
do.mad | 1 | params_metrics |
do.anglex | 1 | params_metrics |
do.angley | 1 | params_metrics |
do.angle | 1 | params_metrics |
do.enmoa | 1 | params_metrics |
do.roll_med_acc_x | 1 | params_metrics |
do.roll_med_acc_y | 1 | params_metrics |
do.roll_med_acc_z | 1 | params_metrics |
do.dev_roll_med_acc_x | 1 | params_metrics |
do.dev_roll_med_acc_y | 1 | params_metrics |
do.dev_roll_med_acc_z | 1 | params_metrics |
do.lfen | 1 | params_metrics |
do.lfx | 1 | params_metrics |
do.lfy | 1 | params_metrics |
do.lfz | 1 | params_metrics |
do.hfx | 1 | params_metrics |
do.hfy | 1 | params_metrics |
do.hfz | 1 | params_metrics |
do.bfx | 1 | params_metrics |
do.bfy | 1 | params_metrics |
do.bfz | 1 | params_metrics |
do.zcx | 1 | params_metrics |
do.zcy | 1 | params_metrics |
do.zcz | 1 | params_metrics |
lb | 1 | params_metrics |
hb | 1 | params_metrics |
n | 1 | params_metrics |
do.cal | 1 | params_rawdata |
spherecrit | 1 | params_rawdata |
minloadcrit | 1 | params_rawdata |
printsummary | 1 | params_rawdata |
print.filename | 1 | params_general |
backup.cal.coef | 1 | params_rawdata |
rmc.noise | 1 | params_rawdata |
rmc.dec | 1 | params_rawdata |
rmc.firstrow.acc | 1 | params_rawdata |
rmc.firstrow.header | 1 | params_rawdata |
rmc.col.acc | 1 | params_rawdata |
rmc.col.temp | 1 | params_rawdata |
rmc.col.time | 1 | params_rawdata |
rmc.unit.acc | 1 | params_rawdata |
rmc.unit.temp | 1 | params_rawdata |
rmc.origin | 1 | params_rawdata |
rmc.header.length | 1 | params_rawdata |
rmc.format.time | 1 | params_rawdata |
rmc.bitrate | 1 | params_rawdata |
rmc.dynamic_range | 1 | params_rawdata |
rmc.unsignedbit | 1 | params_rawdata |
rmc.desiredtz | 1 | params_rawdata |
rmc.sf | 1 | params_rawdata |
rmc.headername.sf | 1 | params_rawdata |
rmc.headername.sn | 1 | params_rawdata |
rmc.headername.recordingid | 1 | params_rawdata |
rmc.header.structure | 1 | params_rawdata |
rmc.check4timegaps | 1 | params_rawdata |
rmc.col.wear | 1 | params_rawdata |
rmc.doresample | 1 | params_rawdata |
imputeTimegaps | 1 | params_rawdata |
selectdaysfile | 1, 2 | params_cleaning |
dayborder | 1, 2, 5 | params_general |
dynrange | 1 | params_rawdata |
configtz | 1 | params_general |
minimumFileSizeMB | 1 | params_rawdata |
interpolationType | 1 | params_rawdata |
metadatadir | 2, 3, 4, 5 | not in parameter objects |
minimum_MM_length.part5 | 5 | params_cleaning |
strategy | 2, 5 | params_cleaning |
hrs.del.start | 2, 5 | params_cleaning |
hrs.del.end | 2, 5 | params_cleaning |
maxdur | 2, 5 | params_cleaning |
max_calendar_days | 2 | params_cleaning |
includedaycrit | 2 | params_cleaning |
L5M5window | 2 | params_247 |
M5L5res | 2, 5 | params_247 |
winhr | 2, 5 | params_247 |
qwindow | 2 | params_247 |
qlevels | 2 | params_247 |
ilevels | 2 | params_247 |
mvpathreshold | 2 | params_phyact |
boutcriter | 2 | params_phyact |
ndayswindow | 2 | params_cleaning |
idloc | 2, 4 | params_general |
do.imp | 2 | params_cleaning |
storefolderstructure | 2, 4, 5 | params_output |
epochvalues2csv | 2 | params_output |
do.part2.pdf | 2 | params_output |
mvpadur | 2 | params_phyact |
window.summary.size | 2 | params_247 |
bout.metric | 2, 5 | params_phyact |
closedbout | 2 | params_phyact |
IVIS_windowsize_minutes | 2 | params_247 |
IVIS_epochsize_seconds | 2 | params_247 |
IVIS.activity.metric | 2 | params_247 |
iglevels | 2, 5 | params_247 |
TimeSegments2ZeroFile | 2 | params_cleaning |
qM5L5 | 2 | params_247 |
MX.ig.min.dur | 2 | params_247 |
qwindow_dateformat | 2 | params_247 |
anglethreshold | 3 | params_sleep |
timethreshold | 3 | params_sleep |
acc.metric | 3, 5 | params_general |
ignorenonwear | 3 | params_sleep |
constrain2range | 3 | params_sleep |
do.part3.pdf | 3 | params_output |
sensor.location | 3, 4 | params_general |
HASPT.algo | 3 | params_sleep |
HASIB.algo | 3 | params_sleep |
Sadeh_axis | 3 | params_sleep |
longitudinal_axis | 3 | params_sleep |
HASPT.ignore.invalid | 3 | params_sleep |
loglocation | 4, 5 | params_sleep |
colid | 4 | params_sleep |
coln1 | 4 | params_sleep |
nnights | 4 | params_sleep |
sleeplogidnum | 4, 5 | params_sleep |
do.visual | 4 | params_output |
outliers.only | 4 | params_output |
excludefirstlast | 4 | params_cleaning |
criterror | 4 | params_output |
includenightcrit | 4 | params_cleaning |
relyonguider | 4 | params_sleep |
relyonsleeplog | 4 | not in parameter objects |
def.noc.sleep | 4 | params_sleep |
data_cleaning_file | 4, 5 | params_cleaning |
excludefirst.part4 | 4 | params_cleaning |
excludelast.part4 | 4 | params_cleaning |
sleeplogsep | 4 | params_cleaning |
sleepwindowType | 4 | params_cleaning |
excludefirstlast.part5 | 5 | params_cleaning |
boutcriter.mvpa | 5 | params_phyact |
boutcriter.in | 5 | params_phyact |
boutcriter.lig | 5 | params_phyact |
threshold.lig | 5 | params_phyact |
threshold.mod | 5 | params_phyact |
threshold.vig | 5 | params_phyact |
timewindow | 5 | params_output |
boutdur.mvpa | 5 | params_phyact |
boutdur.in | 5 | params_phyact |
boutdur.lig | 5 | params_phyact |
save_ms5rawlevels | 5 | params_output |
part5_agg2_60seconds | 5 | params_general |
save_ms5raw_format | 5 | params_output |
save_ms5raw_without_invalid | 5 | params_output |
includedaycrit.part5 | 5 | params_cleaning |
frag.metrics | 5 | params_phyact |
LUXthresholds | 5 | params_247 |
LUX_cal_constant | 5 | params_247 |
LUX_cal_exponent | 5 | params_247 |
LUX_day_segments | 5 | params_247 |
do.sibreport | 5 | params_output |
Cut-points to estimate time spent in acceleration levels that are roughly linked to levels of energy metabolism have been proposed by:
Acceleration metric not available in GGIR? Some of the above publications make use of acceleration metrics that sum their values per epoch rather than average them per epoch like GGIR does. So, to use their cut-point value we need to divide the proposed cut-point by the number of samples per epoch, which is the sample frequency used in the study that proposed it multiplied by the epoch length in seconds. For each of the studies this is detailed below. Note that GGIR intentionally does not sum values per epoch because that approach makes the cut-point sample frequency dependent, which complicates comparisons and harmonisation of literature. The explained variance and accuracy remain identical because we are only multiplying with a constant.
Esliger 2011, Phillips 2013, Fraysse 2020, Dibben 2020:
- do.enmoa = TRUE, do.enmo = FALSE, and acc.metric = "ENMOa".
- threshold.lig = (LightCutPointFromPaper_in_gmins/(sampleRateInStudy*60)) * 1000
- threshold.mod = (ModerateCutPointFromPaper_in_gmins/(sampleRateInStudy*60)) * 1000
- threshold.vig = (VigorousCutPointFromPaper_in_gmins/(sampleRateInStudy*60)) * 1000
- mvpathreshold = (ModerateCutPointFromPaper_in_gmins/(sampleRateInStudy*60)) * 1000
- sampleRateInStudy was 80 for Esliger and Phillips and 100 for Fraysse.
Roscoe 2017:
- do.enmoa = TRUE, do.enmo = FALSE, and acc.metric = "ENMOa".
- threshold.lig = (LightCutPointFromPaper_in_gsecs/85.7) * 1000
- threshold.mod = (ModerateCutPointFromPaper_in_gsecs/85.7) * 1000
- threshold.vig = (VigorousCutPointFromPaper_in_gsecs/85.7) * 1000
- mvpathreshold = (ModerateCutPointFromPaper_in_gsecs/85.7) * 1000
Schaeffer 2014:
- do.en = TRUE, do.enmo = FALSE, and acc.metric = "EN".
- threshold.lig = (LightCutPointFromPaper/75) * 1000
- threshold.mod = (ModerateCutPointFromPaper/75) * 1000
- threshold.vig = (VigorousCutPointFromPaper/75) * 1000
- mvpathreshold = (ModerateCutPointFromPaper/75) * 1000
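The conversions above share one idea: a cut-point expressed as a sum of acceleration values per epoch becomes an average by dividing it by the number of samples in that epoch, after which it is multiplied by 1000 to go from g to mg. The sketch below captures this in a hypothetical helper, summed_cutpoint_to_mg, which is not part of GGIR. The cut-point values in the examples are made up for illustration and are not taken from any of the cited papers.

```r
# Hypothetical helper (not part of GGIR): convert a cut-point expressed as the
# sum of acceleration values (in g) per epoch into an average per epoch in mg.
summed_cutpoint_to_mg <- function(cutpoint_sum_g, sample_rate_hz, epoch_seconds) {
  samples_per_epoch <- sample_rate_hz * epoch_seconds
  (cutpoint_sum_g / samples_per_epoch) * 1000
}

# Made-up cut-point of 240 g summed per minute, recorded at 80 Hz
per_minute_example <- summed_cutpoint_to_mg(240, sample_rate_hz = 80, epoch_seconds = 60)

# Made-up cut-point of 7.5 g summed per second, recorded at 75 Hz
per_second_example <- summed_cutpoint_to_mg(7.5, sample_rate_hz = 75, epoch_seconds = 1)

print(c(per_minute_example, per_second_example))
```

The resulting values are in mg and can be passed on to arguments such as threshold.lig, threshold.mod, threshold.vig and mvpathreshold.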
Vähä-Ypyä et al. 2015:
Hildebrand 2014, Hildebrand 2016, Migueles 2021, Sanders 2018:
Sensor calibration
In all of the studies above, excluding Hildebrand et al. 2016, no effort was made to calibrate the acceleration sensors relative to gravitational acceleration prior to cut-point development. Theoretically this can be expected to cause a bias in the cut-point estimates proportional to the calibration error in each device, especially for cut-points based on acceleration metrics which rely on the assumption of accurate calibration such as metrics: ENMO, EN, ENMOa, and by that also metric SVMgs used by studies such as Esliger 2011, Phillips 2013, and Dibben 2020.
Idle sleep mode and ActiGraph
Studies done with ActiGraph devices configured with 'idle sleep mode' on will have zero strings in all three axes during periods of no movement. Studies do not clarify how these zero strings are accounted for. The insertion of zero strings is problematic, as raw data accelerometers should always measure the gravitational component when not moving. This directly impacts metrics that rely on the presence of a gravitational component such as ENMO, EN, ENMOa, and SVMgs. However, other metrics may also be affected, as the sudden disappearance of gravitational acceleration will cause a spike at the start and end of the non-movement time segment. More generally speaking, we advise ActiGraph users to disable the 'idle sleep mode' as it harms transparency and reproducibility, since no mechanism exists to replicate it in other accelerometer brands, and it is likely to challenge accurate assessment of sleep and sedentary behaviour. We also advise that data collected with 'idle sleep mode' turned on is not referred to as raw data accelerometry, because the data collection process has involved proprietary pre-processing steps, which violates the core principle of raw data collection.
Validity of validation studies
Several studies that aimed to independently evaluate cut-point methods failed to recognise these challenges. Further, validation studies are typically limited to laboratory conditions and a small population. Therefore, it is best to interpret cut-points with caution. Future methodological studies around cut-points are advised to account for accelerometer calibration error and for problematic time gaps, such as those in ActiGraph data collected with 'idle sleep mode' on.
If you consider all the arguments above you may end up with a call to GGIR that could look as follows.
library(GGIR)
GGIR(
  mode = c(1, 2, 3, 4, 5),
  datadir = "C:/mystudy/mydata",
  outputdir = "D:/myresults",
  do.report = c(2, 4, 5),
  #=====================
  # Part 2
  #=====================
  strategy = 1,
  hrs.del.start = 0, hrs.del.end = 0,
  maxdur = 9, includedaycrit = 16,
  qwindow = c(0, 24),
  mvpathreshold = c(100),
  bout.metric = 6,
  excludefirstlast = FALSE,
  includenightcrit = 16,
  #=====================
  # Part 3 + 4
  #=====================
  def.noc.sleep = 1,
  outliers.only = TRUE,
  criterror = 4,
  do.visual = TRUE,
  #=====================
  # Part 5
  #=====================
  threshold.lig = c(30), threshold.mod = c(100), threshold.vig = c(400),
  boutcriter = 0.8, boutcriter.in = 0.9, boutcriter.lig = 0.8,
  boutcriter.mvpa = 0.8, boutdur.in = c(1, 10, 30), boutdur.lig = c(1, 10),
  boutdur.mvpa = c(1),
  includedaycrit.part5 = 2/3,
  #=====================
  # Visual report
  #=====================
  timewindow = c("WW"),
  visualreport = TRUE)
Once you have used GGIR, the output directory (outputdir) will be filled with milestone data and results.
Function GGIR stores all the explicitly entered argument values, and default values for the arguments that are not explicitly provided, in a csv-file named config.csv stored in the root of the output folder. The config.csv file is accepted as input to GGIR with argument configfile to replace the specification of all the arguments, except datadir and outputdir, see example below.
library(GGIR)
GGIR(datadir="C:/mystudy/mydata",
outputdir="D:/myresults", configfile = "D:/myconfigfiles/config.csv")
The practical value of this is that it eases the replication of analysis, because instead of having to share your R script, sharing your config.csv file will be sufficient. Further, the config.csv file contributes to the reproducibility of your data analysis.
Note 1: When combining a configuration file with explicitly provided argument values, the explicitly provided argument values will overrule the argument values in the configuration file.
Note 2: The config.csv file in the root of the output folder will be overwritten every time you use GGIR. So, if you would like to add annotations in the file, e.g. in the fourth column, then you will need to store it somewhere outside the output folder and explicitly point to it with the configfile argument.
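One way to keep an annotated configuration file safe from being overwritten is to copy it to a folder outside the output folder before editing it. The sketch below does this with base R only. The helper backup_config and all folder names are hypothetical, and the demonstration uses a placeholder file in a temporary folder rather than a real config.csv.

```r
# Hypothetical helper (not part of GGIR): copy a configuration file to a
# folder outside the output folder and return the path of the copy.
backup_config <- function(config_path, backup_dir) {
  if (!file.exists(config_path)) {
    stop("config file not found: ", config_path)
  }
  dir.create(backup_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(backup_dir, basename(config_path))
  # overwrite = FALSE protects an earlier, possibly annotated, copy
  copied <- file.copy(config_path, dest, overwrite = FALSE)
  if (!copied) {
    stop("could not copy to ", dest, " (does it already exist?)")
  }
  dest
}

# Demonstration with temporary folders and a placeholder file
demo_output <- tempfile("demo_output")
dir.create(demo_output)
demo_config <- file.path(demo_output, "config.csv")
writeLines("placeholder content", demo_config)
saved_copy <- backup_config(demo_config, tempfile("demo_configfiles"))
print(saved_copy)
```

The returned path is what you would then pass to argument configfile.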
You can use source("pathtoscript/myshellscript.R") or use the Source button in RStudio if you use RStudio.
GGIR by default supports multi-thread processing, which can be turned off by setting argument do.parallel = FALSE. If this is still not fast enough then I advise using GGIR on a computing cluster. The way I did it on a Sun Grid Engine cluster is shown below. Please note that some of these commands are specific to the computing cluster you are working on. Also, you may actually want to use an R package like clustermq or snowfall, which avoids having to write bash scripts. Please consult your local cluster specialist to tailor this to your situation. In my case, I had three files for the SGE setting:
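submit.sh
# Submit one job per file; n is the number of files per job
for i in {1..707}; do
  n=1
  # Index of the first and the last file to be processed by job i
  s=$(( n * (i - 1) + 1 ))
  e=$(( i * n ))
  qsub /home/nvhv/WORKING_DATA/bashscripts/run-mainscript.sh $s $e
done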
run-mainscript.sh
#! /bin/bash
#$ -cwd -V
#$ -l h_vmem=12G
/usr/bin/R --vanilla --args f0=$1 f1=$2 < /home/nvhv/WORKING_DATA/test/myshellscript.R
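myshellscript.R
options(echo=TRUE)
library(GGIR)
# Read the arguments passed on by the bash script, e.g. f0=1 f1=1
args = commandArgs(TRUE)
if (length(args) > 0) {
  for (i in seq_along(args)) {
    # Evaluate each "name=value" string such that f0 and f1 become R objects
    eval(parse(text = args[[i]]))
  }
}
GGIR(f0=f0,f1=f1,...)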
You will need to update the ... in the last line with the arguments you used for GGIR. Note that f0=f0,f1=f1 is essential for this to work. The values of f0 and f1 are passed on from the bash script.
Once this is all set up you will need to call bash submit.sh from the command line.
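The arithmetic in submit.sh assigns a range of file indices to each job: job i starts at file n * (i - 1) + 1 and ends at file i * n, where n is the number of files per job. The sketch below reproduces this calculation in R so that the ranges can be inspected before submitting anything. The function job_file_ranges is hypothetical and not part of GGIR.

```r
# Hypothetical helper (not part of GGIR): compute the first (f0) and last (f1)
# file index per job, mirroring the arithmetic used in submit.sh.
job_file_ranges <- function(n_jobs, files_per_job = 1) {
  job <- seq_len(n_jobs)
  data.frame(
    job = job,
    f0 = files_per_job * (job - 1) + 1,
    f1 = job * files_per_job
  )
}

# One file per job, as in submit.sh above (first three jobs shown)
print(head(job_file_ranges(707), 3))

# Ten files per job
print(job_file_ranges(3, files_per_job = 10))
```

With one file per job, f0 and f1 are identical for every job, which matches n=1 in the bash script.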
Important Note:
Please make sure that you process one GGIR part at a time on a cluster, because each part assumes that preceding parts have been run. You can make sure of this by always setting argument mode to a single part of GGIR. Once the analysis stops, update argument mode to the next part until all parts are done. The speed of the parallel processing is obviously dependent on the capacity of your computing cluster and the size of your dataset.
GGIR generates the following types of output.
- csv-spreadsheets with all the variables you need for physical activity, sleep and circadian rhythm research
- PDFs with on each page a low resolution plot of the data per file and quality indicators
- R objects with milestone data
- PDFs with a visual summary of the physical activity and sleep patterns as identified (see example below)