Title: The T-Rex selector for fast high-dimensional variable selection with FDR control

Description: It performs fast variable selection in large-scale high-dimensional settings while controlling the false discovery rate (FDR) at a user-defined target level. The package is based on the T-Rex selector paper (available at

Note: The T-Rex selector performs terminated-random experiments (T-Rex) using the T-LARS algorithm (implemented in the ‘tlars’ R package) and fuses the selected active sets of all random experiments to obtain a final set of selected variables. It provably controls the false discovery rate (FDR), i.e., the expected fraction of false positives among all selected variables, at the user-defined target level while maximizing the number of selected variables, thereby achieving a high true positive rate (TPR), i.e., power. The T-Rex selector can be applied in fields such as genomics and financial engineering, or in any other field that requires a fast and FDR-controlling variable/feature selection method for large-scale high-dimensional settings.
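The two quantities named in the note can be written compactly. The notation below is an assumption made for illustration and is not taken from the package: $\mathcal{A}$ denotes the set of true active variables and $\widehat{\mathcal{A}}$ the set of selected variables.

```latex
\mathrm{FDR} = \mathbb{E}\left[ \frac{|\widehat{\mathcal{A}} \setminus \mathcal{A}|}{\max\{1, |\widehat{\mathcal{A}}|\}} \right],
\qquad
\mathrm{TPR} = \mathbb{E}\left[ \frac{|\widehat{\mathcal{A}} \cap \mathcal{A}|}{\max\{1, |\mathcal{A}|\}} \right]
```

The $\max\{1, \cdot\}$ terms only guard against division by zero when nothing is selected or when there are no true active variables.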

In the following sections, we show you how to install and use the package.


Installation

Before installing the ‘TRexSelector’ package, you need to install the required ‘tlars’ package. You can install the ‘tlars’ package from CRAN or GitHub with:

# Install stable version from CRAN
install.packages("tlars")

# Install development version from GitHub

Then, you can install the ‘TRexSelector’ package with:

# Install stable version from CRAN
install.packages("TRexSelector")


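Because ‘TRexSelector’ depends on ‘tlars’, it can help to check which of the two packages are already available before installing anything. The following is a minimal sketch using only base R; the variable names are illustrative, and nothing is installed by this snippet.

```r
# Packages needed to follow this README
required_pkgs <- c("tlars", "TRexSelector")

# TRUE for every package whose namespace can be loaded, FALSE otherwise
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed (possibly none)
missing_pkgs <- names(is_installed)[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

If ‘tlars’ is listed as missing, install it first, then ‘TRexSelector’.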
You can open the help pages with:

help(package = "TRexSelector")
?trex
# etc.

To cite the package ‘TRexSelector’ in publications use:

citation("TRexSelector")


Quick Start

This section illustrates the basic usage of the ‘TRexSelector’ package to perform FDR-controlled variable selection in large-scale high-dimensional settings based on the T-Rex selector.

  1. First, we generate a high-dimensional Gaussian data set with sparse support:

# Setup
library(TRexSelector) # provides trex()
n <- 75 # number of observations
p <- 150 # number of variables
num_act <- 3 # number of true active variables
beta <- c(rep(1, times = num_act), rep(0, times = p - num_act)) # coefficient vector
true_actives <- which(beta > 0) # indices of true active variables
num_dummies <- p # number of dummy predictors (also referred to as dummies)

# Generate Gaussian data
X <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)
y <- X %*% beta + stats::rnorm(n)

  2. Second, we perform FDR-controlled variable selection using the T-Rex selector for a target FDR of 5%:

# Seed

# Numerical zero
eps <- .Machine$double.eps

# Variable selection via T-Rex
res <- trex(X = X, y = y, tFDR = 0.05, verbose = FALSE)
selected_var <- which(res$selected_var > eps)
paste0("True active variables: ", paste(as.character(true_actives), collapse = ", "))
#> [1] "True active variables: 1, 2, 3"
paste0("Selected variables: ", paste(as.character(selected_var), collapse = ", "))
#> [1] "Selected variables: 1, 2, 3"

So, for a preset target FDR of 5%, the T-Rex selector has selected all true active variables and there are no false positives in this example.
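The statement above can be checked numerically by computing the realized false discovery proportion (FDP) and true positive proportion (TPP) from the two index sets. The sketch below uses base R only; `fdp_from_sets` and `tpp_from_sets` are hypothetical helper names introduced for illustration, not functions of the package, and the index sets are copied from the printed output above.

```r
# Hypothetical helpers (not part of 'TRexSelector'):
# realized FDP and TPP computed from index sets.
fdp_from_sets <- function(selected, actives) {
  num_false <- length(setdiff(selected, actives)) # selected but not truly active
  num_false / max(1, length(selected))            # guard against empty selection
}

tpp_from_sets <- function(selected, actives) {
  num_true <- length(intersect(selected, actives)) # selected and truly active
  num_true / max(1, length(actives))               # guard against no actives
}

# Index sets as printed in the example above
true_actives <- c(1, 2, 3)
selected_var <- c(1, 2, 3)

fdp_example <- fdp_from_sets(selected_var, true_actives) # 0: no false positives
tpp_example <- tpp_from_sets(selected_var, true_actives) # 1: all actives found

# Illustrative (made-up) selection with one false positive, for comparison
selected_alt <- c(1, 2, 7)
fdp_alt <- fdp_from_sets(selected_alt, true_actives) # 1/3
tpp_alt <- tpp_from_sets(selected_alt, true_actives) # 2/3
```

Note that the FDR guarantee concerns the expectation of the FDP over repeated experiments, so a single run can have an FDP above or below the target level.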

Note that users have to choose the target FDR according to the requirements of their specific applications.
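To get a feeling for how the target FDR affects the result, one can rerun the selector over a small grid of target levels. The following is a sketch under stated assumptions: the seed and the grid values are arbitrary choices made here, the data are regenerated as in the setup above, and the loop only runs if ‘TRexSelector’ is installed. It uses only the `trex()` arguments shown in this README.

```r
# Arbitrary seed chosen for this sketch (not the one used in the README output)
set.seed(1)

# Same data-generating setup as in the Quick Start
n <- 75
p <- 150
num_act <- 3
beta <- c(rep(1, times = num_act), rep(0, times = p - num_act))
X <- matrix(stats::rnorm(n * p), nrow = n, ncol = p)
y <- X %*% beta + stats::rnorm(n)

# Illustrative grid of target FDR levels
tFDR_grid <- c(0.05, 0.1, 0.2)

# Run the selector once per target level, if the package is available
if (requireNamespace("TRexSelector", quietly = TRUE)) {
  for (tFDR in tFDR_grid) {
    res <- TRexSelector::trex(X = X, y = y, tFDR = tFDR, verbose = FALSE)
    selected <- which(res$selected_var > .Machine$double.eps)
    cat("tFDR =", tFDR, "-> selected:", paste(selected, collapse = ", "), "\n")
  }
}
```

A larger target FDR generally allows more variables to be selected, at the cost of tolerating a larger expected share of false positives.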


For more information and some examples, please check the GitHub-vignette.

T-Rex paper:

‘TRexSelector’ package: GitHub-TRexSelector.

README file: GitHub-readme.

Vignette: GitHub-vignette.

‘tlars’ package: CRAN-tlars and GitHub-tlars.