High-Performance and Parallel Computing with R
Task view: HighPerformanceComputing
Maintainer: Dirk Eddelbuettel
Version: 2017-09-20

This CRAN task view contains a list of packages, grouped by topic, that are useful for high-performance computing (HPC) with R. In this context, we are defining 'high-performance computing' rather loosely as just about anything related to pushing R a little further: using compiled code, parallel computing (in both explicit and implicit modes), working with large objects as well as profiling.

Unless otherwise mentioned, all packages presented with hyperlinks are available from CRAN, the Comprehensive R Archive Network.

Several of the areas discussed in this Task View are undergoing rapid change. Please send suggestions for additions and extensions for this task view to the task view maintainer.

Suggestions and corrections by Achim Zeileis, Markus Schmidberger, Martin Morgan, Max Kuhn, Tomas Radivoyevitch, Jochen Knaus, Tobias Verbeke, Hao Yu, David Rosenberg, Marco Enea, Ivo Welch, Jay Emerson, Wei-Chen Chen, Bill Cleveland, Ross Boylan, Ramon Diaz-Uriarte, Mark Zeligman, Kevin Ushey, Graham Jeffries, and Will Landau (as well as others I may have forgotten to add here) are gratefully acknowledged.

Contributions are always welcome and encouraged. Since the start of this CRAN task view in October 2008, most contributions have arrived as email suggestions. The source file for this task view now also resides in a GitHub repository (see below), so pull requests are possible as well.

The ctv package supports these Task Views. Its functions install.views and update.views allow, respectively, installation or update of packages from a given Task View; the option coreOnly can restrict operations to packages labeled as core below.
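As a minimal sketch of the workflow just described, the snippet below calls install.views and update.views with the coreOnly option. The variable name view and the interactive() guard are illustrative choices rather than part of ctv; the guard simply keeps the snippet from downloading packages when it is run non-interactively or without ctv installed.

```r
# Name of this task view as understood by the ctv package.
view <- "HighPerformanceComputing"

# Installation needs network access and a configured CRAN mirror,
# so the calls are guarded (the guard is an assumption of this sketch).
if (interactive() && requireNamespace("ctv", quietly = TRUE)) {
  # Install only the packages labeled as core in the task view.
  ctv::install.views(view, coreOnly = TRUE)

  # Later: install missing and refresh outdated packages from the view.
  ctv::update.views(view)
}
```

Dropping coreOnly = TRUE installs every package listed in the view, which can take a long time for a view of this size.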

Direct support for parallel computing in R started with release 2.14.0, which includes a new package, parallel, incorporating (slightly revised) copies of the packages multicore and snow. Some types of clusters are not handled directly by the base package parallel. However, as explained in the package vignette, the parts of parallel that provide snow-like functions will accept snow clusters, including MPI clusters.
The parallel package also contains support for multiple RNG streams following L'Ecuyer et al. (2002), with support for both mclapply and snow clusters.
The version released for R 2.14.0 contains base functionality; higher-level convenience functions are planned for later R releases.
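The following sketch illustrates the two interfaces mentioned above, the snow-like cluster functions and mclapply, together with the L'Ecuyer-CMRG random number streams. It uses only functions from the parallel package and base R; the workload sim_mean, the worker count of two, and the seed 42 are hypothetical choices made for illustration.

```r
library(parallel)

# Hypothetical toy workload: mean of n standard-normal draws.
sim_mean <- function(n) mean(rnorm(n))

# snow-style socket cluster with two workers (works on all platforms).
cl <- makeCluster(2)

# Give each worker its own L'Ecuyer-CMRG stream derived from one seed.
clusterSetRNGStream(cl, iseed = 42)
res_a <- parLapply(cl, rep(1000, 4), sim_mean)

# Reset the streams to the same seed and repeat the computation.
clusterSetRNGStream(cl, iseed = 42)
res_b <- parLapply(cl, rep(1000, 4), sim_mean)

stopCluster(cl)

# With the same seed and the same number of workers,
# the two runs should produce the same draws.
identical(res_a, res_b)

# multicore-style forking via mclapply; forking is unavailable on
# Windows, where mclapply only supports mc.cores = 1.
RNGkind("L'Ecuyer-CMRG")
set.seed(42)
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_c <- mclapply(rep(1000, 4), sim_mean, mc.cores = cores)
```

The socket cluster is the portable choice; mclapply avoids the cost of starting worker processes and copying data to them, but only on systems that support forking.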

Parallel computing: Explicit parallelism

Parallel computing: Implicit parallelism

Parallel computing: Grid computing

Parallel computing: Hadoop

Parallel computing: Random numbers

Parallel computing: Resource managers and batch schedulers

Parallel computing: Applications

Parallel computing: GPUs

Large memory and out-of-memory data

Easier interfaces for Compiled code

Profiling tools

CRAN packages

aprof, batch, BatchExperiments, BatchJobs, batchtools, bayesm, bcp, biglars, biglm, bigmemory, bnlearn, caret, clustermq, cudaBayesreg, data.table, dclone, doFuture, doMC, doMPI, doRedis, doRNG, doSNOW, drake, ff, ffbase, flowr, foreach, future, future.BatchJobs, GAMBoost, gcbd, gmatrix, gputools, gpuR, GUIProfiler, h2o, HadoopStreaming, harvestr, HistogramTools, inline, LaF, latentnet, lga, Matching, MonetDB.R, nws, orloca, OpenCL, partDSA, pbapply, pbdBASE, pbdDEMO, pbdDMAT, pbdPROF, pbdMPI, pbdNCDF4, pbdSLAP, peperr, permGPU, PGICA, pls, pmclust, profr, proftools, pvclust, randomForestSRC, Rborist, Rcpp, RcppParallel, Rdsm, rgenoud, Rhpc, RhpcBLASctl, RInside, rJava, rlecuyer, Rmpi, RProtoBuf, rredis, Sim.DiffProc, rslurm, snow, snowfall, snowFT, speedglm, sprint, sqldf, STAR, tm, toaster, varSelRF

Related links

HPC computing notes by Luke Tierney for HPC class at University of Iowa
Mailing List: R Special Interest Group High Performance Computing
Schmidberger, Morgan, Eddelbuettel, Yu, Tierney and Mansmann (2009) paper on 'State of the Art in Parallel Computing with R'
Luke Tierney's code directory for pnmath and pnmath0
biocep-distrib
affyPara
maanova
multtest
puma
romp
bugsparallel
Slurm open-source workload manager
Condor project at University of Wisconsin-Madison
Parallel Computing in R with sfCluster/snowfall
Wikipedia: Message Passing Interface (MPI)
Wikipedia: Parallel Virtual Machine (PVM)
Slides from Introduction to High-Performance Computing with R tutorial held in Nov 2009 at the Institute for Statistical Mathematics, Tokyo, Japan
rgpu project at nbic.nl
Magma: Matrix Algebra on GPU and Multicore architectures
Parallel R: Data Analysis in the Distributed World
High Performance Statistical Computing for Data Intensive Research
Rth: Parallel R through Thrust
Programming with Big Data in R
RHIPE
Beyond Single Core: Parallel Analysis in R
GitHub repository for this Task View