Introduction

A heatmap is a graphical representation of data that uses color-coding to represent different values. Heatmaps are used in many forms of analytics; this R package focuses specifically on providing an efficient way to create interactive heatmaps for categorical data, or for continuous data that can be grouped into categories.

This package was originally developed for Verkehrsbetriebe Zürich (VBZ), the public transport operator in the Swiss city of Zurich, to illustrate the utilization of different routes and vehicles at different times of the day. To that end, it groups utilization data (e.g. persons per m^2) into categories (e.g. low, medium, high utilization) and displays them for selected stops over time in a heatmap.
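As an illustration of what grouping continuous utilization into categories means, here is a minimal base R sketch. The values and thresholds are made up for this example and are not taken from the VBZ data or from the package itself.

```r
# Hypothetical utilization values (persons per m^2); illustrative only.
utilization <- c(0.4, 1.2, 2.7, 3.5, 0.9)

# Hypothetical thresholds separating low / medium / high utilization.
breaks <- c(-Inf, 1, 3, Inf)
labels <- c("low", "medium", "high")

# cut() assigns each value to the interval it falls into; with right = TRUE
# the intervals are closed on the right, e.g. (1, 3] for "medium".
category <- cut(utilization, breaks = breaks, labels = labels, right = TRUE)

# Show the raw values next to their category.
data.frame(utilization, category)
```

The resulting factor plays the same role as a category column that is mapped to colors in a categorical heatmap.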

This package can easily be integrated into a Shiny dashboard, where plotly events support additional interactions with other plots (e.g. boxplot, histogram, forecast). A mini demo app is provided in a separate GitHub repository named catmaply_shiny.

This work is based on the plotly.js engine.

Please submit feature requests

This package is still under active development. If there are features you would like to see added, please submit your suggestions (and bug reports) at: https://github.com/VerkehrsbetriebeZuerich/catmaply/issues/

News

You can see the most recent changes to the package in NEWS.md.

Installation

To install the latest (“cutting-edge”) GitHub version run:

# make sure that you have the correct RTools installed,
# as you might need to build some packages from source.
# if you don't have RTools installed, you can install it with:
# install.packages('installr'); installr::install.Rtools() # not tested on windows
# or download it from here:
# https://cran.r-project.org/bin/windows/Rtools/
# in any case, make sure that you select the correct version, 
# otherwise the installation will fail.
# then you'll need devtools
# if (!require('devtools'))
  # install.packages('devtools')
# finally install the package
# devtools::install_github('VerkehrsbetriebeZuerich/catmaply')

To install the latest version from CRAN, run:

#install.packages("catmaply")

Thereafter, you can start using the package as usual:

library(catmaply)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

Usage

catmaply ships with VBZ data so that you can start experimenting with the package right away. For simplicity, we use this data throughout this notebook. You can load it as follows:

data("vbz")

df <- na.omit(vbz[[1]]) %>% 
  filter(.data$vehicle == "PO")

str(df)
#> tibble [1,707 × 13] (S3: tbl_df/tbl/data.frame)
#>  $ trip_seq              : int [1:1707] 6 24 22 4 14 42 44 50 28 30 ...
#>  $ stop_seq              : int [1:1707] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ stop_name             : chr [1:1707] "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" ...
#>  $ trip_id               : int [1:1707] 44044 33329 22130 33325 33327 22134 33333 12702 82990 12698 ...
#>  $ circulation_name      : int [1:1707] 8 6 4 6 6 4 6 2 10 2 ...
#>  $ line_name             : int [1:1707] 4 4 4 4 4 4 4 4 4 4 ...
#>  $ vehicle               : Factor w/ 3 levels "","CO","PO": 3 3 3 3 3 3 3 3 3 3 ...
#>  $ occupancy             : num [1:1707] 18.38 67.77 86.8 5.98 21.04 ...
#>  $ occ_category          : int [1:1707] 1 2 3 1 1 1 1 1 2 1 ...
#>  $ departure_time        : Factor w/ 3276 levels "","04:58:12",..: 74 435 393 49 223 839 885 1023 519 563 ...
#>  $ number_of_measurements: int [1:1707] 58 47 45 47 47 45 49 51 42 51 ...
#>  $ occ_cat_name          : Factor w/ 6 levels "","high","low",..: 3 4 5 3 3 3 3 3 4 3 ...
#>  $ direction             : int [1:1707] 1 1 1 1 1 1 1 1 1 1 ...
#>  - attr(*, "na.action")= 'omit' Named int [1:281] 145 146 147 148 149 294 295 296 297 298 ...
#>   ..- attr(*, "names")= chr [1:281] "145" "146" "147" "148" ...
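
To see how the numeric occ_category codes relate to the occ_cat_name labels in this subset, one option is to count their combinations. This is a small sketch using dplyr::count, assuming catmaply and dplyr are installed; the name category_counts is introduced here for illustration only.

```r
library(catmaply)
library(dplyr)

# Load the bundled data and prepare the same subset as above.
data("vbz")
df <- na.omit(vbz[[1]]) %>%
  filter(.data$vehicle == "PO")

# One row per (code, label) combination, with the number of rows in column n.
category_counts <- count(df, occ_category, occ_cat_name)
category_counts
```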

The main columns of the vbz data.frame can be described as follows:

Default behaviour

catmaply expects at least arguments for both axes (x, y) and for the field values (z). To visualize the occupancy category for all stops and trips, we can put stop_seq on y, trip_seq on x and occ_category on z as follows:

catmaply(
    df,
    x = trip_seq,
    y = stop_seq,
    z = occ_category
  ) 

By default, catmaply produces an interactive heatmap with a rangeslider, legend items that show or hide the data of a specific occupancy category when clicked, and a hover label that shows the values of x, y and z.

Note that column references such as x, y and z can be given either with or without quotes. For example, to show the stop names on the y axis, we can put stop_name on y and order it by stop_seq (the x axis offers the same functionality), as shown in the following:

catmaply(
    df,
    x = trip_seq,
    y = stop_name,
    y_order = stop_seq,
    z = occ_category
  )
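
The same plot can also be requested with quoted column names, as mentioned above. The following is a minimal sketch, assuming the catmaply and dplyr packages are installed and the data is prepared as in the Usage section; the variable name fig is only illustrative.

```r
library(catmaply)
library(dplyr)

# Prepare the same subset of the bundled data as in the Usage section.
data("vbz")
df <- na.omit(vbz[[1]]) %>%
  filter(.data$vehicle == "PO")

# Column references passed as strings instead of bare names.
fig <- catmaply(
  df,
  x = "trip_seq",
  y = "stop_name",
  y_order = "stop_seq",
  z = "occ_category"
)

# Printing the object renders the interactive heatmap.
fig
```

Quoted names can be convenient when the column to plot is chosen at runtime, for example from a user input in a dashboard.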