A heatmap is a graphical representation of data that uses color-coding to represent different values. Heatmaps are used in many forms of analytics; this R package, however, focuses on providing an efficient way to create interactive heatmaps for categorical data or for continuous data that can be grouped into categories.
This package was originally developed for Verkehrsbetriebe Zürich (VBZ), the public transport operator in the Swiss city of Zurich, to illustrate the utilization of different routes and vehicles at different times of the day. To this end, utilization data (e.g. persons per m^2) is grouped into categories (e.g. low, medium, high utilization) and shown for certain stops over time in a heatmap.
This package can easily be integrated into a Shiny dashboard, which supports additional interactions with other plots (e.g. boxplot, histogram, forecast) through plotly events. A mini-demo app is provided in a separate GitHub repository named catmaply_shiny.
This work is based on the plotly.js engine.
This package is still under active development. If there are features you would like to see added, please submit your suggestions (and bug reports) at: https://github.com/VerkehrsbetriebeZuerich/catmaply/issues/
You can see the most recent changes of the package in NEWS.md.
To install the latest (“cutting-edge”) GitHub version run:
# make sure that you have the correct RTools installed.
# as you might need to build some packages from source
# if you don't have RTools installed, you can install it with:
# install.packages('installr'); install.Rtools() # not tested on windows
# or download it from here:
# https://cran.r-project.org/bin/windows/Rtools/
# in any case, make sure that you select the correct version,
# otherwise the installation will fail.
# then you'll need devtools
# if (!require('devtools'))
# install.packages('devtools')
# finally install the package
# devtools::install_github('VerkehrsbetriebeZuerich/catmaply')
To install the latest version from CRAN, run:
#install.packages("catmaply")
Thereafter, you can start using the package as usual:
library(catmaply)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
catmaply ships with VBZ data so that you can start experimenting with the package right away. For demonstration purposes and simplicity, we use it throughout this notebook. You can access the data as follows.
data("vbz")
df <- na.omit(vbz[[1]]) %>%
  filter(.data$vehicle == "PO")
str(df)
#> tibble [1,707 × 13] (S3: tbl_df/tbl/data.frame)
#> $ trip_seq : int [1:1707] 6 24 22 4 14 42 44 50 28 30 ...
#> $ stop_seq : int [1:1707] 1 1 1 1 1 1 1 1 1 1 ...
#> $ stop_name : chr [1:1707] "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" "Zuerich, Bahnhof Altstetten N" ...
#> $ trip_id : int [1:1707] 44044 33329 22130 33325 33327 22134 33333 12702 82990 12698 ...
#> $ circulation_name : int [1:1707] 8 6 4 6 6 4 6 2 10 2 ...
#> $ line_name : int [1:1707] 4 4 4 4 4 4 4 4 4 4 ...
#> $ vehicle : Factor w/ 3 levels "","CO","PO": 3 3 3 3 3 3 3 3 3 3 ...
#> $ occupancy : num [1:1707] 18.38 67.77 86.8 5.98 21.04 ...
#> $ occ_category : int [1:1707] 1 2 3 1 1 1 1 1 2 1 ...
#> $ departure_time : Factor w/ 3276 levels "","04:58:12",..: 74 435 393 49 223 839 885 1023 519 563 ...
#> $ number_of_measurements: int [1:1707] 58 47 45 47 47 45 49 51 42 51 ...
#> $ occ_cat_name : Factor w/ 6 levels "","high","low",..: 3 4 5 3 3 3 3 3 4 3 ...
#> $ direction : int [1:1707] 1 1 1 1 1 1 1 1 1 1 ...
#> - attr(*, "na.action")= 'omit' Named int [1:281] 145 146 147 148 149 294 295 296 297 298 ...
#> ..- attr(*, "names")= chr [1:281] "145" "146" "147" "148" ...
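The data contains both a continuous occupancy column and a categorical occ_category column. If your own data only has continuous values, they need to be grouped into categories first. The following is a minimal sketch using base R's cut(); the thresholds and the toy data frame are hypothetical and chosen only for illustration, they are not the thresholds VBZ uses.

```r
# Toy occupancy values (the first five values printed by str(df) above).
toy <- data.frame(occupancy = c(18.38, 67.77, 86.80, 5.98, 21.04))

# Hypothetical thresholds and labels, chosen for illustration only.
breaks <- c(0, 50, 80, Inf)
labels <- c("low", "medium", "high")

# Bin the continuous values into left-closed intervals: [0, 50), [50, 80), [80, Inf).
toy$occ_cat_name <- cut(toy$occupancy, breaks = breaks, labels = labels, right = FALSE)

# An integer code per category, usable as the z value of a categorical heatmap.
toy$occ_category <- as.integer(toy$occ_cat_name)

print(toy)
```

Keeping both the integer code and the readable label mirrors the occ_category and occ_cat_name columns of the data shown above.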
The main columns of the vbz data.frame can be described as follows:
trip_seq shows the order of the trips.
stop_name shows the names of the stops, which need to be ordered by stop_seq.
occ_category shows the category of the data point (e.g. 1 - very few people in the bus, 5 - bus is full).
occupancy is e.g. the number of people per m^2.
catmaply expects at least arguments for both axes (x, y) and the fields (z). To visualize the occupancy for all stops and trips, we can put stop_seq on y, trip_seq on x and occ_category on z as follows:
catmaply(
  df,
  x = trip_seq,
  y = stop_seq,
  z = occ_category
)
By default, catmaply produces an interactive heatmap with a rangeslider, legend items that show or hide the data of a specific occupancy category when clicked, and a hover label that shows the values for x, y and z.
Also note that column references such as x, y and z accept column names both with and without quotes. For example, to put the stop names on the y axis, we can pass stop_name as y and order it using stop_seq via y_order (the x axis offers the same functionality), as shown in the following:
catmaply(
  df,
  x = trip_seq,
  y = stop_name,
  y_order = stop_seq,
  z = occ_category
)
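Since column references are accepted with or without quotes, the same heatmap can presumably also be written with quoted column names. This is a sketch based on that statement; the variable name fig is only illustrative, and the data preparation is repeated so that the snippet runs on its own.

```r
library(catmaply)
library(dplyr)

# Same data preparation as above.
data("vbz")
df <- na.omit(vbz[[1]]) %>%
  filter(.data$vehicle == "PO")

# Same call as before, but with the column names passed as strings.
fig <- catmaply(
  df,
  x = "trip_seq",
  y = "stop_name",
  y_order = "stop_seq",
  z = "occ_category"
)

fig
```

Quoted names can be convenient when the column to plot is chosen at runtime, for example from a user input in a dashboard.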