Dominance Plots



We simulate five datasets from the standard GAM example #1 provided by mgcv::gamSim. The noise parameter is scaled so that the first dataset has the lowest noise and the last has the highest. A dominance plot shows the relative contribution of each dataset to the prediction. Given how the data are generated, we expect the largest influence from dataset 1 and the smallest from dataset 5.

## simulate datasets
num.datasets <- 5  # number of simulated datasets, as stated above
datasets <- lapply(1:num.datasets, function(x) mgcv::gamSim(scale = x, verbose = FALSE))
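As a quick sanity check on the noise scaling, the sketch below compares the noisy response with the true signal in each simulated dataset. It assumes that gamSim example #1 returns the columns y (response) and f (noise-free signal); the seed and the name check_sets are illustrative choices, not part of the original analysis.

```r
library(mgcv)

set.seed(123)  # arbitrary seed, only for reproducibility of this sketch

# Separate copy of the simulation so the objects used elsewhere stay untouched
check_sets <- lapply(1:5, function(x) gamSim(scale = x, verbose = FALSE))

# Empirical noise level per dataset: standard deviation of y around the true signal f
noise_sd <- sapply(check_sets, function(dat) sd(dat$y - dat$f))
round(noise_sd, 2)
```

The values should increase from the first to the last dataset, which is the ordering the dominance plot is expected to reflect.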

Delete all observations in the first dataset whose x2 value is below 0.2 by setting those rows to NA. We therefore expect a very low (close to zero) contribution of the first dataset at low values of x2.

df <- datasets[[1]]
df[df$x2 < 0.2, ] <- NA
datasets[[1]] <- df

Next, delete all observations in the second dataset with large values of x2 (x2 > 0.8).

df <- datasets[[2]]
df[df$x2 > 0.8, ] <- NA
datasets[[2]] <- df
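To confirm that the two deletions did what was intended, the following sketch repeats them on a fresh simulation and inspects the remaining range of x2. The names sims, first, and second and the seed are hypothetical and used only for this check.

```r
library(mgcv)

set.seed(123)  # arbitrary seed for this illustration

# Fresh copies of the first two datasets
sims <- lapply(1:2, function(x) gamSim(scale = x, verbose = FALSE))

# Same deletions as above: whole rows are set to NA
first <- sims[[1]]
first[first$x2 < 0.2, ] <- NA
second <- sims[[2]]
second[second$x2 > 0.8, ] <- NA

# Remaining x2 range in each dataset
range(first$x2, na.rm = TRUE)
range(second$x2, na.rm = TRUE)

# Number of rows that were blanked out
c(first = sum(is.na(first$x2)), second = sum(is.na(second$x2)))
```

Because entire rows are set to NA, the response and all covariates are missing together, so these rows simply drop out when a model is fitted.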

Fit GAMs

Next, we fit the individual GAMs for each of the five datasets.
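A minimal sketch of this step is given below. The model formula (cubic regression splines on x0, x1, and x2) is an assumption, since the text only states that one GAM is fitted per dataset. The result is stored in models, the object passed to metagam in the next step, and strip_rawdata from metagam removes the raw data from each fit.

```r
library(mgcv)
library(metagam)

# Reuse `datasets` from the steps above; simulate them when run standalone
if (!exists("datasets")) {
  set.seed(123)  # arbitrary seed
  datasets <- lapply(1:5, function(x) gamSim(scale = x, verbose = FALSE))
}

# Fit one GAM per dataset (the formula is an assumption, see above)
models <- lapply(datasets, function(dat) {
  fit <- gam(y ~ s(x0, bs = "cr") + s(x1, bs = "cr") + s(x2, bs = "cr"), data = dat)
  # Keep only the components needed for the meta-analysis, without the raw data
  strip_rawdata(fit)
})
```

Rows that were set to NA are dropped during fitting, so the first and second models carry no information about the deleted x2 regions.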


Then, we meta-analyze the term s(x2) across the fitted models, evaluating it on a grid of 500 points.

meta_analysis <- metagam(models, grid_size = 500, terms = "s(x2)", intercept = TRUE)
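To build intuition for why noisier fits should contribute less, the base R sketch below computes normalized inverse-variance weights at a single grid point. It assumes that the pooling behaves like a fixed-effects meta-analysis with inverse-variance weights; the standard errors are made-up illustrative numbers, not output from the models above.

```r
# Hypothetical standard errors of five fits at one grid point;
# larger values mimic noisier datasets (illustrative numbers only)
se <- c(0.1, 0.2, 0.3, 0.4, 0.5)

# Inverse-variance weights, normalized to sum to one
w <- (1 / se^2) / sum(1 / se^2)
round(w, 3)

# Where a dataset has no observations, its fit is typically very uncertain;
# a large standard error then pushes its relative weight towards zero
se_gap <- c(10, 0.2, 0.3, 0.4, 0.5)
w_gap <- (1 / se_gap^2) / sum(1 / se_gap^2)
round(w_gap, 3)
```

Under this assumption, the weights fall as the standard error grows, and a dataset with a gap in its x2 range is effectively ignored within that gap.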

Dominance Analysis

Finally, we create a dominance plot that shows how strongly each individual GAM influences the prediction at each point along the x2 axis. Since plot_dominance returns a ggplot object, we can modify the colors, e.g., with the viridis package. First, we see that the influence of the GAMs is graded according to the simulated noise levels. Second, we see that the influence of the first dataset is almost zero on the left-hand side, whereas the influence of the second dataset is almost zero on the right-hand side.
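The plotting step can be sketched as follows. It assumes that plot_dominance maps the models to the fill aesthetic, so that viridis::scale_fill_viridis applies. The fallback branch only rebuilds meta_analysis when the code is run on its own, and the formula and seed in it are assumptions.

```r
library(mgcv)
library(metagam)

# Reuse `meta_analysis` from above; rebuild it when run standalone
if (!exists("meta_analysis")) {
  set.seed(123)  # arbitrary seed
  datasets <- lapply(1:5, function(x) gamSim(scale = x, verbose = FALSE))
  datasets[[1]][datasets[[1]]$x2 < 0.2, ] <- NA  # remove low x2 from dataset 1
  datasets[[2]][datasets[[2]]$x2 > 0.8, ] <- NA  # remove high x2 from dataset 2
  models <- lapply(datasets, function(dat) {
    # Assumed formula; the raw data are stripped after fitting
    strip_rawdata(gam(y ~ s(x0, bs = "cr") + s(x1, bs = "cr") + s(x2, bs = "cr"), data = dat))
  })
  meta_analysis <- metagam(models, grid_size = 500, terms = "s(x2)", intercept = TRUE)
}

# Dominance plot with a discrete viridis palette (fill aesthetic assumed)
p <- plot_dominance(meta_analysis) + viridis::scale_fill_viridis(discrete = TRUE)
p
```

Because the result is an ordinary ggplot object, further layers such as themes or axis labels can be added in the same way.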