`swag`

**swag** is a package that trains a meta-learning
procedure which combines screening and wrapper methods to find a set of
extremely low-dimensional attribute combinations.

First install the **remotes** package. Then install
**swag** with the following code:

```
## if not installed
## install.packages("remotes")
remotes::install_github("SMAC-Group/SWAG-R-Package")
library(swag) #load the new package
```

We use the **BreastCancer** dataset, readily
available from the **mlbench** package, to give an overview
of **swag**.

```
# After having installed the mlbench package
data(BreastCancer, package = "mlbench")
# Pre-processing of the data
y <- BreastCancer$Class # response variable
x <- as.matrix(BreastCancer[setdiff(names(BreastCancer),c("Id","Class"))]) # features
# remove missing values and change to 'numeric'
id <- which(apply(x,1,function(x) sum(is.na(x)))>0)
y <- y[-id]
x <- x[-id,]
x <- apply(x,2,as.numeric)
# Training and test set
set.seed(180) # for replication
ind <- sample(1:dim(x)[1],dim(x)[1]*0.2)
y_test <- y[ind]
y_train <- y[-ind]
x_test <- x[ind,]
x_train <- x[-ind,]
```
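As a quick sanity check on the split sizes, the sketch below reproduces the index arithmetic on its own. It assumes that 683 complete rows remain after dropping missing values (the commonly reported figure for `BreastCancer`; this number is not stated above), and it uses `floor()` to make the truncation of the test-set size explicit.

```r
# Assumption: 683 complete observations remain after removing rows with NAs
n <- 683
set.seed(180) # same seed as in the chunk above
# 20% of the rows go to the test set; floor() makes the truncation explicit
n_test <- floor(n * 0.2)
ind <- sample(seq_len(n), n_test)
n_train <- n - length(ind)
c(test = length(ind), train = n_train)
```

Under that assumption the test set holds 136 observations and the training set 547.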

Now we are ready to train with **swag**! The first step
is to define the meta-parameters of the **swag** procedure:
$p_{max}$, the maximum dimension of attributes; $\alpha$, a performance
quantile which represents the percentage of learners which are selected at
each dimension; and $m$, the maximum number of learners trained at each
dimension. We can set all these meta-parameters, together with a seed
for replicability purposes and `verbose = TRUE` to get a
message as each dimension is completed, thanks to the
*swagControl()* function, which behaves similarly to the
`trControl =` argument of **caret**.

```
# Meta-parameters chosen for the breast cancer dataset
swagcon <- swagControl(pmax = 4L,
                       alpha = 0.5,
                       m = 20L,
                       seed = 163L, #for replicability
                       verbose = TRUE #keeps track of completed dimensions
                       )
# Given the low dimensional dataset, we can afford a wider search
# by fixing alpha = 0.5 as a smaller alpha may also stop the
# training procedure earlier than expected.
```

Having set up the meta-parameters as explained above, we are now
ready to train **swag**. We start with the linear
Support Vector Machine learner:

```
### SVM Linear Learner ###
train_swag_svml <- swag(
  # arguments for swag
  x = x_train,
  y = y_train,
  control = swagcon,
  auto_control = FALSE,
  # arguments for caret
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = FALSE),
  metric = "Accuracy",
  method = "svmLinear", # Use method = "svmRadial" to train this alternative learner
  preProcess = c("center", "scale")
)
```

```
## [1] "Dimension explored: 1 - CV errors at alpha: 0.115"
## [1] "Dimension explored: 2 - CV errors at alpha: 0.0549"
## [1] "Dimension explored: 3 - CV errors at alpha: 0.0403"
## [1] "Dimension explored: 4 - CV errors at alpha: 0.0394"
```

The only difference with respect to the classic
**caret** train function is the specification of the
**swag** arguments, which have been explained previously. In
the above chunk for the *svmLinear* learner, we define the
estimator of the out-of-sample accuracy as 10-fold cross-validation
repeated once. For this specific case, we have chosen to center and
rescale the data, as is usually done for SVMs, and the parameter that
controls the margin in SVMs is automatically fixed at unit value
(i.e. $c=1$).

Let’s have a look at the typical output of a **swag**
training object for the *svmLinear* learner:

`train_swag_svml$CVs`

```
## [[1]]
## [1] 0.14094276 0.06959836 0.07499399 0.15157407 0.10811688 0.08592593 0.11502886
## [8] 0.12070707 0.22122896
##
## [[2]]
## [1] 0.05107744 0.06225950 0.03852213 0.05492304 0.06030544 0.04377104
## [7] 0.05108225 0.06212121 0.07485570 0.05491582
##
## [[3]]
## [1] 0.04010101 0.04761063 0.03848846 0.04030784 0.04575758 0.04016835 0.03841991
## [8] 0.04387205 0.05105099
##
## [[4]]
## [1] 0.03464646 0.04572751 0.04030664 0.03852213
```

`# A list which contains the cv training errors of each learner explored in a given dimension`

`train_swag_svml$VarMat`

```
## [[1]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 1 2 3 4 5 6 7 8 9
##
## [[2]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 2 2 2 2 3 3 3 5 5 6
## [2,] 3 5 6 7 5 6 7 6 7 7
##
## [[3]]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 2 2 2 3 2 2 3 3 5
## [2,] 3 3 6 6 3 5 5 5 6
## [3,] 6 7 7 7 5 6 6 7 7
##
## [[4]]
## [,1] [,2] [,3] [,4]
## [1,] 2 2 2 3
## [2,] 3 3 5 5
## [3,] 6 5 6 6
## [4,] 7 6 7 7
```

`# A list which contains a matrix, for each dimension, with the attributes tested at that step`
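To read the two lists together, the sketch below hard-codes the dimension-4 values printed above and looks up the attribute set with the lowest CV error. It assumes that column `j` of the dimension-`d` matrix corresponds to entry `j` of the dimension-`d` CV errors; this pairing is not stated explicitly here, so treat the example as an illustration and not as documented behavior.

```r
# Values copied from the printed output for dimension 4 (svmLinear learner)
cvs_dim4 <- c(0.03464646, 0.04572751, 0.04030664, 0.03852213)
varmat_dim4 <- matrix(
  c(2, 3, 6, 7,
    2, 3, 5, 6,
    2, 5, 6, 7,
    3, 5, 6, 7),
  nrow = 4 # filled column by column: one learner per column
)

# Assumption: column j of the matrix pairs with entry j of the CV errors
best <- which.min(cvs_dim4)
best_attributes <- varmat_dim4[, best]
best_attributes
```

On a fitted object, the same lookup would read `train_swag_svml$VarMat[[4]][, which.min(train_swag_svml$CVs[[4]])]`.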

`train_swag_svml$cv_alpha`

`## [1] 0.11502886 0.05491943 0.04030784 0.03941438`

`# The cut-off cv training error, at each dimension, determined by the choice of alpha`
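The printed cut-offs can be reproduced from the CV errors shown earlier: in this output, each cut-off equals the `alpha`-quantile of the errors in the corresponding dimension. The sketch below checks this for dimensions 1 and 4 with hard-coded values and base R's `quantile()` at its default settings. Whether **swag** computes the cut-off in exactly this way internally is an assumption.

```r
alpha <- 0.5

# CV errors copied from the printed output (dimensions 1 and 4)
cvs_dim1 <- c(0.14094276, 0.06959836, 0.07499399, 0.15157407, 0.10811688,
              0.08592593, 0.11502886, 0.12070707, 0.22122896)
cvs_dim4 <- c(0.03464646, 0.04572751, 0.04030664, 0.03852213)

# alpha-quantile of the CV errors in each dimension
cutoff_dim1 <- unname(quantile(cvs_dim1, probs = alpha))
cutoff_dim4 <- unname(quantile(cvs_dim4, probs = alpha))

# positions of the learners whose error is at or below the cut-off
selected_dim1 <- which(cvs_dim1 <= cutoff_dim1)

c(cutoff_dim1, cutoff_dim4)
selected_dim1
```

The two quantiles agree with the first and last printed cut-offs, and the selected positions in dimension 1 (2, 3, 5, 6, 7) are the only attributes that appear in the dimension-2 matrix of attributes.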

The other two learners that we have implemented in
**swag** are: lasso (**glmnet** package
required) and random forest (**party** package required).
The training phase for these learners differs a little from
the SVM one. We can look at the random forest for a practical
example:

```
### Random Forest Learner ###
train_swag_rf <- swag(
  # arguments for swag
  x = x,
  y = y,
  control = swagcon,
  auto_control = FALSE,
  # arguments for caret
  trControl = caret::trainControl(method = "repeatedcv", number = 10, repeats = 1, allowParallel = FALSE),
  metric = "Accuracy",
  method = "rf",
  # dynamically modify arguments for caret
  caret_args_dyn = function(list_arg,iter){
    list_arg$tuneGrid = expand.grid(.mtry=sqrt(iter))
    list_arg
  }
)
```

```
## [1] "Dimension explored: 1 - CV errors at alpha: 0.0996"
## [1] "Dimension explored: 2 - CV errors at alpha: 0.0534"
## [1] "Dimension explored: 3 - CV errors at alpha: 0.0461"
## [1] "Dimension explored: 4 - CV errors at alpha: 0.0425"
```

The newly introduced argument `caret_args_dyn` enables the
user to modify the hyper-parameters related to a given learner
dynamically, since they can change as the dimension grows up to the
desired $p_{max}$. This makes it possible to adapt the *mtry*
hyper-parameter as the dimension grows. In the example above, we have
fixed *mtry* to the square root of the number of attributes at
each step, as is usually done in practice.
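To see what this function does in isolation, the sketch below calls it directly on a placeholder argument list for dimensions 1 to 4. It assumes, as the description above suggests, that `iter` is the current dimension. The placeholder `list(method = "rf")` is only for illustration and is not necessarily how **swag** builds its arguments internally.

```r
# Same function as in the call to swag() above, defined on its own
caret_args_dyn <- function(list_arg, iter) {
  list_arg$tuneGrid <- expand.grid(.mtry = sqrt(iter))
  list_arg
}

# Apply it to a placeholder argument list for each dimension (assumed: iter = dimension)
args_by_dim <- lapply(1:4, function(d) caret_args_dyn(list(method = "rf"), d))

# Extract the mtry value that would be passed to caret at each dimension
mtry_by_dim <- sapply(args_by_dim, function(a) a$tuneGrid$.mtry)
round(mtry_by_dim, 3)
```

The other entries of the list are returned unchanged; only `tuneGrid` is replaced at each dimension.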

You can tailor the learning arguments of *swag()* as you like,
for example by introducing grids for the hyper-parameters specific to a
given learner, or by updating these grids as the dimension increases,
similarly to what is usually done with the **caret** package. This
gives you a wide range of possibilities and a lot of flexibility in the
training phase.
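As one possible variation, the hypothetical function below (the name `rf_grid_dyn` is ours and is not part of **swag**) supplies a grid of candidate *mtry* values that widens with the dimension instead of a single value. It keeps the `(list_arg, iter)` signature used for `caret_args_dyn` and assumes that `iter` is the current dimension.

```r
# Hypothetical helper, not part of swag: grid of mtry values growing with the dimension
rf_grid_dyn <- function(list_arg, iter) {
  # candidate mtry values: every integer from 1 up to the current dimension
  list_arg$tuneGrid <- expand.grid(.mtry = seq_len(iter))
  list_arg
}

# Grid that would be produced at dimension 3
grid_dim3 <- rf_grid_dyn(list(), 3)$tuneGrid
grid_dim3
```

It could then be supplied as `caret_args_dyn = rf_grid_dyn` in the call to *swag()*; this combination has not been run here.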

To conclude this brief introduction, we present the usual
*predict()* function which can be applied to a
**swag** trained object similarly to many other packages in
R. We pick the random forest learner for this purpose.

```
# best learner predictions
# if `newdata` is not specified, then predict gives predictions based on the training
# sample
sapply(predict(object = train_swag_rf), function(x) head(x))
```

```
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 1
## [5,] 1
## [6,] 2
##
## $models
## $models[[1]]
## [1] 3 5 6 7
```

```
# best learner predictions
best_pred <- predict(object = train_swag_rf,
                     newdata = x_test)
sapply(best_pred, function(x) head(x))
```

```
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 2
## [5,] 1
## [6,] 1
##
## $models
## $models[[1]]
## [1] 3 5 6 7
```

```
# predictions for a given dimension
dim_pred <- predict(
  object = train_swag_rf,
  newdata = x_test,
  type = "attribute",
  attribute = 4L)
sapply(dim_pred,function(x) head(x))
```

```
## $predictions
## [,1] [,2] [,3] [,4]
## [1,] 1 1 1 1
## [2,] 1 1 1 1
## [3,] 1 1 1 1
## [4,] 2 2 2 2
## [5,] 1 1 1 1
## [6,] 1 1 1 1
##
## $models
## $models[[1]]
## [1] 2 3 5 6
##
## $models[[2]]
## [1] 2 3 5 7
##
## $models[[3]]
## [1] 3 5 6 7
##
## $models[[4]]
## [1] 2 3 6 7
```

```
# predictions below a given CV error
cv_pred <- predict(
  object = train_swag_rf,
  newdata = x_test,
  type = "cv_performance",
  cv_performance = 0.04)
sapply(cv_pred,function(x) head(x))
```

```
## $predictions
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 2
## [5,] 1
## [6,] 1
##
## $models
## $models[[1]]
## [1] 3 5 6 7
```

Now we can evaluate the performance of the best learner selected by
**swag** thanks to the *confusionMatrix()* function
of **caret**.

```
# transform predictions into a factor with the levels of `y_test`
best_learn <- factor(levels(y_test)[best_pred$predictions])
caret::confusionMatrix(best_learn,y_test)
```

```
## Confusion Matrix and Statistics
##
## Reference
## Prediction benign malignant
## benign 90 0
## malignant 0 46
##
## Accuracy : 1
## 95% CI : (0.9732, 1)
## No Information Rate : 0.6618
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0000
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 1.0000
## Prevalence : 0.6618
## Detection Rate : 0.6618
## Detection Prevalence : 0.6618
## Balanced Accuracy : 1.0000
##
## 'Positive' Class : benign
##
```
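The headline figures in this output follow directly from the four counts in the table. The sketch below recomputes them by hand in base R from the hard-coded counts, as a reading aid for the *confusionMatrix()* summary; it does not call **caret**.

```r
# Counts copied from the confusion matrix above (rows: prediction, columns: reference)
cm <- matrix(c(90, 0, 0, 46), nrow = 2,
             dimnames = list(Prediction = c("benign", "malignant"),
                             Reference = c("benign", "malignant")))

# share of correctly classified observations
accuracy <- sum(diag(cm)) / sum(cm)
# share of the most frequent reference class
no_information_rate <- max(colSums(cm)) / sum(cm)
# with "benign" as the positive class
sensitivity <- cm["benign", "benign"] / sum(cm[, "benign"])
specificity <- cm["malignant", "malignant"] / sum(cm[, "malignant"])

round(c(accuracy = accuracy, nir = no_information_rate,
        sensitivity = sensitivity, specificity = specificity), 4)
```

The no-information rate is 90 / 136, which rounds to the 0.6618 reported above.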

Thanks for the attention. You can definitely say that you worked with
**swag** !!!