In this tutorial, we show how to use ocf to estimate the conditional choice probabilities and the covariates’ marginal effects, and how to conduct inference about these statistical targets. For illustration purposes, we use the synthetic data set provided in the orf package:
## Load data from orf package.
set.seed(1986)
library(ocf)
library(orf)
data(odata)
y <- as.numeric(odata[, 1]) # Outcome: first column.
X <- as.matrix(odata[, -1]) # Covariates: remaining columns.
The ocf function constructs a collection of forests, one for each category of y (three in this case). We can then use the forests to predict out-of-sample conditional probabilities with the predict method. By default, predict returns a matrix of predicted probabilities and a vector of predicted class labels (each observation is assigned to the class with the highest predicted probability).
## Training-test split.
train_idx <- sample(seq_len(length(y)), floor(length(y) * 0.5))
y_tr <- y[train_idx]
X_tr <- X[train_idx, ]
y_test <- y[-train_idx]
X_test <- X[-train_idx, ]
## Fit ocf on training sample. Use default settings.
forests <- ocf(y_tr, X_tr)
## Summary of data and tuning parameters.
summary(forests)
## Out-of-sample predictions.
predictions <- predict(forests, X_test)
head(predictions$probabilities)
table(y_test, predictions$classification)
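The labelling rule described above can be reproduced by hand. The following is a minimal sketch on a small made-up probability matrix (the values in probs are hypothetical, not output of ocf): each row is assigned to the column with the largest probability.

```r
## Hypothetical predicted probabilities: four observations, three classes.
probs <- matrix(c(0.60, 0.30, 0.10,
                  0.20, 0.50, 0.30,
                  0.10, 0.20, 0.70,
                  0.40, 0.35, 0.25),
                ncol = 3, byrow = TRUE)

## Each row should be a probability distribution over the classes.
stopifnot(all(abs(rowSums(probs) - 1) < 1e-12))

## Assign each observation to its highest-probability class.
labels <- apply(probs, 1, which.max)
labels
```

Applied to the tutorial's objects, the same rule would operate on predictions$probabilities. Whether the result coincides exactly with predictions$classification depends on how the package codes class labels and breaks ties, which this sketch does not check.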
We can also implement honesty, which is a necessary condition for consistent and asymptotically normal predictions. In the following, we set honesty = TRUE to construct honest forests.
## Honest forests.
honest_forests <- ocf(y_tr, X_tr, honesty = TRUE)
honest_predictions <- predict(honest_forests, X_test)
## Compare predictions with adaptive fit.
cbind(head(predictions$probabilities), head(honest_predictions$probabilities))
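For intuition about what honesty means, the sketch below builds a single-split "tree" in base R: one half of the sample chooses the split, and the other half computes the leaf estimates. Everything here is an illustrative assumption: the toy data, the grid of candidate thresholds, and the plain squared-error criterion, which is not the splitting rule used by ocf.

```r
set.seed(1986)
n <- 400
x <- runif(n)
## Toy indicator of belonging to one class.
y_bin <- as.numeric(x + rnorm(n, sd = 0.3) > 0.5)

## Split the sample: one half places the split, the other half fills the leaves.
split_idx <- sample(seq_len(n), n / 2)
x_split <- x[split_idx]
y_split <- y_bin[split_idx]
x_est <- x[-split_idx]
y_est <- y_bin[-split_idx]

## Choose the threshold using the splitting half only (squared-error criterion).
candidates <- seq(0.1, 0.9, by = 0.05)
sse <- sapply(candidates, function(s) {
  left <- y_split[x_split <= s]
  right <- y_split[x_split > s]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
})
threshold <- candidates[which.min(sse)]

## Honest leaf estimates: class shares computed on the held-out half.
honest_leaves <- c(left = mean(y_est[x_est <= threshold]),
                   right = mean(y_est[x_est > threshold]))
honest_leaves
```

Because the observations that determine the split never enter the leaf averages, the leaf estimates are not tuned to the noise that shaped the split, which is the property the inference results rely on.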
To estimate standard errors for the predicted probabilities, we set inference = TRUE. This also requires setting honesty = TRUE, because the variance formula is valid only for honest predictions. Estimating standard errors slows down the routine considerably. However, we can increase the number of threads used to construct the forests to speed it up.
## Compute standard errors.
honest_forests <- ocf(y_tr, X_tr, honesty = TRUE, inference = TRUE, n.threads = 0) # Use all CPUs.
head(honest_forests$predictions$standard.errors)
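Standard errors are typically combined with the point estimates to form confidence intervals. The sketch below shows the usual normal-approximation interval, relying on the asymptotic normality of honest predictions mentioned earlier. The matrices p_hat and se_hat hold made-up numbers standing in for predicted probabilities and their standard errors, and the 95% level is an arbitrary choice.

```r
## Hypothetical point estimates and standard errors (two observations, three classes).
p_hat <- matrix(c(0.60, 0.30, 0.10,
                  0.20, 0.50, 0.30),
                ncol = 3, byrow = TRUE)
se_hat <- matrix(c(0.05, 0.04, 0.02,
                   0.03, 0.06, 0.05),
                 ncol = 3, byrow = TRUE)

## Normal-approximation 95% confidence intervals.
z <- qnorm(0.975)
ci_lower <- p_hat - z * se_hat
ci_upper <- p_hat + z * se_hat

## Lower bound, estimate, and upper bound for the first observation.
rbind(lower = ci_lower[1, ], estimate = p_hat[1, ], upper = ci_upper[1, ])
```

Intervals built this way are not constrained to lie in [0, 1]; whether to truncate them is left to the user.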
The marginal_effects function post-processes the predictions to estimate mean marginal effects, marginal effects at the mean, or marginal effects at the median, according to the eval argument. In the following, we construct our forests in the training sample and use them to estimate the marginal effects at the mean in the test sample.
## Fit ocf on training sample.
forests <- ocf(y_tr, X_tr)
## Marginal effects at the mean on test sample.
me_atmean <- marginal_effects(forests, data = X_test, eval = "atmean")
summary(me_atmean)
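To make the difference between the three evaluation points concrete, the sketch below computes a marginal effect by central finite differences for a toy probability function. The names toy_prob and fd_effect, the step size h, and the simulated data are all hypothetical; this illustrates the definitions only and is not a description of how marginal_effects works internally.

```r
## Toy conditional probability of one class as a function of two covariates.
toy_prob <- function(x) plogis(0.5 * x[[1]] - 0.25 * x[[2]])

## Central finite-difference derivative of f with respect to covariate j at point x.
fd_effect <- function(f, x, j, h = 1e-4) {
  up <- x
  down <- x
  up[j] <- x[j] + h
  down[j] <- x[j] - h
  (f(up) - f(down)) / (2 * h)
}

set.seed(1986)
X_toy <- cbind(x1 = rnorm(200), x2 = rnorm(200))

## Marginal effect at the mean: evaluate once at the covariate means.
me_at_mean <- fd_effect(toy_prob, colMeans(X_toy), j = 1)

## Marginal effect at the median: evaluate once at the covariate medians.
me_at_median <- fd_effect(toy_prob, apply(X_toy, 2, median), j = 1)

## Mean marginal effect: evaluate at every observation, then average.
mean_me <- mean(apply(X_toy, 1, function(x) fd_effect(toy_prob, x, j = 1)))

c(atmean = me_at_mean, atmedian = me_at_median, mean = mean_me)
```

The first two quantities require a single evaluation point, while the mean marginal effect requires one evaluation per observation, so it is the most expensive of the three.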
As before, we can set inference = TRUE to estimate the standard errors. Again, this requires the use of honest forests and considerably slows down the routine.
## Honest forests.
honest_forests <- ocf(y_tr, X_tr, honesty = TRUE) # Notice we do not need inference here!
## Compute standard errors.
honest_me_atmean <- marginal_effects(honest_forests, data = X_test, eval = "atmean", inference = TRUE)
## LATEX.
print(honest_me_atmean, latex = TRUE)