vimp {randomSurvivalForest}R Documentation

VIMP for Single or Grouped Variables

Description

Calculate variable importance (VIMP) for a single variable or group of variables.

Usage

  vimp(object, predictorNames = NULL, subset = NULL,
       joint = TRUE, rough = FALSE,
       importance = c("randomsplit", "permute", "none")[1], 
       seed = NULL, do.trace = FALSE, ...)

Arguments

object

An object of class (rsf, grow) or (rsf, forest).

predictorNames

Character vector of x-variable names to be considered. If NULL (the default) all variables are used. Only x-variables listed in the object predictor matrix will be used.

subset

Indices indicating which rows of the predictor matrix to be used (note: this applies to the object predictor matrix, predictors). Default is to use all rows.

joint

Should joint-VIMP or individual VIMP be calculated? See details below.

rough

Should fast approximation be used?

importance

Type of VIMP.

seed

Seed (negative integer) for random number generator.

do.trace

Should trace output be enabled? Integer values can also be passed. A positive value causes output to be printed each do.trace iteration.

...

Further arguments passed to or from other methods.

Details

Using a previously grown forest, and restricting the data to that indicated by subset, calculate the VIMP for variables listed in predictorNames. If joint=TRUE, then joint-VIMP for the group of variables is calculated. This equals the amount that prediction error changes when the group of variables are simultaneously "noised-up" (perturbed). If joint=FALSE, the VIMP for each variable is calculated separately.

Depending upon the setting for importance, VIMP is determined by using either random daughter assignment ("randomsplit") or random permutation ("permute") noising-up of the variable(s). A NULL value is returned if importance="none".

For competing risk data, VIMP and error rates are given for the ensemble CHF and the conditional CHF (CCHF) for each event type. For more details see Ishwaran et al. (2010).

Value

A list with the following components:

err.rate

OOB error rate for the (unperturbed) ensemble restricted to the subsetted data.

err.perturb.rate

OOB error rate for the perturbed ensemble restricted to the subsetted data. Dimension depends upon the option joint.

importance

Variable importance (VIMP). Dimension depends upon the option joint.

Author(s)

Hemant Ishwaran hemant.ishwaran@gmail.com

Udaya B. Kogalur kogalurshear@gmail.com

References

Ishwaran H. (2007). Variable importance in binary regression trees and forests, Electronic J. Statist., 1:519-537.

Ishwaran H., Kogalur U.B., Moore R.D., Gange S.J. and Lau B.M. (2010). Random survival forests for competing risks.

See Also

find.interaction.

Examples

#------------------------------------------------------------------------
# Example of paired-VIMP. 
# Veteran data.

data(veteran, package = "randomSurvivalForest") 
v.out <- rsf(Survrsf(time,status) ~ ., veteran, ntree = 1000)
vimp(v.out, c("karno","celltype"))$importance

## Not run: 
#------------------------------------------------------------------------
# Individual VIMP for data restricted to events only.
# PBC data.

data(pbc, package = "randomSurvivalForest") 
rsf.out <- rsf(Surv(days,status) ~ ., pbc, ntree = 1000, nsplit = 3)
o.r <- rev(order(rsf.out$importance))
imp <- rsf.out$importance[o.r]
imp.events <- rep(0, length(imp))
events <- which(rsf.out$cens == 1)
imp.events <-
 vimp(rsf.out, names(imp), events, joint = FALSE)$importance
imp.all <- as.data.frame(cbind(imp.events = imp.events, imp = imp))
print(round(imp.all, 3))

#------------------------------------------------------------------------
# Estimate variability of VIMP in two ways (PBC data):
# (i)  Monte Carlo:  Estimates variability of the procedure
# (ii) Bootstrap:    Estimates statistical variability

data(pbc, package = "randomSurvivalForest") 
rsf.out <- rsf(Surv(days,status) ~ ., pbc, ntree = 1000, nsplit = 3)
o.r <- rev(order(rsf.out$importance))
imp.names <- names(rsf.out$importance[o.r])
subset.index <- 1:nrow(rsf.out$predictors)
imp.mc <- imp.boot <- NULL
for (k in 1:100) {
  cat("iteration:", k , "\n")
  imp.mc <-
    cbind(imp.mc, vimp(rsf.out, imp.names, joint = FALSE)$importance)
  imp.boot <-
    cbind(imp.boot, vimp(rsf.out, imp.names,
    subset = sample(subset.index, replace = TRUE), joint = FALSE)$importance)
}
imp.mc <- as.data.frame(cbind(imp.mean = apply(imp.mc, 1, mean),
                    imp.sd = apply(imp.mc, 1, sd)))
imp.boot <- as.data.frame(cbind(imp.mean = apply(imp.boot, 1, mean),
                    imp.sd = apply(imp.boot, 1, sd)))
print(round(imp.mc, 3))
print(round(imp.boot, 3))

## End(Not run)

[Package randomSurvivalForest version 3.6.3 Index]