max.subtree {randomSurvivalForest}    R Documentation

Extract Maximal Subtree Information

Description

Extract maximal subtree information from a forest. Used for variable selection and identifying interactions between variables.

Usage

    max.subtree(object, max.order = 2, sub.order = FALSE, ...)

Arguments

object

An object of class (rsf, grow) or (rsf, forest).

max.order

Non-negative integer specifying the target number of order depths. Default is to return the first and second order depths. Used to identify predictive variables. See details below.

sub.order

Set this value to TRUE to return the minimal depth of each variable relative to another variable. Used to identify interrelationships between variables. See details below.

...

Further arguments passed to or from other methods.

Details

The maximal subtree for a variable x is the largest subtree whose root node splits on x. Thus, all ancestor nodes of x's maximal subtree split on variables other than x. The largest possible maximal subtree is the entire tree, which occurs when the root node splits on x. In general, however, there can be more than one maximal subtree for a variable. A maximal subtree may also not exist if there are no splits on the variable. For details see Ishwaran et al. (2010).

The minimal depth of a maximal subtree measures the predictiveness of a variable x. It equals the shortest distance (the depth) from the root node to the parent node of the maximal subtree, that is, the node that splits on x. Zero is the smallest value possible and occurs when the root node itself splits on x. The smaller the minimal depth, the more impact x has on prediction. The second order depth is the shortest distance from the root node to the second node split using x. To specify the target order depth, use the max.order option (e.g., setting max.order=2 returns the first and second order depths).
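The depth definitions above can be sketched on a hand-built toy tree in base R. This is an illustration only and not the package's internal implementation: the data frame layout, the variable names, and the helper functions node_depth and order_depth are all hypothetical.

```r
# Toy binary tree: one row per node. 'parent' is 0 for the root;
# 'split' is the variable a node splits on (NA marks a terminal node).
tree <- data.frame(
  id     = 1:9,
  parent = c(0, 1, 1, 2, 2, 3, 3, 5, 5),
  split  = c("age", "karno", "age", NA, "karno", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Depth of a node = number of edges between it and the root.
node_depth <- function(id, tree) {
  d <- 0
  while (tree$parent[tree$id == id] != 0) {
    id <- tree$parent[tree$id == id]
    d <- d + 1
  }
  d
}
tree$depth <- vapply(tree$id, node_depth, numeric(1), tree = tree)

# First and higher order depths of a variable: the sorted depths of the
# nodes that split on it. NA is returned where too few splits exist.
order_depth <- function(variable, tree, max.order = 2) {
  d <- sort(tree$depth[!is.na(tree$split) & tree$split == variable])
  d[seq_len(max.order)]
}

# "prior" is never split on, so it has no maximal subtree.
vars <- c("age", "karno", "prior")
depths <- t(sapply(vars, order_depth, tree = tree))
colnames(depths) <- c("first", "second")
print(depths)
```

Here age splits the root, so its minimal (first order) depth is zero, while prior is never used for splitting and has no depth at all.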

Set sub.order=TRUE to obtain the minimal depth of a variable relative to another variable. This returns a p x p matrix, where p is the number of variables. Entry [i][j] is the normalized relative minimal depth of variable [j] within the maximal subtree for variable [i], where normalization adjusts for the size of [i]'s maximal subtree. Entry [i][i] is the normalized minimal depth of [i] relative to the root node. The matrix should be read across rows (not down columns) and identifies interrelationships between variables. Small [i][j] entries indicate interactions. See find.interaction for further details.
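As a rough sketch of how such a matrix might be read, the base R snippet below uses a small matrix with invented values in place of real sub.order output. The numbers, the variable names, and the helper closest_partner are hypothetical and serve only to show the row-wise reading.

```r
# Invented 3 x 3 matrix standing in for sub.order output (not real results).
vars <- c("age", "karno", "celltype")
sub <- matrix(c(0.10, 0.25, 0.90,
                0.80, 0.20, 0.85,
                0.95, 0.30, 0.40),
              nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Diagonal: normalized minimal depth of each variable relative to the root.
print(diag(sub))

# Read across rows: for each row variable i, find the other variable j
# with the smallest entry [i][j], the strongest candidate interaction.
closest_partner <- function(m) {
  off <- m
  diag(off) <- NA                      # ignore the diagonal
  apply(off, 1, function(r) names(r)[which.min(r)])
}
partners <- closest_partner(sub)
print(partners)
```

Because rows and columns play different roles, sub["age", "karno"] and sub["karno", "age"] need not agree, which is why the matrix is read across rows.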

Also applies to competing risk data, but the analysis is not event-specific.

Value

A list with the following components:

mean

Minimal depth averaged over a tree and forest for each variable.

order

Order depths for a given variable up to max.order averaged over a tree and the forest. Matrix of dimension p x max.order. If max.order=0, a matrix of p x ntree is returned containing the minimum maximal subtree distance for each variable by tree.

count

Averaged number of maximal subtrees, normalized by the size of a tree, for each variable.

terminal

Average terminal depth of each tree.

nodesAtDepth

Number of nodes per depth per tree. Matrix of dimension maxDepth x ntree.

subOrder

Average minimal depth of a variable relative to another variable. Matrix of dimension p x p. Can be NULL.

threshold

Threshold used to select variables. Variables whose minimal depth exceeds this value are considered to be noise.
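A minimal sketch of applying the threshold rule, using invented numbers rather than real output. It assumes, for illustration, that the mean component is a numeric vector named by variable; the list v below is a hypothetical stand-in for the returned object.

```r
# Hypothetical stand-in for the relevant components of the returned list.
v <- list(
  mean      = c(karno = 1.2, celltype = 1.9, age = 3.4, prior = 5.1),
  threshold = 3.0
)

# Variables whose minimal depth exceeds the threshold are treated as noise;
# the remaining variables are kept.
selected <- names(v$mean)[v$mean <= v$threshold]
noise    <- names(v$mean)[v$mean >  v$threshold]

print(selected)
print(noise)
```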

Author(s)

Hemant Ishwaran hemant.ishwaran@gmail.com

Udaya B. Kogalur kogalurshear@gmail.com

References

Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Statist. Assoc., 105:205-217.

See Also

find.interaction, varSel.

Examples

## Not run: 
# First and second order depths for all variables
library(randomSurvivalForest)
data(veteran, package = "randomSurvivalForest")
veteran.out <- rsf(Surv(time, status) ~ ., data = veteran)
v <- max.subtree(veteran.out)

# first and second order depths
print(round(v$order, 3))

# weak variables have minimal depth greater than the following threshold
print(v$threshold)

## End(Not run)

[Package randomSurvivalForest version 3.6.3 Index]