find.interaction {randomSurvivalForest} | R Documentation |
Find pairwise interactions between variables.
find.interaction(object, predictorNames = NULL, method = c("maxsubtree", "vimp")[1], sorted = TRUE, npred = NULL, subset = NULL, nrep = 1, rough = FALSE, importance = c("randomsplit", "permute")[1], seed = NULL, do.trace = FALSE, ...)
object |
An object of class |
predictorNames |
Character vector of names of target x-variables. Default is to use all variables. |
method |
Method of analysis: maximal subtree or VIMP. See details below. |
sorted |
Should variables be sorted? |
npred |
Use the first npred ordered variables. Default is to use all variables. |
subset |
Indices indicating which rows of the predictor matrix
to be used (note: this applies to the object predictor
matrix, |
nrep |
Number of Monte Carlo replicates. Applies only when
method="vimp". |
rough |
Should fast approximation be used? Applies only when
method="vimp". |
importance |
Type of variable importance (VIMP). Applies only
when |
seed |
Seed (negative integer) for random number generator. |
do.trace |
Logical. Should trace output be enabled? Integer
values can also be passed. A positive value causes output to be
printed each |
... |
Further arguments passed to or from other methods. |
Using a previously grown forest, identify pairwise interactions for
all pairs of variables from a specified list. There are two
distinct approaches, specified by the method option.
If method="maxsubtree", then a maximal subtree
analysis is used. In this case, a matrix is returned where entries
[i][i] are the normalized minimal depth of variable [i] relative to
the root node (normalized w.r.t. the size of the tree) and entries
[i][j] indicate the normalized minimal depth of a variable [j]
w.r.t. the maximal subtree for variable [i] (normalized w.r.t. the
size of [i]'s maximal subtree). Smaller [i][i] entries indicate
predictive variables. Small [i][j] entries having small [i][i]
entries are a sign of an interaction between variables [i] and [j]
(note: the user should scan rows, not columns, for small entries).
See Ishwaran et al. (2010) for more details.
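As a rough illustration of how such a matrix might be scanned, the base R sketch below uses a small made-up matrix m (hypothetical values and variable names, not output from find.interaction) and two arbitrary cutoffs, diag.cut and pair.cut, that are assumptions of this example only. It keeps rows whose diagonal entry is small and then reports the small off-diagonal entries found along those rows.

```r
# Hypothetical 3 x 3 matrix in the layout described above (made-up values):
#   m[i, i] = normalized minimal depth of variable i,
#   m[i, j] = normalized minimal depth of j within i's maximal subtree.
vars <- c("x1", "x2", "x3")
m <- matrix(c(0.10, 0.15, 0.80,
              0.20, 0.12, 0.75,
              0.70, 0.85, 0.60),
            nrow = 3, byrow = TRUE,
            dimnames = list(vars, vars))

diag.cut <- 0.25  # assumed cutoff for a "small" [i][i] entry
pair.cut <- 0.25  # assumed cutoff for a "small" [i][j] entry

# Positions of small off-diagonal entries.
idx <- which(m < pair.cut & row(m) != col(m), arr.ind = TRUE)

# Keep only those lying in rows whose own diagonal entry is small
# (rows are scanned, not columns).
idx <- idx[diag(m)[idx[, "row"]] < diag.cut, , drop = FALSE]

pairs <- data.frame(row.var = vars[idx[, "row"]],
                    col.var = vars[idx[, "col"]],
                    depth   = m[idx])
print(pairs)
```

With real output the cutoffs would need to be chosen by inspecting the matrix; the values used here carry no special meaning.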
If method="vimp", then a joint-VIMP approach is used.
Two variables are paired and their paired VIMP calculated (referred
to as Paired importance). The VIMP for each separate variable is
also calculated. The sum of these two values is referred to as
Additive importance. A large positive or negative difference
between Paired and Additive indicates an association worth pursuing
if the VIMPs for each variable are reasonably large. See Ishwaran
(2007) for more details.
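The arithmetic behind the Paired versus Additive comparison can be sketched in base R as follows. All VIMP values below are made up for illustration, and the column names Paired, Additive and Difference are chosen only to mirror the terminology above; they are not taken from actual find.interaction output.

```r
# Hypothetical per-variable VIMP (made-up numbers).
single <- c(x1 = 0.040, x2 = 0.025, x3 = 0.002)

# Hypothetical paired VIMP for each pair of variables (made-up numbers).
paired <- c("x1:x2" = 0.095, "x1:x3" = 0.043, "x2:x3" = 0.026)

# Enumerate all pairs and build the comparison table.
pairs <- combn(names(single), 2)
pair.names <- paste(pairs[1, ], pairs[2, ], sep = ":")

tab <- data.frame(
  Paired   = unname(paired[pair.names]),
  Additive = unname(single[pairs[1, ]] + single[pairs[2, ]]),
  row.names = pair.names
)
tab$Difference <- tab$Paired - tab$Additive

# Rank pairs by the absolute size of the difference.
ranked <- tab[order(-abs(tab$Difference)), ]
print(ranked)
```

In this toy table only the x1:x2 pair combines a sizeable difference with reasonably large individual VIMPs, which is the pattern the text above describes as worth pursuing.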
Computations might be slow depending upon the size of the data and
the forest. In such cases, consider setting npred to a
smaller number, or using rough=TRUE if method="vimp".
If method="maxsubtree",
consider using a smaller number of trees in the original grow call.
If nrep is greater than 1, the analysis is repeated
nrep times and results averaged over the replications
(applies only when method="vimp").
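The idea of averaging over replications can be illustrated with a toy base R sketch. The function one.replicate below is a hypothetical stand-in that returns noisy made-up values; it does not call the package and is not how find.interaction is implemented.

```r
set.seed(1)

# Hypothetical stand-in for one Monte Carlo replicate: fixed made-up
# values plus random noise, named by variable pair.
one.replicate <- function() {
  c("x1:x2" = 0.030, "x1:x3" = 0.001) + rnorm(2, sd = 0.01)
}

nrep <- 3

# One column per replicate, one row per variable pair.
reps <- replicate(nrep, one.replicate())

# Average each pair's value across the replicates.
averaged <- rowMeans(reps)
print(averaged)
```

Averaging reduces the Monte Carlo noise in each entry at the cost of roughly nrep times the computation.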
For competing risk data, maximal subtree analyses correspond to
unconditional values (i.e., they are non-event specific). Setting
method="vimp", however, yields pairwise interactions
for both event and non-event specific settings.
Invisibly, the interaction table (a list for competing risk data) or the maximal subtree matrix.
Hemant Ishwaran hemant.ishwaran@gmail.com
Udaya B. Kogalur kogalurshear@gmail.com
Ishwaran H. (2007). Variable importance in binary regression trees and forests, Electronic J. Statist., 1:519-537.
Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Statist. Assoc., 105:205-217.
max.subtree, vimp.
## Not run:
#------------------------------------------------------------------------
# Maximal subtree approach, top 8 predictors (PBC data).

data(pbc, package = "randomSurvivalForest")
pbc.out <- rsf(Surv(days,status) ~ ., pbc, nsplit = 10)
find.interaction(pbc.out, npred = 8)

#------------------------------------------------------------------------
# VIMP approach (PBC data).
# Use fast approximation to speed up computations.

data(pbc, package = "randomSurvivalForest")
pbc.out <- rsf(Surv(days,status) ~ ., pbc, nsplit = 10)
find.interaction(pbc.out, method = "vimp", nrep = 3, rough = TRUE)

#------------------------------------------------------------------------
# Competing risks (WIHS data).

data(wihs, package = "randomSurvivalForest")
wihs.out <- rsf(Surv(time, status) ~ ., wihs, nsplit = 3, ntree = 200)
find.interaction(wihs.out, method = "vimp")

## End(Not run)