kNN {VIM}    R Documentation
k-Nearest Neighbour Imputation based on a variation of the Gower distance for numerical, categorical, ordered and semi-continuous variables.
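The page names a variation of the Gower distance but does not spell it out. As a rough illustration only, the base-R sketch below computes the plain Gower distance for numeric and categorical columns: range-scaled absolute differences for numeric columns and a 0/1 mismatch for categorical ones, averaged with optional weights. The name gower_dist is hypothetical. It is not VIM's gowerD, and it ignores ordered and semi-continuous variables as well as missing values.

```r
# Hypothetical helper, NOT VIM's gowerD: plain Gower distance between every
# row of x and every row of y. Handles numeric and (unordered) factor or
# character columns only; assumes no missing values.
gower_dist <- function(x, y = x, weights = rep(1, ncol(x))) {
  stopifnot(identical(names(x), names(y)))
  d <- matrix(0, nrow(x), nrow(y))
  for (j in seq_along(x)) {
    xj <- x[[j]]
    yj <- y[[j]]
    if (is.numeric(xj)) {
      # numeric: absolute difference scaled by the observed range
      rng <- diff(range(c(xj, yj)))
      if (rng == 0) rng <- 1
      dj <- abs(outer(xj, yj, "-")) / rng
    } else {
      # categorical: 0 if the levels match, 1 otherwise
      dj <- outer(as.character(xj), as.character(yj), "!=") * 1
    }
    d <- d + weights[j] * dj
  }
  # weighted average over the variables
  d / sum(weights)
}

x <- data.frame(a = c(0, 5, 10), g = factor(c("u", "v", "u")))
d <- gower_dist(x)
d[1, 2]  # 0.75: (0.5 for a + 1 for g) / 2
d[1, 3]  # 0.5:  (1 for a + 0 for g) / 2
```

Scaling each numeric difference by the range keeps every per-variable contribution in [0, 1], which is what lets numeric and categorical variables be averaged on a common footing.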
kNN(data, variable = colnames(data), metric = NULL, k = 5,
    dist_var = colnames(data), weights = NULL, numFun = median,
    catFun = maxCat, makeNA = NULL, NAcond = NULL, impNA = TRUE,
    donorcond = NULL, mixed = vector(), trace = FALSE, imp_var = TRUE,
    imp_suffix = "imp", addRandom = FALSE)
sampleCat(x)
maxCat(x)
gowerD(data.x, data.y = data.x, weights = NULL, numerical, factors,
       orders, mixed, levOrders)
which.minN(x, n)
data: data.frame or matrix
variable: variables where missing values should be imputed
metric: metric to be used for calculating the distances between observations
k: number of nearest neighbours used
dist_var: names of the variables to be used for the distance calculation
weights: weights for the variables used in the distance calculation
numFun: function for aggregating the k nearest neighbours in the case of a numerical variable
catFun: function for aggregating the k nearest neighbours in the case of a categorical variable
makeNA: vector of values that should be converted to NA
NAcond: a condition for imputing an NA
impNA: TRUE/FALSE whether NA should be imputed
donorcond: condition for the donors, e.g. ">5"
trace: TRUE/FALSE whether additional information about the imputation process should be printed
imp_var: TRUE/FALSE whether a TRUE/FALSE variable showing the imputation status should be created for each imputed variable
imp_suffix: suffix for the TRUE/FALSE variables showing the imputation status
addRandom: TRUE/FALSE whether an additional random variable should be added for the distance calculation
x: factor or character vector (sampleCat, maxCat) or numerical vector (which.minN)
data.x: data frame or matrix
data.y: data frame or matrix
numerical: names of numerical variables
factors: names of factors
orders: names of ordered variables
mixed: names of mixed (semi-continuous) variables
levOrders: list of the ordered levels for each factor
n: number of smallest values to select, in increasing order
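Assuming which.minN behaves like a generalisation of base R's which.min, returning the positions of the n smallest elements of x with the smallest first, an equivalent can be written with order(). The name which_min_n is hypothetical and is not the package function.

```r
# Hypothetical equivalent of which.minN (assumption: it returns positions,
# like which.min): indices of the n smallest values of x, smallest first.
# order() breaks ties by original position and puts NA last.
which_min_n <- function(x, n = 1) {
  order(x)[seq_len(min(n, length(x)))]
}

d <- c(0.4, 0.1, 0.3, 0.1, 0.9)
which_min_n(d, 3)  # 2 4 3
```

Applied to a vector of distances from one recipient to all donors, which_min_n(d, k) would give the positions of the k nearest donors.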
sampleCat draws a level at random, with probabilities proportional to how often each level occurs among the nearest neighbours. maxCat chooses the most frequent level, breaking ties at random when the maximum is not unique. gowerD is used by kNN to compute the distances for numerical, factor, ordered and semi-continuous variables. which.minN, also used internally by kNN, locates the n smallest values of x.
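The two categorical aggregation rules described above can be sketched in base R as follows. max_cat and sample_cat are hypothetical stand-ins, not VIM's code. Both return the chosen level as a character string and assume x contains at least one non-missing value.

```r
# Hypothetical stand-ins for maxCat and sampleCat (not VIM's code).
# x: the donor values of a categorical variable; NAs are dropped by table().

# Most frequent level; ties are broken at random.
max_cat <- function(x) {
  counts <- table(x)
  best <- names(counts)[counts == max(counts)]
  if (length(best) > 1) best <- sample(best, 1)
  best
}

# Random level, drawn with probability proportional to its frequency.
sample_cat <- function(x) {
  counts <- table(x)
  counts <- counts[counts > 0]  # drop unused factor levels
  names(counts)[sample.int(length(counts), 1, prob = as.vector(counts))]
}

donors <- c("a", "b", "b", "c", "b")
max_cat(donors)     # "b"
sample_cat(donors)  # "b" with probability 3/5, "a" or "c" with 1/5 each
```

sample.int() is used instead of sample() so that a single remaining level is returned as-is rather than triggering sample()'s "sample from 1:x" behaviour for length-one input.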
Value: the imputed data set (returned by kNN).
Author: Alexander Kowarik
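library(VIM)
data(sleep, package = "VIM")
# impute all missing values using the defaults (k = 5 nearest neighbours)
kNN(sleep)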
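To illustrate how the pieces fit together in a call like the example above, here is a minimal base-R sketch of kNN imputation for a single numeric variable. For each missing entry it computes a Gower-type distance (mean range-scaled absolute difference) to every complete donor on the numeric columns in dist_var, takes the k closest donors and aggregates their values with numFun. knn_impute_one is hypothetical, not VIM's kNN. It assumes the distance variables are numeric and free of missing values, and its `_imp` indicator column only mimics the imp_var/imp_suffix behaviour (the exact naming is an assumption).

```r
# Hypothetical sketch, NOT VIM's kNN: impute one variable from its k nearest
# complete donors. Distances use a Gower-type rule on numeric dist_var
# columns only, which are assumed to contain no missing values.
knn_impute_one <- function(data, variable, dist_var, k = 5,
                           numFun = median) {
  miss <- is.na(data[[variable]])
  donors <- which(!miss)
  # range of each distance variable, used to scale differences to [0, 1]
  rng <- vapply(data[dist_var], function(v) diff(range(v)), numeric(1))
  rng[rng == 0] <- 1
  for (i in which(miss)) {
    # mean range-scaled absolute difference between row i and each donor
    d <- numeric(length(donors))
    for (j in dist_var) {
      d <- d + abs(data[[j]][donors] - data[[j]][i]) / rng[[j]]
    }
    d <- d / length(dist_var)
    # row numbers of the k closest donors (ties broken by row order)
    nn <- donors[order(d)[seq_len(min(k, length(donors)))]]
    data[[variable]][i] <- numFun(data[[variable]][nn])
  }
  # imputation indicator, mimicking imp_var / imp_suffix (naming assumed)
  data[[paste(variable, "imp", sep = "_")]] <- miss
  data
}

df <- data.frame(x = c(1, 2, 3, 10, 11), y = c(10, 20, NA, 100, 110))
out <- knn_impute_one(df, "y", "x", k = 2)
out$y[3]  # 15: the nearest donors have x = 2 and x = 1, so median(20, 10)
```

Swapping numFun (for example numFun = mean) changes only the aggregation step; the donor search is unaffected.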