vgam {VGAM} | R Documentation |
Fit a vector generalized additive model (VGAM). This is a large class of models that includes generalized additive models (GAMs) and vector generalized linear models (VGLMs) as special cases.
vgam(formula, family, data = list(), weights = NULL, subset = NULL, na.action = na.fail, etastart = NULL, mustart = NULL, coefstart = NULL, control = vgam.control(...), offset = NULL, method = "vgam.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, contrasts = NULL, constraints = NULL, extra = list(), qr.arg = FALSE, smart = TRUE, ...)
formula |
a symbolic description of the model to be fit. The RHS of the formula is applied to each linear/additive predictor. Different variables in each linear/additive predictor can be chosen by specifying constraint matrices. |
family |
a function of class |
data |
an optional data frame containing the variables in the model.
By default the variables are taken from
|
weights |
an optional vector or matrix of (prior) weights
to be used in the fitting process.
If |
subset |
an optional logical vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when
the data contain |
etastart, mustart, coefstart |
Same as for |
control |
a list of parameters for controlling the fitting process.
See |
offset |
a vector or M-column matrix of offset values. These are a priori known and are added to the linear/additive predictors during fitting. |
method |
the method to be used in fitting the model.
The default (and presently only) method |
model |
a logical value indicating whether the model frame should be
assigned in the |
x.arg, y.arg |
logical values indicating whether the model matrix and response
vector/matrix used in the fitting process should be assigned in the
|
contrasts |
an optional list. See the |
constraints |
an optional list of constraint matrices. The components of the list
must be named with the term it corresponds to (and it must match in
character format exactly). Each constraint matrix must have M rows, and
be of full-column rank. By default, constraint matrices are the M
by M identity matrix unless arguments in the family function
itself override these values. If |
extra |
an optional list with any extra information that might be needed by the VGAM family function. |
qr.arg |
logical value indicating whether the slot |
smart |
logical value indicating whether smart prediction
( |
... |
further arguments passed into |
A vector generalized additive model (VGAM) is loosely defined as a statistical model that is a function of M additive predictors. The central formula is given by
eta_j = sum_{k=1}^p f_{(j)k}(x_k)
where x_k is the kth explanatory variable
(almost always x_1=1 for the intercept term),
and
f_{(j)k} are smooth functions of x_k that are estimated
by smoothers. The first term in the summation is just the intercept.
Currently only one type of smoother is
implemented and this is called a vector (cubic smoothing spline)
smoother.
Here, j=1,…,M where M is finite.
If all the functions are constrained to be linear then the resulting
model is a vector generalized linear model (VGLM).
VGLMs are best fitted with vglm
.
Vector (cubic smoothing spline) smoothers are represented
by s()
(see s
).
Local regression via lo()
is not
supported. The results of vgam
will differ from the S-PLUS and R
gam
function (in the gam R package) because vgam
uses a different knot selection algorithm. In general, fewer knots
are chosen because the computation becomes expensive when the number
of additive predictors M is large.
The underlying algorithm of VGAMs is iteratively
reweighted least squares (IRLS) and modified vector backfitting
using vector splines. B-splines are used as the basis functions
for the vector (smoothing) splines.
vgam.fit
is the function that actually does the work.
The smoothing code is based on F. O'Sullivan's BART code.
A closely related methodology based on VGAMs called
constrained additive ordination (CAO)
first forms a linear combination of the explanatory variables
(called latent variables) and then fits a GAM to these.
This is implemented in the function cao
for a very
limited choice of family functions.
An object of class "vgam"
(see vgam-class
for further information).
This function can fit a wide variety of statistical models. Some of
these are harder to fit than others because of inherent numerical
difficulties associated with some of them. Successful model fitting
benefits from cumulative experience. Varying the values of arguments
in the VGAM family function itself is a good first step if
difficulties arise, especially if initial values can be inputted.
A second, more general step, is to vary the values of arguments in
vgam.control
.
A third step is to make use of arguments such as etastart
,
coefstart
and mustart
.
Some VGAM family functions end in "ff"
to avoid
interference with other functions, e.g., binomialff
,
poissonff
, gaussianff
,
gammaff
. This is because VGAM family
functions are incompatible with glm
(and also gam
in the gam library and
gam
in the mgcv library).
The smart prediction (smartpred
) library is packed with
the VGAM library.
The theory behind the scaling parameter is currently being made more rigorous, but it it should give the same value as the scale parameter for GLMs.
Thomas W. Yee
Yee, T. W. and Wild, C. J. (1996) Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481–493.
Yee, T. W. (2008)
The VGAM
Package.
R News, 8, 28–39.
Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.
vgam.control
,
vgam-class
,
vglmff-class
,
plotvgam
,
vglm
,
s
,
vsmooth.spline
,
cao
.
# Nonparametric proportional odds model pneumo = transform(pneumo, let = log(exposure.time)) vgam(cbind(normal, mild, severe) ~ s(let), cumulative(parallel = TRUE), pneumo) # Nonparametric logistic regression fit = vgam(agaaus ~ s(altitude, df = 2), binomialff, hunua) ## Not run: plot(fit, se = TRUE) pfit = predict(fit, type = "terms", raw = TRUE, se = TRUE) names(pfit) head(pfit$fitted) head(pfit$se.fit) pfit$df pfit$sigma # Fit two species simultaneously fit2 = vgam(cbind(agaaus, kniexc) ~ s(altitude, df = c(2, 3)), binomialff(mv = TRUE), hunua) coef(fit2, matrix = TRUE) # Not really interpretable ## Not run: plot(fit2, se = TRUE, overlay = TRUE, lcol = 1:2, scol = 1:2) ooo = with(hunua, order(altitude)) with(hunua, matplot(altitude[ooo], fitted(fit2)[ooo,], ylim = c(0, .8), xlab = "Altitude (m)", ylab = "Probability of presence", las = 1, main = "Two plant species' response curves", type = "l", lwd = 2)) with(hunua, rug(altitude)) ## End(Not run)