mglm {edgeR}R Documentation

Fit Negative Binomial Generalized Linear Model to Multiple Response Vectors

Description

Fit the same log-link negative binomial or Poisson generalized linear model (GLM) to each row of a matrix of counts.

Usage

mglmLS(y, design, dispersion=0, offset=0, start=NULL, tol=1e-5, maxit=50, trace=FALSE)
mglmOneGroup(y, dispersion=0, offset=0, maxit=50, trace=FALSE)
mglmOneWay(y, design=NULL, dispersion=0, offset=0, maxit=50, trace=FALSE)
mglmSimple(y, design, dispersion=0, offset=0, weights=NULL)
mglmLevenberg(y, design, dispersion=0, offset=0, start=NULL)
deviances.function(dispersion)
designAsFactor(design)

Arguments

y

numeric matrix containing the negative binomial counts. Rows for tags and columns for libraries.

design

numeric matrix giving the design matrix of the GLM. Assumed to be full column rank.

dispersion

numeric scalar or vector giving the dispersion parameter for each GLM. Can be a scalar giving one value for all tags, or a vector of length equal to the number of tags giving tag-wise dispersions.

offset

numeric vector or matrix giving the offset that is to be included in the log-linear model predictor. Can be a scalar, a vector of length equal to the number of libraries, or a matrix of the same size as y.

weights

numeric vector or matrix of non-negative quantitative weights. Can be a vector of length equal to the number of libraries, or a matrix of the same size as y.

start

numeric matrix of starting values for the GLM coefficients. Number of rows should agree with y and number of columns should agree with design.

tol

numeric scalar giving the convergence tolerance.

maxit

scalar giving the maximum number of iterations for the Fisher scoring algorithm.

trace

logical, whether or not to information should be output at each iteration.

Details

The functions mglmLS, mglmOneGroup and mglmSimple all fit negative binomial generalized linear models, with the same design matrix but possibly different dispersions, offsets and weights, to a series of response vectors. mglmLS and mglmOneGroup are vectorized in R for fast execution, while mglmSimple simply makes tagwise calls to glm.fit in the stats package. The functions are all low-level functions in that they operate on atomic objects such as matrices. They are used as work-horses by higher-level functions in the edgeR package.

mglmOneGroup fits the null model, with intercept term only, to each response vector. In other words, it treats the libraries as belonging to one group. It implements Fisher scoring with a score-statistic stopping criterion for each tag. Excellent starting values are available for the null model, so this function seldom has any problems with convergence. It is used by other edgeR functions to compute the overall abundance for each tag.

mglmLS fits an arbitrary log-linear model to each response vector. It implements a vectorized approximate scoring algorithm with a likelihood derivative stopping criterion for each tag. A simple line search strategy is used to ensure that the residual deviance is reduced at each iteration. This function is the work-horse of other edgeR functions such as glmFit and glmLRT.

mglmSimple is not vectorized, and simply makes tag-wise calls to glm.fit. This has the advantage that it accesses all the usual information generated by glm.fit. Unfortunately, glm.fit does not always converge, and the tag-wise fitting is relatively slow. mglmLevenberg is similar to mglmSimple, but makes tagwise calls to glmnb.fit in the statmod package instead of glm.fit. glmnb.fit implements a Levenberg-Marquardt modification of the scoring algorithm to prevent divergence.

All these functions treat the dispersion parameter of the negative binomial distribution as a known input.

deviances.function simply chooses the appropriate deviance function to use given a scalar or vector of dispersion parameters. If the dispersion values are zero, then the Poisson deviance function is returned; if the dispersion values are positive, then the negative binomial deviance function is returned.

Value

mglmOneGroup produces a vector of length equal to the number of tags/genes (number of rows of y) providing the single coefficent from the GLM fit for each tag/gene. This can be interpreted as a measure of the 'average expression' level of the tag/gene.

mglmLS produces a list with the following components:

coefficients

matrix of estimated coefficients for the linear models

fitted.values

matrix of fitted values

fail

vector of indices of tags that fail the line search, in that the maximum number of step-halvings in exceeded

not.converged

vector of indices of tags that exceed the iteration limit before satisying the convergence criterion

mglmSimple produces a list with the following components:

coefficients

matrix of estimated coefficients for the linear models

df.residual

vector of residual degrees of freedom for the linear models

deviance

vector of deviances for the linear models

design

matrix giving the experimental design that was used for each of the linear models

offset

scalar, vector or matrix of offset values used for the linear models

dispersion

scalar or vector of the dispersion values used for the linear model fits

weights

matrix of final weights for the observations from the linear model fits

fitted.values

matrix of fitted values

deviances.function returns a function to calculate the deviance as appropriate for the given values of the dispersion.

designAsFactor returns a factor of length equal to nrow(design).

Author(s)

Davis McCarthy, Yunshun Chen, Gordon Smyth

See Also

glmFit, for more object-orientated GLM modelling for DGE data.

Examples

y<-matrix(rnbinom(1000,mu=10,size=2),ncol=4)
dispersion <- 0.1
## Fit the NB GLM to the counts
ave.expression <- mglmOneGroup(y, dispersion=dispersion)
head(ave.expression)
## Fit the NB GLM to the counts with a given design matrix
f1<-factor(c(1,1,2,2))
f2<-factor(c(1,2,1,2))
x<-model.matrix(~f1+f2)
ave.expression <- mglmLS(y, x, dispersion=dispersion)
head(ave.expression$coef)

[Package edgeR version 2.4.3 Index]