glmFit {edgeR}R Documentation

Fit negative binomial generalized linear model for each transcript

Description

Fit a negative binomial generalized linear model (GLM) for each transcript (tag) with the unadjusted counts provided, a value for the dispersion parameter and, optionally, offsets and weights for different libraries or transcripts.

Usage

## S3 method for class 'DGEList'
glmFit(y, design=NULL, dispersion=NULL, offset=NULL, weights=NULL, lib.size=NULL, start=NULL, method="auto", ...)
glmLRT(y, glmfit, coef=ncol(glmfit$design), contrast=NULL)

Arguments

y

an object that contains the raw counts for each library (the measure of expression level); alternatively, a matrix of counts, or a DGEList object with (at least) elements counts (table of unadjusted counts) and samples (data frame containing information about experimental group, library size and normalization factor for the library size)

design

numeric matrix giving the design matrix for the GLM that is to be fit. Must be of full column rank. Defaults to a single column of ones, equivalent to treating the columns as replicate libraries.

dispersion

numeric scalar or vector providing the value for the dispersion parameter that is used in fitting the GLM for each transcript. Can be a common value for all tags, or a vector of values can provide a unique dispersion value for each tag. If NULL (default) then dispersion will be detected and extracted from y, if possible, with order of precedence: tagwise dispersion, trended dispersions, common dispersion.

offset

numeric scalar, vector or matrix giving the offset that is to be included in the NB GLM for the transcripts. Only one of offset and lib.size should be supplied—if both are supplied then offset will be used and lib.size will be ignored. If a scalar, then this value will be used as an offset for all transcripts and libraries. If a vector, it should be have length equal to the number of libraries, and the same vector of offsets will be used for each transcript. If a matrix, then each library for each transcript can have a unique offset, if desired. If NULL (the default) then the log of the effective library size (library size multiplied by normalization factors) will be used as the offsets in the GLMs.

weights

optional numeric matrix giving prior weights for the observations (for each library and transcript) to be used in the GLM calculations. Not supported by methods "linesearch" or "levenberg".

lib.size

optional numeric vector providing the (effective) library size for each library (must have length equal to the number of columns, or libraries, in the matrix of counts). If NULL, then a default is used. If y is a DGEList object then the default for lib.size is the product of the library sizes and the normalization factors (in the samples slot of the object). If y is a simple matrix of counts, then the default for lib.size is the vector of column sums of y.

start

optional numeric matrix of initial estimates for the fitted coefficients

method

which fitting algorithm to use. Possible values are "auto", "linesearch", "levenberg" or "simple".

...

other arguments are passed to lower-level functions, for example to mglmLS.

glmfit

a DGEGLM object, the output from glmFit.

coef

scalar or vector indicating the column(s) of design that are to be dropped when creating the null model for the Likelihood Ratio (LR) Test. Can be numeric or character. If character, the string(s) provided to coef must match a column of the design matrix in the glmfit object passed to glmLRT. The glmLRT fits the null model and then conducts an LR test of the model fit provided in glmfit against the null model defined by the choice of coef. By default, the last column of the design matrix is dropped to form the design matrix for the null model.

contrast

contrast vector for which the test is required, of length equal to the number of columns of design. If specified, then takes precedence over coef.

Details

Given a fixed value for the dispersion parameter, a negative binomial model can be fitted to the counts for each tag/transcript in a dataset. The function glmFit calls the in-built function glm.fit to fit the NB GLM for each tag. Once we have a fit for a given design matrix, glmLRT can be run with a given coefficient or contrast specified and evidence for differential expression assessed using a likelihood ratio test. Tags can be ranked in order of evidence for differential expression, based on the p-value computed for each tag.

Value

glmFit produces an object of class DGEGLM with the following components:

coefficients

matrix of estimated coefficients from the NB model

df.residual

vector giving the residual degrees of freedom for each tag. In theory it can be different for different tags (if there are missing values), but in practice these will usually be identical for each tag.

deviance

vector giving the deviance from the NB model fit for each tag.

design

design matrix used in the NB model fit for each tag.

offset

scalar, vector or matrix giving the offset to use in the NB model for each tag.

samples

data frame providing information about the samples (libraries) in the experiment; taken from the object y.

genes

vector or data frame providing gene information for each tag; taken from the object y.

dispersion

scalar or vector giving the the value of the dispersion parameter used in each tag's NB model fit.

lib.size

vector of library sizes used in the model fit.

weights

matrix of final weights used in the NB model fits for each tag.

fitted.values

matrix of fitted values from the NB model for each tag.

abundance

vector of gene/tag abundances (expression level), on the log2 scale, computed from the mean count for each gene/tag after scaling count by normalized library size.

glmLRT produces an object of class DGELRT with the following components:

table

data frame (table) containing the abundance of each tag (log-concentration, logConc), the log-fold change of expression between conditions/contrasts being tested (logFC), the likelihood ratio statistic (LR.statistic) and the p-value from the LR test (p.value), for each tag in the dataset.

coefficients

matrix of coefficients for the full model defined by the design matrix (i.e. for the full model).

dispersion.used

scalar or vector of the dispersion value(s) used in the GLM fits and LR test.

The DGELRT object also contains all the elements of y except for the table of counts (raw data) and the table of pseudo-counts (if applicable).

Author(s)

Davis McCarthy and Gordon Smyth

See Also

estimateGLMCommonDisp, estimateGLMTrendedDisp or estimateGLMTagwiseDisp for estimating the negative binomial dispersion.

topTags for displaying results from glmLRT.

Examples

nlibs <- 3
ntags <- 100
dispersion.true <- 0.1

# Make first transcript respond to covariate x
x <- 0:2
design <- model.matrix(~x)
beta.true <- cbind(Beta1=2,Beta2=c(2,rep(0,ntags-1)))
mu.true <- 2^(beta.true %*% t(design))

# Generate count data
y <- rnbinom(ntags*nlibs,mu=mu.true,size=1/dispersion.true)
y <- matrix(y,ntags,nlibs)
colnames(y) <- c("x0","x1","x2")
rownames(y) <- paste("Gene",1:ntags,sep="")
d <- DGEList(y)

# Normalize
d <- calcNormFactors(d)

# Fit the NB GLMs
fit <- glmFit(d, design, dispersion=dispersion.true)

# Likelihood ratio tests for trend
results <- glmLRT(d, fit, coef=2)
topTags(results)

# Estimate the dispersion (may be unreliable with so few tags)
d <- estimateGLMCommonDisp(d, design)
d$common.dispersion

[Package edgeR version 2.4.3 Index]