bin.dispersion {edgeR}  R Documentation

Estimate Common Dispersion for Negative Binomial GLMs in Bins of Genes Sorted by Overall Abundance

Description

Estimates the common dispersion parameter for each of a number of bins of data for a DGE dataset. Genes are sorted into bins based on overall expression level. For multiple-group (one-way layout) experimental designs, conditional maximum likelihood (CML) methods can be used. For general experimental designs, the binned common dispersions can be estimated using Cox-Reid approximate conditional inference, or the Pearson or deviance estimators, for a negative binomial generalized linear model.

Usage

binCMLDispersion(y, nbins=50)
binGLMDispersion(y, design, min.n=500, offset=NULL, method="CoxReid", ...)

Arguments

y

an object that contains the raw counts for each library (the measure of expression level); it can either be a matrix of counts, or a DGEList object with (at least) elements counts (table of unadjusted counts) and samples (data frame containing information about experimental group, library size and normalization factor for the library size)

nbins

scalar, the number of bins for which to compute common dispersions. Default is 50 bins.

design

numeric matrix giving the design matrix for the GLM that is to be fit.

min.n

scalar, the minimum number of genes to be included in each bin.

offset

(optional) numeric scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the transcripts. If a scalar, then this value will be used as an offset for all transcripts and libraries. If a vector, it should have length equal to the number of libraries, and the same vector of offsets will be used for each transcript. If a matrix, then each library for each transcript can have a unique offset, if desired. Default is NULL. If NULL, then offset is log(lib.size) if y is a matrix or log(y$samples$lib.size * y$samples$norm.factors) if y is a DGEList object.

method

method used to estimate the dispersion. Argument passed to estimateGLMCommonDisp, which calls the functions to do the computations. Possible values are "CoxReid", "Pearson" or "deviance".

...

other arguments are passed to lower-level functions.
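The default offset described under the offset argument can be illustrated with a few lines of base R. This is a minimal sketch only: the library sizes and normalization factors are made-up values, and the variable names (offset.matrix, offset.dgelist, offset.full) are hypothetical and not part of edgeR.

```r
# Made-up library sizes and normalization factors for four libraries
lib.size <- c(1000, 1001, 1002, 1003)
norm.factors <- c(1.0, 0.9, 1.1, 1.0)

# Default offset when y is a plain count matrix: log library size
offset.matrix <- log(lib.size)

# Default offset when y is a DGEList: log effective library size
offset.dgelist <- log(lib.size * norm.factors)

# A vector offset is shared by every transcript. Expanding it to a
# genes-by-libraries matrix makes that sharing explicit.
ngenes <- 5
offset.full <- matrix(offset.dgelist, nrow = ngenes,
                      ncol = length(lib.size), byrow = TRUE)
```

Every row of offset.full is identical here; a matrix offset is only needed when individual transcripts require their own values.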

Details

To obtain estimates of the common dispersion parameters, binCMLDispersion uses conditional maximum likelihood (estimateCommonDisp), while binGLMDispersion uses one of Cox-Reid approximate conditional inference (dispCoxReid), the deviance estimator (dispDeviance) or the Pearson estimator (dispPearson). Genes are assigned to bins using the cutWithMinN function, which spreads the bins over the abundance range of the genes while ensuring that each bin contains a minimum number of genes, thus permitting reliable estimation of the common dispersion for each bin.
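The general idea of binning genes by abundance and estimating one dispersion per bin can be sketched in base R. This is a simplified illustration under stated assumptions, not the edgeR implementation: equal-count quantile bins stand in for cutWithMinN, the log of the mean count stands in for abundance, the design is a single group with equal library sizes, and pearson.dispersion is a hypothetical helper that solves for the dispersion at which the Pearson statistic equals its residual degrees of freedom.

```r
set.seed(1)

# Toy data: 600 genes, 6 libraries, one group, true dispersion 0.2
ngenes <- 600
nlibs <- 6
gene.mean <- exp(runif(ngenes, log(20), log(500)))
y <- matrix(rnbinom(ngenes * nlibs, mu = rep(gene.mean, nlibs), size = 1 / 0.2),
            nrow = ngenes, ncol = nlibs)
y <- y[rowSums(y) > 0, , drop = FALSE]   # drop all-zero genes, if any

# Stand-in for abundance: log of the mean count per gene
abundance <- log(rowMeans(y))

# Stand-in for cutWithMinN: equal-count bins from abundance quantiles
nbins <- 6
breaks <- quantile(abundance, probs = seq(0, 1, length.out = nbins + 1))
bin <- cut(abundance, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Hypothetical helper: Pearson-type common dispersion for one bin.
# Assumes overdispersed counts, so the root lies inside the interval.
pearson.dispersion <- function(counts) {
  mu <- matrix(rowMeans(counts), nrow(counts), ncol(counts))  # fitted means
  df <- nrow(counts) * (ncol(counts) - 1)                     # residual df
  gap <- function(phi) sum((counts - mu)^2 / (mu + phi * mu^2)) - df
  uniroot(gap, interval = c(1e-8, 100))$root
}

# Same shape as the documented return value: one entry per bin
result <- list(
  dispersion = vapply(split(seq_len(nrow(y)), bin),
                      function(i) pearson.dispersion(y[i, , drop = FALSE]),
                      numeric(1)),
  abundance = vapply(split(abundance, bin), mean, numeric(1))
)
```

Because each bin pools many genes, the per-bin estimate is far more stable than a gene-wise estimate would be, which is the motivation for requiring a minimum number of genes per bin.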

Value

Returns a list with two components:

dispersion

numeric vector providing the common dispersion for each bin

abundance

numeric vector providing the average abundance (expression level) of genes in each bin

Author(s)

Gordon Smyth, Davis McCarthy

References

Cox, DR, and Reid, N (1987). Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society Series B 49, 1-39.

See Also

estimateGLMCommonDisp, dispCoxReid, dispPearson, dispDeviance

Examples

library(edgeR)
# Simulate a 250-gene by 4-library matrix of negative binomial counts
y <- matrix(rnbinom(1000, mu=10, size=10), ncol=4)
d <- DGEList(counts=y, group=c(1,1,2,2), lib.size=c(1000:1003))
design <- model.matrix(~group, data=d$samples) # Define the design matrix for the full model
bindisp.CML <- binCMLDispersion(d, nbins=50)
bindisp.GLM <- binGLMDispersion(d, design, min.n=10)

[Package edgeR version 2.4.3 Index]