exactTest {edgeR}R Documentation

Exact Tests for Differences between Two Groups of Negative-Binomial Counts

Description

Compute genewise exact tests for differences in the means between two groups of negative-binomially distributed counts.

Usage

exactTest(object, pair=NULL, dispersion="auto", rejection.region="doubletail", big.count=900)
exactTestDoubleTail(y1, y2, dispersion=0, big.count=900)
exactTestBySmallP(y1, y2, dispersion=0, big.count=900)
exactTestByDeviance(y1, y2, dispersion=0, big.count=900)
exactTestBetaApprox(y1, y2, dispersion=0)

Arguments

object

an object of class DGEList.

pair

vector of length two, either numeric or character, providing the pair of groups to be compared; if a character vector, then should be the names of two groups (e.g. two levels of object$samples$group); if numeric, then groups to be compared are chosen by finding the levels of object$samples$group corresponding to those numeric values and using those levels as the groups to be compared; if NULL, then first two levels of object$samples$group (a factor) are used. Note that the first group listed in the pair is the baseline for the comparison—so if the pair is c("A","B") then the comparison is B - A, so genes with positive log-fold change are up-regulated in group B compared with group A (and vice versa for genes with negative log-fold change).

dispersion

either a numeric vector of dispersions or a character string indicating that dispersions should be taken from the data object. If a numeric vector, then can be either of length one or of length equal to the number of tags. Allowable character values are "common", "tagwise" or "auto". "common" uses object$common.dispersion, "tagwise" uses object$tagwise.dispersion and "auto" uses tagwise when available and otherwise common.

rejection.region

type of rejection region for two-sided exact test. Possible values are "doubletail", "smallp" or "deviance".

big.count

count size above which asymptotic beta approximation will be used.

y1

numeric matrix of counts for the first the two experimental groups to be tested for differences. Rows correspond to genes or transcripts and columns to libraries. Libraries are assumed to be equal in size - e.g. adjusted pseudocounts from the output of equalizeLibSizes.

y2

numeric matrix of counts for the second of the two experimental groups to be tested for differences. Rows correspond to genes or transcripts and columns to libraries. Libraries are assumed to be equal in size - e.g. adjusted pseudocounts from the output of equalizeLibSizes. Must have the same number of rows as y1.

Details

The functions test for differential expression between two groups of count libraries. They implement the exact test proposed by Robinson and Smyth (2008) for a difference in mean between two groups of negative binomial random variables. The functions accept two groups of count libraries, and a test is performed for each row of data. For each row, the test is conditional on the sum of counts for that row. The test can be viewed as a generalization of the well-known exact binomial test, implemented in the function binom.test in the stats package, but generalized to overdispersed counts.

The low level functions exactTestDoubleTail, exactTestBetaApprox, exactTestBySmallP and exactTestByDeviance all assume that the libraries have been normalized to have the same size (expected column sum under the null hypothesis). The higher level function exactTest is intended to be called by users. This has a more object-orientated flavor and produces an object containing all the necessary components for downstream analysis. exactTest equalizes the library sizes using equalizeLibSizes before calling one of the low level functions.

The functions exactTestDoubleTail, exactTestBySmallP and exactTestByDeviance correspond to different ways to define the two-sided rejection region when the two groups have different numbers of samples. exactTestBySmallP implements the method of small probabilities as proposed by Robinson and Smyth (2008). This method corresponds to binom.test when the dispersion is near zero, but gives poor results when the dispersion is very large. exactTestDoubleTail computes two-sided p-values by doubling the smaller tail probability. exactTestByDeviance uses the deviance goodness of fit statistics to define the rejection region, and is therefore equivalent to a conditional likelihood ratio test. This has good statistical properties but is relatively slow to compute. For general remarks on different types of rejection regions for exact tests see Gibbons and Pratt (1975).

exactTestBetaApprox implements an asymptotical beta distribution approximation to the conditional count distribution.

Value

exactTestDoubleTail and friends produce a numeric vector of genewise p-values, one for each row of y1 and y2.

exactTest produces an object of class DGEExact containing the following components:

table

a data frame containing the elements logConc, the log-average concentration/abundance for each tag in the two groups being compared, logFC, the log-abundance ratio, i.e. fold change, for each tag in the two groups being compared, p.value, exact p-value for differential expression using the NB model

comparison

a vector giving the names of the two groups being compared

genes

a data frame containing information about each transcript; taken from object and can be NULL

Author(s)

Mark Robinson, Davis McCarthy, Gordon Smyth

References

Robinson MD and Smyth GK (2008). Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 9, 321-332.

Gibbons, JD and Pratt, JW (1975). P-values: interpretation and methodology. The American Statistician 29, 20-25.

See Also

equalizeLibSizes, binomTest

Examples

# generate raw counts from NB, create list object
y <- matrix(rnbinom(80,size=1/0.2,mu=10),nrow=20,ncol=4)
rownames(y) <- paste("Gene",1:nrow(y),sep=".")
group <- factor(c(1,1,2,2))
d <- DGEList(counts=y,group=group,lib.size=rep(1000,4))

# estimate dispersions and find differences in expression
d <- estimateCommonDisp(d)
d <- estimateTagwiseDisp(d)
de <- exactTest(d)
topTags(de)

# same example using low level exactTest function directly
p.value <- exactTestDoubleTail(y[,1:2],y[,3:4],dispersion=0.2)

[Package edgeR version 2.4.3 Index]