% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/testDA_GLMM.R
\name{testDA_GLMM}
\alias{testDA_GLMM}
\title{Test for differential abundance: method 'diffcyt-DA-GLMM'}
\usage{
testDA_GLMM(
  d_counts,
  formula,
  contrast,
  min_cells = 3,
  min_samples = NULL,
  normalize = FALSE,
  norm_factors = "TMM"
)
}
\arguments{
\item{d_counts}{\code{\link{SummarizedExperiment}} object containing cluster cell
counts, from \code{\link{calcCounts}}.}

\item{formula}{Model formula object, created with \code{\link{createFormula}}. This
should be a list containing three elements: \code{formula} (the model formula),
\code{data} (a data frame of the corresponding variables), and \code{random_terms} (a
variable indicating whether the model formula contains any random effect terms). See
\code{\link{createFormula}} for details.}

\item{contrast}{Contrast matrix, created with \code{\link{createContrast}}. See
\code{\link{createContrast}} for details.}

\item{min_cells}{Filtering parameter. Default = 3. Clusters are kept for differential
testing if they have at least \code{min_cells} cells in at least \code{min_samples}
samples.}

\item{min_samples}{Filtering parameter. Default = \code{number of samples / 2} (used
when \code{min_samples = NULL}), which is appropriate for two-group comparisons (of
equal size). Clusters are kept for differential testing if they have at least
\code{min_cells} cells in at least \code{min_samples} samples.}

\item{normalize}{Whether to include optional normalization factors to adjust for
composition effects (see details). Default = FALSE.}

\item{norm_factors}{Normalization factors to use, if \code{normalize = TRUE}. Default =
\code{"TMM"}, in which case normalization factors are calculated automatically using
the 'trimmed mean of M-values' (TMM) method from the \code{edgeR} package.
Alternatively, a vector of values can be provided (the values should multiply to 1).}
}
\value{
Returns a new \code{\link{SummarizedExperiment}} object, with differential test
  results stored in the \code{rowData} slot. Results include raw p-values
  (\code{p_val}) and adjusted p-values (\code{p_adj}), which can be used to rank
  clusters by evidence for differential abundance. The results can be accessed with the
  \code{\link{rowData}} accessor function.
}
\description{
Calculate tests for differential abundance of cell populations using method
'diffcyt-DA-GLMM'.
}
\details{
Calculates tests for differential abundance of clusters, using generalized linear mixed
models (GLMMs).

This methodology was originally developed and described by Nowicka et al. (2017),
\emph{F1000Research}, and has been modified here to make use of high-resolution
clustering to enable investigation of rare cell populations. Note that unlike the
original method by Nowicka et al., we do not attempt to manually merge clusters into
canonical cell populations. Instead, results are reported at the high-resolution
cluster level, and the interpretation of significant differential clusters is left to
the user via visualizations such as heatmaps (see the package vignette for an example).

This method fits one generalized linear mixed model (GLMM) per cluster, and calculates
differential tests separately for each cluster. The response variables in the models
are the cluster cell counts, which are assumed to follow a binomial distribution. We
also include a filtering step to remove clusters with very small numbers of cells, to
improve statistical power.

For more details on the statistical methodology, see Nowicka et al. (2017),
\emph{F1000Research} (section 'Differential cell population abundance').

The experimental design must be specified using a model formula, which can be created
with \code{\link{createFormula}}. Flexible experimental designs are possible, including
blocking (e.g. paired designs), batch effects, and continuous covariates. Blocking
variables can be included as either random intercept terms or fixed effect terms (see
\code{\link{createFormula}}). For paired designs, we recommend using random intercept
terms to improve statistical power; see Nowicka et al. (2017), \emph{F1000Research} for
details. Batch effects and continuous covariates should be included as fixed effects.
In addition, we include random intercept terms for each sample to account for
overdispersion typically seen in high-dimensional cytometry count data. The
sample-level random intercept terms are known as 'observation-level random effects'
(OLREs); see Nowicka et al. (2017), \emph{F1000Research} for more details.

The contrast matrix specifying the contrast of interest can be created with
\code{\link{createContrast}}. See \code{\link{createContrast}} for more details.

Filtering: Clusters are kept for differential testing if they have at least
\code{min_cells} cells in at least \code{min_samples} samples. This removes clusters
with very low cell counts across conditions, to improve power.
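
As a minimal sketch of this filtering rule (an illustration only, not necessarily the
package's internal code), the following applies it to a small hypothetical counts
matrix with clusters in rows and samples in columns. The object names \code{counts}
and \code{keep} are assumptions made for this example.

\preformatted{
# Hypothetical counts: 3 clusters (rows) x 4 samples (columns)
counts <- matrix(c(10, 12,  9, 11,
                    0,  1,  5,  2,
                    4,  0,  3,  0),
                 nrow = 3, byrow = TRUE)

min_cells <- 3
min_samples <- ncol(counts) / 2  # default: number of samples / 2

# Keep clusters with at least 'min_cells' cells in at least 'min_samples' samples
keep <- rowSums(counts >= min_cells) >= min_samples
keep
# TRUE FALSE TRUE
}

Here the second cluster is removed because only one sample reaches \code{min_cells}.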

Normalization: Optional normalization factors can be included to adjust for composition
effects in the cluster cell counts per sample. For example, in an extreme case, if
several additional clusters are present in only one condition, while all other clusters
are approximately equally abundant between conditions, then simply normalizing by the
total number of cells per sample will create a false positive differential abundance
signal for the non-differential clusters. (For a detailed explanation in the context of
RNA sequencing gene expression, see Robinson and Oshlack, 2010.) Normalization factors
can be calculated automatically using the 'trimmed mean of M-values' (TMM) method
(Robinson and Oshlack, 2010), implemented in the \code{edgeR} package (see also the
\code{edgeR} User's Guide for details). Alternatively, a vector of values can be
provided (the values should multiply to 1).
}
\examples{
# For a complete workflow example demonstrating each step in the 'diffcyt' pipeline, 
# see the package vignette.

# Function to create random data (one sample)
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
  colnames(d) <- paste0("marker", sprintf("\%02d", 1:ncol))
  d
}

# Create random data (without differential signal)
set.seed(123)
d_input <- list(
  sample1 = d_random(), 
  sample2 = d_random(), 
  sample3 = d_random(), 
  sample4 = d_random()
)

# Add differential abundance (DA) signal
ix_DA <- 801:900
ix_cols_type <- 1:10
d_input[[3]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)
d_input[[4]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)

experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)), 
  group_id = factor(c("group1", "group1", "group2", "group2")), 
  stringsAsFactors = FALSE
)

marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("\%03d", 1:20)), 
  marker_name = paste0("marker", sprintf("\%02d", 1:20)), 
  marker_class = factor(c(rep("type", 10), rep("state", 10)), 
                        levels = c("type", "state", "none")), 
  stringsAsFactors = FALSE
)

# Prepare data
d_se <- prepareData(d_input, experiment_info, marker_info)

# Transform data
d_se <- transformData(d_se)

# Generate clusters
d_se <- generateClusters(d_se)

# Calculate counts
d_counts <- calcCounts(d_se)

# Create model formula
formula <- createFormula(experiment_info, cols_fixed = "group_id", cols_random = "sample_id")

# Create contrast matrix
contrast <- createContrast(c(0, 1))

# Test for differential abundance (DA) of clusters
res_DA <- testDA_GLMM(d_counts, formula, contrast)
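
# Access the results (a sketch: assumes the 'SummarizedExperiment' package is
# installed; 'p_val' and 'p_adj' are the columns described in the 'Value' section)
res <- SummarizedExperiment::rowData(res_DA)

# Show the top-ranked clusters, ordered by adjusted p-value
head(res[order(res$p_adj), ])

# Optionally, repeat the test with TMM normalization factors (see details);
# 'res_DA_norm' is a name chosen for this illustration
res_DA_norm <- testDA_GLMM(
  d_counts, formula, contrast, 
  normalize = TRUE, norm_factors = "TMM"
)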

}
