% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ancombc.R
\name{ancombc}
\alias{ancombc}
\title{Analysis of Compositions of Microbiomes with Bias Correction
(ANCOM-BC)}
\usage{
ancombc(
  data = NULL,
  taxa_are_rows = TRUE,
  assay.type = NULL,
  assay_name = "counts",
  rank = NULL,
  tax_level = NULL,
  aggregate_data = NULL,
  meta_data = NULL,
  pseudo = 1,
  formula,
  p_adj_method = "holm",
  prv_cut = 0.1,
  lib_cut = 0,
  group = NULL,
  struc_zero = FALSE,
  neg_lb = FALSE,
  tol = 1e-05,
  max_iter = 100,
  conserve = FALSE,
  alpha = 0.05,
  global = FALSE,
  n_cl = 1,
  verbose = TRUE
)
}
\arguments{
\item{data}{the input data. The \code{data} parameter should be either a
\code{matrix}, \code{data.frame}, \code{phyloseq} or a \code{TreeSummarizedExperiment}
object. Both \code{phyloseq} and \code{TreeSummarizedExperiment} objects
consist of a feature table (microbial count table), a sample metadata table,
a taxonomy table (optional), and a phylogenetic tree (optional).
If a \code{matrix} or \code{data.frame} is provided, ensure that the row
names of \code{meta_data} match the sample names (column names if
\code{taxa_are_rows} is TRUE, and row names otherwise) in \code{data}.
If a \code{phyloseq} or a \code{TreeSummarizedExperiment} is used, this
standard has already been enforced. For detailed information, refer to
\code{?phyloseq::phyloseq} or
\code{?TreeSummarizedExperiment::TreeSummarizedExperiment}.
It is recommended to use low taxonomic levels, such as OTU or species level,
as the estimation of sampling fractions requires a large number of taxa.}

\item{taxa_are_rows}{logical. Whether taxa are positioned in the rows of the
feature table. Default is TRUE.}

\item{assay.type}{alias for \code{assay_name}.}

\item{assay_name}{character. Name of the count table in the data object
(only applicable if data object is a \code{(Tree)SummarizedExperiment}).
Default is "counts".
See \code{?SummarizedExperiment::assay} for more details.}

\item{rank}{alias for \code{tax_level}.}

\item{tax_level}{character. The taxonomic level of interest. The input data
can be agglomerated at different taxonomic levels based on your research
interest. Default is NULL, i.e., do not perform agglomeration, and the
ANCOM-BC analysis will be performed at the lowest taxonomic level of the
input \code{data}.}

\item{aggregate_data}{The abundance data that has been aggregated to the desired
taxonomic level. This parameter is required only when the input data is in
\code{matrix} or \code{data.frame} format. For \code{phyloseq} or \code{TreeSummarizedExperiment}
data, aggregation is performed by specifying the \code{tax_level} parameter.}

\item{meta_data}{a \code{data.frame} containing sample metadata.
This parameter is mandatory when the input \code{data} is a generic
\code{matrix} or \code{data.frame}. Ensure that its row names match the
sample names (column names if \code{taxa_are_rows} is TRUE, and row names
otherwise) in \code{data}.}

\item{pseudo}{A small positive value (default: 1) added to all counts
before log transformation to avoid numerical issues caused by log(0).}

\item{formula}{a character string expressing how the microbial absolute
abundances for each taxon depend on the variables in the metadata. If
\code{group} is not NULL, make sure to include the \code{group} variable
in the formula.}

\item{p_adj_method}{character. method to adjust p-values. Default is "holm".
Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
"fdr", "none". See \code{?stats::p.adjust} for more details.}

\item{prv_cut}{a numerical fraction between 0 and 1. Taxa with prevalences
(the proportion of samples in which the taxon is present)
less than \code{prv_cut} will be excluded from the analysis. For example,
if there are 100 samples, and a taxon has nonzero counts in fewer than
100*prv_cut samples, it will not be considered in the analysis.
Default is 0.10.}

\item{lib_cut}{a numerical threshold for filtering samples based on library
sizes. Samples with library sizes less than \code{lib_cut} will be
excluded from the analysis. Default is 0, i.e., do not discard any samples.}

\item{group}{character. The name of the group variable in the metadata.
The \code{group} variable should be discrete, i.e., it should consist of
categorical values. Specifying \code{group} is required for detecting
structural zeros and performing global tests. If these analyses are not of
interest, or if the group variable of interest contains only two categories,
\code{group} can be left as NULL. Default is NULL.}

\item{struc_zero}{logical. whether to detect structural zeros based on
\code{group}. Default is FALSE.}

\item{neg_lb}{logical. whether to classify a taxon as a structural zero using
its asymptotic lower bound. Default is FALSE.}

\item{tol}{numeric. the iteration convergence tolerance for the E-M
algorithm. Default is 1e-05.}

\item{max_iter}{numeric. the maximum number of iterations for the E-M
algorithm. Default is 100.}

\item{conserve}{logical. whether to use a conservative variance estimator for
the test statistic. It is recommended if the sample size is small and/or
the number of differentially abundant taxa is believed to be large.
Default is FALSE.}

\item{alpha}{numeric. level of significance. Default is 0.05.}

\item{global}{logical. whether to perform the global test. Default is FALSE.}

\item{n_cl}{numeric. The number of nodes to be forked. For details, see
\code{?parallel::makeCluster}. Default is 1 (no parallel computing).}

\item{verbose}{logical. Whether to generate verbose output during the
ANCOM-BC fitting process. Default is TRUE.}
}
\value{
a \code{list} with components:
        \itemize{
        \item{ \code{feature_table}, a \code{data.frame} of pre-processed
        (based on \code{prv_cut} and \code{lib_cut}) microbial count table.}
        \item{ \code{zero_ind}, a logical \code{data.frame} with TRUE
        indicating the taxon is detected to contain structural zeros in
        some specific groups.}
        \item{ \code{samp_frac}, a numeric vector of estimated sampling
        fractions in log scale (natural log).}
        \item{ \code{delta_em}, estimated sample-specific biases
        through the E-M algorithm.}
        \item{ \code{delta_wls}, estimated sample-specific biases through
        the weighted least squares (WLS) algorithm.}
        \item{ \code{res},  a \code{list} containing ANCOM-BC primary result,
        which consists of:}
        \itemize{
        \item{ \code{lfc}, a \code{data.frame} of log fold changes
        obtained from the ANCOM-BC log-linear (natural log) model.}
        \item{ \code{se}, a \code{data.frame} of standard errors (SEs) of
        \code{lfc}.}
        \item{ \code{W}, a \code{data.frame} of test statistics.
        \code{W = lfc/se}.}
        \item{ \code{p_val}, a \code{data.frame} of p-values. P-values are
        obtained from a two-sided Z-test using the test statistic \code{W}.}
        \item{ \code{q_val}, a \code{data.frame} of adjusted p-values.
        Adjusted p-values are obtained by applying \code{p_adj_method}
        to \code{p_val}.}
        \item{ \code{diff_abn}, a logical \code{data.frame}. TRUE if the
        taxon has \code{q_val} less than \code{alpha}.}
        }
        \item{ \code{res_global},  a \code{data.frame} containing ANCOM-BC
        global test result for the variable specified in \code{group},
        each column is:}
        \itemize{
        \item{ \code{W}, test statistics.}
        \item{ \code{p_val}, p-values, which are obtained from two-sided
        Chi-square test using \code{W}.}
        \item{ \code{q_val}, adjusted p-values. Adjusted p-values are
        obtained by applying \code{p_adj_method} to \code{p_val}.}
        \item{ \code{diff_abn}, a logical vector. TRUE if the taxon has
        \code{q_val} less than \code{alpha}.}
        }
        }
}
\description{
Determine taxa whose absolute abundances per unit volume of the
ecosystem (e.g., gut) differ significantly with changes in the
covariate of interest (e.g., group). The current version of the
\code{ancombc} function implements Analysis of Compositions of Microbiomes
with Bias Correction (ANCOM-BC) for cross-sectional data while allowing
for covariate adjustment.
}
\details{
A taxon is considered to have structural zeros in some (>=1)
groups if it is completely (or nearly completely) missing in these groups.
For instance, suppose there are three groups: g1, g2, and g3.
If the counts of taxon A in g1 are 0 but nonzero in g2 and g3,
then taxon A will be considered to contain structural zeros in g1.
In this example, taxon A is declared to be differentially abundant between
g1 and g2, g1 and g3, and consequently, it is globally differentially
abundant with respect to this group variable.
Such taxa are not further analyzed using ANCOM-BC, but the results are
summarized in the overall summary. For more details about structural
zeros, see the
\href{https://doi.org/10.3389/fmicb.2017.02114}{ANCOM-II} paper.
Setting \code{neg_lb = TRUE} applies both criteria stated in section 3.2 of
\href{https://doi.org/10.3389/fmicb.2017.02114}{ANCOM-II}
to detect structural zeros; otherwise, only equation 1 in section 3.2 is
used to declare structural zeros. Generally, it is
recommended to set \code{neg_lb = TRUE} when the sample size per group is
relatively large (e.g., > 30).
}
\examples{
library(ANCOMBC)
if (requireNamespace("microbiome", quietly = TRUE)) {
    data(atlas1006, package = "microbiome")
    # subset to baseline
    pseq = phyloseq::subset_samples(atlas1006, time == 0)

    # run ancombc function
    set.seed(123)
    out = ancombc(data = pseq, tax_level = "Family",
                  formula = "age + nationality + bmi_group",
                  p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000,
                  group = "bmi_group", struc_zero = TRUE, neg_lb = FALSE,
                  tol = 1e-5, max_iter = 100, conserve = TRUE,
                  alpha = 0.05, global = TRUE, n_cl = 1, verbose = TRUE)
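
    # A minimal sketch of inspecting the returned list, assuming the call
    # above completed; component names follow the Value section.
    res = out$res

    # Primary results: log fold changes, standard errors, adjusted p-values
    head(res$lfc)
    head(res$se)
    head(res$q_val)

    # Taxa with q_val less than alpha are flagged TRUE
    head(res$diff_abn)

    # Structural zero indicators (struc_zero = TRUE) and the global test
    # for the group variable (global = TRUE)
    head(out$zero_ind)
    head(out$res_global)

    # Estimated sampling fractions on the natural log scale
    head(out$samp_frac)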
} else {
    message("The 'microbiome' package is not installed. Please install it to use this example.")
}

}
\references{
\insertRef{kaul2017analysis}{ANCOMBC}

\insertRef{lin2020analysis}{ANCOMBC}
}
\seealso{
\code{\link{ancom}} \code{\link{ancombc2}}
}
\author{
Huang Lin
}
