% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sccomp_remove_outliers.R
\name{sccomp_remove_outliers}
\alias{sccomp_remove_outliers}
\title{Identify and remove outliers from an sccomp estimate}
\usage{
sccomp_remove_outliers(
  .estimate,
  percent_false_positive = 5,
  cores = detectCores(),
  inference_method = attr(.estimate, "inference_method"),
  output_directory = "sccomp_draws_files",
  verbose = TRUE,
  mcmc_seed = sample_seed(),
  max_sampling_iterations = 20000,
  enable_loo = FALSE,
  sig_figs = 9,
  cache_stan_model = sccomp_stan_models_cache_dir,
  approximate_posterior_inference = NULL,
  variational_inference = NULL,
  ...
)
}
\arguments{
\item{.estimate}{A tibble, typically the output of \code{sccomp_estimate}, including a cell_group name column, sample name column, read counts column (optional depending on the input class), and factor columns.}

\item{percent_false_positive}{A real number between 0 and 100 (exclusive), the false positive rate, in percent, used to identify outliers.}

\item{cores}{Integer, the number of cores to be used for parallel calculations.}

\item{inference_method}{Character string specifying the inference method to use ('pathfinder', 'hmc', or 'variational').}

\item{output_directory}{A character string specifying the output directory for Stan draws.}

\item{verbose}{Logical, whether to print progress details.}

\item{mcmc_seed}{Integer, used for Markov-chain Monte Carlo reproducibility. By default, a random number is sampled from 1 to 999999.}

\item{max_sampling_iterations}{Integer, the maximum number of sampling iterations. This caps computation time when a large dataset is used.}

\item{enable_loo}{Logical, whether to enable model comparison using the R package \code{loo}. This is useful for comparing fits between models, similar to ANOVA.}

\item{sig_figs}{Number of significant figures to use for Stan model output. Default is 9.}

\item{cache_stan_model}{A character string specifying the cache directory for compiled Stan models.
The sccomp version will be automatically appended to ensure version isolation.
Default is \code{sccomp_stan_models_cache_dir} which points to \verb{~/.sccomp_models}.}

\item{approximate_posterior_inference}{DEPRECATED, use the \code{variational_inference} argument.}

\item{variational_inference}{DEPRECATED, use the \code{inference_method} argument. Logical, whether to use variational Bayes for posterior inference, which is faster. Setting this argument to \code{FALSE} runs full Bayesian (Hamiltonian Monte Carlo) inference, which is slower but is the gold standard.}

\item{...}{Additional arguments passed to the \code{sample} method of the \code{cmdstanr} model.}
}
\value{
A tibble (\code{tbl}), with the following columns:
\itemize{
\item cell_group - The cell groups being tested.
\item parameter - The parameter being estimated from the design matrix described by the input formula_composition and formula_variability.
\item factor - The covariate factor in the formula, if applicable (e.g., not present for Intercept or contrasts).
\item c_lower - Lower (2.5\%) quantile of the posterior distribution for a composition (c) parameter.
\item c_effect - Mean of the posterior distribution for a composition (c) parameter.
\item c_upper - Upper (97.5\%) quantile of the posterior distribution for a composition (c) parameter.
\item c_pH0 - Probability of the c_effect being smaller or bigger than the \code{test_composition_above_logit_fold_change} argument.
\item c_FDR - False discovery rate of the c_effect being smaller or bigger than the \code{test_composition_above_logit_fold_change} argument. False discovery rate for Bayesian models is calculated differently from frequentist models, as detailed in Mangiola et al., PNAS 2023.
\item c_n_eff - Effective sample size, the number of independent draws in the sample. The higher, the better.
\item c_R_k_hat - R-hat statistic, a measure of chain convergence. It should be within 0.05 of 1.0.
\item v_lower - Lower (2.5\%) quantile of the posterior distribution for a variability (v) parameter.
\item v_effect - Mean of the posterior distribution for a variability (v) parameter.
\item v_upper - Upper (97.5\%) quantile of the posterior distribution for a variability (v) parameter.
\item v_pH0 - Probability of the v_effect being smaller or bigger than the \code{test_composition_above_logit_fold_change} argument.
\item v_FDR - False discovery rate of the v_effect being smaller or bigger than the \code{test_composition_above_logit_fold_change} argument. False discovery rate for Bayesian models is calculated differently from frequentist models, as detailed in Mangiola et al., PNAS 2023.
\item v_n_eff - Effective sample size for a variability (v) parameter.
\item v_R_k_hat - R-hat statistic for a variability (v) parameter, a measure of chain convergence.
}

The function also attaches several attributes to the result:
\itemize{
\item count_data - The original count data used in the analysis, stored as an attribute for efficient access.
\item model_input - The model input data used for fitting.
\item formula_composition - The formula used for composition modeling.
\item formula_variability - The formula used for variability modeling.
\item fit - The Stan fit object (if pass_fit = TRUE).
}
}
\description{
The \code{sccomp_remove_outliers} function takes as input a table of cell counts with columns for cell-group identifier, sample identifier, integer count, and factors (continuous or discrete). The user can define a linear model using an input R formula, where the first factor is the factor of interest. Alternatively, \code{sccomp} accepts single-cell data containers (e.g., Seurat, SingleCellExperiment, cell metadata, or group-size) and derives the count data from cell metadata.
}
\examples{

print("cmdstanr is needed to run this example.")
# Note: Before running the example, ensure that the 'cmdstanr' package is installed:
# install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev/", getOption("repos")))

\donttest{
  if (instantiate::stan_cmdstan_exists()) {
    data("counts_obj")
    
    estimate = sccomp_estimate(
      counts_obj,
      ~ type,
      ~1,
      "sample",
      "cell_group",
      "count",
      cores = 1
    ) |>
    sccomp_remove_outliers(cores = 1)
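
    # Hedged follow-up sketch: inspect the returned tibble, assuming it carries
    # the columns and attributes listed in the Value section of this page.
    print(estimate[, c("cell_group", "parameter", "c_effect", "c_FDR")])

    # Formula stored as an attribute (attribute name taken from the Value section)
    print(attr(estimate, "formula_composition"))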
  }
}

}
\references{
S. Mangiola, A.J. Roth-Schulze, M. Trussart, E. Zozaya-Valdés, M. Ma, Z. Gao, A.F. Rubin, T.P. Speed, H. Shim, & A.T. Papenfuss, sccomp: Robust differential composition and variability analysis for single-cell data, Proc. Natl. Acad. Sci. U.S.A. 120 (33) e2203828120, https://doi.org/10.1073/pnas.2203828120 (2023).
}
