% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rna_quality_control.R
\name{rna_quality_control}
\alias{rna_quality_control}
\alias{computeRnaQcMetrics}
\alias{suggestRnaQcThresholds}
\alias{filterRnaQcMetrics}
\title{Quality control for RNA count data}
\usage{
computeRnaQcMetrics(x, subsets, num.threads = 1)

suggestRnaQcThresholds(metrics, block = NULL, num.mads = 3)

filterRnaQcMetrics(thresholds, metrics, block = NULL)
}
\arguments{
\item{x}{A matrix-like object where rows are genes and columns are cells.
Values are expected to be counts.}

\item{subsets}{Named list of vectors specifying gene subsets of interest, typically control-like features such as mitochondrial genes or spike-in transcripts.
Each vector may be logical (whether each row belongs to the subset), integer (row indices) or character (row names).
For character vectors, strings not present in \code{rownames(x)} are ignored.}

\item{num.threads}{Integer scalar specifying the number of threads to use.}

\item{metrics}{\link[S4Vectors]{DataFrame} of per-cell QC metrics.
This should have the same structure as the return value of \code{computeRnaQcMetrics}.}

\item{block}{Factor specifying the block of origin (e.g., batch, sample) for each cell in \code{metrics}.
Alternatively \code{NULL} if all cells are from the same block.

For \code{filterRnaQcMetrics}, a blocking factor should be provided if \code{block} was used to construct \code{thresholds}.}

\item{num.mads}{Number of median absolute deviations (MADs) from the median, used to define the threshold for outliers in each metric.}

\item{thresholds}{List with the same structure as produced by \code{suggestRnaQcThresholds}.}
}
\value{
For \code{computeRnaQcMetrics}, a \link[S4Vectors]{DataFrame} is returned with one row per cell in \code{x}.
This contains the following columns:
\itemize{
\item \code{sum}, a numeric vector containing the total RNA count for each cell.
This represents the efficiency of library preparation and sequencing.
Low totals indicate that the library was not successfully captured.
\item \code{detected}, an integer vector containing the number of detected genes per cell.
This also quantifies library preparation efficiency but with greater focus on capturing transcriptional complexity.
\item \code{subsets}, a nested DataFrame where each column corresponds to a feature subset and is a numeric vector containing the proportion of counts in that subset.
The interpretation of each proportion depends on the nature of the subset.
For example, if one subset contains all genes on the mitochondrial chromosome, higher proportions are representative of cell damage;
the assumption is that cytoplasmic transcripts leak through tears in the cell membrane while the mitochondria are still trapped inside.
The proportion of spike-in transcripts can be interpreted in a similar manner, where the loss of endogenous transcripts results in higher spike-in proportions.
}
Each vector is of length equal to the number of cells.

For \code{suggestRnaQcThresholds}, a named list is returned.
\itemize{
\item If \code{block=NULL}, the list contains:
\itemize{
\item \code{sum}, a numeric scalar containing the lower bound on the sum.
This is defined as \code{num.mads} MADs below the median of the log-transformed metrics across all cells.
\item \code{detected}, a numeric scalar containing the lower bound on the number of detected genes. 
This is defined as \code{num.mads} MADs below the median of the log-transformed metrics across all cells.
\item \code{subsets}, a numeric vector containing the upper bound on the proportion of counts in each feature subset.
This is defined as \code{num.mads} MADs above the median across all cells.
}
\item Otherwise, if \code{block} is supplied, the list contains:
\itemize{
\item \code{sum}, a numeric vector containing the lower bound on the sum for each blocking level.
Here, the threshold is computed independently for each block, using the same method as the unblocked case.
\item \code{detected}, a numeric vector containing the lower bound on the number of detected genes for each blocking level.
Here, the threshold is computed independently for each block, using the same method as the unblocked case.
\item \code{subsets}, a list of numeric vectors containing the upper bound on the proportion of counts in each feature subset for each blocking level.
Here, the threshold is computed independently for each block, using the same method as the unblocked case.
\item \code{block.ids}, a vector containing the identities of the unique blocks.
}
Each vector is of length equal to the number of levels in \code{block} and is named accordingly.
}

For \code{filterRnaQcMetrics}, a logical vector of length \code{nrow(metrics)} is returned indicating which cells are of high quality.
High-quality cells are defined as those with sums and detected genes above their respective thresholds and subset proportions below the \code{subsets} threshold.
}
\description{
Compute per-cell QC metrics from a matrix of RNA counts,
and use the metrics to suggest filter thresholds to retain high-quality cells.
}
\examples{
# Mocking up a count matrix.
library(Matrix)
x <- round(abs(rsparsematrix(1000, 100, 0.1) * 100))

# Mocking up a control set.
sub <- list(mito=rbinom(nrow(x), 1, 0.1) > 0)

qc <- computeRnaQcMetrics(x, sub)
qc

filt <- suggestRnaQcThresholds(qc)
str(filt)
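
# Rough manual check of the lower bound on 'sum', for illustration only.
# This follows the documented definition: num.mads MADs below the median
# of the log-transformed sums, converted back to the original scale.
# Assumption: stats::mad() with its default scaling constant is used here,
# which may not match the internal MAD calculation exactly.
log.sum <- log(qc$sum)
exp(median(log.sum) - 3 * mad(log.sum))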

keep <- filterRnaQcMetrics(filt, qc)
summary(keep)
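
# Sketch of the blocked workflow. The 'block' factor below is a made-up
# batch assignment for the mock cells, used purely for illustration.
block <- factor(sample(c("A", "B"), ncol(x), replace=TRUE))
filt.b <- suggestRnaQcThresholds(qc, block=block)
str(filt.b)

# The same 'block' should be supplied when applying blocked thresholds.
keep.b <- filterRnaQcMetrics(filt.b, qc, block=block)
summary(keep.b)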

}
\seealso{
The \code{compute_rna_qc_metrics}, \code{compute_rna_qc_filters} and \code{compute_rna_qc_filters_blocked} functions in \url{https://libscran.github.io/scran_qc/}.

\code{\link{quickRnaQc.se}}, to run all of the RNA-related QC functions on a \link[SummarizedExperiment]{SummarizedExperiment}.
}
\author{
Aaron Lun
}
