% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/erssa.R
\name{erssa}
\alias{erssa}
\title{Empirical RNA-seq Sample Size Analysis}
\usage{
erssa(
  count_table = NULL,
  condition_table = NULL,
  DE_ctrl_cond = NULL,
  filter_cutoff = 1,
  counts_filtered = FALSE,
  comb_gen_repeat = 30,
  DE_software = "edgeR",
  DE_cutoff_stat = 0.05,
  DE_cutoff_Abs_logFC = 1,
  DE_save_table = FALSE,
  marginalPlot_stat = "median",
  TPR_FPR_stat = "mean",
  path = ".",
  num_workers = 1,
  save_log = FALSE,
  save_plot = TRUE
)
}
\arguments{
\item{count_table}{An RNA-seq count matrix with genes in rows and samples
in columns. If count_table has already been filtered to remove non- or
low-expressing genes, set the counts_filtered argument to TRUE.}

\item{condition_table}{A condition table with two columns and each sample as
a row. Column 1 contains sample names and Column 2 contains sample
conditions (e.g. Control, Treatment).}

\item{DE_ctrl_cond}{The name of the control condition in the comparison.
Must be one of the two conditions in the condition table.}

\item{filter_cutoff}{The average CPM threshold used for filtering genes.
Default = 1.}

\item{counts_filtered}{Boolean. Whether the count table has already been
filtered. Default = FALSE, in which case the function filters genes by
average CPM at the cutoff specified by filter_cutoff.}

\item{comb_gen_repeat}{The maximum number of unique combinations to generate
at each replicate level. A larger value means more tests are performed, but
run time also increases linearly. Default = 30.}

\item{DE_software}{The name of the DE analysis software to use. Current
options are "edgeR" and "DESeq2". Default = "edgeR".}

\item{DE_cutoff_stat}{The cutoff in FDR or adjusted p-value used to
determine whether a gene is differentially expressed. Genes with lower
FDR or adjusted p-value pass the cutoff. Default = 0.05.}

\item{DE_cutoff_Abs_logFC}{The cutoff in abs(log2FoldChange) for differential
expression consideration. Genes with higher abs(log2FoldChange) pass the
cutoff. Default = 1.}

\item{DE_save_table}{Boolean. Whether to save the results of the
differential expression tests to the drive for further analysis.
Default = FALSE, which saves drive space.}

\item{marginalPlot_stat}{The statistic used to plot values in the marginal
plot function. Options are 'mean' and 'median'. Default = 'median'.}

\item{TPR_FPR_stat}{The statistic used to summarize TPR and FPR at
each replicate level in the ggplot2_TPR_FPRPlot function. Options are
'mean' and 'median'. Default = 'mean'.}

\item{path}{The path to which the plots and results will be saved. Defaults
to the current working directory.}

\item{num_workers}{Number of nodes to use for running the DE tests in
parallel. Default = 1.}

\item{save_log}{Boolean. Whether to save runtime parameters in a log file.
Default = FALSE.}

\item{save_plot}{Boolean. Whether to save ggplot2 plots to the drive.
Default = TRUE.}
}
\value{
A list of objects generated during the analysis is returned:
\describe{
 \item{count_table.filtered}{filtered count table}
 \item{samp.name.comb}{the samples involved in each statistical test}
 \item{list.of.DE.genes}{list of DE genes in each statistical test}
 \item{gg.dotplot.obj}{a list of objects that can be used to recreate the
 dot plot. See the ggplot2_dotplot manual for more detail.}
 \item{gg.marinPlot.obj}{a list of objects that can be used to recreate the
 plot of the marginal number of DE genes. See the ggplot2_marginPlot manual
 for more detail.}
 \item{gg.intersectPlot.obj}{a list of objects that can be used to
 recreate the plot of the number of intersecting genes. See the
 ggplot2_intersectPlot manual for more detail.}
 \item{gg.TPR_FPRPlot.obj}{a list of objects that can be used to
 recreate the TPR vs. FPR plot. See the
 ggplot2_TPR_FPRPlot manual for more detail.}
}
}
\description{
ERSSA is a package designed to test whether a currently available RNA-seq
dataset has sufficient biological replicates to detect a majority of
differentially expressed (DE) genes between two conditions. Based on the
number of biological replicates available, the algorithm subsamples at
step-wise replicate levels and uses existing differential expression
analysis software (e.g. edgeR and DESeq2) to identify the number of DE
genes. This process is repeated a given number of times with unique
combinations of samples to generate a distribution of DE genes at each
replicate level. Compared to existing RNA-seq sample size analysis
algorithms, ERSSA does not rely on any a priori assumptions about the
dataset, but rather uses a user-supplied pilot RNA-seq dataset to
determine whether the current replicate level is sufficient to detect a
majority of DE genes.
}
\details{
The \code{erssa} function is a wrapper that calls several ERSSA functions in
sequence. For additional description of the functions called, please see
their respective manual pages.

In most current RNA-seq analyses, samples are aligned to the reference,
followed by running quantification packages to generate count tables.
ERSSA can then take the unfiltered count table and remove non- to
low-expressing genes with the count_filter function. Alternatively, a
filtered count table can be supplied and filtering will be skipped.
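
As a rough illustration of this filtering step, the following base R sketch
keeps genes whose average CPM exceeds the cutoff. It is not the count_filter
implementation: the toy counts, the plain library-size CPM formula and the
strict "greater than" rule are assumptions made for this example.

\preformatted{# Toy count matrix (made-up values): 4 genes x 3 samples
counts <- matrix(c(     0,      0,      0,
                        1,      0,      0,
                       50,     60,     40,
                   999949, 999940, 999960),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", LETTERS[1:4]),
                                 paste0("s", 1:3)))

# Counts per million: divide each column by its library size
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Keep genes whose average CPM is above the cutoff (assumed rule)
filter_cutoff <- 1
keep <- rowMeans(cpm) > filter_cutoff
filtered <- counts[keep, , drop = FALSE]
rownames(filtered)
}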

Next, unique combinations of samples at various biological replicate levels
are generated by the comb_gen function and passed to a differential
expression analysis package to test for DE genes. The pipeline currently
supports edgeR and DESeq2, but support for additional software can be
easily added.
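
The idea of drawing a capped number of unique sample combinations at each
replicate level can be sketched in base R as follows. This is only an
illustration and may differ from how comb_gen works internally: the sample
names, the replicate levels 2 and 3, and the use of combn with random
sampling are all assumptions.

\preformatted{set.seed(1)                      # make the random draw reproducible
ctrl <- paste0("heart_", 1:4)    # hypothetical control sample names
trt  <- paste0("muscle_", 1:4)   # hypothetical treatment sample names
max_comb <- 5                    # plays the role of comb_gen_repeat
rep_levels <- 2:3                # replicate levels chosen for illustration

combos <- lapply(rep_levels, function(n) {
  # Every way to choose n control and n treatment samples
  ctrl_sets <- combn(ctrl, n, simplify = FALSE)
  trt_sets  <- combn(trt, n, simplify = FALSE)
  pairs <- expand.grid(i = seq_along(ctrl_sets), j = seq_along(trt_sets))
  # Draw at most max_comb distinct pairings without replacement
  pick <- pairs[sample(nrow(pairs), min(max_comb, nrow(pairs))), ]
  Map(function(i, j) c(ctrl_sets[[i]], trt_sets[[j]]), pick$i, pick$j)
})
names(combos) <- paste0("rep_", rep_levels)
lengths(combos)
}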

The generated differential expression results are then analyzed by several
plotting functions, briefly described here. The ggplot2_dotplot function
plots the trend in DE gene identification. The ggplot2_marginPlot function
plots the marginal change in the number of DE genes as the replicate level
increases. The ggplot2_intersectPlot function plots the number of DE genes
that are common across combinations. The ggplot2_TPR_FPRPlot function plots
the TPR and FPR of DE detection using the full dataset's list of DE genes as
the ground truth. Based on insights from these plots, the user can determine
whether a desirable level of DE gene discovery has been reached.
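
To make the TPR and FPR calculation concrete, the sketch below scores one
subsample's DE gene list against a ground-truth list. The gene names are
made up, and the standard definitions TPR = TP / (TP + FN) and
FPR = FP / (FP + TN) are assumed here. The exact bookkeeping in
ggplot2_TPR_FPRPlot may differ.

\preformatted{all_genes <- paste0("g", 1:10)           # every gene tested (toy data)
truth  <- c("g1", "g2", "g3", "g4")      # DE genes called with the full dataset
called <- c("g1", "g2", "g5")            # DE genes called in one subsample

tp <- length(intersect(called, truth))   # true positives
fp <- length(setdiff(called, truth))     # false positives

tpr <- tp / length(truth)                        # TP / (TP + FN)
fpr <- fp / (length(all_genes) - length(truth))  # FP / (FP + TN)
c(TPR = tpr, FPR = fpr)
}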

At the default setting, the full results of the statistical tests are not
saved; only the list of DE genes is kept. However, all of the test results
can optionally be saved for further analysis by setting DE_save_table to
TRUE.

At the default setting, only one CPU node is employed and, depending on the
number of tests that need to be done, the calculations can take some time to
complete. If additional nodes are available, they can be employed by
specifying the num_workers argument. Parallel computing requires the
BiocParallel package.

The results, including the list of DE genes and the ggplot2 objects, are
returned. All runtime parameters can optionally be saved in a log file named
"erssa.log".
}
\examples{
# load example dataset containing 1000 genes, 4 replicates and 5 comb. per
# rep. level
data(condition_table.partial, package = "ERSSA")
data(count_table.partial, package = "ERSSA")

# run erssa with the "partial" dataset, use default edgeR for DE
ssa = erssa(count_table.partial, condition_table.partial,
            DE_ctrl_cond='heart')
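# a quick look at the returned list; the element names used below are taken
# from the Value section of this manual and are assumed to match the object
names(ssa)
head(ssa$count_table.filtered)
length(ssa$list.of.DE.genes)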

# run erssa with the "full" dataset containing 10 replicates per heart and
# muscle, all genes included.
# Remove comments to run
#   set.seed(1)
#   data(condition_table.full, package="ERSSA")
#   data(count_table.full, package="ERSSA")
#   ssa = erssa(count_table.full, condition_table.full, DE_ctrl_cond='heart')
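#
# a sketch of a run with non-default settings on the "partial" dataset:
# DESeq2 for DE testing, a stricter adjusted p-value cutoff, and no plots
# written to the drive. Assumes the DESeq2 package is installed.
data(condition_table.partial, package = "ERSSA")
data(count_table.partial, package = "ERSSA")
ssa_deseq2 = erssa(count_table.partial, condition_table.partial,
                   DE_ctrl_cond='heart', DE_software='DESeq2',
                   DE_cutoff_stat=0.01, save_plot=FALSE)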

}
\references{
Ching, Travers, Sijia Huang, and Lana X. Garmire. “Power Analysis and Sample
Size Estimation for RNA-Seq Differential Expression.” RNA, September 22,
2014. https://doi.org/10.1261/rna.046011.114.

Hoskins, Stephanie Page, Derek Shyr, and Yu Shyr. “Sample Size Calculation
for Differential Expression Analysis of RNA-Seq Data.” In Frontiers of
Biostatistical Methods and Applications in Clinical Oncology, 359–79.
Springer, Singapore, 2017. https://doi.org/10.1007/978-981-10-0126-0_22.
}
\author{
Zixuan Shao, \email{Zixuanshao.zach@gmail.com}
}
