% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Dif_expression_Analysis.R
\name{DTEG.analysis}
\alias{DTEG.analysis}
\title{Run differential TE analysis}
\usage{
DTEG.analysis(
  df.rfp,
  df.rna,
  output.dir = QCfolder(df.rfp),
  target.contrast = design[1],
  design = ORFik::design(df.rfp),
  p.value = 0.05,
  RFP_counts = countTable(df.rfp, "cds", type = "summarized"),
  RNA_counts = countTable(df.rna, "mrna", type = "summarized"),
  batch.effect = FALSE,
  pairs = combn.pairs(unlist(df.rfp[, design])),
  plot.title = "",
  plot.ext = ".pdf",
  width = 6,
  height = 6,
  dot.size = 0.4,
  relative.name = paste0("DTEG_plot", plot.ext),
  complex.categories = FALSE,
  plot_to_console = TRUE,
  fitType = c("parametric", "local", "mean", "glmGamPoi"),
  lfcShrinkType = "normal"
)
}
\arguments{
\item{df.rfp}{an \code{\link{experiment}}, usually of Ribo-seq (or 80S from TCP-seq).
This is the numerator of the experiment, usually having the primary role.}

\item{df.rna}{an \code{\link{experiment}}, usually of RNA-seq.
This is the denominator of the experiment, usually having a normalizing function.}

\item{output.dir}{character, default \code{QCfolder(df.rfp)}.
Directory to save plots in. The file name is given by \code{relative.name}.
If NULL, plots are not saved.}

\item{target.contrast}{a character vector, default \code{design[1]}.
The column in the ORFik experiment that represents the comparison contrasts.
By default: the first design factor of the full experimental design.
This is the factor you will do the comparison on. DESeq will normalize
the counts based on the full design, but the log fold change values will
be based on this contrast only. It is usually the 'condition' column.}

\item{design}{a character vector, default \code{design(df.rfp)}.
The full experiment design, i.e. the factors that have more than 1 level.
Example: if the stage column is HEK293 for all samples, it can not be a design factor.
The condition column has 2 possible values, WT and mutant, so it is
a factor of the experiment. The replicate column is not part of the design;
it is inserted later by setting \code{batch.effect = TRUE}.
The library type 'libtype' column can also not be part of the initial design;
it is always added inside the function, after the initial setup.}

\item{p.value}{a numeric in the interval (0,1), default 0.05. Defines the adjusted
p-value used as significance threshold for the result groups. For example,
with p.value = 0.05 the significant subset of the Translation (only RFP) group is:
TE$padj < 0.05 & Ribo$padj < 0.05 & RNA$padj > 0.05.}

\item{RFP_counts}{a \code{\link{SummarizedExperiment}}, default:
\code{countTable(df.rfp, "cds", type = "summarized")},
unshifted libraries, all transcript CDSs.
If you have pshifted reads and countTables, do:
\code{countTable(df.rfp, "cds", type = "summarized", count.folder = "pshifted")}
Assign a subset if you don't want to analyze all genes.
It is recommended not to subset, so that DESeq2 can use all genes when
estimating size factors and dispersions.}

\item{RNA_counts}{a \code{\link{SummarizedExperiment}}, default:
\code{countTable(df.rna, "mrna", type = "summarized")}, all transcripts.
Assign a subset if you don't want to analyze all genes.
It is recommended not to subset, so that DESeq2 can use all genes when
estimating size factors and dispersions.}

\item{batch.effect}{logical, default FALSE. If TRUE, makes the replicate column of the experiment
part of the design.\cr
If you believe you might have batch effects, set it to TRUE.
Batch effect usually means that you have a strong variance between
biological replicates. Check out \code{\link{pcaExperiment}} and see if replicates
cluster together more than the design factor, to verify if you need to set it to TRUE.}

\item{pairs}{list of character pairs, the experiment contrasts. Default:
\code{combn.pairs(unlist(df.rfp[, design]))}}

\item{plot.title}{character, title for plots, usually the name of the experiment.}

\item{plot.ext}{character, default: ".pdf". Alternatives: ".png" or ".jpg".
Multiple values allowed, if so will save file in each format specified.}

\item{width}{numeric, default 6 (in inches)}

\item{height}{numeric, default 6 (in inches)}

\item{dot.size}{numeric, default 0.4, size of point dots in plot.}

\item{relative.name}{character, default: \code{paste0("DTEG_plot", plot.ext)}.
Relative name of the file to be saved in the folder specified in output.dir.
The file extension decides the format, so change \code{plot.ext} to get png or jpg instead of pdf.
Multiple values allowed, if so will save file in each format specified.}

\item{complex.categories}{logical, default FALSE. Separate into more groups,
see the Details section.}

\item{plot_to_console}{logical, default TRUE. Plot to console before returning,
set to FALSE to save some run time.}

\item{fitType}{either "parametric", "local", "mean", or "glmGamPoi"
for the type of fitting of dispersions to the mean intensity.
See \code{\link{estimateDispersions}} for description.}

\item{lfcShrinkType}{character or NULL, default "normal". The type of
shrinkage to apply to the results, which mainly affects the low count gene subset.
This avoids the problem of extreme fold changes
when counts are low. See \link[DESeq2]{lfcShrink}.
A note for the DTEG.analysis function: the interaction term (TE)
is not shrunk, as it is not a count but a ratio.}
}
\value{
a data.table with columns:
(contrast variable, gene id, regulation status, log fold changes, p.adjust values,
mean counts, significant (as logical))
}
\description{
Expression analysis of 2 dimensions, usually Ribo-seq vs RNA-seq.\cr
Uses a reimplementation of the deltaTE algorithm (see reference).\cr
Creates a total of 3 DESeq models, where x is the target.contrast argument
 (usually the 'condition' column) and libraryType distinguishes RNA-seq from Ribo-seq:\cr\cr
** \strong{The 3 differential sub models} **
\itemize{
 \item{1. Ribo-seq model : design = ~ x (differences between the x groups in Ribo-seq)}
 \item{2. RNA-seq model: design = ~ x (differences between the x groups in RNA-seq)}
 \item{3. TE model: design = ~ x + libraryType + libraryType:x
 (differences between the x and libraryType groups and the interaction between them)}
}
You need at least 2 groups and 2 replicates per group. By default, the Ribo-seq counts are
over CDS and the RNA-seq counts over whole mRNAs, per transcript. See the Details section
for more information.\cr
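The designs of the 3 sub models can be sketched in base R. This is a minimal sketch, assuming a hypothetical sample table with 2 conditions (WT, mut), 2 replicates per condition and a column named \code{libraryType} (the column name used inside ORFik may differ); it only builds design matrices with \code{model.matrix} and does not call DESeq2.
\preformatted{
# Hypothetical sample table: 2 conditions x 2 replicates, for RNA-seq and Ribo-seq
sample_table <- data.frame(
  condition   = factor(rep(c("WT", "mut"), each = 2, times = 2),
                       levels = c("WT", "mut")),
  libraryType = factor(rep(c("RNA", "RFP"), each = 4),
                       levels = c("RNA", "RFP"))
)
# Models 1 and 2: one library type at a time, design = ~ x
rfp_samples <- sample_table[sample_table$libraryType == "RFP", ]
rfp_mm <- model.matrix(~ condition, data = rfp_samples)
# Model 3: both library types combined, plus the interaction term
te_mm <- model.matrix(~ condition + libraryType + condition:libraryType,
                      data = sample_table)
colnames(te_mm)
}
With RNA as the reference library type, the last column (conditionmut:libraryTypeRFP) is the interaction term. In the TE model its coefficient is the log2 fold change of the Ribo-seq/RNA-seq ratio between mut and WT, i.e. the TE LFC.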
}
\details{
Log fold changes and p-values are created from a Wald test on the comparison contrast described below.
By default (lfcShrinkType = "normal"), the RNA-seq and Ribo-seq LFC values are shrunken using
\code{DESeq2::lfcShrink(type = "normal")}. Note that the TE LFC values are not shrunken, since
they are ratios and not counts (following the specifications of the deltaTE paper).
The adjusted p-values are created using the DESeq2 setting pAdjustMethod = "BH"
(Benjamini-Hochberg correction). All other DESeq2 arguments are left at their defaults.\cr\cr
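The Benjamini-Hochberg step can be illustrated with a small base R sketch. The helper \code{bh_adjust} is hypothetical and only shows the arithmetic (each p-value is scaled by n / rank, then made monotone and capped at 1); it assumes there are no NA p-values, and its result is compared against base R's \code{p.adjust(method = "BH")}.
\preformatted{
# Hypothetical helper: Benjamini-Hochberg adjustment of a vector of p-values
bh_adjust <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)   # indices from largest to smallest p-value
  ranks <- n:1                       # ascending rank of each of those p-values
  adj <- cummin(p[o] * n / ranks)    # scale by n / rank, then enforce monotonicity
  pmin(1, adj)[order(o)]             # cap at 1 and restore the input order
}
p <- c(0.01, 0.04, 0.03, 0.20)
bh_adjust(p)
all.equal(bh_adjust(p), p.adjust(p, method = "BH"))
}
For these 4 p-values the adjusted values are 0.04, 0.0533, 0.0533 and 0.2, so only the first would pass a 0.05 threshold.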

Analysis is done between each possible combination of levels in the target contrast.
For example, if the target contrast is the condition column, with factor levels WT, mut1
and mut2 (3 replicates each), you get the comparisons WT vs mut1, WT vs mut2 and
mut1 vs mut2.\cr
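The same set of unordered level pairs can be sketched with base R's \code{combn}. The levels below are hypothetical, and this is an illustration of the idea rather than a claim that it reproduces ORFik's \code{combn.pairs} output element for element. Reversing a pair flips the direction of that comparison, as in the Examples section.
\preformatted{
# Hypothetical factor levels of the target contrast
levels_condition <- c("WT", "mut1", "mut2")
# All unordered pairs: WT vs mut1, WT vs mut2, mut1 vs mut2
pairs <- combn(levels_condition, 2, simplify = FALSE)
pairs
# Flip the direction of the second comparison
pairs[[2]] <- rev(pairs[[2]])
}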

The result groups are defined through 4 main categories (more if
\code{complex.categories = TRUE}). First, some intuition:
the number of ribosomes (Ribo-seq) is significantly different between 2
contrast elements in the model if the relative counts are
statistically higher or lower; the same holds for mRNA levels (RNA-seq).
TE is then RFP / RNA, which is roughly how many ribosomes translate
each mRNA in the sample. If contrast group 1 has a TE of 2, there are 2 ribosomes
per mRNA fragment, while a TE of 4 would mean twice as many ribosomes per mRNA.
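The TE arithmetic above can be checked with a tiny base R sketch on hypothetical normalized counts for one gene (not output from ORFik or DESeq2). It shows that on the log2 scale the TE change equals the Ribo-seq change minus the RNA-seq change.
\preformatted{
# Hypothetical normalized counts of one gene in 2 contrast groups
rfp <- c(group1 = 20, group2 = 40)   # Ribo-seq
rna <- c(group1 = 10, group2 = 10)   # RNA-seq
te  <- rfp / rna                     # TE of 2 and 4 ribosomes per mRNA
rfp_lfc <- log2(rfp[["group2"]] / rfp[["group1"]])
rna_lfc <- log2(rna[["group2"]] / rna[["group1"]])
te_lfc  <- log2(te[["group2"]] / te[["group1"]])
# On the log2 scale, the TE change is the Ribo-seq change minus the RNA-seq change
all.equal(te_lfc, rfp_lfc - rna_lfc)
}
Here te_lfc is 1 (a doubling of TE), driven entirely by Ribo-seq since rna_lfc is 0.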

Mathematically, the groups are defined by the adjusted p-values as follows.
Here te.sign means na_safe(te.padj, "<", p.value), and rfp.sign and rna.sign
are defined the same way. na_safe is a helper that evaluates a comparison with
NA values treated as FALSE for '<' or '<=' tests and as TRUE for '>' tests.
We also define the helper all_models_sign := te.sign & rfp.sign & rna.sign.\cr\cr
** \strong{Significant DTEG Classifications} **
\itemize{
 \item{No change : None of the below categories}
 \item{Translation (only RFP) : te.sign & rfp.sign & !rna.sign}
 \item{Expression (only RNA) : !te.sign & !rfp.sign & rna.sign}
 \item{mRNA abundance : all_models_sign & na_safe(te.lfc * rna.lfc, ">", 0)}
 \item{Inverse (inverse mRNA abundance) : all_models_sign & na_safe(te.lfc * rna.lfc, "<", 0)}
 \item{Buffering (Stable protein output) : te.sign & !rfp.sign & rna.sign}
 \item{Forwarded (diagonal bottom left to top right) : !te.sign & rfp.sign & rna.sign}
}
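The classification table above can be sketched as a vectorized base R function. This is a hypothetical helper (\code{classify_dteg}), not ORFik's own implementation; it mirrors the full category list, treats NA adjusted p-values as not significant (DESeq2 reports NA padj for, e.g., genes removed by independent filtering) and treats an NA LFC product as FALSE.
\preformatted{
# Hypothetical helper: significant = non-NA adjusted p-value below the threshold
is_sign <- function(padj, p.value) !is.na(padj) & padj < p.value

# Hypothetical sketch of the DTEG categories listed above
classify_dteg <- function(te.padj, rfp.padj, rna.padj, te.lfc, rna.lfc,
                          p.value = 0.05) {
  te.sign  <- is_sign(te.padj, p.value)
  rfp.sign <- is_sign(rfp.padj, p.value)
  rna.sign <- is_sign(rna.padj, p.value)
  all_models_sign <- te.sign & rfp.sign & rna.sign
  lfc_product <- te.lfc * rna.lfc
  same_direction     <- !is.na(lfc_product) & lfc_product > 0
  opposite_direction <- !is.na(lfc_product) & lfc_product < 0

  regulation <- rep("No change", length(te.sign))
  regulation[te.sign & rfp.sign & !rna.sign]       <- "Translation"
  regulation[!te.sign & !rfp.sign & rna.sign]      <- "Expression"
  regulation[all_models_sign & same_direction]     <- "mRNA abundance"
  regulation[all_models_sign & opposite_direction] <- "Inverse"
  regulation[te.sign & !rfp.sign & rna.sign]       <- "Buffering"
  regulation[!te.sign & rfp.sign & rna.sign]       <- "Forwarded"
  regulation
}

# Hypothetical results for 7 genes (the last gene has an NA TE padj)
example_regulation <- classify_dteg(
  te.padj  = c(0.01, 0.50, 0.01, 0.01, 0.01, 0.50, NA),
  rfp.padj = c(0.01, 0.50, 0.01, 0.01, 0.50, 0.01, 0.01),
  rna.padj = c(0.50, 0.01, 0.01, 0.01, 0.01, 0.01, 0.50),
  te.lfc   = c(1.2, 0.1, 1.0, -1.0, -1.5, 0.2, NA),
  rna.lfc  = c(0.1, 1.3, 0.8, 0.9, 1.4, 1.1, 0.2)
)
example_regulation
}
Since the 6 significant patterns of (te.sign, rfp.sign, rna.sign) do not overlap, the order of the assignments does not change the result.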

If complex.categories is FALSE, then Expression, Inverse and Forwarded are merged into 'Buffering'.
mRNA abundance is called "Intensified" in the original article.
To see the classification code, run: \code{View(ORFik:::DTEG_add_regulation_categories)}.
Feel free to redefine the categories as you want them.

See Figure 1 in the reference article for a clear definition of the groups!\cr
If you do not need isoform variants, subset to longest isoform per gene
either before or in the returned object (See examples).
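A base R sketch of the longest-isoform subset, assuming a hypothetical annotation table with transcript lengths and a hypothetical results table (in ORFik, the Examples use \code{filterTranscripts} for this step instead):
\preformatted{
# Hypothetical transcript annotation and results table
tx <- data.frame(
  id     = c("tx1", "tx2", "tx3", "tx4"),
  gene   = c("geneA", "geneA", "geneB", "geneB"),
  length = c(1500, 2300, 900, 400),
  stringsAsFactors = FALSE
)
res <- data.frame(
  id         = c("tx1", "tx2", "tx3", "tx4"),
  Regulation = c("No change", "Translation", "Buffering", "No change"),
  stringsAsFactors = FALSE
)
# Sort longest first within each gene, then keep the first transcript per gene
tx_sorted <- tx[order(tx$gene, -tx$length), ]
longest <- tx_sorted$id[!duplicated(tx_sorted$gene)]
res_longest <- res[is.element(res$id, longest), ]
res_longest
}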
If you do not have RNA-seq controls, you can still use DESeq on Ribo-seq alone.\cr\cr
Remember that DESeq2 by default can not do global change analysis, since its
size factor normalization assumes that most genes are unchanged;
it can only find subsets with changes in LFC!
}
\examples{
## Simple example (use ORFik template, then split on Ribo and RNA)
df <- ORFik.template.experiment()
df.rfp <- df[df$libtype == "RFP",]
df.rna <- df[df$libtype == "RNA",]
design(df.rfp) # The experimental design, per libtype
design(df.rfp)[1] # Default target contrast
#dt <- DTEG.analysis(df.rfp, df.rna)
#dt_with_gene_ids <- append_gene_symbols(dt, symbols(df))
## If you want to use the pshifted libs for analysis:
#dt <- DTEG.analysis(df.rfp, df.rna,
#                    RFP_counts = countTable(df.rfp, region = "cds",
#                       type = "summarized", count.folder = "pshifted"))
## Restrict DTEGs by log fold change (LFC):
## e.g. set genes with abs(LFC) < 1.5 for both rfp and rna to "No change"
#dt[abs(rfp) < 1.5 & abs(rna) < 1.5, Regulation := "No change"]

## Only longest isoform per gene:
#tx_longest <- filterTranscripts(df.rfp, 0, 1, 0)
#dt <- dt[id \%in\% tx_longest,]
## Convert to gene id
#dt[, id := txNamesToGeneNames(id, df.rfp)]
## To get by gene symbol, use biomaRt conversion
## To flip directionality of contrast pair number 2:
#design <- "condition"
#pairs <- combn.pairs(unlist(df.rfp[, design]))
#pairs[[2]] <- rev(pairs[[2]])
#dt <- DTEG.analysis(df.rfp, df.rna,
#                    RFP_counts = countTable(df.rfp, region = "cds",
#                       type = "summarized", count.folder = "pshifted"),
#                       pairs = pairs)
}
\references{
\url{https://doi.org/10.1002/cpmb.108}
}
\seealso{
Other DifferentialExpression: 
\code{\link{DEG.plot.static}()},
\code{\link{DEG_model}()},
\code{\link{DTEG.plot}()},
\code{\link{te.table}()},
\code{\link{te_rna.plot}()}
}
\concept{DifferentialExpression}
