% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/degCreCore.R
\name{runDegCre}
\alias{runDegCre}
\title{Generate DegCre associations}
\usage{
runDegCre(
  DegGR,
  DegP,
  DegLfc = NULL,
  CreGR,
  CreP,
  CreLfc = NULL,
  reqEffectDirConcord = TRUE,
  padjMethod = "qvalue",
  maxDist = 1e+06,
  verbose = TRUE,
  smallestTestBinSize = 100,
  fracMinKsMedianThresh = 0.2,
  alphaVal = 0.01,
  binNOverride = NULL
)
}
\arguments{
\item{DegGR}{A \link[GenomicRanges]{GRanges} object of gene TSSs. Multiple
TSSs per gene are allowed.}

\item{DegP}{A numeric vector of differential expression p-values for genes
in \code{DegGR}.}

\item{DegLfc}{A numeric vector of log fold-change values of differential
expression for genes in \code{DegGR}. Required when
\code{reqEffectDirConcord = TRUE}. (Default: \code{NULL})}

\item{CreGR}{A \link[GenomicRanges]{GRanges} object of CRE regions.}

\item{CreP}{A numeric vector of differential signal p-values for regions in
\code{CreGR}.}

\item{CreLfc}{A numeric vector of log fold-change values of differential signal
for regions in \code{CreGR}. Required when
\code{reqEffectDirConcord = TRUE}. (Default: \code{NULL})}

\item{reqEffectDirConcord}{A logical indicating whether to require concordance
between the effect directions of the DEG and CRE differential values.
(Default: \code{TRUE})}

\item{padjMethod}{A character value indicating the method for p-value
adjustment. Do not change from the default under most circumstances. Can be
\code{"qvalue"} or any method name accepted by \link[stats]{p.adjust}.
(Default: \code{"qvalue"})}

\item{maxDist}{An integer value specifying the maximum TSS-to-CRE distance,
in base pairs, for which association probabilities are calculated.
(Default: \code{1e6})}

\item{verbose}{A logical indicating whether to print messages of step
completion and algorithm results. (Default: \code{TRUE})}

\item{smallestTestBinSize}{An integer value specifying the size
(number of elements) of the smallest distance bin to be considered in the
optimization algorithm. (Default: \code{100})}

\item{fracMinKsMedianThresh}{A numeric value between 0 and 1 specifying the
optimization criterion for the distance bin size algorithm (See Details).
(Default: \code{0.2})}

\item{alphaVal}{A numeric value between 0 and 1 specifying the alpha value
for DEG significance. (Default: \code{0.01})}

\item{binNOverride}{An integer value specifying the number of elements per
distance bin. When specified, overrides distance bin size optimization
(Not recommended). (Default: \code{NULL})}
}
\value{
A named list containing:
\describe{
  \item{degCreHits}{A \link[S4Vectors]{Hits} object with metadata.
  The \link[S4Vectors]{queryHits} of
  \code{degCreHits} reference \code{DegGR}.
  The \link[S4Vectors]{subjectHits} of
  \code{degCreHits} reference \code{CreGR}.}
  \item{binHeurOutputs}{List of outputs from the distance binning algorithm.}
  \item{alphaVal}{Numeric alpha value used for DEG significance threshold.}
  \item{DegGR}{\link[GenomicRanges]{GRanges} of input \code{DegGR} with
  added metadata columns "pVal", "pAdj", and possibly "logFC"
  if \code{reqEffectDirConcord==TRUE}. Existing metadata columns with the
  same names are overwritten.}
  \item{CreGR}{\link[GenomicRanges]{GRanges} of input \code{CreGR} with
  added metadata columns "pVal", "pAdj", and possibly "logFC" if
  \code{reqEffectDirConcord==TRUE}. Existing metadata columns with the
  same names are overwritten.}
}
The degCreHits \link[S4Vectors]{Hits} object metadata has these columns:
\describe{
    \item{assocDist}{Integer of distance in base pairs between the TSS and
    CRE for the association.}
    \item{assocProb}{Numeric from 0 to 1 of association probability.}
    \item{assocProbFDR}{Numeric from 0 to 1 of the false discovery rate of
    the association probability exceeding the distance-only null.}
    \item{rawAssocProb}{Numeric from 0 to 1 of association probability not
    adjusted for DEG significance or shorter associations involving
    this CRE.}
    \item{CreP}{Numeric of differential p-value of the CRE.}
    \item{DegP}{Numeric of differential p-value of the DEG.}
    \item{DegPadj}{Numeric of differential adjusted p-value of the DEG.}
    \item{binAssocDist}{Integer of the maximum association distance cutoff
    for the bin containing the association.}
    \item{numObs}{Integer number of associations in the distance bin
    containing the association.}
    \item{distBinId}{Integer that uniquely identifies the distance bin
    containing the association.}
}
}
\description{
Create DEG to CRE associations from differential data.
}
\details{
The DegCre algorithm considers experimental data from a perturbation
experiment and produces associations between cis-regulatory elements
(CREs) and differentially expressed genes (DEGs).
The user provides differential expression data, such as from RNA-seq, and
differential regulatory signal data, such as from ATAC-seq, DNase
hypersensitivity, or ChIP-seq.
For RNA-seq analysis, we suggest methods such as
\href{https://bioconductor.org/packages/release/bioc/html/DESeq2.html}{DESeq2}
or \href{https://bioconductor.org/packages/release/bioc/html/edgeR.html}{edgeR}.
For the analysis of differential regulatory data we recommend
\href{https://bioconductor.org/packages/release/bioc/html/csaw.html}{csaw}.
As an example experiment, we use data from McDowell et al. (PMID = 30097539)
in which A549 cells were treated with dexamethasone or a control.
RNA-seq and ChIP-seq data were collected at various time points.

A complete description of the mathematical basis of the DegCre core
algorithms is provided in
\href{https://www.biorxiv.org/content/10.1101/2023.10.04.560923v1}{DegCre bioRxiv}.
DegCre takes two inputs. The first is a GRanges of p-values and optionally
log fold-changes associated with DEG TSSs.
The second input is a GRanges of differential signal p-values and optionally
log fold-changes for CRE regions.
DegCre generates a \link[S4Vectors]{Hits} object of all associations between
DEG TSSs and CREs within \code{maxDist}.
Associations are then binned by TSS-to-CRE distance according to an
algorithm that balances resolution (many bins with few members)
against minimization of the deviance of each bin's CRE p-value distribution
from the global distribution, selecting an optimal bin size.
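
As an illustration only, the base-R sketch below splits associations into
equal-count bins ordered by distance and scores each bin by the two-sample
Kolmogorov-Smirnov statistic between its CRE p-values and all CRE p-values.
Using the KS statistic as the deviance measure is an assumption suggested by
the name of \code{fracMinKsMedianThresh}; the helpers \code{sketchDistBins}
and \code{sketchBinKs} are hypothetical, and this is not the DegCre
implementation, which additionally searches over bin sizes.
\preformatted{
## Hypothetical sketch; NOT the DegCre implementation.
## Assign each association to a bin of binSize elements, ordered by distance.
sketchDistBins <- function(assocDist, binSize) {
  ord <- order(assocDist)
  binId <- integer(length(assocDist))
  binId[ord] <- as.integer(ceiling(seq_along(ord) / binSize))
  binId
}
## Two-sample KS statistic of each bin's CRE p-values versus all CRE p-values.
sketchBinKs <- function(creP, binId) {
  vapply(split(creP, binId), function(p) {
    unname(suppressWarnings(stats::ks.test(p, creP))$statistic)
  }, numeric(1))
}
## Toy usage with made-up values (not experimental data).
toyDist <- c(500, 12000, 3000, 250000, 80000, 40000)
toyCreP <- c(0.001, 0.20, 0.01, 0.90, 0.50, 0.04)
toyBins <- sketchDistBins(toyDist, binSize = 2)
## toyBins is c(1, 2, 1, 3, 3, 2)
sketchBinKs(toyCreP, toyBins)
}
A larger KS statistic indicates a bin whose CRE p-values depart more strongly
from the global distribution.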

Next, DegCre applies a non-parametric algorithm to find concordance between
DEG and CRE differential effects within bins and derives an association
probability.
For each CRE, the probabilities of all associations involving that CRE
are adjusted to favor associations across shorter distances.
An FDR of the association probability is then estimated. Results are
returned in a list containing a \link[S4Vectors]{Hits} object and both
input GRanges.
}
\examples{
#Load required packages.
library(GenomicRanges)

#Load sample data.
data(DexNR3C1)

subDegGR <-
 DexNR3C1$DegGR[which(Seqinfo::seqnames(DexNR3C1$DegGR)=="chr1")]
subCreGR <-
 DexNR3C1$CreGR[which(Seqinfo::seqnames(DexNR3C1$CreGR)=="chr1")]

#With defaults.
degCreResListDexNR3C1 <- runDegCre(DegGR=subDegGR,
                                   DegP=subDegGR$pVal,
                                   DegLfc=subDegGR$logFC,
                                   CreGR=subCreGR,
                                   CreP=subCreGR$pVal,
                                   CreLfc=subCreGR$logFC)

#With custom settings.
modDegCreResList <- runDegCre(DegGR=subDegGR,
                              DegP=subDegGR$pVal,
                              CreGR=subCreGR,
                              CreP=subCreGR$pVal,
                              reqEffectDirConcord=FALSE,
                              maxDist=1e5,
                              alphaVal=0.001)
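
#Inspect the results: keep associations with assocProbFDR at most 0.05
#(an arbitrary cutoff chosen only for illustration) and retrieve the
#corresponding gene TSSs and CREs via queryHits and subjectHits.
degCreHits <- degCreResListDexNR3C1$degCreHits
hitsMeta <- S4Vectors::mcols(degCreHits)
sigHits <- degCreHits[which(hitsMeta$assocProbFDR <= 0.05)]
sigDegGR <- degCreResListDexNR3C1$DegGR[S4Vectors::queryHits(sigHits)]
sigCreGR <- degCreResListDexNR3C1$CreGR[S4Vectors::subjectHits(sigHits)]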

}
\author{
Brian S. Roberts
}
