% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/COCOA.R
\name{getMetaRegionProfile}
\alias{getMetaRegionProfile}
\title{Create a "meta-region" profile}
\usage{
getMetaRegionProfile(
  signal,
  signalCoord,
  regionSet,
  signalCol = c("PC1", "PC2"),
  signalCoordType = "default",
  binNum = 21,
  verbose = TRUE,
  aggrMethod = "default",
  absVal = TRUE
)
}
\arguments{
\item{signal}{Matrix of feature contribution scores (the contribution of 
each epigenetic feature to each target variable), with one named column 
for each target variable and one row for each original epigenetic feature 
(rows should be in the same order as the original data/signalCoord). 
For an unsupervised example, if PCA was done on epigenetic data and the
goal was to find region sets associated with the principal components, you 
could use the `rotation` element returned by prcomp() (i.e. 
prcomp(epigenetic data)$rotation) as the
feature contribution scores/`signal` parameter.}

\item{signalCoord}{A GRanges object or data frame with coordinates 
for the genomic signal/original epigenetic data. 
Coordinates should be in the 
same order as the original data and the feature contribution scores 
(each item/row in signalCoord
corresponds to a row in signal). If a data.frame, it
must have chr and start columns (an end column is optional, 
depending on the epigenetic data type).}

\item{regionSet}{A genomic ranges (GRanges) object with regions corresponding
to the same biological annotation.}

\item{signalCol}{A character vector with the names of the sample variables
of interest/target variables (e.g. PCs or sample phenotypes).}

\item{signalCoordType}{Character. Can be "default", "singleBase", or 
"multiBase". This describes whether the coordinates for `signal` 
(`signalCoord`) are each a single base (e.g. as for DNA methylation)
or a region/multiple bases (e.g. as for chromatin accessibility). 
Different scoring
options are available for each type of data. If "default" is given,
the type of coordinates will be detected automatically. For "default", if each
coordinate start value equals the coordinate end value 
(all(start(signalCoord) == end(signalCoord))), "singleBase"
will be used. Otherwise, "multiBase" will be used.}

\item{binNum}{Number of bins to split each region into when
making the aggregate profile. More bins
give a higher-resolution but potentially noisier profile.}

\item{verbose}{Logical. Whether to show the progress 
of the function. One
bar indicates that the region set is completed.}

\item{aggrMethod}{Character. The aggregation method.
Similar to the `scoringMetric` parameter of aggregateSignalGRList.
Different methods are available for 
signalCoordType="singleBase" vs signalCoordType="multiBase".
For "singleBase", the available methods are "regionMean", 
"regionMedian", "simpleMean", and "simpleMedian". 
The default method is "regionMean".
For "multiBase", the methods are "proportionWeightedMean", 
"simpleMean", and "simpleMedian". The default is "proportionWeightedMean".
"regionMean" is a weighted
average of the signal (absolute value of the signal 
if absVal=TRUE), weighted by region. First the signal is
averaged within each regionSet region, 
then the resulting region averages are averaged. With the
"regionMean" method, be cautious when interpreting
region sets with a low number of regions that overlap signalCoord. The
"regionMedian" method is the same as "regionMean" but the median is taken
at each step instead of the mean.
The "simpleMean"
method is just the unweighted average of all (absolute) signal values that
overlap the given region set. For multiBase data, this includes
signal regions that overlap a regionSet region at all (1 base
overlap or more), and the signal for each overlapping region is
given the same weight in the average regardless of how much it overlaps.
The "simpleMedian" method is the same as "simpleMean" but takes the median 
instead of the mean. 
"proportionWeightedMean" is a weighted average of all signalCoord 
regions that overlap with regionSet regions. For each signalCoord region
that overlaps with a regionSet region, we calculate what proportion
of the regionSet region is covered. Then this proportion is used to
weight the signal value when calculating the mean. 
The denominator of the mean
is the sum of all the proportion overlaps.}

\item{absVal}{Logical. If TRUE, take the absolute value of values in
signal. Choose TRUE if you think that some 
genomic loci in a region set may increase while others
decrease (i.e. there may be anticorrelation between
regions in a region set). Choose FALSE if you expect regions in a 
given region set to all change in the same direction (all be positively
correlated with each other).}
}
\value{
A data.frame with the binned meta-region profile,
one row per bin. Columns: binID and one column for each target variable
in signalCol. The function will return NULL if there
is no overlap between signalCoord and any of the bin groups that come 
from regionSet (e.g. if none of the bin1's overlap signalCoord, 
NULL is returned).
}
\description{
Creates an aggregate ("meta-region") profile for a region set. 
This profile can show enrichment 
of genomic signals with high feature contribution scores 
in the region set but not in the
surrounding genome, suggesting that variation is linked specifically
to that region set.
}
\details{
All regions in a given region set 
are combined into a single aggregate profile. Regions in `regionSet` 
should be
expanded on each side to include a wider area of the genome around
the regions of interest (see example and vignettes). 
To make the profile, first we optionally take 
the absolute value of `signal` (`absVal` parameter). 
Then each expanded regionSet region is
split into `binNum` bins. The corresponding 
bins from each region
(e.g. all bin1's, all bin2's, etc.) are grouped.  
All overlapping values from `signal` are 
aggregated in each bin group according to the `aggrMethod` parameter to 
get a meta-region profile. Since DNA strand information is not considered, 
the profile is averaged symmetrically around the center.
A peak in the middle of this profile suggests
that variability is specific to the region set of interest and is 
not a product of the surrounding genome. A region set can still be
significant even if it does not have a peak. For example, some
histone modification region sets may be in large genomic blocks
and not show a peak, despite having variation across samples.
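
The binning and symmetric-averaging steps can be sketched in base R with
toy numbers. This is only an illustration under stated assumptions, not
COCOA's internal code: the variable names and values are hypothetical, bins
are assumed to be equal-width, and the symmetric step is assumed to average
each bin with its mirror-image bin.
\preformatted{
binNum <- 5
# hypothetical expanded region (1-based, inclusive coordinates)
regionStart <- 1001
regionEnd <- 13000
# assumed equal-width bin boundaries: binNum + 1 cut points
binEdges <- seq(regionStart, regionEnd + 1, length.out = binNum + 1)
# toy aggregated signal, one value per bin group (bin1, bin2, ...)
binProfile <- c(0.1, 0.3, 0.9, 0.5, 0.2)
# strand is ignored, so average each bin with its mirror image
symProfile <- (binProfile + rev(binProfile)) / 2
data.frame(binID = seq_len(binNum), PC1 = symProfile)
}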
}
\examples{
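data("brcaATACCoord1")
data("brcaATACData1")
data("esr1_chr1")
# use the PCA loadings (one row per feature, one column per PC) as the signal
featureContributionScores <- prcomp(t(brcaATACData1))$rotation
# expand each region to 12 kb around its center to include flanking genome
esr1_chr1_expanded <- resize(esr1_chr1, 12000, fix="center")
# aggregate the PC1 and PC2 loadings into 21 bins across the expanded regions
mrProfile <- getMetaRegionProfile(signal=featureContributionScores,
                                  signalCoord=brcaATACCoord1,
                                  regionSet=esr1_chr1_expanded,
                                  signalCol=c("PC1", "PC2"),
                                  binNum=21)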
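# A hedged sketch of inspecting the result (an illustration, not part of
# the package's own workflow). Per the Value section, the return value is
# NULL when no bin group overlaps signalCoord; otherwise it has a binID
# column plus one column for each entry of signalCol.
if (!is.null(mrProfile)) {
    head(mrProfile)
    # a peak in the central bins would suggest that variation is specific
    # to the region set rather than the surrounding genome
    plot(mrProfile$binID, mrProfile$PC1, type="l",
         xlab="bin", ylab="PC1 signal")
}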
}
