% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Harman.R
\name{harman}
\alias{harman}
\title{Harman batch correction method}
\usage{
harman(
  datamatrix,
  expt,
  batch,
  limit = 0.95,
  numrepeats = 100000L,
  randseed,
  forceRand = FALSE,
  printInfo = FALSE
)
}
\arguments{
\item{datamatrix}{matrix or data.frame, the data values to correct with
samples in columns and data values in rows. Internally, a data.frame will be
coerced to a matrix. Matrices need to be of type \code{integer} or
\code{double}.}

\item{expt}{vector or factor with the experimental variable of interest
(variance to be kept).}

\item{batch}{vector or factor with the batch variable (variance to be
removed).}

\item{limit}{numeric, confidence limit. Indicates the confidence level at
which Harman stops removing a batch effect. Must be between \code{0} and
\code{1}.}

\item{numrepeats}{integer, the number of repeats to run for the simulated
batch mean distribution estimator when using the random selection algorithm.
(N.B. on 32-bit Windows, values above 300000 may cause catastrophic failure.)}

\item{randseed}{integer, the seed for random number generation.}

\item{forceRand}{logical, whether to force Harman to use the random selection
algorithm to compute corrections. When \code{TRUE}, the simulated batch mean
is created by random selection of scores rather than by full explicit
calculation from all permutations.}

\item{printInfo}{logical, whether to print information during computation or
not.}
}
\value{
A \code{harmanresults} S3 object:
\describe{
  \item{factors}{A \code{data.frame} of the \code{expt} and \code{batch}
  vectors}
  \item{parameters}{The harman runtime parameters. See \code{\link{harman}}
  for details}
  \item{stats}{Confidence intervals and the degree of correction for each
  principal component}
  \item{center}{The centering vector returned by \code{\link{prcomp}} with
  \code{center=TRUE}}
  \item{rotation}{The matrix of eigenvectors (by column) returned from
  \code{\link{prcomp}}}
  \item{original}{The original PC scores returned by \code{\link{prcomp}}}
  \item{corrected}{The harman corrected PC scores}
}
}
\description{
Harman is a technique based on PCA and constrained optimisation that
maximises the removal of batch effects from datasets, with the constraint
that the probability of overcorrection (i.e. removing genuine biological
signal along with batch noise) is kept to a fraction set by the end-user
(Oytam et al., 2016;
\url{http://dx.doi.org/10.1186/s12859-016-1212-5}).

Harman expects unbounded data. For example, with HumanMethylation450 arrays,
do not use the Beta statistic (whose values are constrained between 0 and 1);
use the logit-transformed M-values instead.
}
\details{
The \code{datamatrix} needs to be of type \code{integer} or
\code{numeric}, or alternatively a data.frame that can be coerced into one
using \code{\link{as.matrix}}. The matrix is to be constructed with data
values (typically microarray probes or sequencing counts) in rows and samples
in columns, much like the \code{assayData} slot in the canonical Bioconductor
\code{eSet} object, or any object which inherits from it. The data should
have normalisation and any other global adjustment for noise reduction
(such as background correction) applied prior to using Harman.

For convergence, the number of simulations (the \code{numrepeats} parameter)
should probably be at least 100,000. The underlying principle of Harman rests
upon PCA, which is a parametric technique. This implies Harman should be
optimal when the data is normally distributed. However, PCA is known to be
rather robust even to very non-normal data.

The output \code{harmanresults} object may be presented to summary and data
exploration functions such as \code{\link{plot.harmanresults}} and
\code{\link{summary.harmanresults}} as well as the
\code{\link{reconstructData}} function which creates a corrected matrix of
data with the batch effect removed.
}
\examples{
library(HarmanData)
data(OLF)
expt <- olf.info$Treatment
batch <- olf.info$Batch
olf.harman <- harman(olf.data, expt, batch)
plot(olf.harman)
olf.data.corrected <- reconstructData(olf.harman)

## Reading from a csv file
datafile <- system.file("extdata", "NPM_data_first_1000_rows.csv.gz",
package="Harman")
infofile <- system.file("extdata", "NPM_info.csv.gz", package="Harman")
datamatrix <- read.table(datafile, header=TRUE, sep=",", row.names="probeID")
batches <- read.table(infofile, header=TRUE, sep=",", row.names="Sample")
res <- harman(datamatrix, expt=batches$Treatment, batch=batches$Batch)
arrowPlot(res, 1, 3)
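
## A minimal sketch of preparing methylation data for Harman, which expects
## unbounded values: Beta values (between 0 and 1) are converted to M-values
## with a base-2 logit. The helper beta_to_m() and the toy matrix below are
## hypothetical illustrations and are not part of the Harman package.
beta_to_m <- function(beta) log2(beta / (1 - beta))
toy_betas <- matrix(c(0.1, 0.5, 0.9, 0.25), nrow = 2,
                    dimnames = list(c("probe1", "probe2"), c("s1", "s2")))
toy_mvals <- beta_to_m(toy_betas)
range(toy_mvals)  # values are no longer confined to the 0-1 interval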
}
\references{
Oytam et al. (2016) BMC Bioinformatics 17:1.
DOI: 10.1186/s12859-016-1212-5
}
\seealso{
\code{\link{reconstructData}},
\code{\link{pcaPlot}}, \code{\link{arrowPlot}}
}
