\name{crossReactivityProbability}
\alias{crossReactivityProbability}
\alias{crossReactivityPrior}
\title{
Compute the probability that compounds in a compound vs target matrix are
promiscuous binders
}
\description{
Queries a compound vs target sparse matrix as generated by the \code{perTargetMatrix} function,
and computes the probability \eqn{P(theta > threshold)} for each compound, where theta
is the probability that the compound would be active in any given new assay against a novel untested target.
This code implements the Bayesian Modeling of Cross-Reactive Compounds
method described by Dancik, V. et al. (see references). This method assumes
that the number of observed active targets out of total tested targets follows a binomial
distribution. A beta conjugate prior distribution is calculated based on the hit ratios (active/total tested)
for a reference database.
}
\usage{
crossReactivityProbability(inputMatrix, 
                            threshold=0.25,
                            prior=list(hit_ratio_mean=0.0126, hit_ratio_sd=0.0375))
crossReactivityPrior(database, minTargets=20, category=FALSE, activesOnly=FALSE)
}
\arguments{
  \item{inputMatrix}{
    A \code{dgCMatrix} sparse matrix as computed by the \code{perTargetMatrix} function with the option
    \code{useNumericScores = FALSE}. The cross-reactivity probability will be computed for each compound
    (column) based on the active and inactive scores present.
    In most cases, the matrix should be generated with \code{getBioassaySetByCids} rather than \code{getAssays},
    so that it includes all relevant activity data for each compound, rather than a selected set of assays.
}
  \item{threshold}{
    A \code{numeric} value between 0 and 1 reflecting the desired hit ratio cutoff for computing
    the probability a compound is a promiscuous binder. 
    This is the probability \eqn{P(theta > threshold)} if theta is the probability that the compound
    will be a hit in a new assay.
    The default of 0.25 was used in Dancik, V. et al. (see references).
}
  \item{prior}{
    A \code{list} with elements \code{hit_ratio_mean} and \code{hit_ratio_sd} representing
    the mean and standard deviation of hit ratios across a large reference database of
    highly-screened compounds. This can be generated with \code{crossReactivityPrior} and fed
    to \code{crossReactivityProbability}. Computing this for a large database can take a very long
    time, so defaults are provided based on the April 6th 2016 version of the 
    pre-built protein target only PubChem BioAssay database provided for use with bioassayR.
    Priors should be recomputed with appropriate reference data when working with a different type
    of experimental data, for example in vivo rather than in vitro assays.
}
  \item{database}{
    A \code{BioassayDB} database to query, for calculating a prior probability distribution.
}
  \item{minTargets}{
    The minimum number of distinct screened targets for a compound to be included in the prior probability
    distribution.
}
  \item{category}{
    The target annotation category (as used by the \code{translateTargetId} and \code{loadIdMapping}
    functions) by which targets are grouped. Targets that share a common annotation of this category
    are counted only once in the prior hit ratio counts. For example, with the PubChem BioAssay
    database one could use "UniProt", "kClust", or "domains" to get selectivity by targets with
    unique UniProt identifiers, distinct amino acid sequences, or Pfam domains
    respectively (the latter is also known as domain selectivity).
}
  \item{activesOnly}{
    \code{logical}. Should only compounds with at least one active score be used in computing the prior? Defaults to \code{FALSE}.
}}
\value{
\code{crossReactivityProbability} returns a \code{numeric} vector containing, for each
compound in \code{inputMatrix}, the probability that the hit ratio
(active targets / total targets) is greater than \code{threshold}.
\code{crossReactivityPrior} returns a \code{list} in the prior format described above.
}
\details{
\code{crossReactivityProbability} models the hit ratio theta (fraction of distinct targets
which are active) for a given compound with a standard
beta-binomial Bayesian model.
The number of actives n observed for a compound tested against N targets
is assumed to follow a binomial distribution:
\deqn{P(n | theta) = {N \choose n} {theta}^{n} {(1-theta)}^{N-n}}{
    P(n | theta) = choose(N, n) theta^n (1-theta)^(N-n)}
Theta is given a beta conjugate prior distribution
whose parameters a and b (alpha and beta) are calculated from the prior
mean and standard deviation of hit ratios for a large number of highly
screened compounds as follows:
\eqn{mean=a/(a+b)} and \eqn{sd^2=ab/((a+b)^2 (a+b+1))}.
This function then computes and returns the posterior probability 
\eqn{P(theta > threshold)} using the beta distribution function \code{pbeta}.
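
As a sketch of the steps above, assuming a standard method-of-moments fit and
conjugate update (the package internals may be organized differently), and
writing m for \code{hit_ratio_mean} and s for \code{hit_ratio_sd}, the two moment
equations can be solved for a and b:
\deqn{a = m \left( \frac{m(1-m)}{s^2} - 1 \right), \quad b = (1-m) \left( \frac{m(1-m)}{s^2} - 1 \right)}{
    a = m (m(1-m)/s^2 - 1),  b = (1-m) (m(1-m)/s^2 - 1)}
After observing n actives out of N tested targets, the posterior distribution of theta
is then Beta(a + n, b + N - n), so that
\deqn{P(theta > threshold | n, N) = 1 - F_{Beta(a+n, b+N-n)}(threshold)}{
    P(theta > threshold | n, N) = 1 - pbeta(threshold, a + n, b + N - n)}
where F denotes the cumulative distribution function of the indicated beta distribution.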
}
\references{
Dancik, V. et al. Connecting Small Molecules with Similar Assay Performance 
Profiles Leads to New Biological Hypotheses. J Biomol Screen 19, 771-781 (2014).
}
\author{
Tyler Backman
}
\seealso{
\code{\link{pbeta}} for the beta distribution function.
\code{\link{perTargetMatrix}}
\code{\link{targetSelectivity}}
}
\examples{
## connect to a test database
extdata_dir <- system.file("extdata", package="bioassayR")
sampleDatabasePath <- file.path(extdata_dir, "sampleDatabase.sqlite")
sampleDB <- connectBioassayDB(sampleDatabasePath)

## retrieve activity data for three compounds
assays <- getBioassaySetByCids(sampleDB, c("2244","3715","133021"))

## collapse assays into perTargetMatrix
targetMatrix <- perTargetMatrix(assays)

## compute P(theta > 0.25)
crossReactivityProbability(targetMatrix)
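
## hedged sketch in base R only: reproduce the beta-binomial calculation by hand
## for one hypothetical compound (3 active targets out of 10 tested; these counts
## are illustrative and are not taken from the sample database). This assumes the
## standard method-of-moments prior fit and conjugate update described in the
## details section, and may differ from the package internals.
priorMean <- 0.0126   # default hit_ratio_mean
priorSd <- 0.0375     # default hit_ratio_sd
nu <- priorMean * (1 - priorMean) / priorSd^2 - 1   # equals a + b
a <- priorMean * nu
b <- (1 - priorMean) * nu
nActive <- 3          # hypothetical number of active targets
nTested <- 10         # hypothetical number of tested targets
## posterior is Beta(a + n, b + N - n); the upper tail gives P(theta > 0.25)
posterior <- pbeta(0.25, a + nActive, b + nTested - nActive, lower.tail = FALSE)
posterior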

## disconnect from sample database
disconnectBioassayDB(sampleDB)
}
\keyword{ utilities }
