% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clone_id.R
\name{Clone ID}
\alias{Clone ID}
\alias{clone_id}
\alias{clone_id_EM}
\alias{clone_id_Gibbs}
\title{Infer clonal identity of single cells}
\usage{
clone_id(
  A,
  D,
  Config = NULL,
  n_clone = NULL,
  Psi = NULL,
  relax_Config = TRUE,
  relax_rate_fixed = NULL,
  inference = "sampling",
  n_chain = 1,
  n_proc = 1,
  verbose = TRUE,
  ...
)

clone_id_EM(
  A,
  D,
  Config,
  Psi = NULL,
  min_iter = 10,
  max_iter = 1000,
  logLik_threshold = 1e-05,
  verbose = TRUE
)

clone_id_Gibbs(
  A,
  D,
  Config,
  Psi = NULL,
  relax_Config = TRUE,
  relax_rate_fixed = NULL,
  relax_rate_prior = c(1, 9),
  keep_base_clone = TRUE,
  prior0 = c(0.2, 99.8),
  prior1 = c(0.45, 0.55),
  min_iter = 5000,
  max_iter = 20000,
  buin_frac = 0.5,
  wise = "variant",
  relabel = FALSE,
  verbose = TRUE
)
}
\arguments{
\item{A}{variant x cell matrix of integers; number of alternative allele
reads for variant i in cell j}

\item{D}{variant x cell matrix of integers; number of total reads covering
variant i in cell j}

\item{Config}{variant x clone matrix of binary values. The clone-variant
configuration, which encodes the phylogenetic tree structure. This is the
output Z of Canopy}

\item{n_clone}{integer(1), the number of clones to reconstruct. Used only if
Config is NULL}

\item{Psi}{A vector of floats. The fraction of each clone; this is the output
P of Canopy}

\item{relax_Config}{logical(1), if TRUE, relax the clone Configuration so
that, instead of being held fixed, Config acts as a prior that can be updated
at a given relax rate.}

\item{relax_rate_fixed}{numeric(1), if the value is between 0 and 1, the
relax rate is fixed at this value while the clone Config is updated. If
NULL, the relax rate is learned automatically using relax_rate_prior.}

\item{inference}{character(1), the method to use for inference, either
"sampling" to use Gibbs sampling (default) or "EM" to use
expectation-maximization (faster)}

\item{n_chain}{integer(1), the number of chains to run; the results of the
chains are averaged to give the output}

\item{n_proc}{integer(1), the number of processors to use. Parallel
computation can substantially reduce run time when multiple chains are used}

\item{verbose}{logical(1), should the function output verbose information as
it runs?}

\item{...}{arguments passed to \code{\link{clone_id_Gibbs}} or
\code{\link{clone_id_EM}} (as appropriate)}

\item{min_iter}{An integer. The minimum number of iterations in the Gibbs
sampling. The actual number of iterations may be larger, as sampling
continues until convergence.}

\item{max_iter}{An integer. The maximum number of iterations in the Gibbs
sampling; sampling stops here even if the convergence diagnostic has not been
passed}

\item{logLik_threshold}{A float. The threshold on the increase in
log-likelihood used to detect convergence.}

\item{relax_rate_prior}{numeric(2), the two parameters of the Beta prior
distribution on the relax rate used for relaxing the clone Configuration.
This prior is used when relax_rate_fixed is NULL.}

\item{keep_base_clone}{logical(1), if TRUE, keep the base clone of Config at
its input values when relax mode is used.}

\item{prior0}{numeric(2), alpha and beta parameters for the Beta prior
distribution on the inferred false positive rate.}

\item{prior1}{numeric(2), alpha and beta parameters for the Beta prior
distribution on the inferred (1 - false negative) rate.}

\item{buin_frac}{numeric(1), the fraction of the chain to discard as the
burn-in period}

\item{wise}{A string, the level at which the theta1 parameters are defined:
"global", "variant" or "element".}

\item{relabel}{logical(1), if TRUE, relabel the samples of both Config and
prob during the Gibbs sampling.}
}
\value{
If inference method is "EM", a list containing \code{theta}, a vector of
two floats denoting the parameters of the two components of the base model,
i.e., mean of Bernoulli or binomial model given variant exists or not,
\code{prob}, the matrix of posterior probabilities of each cell belonging to
each clone with fitted parameters, and \code{logLik}, the log likelihood of
the final parameters.

If inference method is "sampling", a list containing: \code{theta0}, the mean
of sampled false positive parameter values; \code{theta1}, the mean of sampled
(1 - false negative rate) parameter values; \code{theta0_all}, all sampled
false positive parameter values; \code{theta1_all}, all sampled (1 - false
negative rate) parameter values; \code{element}; \code{logLik_all},
log-likelihood for model for all sampled parameter sets; \code{prob_all};
\code{prob}, matrix with mean of sampled cell-clone assignment posterior
probabilities (the key output of the model); \code{prob_variant}.

A list containing \code{theta}, a vector of two floats denoting the
binomial rates given variant exists or not, \code{prob}, the matrix of
posterior probabilities of each cell belonging to each clone with fitted
parameters, and \code{logLik}, the log likelihood of the final parameters.
}
\description{
\code{clone_id}: infer clonal identity of single cells

\code{clone_id_EM}: assign cells to clones using an EM algorithm

\code{clone_id_Gibbs}: assign cells to clones using a Gibbs sampling algorithm
}
\details{
The two Bernoulli components correspond to false positive and false negative
rates. The two binomial components correspond to the read distributions
with and without the mutation present.
}
\examples{
data(example_donor)
assignments <- clone_id(A_clone, D_clone,
    Config = tree$Z,
    min_iter = 800, max_iter = 1200
)
prob_heatmap(assignments$prob)

assignments_EM <- clone_id(A_clone, D_clone,
    Config = tree$Z,
    inference = "EM"
)
prob_heatmap(assignments_EM$prob)
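
# A minimal sketch, not part of the package API: turn a posterior probability
# matrix such as assignments_EM$prob into hard cell-to-clone calls.
# Assumptions: rows are cells and columns are clones; `hard_assign` and the
# probability cut-off are hypothetical choices made for illustration only.
hard_assign <- function(prob, threshold = 0.5) {
    # index of the most probable clone for each cell
    best <- max.col(prob, ties.method = "first")
    # posterior probability of that clone
    top_prob <- prob[cbind(seq_len(nrow(prob)), best)]
    # leave low-confidence cells unassigned
    best[top_prob < threshold] <- NA
    best
}
# Toy matrix with three cells and two clones; the same call could be applied
# to assignments_EM$prob.
toy_prob <- matrix(c(0.9, 0.1, 0.2, 0.8, 0.45, 0.55), ncol = 2, byrow = TRUE)
hard_assign(toy_prob, threshold = 0.6)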
}
\author{
Yuanhua Huang and Davis McCarthy

Yuanhua Huang
}
