% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/AllGenerics.R
\docType{methods}
\name{embedSamples}
\alias{embedSamples}
\alias{embedSamples,SingleCellExperiment-method}
\alias{embedSamples,matrix-method}
\title{Spectral embedding of biological samples}
\usage{
embedSamples(x, design = NULL)

\S4method{embedSamples}{matrix}(x, design = NULL)
}
\arguments{
\item{x}{A \code{SingleCellExperiment} object or a numeric matrix with
samples in columns and features in rows}

\item{design}{A numeric matrix describing the factors that should be blocked
(one row per sample); \code{NULL} (default) for no blocking}
}
\value{
A list containing the following components:
  \item{\code{eigenvectors}}{Ordered components of the latent space}
  \item{\code{eigenvalues}}{Information content of the latent components}
}
\description{
Non-linear learning of a data representation that captures the
intrinsic geometry of the trajectory. This function performs spectral
decomposition of a graph encoding conditional entropy-based
sample-to-sample similarities.
}
\details{
Single-cell gene expression measurements comprise high-dimensional
data of large volume, i.e., many features (e.g., genes) are measured in many
samples (e.g., cells); more formally, \emph{m} samples are described
by the expression of \emph{n} features (i.e., \emph{n} dimensions). The
cells' expression profiles are shaped by many distinct unobserved biological
causes related to each cell's geno- and phenotype, such as developmental
age, tissue region of origin, and cell cycle stage, by extrinsic sources
such as the status of signaling receptors and environmental stressors, and
also by technical noise. In other words, although a single dimension contains
only gene expression information, it represents an underlying combination of
multiple dependent and independent, relevant and non-relevant factors, whose
individual contributions are non-uniform. To obtain a better resolution and
to extract the underlying information, CellTrails aims to find a
meaningful low-dimensional structure (a manifold) that represents cells
mainly by their temporal relation along a biological process.
\cr \cr
This method assumes that the expression vectors lie on or near a
manifold with dimensionality \emph{d} that is embedded in the
\emph{n}-dimensional space. By using spectral embedding, CellTrails aims to
amplify latent temporal information; it reduces noise (i.e., truncates
non-relevant dimensions) by transforming the expression matrix into a new
dataset while retaining the geometry of the original dataset as much as
possible. CellTrails captures overall cell-to-cell relations based on the
statistical mutual dependency between any two data vectors. A high
dependency between two samples should be represented by their close
proximity in the lower-dimensional space.
\cr \cr
First, the mutual dependency between samples is scored using mutual
information. This entropy framework naturally requires discretization
of data vectors by an indicator function, which assigns each continuous
data point (expression value) to exactly one discrete interval (e.g., low,
mid, or high). However, measurement points located close to the interval
borders may be wrongly assigned due to noise-induced fluctuations.
Therefore, CellTrails fuzzifies the indicator function by using piecewise
polynomial functions (B-splines): the domain of each sample's expression
vector is divided into contiguous intervals, and each data point is assigned
to several neighboring intervals with weights that sum to one (based on
Daub \emph{et al.}, 2004).
Second, the computed mutual information matrix, whose entries are measured
in bits and are bounded below by zero but not above, is rescaled to a
generalized correlation coefficient. Third,
CellTrails constructs a simple complete graph with \emph{m} nodes, one for
each data vector (i.e., sample), and weights each edge between two nodes by a
heat kernel function applied to the generalized correlation coefficient.
Finally, nonlinear spectral embedding (i.e., spectral decomposition of the
graph's adjacency matrix) is performed to unfold the manifold
(Belkin & Niyogi, 2003; Sussman \emph{et al.}, 2012).
Note that this method uses only the set of defined trajectory
features in a \code{SingleCellExperiment} object (see
\code{trajectoryFeatureNames}); spike-in controls are
ignored and are not listed as trajectory features.
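\cr \cr
The steps above can be sketched in base R as follows. This is a minimal
illustration under stated assumptions, not the CellTrails implementation:
the number of bins and the spline order, the use of Linfoot's informational
coefficient of correlation, sqrt(1 - 2^(-2I)) for a mutual information I in
bits, as the generalized correlation coefficient, the distance 1 - r and the
bandwidth \code{t} inside the heat kernel, and the helper names
\code{fuzzyWeights}, \code{mutualInfo}, and \code{embedSketch} are all
hypothetical choices made for illustration.

```r
library(splines)  # recommended package shipped with R; provides splineDesign()

# Fuzzy bin weights for one sample vector (idea of Daub et al., 2004): values
# are rescaled to [0, 1] and spread over neighboring bins with B-spline
# weights; each row of the result sums to 1.
fuzzyWeights <- function(v, bins = 10, order = 3) {
  z <- (v - min(v)) / (max(v) - min(v))   # undefined for constant vectors
  inner <- seq(0, 1, length.out = bins - order + 2)
  knots <- c(rep(0, order - 1), inner, rep(1, order - 1))  # clamped knots
  splineDesign(knots, z, ord = order)     # length(v) x bins weight matrix
}

# Mutual information (in bits) between two samples from their bin weights.
mutualInfo <- function(wa, wb) {
  pab <- crossprod(wa, wb) / nrow(wa)     # joint bin probabilities
  pa <- colMeans(wa)                      # marginal bin probabilities
  pb <- colMeans(wb)
  keep <- pab > 0
  sum(pab[keep] * log2(pab[keep] / outer(pa, pb)[keep]))
}

# Hypothetical end-to-end sketch: x has features in rows, samples in columns.
embedSketch <- function(x, bins = 10, order = 3, t = 1) {
  m <- ncol(x)
  w <- lapply(seq_len(m), function(j) fuzzyWeights(x[, j], bins, order))
  r <- diag(m)                            # generalized correlations, diagonal 1
  for (i in seq_len(m - 1)) {
    for (j in (i + 1):m) {
      mi <- max(mutualInfo(w[[i]], w[[j]]), 0)     # guard against rounding
      r[i, j] <- r[j, i] <- sqrt(1 - 2^(-2 * mi))  # Linfoot's coefficient (assumed)
    }
  }
  adj <- exp(-(1 - r)^2 / t)              # heat kernel edge weights (assumed form)
  e <- eigen(adj, symmetric = TRUE)       # spectral decomposition, decreasing order
  list(eigenvectors = e$vectors, eigenvalues = e$values)
}

set.seed(1)
x <- matrix(rexp(200 * 12), nrow = 200)   # toy data: 200 features, 12 samples
res <- embedSketch(x)
```

The returned list mirrors the components listed under Value; a
low-dimensional representation would keep only the leading eigenvectors.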
\cr \cr
To account for systematic bias in the expression data
(e.g., cell cycle effects), a design matrix can be
provided for the learning process. It should list the factors to be
blocked and their values for each sample, with one row per sample. We
suggest constructing the design matrix with \code{model.matrix}.
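\cr \cr
As an illustration, a design matrix for a hypothetical per-sample cell cycle
annotation can be built with \code{model.matrix}, which adds an intercept
column by default. How the method uses the design internally is not
described here; the sketch below assumes one common approach, regressing the
blocked factors out of every feature with a per-feature linear model and
keeping the residuals. The objects \code{phase}, \code{x}, and
\code{xBlocked} are hypothetical.

```r
# Hypothetical cell cycle phase of 8 samples (columns of x, in order)
phase <- factor(c("G1", "G1", "S", "S", "G2M", "G2M", "G1", "S"))
design <- model.matrix(~ phase)        # 8 x 3: intercept + 2 phase contrasts

# Toy expression matrix: 50 features (rows) x 8 samples (columns)
set.seed(2)
x <- matrix(rnorm(50 * 8), nrow = 50)

# Assumed blocking strategy: fit every feature on the design and keep the
# residuals, which are orthogonal to the blocked factors
fit <- lm.fit(design, t(x))            # samples x features response matrix
xBlocked <- t(fit$residuals)           # back to features x samples

# With CellTrails loaded, the design would be passed as
# res <- embedSamples(x, design = design)
```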
\cr \cr
\emph{Diagnostic messages}
\cr \cr
The method throws an error if the expression matrix contains samples
with zero entropy (e.g., samples that contain only non-detects, i.e.,
all of whose expression values are zero).
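\cr \cr
Such samples can be removed before calling \code{embedSamples}. The base R
sketch below flags samples whose expression vector is constant (for example,
all zeros) as a simple proxy for zero entropy; the helper name
\code{zeroEntropySamples} is hypothetical, and the check used internally by
the method may differ.

```r
# Flag samples (columns) whose values are all identical, e.g. all non-detects;
# under a crisp binning such a vector falls into a single bin (zero entropy).
zeroEntropySamples <- function(x) {
  apply(x, 2, function(v) all(v == v[1]))
}

# Toy matrix: features in rows, samples in columns; sample "b" is all zeros
x <- cbind(a = c(0, 1.5, 3.2), b = c(0, 0, 0), c = c(2, 0, 1))
flagged <- zeroEntropySamples(x)
x <- x[, !flagged, drop = FALSE]       # keep informative samples only
```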
}
\examples{
# Example data
data(exSCE)

# Embed samples
res <- embedSamples(exSCE)
}
\references{
Daub, C.O., Steuer, R., Selbig, J., and Kloska, S. (2004).
Estimating mutual information using B-spline functions -- an improved
similarity measure for analysing gene expression data.
BMC Bioinformatics 5, 118.

Belkin, M., and Niyogi, P. (2003). Laplacian eigenmaps for
dimensionality reduction and data representation. Neural Computation 15,
1373-1396.

Sussman, D.L., Tang, M., Fishkind, D.E., and Priebe, C.E.
(2012). A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel
Graphs. J Am Stat Assoc 107, 1119-1128.
}
\seealso{
\code{SingleCellExperiment}, \code{trajectoryFeatureNames},
\code{model.matrix}
}
\author{
Daniel C. Ellwanger
}
