% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/spca.R
\name{spca}
\alias{spca}
\title{Sparse Principal Components Analysis}
\usage{
spca(
  X,
  ncomp = 2,
  center = TRUE,
  scale = TRUE,
  keepX = rep(ncol(X), ncomp),
  max.iter = 500,
  tol = 1e-06,
  logratio = c("none", "CLR"),
  multilevel = NULL,
  verbose.call = FALSE
)
}
\arguments{
\item{X}{a numeric matrix (or data frame) which provides the data for the sparse
principal components analysis. It should not contain missing values.}

\item{ncomp}{Integer. If the data are complete, \code{ncomp} is the number of
components (and associated eigenvalues) to display from the \code{pcasvd}
algorithm; if the data contain missing values, \code{ncomp} is the number of
components used to reconstitute the data with the NIPALS algorithm. If
\code{NULL}, the function sets \code{ncomp = min(nrow(X), ncol(X))}.}

\item{center}{(Default=TRUE) Logical, whether the variables should be shifted
to be zero centered. Only set to FALSE if the data have already been centered.
Alternatively, a vector of length equal to the number of columns of \code{X}
can be supplied. The value is passed to \code{\link{scale}}. If the data
contain missing values, columns should be centered for reliable results.}

\item{scale}{(Default=TRUE) Logical indicating whether the variables should be
scaled to have unit variance before the analysis takes place.}

\item{keepX}{numeric vector of length \code{ncomp}, the number of variables to keep
in loading vectors. By default all variables are kept in the model. See
details.}

\item{max.iter}{Integer, the maximum number of iterations in the NIPALS
algorithm.}

\item{tol}{Positive real, the tolerance used in the NIPALS algorithm.}

\item{logratio}{One of \code{'none'} or \code{'CLR'}. Specifies the log ratio
transformation used to deal with compositional values that may arise from
specific normalisation in sequencing data. Defaults to \code{'none'}.}

\item{multilevel}{sample information for multilevel decomposition for
repeated measurements.}

\item{verbose.call}{Logical (Default=FALSE). If set to TRUE, the \code{$call}
component of the returned object will contain the values of all
parameters. Note that this may considerably increase memory usage.}
}
\value{
\code{spca} returns a list with class \code{"spca"} containing the
following components:
\describe{
\item{call}{if \code{verbose.call = FALSE}, only the function call is returned.
If \code{verbose.call = TRUE}, all input values are accessible via
this component.}
\item{ncomp}{the number of components to keep in the
calculation.}
\item{prop_expl_var}{the adjusted percentage of variance
explained for each component.}
\item{cum.var}{the adjusted cumulative percentage of variance
explained.}
\item{keepX}{the number of variables kept in each loading
vector.}
\item{iter}{the number of iterations needed to reach convergence
for each component.}
\item{rotation}{the matrix containing the sparse
loading vectors.}
\item{x}{the matrix containing the principal components.}
}
}
\description{
Performs a sparse principal component analysis for variable selection using
singular value decomposition and lasso penalisation on the loading vectors.
}
\details{
\code{scale = TRUE} is highly recommended, as it helps to obtain orthogonal
sparse loading vectors.

\code{keepX} is the number of variables to select in each loading vector,
i.e. the number of variables with a non-zero coefficient in each loading
vector.

Note that data can contain missing values only when \code{logratio = 'none'}
is used. In this case, \code{center=TRUE} should be used to center the data
in order to effectively ignore the missing values. This is the default
behaviour in \code{spca}.

According to Filzmoser et al., an ILR log ratio transformation is more
appropriate for PCA with compositional data. Both CLR and ILR are valid
transformations, but only CLR is available through the \code{logratio}
argument of \code{spca}.

The log ratio transformation and the multilevel decomposition are performed
sequentially as internal pre-processing steps, through
\code{\link{logratio.transfo}} and \code{\link{withinVariation}} respectively.

A log ratio transformation can only be applied if the data do not contain
any zero values (for count data, we therefore advise normalising the raw data
with an offset of 1). For the ILR transformation, an additional offset might
be needed.

The principal components are not guaranteed to be orthogonal in sPCA.
We adopt the approach of Shen and Huang (2008, Section 2.3) to estimate
the explained variance when the sparse loading vectors (and principal
components) are not orthogonal: the data are projected onto the space
spanned by the first \eqn{k} loading vectors, and the variance explained
is then adjusted for potential correlation between PCs.
Note that in practice, the loading vectors tend to be orthogonal if the
data are centered and scaled in sPCA.
}
\examples{
data(liver.toxicity)
spca.rat <- spca(liver.toxicity$gene, ncomp = 3, keepX = rep(50, 3))
spca.rat
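
## A minimal sketch of how a keepX constraint can be imposed on a single
## loading vector. This is an assumption for illustration, not the spca()
## source: soft-threshold at the (keepX + 1)-th largest absolute value so that
## keepX entries remain non-zero (when there are no ties), then rescale the
## vector to unit length. 'soft_keep' is a hypothetical helper.
soft_keep <- function(v, keepX) {
  if (keepX < length(v)) {
    thr <- sort(abs(v), decreasing = TRUE)[keepX + 1]
    v <- sign(v) * pmax(abs(v) - thr, 0)
  }
  v / sqrt(sum(v^2))
}
v <- c(0.9, -0.1, 0.4, -0.7, 0.05)
w <- soft_keep(v, keepX = 2)
which(w != 0)  # positions 1 and 4, the two largest absolute values of v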

## variable representation
plotVar(spca.rat, cex = 1)
\dontrun{
plotVar(spca.rat, style = "3d")
}

## samples representation
plotIndiv(spca.rat, ind.names = liver.toxicity$treatment[, 3],
          group = as.numeric(liver.toxicity$treatment[, 3]))

\dontrun{
plotIndiv(spca.rat, cex = 0.01,
          col = as.numeric(liver.toxicity$treatment[, 3]), style = "3d")
}
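
## A minimal sketch, in base R, of the cumulative percentage of explained
## variance (CPEV) of Shen and Huang (2008, Section 2.3). The data are
## projected onto the span of the first k loading vectors, so variance shared
## by correlated, non-orthogonal components is not counted twice. This
## illustrates the idea only and is not claimed to reproduce the values
## stored by spca(). 'cpev' is a hypothetical helper. 'X' is assumed to be
## centered (and scaled), and 'V' holds one loading vector per column.
cpev <- function(X, V) {
  total <- sum(X^2)
  sapply(seq_len(ncol(V)), function(k) {
    Vk <- V[, seq_len(k), drop = FALSE]
    ## project each sample (a column of t(X)) onto span(Vk)
    proj <- qr.fitted(qr(Vk), t(X))
    sum(proj^2) / total
  })
}
set.seed(1)
X <- scale(matrix(rnorm(20 * 6), nrow = 20, ncol = 6))
sv <- svd(X)
pev <- cpev(X, sv$v[, 1:3])  # ordinary (orthogonal) loadings as a check
pev                          # matches cumsum(sv$d^2)[1:3] / sum(sv$d^2)
diff(c(0, pev))              # adjusted variance explained by each component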

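## A minimal sketch of the centered log ratio (CLR) transformation, written
## in base R to show what logratio = 'CLR' computes conceptually (the package
## itself uses logratio.transfo()). Each row (sample) is log-transformed and
## centered by its own mean log value, so the data must be strictly positive.
## A zero count would give -Inf, hence the advice to add an offset such as 1.
## 'clr_sketch' is a hypothetical helper.
clr_sketch <- function(X) {
  stopifnot(all(X > 0))
  logX <- log(X)
  logX - rowMeans(logX)
}
counts <- matrix(c(10, 0, 5,
                    3, 7, 0), nrow = 2, byrow = TRUE)
prop <- (counts + 1) / rowSums(counts + 1)  # offset of 1, then proportions
Z <- clr_sketch(prop)
rowSums(Z)  # each row sums to zero (up to rounding)
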
## example with multilevel decomposition and CLR log ratio transformation
data("diverse.16S")
spca.res <- spca(X = diverse.16S$data.TSS, ncomp = 5,
                 logratio = 'CLR', multilevel = diverse.16S$sample)
plot(spca.res)
plotIndiv(spca.res, ind.names = FALSE, group = diverse.16S$bodysite,
          title = '16S diverse data', legend = TRUE)
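
## A minimal sketch of the idea behind a one-factor multilevel decomposition.
## With repeated measurements, the within-subject variation is what remains
## after subtracting each subject's mean profile from that subject's samples.
## This illustrates the concept only. It is not the withinVariation() source
## and does not handle two-factor designs. 'within_sketch' is a hypothetical
## helper.
within_sketch <- function(X, subject) {
  ## ave() returns the per-subject mean of each column, repeated for each row
  X - apply(X, 2, ave, subject)
}
X <- matrix(c( 1,  2,
               3,  4,
              10, 20,
              14, 22), ncol = 2, byrow = TRUE)
subject <- c("a", "a", "b", "b")
Xw <- within_sketch(X, subject)
Xw  # rows of each subject now average to zero within every column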
}
\references{
Shen, H. and Huang, J. Z. (2008). Sparse principal component
analysis via regularized low rank matrix approximation. \emph{Journal of
Multivariate Analysis} \bold{99}, 1015-1034.
}
\seealso{
\code{\link{pca}} and \url{http://www.mixOmics.org} for more details.
}
\author{
Kim-Anh Lê Cao, Fangzhou Yao, Leigh Coonan, Ignacio Gonzalez, Al J Abadi
}
\keyword{algebra}
