% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Feature_Selection.R
\name{DaMiR.FSelect}
\alias{DaMiR.FSelect}
\title{Feature selection for classification}
\usage{
DaMiR.FSelect(
  data,
  df,
  th.corr = 0.6,
  type = c("spearman", "pearson"),
  th.VIP = 3,
  nPlsIter = 1
)
}
\arguments{
\item{data}{A transposed data frame or matrix of normalized expression
data. Rows and columns should be, respectively, observations and
features}

\item{df}{A data frame with known variables; it must include at least
one column labeled 'class'}

\item{th.corr}{Minimum threshold of correlation between class and
PCs; default is 0.6. Note: if \code{df$class} has more than two levels,
this option is disabled and the number of PCs is set to 3.}

\item{type}{Type of correlation metric; default is "spearman"}

\item{th.VIP}{Threshold for \code{bve_pls} function, to remove
non-important variables; default is 3}

\item{nPlsIter}{Number of times that \code{\link{bve_pls}} is run.
Each iteration produces a set of selected features; the sets are
usually similar to each other but not exactly the same. When
\code{nPlsIter} is > 1, the intersection of all sets of selected
features is taken, so that only the most robust features are
selected. Default is 1}
}
\value{
A list containing:
\itemize{
  \item An expression matrix with only informative features.
  \item A data frame with class and optional variables information.
}
}
\description{
This function identifies the class-correlated principal
components (PCs), which are then used to implement a backward
variable elimination procedure for the removal of non-informative
features.
}
\details{
The function aims to reduce the number of features in order to obtain
the most informative variables for classification purposes. First,
PCs obtained by principal component analysis (PCA) are correlated
with "class". The correlation threshold is defined by the user
in the \code{th.corr} argument. The higher the correlation threshold,
the lower the number of PCs returned. Importantly, if \code{df$class}
has more than two levels, the number of PCs is automatically set to 3.
In a binary experimental setting, users should take care to set the
\code{th.corr} argument appropriately, because it also affects the
total number of selected features, which ultimately depends on the
number of PCs. The \code{\link{bve_pls}} function of the
\code{plsVarSel} package is then applied.
This function exploits a backward variable elimination procedure
coupled to a partial least squares approach to remove those variables
which are less informative with respect to class. The returned
vector of variables is further reduced by the subsequent
\code{\link{DaMiR.FReduct}} function in order to obtain a subset of
non-correlated putative predictors.
}
\examples{
# use example data:
data(data_norm)
data(df)
# extract expression data from SummarizedExperiment object
# and transpose the matrix:
t_data <- t(assay(data_norm))
t_data <- t_data[,seq_len(100)]
# select class-related features
data_reduced <- DaMiR.FSelect(t_data, df,
th.corr = 0.7, type = "spearman", th.VIP = 1)
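# A hedged sketch, not part of the original example: with nPlsIter > 1,
# bve_pls is run several times and only the features selected in every
# run are kept (see the nPlsIter argument). The value 2 is an arbitrary
# illustration, and the object name data_reduced_iter is hypothetical.
data_reduced_iter <- DaMiR.FSelect(t_data, df,
th.corr = 0.7, type = "spearman", th.VIP = 1, nPlsIter = 2)
# inspect the top-level structure of the returned list
str(data_reduced_iter, max.level = 1)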

}
\references{
Tahir Mehmood, Kristian Hovde Liland, Lars Snipen and
Solve Saebo (2012).
A review of variable selection methods in Partial Least Squares
Regression. Chemometrics and Intelligent Laboratory Systems
118, pp. 62-69.
}
\seealso{
\itemize{
  \item \code{\link{bve_pls}}
  \item \code{\link{DaMiR.FReduct}}
}
}
\author{
Mattia Chiesa, Luca Piacentini
}
