Here, we describe the workflow to run variance-sensitive clustering on data stored in a SummarizedExperiment, QFeatures or MultiAssayExperiment object. This vignette is distributed under a CC BY-SA license.
vsclust 1.9.3
For a more detailed explanation of the VSClust function and the workflow, please take a look at the vignette for running the VSClust workflow.
Here, we present an example script to integrate the clustering with data objects from Bioconductor, such as QFeatures, SummarizedExperiment and MultiAssayExperiment.
Use the common Bioconductor commands for installation:
# uncomment in case you have not installed vsclust yet
#if (!require("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("vsclust")
The full functionality can be obtained by additionally installing and loading the packages yaml, shiny, clusterProfiler, and matrixStats.
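As a minimal sketch, the following lines check which of the optional packages named above are already installed. The variable names `optional_pkgs`, `is_installed` and `missing_pkgs` are illustrative and not part of vsclust; the check only reports missing packages and does not install anything.

```r
# Optional packages named in the text above
optional_pkgs <- c("yaml", "shiny", "clusterProfiler", "matrixStats")

# requireNamespace() returns TRUE/FALSE and does not attach the package
is_installed <- vapply(optional_pkgs, requireNamespace, logical(1),
                       quietly = TRUE)

# Report the optional packages that could not be found
missing_pkgs <- optional_pkgs[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Optional packages not installed: ",
          paste(missing_pkgs, collapse = ", "))
}
```

Missing packages can then be installed with BiocManager::install() in the same way as vsclust itself.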
Here, we define the different parameters for the data set RNASeq2GeneNorm from the miniACC object.
The number of replicates and experimental conditions is retrieved automatically once the metadata used for the grouping is specified.
#### Input parameters, only read when no parameter file was provided #####
## All principal parameters for running VSClust can be defined as in the
## shiny app at computproteomics.bmb.sdu.dk/Apps/VSClust
# name of study
Experiment <- "miniACC"
# Paired or unpaired statistical tests when carrying out LIMMA for
# statistical testing
isPaired <- FALSE
# Number of threads to accelerate the calculation (use 1 if in doubt)
cores <- 1
# If 0 (default), then automatically estimate the cluster number for the
# vsclust run from the Minimum Centroid Distance
PreSetNumClustVSClust <- 0
# If 0 (default), then automatically estimate the cluster number for the
# original fuzzy c-means from the Minimum Centroid Distance
PreSetNumClustStand <- 0
# max. number of clusters when estimating the number of clusters.
# Higher numbers can drastically extend the computation time.
maxClust <- 10
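To illustrate how the number of conditions and replicates can follow from grouping metadata, here is a small sketch in base R. The vector `grouping` is a hypothetical stand-in for a sample annotation such as OncoSign, and the names `num_conditions` and `max_replicates` are chosen for illustration only; how vsclust derives these values internally is not shown here.

```r
# Hypothetical grouping labels, one per sample (not the real miniACC metadata)
grouping <- c("A", "A", "A", "B", "B", "B", "C", "C", "C", NA)

# Samples without a group label cannot be assigned to a condition
grouping <- grouping[!is.na(grouping)]

# Count the samples in each group
samples_per_group <- table(grouping)

# Number of experimental conditions = number of distinct groups
num_conditions <- length(samples_per_group)

# Largest number of replicates found in any condition
max_replicates <- max(samples_per_group)
```

With real data, the group sizes are usually unequal, so inspecting `samples_per_group` before clustering is a useful sanity check.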
First, we load the data, log-transform it and normalize it to the median. Statistical testing will then be applied to the resulting object. After estimating the standard deviations, the matrix consists of the averaged quantitative feature values and a last column containing the standard deviations of the features.
We will separate the samples according to their OncoSign.
library(MultiAssayExperiment)
data(miniACC, package="MultiAssayExperiment")
# log-transformation and replacement of non-finite values (e.g. -Inf) by NA
logminiACC <- log2(assays(miniACC)$RNASeq2GeneNorm)
logminiACC[!is.finite(logminiACC)] <- NA
# normalize to median
logminiACC <- t(t(logminiACC) - apply(logminiACC, 2, median, na.rm=TRUE))
miniACC2 <- c(miniACC, log2rnaseq = logminiACC, mapFrom=1L)
## Warning: Assuming column order in the data provided
## matches the order in 'mapFrom' experiment(s) colnames
boxplot(logminiACC)
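The effect of the log-transformation and median normalization above can be checked on a small example. The following sketch uses a made-up matrix `toy` (hypothetical values, not taken from miniACC) and base R only; `sweep()` is used here as an equivalent to the double transposition in the code above.

```r
# Made-up intensities: 3 features (rows) in 2 samples (columns)
toy <- matrix(c(1, 2, 4,
                8, 0, 16),
              nrow = 3,
              dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

# log2 of zero gives -Inf, which is replaced by NA
log_toy <- log2(toy)
log_toy[!is.finite(log_toy)] <- NA

# Subtract each column's median, ignoring missing values
col_medians <- apply(log_toy, 2, median, na.rm = TRUE)
norm_toy <- sweep(log_toy, 2, col_medians)

# After normalization, every column median is zero
apply(norm_toy, 2, median, na.rm = TRUE)
```

The zero entry in sample s2 stays NA after normalization, so it does not distort the median of that sample.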