This vignette provides an introduction to the BANDLE package (Crook et al. 2022). It follows a short theoretical example of how to perform differential localisation analysis of quantitative proteomics data using the BANDLE model. It also explains the input parameters and gives general recommendations for setting them. For a more comprehensive workflow that follows a real-life use case, please see the second vignette in this package.
bandle 1.11.0
Bayesian ANalysis of Differential Localisation Experiments (BANDLE) is an integrative semi-supervised functional mixture model developed by Crook et al. (2022). It computes the probability that a protein is differentially localised between two conditions.
In this vignette we walk users through how to install and use the R (R Development Core Team 2011) Bioconductor (Gentleman et al. 2004) bandle package. We do this by simulating a well-defined differential localisation experiment from spatial proteomics data in the pRolocdata package (Gatto et al. 2014).
The BANDLE method performs posterior Bayesian computation using Markov chain Monte Carlo (MCMC), so uncertainty estimates are available (Gilks, Richardson, and Spiegelhalter 1995). It is inspired by the T-augmented Gaussian mixture model (TAGM) of Crook et al. (2018), which was developed to allow interrogation of multiply localised proteins through uncertainty estimation. Throughout this vignette we use the term differentially localised to refer to proteins that are assigned to different sub-cellular localisations in two conditions.
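To show how MCMC yields uncertainty estimates, the minimal base-R sketch below applies the idea to a deliberately simple problem. It is not BANDLE's model. It is a random-walk Metropolis-Hastings sampler for a single proportion `theta`, using hypothetical data (7 successes in 20 trials) and a uniform prior. The retained draws approximate the posterior, so their mean and quantiles give a point estimate and a 95% credible interval. In this toy case the exact posterior is Beta(8, 14), which makes it possible to check the draws.

```r
## Toy random-walk Metropolis-Hastings sampler (illustration only, not BANDLE):
## posterior of a proportion theta given k successes in n trials, Beta(1, 1) prior.
set.seed(42)
k <- 7; n <- 20                                   # hypothetical data

log_post <- function(theta) {
  if (theta <= 0 || theta >= 1) return(-Inf)     # outside the support
  dbinom(k, n, theta, log = TRUE)                 # flat prior only adds a constant
}

n_iter <- 20000
draws <- numeric(n_iter)
theta <- 0.5                                      # starting value
for (i in seq_len(n_iter)) {
  proposal <- theta + rnorm(1, sd = 0.1)          # symmetric random-walk proposal
  if (log(runif(1)) < log_post(proposal) - log_post(theta)) {
    theta <- proposal                             # accept the move
  }
  draws[i] <- theta                               # record the current state
}

kept <- draws[-seq_len(2000)]                     # discard burn-in
post_mean <- mean(kept)                           # posterior mean estimate
cred_int <- quantile(kept, c(0.025, 0.975))       # 95% equal-tailed credible interval
```

The spread of `kept` is the uncertainty estimate. BANDLE's sampler is far more involved, but its per-protein probabilities and their uncertainty are likewise summaries of posterior draws.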
The output of BANDLE provides users with:
Protein subcellular localisation predictions. For each protein the full probability distribution over all possible organelles/complexes is computed. Through uncertainty quantification users can gain insight into proteins that may localise to more than one location.
Differential localisation predictions. The full probability of a protein being differentially localised between two conditions is computed. Uncertainty estimates are available for the differential localisation probability to aid filtering of candidate movers for validation.
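The toy base-R sketch below illustrates how both outputs can be read off MCMC allocation draws. It uses made-up draws for a single protein, with hypothetical organelle names. For each condition, the share of iterations assigned to each organelle estimates the distribution over locations, and Shannon entropy summarises how spread out that distribution is. The share of iterations in which the two allocations differ estimates the differential localisation probability, and bootstrapping over iterations gives a rough interval for it. For simplicity the two conditions are sampled independently here. This is an assumption of the sketch, not how bandle generates or summarises its draws.

```r
## Made-up MCMC allocations for ONE protein in two conditions (illustration only)
set.seed(1)
organelles <- c("ER", "Golgi", "PM", "mito")      # hypothetical compartments
n_iter <- 5000
alloc_cond1 <- sample(organelles, n_iter, replace = TRUE,
                      prob = c(0.85, 0.10, 0.03, 0.02))
alloc_cond2 <- sample(organelles, n_iter, replace = TRUE,
                      prob = c(0.20, 0.70, 0.05, 0.05))

## Estimated probability distribution over organelles in each condition
p_cond1 <- table(factor(alloc_cond1, levels = organelles)) / n_iter
p_cond2 <- table(factor(alloc_cond2, levels = organelles)) / n_iter

## Shannon entropy: higher values mean probability is spread over more locations
entropy <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
entropy1 <- entropy(p_cond1)
entropy2 <- entropy(p_cond2)

## Differential localisation probability: share of iterations whose allocations differ
p_diff <- mean(alloc_cond1 != alloc_cond2)

## Bootstrap over iterations for a rough 95% interval on p_diff
boot <- replicate(500, {
  idx <- sample.int(n_iter, n_iter, replace = TRUE)
  mean(alloc_cond1[idx] != alloc_cond2[idx])
})
p_diff_ci <- quantile(boot, c(0.025, 0.975))
```

In this toy case the protein sits mostly in the "ER" in condition 1 and mostly in the "Golgi" in condition 2, so `p_diff` is high (about 0.76).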
The package can be installed with the BiocManager package:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("bandle")
and then loaded,
library("bandle")
For visualisation we also load the packages,
library("pheatmap")
library("viridis")
library("dplyr")
library("ggplot2")
library("gridExtra")
In this vignette and in Crook et al. (2022), the main data we use to study differential protein sub-cellular localisation come from high-throughput mass spectrometry-based experiments. These experiments traditionally yield a matrix of measurements with, for example, PSMs, peptides or proteins along the rows, and samples/channels/fractions along the columns. The bandle package uses the MSnSet class implemented in the Bioconductor MSnbase package, so users must import and store their data as an MSnSet instance. For more details on how to create an MSnSet, see the relevant vignettes in pRoloc. The pRolocdata experiment data package is a good place to look for test data. It contains tens of quantitative proteomics experiments, stored as MSnSets.
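The small base-R sketch below illustrates this layout. The expression matrix has proteins as rows and fractions as columns. A feature annotation table carries per-protein information such as a `markers` column, and a sample annotation table describes the fractions. All values and names are made up. The row-sum normalisation is an assumption that reflects common LOPIT practice, not a bandle requirement. The final step assumes the MSnbase convenience constructor `MSnSet(exprs, fData, pData)` and runs only if MSnbase is installed.

```r
## Toy spatial proteomics data: proteins (rows) x fractions (columns), made-up values
set.seed(2)
raw <- matrix(rexp(5 * 4), nrow = 5,
              dimnames = list(paste0("prot", 1:5), paste0("F", 1:4)))

## Assumption: scale each profile so it sums to 1 across fractions
profiles <- raw / rowSums(raw)

## Feature (protein) annotation, e.g. known marker localisations
fdat <- data.frame(markers = c("ER", "ER", "PM", "unknown", "mito"),
                   row.names = rownames(profiles))

## Sample (fraction) annotation
pdat <- data.frame(Fractions = colnames(profiles),
                   row.names = colnames(profiles))

## Optionally combine everything into an MSnSet (requires Bioconductor's MSnbase)
if (requireNamespace("MSnbase", quietly = TRUE)) {
  msn <- MSnbase::MSnSet(exprs = profiles, fData = fdat, pData = pdat)
  print(msn)
}
```

The row names of the feature table must match the matrix row names, and the row names of the sample table must match its column names. This is the same correspondence the MSnSet printout shows as featureNames and sampleNames.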
To get started with the basics of using bandle, we begin by generating a simple example dataset that simulates a differential localisation experiment (please see the second vignette in this package for a full real-life biological use case). In this example data, the key elements are replicates and a perturbation of interest. The bandle package provides a function, sim_dynamic, to simulate such an experiment.
In the code chunk below we begin by loading the pRolocdata package to obtain a spatial proteomics dataset. This dataset is the basis of our simulation, which uses bootstrapping to generate new datasets. The dataset we have chosen to load is from 2009 (tan2009r1). The data come from an early LOPIT experiment performed on Drosophila embryos by Tan et al. (2009), which aimed to apply LOPIT to an organism with heterogeneous cell types. This experiment used four isobaric tags (iTRAQ channels 114 to 117) across four distinct fractions. It therefore yielded four measurements per protein profile, which are stored as the four samples (columns) of the MSnSet. We visualise the data using principal component analysis (PCA).
library("pRolocdata")
data("tan2009r1")
## Let's set the stock colours of the classes to plot to be transparent
setStockcol(NULL)
setStockcol(paste0(getStockcol(), "90"))
## Plot the data using plot2D from pRoloc
plot2D(tan2009r1,
main = "An example spatial proteomics dataset",
grid = FALSE)
addLegend(tan2009r1, where = "topleft", cex = 0.7, ncol = 2)
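By default, plot2D projects the protein profiles onto their first two principal components. The base-R sketch below shows the same idea with `prcomp` on made-up profiles for three hypothetical compartments. It also reports the proportion of variance explained by each component. It is not plot2D's code, and plot2D's own preprocessing (for example, scaling) and styling may differ.

```r
## Made-up profiles: 30 proteins in each of 3 hypothetical compartments, 4 fractions
set.seed(3)
centres <- rbind(c(0.6, 0.2, 0.1, 0.1),
                 c(0.1, 0.6, 0.2, 0.1),
                 c(0.1, 0.1, 0.2, 0.6))
labels <- rep(c("A", "B", "C"), each = 30)
X <- centres[rep(1:3, each = 30), ] + matrix(rnorm(90 * 4, sd = 0.03), ncol = 4)

## PCA on centred (unscaled) profiles
pca <- prcomp(X, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]                             # coordinates on PC1 and PC2
var_explained <- pca$sdev^2 / sum(pca$sdev^2)      # proportion of variance per PC

## Scatter plot coloured by compartment, with variance explained on the axes
plot(scores, col = as.integer(factor(labels)), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
```

With only four fractions there are at most four components. In this toy example the first two capture nearly all of the between-compartment structure.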
The following code chunk simulates a differential localisation experiment. It generates numRep/2 replicates of each of a control and a treatment condition. We also simulate relocalisations for numDyn proteins.
set.seed(1)
tansim <- sim_dynamic(object = tan2009r1,
numRep = 6L,
numDyn = 100L)
## [1] "markers"
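The toy base-R function below loosely mirrors the general idea of such a simulation. It is not the implementation of sim_dynamic, and the function `toy_sim` and all of its steps are hypothetical. It creates each replicate by bootstrapping (resampling with replacement) the residuals of proteins from their class centres within each class. In the second half of the replicates, which represent the treatment condition, it then moves `numDyn` randomly chosen proteins to the centre of a different class.

```r
## Hypothetical toy simulator (NOT bandle's sim_dynamic); steps are illustrative only.
## X: proteins x fractions matrix; classes: named character vector, one class per row.
toy_sim <- function(X, classes, numRep = 6L, numDyn = 10L) {
  ## class centres (class x fraction); split() keeps class names as row names
  centres <- do.call(rbind, lapply(split(as.data.frame(X), classes), colMeans))
  ## residual of each protein from its own class centre
  resid <- X - centres[classes, , drop = FALSE]
  ## choose proteins to relocalise and a different target class for each one
  moved <- sample(rownames(X), numDyn)
  new_class <- vapply(moved, function(p)
    sample(setdiff(rownames(centres), classes[[p]]), 1), character(1))
  reps <- lapply(seq_len(numRep), function(r) {
    ## bootstrap: resample residuals with replacement within each class
    boot_res <- resid
    for (cl in unique(classes)) {
      idx <- which(classes == cl)
      pick <- idx[sample.int(length(idx), length(idx), replace = TRUE)]
      boot_res[idx, ] <- resid[pick, , drop = FALSE]
    }
    cls <- classes
    ## second half of the replicates = "treatment": moved proteins change class
    if (r > numRep / 2) cls[moved] <- new_class
    boot_res + centres[cls, , drop = FALSE]
  })
  list(replicates = reps, moved = moved, new_class = new_class)
}

## Usage on made-up data: 60 proteins in 3 classes, 4 fractions
set.seed(4)
classes <- setNames(rep(c("A", "B", "C"), each = 20), paste0("p", 1:60))
X <- matrix(runif(60 * 4), ncol = 4,
            dimnames = list(names(classes), paste0("F", 1:4)))
sim <- toy_sim(X, classes, numRep = 6L, numDyn = 5L)
```

Because the proteins that move are recorded in `sim$moved`, they serve as known positives when checking how well a method recovers relocalisations.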
The 6 simulated experiments are stored as a list in tansim$lopitrep. Each one is an MSnSet instance (the standard data container for proteomics experimental data). The first 3 are the simulated control experiments (see tansim$lopitrep[1:3]), and the following 3 in the list are the simulated treatment-condition experiments (see tansim$lopitrep[4:6]).
# To access the first replicate
tansim$lopitrep[[1]]
## MSnSet (storageMode: lockedEnvironment)
## assayData: 888 features, 4 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: X114 X115 X116 X117
## varLabels: Fractions
## varMetadata: labelDescription
## featureData
## featureNames: P20353 P53501 ... P07909 (888 total)
## fvarLabels: FBgn Protein.ID ... knn.scores (18 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## pubMedIds: 19317464
## Annotation:
## - - - Processing information - - -
## Added markers from 'mrk' marker vector. Thu Jul 16 22:53:44 2015
## Performed knn prediction (k=10) Tue Nov 5 16:11:29 2024
## MSnbase version: 1.17.12
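Since the control and treatment replicates occupy fixed positions in the list, they can be separated by indexing. The base-R sketch below uses a mock list of matrices (all made up) in place of tansim$lopitrep. It splits the list into the two conditions and runs a quick sanity check that every replicate contains the same proteins in the same order. With the real MSnSet objects, Biobase's featureNames() would take the place of rownames() here.

```r
## Mock stand-in for tansim$lopitrep: 6 made-up matrices sharing the same proteins
set.seed(5)
prots <- paste0("p", 1:10)
lopitrep <- lapply(1:6, function(i)
  matrix(runif(40), ncol = 4, dimnames = list(prots, paste0("X", 114:117))))

## Replicates 1-3: control (condition A); replicates 4-6: treatment (condition B)
cond1 <- lopitrep[1:3]
cond2 <- lopitrep[4:6]

## Sanity check: identical protein identifiers and order in every replicate
same_rows <- all(vapply(lopitrep, function(m)
  identical(rownames(m), rownames(lopitrep[[1]])), logical(1)))
```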
We can plot them using the plot2D function from pRoloc.
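## Panel titles: replicates 1-3 are condition A (control), 4-6 condition B (treatment)
plot_title <- c(paste0("Replicate ", seq(3), " condition", " A"),
                paste0("Replicate ", seq(3), " condition", " B"))
## Arrange the six PCA plots in a 2 x 3 grid
par(mfrow = c(2, 3))
out <- lapply(seq(tansim$lopitrep), function(z)
    plot2D(tansim$lopitrep[[z]], grid = FALSE, main = plot_title[z]))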