1 Introduction

Bayesian ANalysis of Differential Localisation Experiments (BANDLE) is an integrative semi-supervised functional mixture model, developed by Crook et al (Crook et al. 2022), to obtain the probability of a protein being differentially localised between two conditions.

In this vignette we walk users through how to install and use the R (R Development Core Team 2011) Bioconductor (Gentleman et al. 2004) bandlepackage by simulating a well-defined differential localisation experiment from spatial proteomics data from the pRolocdata package (Gatto et al. 2014).

The BANDLE method uses posterior Bayesian computations performed using Markov-chain Monte-Carlo (MCMC) and thus uncertainty estimates are available (Gilks, Richardson, and Spiegelhalter 1995). It is inspired by the T-augmented Gaussuan mixture model (TAGM) by Crook et al. 2018 (Crook et al. 2018) which was developed to allow interrogation of multiply localised proteins through uncertainty estimation. Throughout this vignette we use the term differentially localised to pertain to proteins which are assigned to different sub-cellular localisations between two conditions.

The output of BANDLE provides users with:

  1. Protein subcellular localisation predictions. For each protein the full probability distribution over all possible organelles/complexes is computed. Through uncertainty quantification users can gain insight into proteins that may localise to more than one location.

  2. Differential localisation predictions. The full probability of a protein being differentially localised between two conditions is computed. Uncertainty estimates are available for the differential localisation probability to aid filtering of candidate movers for validation.

2 Installation

The package can be installed with the BiocManager package:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("bandle")

and then loaded,

library("bandle")

For visualisation we also load the packages,

library("pheatmap")
library("viridis")
library("dplyr")
library("ggplot2")
library("gridExtra")

3 The data

In this vignette and Crook et al. (2022), the main data source that we use to study differential protein sub-cellular localisation are data from high-throughput mass spectrometry-based experiments. The data from these types of experiments traditionally yield a matrix of measurements wherein we have, for example, PSMs, peptides or proteins along the rows, and samples/channels/fractions along the columns. The bandle package uses the MSnSet class as implemented in the Bioconductor MSnbase package and thus requires users to import and store their data as a MSnSet instance. For more details on how to create a MSnSet see the relevant vignettes in pRoloc. The pRolocdata experiment data package is a good starting place to look for test data. This data package contains tens of quantitative proteomics experiments, stored as MSnSets.

3.1 A well-defined theoretical example

To get started with the basics of using bandle we begin by generating a simple example dataset which simulates a differential localisation experiment (please see the second vignette in this package for a full real-life biological use case). In this example data, the key elements are replicates, and a perturbation of interest. There is code within the bandle package to simulate an example experiment.

In the code chunk below we begin by loading the pRolocdata package to obtain a spatial proteomics dataset. This will be the basis of our simulation which will use boostrapping to generate new datasets. The dataset we have chosen to load is a dataset from 2009 (tan2009r1). This is data from a early LOPIT experiment performed on Drosophila embryos by Tan et al. (2009). The aim of this experiment was to apply LOPIT to an organism with heterogeneous cell types. This experiment used four isotopes across four distinct fractions and thus yielded four measurements (features) per protein profile. We visualise the data by using principal components analysis.

library("pRolocdata")
data("tan2009r1")

## Let's set the stock colours of the classes to plot to be transparent
setStockcol(NULL)
setStockcol(paste0(getStockcol(), "90")) 

## Plot the data using plot2D from pRoloc
plot2D(tan2009r1,
       main = "An example spatial proteomics datasets", 
       grid = FALSE)
addLegend(tan2009r1, where = "topleft", cex = 0.7, ncol = 2)

The following code chuck simulates a differential localisation experiment. It will generate numRep/2 of each a control and treatment condition. We will also simulate relocalisations for numDyn proteins.

set.seed(1)
tansim <- sim_dynamic(object = tan2009r1,
                      numRep = 6L,
                      numDyn = 100L)
## [1] "markers"

The list of the 6 simulated experiments are found in tansim$lopitrep. Each one is an MSnSet instance (the standard data container for proteomics experimental data). The first 3 are the simulated control experiments (see tansim$lopitrep[1:3]), and the following 3 in the list are the treatment condition simulated experiments (see tansim$lopitrep[4:6]).

# To access the first replicate
tansim$lopitrep[[1]]
## MSnSet (storageMode: lockedEnvironment)
## assayData: 888 features, 4 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: X114 X115 X116 X117
##   varLabels: Fractions
##   varMetadata: labelDescription
## featureData
##   featureNames: P20353 P53501 ... P07909 (888 total)
##   fvarLabels: FBgn Protein.ID ... knn.scores (18 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
##   pubMedIds: 19317464 
## Annotation:  
## - - - Processing information - - -
## Added markers from  'mrk' marker vector. Thu Jul 16 22:53:44 2015 
## Performed knn prediction (k=10) Tue Nov  5 16:11:29 2024 
##  MSnbase version: 1.17.12

We can plot them using the plot2D function from pRoloc.

plot_title <- c(paste0("Replicate ", seq(3), " condition", " A"), 
               paste0("Replicate ", seq(3), " condition", " B"))

par(mfrow = c(2, 3))
out <- lapply(seq(tansim$lopitrep), function(z) 
    plot2D(tansim$lopitrep[[z]], grid = FALSE, main = plot_title[z]))