MetaboDynamics 0.99.18
This package was developed to facilitate the analysis of longitudinal metabolomics data. Most tools only allow the comparison between two time points or experimental conditions and are using frequentist statistical methods.
Here we want to show a complete workflow to analyze concentration tables with probabilistic models.
MetaboDynamics can be installed from the devel branch of Bioconductor.
if (!require("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
# The following initializes usage of Bioc devel
BiocManager::install(version = "devel")
BiocManager::install("MetaboDynamics")
library(MetaboDynamics)
library(SummarizedExperiment)
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: generics
##
## Attaching package: 'generics'
## The following objects are masked from 'package:base':
##
## as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
## setequal, union
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
## mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
## unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:Biobase':
##
## combine
## The following objects are masked from 'package:GenomicRanges':
##
## intersect, setdiff, union
## The following object is masked from 'package:GenomeInfoDb':
##
## intersect
## The following objects are masked from 'package:IRanges':
##
## collapse, desc, intersect, setdiff, slice, union
## The following objects are masked from 'package:S4Vectors':
##
## first, intersect, rename, setdiff, setequal, union
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, setequal, union
## The following object is masked from 'package:generics':
##
## explain
## The following object is masked from 'package:matrixStats':
##
## count
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:S4Vectors':
##
## expand
We have a simulated data (") set of 98 metabolites with three measurement replicates at four time points (1-4) across 3 experimental conditions (A-B). In the first step in this workflow we estimate the dynamics of every single metabolite at every experimental condition (here: radiation dose).
The simulated data is represented as SummarizedExperiment object where concentration tables of each experimental condition are stored in assays (raw concentrations in “concentrations”, log-transformed transformations in “log_con” and scaled log-transformed concentrations" in “scaled_log”) and metabolite names, KEGG IDs, experimental conditions and clustering solutions per experimental condition are stored in colData and timepoint specifications in rowData.
As metabolomics data is often noisy and we generally have few replicates due to high costs, a robust method is needed for the estimation of mean concentrations at every time point. For this we employ a Bayesian hierarchical model that assumes normal distributions of log-transformed metabolite concentrations. The next plot shows the raw data.
data("longitudinalMetabolomics")
# convert to dataframe
concentrations <- as.data.frame(cbind(
as.data.frame(t(assays(longitudinalMetabolomics)$concentrations)),
as.data.frame(colData(longitudinalMetabolomics))
))
concentrations <- concentrations %>% pivot_longer(
cols = seq_len(4), names_to = "time",
values_to = "concentration"
)
ggplot(concentrations, aes(x = concentration)) +
geom_density() +
theme_bw() +
facet_grid(cols = vars(time), rows = vars(condition)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
ggtitle("raw data", "raw measurements")
The raw data is not distributed normally. So let’s log-transform the values. In the integrated simulated data set this is already done in the column “log_m”.
ggplot(concentrations, aes(x = log(concentration))) +
geom_density() +
theme_bw() +
facet_grid(cols = vars(time), rows = vars(condition)) +
ggtitle("data", "log-transformed values")
The next plot shows the raw dynamics of single metabolites.
ggplot(concentrations) +
geom_line(aes(
x = time, y = log(concentration), col = metabolite,
group = interaction(metabolite, replicate)
)) +
theme_bw() +
xlab("timepoint") +
theme(legend.position = "none") +
facet_grid(rows = vars(condition)) +
ggtitle("raw metabolite dynamics", "color=metabolite")
We define dynamics as deviations at the observed time points from the metabolite’s mean concentration. As the raw concentrations of metabolites can differ by orders of magnitude from each other, and we want to be able to compare dynamics of metabolites with each other, we standardize each metabolite at each radiation dose to a mean of zero and a standard deviation of one. In the simulated data set the scaled measurements are in column “m_scaled”.
data("longitudinalMetabolomics")
# convert to dataframe
concentrations <- as.data.frame(cbind(
as.data.frame(t(assays(longitudinalMetabolomics)$scaled_log)),
as.data.frame(colData(longitudinalMetabolomics))
))
concentrations <- concentrations %>% pivot_longer(
cols = seq_len(4), names_to = "time",
values_to = "concentration"
)
ggplot(concentrations) +
geom_line(aes(
x = time,
y = concentration, col = metabolite,
group = interaction(metabolite, replicate)
)) +
theme_bw() +
xlab("timepoint") +
theme(legend.position = "none") +
facet_grid(rows = vars(condition)) +
ggtitle("standardized dynamics", "color=metabolite")
Now we can finally model the dynamics. This might take of the order of 10 minutes per experimental condition for 98 metabolites.
We employ a Bayesian hierarchical model with con = metabolite concentrations, m = metabolite, c = experimental condition and t = time point ID:
\[\begin{align*} \log(con_{m,c,t})&\sim {\sf normal}(\mu_{m,c,t},\sigma_{m,c,t}) \\ \mu_{m,c,t}&\sim {\sf normal}(0,2) \\ \sigma_{m,c,t}&\sim {\sf exponential}(\lambda_{m,c}) \\ \lambda_{m,c}&\sim {\sf exponential}(2) \end{align*}\]
The code below shows how to fit the model and how to extract the diagnostic criteria from the model fits.
# we can hand a SummarizedExperiment object to the function
data("longitudinalMetabolomics")
# we only use a subsection of the simulated data (1 condition and subsample of
# the whole dataset) for demonstration purposes
samples <- c(
"UMP", "Taurine", "Succinate", "Phosphocreatine", "PEP",
"Malic acid", "L-Cystine", "CMP", "Citramalic acid", "2-Aminomuconate"
)
# only one condition
data <- longitudinalMetabolomics[, longitudinalMetabolomics$condition == "A" &
longitudinalMetabolomics$metabolite %in% samples]
# fit model
data <- fit_dynamics_model(
data = data, scaled_measurement = "m_scaled",
assay = "scaled_log", time = "time",
condition = "condition", max_treedepth = 10,
adapt_delta = 0.95, # default 0.95
iter = 5000,
cores = 1,
chains = 2 # only set to 2 for vignette, default = 4
)
## Is your data normalized and standardized?
## We recommend normalization by log-transformation.
## Scaling and centering (mean=0, sd=1) should be metabolite and condition specific.
##
## SAMPLING FOR MODEL 'm_ANOVA_partial_pooling' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 5.5e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.55 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 5000 [ 0%] (Warmup)
## Chain 1: Iteration: 500 / 5000 [ 10%] (Warmup)
## Chain 1: Iteration: 1000 / 5000 [ 20%] (Warmup)
## Chain 1: Iteration: 1251 / 5000 [ 25%] (Sampling)
## Chain 1: Iteration: 1750 / 5000 [ 35%] (Sampling)
## Chain 1: Iteration: 2250 / 5000 [ 45%] (Sampling)
## Chain 1: Iteration: 2750 / 5000 [ 55%] (Sampling)
## Chain 1: Iteration: 3250 / 5000 [ 65%] (Sampling)
## Chain 1: Iteration: 3750 / 5000 [ 75%] (Sampling)
## Chain 1: Iteration: 4250 / 5000 [ 85%] (Sampling)
## Chain 1: Iteration: 4750 / 5000 [ 95%] (Sampling)
## Chain 1: Iteration: 5000 / 5000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 1.397 seconds (Warm-up)
## Chain 1: 3.216 seconds (Sampling)
## Chain 1: 4.613 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'm_ANOVA_partial_pooling' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 3.1e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.31 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 5000 [ 0%] (Warmup)
## Chain 2: Iteration: 500 / 5000 [ 10%] (Warmup)
## Chain 2: Iteration: 1000 / 5000 [ 20%] (Warmup)
## Chain 2: Iteration: 1251 / 5000 [ 25%] (Sampling)
## Chain 2: Iteration: 1750 / 5000 [ 35%] (Sampling)
## Chain 2: Iteration: 2250 / 5000 [ 45%] (Sampling)
## Chain 2: Iteration: 2750 / 5000 [ 55%] (Sampling)
## Chain 2: Iteration: 3250 / 5000 [ 65%] (Sampling)
## Chain 2: Iteration: 3750 / 5000 [ 75%] (Sampling)
## Chain 2: Iteration: 4250 / 5000 [ 85%] (Sampling)
## Chain 2: Iteration: 4750 / 5000 [ 95%] (Sampling)
## Chain 2: Iteration: 5000 / 5000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 1.263 seconds (Warm-up)
## Chain 2: 3.219 seconds (Sampling)
## Chain 2: 4.482 seconds (Total)
## Chain 2:
This returns a list of model fits that are named by the experimental condition (“A”,“B”,“C”). If data is a summarizedExperiment the fits are stored in metadata(data)[[“dynamic_fits”]]. With diagnostics_dynamics() we can extract all the diagnostic criteria of MCMC runs to fit a Bayesian model (rhat, neff, divergences, max_treedepth) and visualize them. If data is a SummarizedExperiment the diagnostics are stored in metadata(data)[[“diagnostics_dynamics”]]. Additionally data frames for visual Posterior predictive checks (PPC) are prepared and plots for the PPCs and diagnostic criteria can be generated with plot_PPC() and plot_diagnostics().
# extract diagnostics
data <- diagnostics_dynamics(
data = data,
iter = 5000, # number of iterations used for model fitting
# the dynamic model
chains = 2 # number of chains used for model fitting
)
plot_diagnostics(
data = data
)
## $divergences
##
## $max_treedepth
##
## $Rhat
##
## $n_eff
# PPCs can be accessed with
plot_PPC(
data = data
)
## $posterior_A
## Warning: Removed 18627 rows containing non-finite outside the scale range
## (`stat_ydensity()`).
After checking the diagnostic criteria and the PPC we can extract the estimates: If data is a SummarizedExperiment the estimates are stored in metadata(data)[[“estimates_dynamics”]]
# #extract estimates
data <- estimates_dynamics(
data = data, iter = 5000,
chains = 2, condition = "condition"
)
We get two major outputs: 1) the estimation of concentration differences between two subsequent time points of each metabolite at each experimental condition 2) the dynamic profiles of each metabolites at each experimental condition
# 1) the differences between two timepoints
plot_estimates(
data = data,
dynamics = FALSE
)
## $plot_timepoint_differences
If the 95% highest density interval (Credible Interval(CrI)) of the posterior does not include zero we can rather credibly state that there is a difference in mean concentrations between two time points. If the 95% HDI lies below zero we likely have a decrease in concentrations between the two time points, if it is above zero we likely have an increase in concentrations between time points.
# 2) dynamic profiles
plot_estimates(
data = data,
delta_t = FALSE
)
## $plot_dynamics
So we now have dynamic profiles of many metabolites at each radiation dose.
We could now cluster these metabolite specific dynamics vectors
(estimates[,c(“mu1.mean”:"mut.mean)]) to see if groups of metabolites
have similar dynamics.
For the sake of demonstration we use from here on a clustering result (data(“cluster”)) on the full simulated data set (data(“longitudinalMetabolomics”)). In a real life example the optimal number of clusters (“k”) should be determined by optimal clustering criteria such as Gap statistics and average silhouette. The code below shows an example how the estimated dynamics profiles can be used for clustering.
# get distances between vectors
clust_A <- metadata(data)[["estimates_dynamics"]][["A"]]
clust_A <- clust_A %>%
select(metabolite.ID, condition, time.ID, mu_mean) %>%
pivot_wider(id_cols = c(metabolite.ID, condition), names_from = time.ID, values_from = mu_mean)
dist <- clust_A %>% select(-c(metabolite.ID, condition))
dd_A <- dist(dist,
method = "euclidean"
)
# hierarchical clustering
clust <- hclust(dd_A, method = "ward.D2")
clust_cut <- cutree(clust, k = 8)
# adding cluster ID to estimates
clust_A$cluster <- clust_cut
clust_A
## # A tibble: 10 × 7
## metabolite.ID condition `1` `2` `3` `4` cluster
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <int>
## 1 1 A 0.310 0.609 -0.255 -0.889 1
## 2 2 A -0.130 0.897 -0.190 -0.423 2
## 3 3 A -0.831 0.0665 -0.814 0.0168 3
## 4 4 A 1.02 0.413 0.437 -0.703 4
## 5 5 A -0.0838 0.597 0.0662 -0.642 2
## 6 6 A 0.209 -0.764 -0.640 0.170 5
## 7 7 A -0.0668 -0.108 -0.574 -0.0348 6
## 8 8 A 0.0620 0.497 0.178 0.0672 7
## 9 9 A -0.285 0.274 -0.273 -0.0579 6
## 10 10 A 0.833 -0.220 0.774 0.217 8
The metadata of the SummarizedExperiment object “longitudinalMetabolomics” holds the clustering results of the whole simulated dataset data (“longitudinalMetabolomics”).
data <- metadata(longitudinalMetabolomics)[["cluster"]]
temp <- data
temp <- temp %>% pivot_longer(
cols = 4:7,
names_to = "timepoint", values_to = "mu_mean"
)
ggplot(temp, aes(
x = as.factor(as.numeric(as.factor(timepoint))),
y = mu_mean, group = metabolite
)) +
geom_line() +
xlab("timepoint") +
ylab("estimated mean concentration") +
theme_bw() +
theme(legend.position = "none") +
facet_grid(rows = vars(condition), cols = vars(cluster)) +
ggtitle("clustered dynamics", "panels=cluster ID")
As we can see metabolites show different dynamics in different experimental conditions. Can we quantify the biological function of these dynamics clusters?
To quantify the possible biological function of these dynamics clusters we retrieved from the KEGG-database the following information with package KEGGREST: 1) to which functional modules our experimental metabolites are annotated and 2) which metabolites are annotated to functional modules in general.
The functional modules of the KEGG-database are organised in three hierarchies: upper, middle and lower. Here we will do functional analysis on the middle hierarchy. To facilitate analysis the data frames “metabolite_modules”, which holds the information about experimental metabolites, and “modules_compounds”, which holds the information about which metabolites are in general annotated to functional modules, were prepared. We load both data sets and can inspect the documentation.
data("metabolite_modules")
help("metabolite_modules")
head(metabolite_modules)
## # A tibble: 6 × 8
## ...1 metabolite KEGG module_id module_name upper_hierarchy middle_hierarchy
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 1-Aminocyc… C012… M00368 Ethylene b… Pathway modules Amino acid meta…
## 2 2 2-Aminomuc… C022… M00038 Tryptophan… Pathway modules Amino acid meta…
## 3 3 2-Phosphog… C006… M00001 Glycolysis… Pathway modules Carbohydrate me…
## 4 4 2-Phosphog… C006… M00002 Glycolysis… Pathway modules Carbohydrate me…
## 5 5 2-Phosphog… C006… M00003 Gluconeoge… Pathway modules Carbohydrate me…
## 6 6 2-Phosphog… C006… M00346 Formaldehy… Pathway modules Energy metaboli…
## # ℹ 1 more variable: lower_hierarchy <chr>
data("modules_compounds")
help("modules_compounds")
head(modules_compounds)
## module_id module_name KEGG
## 2 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00267
## 3 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00668
## 4 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00085
## 5 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00354
## 6 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00111
## 7 M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate C00118
## upper_hierarchy middle_hierarchy lower_hierarchy
## 2 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
## 3 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
## 4 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
## 5 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
## 6 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
## 7 Pathway modules Carbohydrate metabolism Central carbohydrate metabolism
Here we have to keep in mind that not all KEGG modules are suitable for testing on every observed organism and experimental condition. For example the modules “Xenobiotics biodegradation”,“Biosynthesis of other secondary metabolites” and “Biosynthesis of terpenoids and polyketides” should not be analyzed in a human lung cancer cell line.
For the functional analysis we employ a hypergeometric model. We consider a functional module as over-represented in a cluster if the 95% inter-quantile range (ICR) of the log-transformed probabilities of OvEs (observed vs expected) lies above zero. OvE refers to the ratio of observed metabolites in a cluster being mapped to a functional module over the number of expected metabolites in a cluster being mapped to a module under the assumption of a hypergeometric distribution (=drawing without replacement). If data is a SummarizedExperiment object where the clustering solution is stored in metadata(data)[[“cluster”]] the ORA results are stored in metadata(data)[[“ORA_tested_column”]] We apply the functional analysis to the middle and lower hierarchy of functional modules.
data <- ORA_hypergeometric(
background = modules_compounds,
annotations = metabolite_modules,
data = longitudinalMetabolomics, tested_column = "middle_hierarchy"
)
plot_ORA(data, tested_column = "middle_hierarchy")
data <- ORA_hypergeometric(
background = modules_compounds,
annotations = metabolite_modules,
data = longitudinalMetabolomics,
tested_column = "lower_hierarchy"
)
plot_ORA(data, tested_column = "lower_hierarchy")
Great, we can now see which functional module is over- (green points and error-bars) or under-represented (none in this example) in which dynamics cluster! For instance in cluster 3 at condition A and C the modules “Energy metabolism” and “Carbohydrate metabolism” are over-represented.
We can not only do over-representation analysis of KEGG-functional modules but also compare dynamics clusters across different experimental conditions. For this we employ a Bayesian model that estimates the mean difference as well as the standard deviation of differences between dynamics clusters. If data is a SummarizedExperiment object the results of compare_metabolites are stored in metadata(data)[[“comparison_dynamics”]]
dist = vector of pairwise euclidean distances between each dynamic vector of cluster a and every dynamic vector of cluster b, ID = cluster pair ID \[\begin{align*} dist_{ID}&\sim {\sf normal}(\mu_{ID},\sigma_{ID}) \\ \mu_{ID}&\sim {\sf normal^+}(0,2) \\ \sigma_{ID}&\sim {\sf exponential}(1) \end{align*}\]
data("longitudinalMetabolomics")
longitudinalMetabolomics <- compare_dynamics(
data = longitudinalMetabolomics,
dynamics = c("mu1_mean", "mu2_mean", "mu3_mean", "mu4_mean"),
cores = 1
)
##
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000512 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 5.12 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 1: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 1: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 3.996 seconds (Warm-up)
## Chain 1: 8.49 seconds (Sampling)
## Chain 1: 12.486 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 0.000405 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 4.05 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 2: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 2: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 4.234 seconds (Warm-up)
## Chain 2: 8.363 seconds (Sampling)
## Chain 2: 12.597 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 0.000408 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 4.08 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 3: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 3: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 3.978 seconds (Warm-up)
## Chain 3: 8.383 seconds (Sampling)
## Chain 3: 12.361 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'm_cluster_distances_padded' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 0.000373 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 3.73 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 4: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 4: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 5.578 seconds (Warm-up)
## Chain 4: 8.372 seconds (Sampling)
## Chain 4: 13.95 seconds (Total)
## Chain 4:
heatmap_dynamics(data = longitudinalMetabolomics)
The bigger and brighter a point, the smaller is the mean distance between dynamics clusters and the smaller is the standard deviation. That means big bright points indicate high dynamic similarity which small spread. Here B_8 and A_4 have high similarity in dynamics.
We can also compare metabolite composition of clusters. For this we employ the Jaccard Index which is a metric for similarity of two vectors. Values near 0 indicate low similarity, values near 1 high similarity. If data is a SummarizedExperiment object the results of compare_metabolites are stored in metadata(data)[[“comparison_metabolites”]]
longitudinalMetabolomics <- compare_metabolites(
data = longitudinalMetabolomics
)
heatmap_metabolites(data = longitudinalMetabolomics)
The brighter a tile, the higher is the similarity of two clusters in regard
to their metabolite composition.
We have two clusters that are similar in their metabolite composition:
C_6 and A_5. If we compare that to the dynamics profiles and ORA analysis
we see that similar functional modules are over-expressed as expected BUT
the dynamics differ between the two radiation doses.
Can we facilitate visualization?
dynamics <- metadata(longitudinalMetabolomics)[["comparison_dynamics"]][["estimates"]]
metabolites <- metadata(longitudinalMetabolomics)[["comparison_metabolites"]]
temp <- left_join(dynamics, metabolites, join_by("cluster_a", "cluster_b"))
x <- unique(c(temp[, "cluster_a"], temp[, "cluster_b"]))
temp <- temp %>% mutate(scale_Jaccard = scale(Jaccard))
ggplot(temp, aes(x = cluster_b, y = cluster_a)) +
geom_point(aes(size = Jaccard, col = mu_mean)) +
theme_bw() +
scale_color_viridis_c(option = "magma") +
scale_x_discrete(limits = x) +
xlab("") +
ylab("") +
scale_y_discrete(limits = x) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
labs(col = "dynamics distance", size = "metabolite similarity") +
ggtitle("comparison of clusters")
We can find a cluster pair that is pretty similar in regards to their composing metabolites but dissimilar in regards to their dynamics. Their ORA profiles are quite similar as expected from the similar metabolite compositions but they show different dynamics between experimental conditions: B_7 and A_4
sessionInfo()
## R Under development (unstable) (2025-01-20 r87609)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] tidyr_1.3.1 dplyr_1.1.4
## [3] ggplot2_3.5.1 SummarizedExperiment_1.37.0
## [5] Biobase_2.67.0 GenomicRanges_1.59.1
## [7] GenomeInfoDb_1.43.4 IRanges_2.41.2
## [9] S4Vectors_0.45.2 BiocGenerics_0.53.6
## [11] generics_0.1.3 MatrixGenerics_1.19.1
## [13] matrixStats_1.5.0 MetaboDynamics_0.99.18
## [15] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.47.0 gtable_0.3.6 xfun_0.50
## [4] bslib_0.9.0 QuickJSR_1.5.1 inline_0.3.21
## [7] lattice_0.22-6 vctrs_0.6.5 tools_4.5.0
## [10] curl_6.2.0 parallel_4.5.0 tibble_3.2.1
## [13] pkgconfig_2.0.3 Matrix_1.7-2 RcppParallel_5.1.10
## [16] lifecycle_1.0.4 GenomeInfoDbData_1.2.13 farver_2.1.2
## [19] compiler_4.5.0 stringr_1.5.1 Biostrings_2.75.3
## [22] tinytex_0.54 munsell_0.5.1 codetools_0.2-20
## [25] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
## [28] pillar_1.10.1 crayon_1.5.3 jquerylib_0.1.4
## [31] DelayedArray_0.33.5 cachem_1.1.0 magick_2.8.5
## [34] StanHeaders_2.32.10 abind_1.4-8 rstan_2.32.6
## [37] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.4
## [40] purrr_1.0.2 bookdown_0.42 labeling_0.4.3
## [43] fastmap_1.2.0 grid_4.5.0 colorspace_2.1-1
## [46] cli_3.6.3 SparseArray_1.7.5 magrittr_2.0.3
## [49] loo_2.8.0 S4Arrays_1.7.2 utf8_1.2.4
## [52] pkgbuild_1.4.6 withr_3.0.2 UCSC.utils_1.3.1
## [55] scales_1.3.0 rmarkdown_2.29 XVector_0.47.2
## [58] httr_1.4.7 gridExtra_2.3 png_0.1-8
## [61] evaluate_1.0.3 knitr_1.49 V8_6.0.1
## [64] viridisLite_0.4.2 rstantools_2.4.0 rlang_1.1.5
## [67] Rcpp_1.0.14 glue_1.8.0 BiocManager_1.30.25
## [70] jsonlite_1.8.9 R6_2.5.1