limpca
on the UCH metabolomics dataset.limpca 1.0.0
Package loading
library(pander)
library(gridExtra)
library(ggplot2)
library(SummarizedExperiment)
The model used in this example is a three-way ANOVA with fixed effects. This document presents all the usual steps of the analysis, from importing the data to visualising the results. Details on the methods used and the package implementation can be found in the articles of Thiel, Féraud, and Govaerts (2017), Guisset, Martin, and Govaerts (2019) and Thiel et al. (2023).
limpca
packagelimpca
can be installed from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("limpca")
And then loaded into your R session:
library("limpca")
In order to use the limpca core functions, the data need to be formatted as a list (informally called an lmpDataList) with the following elements: outcomes
(multivariate matrix), design
(data.frame) and formula
(character string).
The UCH
data set is already formatted appropriately and can be loaded from limpca
with the data
function.
data("UCH")
str(UCH)
List of 3
$ design :'data.frame': 34 obs. of 5 variables:
..$ Hippurate: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 2 2 2 2 ...
..$ Citrate : Factor w/ 3 levels "0","2","4": 1 1 2 2 3 3 1 1 2 2 ...
..$ Dilution : Factor w/ 1 level "diluted": 1 1 1 1 1 1 1 1 1 1 ...
..$ Day : Factor w/ 2 levels "2","3": 1 1 1 1 1 1 1 1 1 1 ...
..$ Time : Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ...
$ outcomes: num [1:34, 1:600] 0.0312 0.0581 0.027 0.0341 0.0406 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:34] "M2C00D2R1" "M2C00D2R2" "M2C02D2R1" "M2C02D2R2" ...
.. ..$ X1: chr [1:600] "9.9917004" "9.9753204" "9.9590624" "9.9427436" ...
$ formula : chr "outcomes ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time"
Alternatively, the lmpDataList can be created with the function data2LmpDataList
:
UCH2 <- data2LmpDataList(outcomes = UCH$outcomes, design = UCH$design,
formula = UCH$formula)
| dim outcomes: 34x600
| formula: ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time
| design variables (5):
* Hippurate (factor)
* Citrate (factor)
* Dilution (factor)
* Day (factor)
* Time (factor)
SummarizedExperiment
:se <- SummarizedExperiment(assays = list(counts = t(UCH$outcomes)),
colData = UCH$design, metadata = list(formula = UCH$formula))
UCH3 <- data2LmpDataList(se, assay_name = "counts")
| dim outcomes: 34x600
| formula: ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time
| design variables (5):
* Hippurate (factor)
* Citrate (factor)
* Dilution (factor)
* Day (factor)
* Time (factor)
SummarizedExperiment
is a generic data container that stores rectangular matrices of experimental results. See Morgan et al. (2023) for more information.
The UCH (Urine-Citrate-Hippurate) data set is described in Thiel, Féraud, and Govaerts (2017) and Guisset, Martin, and Govaerts (2019) and is issued form a metabolomics experiment. In this experiment, 36 samples of a pool of rat urine samples were spiked with two molecules Citrate and Hippurate according to a \(3^2\) full factorial design in the quantities of these two molecules. The spiked samples were analyzed by 1H NMR at two different time after defrozing and over two different days. Two of the spectra where finally missing at the end of the experiment.
The UCH data set is a list containing 2 elements:
outcomes
matrix with 34 observations of 600 response variables representing the spectra from the 1H NMR spectroscopy,design
matrix with 34 observations and 4 explanatory variablesA formula
with the general linear model to be estimated will be added to this list below.
For the purpose of this example, only 3 factors of interest will be studied : concentrations of Hippurate and Citrate and Time after defrozing.
Here are the limpca
functions that are available for data exploration.
The design matrix contains the information about each observation for the four variables: Hippurate, Citrate, Day and Time. Only 3 of these variables are used in the model. The function plotDesign
is useful to visualise the design.
pander(head(UCH$design))
Hippurate | Citrate | Dilution | Day | Time | |
---|---|---|---|---|---|
M2C00D2R1 | 0 | 0 | diluted | 2 | 1 |
M2C00D2R2 | 0 | 0 | diluted | 2 | 2 |
M2C02D2R1 | 0 | 2 | diluted | 2 | 1 |
M2C02D2R2 | 0 | 2 | diluted | 2 | 2 |
M2C04D2R1 | 0 | 4 | diluted | 2 | 1 |
M2C04D2R2 | 0 | 4 | diluted | 2 | 2 |
plotDesign(design = UCH$design, x = "Hippurate", y = "Citrate",
rows = "Time", title = "Design of the UCH dataset")
This plot confirms that the design is a full 3x3x2 factorial design replicated twice with 2 missing values. Hence, the design is not balanced.
The 600 response (outcomes
) variables represent, for each observation, the intensities of the 1H NMR spectra. These spectra can be visualized by the plotLine
function.
plotLine
functionHere, annotations (cit_peaks
and hip_peaks
) are added to the ggplot
objects in order to highlight the Hippurate (green) and Citrate (red) peaks.
p1 <- plotLine(Y = UCH$outcomes, title = "H-NMR spectrum", rows = c(3),
xlab = "ppm", ylab = "Intensity")
cit_peaks <- annotate("rect", xmin = c(2.509), xmax = c(2.709),
ymin = -Inf, ymax = Inf, alpha = 0.2, fill = c("tomato"))
hip_peaks <- annotate("rect", xmin = c(7.458, 3.881), xmax = c(7.935,
4.041), ymin = -Inf, ymax = Inf, alpha = 0.2, fill = c("yellowgreen"))
p1 <- plotLine(Y = UCH$outcomes, title = "H-NMR spectrum", rows = c(3),
xlab = "ppm", ylab = "Intensity")
p1 + cit_peaks + hip_peaks
plotScatter
functionThe plotScatter
function enables to visualize the values of two outcomes variables with different colors or markers for the values of the design factors. Here, it is used to show that the \(3^2\) factorial design can be recovered from the intensities of the Hippurate and Citrate peaks in the spectra.
# xy corresponds to citrate (453) and hippurate peaks (369)
plotScatter(Y = UCH$outcomes, xy = c(453, 369), design = UCH$design,
color = "Hippurate", shape = "Citrate")
# Or refering to the variables names (ppm values here)
plotScatter(Y = UCH$outcomes, xy = c("2.6092056", "3.9811536"),
design = UCH$design, color = "Hippurate", shape = "Citrate")