Package loading

library(pander)
library(gridExtra)
library(ggplot2)
library(SummarizedExperiment)

1 Introduction

The model used in this example is a three-way ANOVA with fixed effects. This document presents all the usual steps of the analysis, from importing the data to visualising the results. Details on the methods used and the package implementation can be found in the articles of Thiel, Féraud, and Govaerts (2017), Guisset, Martin, and Govaerts (2019) and Thiel et al. (2023).

2 Installation and loading of the limpca package

limpca can be installed from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install("limpca")

And then loaded into your R session:

library("limpca")

3 Data import

In order to use the limpca core functions, the data need to be formatted as a list (informally called an lmpDataList) with the following elements: outcomes (multivariate matrix), design (data.frame) and formula (character string). The UCH data set is already formatted appropriately and can be loaded from limpca with the data function.

data("UCH")

str(UCH)
List of 3
 $ design  :'data.frame':   34 obs. of  5 variables:
  ..$ Hippurate: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 2 2 2 2 ...
  ..$ Citrate  : Factor w/ 3 levels "0","2","4": 1 1 2 2 3 3 1 1 2 2 ...
  ..$ Dilution : Factor w/ 1 level "diluted": 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Day      : Factor w/ 2 levels "2","3": 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Time     : Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ...
 $ outcomes: num [1:34, 1:600] 0.0312 0.0581 0.027 0.0341 0.0406 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$   : chr [1:34] "M2C00D2R1" "M2C00D2R2" "M2C02D2R1" "M2C02D2R2" ...
  .. ..$ X1: chr [1:600] "9.9917004" "9.9753204" "9.9590624" "9.9427436" ...
 $ formula : chr "outcomes ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time"

Alternatively, the lmpDataList can be created with the function data2LmpDataList :

  • from scratch:
UCH2 <- data2LmpDataList(outcomes = UCH$outcomes, design = UCH$design,
    formula = UCH$formula)
| dim outcomes: 34x600
| formula: ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time
| design variables (5): 
* Hippurate (factor)
* Citrate (factor)
* Dilution (factor)
* Day (factor)
* Time (factor)
  • or from a SummarizedExperiment:
se <- SummarizedExperiment(assays = list(counts = t(UCH$outcomes)),
    colData = UCH$design, metadata = list(formula = UCH$formula))

UCH3 <- data2LmpDataList(se, assay_name = "counts")
| dim outcomes: 34x600
| formula: ~ Hippurate + Citrate + Time + Hippurate:Citrate + Time:Hippurate + Time:Citrate + Hippurate:Citrate:Time
| design variables (5): 
* Hippurate (factor)
* Citrate (factor)
* Dilution (factor)
* Day (factor)
* Time (factor)

SummarizedExperiment is a generic data container that stores rectangular matrices of experimental results. See Morgan et al. (2023) for more information.

4 Data exploration

The UCH (Urine-Citrate-Hippurate) data set is described in Thiel, Féraud, and Govaerts (2017) and Guisset, Martin, and Govaerts (2019) and is issued form a metabolomics experiment. In this experiment, 36 samples of a pool of rat urine samples were spiked with two molecules Citrate and Hippurate according to a \(3^2\) full factorial design in the quantities of these two molecules. The spiked samples were analyzed by 1H NMR at two different time after defrozing and over two different days. Two of the spectra where finally missing at the end of the experiment.

The UCH data set is a list containing 2 elements:

  • an outcomes matrix with 34 observations of 600 response variables representing the spectra from the 1H NMR spectroscopy,
  • a design matrix with 34 observations and 4 explanatory variables

A formula with the general linear model to be estimated will be added to this list below.

For the purpose of this example, only 3 factors of interest will be studied : concentrations of Hippurate and Citrate and Time after defrozing.

Here are the limpca functions that are available for data exploration.

limpca data exploration functions

4.1 Design

The design matrix contains the information about each observation for the four variables: Hippurate, Citrate, Day and Time. Only 3 of these variables are used in the model. The function plotDesign is useful to visualise the design.

pander(head(UCH$design))
  Hippurate Citrate Dilution Day Time
M2C00D2R1 0 0 diluted 2 1
M2C00D2R2 0 0 diluted 2 2
M2C02D2R1 0 2 diluted 2 1
M2C02D2R2 0 2 diluted 2 2
M2C04D2R1 0 4 diluted 2 1
M2C04D2R2 0 4 diluted 2 2
plotDesign(design = UCH$design, x = "Hippurate", y = "Citrate",
    rows = "Time", title = "Design of the UCH dataset")

This plot confirms that the design is a full 3x3x2 factorial design replicated twice with 2 missing values. Hence, the design is not balanced.

4.2 Outcomes visualization

The 600 response (outcomes) variables represent, for each observation, the intensities of the 1H NMR spectra. These spectra can be visualized by the plotLine function.

4.2.1 plotLine function

Here, annotations (cit_peaks and hip_peaks) are added to the ggplot objects in order to highlight the Hippurate (green) and Citrate (red) peaks.

p1 <- plotLine(Y = UCH$outcomes, title = "H-NMR spectrum", rows = c(3),
    xlab = "ppm", ylab = "Intensity")

cit_peaks <- annotate("rect", xmin = c(2.509), xmax = c(2.709),
    ymin = -Inf, ymax = Inf, alpha = 0.2, fill = c("tomato"))

hip_peaks <- annotate("rect", xmin = c(7.458, 3.881), xmax = c(7.935,
    4.041), ymin = -Inf, ymax = Inf, alpha = 0.2, fill = c("yellowgreen"))

p1 <- plotLine(Y = UCH$outcomes, title = "H-NMR spectrum", rows = c(3),
    xlab = "ppm", ylab = "Intensity")

p1 + cit_peaks + hip_peaks

4.2.2 plotScatter function

The plotScatter function enables to visualize the values of two outcomes variables with different colors or markers for the values of the design factors. Here, it is used to show that the \(3^2\) factorial design can be recovered from the intensities of the Hippurate and Citrate peaks in the spectra.

# xy corresponds to citrate (453) and hippurate peaks (369)
plotScatter(Y = UCH$outcomes, xy = c(453, 369), design = UCH$design,
    color = "Hippurate", shape = "Citrate")

# Or refering to the variables names (ppm values here)
plotScatter(Y = UCH$outcomes, xy = c("2.6092056", "3.9811536"),
    design = UCH$design, color = "Hippurate", shape = "Citrate")