Package: xcms
Authors: Johannes Rainer, Michael Witting
Modified: 2024-12-15 13:40:29.964073
Compiled: Sun Dec 15 19:33:26 2024

1 Introduction

Metabolite identification is an important step in non-targeted metabolomics and requires different steps. One involves the use of tandem mass spectrometry to generate fragmentation spectra of detected metabolites (LC-MS/MS), which are then compared to fragmentation spectra of known metabolites. Different approaches exist for the generation of these fragmentation spectra, whereas the most used is data dependent acquisition (DDA) also known as the top-n method. In this method the top N most intense ions (m/z values) from a MS1 scan are selected for fragmentation in the next N scans before the cycle starts again. This method allows to generate clean MS2 fragmentation spectra on the fly during acquisition without the need for further experiments, but suffers from poor coverage of the detected metabolites (since only a limited number of ions are fragmented) and less accurate quantification of the compounds (since fewer MS1 scans are generated).

Data independent approaches (DIA) like Bruker bbCID, Agilent AllIons or Waters MSe don’t use such a preselection, but rather fragment all detected molecules at once. They are using alternating schemes with scan of low and high collision energy to collect MS1 and MS2 data. Using this approach, there is no problem in coverage, but the relation between the precursor and fragment masses is lost leading to chimeric spectra. Sequential Window Acquisition of all Theoretical Mass Spectra (or SWATH [1]) combines both approaches through a middle-way approach. There is no precursor selection and acquisition is independent of acquired data, but rather than isolating all precusors at once, defined windows (i.e. ranges of m/z values) are used and scanned. This reduces the overlap of fragment spectra while still keeping a high coverage.

This document showcases the analysis of two small LC-MS/MS data sets using xcms. The data files used are reversed-phase LC-MS/MS runs from the Agilent Pesticide mix obtained from a Sciex 6600 Triple ToF operated in SWATH acquisition mode. For comparison a DDA file from the same sample is included.

2 Analysis of DDA data

Below we load the example DDA data set and create a total ion chromatogram of its MS1 data.

library(xcms)
library(MsExperiment)

dda_file <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
                        package = "msdata")
dda_data <- readMsExperiment(dda_file)
chr <- chromatogram(dda_data, aggregationFun = "sum", msLevel = 1L)

According to the TIC most of the signal is measured between ~ 200 and 600 seconds (see plot below). We thus filter the DDA data to this retention time range.

plot(chr)
abline(v = c(230, 610))

## filter the data
dda_data <- filterRt(dda_data, rt = c(230, 610))
## Filter spectra

The variable dda_data contains now all MS1 and MS2 spectra from the specified mzML file. The number of spectra for each MS level is listed below. Note that we subset the experiment to the first data file (using [1]) and then access directly the spectra within this sample with the spectra() function (which returns a Spectra object from the Spectra package). Note that we use the pipe operator |> for better readability.

dda_data[1] |>
spectra() |>
msLevel() |>
table()
## 
##    1    2 
## 1389 2214

For the MS2 spectra we can get the m/z of the precursor ion with the precursorMz() function. Below we first subset the data set again to a single sample and filter to spectra from MS level 2 extracting then their precursor m/z values.

dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorMz() |>
head()
## [1] 182.18777 182.18893  55.00579 182.19032 237.12296 237.11987

With the precursorIntensity() function it is also possible to extract the intensity of the precursor ion.

dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorIntensity() |>
head()
## [1] 0 0 0 0 0 0

Some manufacturers (like Sciex) don’t define/export the precursor intensity and thus either NA or 0 is reported. We can however use the estimatePrecursorIntensity() function from the Spectra package to determine the precursor intensity for a MS 2 spectrum based on the intensity of the respective ion in the previous MS1 scan (note that with method = "interpolation" the precursor intensity would be defined based on interpolation between the intensity in the previous and subsequent MS1 scan). Below we estimate the precursor intensities, on the full data (for MS1 spectra a NA value is reported).

prec_int <- estimatePrecursorIntensity(spectra(dda_data))

We next set the precursor intensity in the spectrum metadata of dda_data. So that it can be extracted later with the precursorIntensity() function.

spectra(dda_data)$precursorIntensity <- prec_int

dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorIntensity() |>
head()
## [1]        NA  9.198155  2.773988 27.590797  3.443145  7.621923

Next we perform the chromatographic peak detection on the MS level 1 data with the findChromPeaks() method. Below we define the settings for a centWave-based peak detection and perform the analysis.

cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
                     peakwidth = c(3, 30))
dda_data <- findChromPeaks(dda_data, param = cwp, msLevel = 1L)

In total 114 peaks were identified in the present data set.

The advantage of LC-MS/MS data is that (MS1) ions are fragmented and the corresponding MS2 spectra measured. Thus, for some of the ions (identified as MS1 chromatographic peaks) MS2 spectra are available. These can facilitate the annotation of the respective MS1 chromatographic peaks (or MS1 features after a correspondence analysis). Spectra for identified chromatographic peaks can be extracted with the chromPeakSpectra() method. MS2 spectra with their precursor m/z and retention time within the rt and m/z range of the chromatographic peak are returned.

library(Spectra)
dda_spectra <- chromPeakSpectra(dda_data, msLevel = 2L)
dda_spectra
## MSn data (Spectra) with 142 spectra in a MsBackendMzR backend:
##       msLevel     rtime scanIndex
##     <integer> <numeric> <integer>
## 1           2   237.869      1812
## 2           2   241.299      1846
## 3           2   325.763      2439
## 4           2   326.583      2446
## 5           2   327.113      2450
## ...       ...       ...       ...
## 138         2   574.725      5110
## 139         2   575.255      5115
## 140         2   596.584      5272
## 141         2   592.424      5236
## 142         2   596.054      5266
##  ... 36 more variables/columns.
## 
## file(s):
## PestMix1_DDA.mzML
## Processing:
##  Filter: select retention time [230..610] on MS level(s) 1 2 [Sun Dec 15 19:33:28 2024]
##  Filter: select MS level(s) 2 [Sun Dec 15 19:33:39 2024]
##  Merge 1 Spectra into one [Sun Dec 15 19:33:39 2024]

By default chromPeakSpectra() returns all spectra associated with a MS1 chromatographic peak, but parameter method allows to choose and return only one spectrum per peak (have a look at the ?chromPeakSpectra help page for more details). Also, it would be possible to extract MS1 spectra for each peak by specifying msLevel = 1L in the call above (e.g. to evaluate the full MS1 signal at the peak’s apex position).

The returned Spectra contains also the reference to the respective chromatographic peak as additional spectra variable "chrom_peak_id" that contains the identifier for the chromatographic peak (i.e. its row name in the chromPeaks matrix).

dda_spectra$chrom_peak_id
##   [1] "CP004" "CP004" "CP005" "CP005" "CP006" "CP006" "CP008" "CP008" "CP011"
##  [10] "CP011" "CP012" "CP012" "CP013" "CP013" "CP013" "CP013" "CP014" "CP014"
##  [19] "CP014" "CP014" "CP018" "CP022" "CP022" "CP022" "CP025" "CP025" "CP025"
##  [28] "CP025" "CP026" "CP026" "CP026" "CP026" "CP033" "CP033" "CP034" "CP034"
##  [37] "CP034" "CP034" "CP034" "CP035" "CP035" "CP035" "CP041" "CP041" "CP041"
##  [46] "CP042" "CP042" "CP042" "CP043" "CP047" "CP047" "CP049" "CP049" "CP049"
##  [55] "CP049" "CP050" "CP050" "CP050" "CP051" "CP051" "CP051" "CP054" "CP055"
##  [64] "CP055" "CP055" "CP056" "CP056" "CP056" "CP056" "CP056" "CP060" "CP060"
##  [73] "CP060" "CP060" "CP064" "CP064" "CP065" "CP065" "CP066" "CP066" "CP069"
##  [82] "CP069" "CP069" "CP070" "CP070" "CP070" "CP072" "CP072" "CP072" "CP073"
##  [91] "CP074" "CP074" "CP074" "CP074" "CP075" "CP075" "CP075" "CP077" "CP077"
## [100] "CP077" "CP079" "CP079" "CP079" "CP079" "CP080" "CP080" "CP080" "CP081"
## [109] "CP086" "CP086" "CP086" "CP086" "CP086" "CP088" "CP088" "CP088" "CP089"
## [118] "CP089" "CP091" "CP091" "CP093" "CP093" "CP094" "CP094" "CP094" "CP095"
## [127] "CP095" "CP095" "CP096" "CP096" "CP096" "CP098" "CP098" "CP098" "CP098"
## [136] "CP098" "CP099" "CP099" "CP099" "CP100" "CP101" "CP101"

Some information about the chromatographic peak can also be added to the returned Spectra object using the chrompeakColumns parameter in chromPeakSpectra(). By default, the m/z and retention time of the chromatographic peak are added to the spectra metadata.

dda_spectra$chrom_peak_mz
##   [1] 219.0957 219.0957 153.0659 153.0659 235.1447 235.1447 298.2751 298.2751
##   [9] 284.0545 284.0545 306.0364 306.0364 589.0833 589.0833 589.0833 589.0833
##  [17] 227.9913 227.9913 227.9913 227.9913 300.0312 278.0916 278.0916 278.0916
##  [25] 256.1102 256.1102 256.1102 256.1102 224.0836 224.0836 224.0836 224.0836
##  [33] 309.9980 309.9980 376.0385 376.0385 376.0385 376.0385 376.0385 308.0013
##  [41] 308.0013 308.0013 286.1806 286.1806 286.1806 292.1215 292.1215 292.1215
##  [49] 313.0393 276.0817 276.0817 326.0952 326.0952 326.0952 326.0952 629.2013
##  [57] 629.2013 629.2013 607.2193 607.2193 607.2193 291.1192 289.1217 289.1217
##  [65] 289.1217 304.1133 304.1133 304.1133 304.1133 304.1133 382.9727 382.9727
##  [73] 382.9727 382.9727 249.0247 249.0247 603.1183 603.1183 291.0719 291.0719
##  [81] 313.0540 313.0540 313.0540 231.0145 231.0145 231.0145 386.0546 386.0546
##  [89] 386.0546 152.0504 194.0969 194.0969 194.0969 194.0969 364.0734 364.0734
##  [97] 364.0734 238.1221 238.1221 238.1221 349.1518 349.1518 349.1518 349.1518
## [105] 327.1698 327.1698 327.1698 205.0970 306.1636 306.1636 306.1636 306.1636
## [113] 306.1636 354.0895 354.0895 354.0895 230.9875 230.9875 216.1414 216.1414
## [121] 395.0813 395.0813 284.1641 284.1641 284.1641 331.2081 331.2081 331.2081
## [129] 330.2067 330.2067 330.2067 330.9940 330.9940 330.9940 330.9940 330.9940
## [137] 373.0414 373.0414 373.0414 313.0391 411.1123 411.1123

Note also that with return.type = "List" a list parallel to the chromPeaks matrix would be returned, i.e. each element in that list would contain the spectra for the chromatographic peak with the same index. Such data representation might eventually simplify further processing.

We next use the MS2 information to aid in the annotation of a chromatographic peak. As an example we use a chromatographic peak of an ion with an m/z of 304.1131 which we extract in the code block below.

ex_mz <- 304.1131
chromPeaks(dda_data, mz = ex_mz, ppm = 20)
##             mz    mzmin    mzmax      rt   rtmin   rtmax    into     intb
## CP056 304.1133 304.1126 304.1143 425.024 417.985 441.773 13040.8 13007.79
##           maxo  sn sample
## CP056 3978.987 232      1

A search of potential ions with a similar m/z in a reference database (e.g. Metlin) returned a large list of potential hits, most with a very small ppm. For two of the hits, Flumazenil (Metlin ID 2724) and Fenamiphos (Metlin ID 72445) experimental MS2 spectra are available. Thus, we could match the MS2 spectrum for the identified chromatographic peak against these to annotate our ion. Below we extract all MS2 spectra that were associated with the candidate chromatographic peak using the ID of the peak in the present data set.

ex_id <- rownames(chromPeaks(dda_data, mz = ex_mz, ppm = 20))
ex_spectra <- dda_spectra[dda_spectra$chrom_peak_id == ex_id]
ex_spectra
## MSn data (Spectra) with 5 spectra in a MsBackendMzR backend:
##     msLevel     rtime scanIndex
##   <integer> <numeric> <integer>
## 1         2   418.926      3505
## 2         2   419.306      3510
## 3         2   423.036      3582
## 4         2   423.966      3603
## 5         2   424.296      3609
##  ... 36 more variables/columns.
## 
## file(s):
## PestMix1_DDA.mzML
## Processing:
##  Filter: select retention time [230..610] on MS level(s) 1 2 [Sun Dec 15 19:33:28 2024]
##  Filter: select MS level(s) 2 [Sun Dec 15 19:33:39 2024]
##  Merge 1 Spectra into one [Sun Dec 15 19:33:39 2024]

There are 5 MS2 spectra representing fragmentation of the ion(s) measured in our candidate chromatographic peak. We next reduce this to a single MS2 spectrum using the combineSpectra() method employing the combinePeaks() function to determine which peaks to keep in the resulting spectrum (have a look at the ?combinePeaks help page for details). Parameter f allows to specify which spectra in the input object should be combined into one. Note that this combination of multiple fragment spectra into a single spectrum might not be generally the best approach or suggested for all types of data.

ex_spectrum <- combineSpectra(ex_spectra, FUN = combinePeaks, ppm = 20,
                              peaks = "intersect", minProp = 0.8,
                              intensityFun = median, mzFun = median,
                              f = ex_spectra$chrom_peak_id)
## Warning in FUN(X[[i]], ...): 'combinePeaks' for lists of peak matrices is
## deprecated; please use 'combinePeaksData' instead.
ex_spectrum
## MSn data (Spectra) with 1 spectra in a MsBackendMemory backend:
##     msLevel     rtime scanIndex
##   <integer> <numeric> <integer>
## 1         2   418.926      3505
##  ... 36 more variables/columns.
## Processing:
##  Filter: select retention time [230..610] on MS level(s) 1 2 [Sun Dec 15 19:33:28 2024]
##  Filter: select MS level(s) 2 [Sun Dec 15 19:33:39 2024]
##  Merge 1 Spectra into one [Sun Dec 15 19:33:39 2024]
##  ...1 more processings. Use 'processingLog' to list all.

Mass peaks from all input spectra with a difference in m/z smaller 20 ppm (parameter ppm) were combined into one peak and the median m/z and intensity is reported for these. Due to parameter minProp = 0.8, the resulting MS2 spectrum contains only peaks that were present in 80% of the input spectra.

A plot of this consensus spectrum is shown below.

plotSpectra(ex_spectrum)