1 Introduction

Mitochondria are cellular organelles whose main role is related to the aerobic respiration to supply energy to the cell. In addition to this, they are also involved in other tasks, such as signaling, cellular differentiation, and cell death, as well as maintaining control of the cell cycle and cell growth. The dysfunction of mitochondria activity has been implicated in several human disorders and conditions. Studying mitochondrial behavior in health and disease conditions is opening new possibilities of understanding disease mechanisms and providing new treatments. Indeed, the possibility to target mitochondria for cancer therapies is increasingly becoming a reality in biomedical research.

The analysis of high resolution transcriptomic data can help in better understanding mitochondrial activity in relation to the gene expression dynamics. Even if a curated mitochondrial specific pathway resource exists (MitoCarta3.0, which provides a list of mitochondrial genes organized into mitochondrial pathways [1]) computational tools for mitochondrial-focused pathway analysis are still lacking. In the field of gene set analysis, general purpose resources are pathway databases, such as Reactome [5] and the Gene Ontology (GO) project [4], which aim at providing gene set categories for high-throughput data analysis for all the cellular contexts. In the vast majority of pathways contained in these public databases, the mitochondrial-specific component of cellular signaling constitutes only a small portion, which is often overshadowed by the non-mitochondrial signaling. And, even if the information on protein localization is present, pathway analysis typically does not account for whether the activated signal involves the mitochondrion or not.

Thus, we thought that a tool able to focus specifically on the mitochondrial part of pathways could allow tailored and more specific analysis on this interesting organelle and we started to develop mitology to provide actionable and fast mitochondrial phenodata interpretation of high-throughput data.

These vignete shows some ways to perform mitochondrial pathway analyses using mitology. Also, some visualization functions are implemented to visualize the scores. These can help in the result interpretations.

2 Installation

To install this package:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("mitology")

3 Gene sets collected in mitology

3.1 Gene sets from MitoCarta3.0

The whole and original structure from the MitoCarta3.0 database was kept and included in mitology organized in a three-tier hierarchy.

In addition to MitoCarta3.0, we decided to propose three more comprehensive mitochondrial-oriented gene set resources exploiting Reactome, GO Biological Processes (GO-BP), and GO Cellular Components (GO-CC).

3.2 Gene sets from GO

We explored the GO-CC and GO-BP terms. To filter the GO terms we performed a four-steps selection. Firstly, terms were filtered by size, keeping all the ones with at least 10 mitochondrial genes. Then, we pruned the GO trees of CC and BP by the graph levels, filtering out the 0°, 1°, 2° and 3° levels and keeping the terms from the 4° level. Then, we tested the enrichment of the remaining sets for our mitochondrial gene list with the enrichGO function from the clusterProfiler R package (v4.14.3). Only sets with FDR lower than 0.05 have been passed to the next step. Finally, the last selection was topological, we exploited the GO hierarchical organization selecting the more general enriched gene sets filtering out their offspring terms.

3.3 Gene sets from Reactome

The Reactome pathways were retrieved with the graphite R package (v1.52.0). Reactome has a simpler structure with far less level of nested pathways, thus we applied only three filtering steps: number of mitochondrial genes over 10; FDR under 0.05 (enrichment computed with the phyper function) and filtering of the offspring pathways.

Once we obtained the final terms and pathways respectively from GO and Reactome, we extracted the mitochondrial gene sets by keeping only the genes included in the mitochondrial list. The same names of the original terms/pathways were kept for the gene sets included in mitology. The final gene sets and the corresponding tree-structures of the four databases were kept and included in the mitology package.

4 How to use mitology

4.1 Access mitochondrial gene sets

The getGeneSets function allows to get the mitochondrial gene sets. It returns them for one of the four possible databases (MitoCarta, Reactome, GO-CC and GO-BP) by the database argument. The nametype argument says the type of gene name ID, either one of SYMBOL, ENTREZID or ENSEMBL. The objectType argument can be set to return the gene sets in form of a list or a data frame.

library(mitology)
MC_df <- getGeneSets(
    database = "MitoCarta", nametype = "SYMBOL", objectType = "dataframe")

MC_list <- getGeneSets(
    database = "MitoCarta", nametype = "SYMBOL", objectType = "list")

4.2 Enrichment analyses

In the following section, we use an example bulk expression dataset of ovarian cancer to show how to use mitology to perform an analysis of the mitochondrial activity.

# loading packages
library(SummarizedExperiment)
library(clusterProfiler)
library(GSVA)
library(Biobase)
# load data
data(ovse)
ovse
## class: SummarizedExperiment 
## dim: 2388 40 
## metadata(0):
## assays(1): norm_expr
## rownames(2388): ABCA4 ABCB10 ... USP9X WDR45
## rowData names(2): PROvsIMR_logFC PROvsIMR_FDR
## colnames(40): sample1 sample2 ... sample39 sample40
## colData names(1): OV_subtype

4.2.1 Enrichment analyses of the mitochondrial gene sets

We can perform an over representation analysis with clusterProfiler.

genes <- rownames(ovse)[elementMetadata(ovse)$PROvsIMR_FDR < 0.001]
res_ora <- enricher(gene = genes, TERM2GENE = MC_df)

Or also a single sample gene set enrichment analysis (ssGSEA) with GSVA.

gsvaPar <- ssgseaParam(exprData = ovse, geneSets = MC_list)
## ℹ No assay name provided; using default assay 'norm_expr'
res_ssGSEA <- gsva(gsvaPar)
## ℹ GSVA version 2.2.0
## ! 5 genes with constant values throughout the samples
## ! Some gene sets have size one. Consider setting minSize > 1
## ℹ Calculating  ssGSEA scores for 119 gene sets
## ℹ Calculating ranks
## ℹ Calculating rank weights
## ℹ Normalizing ssGSEA scores
## ✔ Calculations finished

4.3 Visualization

When we obtain a matrix of scores for each sample in each gene set we can visualize it by plotting a circular heatmap on the gene set tree hierarchy.

We can summarize the information of samples by taking the mean value for each OV subtype.

res_ssGSEA_subtype <- do.call(
    cbind, lapply(unique(ovse$OV_subtype), function(x){
        rowMeans(assay(res_ssGSEA)[,ovse$OV_subtype==x])
    }))
colnames(res_ssGSEA_subtype) <- unique(ovse$OV_subtype)
rownames(res_ssGSEA_subtype) <- rownames(res_ssGSEA)
res_ssGSEA_subtype <- t(scale(t(res_ssGSEA_subtype)))
mitoTreeHeatmap(
    data = res_ssGSEA_subtype, database = "MitoCarta", 
    labelNames = "leaves", font.size = 1)

It is possible to plot only the main section arguments instead of the single gene set names.

mitoTreeHeatmap(
    data = res_ssGSEA_subtype, database = "MitoCarta",
    labelNames = "sections", font.size = 3)

These analysis can be done with the mitochondrial gene sets from Reactome, GO-CC and GO-BP.

5 Additional functionalities

More tutorials and examples of other plots are available on an extended tutorial on GitHub.

6 Bibliography

[1] Rath S, Sharma R, Gupta R, et al. MitoCarta3.0: an updated mitochondrial proteome now with sub-organelle localization and pathway annotations. Nucleic Acids Res 2020; 49:D1541–D1547

[2] Peck P. IMPI. 2021

[3] Shen L, Diroma MA, Gonzalez M, et al. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease. Hum Mutat 2016; 37:540–548

[4] The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res 2019; 47:D330–D338

[5] Rothfels K, Milacic M, Matthews L, et al. Using the Reactome Database. Curr Protoc 2023; 3:e722

7 Session Info

Here is the output of sessionInfo() on the system on which this document was compiled.

sessionInfo()
## R version 4.5.0 RC (2025-04-04 r88126)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] GSVA_2.2.0                  clusterProfiler_4.16.0     
##  [3] SummarizedExperiment_1.38.0 Biobase_2.68.0             
##  [5] GenomicRanges_1.60.0        GenomeInfoDb_1.44.0        
##  [7] IRanges_2.42.0              S4Vectors_0.46.0           
##  [9] BiocGenerics_0.54.0         generics_0.1.3             
## [11] MatrixGenerics_1.20.0       matrixStats_1.5.0          
## [13] mitology_1.0.0              BiocStyle_2.36.0           
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3          jsonlite_2.0.0             
##   [3] shape_1.4.6.1               magrittr_2.0.3             
##   [5] magick_2.8.6                ggtangle_0.0.6             
##   [7] farver_2.1.2                rmarkdown_2.29             
##   [9] GlobalOptions_0.1.2         fs_1.6.6                   
##  [11] vctrs_0.6.5                 memoise_2.0.1              
##  [13] ggtree_3.16.0               tinytex_0.57               
##  [15] htmltools_0.5.8.1           S4Arrays_1.8.0             
##  [17] Rhdf5lib_1.30.0             rhdf5_2.52.0               
##  [19] SparseArray_1.8.0           gridGraphics_0.5-1         
##  [21] sass_0.4.10                 bslib_0.9.0                
##  [23] plyr_1.8.9                  cachem_1.1.0               
##  [25] igraph_2.1.4                lifecycle_1.0.4            
##  [27] iterators_1.0.14            pkgconfig_2.0.3            
##  [29] rsvd_1.0.5                  Matrix_1.7-3               
##  [31] R6_2.6.1                    fastmap_1.2.0              
##  [33] gson_0.1.0                  GenomeInfoDbData_1.2.14    
##  [35] clue_0.3-66                 digest_0.6.37              
##  [37] aplot_0.2.5                 enrichplot_1.28.0          
##  [39] colorspace_2.1-1            patchwork_1.3.0            
##  [41] AnnotationDbi_1.70.0        irlba_2.3.5.1              
##  [43] RSQLite_2.3.9               org.Hs.eg.db_3.21.0        
##  [45] beachmat_2.24.0             labeling_0.4.3             
##  [47] httr_1.4.7                  abind_1.4-8                
##  [49] compiler_4.5.0              withr_3.0.2                
##  [51] bit64_4.6.0-1               doParallel_1.0.17          
##  [53] BiocParallel_1.42.0         DBI_1.2.3                  
##  [55] HDF5Array_1.36.0            R.utils_2.13.0             
##  [57] DelayedArray_0.34.0         rjson_0.2.23               
##  [59] tools_4.5.0                 ape_5.8-1                  
##  [61] R.oo_1.27.0                 glue_1.8.0                 
##  [63] h5mread_1.0.0               rhdf5filters_1.20.0        
##  [65] nlme_3.1-168                GOSemSim_2.34.0            
##  [67] grid_4.5.0                  cluster_2.1.8.1            
##  [69] reshape2_1.4.4              fgsea_1.34.0               
##  [71] gtable_0.3.6                R.methodsS3_1.8.2          
##  [73] tidyr_1.3.1                 data.table_1.17.0          
##  [75] BiocSingular_1.24.0         ScaledMatrix_1.16.0        
##  [77] XVector_0.48.0              ggrepel_0.9.6              
##  [79] foreach_1.5.2               pillar_1.10.2              
##  [81] stringr_1.5.1               yulab.utils_0.2.0          
##  [83] circlize_0.4.16             splines_4.5.0              
##  [85] dplyr_1.1.4                 treeio_1.32.0              
##  [87] lattice_0.22-7              bit_4.6.0                  
##  [89] annotate_1.86.0             tidyselect_1.2.1           
##  [91] SingleCellExperiment_1.30.0 GO.db_3.21.0               
##  [93] ComplexHeatmap_2.24.0       Biostrings_2.76.0          
##  [95] knitr_1.50                  bookdown_0.43              
##  [97] xfun_0.52                   stringi_1.8.7              
##  [99] UCSC.utils_1.4.0            lazyeval_0.2.2             
## [101] ggfun_0.1.8                 yaml_2.3.10                
## [103] evaluate_1.0.3              codetools_0.2-20           
## [105] tibble_3.2.1                qvalue_2.40.0              
## [107] graph_1.86.0                BiocManager_1.30.25        
## [109] ggplotify_0.1.2             cli_3.6.4                  
## [111] xtable_1.8-4                munsell_0.5.1              
## [113] jquerylib_0.1.4             Rcpp_1.0.14                
## [115] png_0.1-8                   XML_3.99-0.18              
## [117] parallel_4.5.0              ggplot2_3.5.2              
## [119] blob_1.2.4                  DOSE_4.2.0                 
## [121] sparseMatrixStats_1.20.0    SpatialExperiment_1.18.0   
## [123] tidytree_0.4.6              GSEABase_1.70.0            
## [125] scales_1.3.0                purrr_1.0.4                
## [127] crayon_1.5.3                GetoptLong_1.0.5           
## [129] rlang_1.1.6                 cowplot_1.1.3              
## [131] fastmatch_1.1-6             KEGGREST_1.48.0