This package aims to import, parse, and analyze KEGG data such as KEGG PATHWAY and KEGG MODULE. The package supports visualizing KEGG information with ggplot2 and ggraph using the grammar of graphics. It enables direct visualization of results from various omics analysis packages and integrates with other tidy data-manipulation packages. This documentation presents the basic usage of ggkegg. Please refer to the full documentation for detailed usage.
There are many great packages for KEGG PATHWAY analysis in R. pathview fetches KEGG PATHWAY information and outputs images that reflect user-defined values on the map. KEGGlincs can overlay LINCS data on KEGG PATHWAY and examine the map using Cytoscape. graphite acquires pathways from sources including KEGG and Reactome, converts them into graphNEL format, and provides an interface for topological analysis. KEGGgraph also downloads KEGG PATHWAY information and converts it into a format analyzable in R. Building on these packages, ggkegg is developed to allow tidy manipulation of KEGG information with the power of tidygraph, to plot the relevant information in flexible and customizable ways using the grammar of graphics, and to examine the global and overview maps consisting of compounds and reactions.
Users can obtain a KEGG PATHWAY tbl_graph with the pathway function. To cache the downloaded file, specify use_cache=TRUE; if you already have the XML files of the pathway, specify the directory containing them with the directory argument. Here, we obtain the Cell cycle pathway (hsa04110) using the cache. A pathway_id column is added to nodes and edges by default, which allows other functions to identify the pathway ID.
library(ggkegg)
library(tidygraph)
library(dplyr)
graph <- ggkegg::pathway("hsa04110", use_cache=TRUE)
graph
## # A tbl_graph: 134 nodes and 157 edges
## #
## # A directed acyclic multigraph with 40 components
## #
## # Node Data: 134 × 18 (active)
## name type reaction graphics_name x y width height fgcolor bgcolor
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 hsa:10… gene <NA> CDKN2A, ARF,… 532 -218 46 17 #000000 #BFFFBF
## 2 hsa:51… gene <NA> FZR1, CDC20C… 981 -630 46 17 #000000 #BFFFBF
## 3 hsa:41… gene <NA> MCM2, BM28, … 553 -681 46 17 #000000 #BFFFBF
## 4 hsa:23… gene <NA> ORC6, ORC6L.… 494 -681 46 17 #000000 #BFFFBF
## 5 hsa:10… gene <NA> ANAPC10, APC… 981 -392 46 17 #000000 #BFFFBF
## 6 hsa:10… gene <NA> ANAPC10, APC… 981 -613 46 17 #000000 #BFFFBF
## 7 hsa:65… gene <NA> SKP1, EMC19,… 188 -613 46 17 #000000 #BFFFBF
## 8 hsa:65… gene <NA> SKP1, EMC19,… 432 -285 46 17 #000000 #BFFFBF
## 9 hsa:983 gene <NA> CDK1, CDC2, … 780 -562 46 17 #000000 #BFFFBF
## 10 hsa:701 gene <NA> BUB1B, BUB1b… 873 -392 46 17 #000000 #BFFFBF
## # ℹ 124 more rows
## # ℹ 8 more variables: graphics_type <chr>, coords <chr>, xmin <dbl>,
## # xmax <dbl>, ymin <dbl>, ymax <dbl>, orig.id <chr>, pathway_id <chr>
## #
## # Edge Data: 157 × 6
## from to type subtype_name subtype_value pathway_id
## <int> <int> <chr> <chr> <chr> <chr>
## 1 118 39 GErel expression --> hsa04110
## 2 50 61 PPrel inhibition --| hsa04110
## 3 50 61 PPrel phosphorylation +p hsa04110
## # ℹ 154 more rows
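As a quick check on the pathway_id column described above, the following sketch counts node types and confirms that the node and edge tables carry the same pathway ID. It assumes the same cached download as above and uses only standard tidygraph and dplyr verbs; the object names node_types, node_ids, and edge_ids are illustrative.

```r
library(ggkegg)
library(tidygraph)
library(dplyr)

# Reuse the cached copy of the Cell cycle pathway
graph <- ggkegg::pathway("hsa04110", use_cache=TRUE)

# Number of nodes per KEGG entry type (e.g. gene, compound, map)
node_types <- graph |> activate(nodes) |> as_tibble() |> count(type)

# pathway_id is attached to both the node table and the edge table
node_ids <- graph |> activate(nodes) |> as_tibble() |> distinct(pathway_id)
edge_ids <- graph |> activate(edges) |> as_tibble() |> distinct(pathway_id)

node_types
node_ids
edge_ids
```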
The output can readily be analysed using tidygraph and dplyr verbs. For example, centrality measures can be calculated as follows.
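graph |>
  activate(nodes) |>
  # Compute centralities on the full graph before keeping only gene nodes;
  # degree counts incoming and outgoing edges, betweenness follows edge direction
  mutate(degree=centrality_degree(mode="all"),
         betweenness=centrality_betweenness()) |>
  filter(type=="gene") |>
  arrange(desc(degree)) |>
  as_tibble() |>
  relocate(degree, betweenness)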
## # A tibble: 112 × 20
## degree betweenness name type reaction graphics_name x y width
## <dbl> <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 11 144 hsa:7157 gene <NA> TP53, BCC7, … 590 -337 46
## 2 10 8 hsa:993 gene <NA> CDC25A, CDC2… 614 -496 46
## 3 9 0 hsa:983 gene <NA> CDK1, CDC2, … 689 -562 46
## 4 9 78.7 hsa:5925 gene <NA> RB1, OSRC, P… 353 -630 46
## 5 8 15 hsa:5347 gene <NA> PLK1, PLK, S… 862 -562 46
## 6 8 7 hsa:1111 h… gene <NA> CHEK1, CHK1,… 696 -393 46
## 7 7 0 hsa:983 gene <NA> CDK1, CDC2, … 780 -562 46
## 8 7 161. hsa:1026 gene <NA> CDKN1A, CAP2… 459 -407 46
## 9 7 5 hsa:994 hs… gene <NA> CDC25B... 830 -496 46
## 10 6 7 hsa:9088 gene <NA> PKMYT1, MYT1… 763 -622 46
## # ℹ 102 more rows
## # ℹ 11 more variables: height <dbl>, fgcolor <chr>, bgcolor <chr>,
## # graphics_type <chr>, coords <chr>, xmin <dbl>, xmax <dbl>, ymin <dbl>,
## # ymax <dbl>, orig.id <chr>, pathway_id <chr>
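The example above uses centrality_degree(mode="all"), which counts incoming and outgoing edges together (tidygraph's default is mode="out"). Because the KEGG graph is a multigraph, parallel edges such as the two PPrel edges between nodes 50 and 61 each add to the degree. The small hypothetical graph below is built by hand, not from KEGG, to make both points visible.

```r
library(tidygraph)
library(dplyr)

# Hypothetical directed multigraph: two parallel A -> B edges, plus A -> C and B -> C
toy <- as_tbl_graph(data.frame(from=c("A", "A", "A", "B"),
                               to  =c("B", "B", "C", "C")))

degrees <- toy |>
  activate(nodes) |>
  mutate(in_deg =centrality_degree(mode="in"),
         out_deg=centrality_degree(mode="out"),
         all_deg=centrality_degree(mode="all")) |>
  as_tibble()

degrees
```

Node A has three outgoing edges (two of them parallel), so its "out" and "all" degrees are both 3, while B collects 2 incoming and 1 outgoing edge for an "all" degree of 3.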
ggraph
The parsed tbl_graph can be plotted with ggraph using the grammar of graphics. Components of the graph, such as nodes, edges, and text, can be added layer by layer.
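library(ggraph)
# Use the first comma-separated entry of graphics_name as a short label
graph <- graph |> mutate(showname=strsplit(graphics_name, ",") |>
  vapply("[", 1, FUN.VALUE="a"))
# Place nodes at their KEGG map coordinates
ggraph(graph, layout="manual", x=x, y=y)+
  # Draw parallel edges side by side; line type encodes the interaction subtype
  geom_edge_parallel(aes(linetype=subtype_name),
    arrow=arrow(length=unit(1,"mm"), type="closed"),
    end_cap=circle(1,"cm"),
    start_cap=circle(1,"cm"))+
  # Gene boxes filled with the background colour defined in the KEGG map
  geom_node_rect(aes(fill=I(bgcolor),
    filter=type == "gene"),
    color="black")+
  geom_node_text(aes(label=showname,
    filter=type == "gene"),
    size=2)+
  theme_void()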
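The showname column in the plotting code keeps only the first comma-separated entry of graphics_name. The base R sketch below isolates that step on a few hypothetical strings in the same format (gene symbol first, then aliases). FUN.VALUE=character(1) states the same expectation as FUN.VALUE="a" in the code above, namely that each result is a single string.

```r
# Hypothetical inputs in the graphics_name format; NA mimics a node without a name
graphics_name <- c("TP53, BCC7", "CDK1, CDC2", NA)

# strsplit() returns a list of character vectors, one per input string;
# vapply() applies "[" with index 1 to each element and checks it is one string
showname <- vapply(strsplit(graphics_name, ","), "[", 1, FUN.VALUE=character(1))
showname
```

Splitting on "," leaves a leading space on the later entries (" BCC7"), which does not matter here because only the first entry is kept; a missing name stays NA.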