1 Introduction

Dot plots of single-cell RNA-seq data allow for an examination of the relationships between cell groupings (e.g. clusters) and marker gene expression. The scDotPlot package offers a unified approach to perform a hierarchical clustering analysis and add annotations to the columns and/or rows of a scRNA-seq dot plot. It works with SingleCellExperiment and Seurat objects as well as data frames. The scDotPlot() function uses data from scater::plotDots() or Seurat::DotPlot() along with the aplot package to add dendrograms from ggtree and optional annotations.

2 Installation

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("scDotPlot")

To install the development version directly from GitHub:

if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}

remotes::install_github("ben-laufer/scDotPlot")

3 SingleCellExperiment

3.1 Prepare object

First, we normalize the object and then, for the purpose of this example, subset to remove cells without cell-type labels.

library(scRNAseq)
library(scuttle)

sce <- ZeiselBrainData()

sce <- sce |> 
    logNormCounts() |>  
    subset(x = _, , level2class != "(none)")

3.2 Get features

The features argument accepts a character vector with the gene IDs. For this example, we quickly obtain the top markers of for each cell type and then add them to the rowData of the object.

library(scran)
library(purrr)
library(dplyr)
library(AnnotationDbi)

features <- sce |>
    scoreMarkers(sce$level1class) |>
    map(~ .x |>
            as.data.frame() |>
            arrange(desc(mean.AUC))|>
            dplyr::slice(1:6) |>
            rownames()) |> 
    unlist2()

rowData(sce)$Marker <- features[match(rownames(sce), features)] |>
    names()

3.3 Plot logcounts

Finally, we create the plot. The group arguments utilize the colData, while the features arguments use the rowData. The paletteList argument can be used to manually specify the colors for the annotations specified through groupAnno and featureAnno. The clustering of the columns shows that cell the cell sub-types cluster by cell-type, while the clustering of the rows shows that most of the markers clusters by their cell type.

library(scDotPlot)
library(ggsci)

sce |>
    scDotPlot(features = features,
              group = "level2class",
              groupAnno = "level1class",
              featureAnno = "Marker",
              groupLegends = FALSE,
              annoColors = list("level1class" = pal_d3()(7),
                                "Marker" = pal_d3()(7)),
              annoWidth = 0.02)
scDotPlot of SingleCellExperiment logcounts

Figure 1: scDotPlot of SingleCellExperiment logcounts

3.4 Plot Z-scores

Plotting by Z-score through scale = TRUE improves the clustering result for the rows.

sce |>
    scDotPlot(scale = TRUE,
              features = features,
              group = "level2class",
              groupAnno = "level1class",
              featureAnno = "Marker",
              groupLegends = FALSE,
              annoColors = list("level1class" = pal_d3()(7),
                                "Marker" = pal_d3()(7)),
              annoWidth = 0.02)
scDotPlot of SingleCellExperiment Z-scores

Figure 2: scDotPlot of SingleCellExperiment Z-scores

4 Seurat

4.1 Get features

After loading the example Seurat object, we find the top markers for each cluster and add them to the assay of interest.

library(Seurat)
library(SeuratObject)
library(tibble)

data("pbmc_small")

features <- pbmc_small |>
    FindAllMarkers(only.pos = TRUE, verbose = FALSE) |>
    group_by(cluster) |>
    dplyr::slice(1:6) |>
    dplyr::select(cluster, gene)

pbmc_small[[DefaultAssay(pbmc_small)]][[]] <- pbmc_small[[DefaultAssay(pbmc_small)]][[]] |>
    rownames_to_column("gene") |> 
    left_join(features, by = "gene") |> 
    column_to_rownames("gene")

features <- features |> 
    deframe()

4.2 Plot logcounts

Plotting a Seurat object is similar to plotting a SingleCellExperiment object.

pbmc_small |>
    scDotPlot(features = features,
              group = "RNA_snn_res.1",
              groupAnno = "RNA_snn_res.1",
              featureAnno = "cluster",
              annoColors = list("RNA_snn_res.1" = pal_d3()(7),
                                "cluster" = pal_d3()(7)),
              groupLegends = FALSE,
              annoWidth = 0.075)
scDotPlot of Seurat logcounts

Figure 3: scDotPlot of Seurat logcounts

4.3 Plot Z-scores

Again, we see that plotting by Z-score improves the clustering result for the rows.

pbmc_small |>
    scDotPlot(scale = TRUE,
              features = features,
              group = "RNA_snn_res.1",
              groupAnno = "RNA_snn_res.1",
              featureAnno = "cluster",
              annoColors = list("RNA_snn_res.1" = pal_d3()(7),
                                "cluster" = pal_d3()(7)),
              groupLegends = FALSE,
              annoWidth = 0.075)
scDotPlot of Seurat Z-scores

Figure 4: scDotPlot of Seurat Z-scores

5 Package support

The Bioconductor support site is the preferred method to ask for help. Before posting, it’s recommended to check previous posts for the answer and look over the posting guide. For the post, it’s important to use the scDotPlot tag and provide both a minimal reproducible example and session information.

6 Acknowledgement

This package was inspired by the single-cell example from aplot.

7 Session info

## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] tibble_3.2.1                Seurat_5.1.0               
##  [3] SeuratObject_5.0.2          sp_2.1-4                   
##  [5] ggsci_3.2.0                 scDotPlot_1.1.0            
##  [7] AnnotationDbi_1.69.0        dplyr_1.1.4                
##  [9] purrr_1.0.2                 scran_1.35.0               
## [11] scuttle_1.17.0              scRNAseq_2.19.1            
## [13] SingleCellExperiment_1.29.0 SummarizedExperiment_1.37.0
## [15] Biobase_2.67.0              GenomicRanges_1.59.0       
## [17] GenomeInfoDb_1.43.0         IRanges_2.41.0             
## [19] S4Vectors_0.45.0            BiocGenerics_0.53.0        
## [21] MatrixGenerics_1.19.0       matrixStats_1.4.1          
## [23] BiocStyle_2.35.0           
## 
## loaded via a namespace (and not attached):
##   [1] fs_1.6.4                 ProtGenerics_1.39.0      spatstat.sparse_3.1-0   
##   [4] bitops_1.0-9             httr_1.4.7               RColorBrewer_1.1-3      
##   [7] tools_4.5.0              sctransform_0.4.1        alabaster.base_1.7.0    
##  [10] utf8_1.2.4               R6_2.5.1                 HDF5Array_1.35.0        
##  [13] uwot_0.2.2               lazyeval_0.2.2           rhdf5filters_1.19.0     
##  [16] withr_3.0.2              gridExtra_2.3            progressr_0.15.0        
##  [19] cli_3.6.3                spatstat.explore_3.3-3   fastDummies_1.7.4       
##  [22] labeling_0.4.3           alabaster.se_1.7.0       sass_0.4.9              
##  [25] spatstat.data_3.1-2      ggridges_0.5.6           pbapply_1.7-2           
##  [28] yulab.utils_0.1.7        Rsamtools_2.23.0         scater_1.35.0           
##  [31] parallelly_1.38.0        limma_3.63.0             RSQLite_2.3.7           
##  [34] gridGraphics_0.5-1       generics_0.1.3           BiocIO_1.17.0           
##  [37] spatstat.random_3.3-2    ica_1.0-3                Matrix_1.7-1            
##  [40] ggbeeswarm_0.7.2         fansi_1.0.6              abind_1.4-8             
##  [43] lifecycle_1.0.4          yaml_2.3.10              edgeR_4.5.0             
##  [46] rhdf5_2.51.0             SparseArray_1.7.0        BiocFileCache_2.15.0    
##  [49] Rtsne_0.17               grid_4.5.0               blob_1.2.4              
##  [52] promises_1.3.0           dqrng_0.4.1              ExperimentHub_2.15.0    
##  [55] crayon_1.5.3             miniUI_0.1.1.1           lattice_0.22-6          
##  [58] beachmat_2.23.0          cowplot_1.1.3            GenomicFeatures_1.59.0  
##  [61] KEGGREST_1.47.0          pillar_1.9.0             knitr_1.48              
##  [64] metapod_1.15.0           rjson_0.2.23             future.apply_1.11.3     
##  [67] codetools_0.2-20         leiden_0.4.3.1           glue_1.8.0              
##  [70] ggfun_0.1.7              spatstat.univar_3.0-1    data.table_1.16.2       
##  [73] treeio_1.31.0            vctrs_0.6.5              png_0.1-8               
##  [76] gypsum_1.3.0             spam_2.11-0              gtable_0.3.6            
##  [79] cachem_1.1.0             xfun_0.48                S4Arrays_1.7.0          
##  [82] mime_0.12                survival_3.7-0           statmod_1.5.0           
##  [85] bluster_1.17.0           fitdistrplus_1.2-1       ROCR_1.0-11             
##  [88] nlme_3.1-166             ggtree_3.15.0            bit64_4.5.2             
##  [91] alabaster.ranges_1.7.0   filelock_1.0.3           RcppAnnoy_0.0.22        
##  [94] bslib_0.8.0              irlba_2.3.5.1            vipor_0.4.7             
##  [97] KernSmooth_2.23-24       colorspace_2.1-1         DBI_1.2.3               
## [100] tidyselect_1.2.1         bit_4.5.0                compiler_4.5.0          
## [103] curl_5.2.3               httr2_1.0.5              BiocNeighbors_2.1.0     
## [106] DelayedArray_0.33.0      plotly_4.10.4            bookdown_0.41           
## [109] rtracklayer_1.67.0       scales_1.3.0             lmtest_0.9-40           
## [112] rappdirs_0.3.3           goftest_1.2-3            stringr_1.5.1           
## [115] digest_0.6.37            spatstat.utils_3.1-0     alabaster.matrix_1.7.0  
## [118] rmarkdown_2.28           XVector_0.47.0           htmltools_0.5.8.1       
## [121] pkgconfig_2.0.3          highr_0.11               dbplyr_2.5.0            
## [124] fastmap_1.2.0            ensembldb_2.31.0         rlang_1.1.4             
## [127] htmlwidgets_1.6.4        UCSC.utils_1.3.0         shiny_1.9.1             
## [130] farver_2.1.2             jquerylib_0.1.4          zoo_1.8-12              
## [133] jsonlite_1.8.9           BiocParallel_1.41.0      BiocSingular_1.23.0     
## [136] RCurl_1.98-1.16          magrittr_2.0.3           ggplotify_0.1.2         
## [139] GenomeInfoDbData_1.2.13  dotCall64_1.2            patchwork_1.3.0         
## [142] Rhdf5lib_1.29.0          munsell_0.5.1            Rcpp_1.0.13             
## [145] viridis_0.6.5            ape_5.8                  reticulate_1.39.0       
## [148] stringi_1.8.4            alabaster.schemas_1.7.0  zlibbioc_1.53.0         
## [151] MASS_7.3-61              AnnotationHub_3.15.0     plyr_1.8.9              
## [154] parallel_4.5.0           listenv_0.9.1            ggrepel_0.9.6           
## [157] deldir_2.0-4             Biostrings_2.75.0        splines_4.5.0           
## [160] tensor_1.5               locfit_1.5-9.10          igraph_2.1.1            
## [163] spatstat.geom_3.3-3      RcppHNSW_0.6.0           reshape2_1.4.4          
## [166] ScaledMatrix_1.15.0      BiocVersion_3.21.1       XML_3.99-0.17           
## [169] evaluate_1.0.1           BiocManager_1.30.25      httpuv_1.6.15           
## [172] RANN_2.6.2               tidyr_1.3.1              polyclip_1.10-7         
## [175] future_1.34.0            scattermore_1.2          alabaster.sce_1.7.0     
## [178] ggplot2_3.5.1            rsvd_1.0.5               xtable_1.8-4            
## [181] restfulr_0.0.15          AnnotationFilter_1.31.0  tidytree_0.4.6          
## [184] RSpectra_0.16-2          later_1.3.2              viridisLite_0.4.2       
## [187] aplot_0.2.3              beeswarm_0.4.0           memoise_2.0.1           
## [190] GenomicAlignments_1.43.0 cluster_2.1.6            globals_0.16.3