1 Introduction

Expression quantitative trait loci (eQTL) analysis links variations in gene expression levels to genotypes. This package, scQTLtools, attempts to identify genetic variants that affect the gene expression at a single-cell level. It also supports cis-eQTL analysis, and provides visualization of the results.

1.1 Rationale for Bioconductor Submission

By seeking inclusion in Bioconductor, we aim to integrate scQTLtools into a well-established ecosystem that is widely used by researchers in bioinformatics. Bioconductor’s rigorous standards for package quality and its focus on reproducibility will enhance the credibility of scQTLtools. Additionally, being part of Bioconductor will provide access to a broader user base and foster collaboration with other developers, contributing to the ongoing improvement and validation of the package.

1.2 Citation

If you find this tool useful, please cite:


https://github.com/XFWu/scQTLtools


2 Installation

if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("scQTLtools")

3 Overview of the package

The functions in scQTLtools can be categorized into data pre-process, sc-eQTL calling and visualization modules.

3.1 General Workflow

Each module is summarized as shown below.

scQTLtools supports input from both gene expression matrices, Seurat objects, and SingleCellExperiment objects. It offers three data fitting models and enables the analysis of genotype matrices in two forms: ALT and REF, or AA, Aa, and aa. Additionally, the package includes functionality to filter Gene-SNP pairs that are closely positioned in the genome, as nearby SNPs are more likely to influence gene expression. Moreover, visualization at the single-cell level demonstrates the specificity of eQTLs across distinct cell types or cellular states.

3.2 Comparison and advantages compared to similar works

We compared scQTLtools to other packages with similar functionality, including eQTLsingle, SCeQTL, MatrixEQTL, and iBMQ, as shown in the table below.

Among these tools, scQTLtools stands out for its comprehensive features:

  1. scQTLtools accepts SingleCellExperiment objects and Seurat objects as input data formats, which are particularly beneficial for users working with single-cell RNA-seq data, and promote the interoperability with the current Bioconductor ecosystem.

  2. scQTLtools supports both binary and triple classification genotype matrices, enhancing its applicability across different genetic studies.

  3. scQTLtools offers extensive data pre-processing capabilities, including quality control filtering for SNPs and genes, normalization of expression data, and customization of SNP-gene pair distances. This ensures high-quality and well-prepared input data for subsequent analysis.

  4. scQTLtools provides three kinds of fitting models, which cater to various data distributions and analysis needs. This diversity allows users to select the most appropriate model for their specific dataset.

  5. scQTLtools includes a range of visualization tools, these options facilitate detailed exploration and interpretation of eQTL results at the single-cell level.

  6. scQTLtools supports grouping by both cell type and cell state, which is crucial for analyzing the nuanced effects of genetic variants on gene expression within heterogeneous cell populations.

Overall, scQTLtools offers a comprehensive suite of features that enhance the analysis and interpretation of eQTLs.

4 Required input files

The input file requires genotype data, as well as a gene expression matrix or a Seurat object, or a SingleCellExperiment object.

  • gene expression matrix: describes gene expressions, the row names represent gene IDs or SYMBOL and the column names represent cell IDs.
  • Seurat object: a Seurat object, yourSeurat@assays$RNA@data is the gene expression matrix after normalizing.
  • SingleCellExperiment object: a SingleCellExperiment object, assays(yoursce)$counts) is raw gene expression matrix, and assays(sce)$normcountsis the gene expression matrix after normalizing.
  • genotype matrix: A genotype matrix where each row is one variant and each column is one sample, and the scoring method is 0/1/2/3, 0 represents missing values, 1 represents ref/ref, 2 represents alt/alt, and 3 represents ref/alt.

The columns of the genotype matrix should correspond to the columns of the gene expression matrix.

Example

library(scQTLtools)
# gene expression matrix
data(testGene)
# SeuratObject
data(testSeurat)
# load the genotype data
data(testSNP)
data(testSNP2)

5 Create eqtl object

The createQTLObject class is an R object designed to store data related to eQTL analysis, encompassing data lists, result data frames, and slots for biClassify, species, and group information.

Example

eqtl_matrix <- createQTLObject(
    snpMatrix = testSNP,
    genedata = testGene,
    biClassify = FALSE,
    species = 'human',
    group = NULL)

Users can set biClassify to TRUE to change the genotype coding method.

Example

eqtl_matrix_bi <- createQTLObject(
    snpMatrix = testSNP,
    genedata = testGene,
    biClassify = TRUE,
    species = 'human',
    group = NULL)

Users can use Seurat object instead of gene expression matrix.

Example

eqtl_seurat <- createQTLObject(
    snpMatrix = testSNP2,
    genedata = testSeurat,
    biClassify = FALSE,
    species = 'human',
    group = "celltype")

Users can also take SingleCellExperiment object as input.

Example

# create a sce
library(SingleCellExperiment)
sce <- SingleCellExperiment(assays = list(counts = testGene))
eqtl_sce <- createQTLObject(
    snpMatrix = testSNP,
    genedata = sce,
    biClassify = FALSE,
    species = 'human',
    group = NULL)

6 Normalize gene expression matrix

Use normalizeGene() to normalize the gene expression matrix.

Example

eqtl_matrix  <- normalizeGene(
    eQTLObject = eqtl_matrix, 
    method = "logNormalize")
eqtl_sce  <- normalizeGene(
    eQTLObject = eqtl_sce, 
    method = "logNormalize")

7 Identify the valid gene snp pairs

Here we use filterGeneSNP() to filter snp gene pairs.

Example

eqtl_matrix <- filterGeneSNP(
    eQTLObject = eqtl_matrix,
    snpNumOfCellsPercent = 2,
    expressionMin = 0,
    expressionNumOfCellsPercent = 2)
eqtl_seurat <- filterGeneSNP(
    eQTLObject = eqtl_seurat,
    snpNumOfCellsPercent = 2,
    expressionMin = 0,
    expressionNumOfCellsPercent = 2)
eqtl_sce <- filterGeneSNP(
    eQTLObject = eqtl_sce,
    snpNumOfCellsPercent = 2,
    expressionMin = 0,
    expressionNumOfCellsPercent = 2)

8 Call single cell eQTL

Here we use callQTL() to do single cell eQTL analysis.

Example

eqtl1_matrix <- callQTL(
    eQTLObject = eqtl_matrix,
    gene_ids = NULL,
    downstream = NULL,
    upstream = NULL,
    gene_mart = NULL,
    snp_mart = NULL,
    pAdjustMethod = "bonferroni",
    useModel = "linear",
    pAdjustThreshold = 0.05,
    logfcThreshold = 0.1)
eqtl1_seurat <- callQTL(
    eQTLObject = eqtl_seurat,
    gene_ids = NULL,
    downstream = NULL,
    upstream = NULL,
    gene_mart = NULL,
    snp_mart = NULL,
    pAdjustMethod = "bonferroni",
    useModel = "linear",
    pAdjustThreshold = 0.05,
    logfcThreshold = 0.025)
eqtl1_sce <- callQTL(
    eQTLObject = eqtl_sce,
    gene_ids = NULL,
    downstream = NULL,
    upstream = NULL,
    gene_mart = NULL,
    snp_mart = NULL,
    pAdjustMethod = "bonferroni",
    useModel = "linear",
    pAdjustThreshold = 0.05,
    logfcThreshold = 0.025)

Users can use the parameter gene_ids to select one or several genes of interest for identifying sc-eQTLs.

Example

eqtl2_matrix <- callQTL(
    eQTLObject = eqtl_matrix,
    gene_ids = c("CNN2", 
                "RNF113A", 
                "SH3GL1", 
                "INTS13", 
                "PLAU"),
    downstream = NULL,
    upstream = NULL,
    gene_mart = NULL,
    snp_mart = NULL,
    pAdjustMethod = "bonferroni",
    useModel = "poisson",
    pAdjustThreshold = 0.05,
    logfcThreshold = 0.1)

Users can also use upstream and downstream to specify SNPs proximal to the gene in the genome.

Example

eqtl3_matrix <- callQTL(
    eQTLObject = eqtl_matrix,
    gene_ids = NULL,
    downstream = -9e7,
    upstream = 2e8,
    gene_mart = NULL,
    snp_mart = NULL,
    pAdjustMethod = "bonferroni",
    useModel = "poisson",
    pAdjustThreshold = 0.05,
    logfcThreshold = 0.05)

9 Visualize the result.

Here we use visualizeQTL() to visualize the result. There are four types of plots available to visualize sc-eQTL results. Users can choose “histplot”, “violin”, “boxplot”, or “QTLplot”.

Example

violin plot:

visualizeQTL(
    eQTLObject = eqtl1_matrix,
    SNPid = "1:632647",
    Geneid = "RPS27",
    groupName = NULL,
    plottype = "QTLplot",
    removeoutlier = TRUE)

QTLplot:

With cell annotation information of Seurat object or SingleCellExperiment object, we can discover that the eQTL varies across different cell types.

visualizeQTL(
    eQTLObject = eqtl1_seurat,
    SNPid = "1:632647",
    Geneid = "RPS27",
    groupName = NULL,
    plottype = "QTLplot",
    removeoutlier = TRUE)

In addition, the parameter groupName is used to specify a particular single-cell group of interest.

visualizeQTL(
    eQTLObject = eqtl1_seurat,
    SNPid = "1:632647",
    Geneid = "RPS27",
    groupName = "GMP",
    plottype = "QTLplot",
    removeoutlier = TRUE)

boxplot:

visualizeQTL(
    eQTLObject = eqtl1_sce,
    SNPid = "1:632647",
    Geneid = "RPS27",
    groupName = NULL,
    plottype = "boxplot",
    removeoutlier = TRUE)

10 References

Appendix

  1. Miao, Z., Deng, K.e., Wang, X., Zhang, X., Berger, B., 2018. DEsingle for detecting three types of differential expression in single-cell RNA-seq data. Bioinformatics 34 (18), 3223–3224. https://doi.org/10.1093/bioinformatics/bty332
  2. Hu, Y., Xi, X., Yang, Q., Zhang, X., 2020. SCeQTL: An R package for identifying eQTL from single-cell parallel sequencing data. BMC Bioinformatics 21, 1–12. https://doi.org/10.1186/s12859-020-3534-6
  3. Nathan, A., Asgari, S., Ishigaki, K. et al. Single-cell eQTL models reveal dynamic T cell state dependence of disease loci. Nature 606, 120–128 (2022). https://doi.org/10.1038/s41586-022-04713-1
  4. Ma, T. , Li, H. , & Zhang, X. . (2021). Discovering single-cell eQTLs from scRNA-seq data only. Cold Spring Harbor Laboratory.
  5. Schmiedel, B. J., Gonzalez-Colin, C., Fajardo, V., Rocha, J., Madrigal, A., Ramírez-Suástegui, C., Bhattacharyya, S., Simon, H., Greenbaum, J. A., Peters, B., Seumois, G., Ay, F., Chandra, V., & Vijayanand, P. (2022). Single-cell eQTL analysis of activated T cell subsets reveals activation and cell type-dependent effects of disease-risk variants. Science immunology, 7(68), eabm2508. https://doi.org/10.1126/sciimmunol.abm2508
  6. Imholte, G. C. , Marie-Pier, S. B. , Labbe Aurélie,
    Deschepper, C. F. , & Raphael, G. . (2013). Ibmq: a r/bioconductor package for integrated bayesian modeling of eqtl data. Bioinformatics(21), 2797-2798. https://doi.org/10.1093/bioinformatics/btt485

A Session Info

sessionInfo()
## R Under development (unstable) (2025-01-20 r87609)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0
##  [3] Biobase_2.67.0              GenomicRanges_1.59.1       
##  [5] GenomeInfoDb_1.43.4         IRanges_2.41.2             
##  [7] S4Vectors_0.45.2            BiocGenerics_0.53.6        
##  [9] generics_0.1.3              MatrixGenerics_1.19.1      
## [11] matrixStats_1.5.0           scQTLtools_0.99.11         
## [13] BiocStyle_2.35.0           
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.2.3               httr2_1.1.0             biomaRt_2.63.1         
##   [4] rlang_1.1.5             magrittr_2.0.3          gamlss_5.4-22          
##   [7] compiler_4.5.0          RSQLite_2.3.9           png_0.1-8              
##  [10] vctrs_0.6.5             stringr_1.5.1           pkgconfig_2.0.3        
##  [13] crayon_1.5.3            fastmap_1.2.0           magick_2.8.5           
##  [16] dbplyr_2.5.0            XVector_0.47.2          labeling_0.4.3         
##  [19] rmarkdown_2.29          UCSC.utils_1.3.1        tinytex_0.54           
##  [22] purrr_1.0.2             bit_4.5.0.1             xfun_0.50              
##  [25] cachem_1.1.0            jsonlite_1.8.9          progress_1.2.3         
##  [28] blob_1.2.4              DelayedArray_0.33.4     BiocParallel_1.41.0    
##  [31] VGAM_1.1-12             parallel_4.5.0          prettyunits_1.2.0      
##  [34] R6_2.5.1                RColorBrewer_1.1-3      bslib_0.8.0            
##  [37] stringi_1.8.4           limma_3.63.3            parallelly_1.41.0      
##  [40] jquerylib_0.1.4         GOSemSim_2.33.0         Rcpp_1.0.14            
##  [43] bookdown_0.42           knitr_1.49              future.apply_1.11.3    
##  [46] R.utils_2.12.3          Matrix_1.7-2            splines_4.5.0          
##  [49] tidyselect_1.2.1        abind_1.4-8             yaml_2.3.10            
##  [52] gamlss.dist_6.1-1       codetools_0.2-20        curl_6.2.0             
##  [55] listenv_0.9.1           lattice_0.22-6          tibble_3.2.1           
##  [58] withr_3.0.2             KEGGREST_1.47.0         evaluate_1.0.3         
##  [61] survival_3.8-3          future_1.34.0           BiocFileCache_2.15.1   
##  [64] xml2_1.3.6              Biostrings_2.75.3       filelock_1.0.3         
##  [67] pillar_1.10.1           BiocManager_1.30.25     sp_2.1-4               
##  [70] hms_1.1.3               ggplot2_3.5.1           munsell_0.5.1          
##  [73] scales_1.3.0            globals_0.16.3          glue_1.8.0             
##  [76] tools_4.5.0             locfit_1.5-9.10         fs_1.6.5               
##  [79] dotCall64_1.2           grid_4.5.0              AnnotationDbi_1.69.0   
##  [82] colorspace_2.1-1        patchwork_1.3.0         nlme_3.1-167           
##  [85] GenomeInfoDbData_1.2.13 cli_3.6.3               rappdirs_0.3.3         
##  [88] spam_2.11-1             S4Arrays_1.7.1          dplyr_1.1.4            
##  [91] gtable_0.3.6            R.methodsS3_1.8.2       yulab.utils_0.2.0      
##  [94] DESeq2_1.47.1           sass_0.4.9              digest_0.6.37          
##  [97] gamlss.data_6.0-6       progressr_0.15.1        SparseArray_1.7.4      
## [100] farver_2.1.2            SeuratObject_5.0.2      memoise_2.0.1          
## [103] htmltools_0.5.8.1       R.oo_1.27.0             lifecycle_1.0.4        
## [106] httr_1.4.7              statmod_1.5.0           GO.db_3.20.0           
## [109] MASS_7.3-64             bit64_4.6.0-1