Contents

1 Introduction

Differential expression (DE) analysis, aiming to identify genes with significant expression changes across conditions, provides insights into molecular mechanisms underlying aging, disease, and other biological processes. Traditional DE methods primarily detect changes in centrality (e.g., mean or median), but often lack power against alternative hypotheses characterized by changes in spread (e.g. variance or dispersion). Variance shifts, however, are critical in understanding regulatory dynamics and stochasticity in gene expression, particularly in contexts like aging and cellular differentiation. Moreover, in DE analysis, there is often a trade-off between statistical power and control over the false discovery rate (FDR): parametric approaches may inflate FDRs, while nonparametric methods frequently lack sufficient power.

The QRscore package addresses these two limitations by providing a robust framework for two-sample and K-sample tests that detect shifts in both centrality and spread. Built upon rigorous theoretical foundations,QRscore extends the Mann-Whitney test to an adaptive, rank-based approach that combines non-parametric tests with weights informed by (zero-inflated) negative binomial models, ensuring both high power and strictly controlled FDR.

This package is designed to complement existing tools in Bioconductor by offering enhanced capabilities for detecting distributional shifts. By integrating with widely-used Bioconductor packages such as DESeq2 and leveraging parallelization (BiocParallel), QRscore seamlessly integrates into genomics workflows for differential expression and differential dispersion analysis. This vignette demonstrates the utility of QRscore through a detailed example.

2 Installation

Check dependencies are installed.

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
if (!requireNamespace("DESeq2", quietly = TRUE)) {
    BiocManager::install("DESeq2")
}

You can install the QRscore package from Bioconductor with the following command:

if (!requireNamespace("QRscore", quietly = TRUE)) {
    BiocManager::install("QRscore")
}

Alternatively, you can install the QRscore package from github:

if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}
devtools::install_github("Fanding-Zhou/QRscore")

3 Overview

This vignette illustrates how to:

  1. Preprocess and normalize bulk RNA-seq data.

  2. Perform two-sample and three-sample differential tests using QRscore.

  3. Obtain results for DEGs and DDGs.

4 Analysis

Here are detailed steps of using the package for analysis.

4.1 Package Loading

library(QRscore)
library(DESeq2)

4.2 Load the data

The example dataset contains 3000 randomly selected genes in RNA-seq counts data from whole blood samples in the GTEx project, along with associated metadata for age groups.

data("example_dataset_raw_3000_genes")

Age groups are aggregated for larger sample size and simplified analysis.

bulk_sparse_mat <- example_dataset_raw_3000_genes$COUNTS
ages <- example_dataset_raw_3000_genes$METADATA$AGE
ages[ages %in% c("20-29", "30-39")] <- "20-39"
ages[ages %in% c("40-49", "50-59")] <- "40-59"
ages[ages %in% c("60-69", "70-79")] <- "60-79"

4.3 Data Prefiltering and Normalization

  • Genes with low expression or high dropout rates are excluded.

  • The DESeq2 package is used to normalize the filtered gene expression matrix.

col_means <- colMeans(bulk_sparse_mat, na.rm = TRUE)
col_zeros <- colMeans(bulk_sparse_mat == 0, na.rm = TRUE)
# The following threshold can be modified
col_ids <- which(col_means > 5 & col_zeros < 0.2)

bulk_df <- bulk_sparse_mat[, col_ids]
bulk_df_inv <- t(bulk_df)
coldata <- data.frame(age = ages)
dds <- DESeqDataSetFromMatrix(
    countData = bulk_df_inv,
    colData = coldata,
    design = ~age
)
dds <- estimateSizeFactors(dds)
normalized_mat <- counts(dds, normalized = TRUE)
print("Number of kept genes after prefiltering:")
#> [1] "Number of kept genes after prefiltering:"
print(length(col_ids))
#> [1] 1213

4.4 Two-sample test

4.4.1 Define Age Groups for Testing

This example compares the 20-39 and 60-79 age groups.

## define age groups and subset dataset
kept_samples <- coldata$age %in% c("20-39", "60-79")
normalized_mat_1 <- normalized_mat[, kept_samples]
coldata_1 <- coldata[kept_samples, ]

4.4.2 Run QRscore Test

Both mean and dispersion shifts are tested in parallel. The outputs includes ranked p-values of differential variance and differential mean tests for all genes, together with log ratio of mean shifts and variance shifts.

results_2_sample <- QRscoreGenetest(normalized_mat_1,
    coldata_1,
    pairwise_test = TRUE,
    pairwise_logFC = TRUE, test_mean = TRUE,
    test_dispersion = TRUE, num_cores = 2,
    approx = "asymptotic"
)

4.4.3 Differentially Expressed Genes (DEGs)

Genes with significant mean shifts are identified.

results_2_sample_mean <- results_2_sample$mean_test
results_2_sample_DEG <-
    results_2_sample_mean[
        results_2_sample_mean$QRscore_Mean_adj_p_value < 0.05,
    ]
head(results_2_sample_DEG)
#>                    QRscore_Mean_p_value QRscore_Mean_adj_p_value
#> ENSG00000272455.1          1.374823e-19             1.667660e-16
#> ENSG00000261326.2          9.497493e-19             4.800520e-16
#> ENSG00000249087.6          1.187268e-18             4.800520e-16
#> ENSG00000271895.2          1.857504e-18             5.632881e-16
#> ENSG00000116205.12         5.029803e-18             1.220230e-15
#> ENSG00000163155.11         7.198495e-18             1.367567e-15
#>                    Log_FC_Mean_60-79_vs_20-39
#> ENSG00000272455.1                  -1.0379163
#> ENSG00000261326.2                  -0.9286823
#> ENSG00000249087.6                  -0.8174845
#> ENSG00000271895.2                  -0.8803063
#> ENSG00000116205.12                 -0.7140029
#> ENSG00000163155.11                 -1.1847951

4.4.4 Differentially Dispersed Genes (DDGs)

Genes with significant variance shifts are identified.

results_2_sample_var <- results_2_sample$var_test
results_2_sample_DDG <-
    results_2_sample_var[results_2_sample_var$QRscore_Var_adj_p_value < 0.05, ]
head(results_2_sample_DDG)
#>                    QRscore_Var_p_value QRscore_Var_adj_p_value
#> ENSG00000162692.10        7.876529e-13            9.554229e-10
#> ENSG00000177606.6         1.536161e-10            9.316815e-08
#> ENSG00000189409.13        5.901446e-10            2.037544e-07
#> ENSG00000162383.11        8.134168e-10            2.037544e-07
#> ENSG00000188290.10        8.398778e-10            2.037544e-07
#> ENSG00000121753.12        8.647941e-09            1.624944e-06
#>                    Log_FC_Var_60-79_vs_20-39
#> ENSG00000162692.10                 2.4332081
#> ENSG00000177606.6                  0.2094172
#> ENSG00000189409.13                 0.8072178
#> ENSG00000162383.11                 2.1536069
#> ENSG00000188290.10                 0.3225796
#> ENSG00000121753.12                -2.9834417

4.5 Three-Sample Test

This example compares the 20-39, 40-59, and 60-79 age groups.

The output includes 3-sample test p-values as well as pairwise fold changes. To get a more comprehensive result, one can set pairwise_test = TRUE to additionally obtain the pairwise test p-values.

results_3_sample <- QRscoreGenetest(normalized_mat, coldata$age,
    pairwise_test = FALSE,
    pairwise_logFC = TRUE, test_mean = TRUE,
    test_dispersion = TRUE, num_cores = 2,
    approx = "asymptotic"
)

For detecting DDGs and DEGs, it’s recommended to use 3-sample test p-values (namely QRscore_Mean_adj_p_value and QRscore_Var_adj_p_value) with certain cutoffs (e.g. 0.05).

### DEGs
results_3_sample_mean <- results_3_sample$mean_test
results_3_sample_DEG <- results_3_sample_mean[
    results_3_sample_mean$QRscore_Mean_adj_p_value < 0.05,
]
head(results_3_sample_DEG)
#>                    QRscore_Mean_p_value QRscore_Mean_adj_p_value
#> ENSG00000272455.1          2.729276e-19             2.544765e-16
#> ENSG00000116205.12         4.822300e-19             2.544765e-16
#> ENSG00000271895.2          6.293732e-19             2.544765e-16
#> ENSG00000261326.2          1.099706e-18             3.334857e-16
#> ENSG00000272078.1          2.464001e-18             5.977667e-16
#> ENSG00000162526.6          3.368610e-18             6.810207e-16
#>                    Log_FC_Mean_40-59_vs_20-39 Log_FC_Mean_60-79_vs_20-39
#> ENSG00000272455.1                  -0.4483026                 -1.0379163
#> ENSG00000116205.12                 -0.3581835                 -0.7140029
#> ENSG00000271895.2                  -0.4010168                 -0.8803063
#> ENSG00000261326.2                  -0.3939637                 -0.9286823
#> ENSG00000272078.1                   0.6447943                  1.1566804
#> ENSG00000162526.6                  -0.3306444                 -0.6571025
#>                    Log_FC_Mean_60-79_vs_40-59
#> ENSG00000272455.1                  -0.5896137
#> ENSG00000116205.12                 -0.3558193
#> ENSG00000271895.2                  -0.4792895
#> ENSG00000261326.2                  -0.5347186
#> ENSG00000272078.1                   0.5118861
#> ENSG00000162526.6                  -0.3264581
### DDGs
results_3_sample_var <- results_3_sample$var_test
results_3_sample_DDG <-
    results_3_sample_var[results_3_sample_var$QRscore_Var_adj_p_value < 0.05, ]
head(results_3_sample_DDG)
#>                    QRscore_Var_p_value QRscore_Var_adj_p_value
#> ENSG00000189409.13        5.080165e-10            2.334420e-07
#> ENSG00000162692.10        5.207999e-10            2.334420e-07
#> ENSG00000177606.6         5.773504e-10            2.334420e-07
#> ENSG00000188290.10        1.737933e-08            5.270282e-06
#> ENSG00000162383.11        2.226853e-08            5.402345e-06
#> ENSG00000233184.6         5.256418e-08            1.062672e-05
#>                    Log_FC_Var_40-59_vs_20-39 Log_FC_Var_60-79_vs_20-39
#> ENSG00000189409.13                 0.4795484                 0.8072178
#> ENSG00000162692.10                 0.3628677                 2.4332081
#> ENSG00000177606.6                  0.3483889                 0.2094172
#> ENSG00000188290.10                -1.8237979                 0.3225796
#> ENSG00000162383.11                 1.8681958                 2.1536069
#> ENSG00000233184.6                  0.1407786                 0.4660539
#>                    Log_FC_Var_60-79_vs_40-59
#> ENSG00000189409.13                 0.3276693
#> ENSG00000162692.10                 2.0703404
#> ENSG00000177606.6                 -0.1389717
#> ENSG00000188290.10                 2.1463775
#> ENSG00000162383.11                 0.2854110
#> ENSG00000233184.6                  0.3252753

5 Session Info

sessionInfo()
#> R version 4.5.0 RC (2025-04-04 r88126)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] DESeq2_1.48.0               SummarizedExperiment_1.38.0
#>  [3] Biobase_2.68.0              MatrixGenerics_1.20.0      
#>  [5] matrixStats_1.5.0           GenomicRanges_1.60.0       
#>  [7] GenomeInfoDb_1.44.0         IRanges_2.42.0             
#>  [9] S4Vectors_0.46.0            BiocGenerics_0.54.0        
#> [11] generics_0.1.3              QRscore_1.0.0              
#> [13] BiocStyle_2.36.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6            xfun_0.52               bslib_0.9.0            
#>  [4] ggplot2_3.5.2           lattice_0.22-7          vctrs_0.6.5            
#>  [7] tools_4.5.0             parallel_4.5.0          tibble_3.2.1           
#> [10] pkgconfig_2.0.3         Matrix_1.7-3            arrangements_1.1.9     
#> [13] assertthat_0.2.1        lifecycle_1.0.4         GenomeInfoDbData_1.2.14
#> [16] compiler_4.5.0          munsell_0.5.1           codetools_0.2-20       
#> [19] htmltools_0.5.8.1       sass_0.4.10             yaml_2.3.10            
#> [22] gmp_0.7-5               pillar_1.10.2           crayon_1.5.3           
#> [25] jquerylib_0.1.4         MASS_7.3-65             BiocParallel_1.42.0    
#> [28] cachem_1.1.0            DelayedArray_0.34.0     hitandrun_0.5-6        
#> [31] abind_1.4-8             locfit_1.5-9.12         tidyselect_1.2.1       
#> [34] digest_0.6.37           dplyr_1.1.4             bookdown_0.43          
#> [37] fastmap_1.2.0           grid_4.5.0              colorspace_2.1-1       
#> [40] cli_3.6.4               SparseArray_1.8.0       magrittr_2.0.3         
#> [43] S4Arrays_1.8.0          UCSC.utils_1.4.0        scales_1.3.0           
#> [46] rmarkdown_2.29          XVector_0.48.0          httr_1.4.7             
#> [49] evaluate_1.0.3          knitr_1.50              pscl_1.5.9             
#> [52] rlang_1.1.6             Rcpp_1.0.14             glue_1.8.0             
#> [55] BiocManager_1.30.25     jsonlite_2.0.0          R6_2.6.1