Differential expression (DE) analysis, aiming to identify genes with significant expression changes across conditions, provides insights into molecular mechanisms underlying aging, disease, and other biological processes. Traditional DE methods primarily detect changes in centrality (e.g., mean or median), but often lack power against alternative hypotheses characterized by changes in spread (e.g. variance or dispersion). Variance shifts, however, are critical in understanding regulatory dynamics and stochasticity in gene expression, particularly in contexts like aging and cellular differentiation. Moreover, in DE analysis, there is often a trade-off between statistical power and control over the false discovery rate (FDR): parametric approaches may inflate FDRs, while nonparametric methods frequently lack sufficient power.
The QRscore
package addresses these two limitations by providing a robust
framework for two-sample and K-sample tests that detect shifts in both
centrality and spread. Built upon rigorous theoretical foundations,QRscore
extends the Mann-Whitney test to an adaptive, rank-based approach that combines
non-parametric tests with weights informed by (zero-inflated) negative binomial
models, ensuring both high power and strictly controlled FDR.
This package is designed to complement existing tools in Bioconductor by
offering enhanced capabilities for detecting distributional shifts. By
integrating with widely-used Bioconductor packages such as DESeq2
and
leveraging parallelization (BiocParallel
), QRscore
seamlessly integrates
into genomics workflows for differential expression and differential dispersion
analysis. This vignette demonstrates the utility of QRscore
through a
detailed example.
Check dependencies are installed.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
if (!requireNamespace("DESeq2", quietly = TRUE)) {
BiocManager::install("DESeq2")
}
You can install the QRscore package from Bioconductor with the following command:
if (!requireNamespace("QRscore", quietly = TRUE)) {
BiocManager::install("QRscore")
}
Alternatively, you can install the QRscore package from github:
if (!requireNamespace("devtools", quietly = TRUE)) {
install.packages("devtools")
}
devtools::install_github("Fanding-Zhou/QRscore")
This vignette illustrates how to:
Preprocess and normalize bulk RNA-seq data.
Perform two-sample and three-sample differential tests using QRscore
.
Obtain results for DEGs and DDGs.
Here are detailed steps of using the package for analysis.
library(QRscore)
library(DESeq2)
The example dataset contains 3000 randomly selected genes in RNA-seq counts data from whole blood samples in the GTEx project, along with associated metadata for age groups.
data("example_dataset_raw_3000_genes")
Age groups are aggregated for larger sample size and simplified analysis.
bulk_sparse_mat <- example_dataset_raw_3000_genes$COUNTS
ages <- example_dataset_raw_3000_genes$METADATA$AGE
ages[ages %in% c("20-29", "30-39")] <- "20-39"
ages[ages %in% c("40-49", "50-59")] <- "40-59"
ages[ages %in% c("60-69", "70-79")] <- "60-79"
Genes with low expression or high dropout rates are excluded.
The DESeq2
package is used to normalize the filtered gene expression matrix.
col_means <- colMeans(bulk_sparse_mat, na.rm = TRUE)
col_zeros <- colMeans(bulk_sparse_mat == 0, na.rm = TRUE)
# The following threshold can be modified
col_ids <- which(col_means > 5 & col_zeros < 0.2)
bulk_df <- bulk_sparse_mat[, col_ids]
bulk_df_inv <- t(bulk_df)
coldata <- data.frame(age = ages)
dds <- DESeqDataSetFromMatrix(
countData = bulk_df_inv,
colData = coldata,
design = ~age
)
dds <- estimateSizeFactors(dds)
normalized_mat <- counts(dds, normalized = TRUE)
print("Number of kept genes after prefiltering:")
#> [1] "Number of kept genes after prefiltering:"
print(length(col_ids))
#> [1] 1213
This example compares the 20-39 and 60-79 age groups.
## define age groups and subset dataset
kept_samples <- coldata$age %in% c("20-39", "60-79")
normalized_mat_1 <- normalized_mat[, kept_samples]
coldata_1 <- coldata[kept_samples, ]
Both mean and dispersion shifts are tested in parallel. The outputs includes ranked p-values of differential variance and differential mean tests for all genes, together with log ratio of mean shifts and variance shifts.
results_2_sample <- QRscoreGenetest(normalized_mat_1,
coldata_1,
pairwise_test = TRUE,
pairwise_logFC = TRUE, test_mean = TRUE,
test_dispersion = TRUE, num_cores = 2,
approx = "asymptotic"
)
Genes with significant mean shifts are identified.
results_2_sample_mean <- results_2_sample$mean_test
results_2_sample_DEG <-
results_2_sample_mean[
results_2_sample_mean$QRscore_Mean_adj_p_value < 0.05,
]
head(results_2_sample_DEG)
#> QRscore_Mean_p_value QRscore_Mean_adj_p_value
#> ENSG00000272455.1 1.374823e-19 1.667660e-16
#> ENSG00000261326.2 9.497493e-19 4.800520e-16
#> ENSG00000249087.6 1.187268e-18 4.800520e-16
#> ENSG00000271895.2 1.857504e-18 5.632881e-16
#> ENSG00000116205.12 5.029803e-18 1.220230e-15
#> ENSG00000163155.11 7.198495e-18 1.367567e-15
#> Log_FC_Mean_60-79_vs_20-39
#> ENSG00000272455.1 -1.0379163
#> ENSG00000261326.2 -0.9286823
#> ENSG00000249087.6 -0.8174845
#> ENSG00000271895.2 -0.8803063
#> ENSG00000116205.12 -0.7140029
#> ENSG00000163155.11 -1.1847951
Genes with significant variance shifts are identified.
results_2_sample_var <- results_2_sample$var_test
results_2_sample_DDG <-
results_2_sample_var[results_2_sample_var$QRscore_Var_adj_p_value < 0.05, ]
head(results_2_sample_DDG)
#> QRscore_Var_p_value QRscore_Var_adj_p_value
#> ENSG00000162692.10 7.876529e-13 9.554229e-10
#> ENSG00000177606.6 1.536161e-10 9.316815e-08
#> ENSG00000189409.13 5.901446e-10 2.037544e-07
#> ENSG00000162383.11 8.134168e-10 2.037544e-07
#> ENSG00000188290.10 8.398778e-10 2.037544e-07
#> ENSG00000121753.12 8.647941e-09 1.624944e-06
#> Log_FC_Var_60-79_vs_20-39
#> ENSG00000162692.10 2.4332081
#> ENSG00000177606.6 0.2094172
#> ENSG00000189409.13 0.8072178
#> ENSG00000162383.11 2.1536069
#> ENSG00000188290.10 0.3225796
#> ENSG00000121753.12 -2.9834417
This example compares the 20-39, 40-59, and 60-79 age groups.
The output includes 3-sample test p-values as well as pairwise fold changes.
To get a more comprehensive result, one can set pairwise_test = TRUE
to
additionally obtain the pairwise test p-values.
results_3_sample <- QRscoreGenetest(normalized_mat, coldata$age,
pairwise_test = FALSE,
pairwise_logFC = TRUE, test_mean = TRUE,
test_dispersion = TRUE, num_cores = 2,
approx = "asymptotic"
)
For detecting DDGs and DEGs, it’s recommended to use 3-sample test p-values
(namely QRscore_Mean_adj_p_value
and QRscore_Var_adj_p_value
) with certain
cutoffs (e.g. 0.05).
### DEGs
results_3_sample_mean <- results_3_sample$mean_test
results_3_sample_DEG <- results_3_sample_mean[
results_3_sample_mean$QRscore_Mean_adj_p_value < 0.05,
]
head(results_3_sample_DEG)
#> QRscore_Mean_p_value QRscore_Mean_adj_p_value
#> ENSG00000272455.1 2.729276e-19 2.544765e-16
#> ENSG00000116205.12 4.822300e-19 2.544765e-16
#> ENSG00000271895.2 6.293732e-19 2.544765e-16
#> ENSG00000261326.2 1.099706e-18 3.334857e-16
#> ENSG00000272078.1 2.464001e-18 5.977667e-16
#> ENSG00000162526.6 3.368610e-18 6.810207e-16
#> Log_FC_Mean_40-59_vs_20-39 Log_FC_Mean_60-79_vs_20-39
#> ENSG00000272455.1 -0.4483026 -1.0379163
#> ENSG00000116205.12 -0.3581835 -0.7140029
#> ENSG00000271895.2 -0.4010168 -0.8803063
#> ENSG00000261326.2 -0.3939637 -0.9286823
#> ENSG00000272078.1 0.6447943 1.1566804
#> ENSG00000162526.6 -0.3306444 -0.6571025
#> Log_FC_Mean_60-79_vs_40-59
#> ENSG00000272455.1 -0.5896137
#> ENSG00000116205.12 -0.3558193
#> ENSG00000271895.2 -0.4792895
#> ENSG00000261326.2 -0.5347186
#> ENSG00000272078.1 0.5118861
#> ENSG00000162526.6 -0.3264581
### DDGs
results_3_sample_var <- results_3_sample$var_test
results_3_sample_DDG <-
results_3_sample_var[results_3_sample_var$QRscore_Var_adj_p_value < 0.05, ]
head(results_3_sample_DDG)
#> QRscore_Var_p_value QRscore_Var_adj_p_value
#> ENSG00000189409.13 5.080165e-10 2.334420e-07
#> ENSG00000162692.10 5.207999e-10 2.334420e-07
#> ENSG00000177606.6 5.773504e-10 2.334420e-07
#> ENSG00000188290.10 1.737933e-08 5.270282e-06
#> ENSG00000162383.11 2.226853e-08 5.402345e-06
#> ENSG00000233184.6 5.256418e-08 1.062672e-05
#> Log_FC_Var_40-59_vs_20-39 Log_FC_Var_60-79_vs_20-39
#> ENSG00000189409.13 0.4795484 0.8072178
#> ENSG00000162692.10 0.3628677 2.4332081
#> ENSG00000177606.6 0.3483889 0.2094172
#> ENSG00000188290.10 -1.8237979 0.3225796
#> ENSG00000162383.11 1.8681958 2.1536069
#> ENSG00000233184.6 0.1407786 0.4660539
#> Log_FC_Var_60-79_vs_40-59
#> ENSG00000189409.13 0.3276693
#> ENSG00000162692.10 2.0703404
#> ENSG00000177606.6 -0.1389717
#> ENSG00000188290.10 2.1463775
#> ENSG00000162383.11 0.2854110
#> ENSG00000233184.6 0.3252753
sessionInfo()
#> R version 4.5.0 RC (2025-04-04 r88126)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] DESeq2_1.48.0 SummarizedExperiment_1.38.0
#> [3] Biobase_2.68.0 MatrixGenerics_1.20.0
#> [5] matrixStats_1.5.0 GenomicRanges_1.60.0
#> [7] GenomeInfoDb_1.44.0 IRanges_2.42.0
#> [9] S4Vectors_0.46.0 BiocGenerics_0.54.0
#> [11] generics_0.1.3 QRscore_1.0.0
#> [13] BiocStyle_2.36.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 xfun_0.52 bslib_0.9.0
#> [4] ggplot2_3.5.2 lattice_0.22-7 vctrs_0.6.5
#> [7] tools_4.5.0 parallel_4.5.0 tibble_3.2.1
#> [10] pkgconfig_2.0.3 Matrix_1.7-3 arrangements_1.1.9
#> [13] assertthat_0.2.1 lifecycle_1.0.4 GenomeInfoDbData_1.2.14
#> [16] compiler_4.5.0 munsell_0.5.1 codetools_0.2-20
#> [19] htmltools_0.5.8.1 sass_0.4.10 yaml_2.3.10
#> [22] gmp_0.7-5 pillar_1.10.2 crayon_1.5.3
#> [25] jquerylib_0.1.4 MASS_7.3-65 BiocParallel_1.42.0
#> [28] cachem_1.1.0 DelayedArray_0.34.0 hitandrun_0.5-6
#> [31] abind_1.4-8 locfit_1.5-9.12 tidyselect_1.2.1
#> [34] digest_0.6.37 dplyr_1.1.4 bookdown_0.43
#> [37] fastmap_1.2.0 grid_4.5.0 colorspace_2.1-1
#> [40] cli_3.6.4 SparseArray_1.8.0 magrittr_2.0.3
#> [43] S4Arrays_1.8.0 UCSC.utils_1.4.0 scales_1.3.0
#> [46] rmarkdown_2.29 XVector_0.48.0 httr_1.4.7
#> [49] evaluate_1.0.3 knitr_1.50 pscl_1.5.9
#> [52] rlang_1.1.6 Rcpp_1.0.14 glue_1.8.0
#> [55] BiocManager_1.30.25 jsonlite_2.0.0 R6_2.6.1