Breast cancer (BC) is a highly heterogeneous disease characterized by distinct molecular intrinsic subtypes (IS) with unique clinical, biological, and prognostic profiles. These subtypes—such as Luminal A, Luminal B, HER2-Enriched, Basal-like, and Normal-like—are instrumental in guiding treatment strategies and prognostic evaluations. While clinical assays like Prosigna® provide standardized subtyping for patient care, the research community still lacks consensus due to fragmented methods and difficulties adapting them across diverse datasets. This inconsistency undermines the reproducibility and reliability of scientific findings.
Current methods, such as the original PAM50 (Parker et al., J Clin Oncol, 2009) and AIMS (Paquet & Hallett, J Natl Cancer Inst, 2015), have significantly advanced BC subtyping but adapt poorly to datasets that differ from their training cohorts. As a result, findings are often hard to reproduce across independent studies, especially when datasets come from different platforms or research environments. Furthermore, there is no centralized, accessible framework that integrates multiple subtyping methods with a focus on consistency and reliability.
Existing IS tools, like the genefu package, are limited to a small subset of PAM50 variations and AIMS, restricting their use to a narrow range of studies. High-performing methods, such as subgroup-specific gene-centering (ssBC), perform well across various datasets but are not readily available as R packages; instead, they are distributed as standalone scripts. This makes it difficult for many researchers, particularly those without advanced computational skills, to implement these methods. Additionally, traditional IHC-based strategies, such as conventional estrogen receptor (ER)-balancing via immunohistochemistry (cIHC), remain inaccessible to most researchers without specialized expertise, which limits their adoption.
To address these challenges, BreastSubtypeR was developed as a comprehensive solution. This R package integrates multiple molecular subtyping methods into a single, cohesive framework. By doing so, it allows researchers to perform robust, reproducible subtyping analyses on BC datasets of various sizes and platforms. The inclusion of the AUTO mode enables the dynamic selection of the most appropriate method based on the dataset’s characteristics, improving adaptability and accuracy. Furthermore, BreastSubtypeR incorporates optimized gene mapping techniques to overcome inconsistencies in gene sets, further enhancing reproducibility.
Importantly, BreastSubtypeR is designed to be an accessible tool. The package includes an interactive Shiny app (iBreastSubtypeR), offering a user-friendly interface for both bioinformaticians and researchers with limited R programming experience. This makes subtyping analyses more accessible to researchers across diverse fields, from bioinformatics to clinical research, without requiring deep technical knowledge of the underlying methods. By bridging the gap between computational expertise and clinical application, BreastSubtypeR facilitates BC research and ultimately contributes to advancing our understanding of this complex disease.
Approach | Description | Group | Citation |
---|---|---|---|
parker.original | Original PAM50 by Parker et al., 2009 | NC-based | Parker et al., 2009 |
genefu.scale | PAM50 implementation as in the genefu R package (scaled version) | NC-based | Gendoo et al., 2016 |
genefu.robust | PAM50 implementation as in the genefu R package (robust version) | NC-based | Gendoo et al., 2016 |
cIHC | Conventional estrogen receptor (ER)-balancing via immunohistochemistry (IHC) | NC-based | Ciriello et al., 2015 |
cIHC.itr | Iterative version of cIHC | NC-based | Curtis et al., 2012 |
PCAPAM50 | PCA-based iterative PAM50 (ER-balancing using ESR1 gene expression) | NC-based | Raj-Kumar et al., 2019 |
ssBC | Subgroup-specific gene-centering PAM50 | NC-based | Zhao et al., 2015 |
ssBC.v2 | Updated subgroup-specific gene-centering PAM50 with refined quantiles | NC-based | Fernandez-Martinez et al., 2020 |
AIMS | Absolute Intrinsic Molecular Subtyping (AIMS) method | SSP-based | Paquet & Hallett, 2015 |
sspbc | Single-Sample Predictors for Breast Cancer (AIMS adaptation) | SSP-based | Staaf et al., 2022 |
Approach | Description |
---|---|
User-defined Multi-Method | Allows users to select multiple subtyping methods for comparative analysis. |
AUTO Mode Multi-Method | Automatically selects subtyping methods based on the ER/HER2 distribution of the test cohort. |
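Because AUTO mode depends on the ER/HER2 distribution of the test cohort, it can help to inspect that distribution before running it. The following is only a minimal sketch in base R and does not reproduce the package's internal selection logic. The `er_status` vector is toy data, and `reference_fraction` is a placeholder value chosen purely for illustration, not the proportion of any real training cohort.

```r
# Toy ER status for a hypothetical test cohort (invented; not package data).
er_status <- c("pos", "pos", "neg", "pos", "neg",
               "pos", "pos", "neg", "pos", "pos")

# Fraction of ER-positive samples in the cohort.
er_pos_fraction <- mean(er_status == "pos")

# Placeholder reference proportion (an assumption for illustration only).
reference_fraction <- 0.5

# Exact binomial test of the observed ER+ count against the placeholder.
er_test <- binom.test(
    x = sum(er_status == "pos"),
    n = length(er_status),
    p = reference_fraction
)

er_pos_fraction
er_test$p.value
```

A cohort whose ER+ fraction departs strongly from the reference is the situation in which ER-balanced or single-sample methods would be expected to matter most.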
To install BreastSubtypeR from Bioconductor, run:
if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("BreastSubtypeR")
To install BreastSubtypeR from GitHub, run:
# Install the devtools package only if it is not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}
# Install BreastSubtypeR from GitHub
devtools::install_github("yqkiuo/BreastSubtypeR")
Here’s an example of how to use BreastSubtypeR for multi-method breast cancer subtyping. The user manually selects the methods to be used:
library(BreastSubtypeR)
# Load example data
data("BreastSubtypeRobj")
data("OSLO2EMIT0obj")
# Perform gene mapping before subtyping
data_input <- Mapping(OSLO2EMIT0obj$se_obj, method = "max", impute = TRUE, verbose = FALSE)
# Perform multi-method subtyping
methods <- c("parker.original", "PCAPAM50", "sspbc")
result <- BS_Multi(
    data_input = data_input,
    methods = methods,
    Subtype = FALSE,
    hasClinical = FALSE
)
## parker.original is running!
## PCAPAM50 is running!
## sspbc is running!
## Current k = 24
# View the results
head(result$res_subtypes[, 1:min(5, ncol(result$res_subtypes))], 5)
## parker.original PCAPAM50 sspbc entropy
## OSLO2EMIT0.001 LumA LumA LumB 0.9182958
## OSLO2EMIT0.002 Basal Basal Basal 0.0000000
## OSLO2EMIT0.003 LumA LumA LumA 0.0000000
## OSLO2EMIT0.004 LumA LumA LumA 0.0000000
## OSLO2EMIT0.005 Normal LumA Normal 0.9182958
# Visualize results
# Store the figure under a name that does not mask base::plot()
p <- Vis_Multi(result$res_subtypes)
plot(p)
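The `entropy` column in the output above summarises how much the selected methods disagree for each sample. The printed values are consistent with Shannon entropy in bits computed over the calls for one sample (for example, two LumA calls and one LumB call give about 0.918). This is inferred from the printed numbers, not from the package source, so the helper `call_entropy()` below is a hypothetical illustration in base R.

```r
# Shannon entropy (base 2) of the subtype calls assigned to one sample.
call_entropy <- function(calls) {
    p <- table(calls) / length(calls)
    -sum(p * log2(p))
}

# Calls for two samples, copied from the output above.
calls <- rbind(
    OSLO2EMIT0.001 = c(parker.original = "LumA", PCAPAM50 = "LumA", sspbc = "LumB"),
    OSLO2EMIT0.002 = c(parker.original = "Basal", PCAPAM50 = "Basal", sspbc = "Basal")
)

# One entropy value per sample (row).
entropy <- apply(calls, 1, call_entropy)
round(entropy, 7)
```

An entropy of 0 means all methods agree on the sample; larger values flag samples whose subtype depends on the method chosen.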
Here’s how to use BreastSubtypeR for multi-method subtyping with AUTO mode:
library(BreastSubtypeR)
# Load example data
data("BreastSubtypeRobj")
data("OSLO2EMIT0obj")
# Perform gene mapping before subtyping
data_input <- Mapping(OSLO2EMIT0obj$se_obj, method = "max", impute = TRUE, verbose = FALSE)
# Run subtyping with AUTO mode
result <- BS_Multi(
    data_input = data_input,
    methods = "AUTO",
    Subtype = FALSE,
    hasClinical = FALSE
)
## Running AUTO mode for subtyping.
## The ER+/ER- ratio in the current dataset differs from that observed in the UNC232 training cohort.
## Running methods:
## genefu.robust, ssBC, ssBC.v2, cIHC, cIHC.itr, PCAPAM50, AIMS & sspbc
## ssBC for samples: ERpos, ERneg
## ssBC.v2 for samples: ERnegHER2neg, ERposHER2neg
## genefu.robust is running!
## ssBC is running!
## ssBC.v2 is running!
## cIHC is running!
## cIHC.itr is running!
## PCAPAM50 is running!
## AIMS is running!
## Current k = 20
## sspbc is running!
## Current k = 24
# View the results
head(result$res_subtypes[, 1:min(5, ncol(result$res_subtypes))], 5)
## genefu.robust ssBC ssBC.v2 cIHC cIHC.itr
## OSLO2EMIT0.001 LumA LumA LumA LumA LumA
## OSLO2EMIT0.002 Basal Basal Basal Basal Basal
## OSLO2EMIT0.003 LumA LumB LumA LumA LumA
## OSLO2EMIT0.004 LumA LumA LumA LumA LumA
## OSLO2EMIT0.005 LumA LumA Normal LumA LumA
# Visualize results
# Store the figure under a name that does not mask base::plot()
p <- Vis_Multi(result$res_subtypes)
plot(p)
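Beyond the built-in visualization, a quick way to compare methods is the proportion of samples on which each pair agrees. The sketch below uses only base R and the five samples and five methods printed above; it is an illustrative add-on, not a function provided by the package.

```r
# Calls for the five samples and five methods printed above.
calls <- data.frame(
    genefu.robust = c("LumA", "Basal", "LumA", "LumA", "LumA"),
    ssBC          = c("LumA", "Basal", "LumB", "LumA", "LumA"),
    ssBC.v2       = c("LumA", "Basal", "LumA", "LumA", "Normal"),
    cIHC          = c("LumA", "Basal", "LumA", "LumA", "LumA"),
    cIHC.itr      = c("LumA", "Basal", "LumA", "LumA", "LumA"),
    row.names = sprintf("OSLO2EMIT0.%03d", 1:5)
)

# Proportion of samples on which each pair of methods gives the same call.
method_names <- colnames(calls)
agreement <- outer(
    method_names, method_names,
    Vectorize(function(a, b) mean(calls[[a]] == calls[[b]]))
)
dimnames(agreement) <- list(method_names, method_names)
agreement
```

With a full cohort, the same matrix can be computed directly on the subtype columns of `result$res_subtypes`, after dropping the `entropy` column if it is present.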
To use BreastSubtypeR with the parker.original method:
library(BreastSubtypeR)
# Load example data
data("BreastSubtypeRobj")
data("OSLO2EMIT0obj")
# Perform subtyping with the `parker.original` method
res <- BS_parker(
    se_obj = OSLO2EMIT0obj$data_input$se_NC,
    calibration = "Internal",
    internal = "medianCtr",
    Subtype = FALSE,
    hasClinical = FALSE
)
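The `internal = "medianCtr"` setting presumably corresponds to gene-wise median centering across the cohort, the calibration step used before nearest-centroid classification in the original PAM50 workflow. That reading is an assumption based on the argument name. The sketch below shows the idea on a toy matrix in base R; the values are invented and the object names are hypothetical.

```r
# Toy expression matrix: genes in rows, samples in columns (values invented).
expr <- matrix(
    c(5.1, 6.3, 7.8, 6.0,
      2.2, 2.9, 2.4, 9.1,
      8.0, 7.5, 7.9, 8.4),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("ESR1", "ERBB2", "MKI67"), paste0("S", 1:4))
)

# Median centering: subtract each gene's median across samples.
gene_medians <- apply(expr, 1, median)
expr_median_ctr <- sweep(expr, 1, gene_medians, FUN = "-")

# Mean centering, shown for comparison.
expr_mean_ctr <- sweep(expr, 1, rowMeans(expr), FUN = "-")

# After centering, each gene's median is 0 up to floating-point error.
apply(expr_median_ctr, 1, median)
```

Because the centering values are estimated from the cohort itself, the result depends on cohort composition, which is why ER-balanced and subgroup-specific variants exist.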
To use BreastSubtypeR with the AIMS method:
library(BreastSubtypeR)
# Load example data
data("BreastSubtypeRobj")
data("OSLO2EMIT0obj")
# Perform subtyping with the `AIMS` method
res <- BS_AIMS(OSLO2EMIT0obj$data_input$se_SSP)
## Current k = 20
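Single-sample predictors such as AIMS classify each sample from the relative expression of gene pairs within that sample, so no cohort-level centering is needed. The sketch below illustrates only that idea in base R. The expression values are invented, the rules are hypothetical and are not actual AIMS rules, and `evaluate_rules()` is a helper defined here for illustration.

```r
# Toy expression values for one sample (invented; not package data).
sample_expr <- c(ESR1 = 9.4, KRT5 = 3.1, ERBB2 = 6.2, MKI67 = 5.0)

# Hypothetical rules of the form "gene_a < gene_b" (not actual AIMS rules).
rules <- data.frame(
    gene_a = c("KRT5", "ESR1", "MKI67"),
    gene_b = c("ESR1", "ERBB2", "ERBB2")
)

# Evaluate every rule on a single sample.
evaluate_rules <- function(x, rules) {
    unname(x[rules$gene_a] < x[rules$gene_b])
}

raw <- evaluate_rules(sample_expr, rules)

# A monotone transform of the sample (scaling, then log2) keeps every
# within-sample ordering, so the rule outcomes do not change.
rescaled <- evaluate_rules(log2(sample_expr * 10), rules)
identical(raw, rescaled)
```

This invariance to per-sample monotone transforms is what makes rule-based predictors independent of the composition of the cohort.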
Approach | Usage |
---|---|
parker.original | BS_parker(calibration = "Internal", internal = "medianCtr", ...) |
genefu.scale | BS_parker(calibration = "Internal", internal = "meanCtr", ...) |
genefu.robust | BS_parker(calibration = "Internal", internal = "qCtr", ...) |
cIHC | BS_cIHC(...) |
cIHC.itr | BS_cIHC.itr(...) |
PCAPAM50 | BS_PCAPAM50(...) |
ssBC | BS_ssBC(s = "ER", ...) |
ssBC.v2 | BS_ssBC(s = "ER.v2", ...) |
AIMS | BS_AIMS(...) |
sspbc | BS_sspbc(...) |
Mode | Usage |
---|---|
User-defined | BS_Multi(methods = c("parker.original", "ssBC.v2", "sspbc", ...), ...) |
AUTO Mode | BS_Multi(methods = "AUTO", ...) |
For users new to R, we offer an intuitive Shiny app for interactive molecular subtyping.
To run iBreastSubtypeR locally with your data, first install and load the package as described above. Afterward, you can interactively access the Shiny app to visualize and analyze your dataset. Here’s an example of how to launch it:
# Launch iBreastSubtypeR for interactive analysis
library(BreastSubtypeR)
library(tidyverse)
library(shiny)
library(bslib)
iBreastSubtypeR()
The Shiny app allows you to:
- Upload gene expression, clinical, and annotation data.
- Perform subtyping using a preferred method.
- Visualize the results in real-time.
- Download results directly to your local machine.
We welcome contributions to the package. If you find a bug or have a feature request, feel free to open an issue on the package's GitHub repository (yqkiuo/BreastSubtypeR).
If you use BreastSubtypeR in your work, please cite:
library(BreastSubtypeR)
sessionInfo()
## R version 4.5.0 RC (2025-04-04 r88126)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] BreastSubtypeR_1.0.0 BiocStyle_2.36.0
##
## loaded via a namespace (and not attached):
## [1] SummarizedExperiment_1.38.0 gtable_0.3.6
## [3] impute_1.82.0 circlize_0.4.16
## [5] shape_1.4.6.1 rjson_0.2.23
## [7] xfun_0.52 bslib_0.9.0
## [9] ggplot2_3.5.2 GlobalOptions_0.1.2
## [11] ggrepel_0.9.6 lattice_0.22-7
## [13] Biobase_2.68.0 Cairo_1.6-2
## [15] vctrs_0.6.5 tools_4.5.0
## [17] generics_0.1.3 stats4_4.5.0
## [19] parallel_4.5.0 tibble_3.2.1
## [21] proxy_0.4-27 cluster_2.1.8.1
## [23] pkgconfig_2.0.3 Matrix_1.7-3
## [25] data.table_1.17.0 RColorBrewer_1.1-3
## [27] S4Vectors_0.46.0 lifecycle_1.0.4
## [29] GenomeInfoDbData_1.2.14 compiler_4.5.0
## [31] stringr_1.5.1 tinytex_0.57
## [33] munsell_0.5.1 codetools_0.2-20
## [35] ComplexHeatmap_2.24.0 clue_0.3-66
## [37] GenomeInfoDb_1.44.0 htmltools_0.5.8.1
## [39] class_7.3-23 sass_0.4.10
## [41] yaml_2.3.10 pillar_1.10.2
## [43] crayon_1.5.3 jquerylib_0.1.4
## [45] DelayedArray_0.34.0 cachem_1.1.0
## [47] magick_2.8.6 iterators_1.0.14
## [49] abind_1.4-8 foreach_1.5.2
## [51] tidyselect_1.2.1 digest_0.6.37
## [53] stringi_1.8.7 dplyr_1.1.4
## [55] bookdown_0.43 fastmap_1.2.0
## [57] grid_4.5.0 SparseArray_1.8.0
## [59] colorspace_2.1-1 cli_3.6.4
## [61] magrittr_2.0.3 S4Arrays_1.8.0
## [63] e1071_1.7-16 withr_3.0.2
## [65] scales_1.3.0 UCSC.utils_1.4.0
## [67] XVector_0.48.0 rmarkdown_2.29
## [69] httr_1.4.7 matrixStats_1.5.0
## [71] png_0.1-8 GetoptLong_1.0.5
## [73] evaluate_1.0.3 knitr_1.50
## [75] GenomicRanges_1.60.0 IRanges_2.42.0
## [77] doParallel_1.0.17 rlang_1.1.6
## [79] Rcpp_1.0.14 glue_1.8.0
## [81] BiocManager_1.30.25 BiocGenerics_0.54.0
## [83] jsonlite_2.0.0 R6_2.6.1
## [85] MatrixGenerics_1.20.0