Install the released version from Bioconductor:
# Requires R >= 4.5.0
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("BreastSubtypeR")
If you use BreastSubtypeR, please cite Yang et al. (2025), NAR Genomics and Bioinformatics. The full entry is in the reference list below.
For BibTeX/LaTeX, run in R:
citation("BreastSubtypeR")
Breast cancer (BC) is a biologically heterogeneous disease with intrinsic molecular subtypes (e.g., Luminal A, Luminal B, HER2-enriched, Basal-like, Normal-like) that inform biological interpretation and clinical decision-making. While clinical assays such as Prosigna provide standardised subtyping in the clinic, research implementations have proliferated and diverge in pre-processing, gene mapping, and algorithmic assumptions—reducing reproducibility and complicating cross-cohort analyses.
BreastSubtypeR consolidates multiple published gene-expression signature classifiers into a unified, assumption-aware Bioconductor package with:
- a unified multi-method API (run many classifiers in one call),
- AUTO mode for cohort-aware method selection,
- standardised, method-specific pre-processing for multiple input types (raw counts, FPKM, log2-processed arrays),
- Entrez ID–based probe/gene mapping, and
- a local Shiny app (iBreastSubtypeR) for non-programmers.
- BS_Multi: execute several classifiers in a single call and compare results side by side.
- iBreastSubtypeR: point-and-click analysis; data stay on your machine.
- SummarizedExperiment compatibility.

The package includes implementations of commonly used subtyping methods (NC-based and SSP-based):
Method id | Short description | Group | Reference |
---|---|---|---|
parker.original | Original PAM50 by Parker et al., 2009 | NC-based | Parker et al., 2009 |
genefu.scale | PAM50 implementation as in the genefu R package (scaled version) | NC-based | Gendoo et al., 2016 |
genefu.robust | PAM50 implementation as in the genefu R package (robust version) | NC-based | Gendoo et al., 2016 |
cIHC | Conventional ER-balancing using immunohistochemistry (IHC) | NC-based | Ciriello et al., 2015 |
cIHC.itr | Iterative version of cIHC | NC-based | Curtis et al., 2012 |
PCAPAM50 | Selects IHC-defined ER subsets, then uses Principal Component Analysis (PCA) to create ESR1 expression-based ER-balancing | NC-based | Raj-Kumar et al., 2019 |
ssBC | Subgroup-specific gene-centering PAM50 | NC-based | Zhao et al., 2015 |
ssBC.v2 | Updated subgroup-specific gene-centering PAM50 with refined quantiles | NC-based | Fernandez-Martinez et al., 2020 |
AIMS | Absolute Intrinsic Molecular Subtyping (AIMS) method | SSP-based | Paquet & Hallett, 2015 |
sspbc | Single-Sample Predictors for Breast Cancer (AIMS adaptation) | SSP-based | Staaf et al., 2022 |
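The NC-based variants in the table differ mainly in how genes are centred before samples are compared with the PAM50 centroids. The base-R sketch below illustrates the ER-balancing idea behind cIHC. It draws an ER-balanced subset (equal numbers of ER+ and ER− samples), computes per-gene medians on that subset only, and then centres every sample with those medians. The helper er_balanced_center(), the "pos"/"neg" encoding and the toy data are hypothetical. This is a simplification, not the package's implementation; cIHC.itr, for example, repeats the procedure iteratively.

```r
# Hypothetical helper, for illustration only: ER-balanced median centring.
# expr: genes x samples matrix (log2 scale); er: "pos"/"neg" label per sample.
er_balanced_center <- function(expr, er, seed = 1) {
  stopifnot(ncol(expr) == length(er))
  pos <- which(er == "pos")
  neg <- which(er == "neg")
  n <- min(length(pos), length(neg))
  stopifnot(n > 0)  # need at least one sample in each ER group
  set.seed(seed)    # reproducible draw (note: resets the global RNG state)
  # Draw equal numbers of ER+ and ER- samples (subsamples the larger group).
  idx <- c(pos[sample.int(length(pos), n)], neg[sample.int(length(neg), n)])
  # Per-gene medians computed on the balanced subset only.
  med <- apply(expr[, idx, drop = FALSE], 1, median)
  # Centre *all* samples with the balanced medians.
  sweep(expr, 1, med, "-")
}

set.seed(42)
expr <- matrix(rnorm(5 * 8), nrow = 5,
               dimnames = list(paste0("g", 1:5), paste0("s", 1:8)))
er <- c(rep("pos", 6), rep("neg", 2))  # ER-skewed toy cohort
centred <- er_balanced_center(expr, er)
dim(centred)
```

With a skewed cohort like this one, plain median centring would be dominated by ER+ samples. Balancing first is meant to keep the reference medians from drifting towards luminal-like values.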
The examples below use small example datasets shipped with the package. For your own data, provide a SummarizedExperiment with clinical metadata in colData (e.g., PatientID, ER/HER2; for ROR: TSIZE, NODE).
library(BreastSubtypeR)
# Example data
data("BreastSubtypeRobj")
data("OSLO2EMIT0obj")
1) Map & prepare (method-specific pre-processing + mapping)
# Pre-processing: automatically apply tailored normalisation, map probes/IDs to Entrez,
# and (optionally) impute missing values
data_input <- Mapping(
OSLO2EMIT0obj$se_obj,
method = "max", # mapping strategy (example)
RawCounts = FALSE,
impute = TRUE,
verbose = FALSE
)
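The Mapping() call above involves two ideas: collapsing several probes that map to one Entrez ID, and moving between log2 and linear scale. Both can be sketched in base R. Here duplicated probes are collapsed by keeping, for each gene, the probe with the highest mean expression. Reading method = "max" this way is an assumption. The helper collapse_by_max() and the toy annotation are hypothetical. The last line applies the 2^x back-transformation to the log2 values.

```r
# Hypothetical helper: keep, for each Entrez ID, the probe with the highest
# mean expression across samples (one plausible "max" rule; an assumption).
collapse_by_max <- function(expr, entrez) {
  stopifnot(nrow(expr) == length(entrez))
  keep <- !is.na(entrez)                 # drop probes without an Entrez ID
  expr <- expr[keep, , drop = FALSE]
  entrez <- entrez[keep]
  avg <- rowMeans(expr, na.rm = TRUE)
  # Order probes by decreasing mean, then keep the first probe per gene.
  ord <- order(avg, decreasing = TRUE)
  best <- ord[!duplicated(entrez[ord])]
  out <- expr[best, , drop = FALSE]
  rownames(out) <- entrez[best]
  out[order(rownames(out)), , drop = FALSE]
}

log2_expr <- matrix(c(5, 6,   # probe_a -> Entrez 2099 (ESR1)
                      9, 8,   # probe_b -> Entrez 2099 (ESR1), higher mean
                      4, 7),  # probe_c -> Entrez 2064 (ERBB2)
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("probe_a", "probe_b", "probe_c"),
                                    c("S1", "S2")))
gene_expr <- collapse_by_max(log2_expr, entrez = c("2099", "2099", "2064"))
gene_expr      # rows "2064" (from probe_c) and "2099" (from probe_b)
2^gene_expr    # linear scale, as used for SSP-based methods
```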
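Mapping() prepares expression inputs for downstream subtyping functions by:
- back-transforming log2 data to the linear scale (2^x) for SSP-based methods,
- mapping probes/IDs to Entrez IDs using the strategy set by the method argument, and
- returning objects ready for BS_Multi or single-method callers (e.g. data_input$se_NC for NC-based and data_input$se_SSP for SSP-based methods).

See ?Mapping for the full parameter list (e.g., RawCounts, method, impute, verbose) and Methods (Sections 2.3–2.4) in the paper for a complete description of the input/normalisation pipeline.
2) Multi-method run (user-defined)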
methods <- c("parker.original", "PCAPAM50", "sspbc")
res <- BS_Multi(
data_input = data_input,
methods = methods,
Subtype = FALSE,
hasClinical = FALSE
)
# Per-sample calls (samples × methods, plus a per-sample entropy column)
head(res$res_subtypes, 5)
#> parker.original PCAPAM50 sspbc entropy
#> OSLO2EMIT0.001 LumA LumA LumB 0.9182958
#> OSLO2EMIT0.002 Basal Basal Basal 0.0000000
#> OSLO2EMIT0.003 LumA LumA LumA 0.0000000
#> OSLO2EMIT0.004 LumA LumA LumA 0.0000000
#> OSLO2EMIT0.005 Normal LumA Normal 0.9182958
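The entropy column summarises agreement across methods for each sample. The values shown match the base-2 Shannon entropy of each sample's label distribution. It is 0 when all methods agree and about 0.918 when two of three agree. The base-R sketch below reproduces the five rows above under that assumed definition. call_entropy() is a hypothetical helper, not a package function.

```r
# Per-sample Shannon entropy (bits) of the labels assigned by different methods.
call_entropy <- function(labels) {
  p <- table(labels) / length(labels)  # share of methods giving each label
  -sum(p * log2(p))                    # only observed labels, so p > 0
}

calls <- data.frame(
  parker.original = c("LumA", "Basal", "LumA", "LumA", "Normal"),
  PCAPAM50        = c("LumA", "Basal", "LumA", "LumA", "LumA"),
  sspbc           = c("LumB", "Basal", "LumA", "LumA", "Normal"),
  row.names = sprintf("OSLO2EMIT0.%03d", 1:5)
)
calls$entropy <- apply(calls, 1, call_entropy)
calls
```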
3) AUTO mode (cohort-aware selection) + visualise
AUTO evaluates cohort diagnostics (for example, ER/HER2 distribution, subtype purity, and subgroup sizes) and selects methods compatible with the cohort. It disables classifiers whose distributional assumptions would likely be violated.
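The sketch below gives a rough picture of the cohort-level information AUTO inspects. It tabulates the ER/HER2 composition of a toy clinical table and reports the ER+ fraction and the size of the smallest ER/HER2 subgroup. It only summarises the cohort. AUTO's actual diagnostics, thresholds and selection rules are not reproduced (see ?BS_Multi). cohort_summary() and the "pos"/"neg" encoding are hypothetical.

```r
# Hypothetical helper: summarise the ER/HER2 composition of a cohort.
cohort_summary <- function(er, her2) {
  tab <- table(ER = er, HER2 = her2)
  list(
    counts      = tab,
    er_pos_frac = mean(er == "pos"),
    smallest    = min(tab)  # size of the smallest ER/HER2 cell
  )
}

clin <- data.frame(
  ER   = c("pos", "pos", "pos", "pos", "pos", "pos", "neg", "neg"),
  HER2 = c("neg", "neg", "neg", "pos", "neg", "neg", "neg", "pos")
)
s <- cohort_summary(clin$ER, clin$HER2)
s$counts
s$er_pos_frac  # 0.75: an ER-skewed cohort
s$smallest     # 1
```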
res_auto <- BS_Multi(
data_input = data_input,
methods = "AUTO",
Subtype = FALSE,
hasClinical = FALSE
)
# Visualise multi-method output and concordance
Vis_Multi(res_auto$res_subtypes)
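Concordance between methods can also be quantified directly from the per-sample calls. The base-R sketch below assumes the calls are a samples × methods table like res$res_subtypes above, with the entropy column dropped. It computes the fraction of samples on which each pair of methods agrees. This is an illustration, not what Vis_Multi() computes internally, and pairwise_agreement() is a hypothetical helper.

```r
# Hypothetical helper: fraction of samples on which each pair of methods agrees.
pairwise_agreement <- function(calls) {
  m <- ncol(calls)
  out <- matrix(NA_real_, m, m, dimnames = list(colnames(calls), colnames(calls)))
  for (i in seq_len(m)) {
    for (j in seq_len(m)) {
      out[i, j] <- mean(calls[[i]] == calls[[j]])
    }
  }
  out
}

# The first five samples from the multi-method example above.
calls <- data.frame(
  parker.original = c("LumA", "Basal", "LumA", "LumA", "Normal"),
  PCAPAM50        = c("LumA", "Basal", "LumA", "LumA", "LumA"),
  sspbc           = c("LumB", "Basal", "LumA", "LumA", "Normal")
)
round(pairwise_agreement(calls), 2)
```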
4) Single-method run
PAM50 (NC-based)
res_pam <- BS_parker(
se_obj = data_input$se_NC, # object prepared for NC-based methods
calibration = "Internal",
internal = "medianCtr",
Subtype = FALSE,
hasClinical = FALSE
)
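NC-based classifiers such as PAM50 compare each centred sample with a set of subtype centroids and assign the most similar one. Parker et al. (2009) used Spearman rank correlation as the similarity measure. The base-R sketch below shows that step after median centring each gene across the cohort. That internal = "medianCtr" refers to centring of this kind is an assumption. The five-gene centroids are invented toy values, not the PAM50 centroids, and nc_predict() is a hypothetical helper.

```r
# Hypothetical helper: nearest-centroid assignment with Spearman correlation.
# expr: genes x samples (log2); centroids: genes x subtypes, same gene order.
nc_predict <- function(expr, centroids) {
  stopifnot(identical(rownames(expr), rownames(centroids)))
  # Median-centre each gene across the cohort (a cohort-dependent step).
  centred <- expr - apply(expr, 1, median)
  # Spearman correlation of every sample with every centroid.
  r <- cor(centred, centroids, method = "spearman")  # samples x subtypes
  setNames(colnames(centroids)[max.col(r, ties.method = "first")], colnames(expr))
}

genes <- c("ESR1", "FOXA1", "ERBB2", "GRB7", "KRT5")
centroids <- cbind(  # toy values for illustration only
  LumA  = c( 1.0,  0.9, -0.2, -0.3, -0.8),
  Her2  = c(-0.5, -0.3,  1.2,  1.0, -0.2),
  Basal = c(-1.0, -0.9, -0.3, -0.2,  1.1)
)
rownames(centroids) <- genes

expr <- cbind(
  S1 = c(10, 9, 5, 5, 3),    # ESR1/FOXA1-high profile
  S2 = c( 5, 6, 11, 10, 4),  # ERBB2/GRB7-high profile
  S3 = c( 4, 5, 5, 6, 10)    # KRT5-high profile
)
rownames(expr) <- genes
nc_predict(expr, centroids)  # S1 "LumA", S2 "Her2", S3 "Basal"
```

Because the centring uses cohort medians, the same sample can receive a different call in a cohort with a different subtype mix. That dependence is why ER-balancing variants and AUTO exist.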
AIMS (SSP-based)
res_aims <- BS_AIMS(data_input$se_SSP)
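SSP-based methods such as AIMS classify each sample on its own, using within-sample comparisons between pairs of genes ("expression of gene A < expression of gene B"). A sample's call therefore does not change when other samples are added or removed. The sketch below illustrates only that property. The rules, genes and majority-vote scoring are invented for the example and are not the AIMS model. As described by Paquet & Hallett (2015), AIMS combines many such binary rules in a naive Bayes classifier. ssp_call() is a hypothetical helper.

```r
# Hypothetical pairwise rules: a rule votes for its subtype when
# x[gene_a] < x[gene_b] within the same sample.
rules <- data.frame(
  gene_a  = c("KRT5", "ESR1", "ESR1",  "FOXA1"),
  gene_b  = c("ESR1", "KRT5", "ERBB2", "KRT5"),
  subtype = c("LumA", "Basal", "Her2", "Basal")
)

# Classify one sample (named numeric vector) by majority vote over satisfied rules.
ssp_call <- function(x, rules) {
  hits <- x[rules$gene_a] < x[rules$gene_b]
  if (!any(hits)) return(NA_character_)
  votes <- table(rules$subtype[hits])
  names(votes)[which.max(votes)]
}

expr <- cbind(
  S1 = c(ESR1 = 10, FOXA1 = 9, ERBB2 = 5, KRT5 = 3),
  S2 = c(ESR1 = 4,  FOXA1 = 5, ERBB2 = 6, KRT5 = 10)
)
calls_all <- apply(expr, 2, ssp_call, rules = rules)  # whole "cohort"
calls_one <- ssp_call(expr[, "S2"], rules)            # S2 classified alone
calls_all
calls_one  # same call for S2 as in the cohort run
```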
BreastSubtypeR automatically routes the supplied input to the appropriate method-specific pre-processing pipeline; see ?BS_Multi and Methods (Section 2.3) in the paper for details.
- Use methods = "AUTO" (i.e. BS_Multi(methods = "AUTO", ...)) for exploratory datasets, for cohorts of unknown or skewed composition, or whenever you want the package to select only classifiers compatible with the cohort (it disables methods whose assumptions appear violated).
- To run one specific classifier, call its single-method function directly (e.g. BS_parker()).
- AUTO is designed to avoid misapplication of NC-based classifiers when cohort assumptions are violated; it does not produce a forced consensus label.

For users new to R, we offer an intuitive Shiny app for interactive molecular subtyping.
BreastSubtypeR::iBreastSubtypeR() # interactive GUI (local)
If needed, install UI dependencies and re-run:
install.packages(c("shiny", "bslib"))
The app runs locally; no data leave your machine.
What you can do:
- Upload expression, clinical, and feature-annotation tables (clinical lives in colData).
- Run single methods, or run multiple classifiers at once with BS_Multi and AUTO enabled for cohort-aware selection.
- Choose 5-class (incl. Normal-like) or 4-class output (AIMS is 5-class only).
- Inspect per-sample concordance (entropy), heatmap and pie summaries.
- Export Calls-only or Full metrics. ROR is available for NC methods when TSIZE/NODE are present and numeric.
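ROR needs numeric TSIZE and NODE columns, so it can help to check them, and coerce them where safe, before uploading or running NC methods. The base-R sketch below does this. The helper check_ror_columns() is hypothetical, and whether coercion is appropriate depends on how your clinical table encodes these variables.

```r
# Hypothetical helper: ensure TSIZE and NODE exist and are numeric,
# coercing character/factor columns that contain only numbers.
check_ror_columns <- function(clin, cols = c("TSIZE", "NODE")) {
  missing <- setdiff(cols, names(clin))
  if (length(missing) > 0) {
    stop("Missing column(s) for ROR: ", paste(missing, collapse = ", "))
  }
  for (col in cols) {
    if (!is.numeric(clin[[col]])) {
      num <- suppressWarnings(as.numeric(as.character(clin[[col]])))
      # Refuse to coerce if any non-missing value would become NA.
      bad <- !is.na(clin[[col]]) & is.na(num)
      if (any(bad)) stop("Non-numeric values in ", col)
      clin[[col]] <- num
    }
  }
  clin
}

clin <- data.frame(PatientID = c("P1", "P2"), TSIZE = c("2.1", "1.4"), NODE = c(0, 3))
clin <- check_ror_columns(clin)
str(clin)  # TSIZE is now numeric
```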
BreastSubtypeR harmonises many published, signature-based classifiers but has known limitations:
- It is not a clinical-grade replacement for assays like Prosigna; clinical validation requires paired clinical assay data.
- AUTO selects compatible methods; it does not perform consensus voting by default.
Yang Q, Hartman J, Sifakis EG. (2025). BreastSubtypeR: a unified R/Bioconductor package for intrinsic molecular subtyping in breast cancer research. NAR Genomics and Bioinformatics, 7(4):131. https://doi.org/10.1093/nargab/lqaf131
Parker JS, Mullins M, Cheang MCU, Leung S, Voduc D, Vickery T, et al. (2009). Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol, 27(8):1160–1167. https://doi.org/10.1200/JCO.2008.18.1370
Gendoo DMA, Ratanasirigulchai N, Schröder MS, Pare L, Parker JS, Prat A, Haibe-Kains B. (2016). Genefu: an R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics, 32(7):1097–1099. https://doi.org/10.1093/bioinformatics/btv693
Ciriello G, Gatza ML, Beck AH, Wilkerson MD, Rhie SK, Pastore A, et al. (2015). Comprehensive molecular portraits of invasive lobular breast cancer. Cell, 163(2):506–519. https://doi.org/10.1016/j.cell.2015.09.033
Curtis C, Shah SP, Chin S-F, Turashvili G, Rueda OM, Dunning MJ, et al. (2012). The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 486:346–352. https://doi.org/10.1038/nature10983
Raj-Kumar PK, Liu J, Hooke JA, Kovatich AJ, Kvecher L, Shriver CD, Hu H. (2019). PCA-PAM50 improves subtype assignment in ER-positive breast cancer. Sci Rep, 9:14386. https://doi.org/10.1038/s41598-019-44339-4
Zhao X, Rodland EA, Tibshirani R, Edvardsen H, Sauer T, Hovig E. (2015). Systematic evaluation of subtype prediction using gene expression profiles and intrinsic subtyping methods. Breast Cancer Res, 17:55. https://doi.org/10.1186/s13058-015-0520-4
Fernandez-Martinez A, Krop IE, Hillman DW, Polley M-YC, Parker JS, Huebner L, et al. (2020). Survival, pathologic response, and PAM50 subtype in stage II–III HER2-positive breast cancer treated with neoadjuvant chemotherapy and trastuzumab ± lapatinib. J Clin Oncol, 38(19):2140–2150. https://doi.org/10.1200/JCO.20.01276
Paquet ER, Hallett MT. (2015). Absolute assignment of breast cancer intrinsic molecular subtype. J Natl Cancer Inst, 107(1):dju357. https://doi.org/10.1093/jnci/dju357
Staaf J, Ringnér M, Vallon-Christersson J. (2022). Simple single-sample predictors for breast cancer subtype identification using gene expression data. npj Breast Cancer, 8:104. https://doi.org/10.1038/s41523-022-00465-3
sessionInfo()
#> R version 4.5.1 Patched (2025-08-23 r88802)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] BreastSubtypeR_1.1.6 BiocStyle_2.37.1
#>
#> loaded via a namespace (and not attached):
#> [1] SummarizedExperiment_1.39.2 gtable_0.3.6
#> [3] impute_1.83.0 circlize_0.4.16
#> [5] shape_1.4.6.1 rjson_0.2.23
#> [7] xfun_0.53 bslib_0.9.0
#> [9] ggplot2_4.0.0 GlobalOptions_0.1.2
#> [11] ggrepel_0.9.6 Biobase_2.69.1
#> [13] lattice_0.22-7 Cairo_1.6-5
#> [15] vctrs_0.6.5 tools_4.5.1
#> [17] generics_0.1.4 stats4_4.5.1
#> [19] parallel_4.5.1 proxy_0.4-27
#> [21] tibble_3.3.0 cluster_2.1.8.1
#> [23] pkgconfig_2.0.3 Matrix_1.7-4
#> [25] data.table_1.17.8 RColorBrewer_1.1-3
#> [27] S7_0.2.0 S4Vectors_0.47.4
#> [29] lifecycle_1.0.4 compiler_4.5.1
#> [31] farver_2.1.2 stringr_1.5.2
#> [33] tinytex_0.57 Seqinfo_0.99.2
#> [35] codetools_0.2-20 ComplexHeatmap_2.25.2
#> [37] clue_0.3-66 class_7.3-23
#> [39] htmltools_0.5.8.1 sass_0.4.10
#> [41] yaml_2.3.10 pillar_1.11.1
#> [43] crayon_1.5.3 jquerylib_0.1.4
#> [45] DelayedArray_0.35.3 cachem_1.1.0
#> [47] magick_2.9.0 iterators_1.0.14
#> [49] abind_1.4-8 foreach_1.5.2
#> [51] tidyselect_1.2.1 digest_0.6.37
#> [53] stringi_1.8.7 dplyr_1.1.4
#> [55] bookdown_0.45 fastmap_1.2.0
#> [57] grid_4.5.1 SparseArray_1.9.1
#> [59] colorspace_2.1-2 cli_3.6.5
#> [61] magrittr_2.0.4 S4Arrays_1.9.1
#> [63] dichromat_2.0-0.1 e1071_1.7-16
#> [65] withr_3.0.2 scales_1.4.0
#> [67] rmarkdown_2.30 XVector_0.49.1
#> [69] matrixStats_1.5.0 png_0.1-8
#> [71] GetoptLong_1.0.5 evaluate_1.0.5
#> [73] knitr_1.50 GenomicRanges_1.61.5
#> [75] IRanges_2.43.5 doParallel_1.0.17
#> [77] rlang_1.1.6 Rcpp_1.1.0
#> [79] glue_1.8.0 BiocManager_1.30.26
#> [81] BiocGenerics_0.55.1 jsonlite_2.0.0
#> [83] R6_2.6.1 MatrixGenerics_1.21.0