Seqinfo is an R/Bioconductor infrastructure package that implements a simple S4 class, the Seqinfo class, for storing the following information about a collection of genomic sequences:
the sequence names a.k.a. the seqnames;
the sequence lengths a.k.a. the seqlengths;
the sequence circularity flags, that is, whether the sequences are circular or not;
the sequence genomes, that is, the genome assembly that each sequence is coming from, if any.
The sequences described in a Seqinfo object are typically, but not necessarily, the chromosomes and/or scaffolds of a specific genome assembly of a given organism.
Note that Seqinfo objects are rarely used as standalone objects.
Instead, they are used as part of higher-level objects to represent
their seqinfo() component. Examples of such higher-level
objects are GRanges, RangedSummarizedExperiment, VCF, GAlignments, TxDb,
etc… defined in other Bioconductor infrastructure packages.
Like any Bioconductor package, Seqinfo
should be installed with BiocManager::install():
if (!require("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("Seqinfo")BiocManager::install() will take care of installing the
package dependencies that are missing.
Load the package:
Seqinfo objects can be created with the Seqinfo()
constructor function:
## Note that all the arguments (except 'genome') must have the
## same length. 'genome' can be of length 1, whatever the lengths
## of the other arguments are.
si1 <- Seqinfo(seqnames=c("chr1", "chr2", "chr3", "chrM"),
seqlengths=c(100, 200, NA, 15),
isCircular=c(NA, FALSE, FALSE, TRUE),
genome="toy")
si1## Seqinfo object with 4 sequences (1 circular) from toy genome:
## seqnames seqlengths isCircular genome
## chr1 100 NA toy
## chr2 200 FALSE toy
## chr3 NA FALSE toy
## chrM 15 TRUE toy
One special form of calling the Seqinfo() constructor
function is to specify only the genome argument and set it
to the name of an NCBI assembly
(e.g. Seqinfo(genome="GRCh38.p14")) or UCSC genome (e.g.
Seqinfo(genome="hg38")), in which case the sequence
information is fetched from NCBI or UCSC. This requires the GenomeInfoDb
package (the package only needs to be installed):
## Seqinfo object with 709 sequences (1 circular) from GRCh38.p14 genome:
## seqnames seqlengths isCircular genome
## 1 248956422 FALSE GRCh38.p14
## 2 242193529 FALSE GRCh38.p14
## 3 198295559 FALSE GRCh38.p14
## 4 190214555 FALSE GRCh38.p14
## 5 181538259 FALSE GRCh38.p14
## ... ... ... ...
## HSCHR22_8_CTG1 145162 FALSE GRCh38.p14
## HSCHRX_3_CTG7 188004 FALSE GRCh38.p14
## HSCHRX_1_CTG14 619716 FALSE GRCh38.p14
## HSCHRX_2_CTG14 294119 FALSE GRCh38.p14
## HSCHRX_3_CTG3 330493 FALSE GRCh38.p14
## Seqinfo object with 711 sequences (1 circular) from hg38 genome:
## seqnames seqlengths isCircular genome
## chr1 248956422 FALSE hg38
## chr2 242193529 FALSE hg38
## chr3 198295559 FALSE hg38
## chr4 190214555 FALSE hg38
## chr5 181538259 FALSE hg38
## ... ... ... ...
## chr22_KQ759761v1_alt 145162 FALSE hg38
## chrX_KV766199v1_alt 188004 FALSE hg38
## chrX_MU273395v1_alt 619716 FALSE hg38
## chrX_MU273396v1_alt 294119 FALSE hg38
## chrX_MU273397v1_alt 330493 FALSE hg38
See ?Seqinfo for more information.
Various accessor functions are provided:
## [1] 4
## [1] "chr1" "chr2" "chr3" "chrM"
## [1] "chr1" "chr2" "chr3" "chrM"
## [1] "chr1" "chr2" "chr3" "chrM"
## chr1 chr2 chr3 chrM
## 100 200 NA 15
## chr1 chr2 chr3 chrM
## NA FALSE FALSE TRUE
## chr1 chr2 chr3 chrM
## "toy" "toy" "toy" "toy"
See ?Seqinfo for more information.
A Seqinfo object can be subsetted by seqnames:
## Seqinfo object with 3 sequences from 2 genomes (NA, toy):
## seqnames seqlengths isCircular genome
## chrY NA NA <NA>
## chr3 NA FALSE toy
## chr1 100 NA toy
Rename:
## Seqinfo object with 4 sequences (1 circular) from toy genome:
## seqnames seqlengths isCircular genome
## ch1 100 NA toy
## ch2 200 FALSE toy
## ch3 NA FALSE toy
## chM 15 TRUE toy
Reorder:
## Seqinfo object with 4 sequences (1 circular) from toy genome:
## seqnames seqlengths isCircular genome
## chM 15 TRUE toy
## ch3 NA FALSE toy
## ch2 200 FALSE toy
## ch1 100 NA toy
Drop/add/reorder:
## Seqinfo object with 3 sequences from 2 genomes (toy, NA):
## seqnames seqlengths isCircular genome
## ch1 100 NA toy
## ch2 200 FALSE toy
## chY NA NA <NA>
Rename/reorder/drop/add:
## Seqinfo object with 3 sequences from 2 genomes (NA, toy):
## seqnames seqlengths isCircular genome
## Y NA NA <NA>
## 1 100 NA toy
## 22 NA NA <NA>
See ?Seqinfo for more information.
Two Seqinfo objects can be merged if they are compatible:
## Seqinfo object with 3 sequences from an unspecified genome:
## seqnames seqlengths isCircular genome
## chr3 300 NA <NA>
## chr4 NA NA <NA>
## chrM 15 NA <NA>
## Warning in .merge_two_Seqinfo_objects(x, y): Each of the 2 combined objects has sequence levels not in the other:
## - in 'x': chr1, chr2
## - in 'y': chr4
## Make sure to always combine/compare objects based on the same reference
## genome (use suppressWarnings() to suppress this warning).
## Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
## seqnames seqlengths isCircular genome
## chr1 100 NA toy
## chr2 200 FALSE toy
## chr3 300 FALSE toy
## chrM 15 TRUE toy
## chr4 NA NA <NA>
## Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
## seqnames seqlengths isCircular genome
## chr1 100 NA toy
## chr2 200 FALSE toy
## chr3 300 FALSE toy
## chrM 15 TRUE toy
## chr4 NA NA <NA>
Note that, strictly speaking, merging two Seqinfo objects is not a
commutative operation, i.e., in general
z1 <- merge(x, y) is not identical to
z2 <- merge(y, x). However z1 and
z2 are guaranteed to contain the same information (i.e. the
same rows, but typically not in the same order):
## Seqinfo object with 5 sequences (1 circular) from 2 genomes (toy, NA):
## seqnames seqlengths isCircular genome
## chr3 300 FALSE toy
## chr4 NA NA <NA>
## chrM 15 TRUE toy
## chr1 100 NA toy
## chr2 200 FALSE toy
Trying to merge Seqinfo objects that are not compatible will raise an error:
## This contradicts what 'x' says about circularity of chr3 and chrM:
isCircular(si2)[c("chr3", "chrM")] <- c(TRUE, FALSE)
si2## Seqinfo object with 3 sequences (1 circular) from an unspecified genome:
## seqnames seqlengths isCircular genome
## chr3 300 TRUE <NA>
## chr4 NA NA <NA>
## chrM 15 FALSE <NA>
merge(si1, si2) # ERROR!
## Error in mergeNamedAtomicVectors(isCircular(x), isCircular(y), what = c("sequence", :
## sequences chr3, chrM have incompatible circularity flags:
## - in 'x': FALSE, TRUE
## - in 'y': TRUE, FALSESee ?Seqinfo for more information.
Seqinfo objects are typically found as part of higher-level objects
to represent their seqinfo() component. For example, in
TxDb objects:
## [1] "TxDb"
## attr(,"package")
## [1] "GenomicFeatures"
## Seqinfo object with 711 sequences (1 circular) from hg38 genome:
## seqnames seqlengths isCircular genome
## chr1 248956422 FALSE hg38
## chr2 242193529 FALSE hg38
## chr3 198295559 FALSE hg38
## chr4 190214555 FALSE hg38
## chr5 181538259 FALSE hg38
## ... ... ... ...
## chr22_KQ759761v1_alt 145162 FALSE hg38
## chrX_KV766199v1_alt 188004 FALSE hg38
## chrX_MU273395v1_alt 619716 FALSE hg38
## chrX_MU273396v1_alt 294119 FALSE hg38
## chrX_MU273397v1_alt 330493 FALSE hg38
and in BSgenome objects:
## [1] "BSgenome"
## attr(,"package")
## [1] "BSgenome"
## Seqinfo object with 711 sequences (1 circular) from hg38 genome:
## seqnames seqlengths isCircular genome
## chr1 248956422 FALSE hg38
## chr2 242193529 FALSE hg38
## chr3 198295559 FALSE hg38
## chr4 190214555 FALSE hg38
## chr5 181538259 FALSE hg38
## ... ... ... ...
## chr22_KQ759761v1_alt 145162 FALSE hg38
## chrX_KV766199v1_alt 188004 FALSE hg38
## chrX_MU273395v1_alt 619716 FALSE hg38
## chrX_MU273396v1_alt 294119 FALSE hg38
## chrX_MU273397v1_alt 330493 FALSE hg38
Sanity checks:
Here is the output of sessionInfo() on the system on
which this document was compiled:
## R version 4.5.2 (2025-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5
## [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.22.0
## [3] BSgenome_1.79.1
## [4] rtracklayer_1.71.3
## [5] BiocIO_1.21.0
## [6] Biostrings_2.79.4
## [7] XVector_0.51.0
## [8] GenomicFeatures_1.63.1
## [9] AnnotationDbi_1.73.0
## [10] Biobase_2.71.0
## [11] GenomicRanges_1.63.1
## [12] GenomeInfoDb_1.47.2
## [13] IRanges_2.45.0
## [14] S4Vectors_0.49.0
## [15] Seqinfo_1.1.0
## [16] BiocGenerics_0.57.0
## [17] generics_0.1.4
## [18] BiocStyle_2.39.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.51.1 SummarizedExperiment_1.41.0
## [3] rjson_0.2.23 xfun_0.56
## [5] bslib_0.9.0 lattice_0.22-7
## [7] vctrs_0.7.0 tools_4.5.2
## [9] bitops_1.0-9 curl_7.0.0
## [11] parallel_4.5.2 RSQLite_2.4.5
## [13] blob_1.3.0 pkgconfig_2.0.3
## [15] Matrix_1.7-4 cigarillo_1.1.0
## [17] lifecycle_1.0.5 compiler_4.5.2
## [19] Rsamtools_2.27.0 codetools_0.2-20
## [21] htmltools_0.5.9 sys_3.4.3
## [23] buildtools_1.0.0 sass_0.4.10
## [25] RCurl_1.98-1.17 yaml_2.3.12
## [27] crayon_1.5.3 jquerylib_0.1.4
## [29] BiocParallel_1.45.0 DelayedArray_0.37.0
## [31] cachem_1.1.0 abind_1.4-8
## [33] digest_0.6.39 restfulr_0.0.16
## [35] maketools_1.3.2 fastmap_1.2.0
## [37] grid_4.5.2 SparseArray_1.11.10
## [39] cli_3.6.5 S4Arrays_1.11.1
## [41] XML_3.99-0.20 UCSC.utils_1.7.1
## [43] bit64_4.6.0-1 rmarkdown_2.30
## [45] httr_1.4.7 matrixStats_1.5.0
## [47] bit_4.6.0 png_0.1-8
## [49] memoise_2.0.1 evaluate_1.0.5
## [51] knitr_1.51 rlang_1.1.7
## [53] DBI_1.2.3 BiocManager_1.30.27
## [55] jsonlite_2.0.0 R6_2.6.1
## [57] MatrixGenerics_1.23.0 GenomicAlignments_1.47.0