RaggedExperiment 1.32.2
The RaggedExperiment package provides a flexible data representation for copy number, mutation and other ragged array schema for genomic location data. It aims to provide a framework for a set of samples that have differing numbers of genomic ranges.
The RaggedExperiment class derives from a GRangesList representation and
provides a semblance of a rectangular dataset. The row and column dimensions of
the RaggedExperiment correspond to the number of ranges in the entire dataset
and the number of samples represented in the data, respectively.
if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("RaggedExperiment")Loading the package:
library(RaggedExperiment)
library(GenomicRanges)Your citations are crucial in keeping our software free and open source. To cite our package see the citation (Ramos et al. (2023)) in the Reference section. You may also browse to the publication at the link here.
RaggedExperiment class overviewA schematic showing the class geometry and supported transformations of the
RaggedExperiment class is show below. There are three main operations for
transforming the RaggedExperiment representation:
sparseAssaycompactAssayqreduceAssay
Figure 1: RaggedExperiment object schematic
Rows and columns represent genomic ranges and samples, respectively. Assay operations can be performed with (from left to right) compactAssay, qreduceAssay, and sparseAssay.
RaggedExperiment objectWe start with a couple of GRanges objects, each representing an individual
sample:
sample1 <- GRanges(
    c(A = "chr1:1-10:-", B = "chr1:8-14:+", C = "chr2:15-18:+"),
    score = 3:5)
sample2 <- GRanges(
    c(D = "chr1:1-10:-", E = "chr2:11-18:+"),
    score = 1:2)Include column data colData to describe the samples:
colDat <- DataFrame(id = 1:2)GRanges objectsragexp <- RaggedExperiment(
    sample1 = sample1,
    sample2 = sample2,
    colData = colDat
)
ragexp## class: RaggedExperiment 
## dim: 5 2 
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): idGRangesList instancegrl <- GRangesList(sample1 = sample1, sample2 = sample2)
RaggedExperiment(grl, colData = colDat)## class: RaggedExperiment 
## dim: 5 2 
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): idlist of GRangesrangeList <- list(sample1 = sample1, sample2 = sample2)
RaggedExperiment(rangeList, colData = colDat)## class: RaggedExperiment 
## dim: 5 2 
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): idList of GRanges with metadataNote: In cases where a SimpleGenomicRangesList is provided along with
accompanying metadata (accessed by mcols), the metadata is used as the
colData for the RaggedExperiment.
grList <- List(sample1 = sample1, sample2 = sample2)
mcols(grList) <- colDat
RaggedExperiment(grList)## class: RaggedExperiment 
## dim: 5 2 
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): idrowRanges(ragexp)## GRanges object with 5 ranges and 0 metadata columns:
##     seqnames    ranges strand
##        <Rle> <IRanges>  <Rle>
##   A     chr1      1-10      -
##   B     chr1      8-14      +
##   C     chr2     15-18      +
##   D     chr1      1-10      -
##   E     chr2     11-18      +
##   -------
##   seqinfo: 2 sequences from an unspecified genome; no seqlengthsdimnames(ragexp)## [[1]]
## [1] "A" "B" "C" "D" "E"
## 
## [[2]]
## [1] "sample1" "sample2"colDatacolData(ragexp)## DataFrame with 2 rows and 1 column
##                id
##         <integer>
## sample1         1
## sample2         2Subsetting a RaggedExperiment is akin to subsetting a matrix object. Rows
correspond to genomic ranges and columns to samples or specimen. It is possible
to subset using integer, character, and logical indices.
The overlapsAny and subsetByOverlaps functionalities are available for use
for RaggedExperiment. Please see the corresponding documentation in
RaggedExperiment and GenomicRanges.
RaggedExperiment package provides several different functions for representing
ranged data in a rectangular matrix via the *Assay methods.
The most straightforward matrix representation of a RaggedExperiment will
return a matrix of dimensions equal to the product of the number of ranges and
samples.
dim(ragexp)## [1] 5 2Reduce(`*`, dim(ragexp))## [1] 10sparseAssay(ragexp)##   sample1 sample2
## A       3      NA
## B       4      NA
## C       5      NA
## D      NA       1
## E      NA       2length(sparseAssay(ragexp))## [1] 10We provide sparse matrix representations with the help of the Matrix package.
To obtain a sparse representation, the user can use the sparse = TRUE
argument.
sparseAssay(ragexp, sparse = TRUE)## 5 x 2 sparse Matrix of class "dgCMatrix"
##   sample1 sample2
## A       3       .
## B       4       .
## C       5       .
## D       .       1
## E       .       2This representation is of class dgCMatrix see the Matrix documentation for
more details.
Samples with identical ranges are placed in the same row. Non-disjoint ranges are not collapsed.
compactAssay(ragexp)##              sample1 sample2
## chr1:8-14:+        4      NA
## chr1:1-10:-        3       1
## chr2:11-18:+      NA       2
## chr2:15-18:+       5      NASimilarly, to sparseAssay the compactAssay function provides the option to
obtain a sparse matrix representation with the sparse = TRUE argument. This
will return a dgCMatrix class from the Matrix package.
compactAssay(ragexp, sparse = TRUE)## 4 x 2 sparse Matrix of class "dgCMatrix"
##              sample1 sample2
## chr1:8-14:+        4       .
## chr1:1-10:-        3       1
## chr2:11-18:+       .       2
## chr2:15-18:+       5       .This function returns a matrix of disjoint ranges across all samples. Elements
of the matrix are summarized by applying the simplifyDisjoin functional
argument to assay values of overlapping ranges.
disjoinAssay(ragexp, simplifyDisjoin = mean)##              sample1 sample2
## chr1:8-14:+        4      NA
## chr1:1-10:-        3       1
## chr2:11-14:+      NA       2
## chr2:15-18:+       5       2The qreduceAssay function works with a query parameter that highlights
a window of ranges for the resulting matrix. The returned matrix will have
dimensions length(query) by ncol(x). Elements contain assay values for the
i th query range and the j th sample, summarized according to the
simplifyReduce functional argument.
For demonstration purposes, we can have a look at the original GRangesList
and the associated scores from which the current ragexp object is derived:
unlist(grl, use.names = FALSE)## GRanges object with 5 ranges and 1 metadata column:
##     seqnames    ranges strand |     score
##        <Rle> <IRanges>  <Rle> | <integer>
##   A     chr1      1-10      - |         3
##   B     chr1      8-14      + |         4
##   C     chr2     15-18      + |         5
##   D     chr1      1-10      - |         1
##   E     chr2     11-18      + |         2
##   -------
##   seqinfo: 2 sequences from an unspecified genome; no seqlengthsThis data is represented as rowRanges and assays in RaggedExperiment:
rowRanges(ragexp)## GRanges object with 5 ranges and 0 metadata columns:
##     seqnames    ranges strand
##        <Rle> <IRanges>  <Rle>
##   A     chr1      1-10      -
##   B     chr1      8-14      +
##   C     chr2     15-18      +
##   D     chr1      1-10      -
##   E     chr2     11-18      +
##   -------
##   seqinfo: 2 sequences from an unspecified genome; no seqlengthsassay(ragexp, "score")##   sample1 sample2
## A       3      NA
## B       4      NA
## C       5      NA
## D      NA       1
## E      NA       2Here we provide the “query” region of interest:
(query <- GRanges(c("chr1:1-14:-", "chr2:11-18:+")))## GRanges object with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chr1      1-14      -
##   [2]     chr2     11-18      +
##   -------
##   seqinfo: 2 sequences from an unspecified genome; no seqlengthsThe simplifyReduce argument in qreduceAssay allows the user to summarize
overlapping regions with a custom method for the given “query” region of
interest. We provide one for calculating a weighted average score per
query range, where the weight is proportional to the overlap widths between
overlapping ranges and a query range.
Note that there are three arguments to this function. See the documentation for additional details.
weightedmean <- function(scores, ranges, qranges)
{
    isects <- pintersect(ranges, qranges)
    sum(scores * width(isects)) / sum(width(isects))
}A call to qreduceAssay involves the RaggedExperiment, the GRanges query
and the simplifyReduce functional argument.
qreduceAssay(ragexp, query, simplifyReduce = weightedmean)##              sample1 sample2
## chr1:1-14:-        3       1
## chr2:11-18:+       5       2The RaggedExperiment provides a family of parallel functions for coercing to
the SummarizedExperiment class. By selecting a particular assay index (i),
a parallel assay coercion method can be achieved.
Here is the list of function names:
sparseSummarizedExperimentcompactSummarizedExperimentdisjoinSummarizedExperimentqreduceSummarizedExperimentSee the documentation for details.
In the special case where the rownames of a sparseMatrix are coercible to
GRanges, RaggedExperiment provides the facility to convert sparse matrices
into RaggedExperiment. This can be done using the as coercion method. The
example below first creates an example sparse dgCMatrix class and then shows
the as method usage to this end.
library(Matrix)## 
## Attaching package: 'Matrix'## The following object is masked from 'package:S4Vectors':
## 
##     expandsm <- Matrix::sparseMatrix(
    i = c(2, 3, 4, 3, 4, 3, 4),
    j = c(1, 1, 1, 3, 3, 4, 4),
    x = c(2L, 4L, 2L, 2L, 2L, 4L, 2L),
    dims = c(4, 4),
    dimnames = list(
        c("chr2:1-10", "chr2:2-10", "chr2:3-10", "chr2:4-10"),
        LETTERS[1:4]
    )
)
as(sm, "RaggedExperiment")## class: RaggedExperiment 
## dim: 7 3 
## assays(1): counts
## rownames: NULL
## colnames(3): A C D
## colData names(0):sessionInfo()## R version 4.5.0 RC (2025-04-04 r88126)
## Platform: x86_64-apple-darwin20
## Running under: macOS Monterey 12.7.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] Matrix_1.7-3            RaggedExperiment_1.32.2 GenomicRanges_1.60.0   
## [4] GenomeInfoDb_1.44.0     IRanges_2.42.0          S4Vectors_0.46.0       
## [7] BiocGenerics_0.54.0     generics_0.1.3          BiocStyle_2.36.0       
## 
## loaded via a namespace (and not attached):
##  [1] jsonlite_2.0.0              compiler_4.5.0             
##  [3] BiocManager_1.30.25         crayon_1.5.3               
##  [5] BiocBaseUtils_1.10.0        SummarizedExperiment_1.38.1
##  [7] Biobase_2.68.0              jquerylib_0.1.4            
##  [9] yaml_2.3.10                 fastmap_1.2.0              
## [11] lattice_0.22-7              R6_2.6.1                   
## [13] XVector_0.48.0              S4Arrays_1.8.0             
## [15] knitr_1.50                  DelayedArray_0.34.1        
## [17] bookdown_0.43               MatrixGenerics_1.20.0      
## [19] GenomeInfoDbData_1.2.14     bslib_0.9.0                
## [21] rlang_1.1.6                 cachem_1.1.0               
## [23] xfun_0.52                   sass_0.4.10                
## [25] SparseArray_1.8.0           cli_3.6.5                  
## [27] digest_0.6.37               grid_4.5.0                 
## [29] lifecycle_1.0.4             evaluate_1.0.3             
## [31] abind_1.4-8                 rmarkdown_2.29             
## [33] httr_1.4.7                  matrixStats_1.5.0          
## [35] tools_4.5.0                 htmltools_0.5.8.1          
## [37] UCSC.utils_1.4.0Ramos, Marcel, Martin Morgan, Ludwig Geistlinger, Vincent J Carey, and Levi Waldron. 2023. “RaggedExperiment: the missing link between genomic ranges and matrices in Bioconductor.” Bioinformatics 39 (6): btad330. https://doi.org/10.1093/bioinformatics/btad330.