TileDBArray 1.19.1
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction, thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.83702823 -0.14914758 0.64166918 . -0.9119227 -1.0268426
## [2,] -0.81051007 -0.85000295 1.16324368 . 0.8675362 -1.1749122
## [3,] -0.56920653 -0.85624215 0.12387899 . -0.6154622 1.4724778
## [4,] 2.58735380 0.46453051 -0.39067613 . 3.1255805 1.4224766
## [5,] 0.02987906 -2.13288999 0.71611918 . -0.6443183 0.3087106
## ... . . . . . .
## [96,] 2.07844778 -0.54829062 -0.76359321 . 1.2110512 -1.3289043
## [97,] -0.61921035 0.03968039 1.21490684 . -0.5885459 0.1158050
## [98,] 0.22201402 0.52684229 -0.10614421 . 0.2518441 1.3138046
## [99,] -0.29793193 -0.39014612 -0.82614468 . 1.5758671 -0.1376110
## [100,] -0.65287291 0.73865067 1.58778337 . 1.2052013 -0.9490587
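The backend lives on disk, so it can be reopened later from its path. The following is a minimal sketch. It assumes that `TileDBArray()` accepts the path of an existing TileDB array and uses the array's first attribute when `attr` is not supplied. The variable names are illustrative only.

```r
library(TileDBArray)

# Write a small dense matrix to an explicit location on disk.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
written <- writeTileDBArray(X, path=path)

# Reopen the stored array from its path, e.g. in a later step or session.
reopened <- TileDBArray(path)

# Realize the values into an ordinary in-memory matrix for comparison.
restored <- as.matrix(reopened)
```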
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.83702823 -0.14914758 0.64166918 . -0.9119227 -1.0268426
## [2,] -0.81051007 -0.85000295 1.16324368 . 0.8675362 -1.1749122
## [3,] -0.56920653 -0.85624215 0.12387899 . -0.6154622 1.4724778
## [4,] 2.58735380 0.46453051 -0.39067613 . 3.1255805 1.4224766
## [5,] 0.02987906 -2.13288999 0.71611918 . -0.6443183 0.3087106
## ... . . . . . .
## [96,] 2.07844778 -0.54829062 -0.76359321 . 1.2110512 -1.3289043
## [97,] -0.61921035 0.03968039 1.21490684 . -0.5885459 0.1158050
## [98,] 0.22201402 0.52684229 -0.10614421 . 0.2518441 1.3138046
## [99,] -0.29793193 -0.39014612 -0.82614468 . 1.5758671 -0.1376110
## [100,] -0.65287291 0.73865067 1.58778337 . 1.2052013 -0.9490587
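Coercion is not limited to matrices. TileDB itself stores arrays of arbitrary dimensionality, so the following sketch assumes that a 3-dimensional ordinary array can be coerced in the same way. It also assumes the result can be realized back with `as.array()`.

```r
library(TileDBArray)

# A 3-dimensional ordinary array (assumed to be supported by the backend).
arr <- array(runif(60), dim=c(5, 4, 3))
A <- as(arr, "TileDBArray")

# The dimensions are carried over from arr.
dim(A)

# Realize back into an ordinary in-memory array.
dense <- as.array(A)
```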
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0.00 0.00
## [2,] 0 0 0 . 0.00 0.00
## [3,] 0 0 0 . 0.00 0.00
## [4,] 0 0 0 . 0.61 0.00
## [5,] 0 0 0 . 0.00 0.00
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
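We can check that the sparse layout is retained and that the values survive a round trip. This is a sketch that assumes two things. First, `is_sparse()` (from the attached S4Arrays/DelayedArray stack) reports the sparsity of the TileDB-backed object. Second, DelayedArray's coercion to `dgCMatrix` applies to it.

```r
library(TileDBArray)
library(Matrix)

Y <- rsparsematrix(1000, 1000, density=0.01)
A <- writeTileDBArray(Y)

# The object should remember that the data were written in sparse form.
sparse_flag <- is_sparse(A)

# Convert back to a sparse in-memory matrix from the Matrix package.
back <- as(A, "dgCMatrix")
```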
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . TRUE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
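The logical example above covers only one of the two types. The following sketch covers integers as well. It assumes that an integer input is stored and read back as integer, so `type()` should report `"integer"` just as the printed headers report `"double"` or `"logical"`.

```r
library(TileDBArray)

# rpois() returns integers, so Z is an integer matrix.
Z <- matrix(rpois(1000, lambda=5), ncol=10)
A <- writeTileDBArray(Z)

# The storage mode reported by the delayed object.
type(A)

# Realize into an ordinary matrix for comparison with Z.
Z2 <- as.matrix(A)
```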
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.83702823 -0.14914758 0.64166918 . -0.9119227 -1.0268426
## GENE_2 -0.81051007 -0.85000295 1.16324368 . 0.8675362 -1.1749122
## GENE_3 -0.56920653 -0.85624215 0.12387899 . -0.6154622 1.4724778
## GENE_4 2.58735380 0.46453051 -0.39067613 . 3.1255805 1.4224766
## GENE_5 0.02987906 -2.13288999 0.71611918 . -0.6443183 0.3087106
## ... . . . . . .
## GENE_96 2.07844778 -0.54829062 -0.76359321 . 1.2110512 -1.3289043
## GENE_97 -0.61921035 0.03968039 1.21490684 . -0.5885459 0.1158050
## GENE_98 0.22201402 0.52684229 -0.10614421 . 0.2518441 1.3138046
## GENE_99 -0.29793193 -0.39014612 -0.82614468 . 1.5758671 -0.1376110
## GENE_100 -0.65287291 0.73865067 1.58778337 . 1.2052013 -0.9490587
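The printed output suggests that the dimension names are stored together with the data. If so, they should also come back when the array is reopened from its path. The following is a sketch under that assumption.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))

path <- tempfile()
invisible(writeTileDBArray(X, path=path))

# Reopen from disk; the names are expected to be restored with the data.
reopened <- TileDBArray(path)
head(rownames(reopened))
```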
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.83702823 -0.81051007 -0.56920653 2.58735380 0.02987906 0.77117420
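The following sketch shows the class relationship directly and extracts a column by name. As with `out[,1]` above, dropping to one dimension should give an ordinary named vector.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# A TileDBArray inherits from DelayedArray, so DelayedArray methods apply.
is(out, "DelayedArray")

# Subsetting by dimension names works as for an ordinary matrix.
col2 <- out[, "SAMP_2"]
```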
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not modify the data in the TileDB backend; rather, they are delayed until the values are explicitly required, which is why the results below are DelayedMatrix objects.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.83702823 -0.14914758 0.64166918 -2.60074522 1.39441643
## GENE_2 -0.81051007 -0.85000295 1.16324368 0.50104697 -0.76655204
## GENE_3 -0.56920653 -0.85624215 0.12387899 0.27937171 0.69922525
## GENE_4 2.58735380 0.46453051 -0.39067613 -1.52492459 0.42930338
## GENE_5 0.02987906 -2.13288999 0.71611918 -0.06698247 -0.97313905
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.67405646 -0.29829516 1.28333836 . -1.8238453 -2.0536852
## GENE_2 -1.62102014 -1.70000591 2.32648737 . 1.7350724 -2.3498243
## GENE_3 -1.13841307 -1.71248431 0.24775799 . -1.2309245 2.9449555
## GENE_4 5.17470759 0.92906102 -0.78135227 . 6.2511609 2.8449531
## GENE_5 0.05975812 -4.26577998 1.43223835 . -1.2886365 0.6174212
## ... . . . . . .
## GENE_96 4.15689555 -1.09658123 -1.52718642 . 2.4221023 -2.6578085
## GENE_97 -1.23842070 0.07936078 2.42981367 . -1.1770919 0.2316101
## GENE_98 0.44402804 1.05368457 -0.21228843 . 0.5036883 2.6276092
## GENE_99 -0.59586386 -0.78029224 -1.65228937 . 3.1517342 -0.2752221
## GENE_100 -1.30574583 1.47730134 3.17556673 . 2.4104027 -1.8981175
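The following sketch makes the delayed behaviour visible. `showtree()` from DelayedArray prints the recorded operations. The original TileDB-backed values stay unchanged. Writing the delayed result to a new TileDB array (assumed here to accept any DelayedMatrix as input) forces the operations to be evaluated.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Subsetting and arithmetic only record operations on top of the TileDB seed.
doubled <- out[1:5, 1:5] * 2
showtree(doubled)

# The on-disk values are untouched by the delayed operations.
unchanged <- as.matrix(out)

# Writing the delayed result to a new TileDB array evaluates the operations.
stored <- writeTileDBArray(doubled)
```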
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## 5.169397 -13.998913 3.994051 11.221690 -7.760145 -13.618917 3.951580
## SAMP_8 SAMP_9 SAMP_10
## -27.982109 9.610906 -2.995404
out %*% runif(ncol(out))
## [,1]
## GENE_1 -2.516672712
## GENE_2 -3.264132134
## GENE_3 -0.618664669
## GENE_4 0.246542563
## GENE_5 -1.592631357
## GENE_6 -0.206243033
## GENE_7 -1.719528394
## GENE_8 1.383614408
## GENE_9 1.233912454
## GENE_10 -2.455756772
## GENE_11 0.441199394
## GENE_12 2.876847904
## GENE_13 -3.679928595
## GENE_14 1.724562958
## GENE_15 -1.490689408
## GENE_16 2.550544713
## GENE_17 -2.951104552
## GENE_18 -0.320317245
## GENE_19 -4.892250159
## GENE_20 -0.072539281
## GENE_21 -1.701848732
## GENE_22 -0.831532111
## GENE_23 -0.511538001
## GENE_24 -1.996907234
## GENE_25 0.004932874
## GENE_26 -2.828867856
## GENE_27 0.295856542
## GENE_28 2.086948317
## GENE_29 -1.266852569
## GENE_30 -2.008898544
## GENE_31 -2.215289639
## GENE_32 0.551087969
## GENE_33 -0.777756829
## GENE_34 1.470018268
## GENE_35 1.959672201
## GENE_36 -2.043743367
## GENE_37 0.239134429
## GENE_38 1.017703684
## GENE_39 -1.053261374
## GENE_40 -2.901748871
## GENE_41 -3.070394815
## GENE_42 -1.996480917
## GENE_43 1.061779732
## GENE_44 2.219526235
## GENE_45 -1.633839264
## GENE_46 0.597264593
## GENE_47 -2.405438283
## GENE_48 1.413604880
## GENE_49 3.140556681
## GENE_50 0.214948300
## GENE_51 -1.337607355
## GENE_52 -1.597686724
## GENE_53 -2.049052473
## GENE_54 -0.620144226
## GENE_55 0.440911642
## GENE_56 1.300755385
## GENE_57 -0.274773299
## GENE_58 -0.241296600
## GENE_59 0.027454252
## GENE_60 -1.707060826
## GENE_61 0.204255186
## GENE_62 0.108733375
## GENE_63 0.115434132
## GENE_64 1.653357865
## GENE_65 0.927923229
## GENE_66 -2.502330288
## GENE_67 -2.856826591
## GENE_68 2.187359688
## GENE_69 -1.337562534
## GENE_70 -2.713720186
## GENE_71 -1.879584537
## GENE_72 -0.903037754
## GENE_73 0.346671381
## GENE_74 -0.456638830
## GENE_75 -0.756790886
## GENE_76 -0.042165788
## GENE_77 2.179790908
## GENE_78 1.097363958
## GENE_79 0.572315281
## GENE_80 -0.574857379
## GENE_81 -0.362518861
## GENE_82 1.117858676
## GENE_83 -2.659699953
## GENE_84 1.885349308
## GENE_85 -1.987734716
## GENE_86 1.993231570
## GENE_87 3.122625167
## GENE_88 -0.156150461
## GENE_89 -0.011508771
## GENE_90 1.155349091
## GENE_91 1.915767066
## GENE_92 -1.910385025
## GENE_93 3.035399839
## GENE_94 -2.725035367
## GENE_95 1.818657467
## GENE_96 -1.564782462
## GENE_97 1.590486850
## GENE_98 0.579355819
## GENE_99 -1.709536592
## GENE_100 1.722165297
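The random vector above is not seeded, so the printed product will differ between runs. The following sketch fixes the seed and checks that both operations agree with the same computations on the in-memory matrix. The comparisons allow for floating-point tolerance, and names and dimnames are dropped so that only the values are compared.

```r
library(TileDBArray)

set.seed(100)  # make the random inputs reproducible
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")
v <- runif(ncol(out))

# Both results are computed from the TileDB-backed data.
disk_sums <- colSums(out)
disk_prod <- out %*% v

# Compare values only, up to floating-point tolerance.
all.equal(unname(disk_sums), unname(colSums(X)))
all.equal(as.vector(disk_prod), as.vector(X %*% v))
```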
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.02960329 -1.29540158 1.15759206 . 0.81104791 0.63175236
## [2,] 0.17768923 -0.31868901 -0.62769622 . 0.04611615 1.31884148
## [3,] -1.77680722 -1.50645969 -0.25286785 . 1.12623296 0.23034884
## [4,] 0.34516790 -0.61698476 0.33633404 . -0.61304188 0.60868828
## [5,] 0.09845622 0.30568173 -0.67337900 . 0.33893165 -0.50958961
## ... . . . . . .
## [96,] -1.30802518 -1.01379374 1.54673867 . -0.04002928 0.82523671
## [97,] 1.40892228 0.49084067 -0.37236372 . -0.47192240 -0.42555338
## [98,] 0.09899860 -2.00073775 -0.43591968 . -0.17250494 0.07242069
## [99,] 0.03710682 -0.19182842 0.37467350 . -1.09016340 0.05797839
## [100,] -0.56508292 0.53755507 0.63980864 . 0.80973587 0.28954248
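A local TileDB array is a directory on disk. An array written with a custom attribute can be reopened by naming that attribute. The following is a minimal sketch. It assumes that `TileDBArray()` forwards `attr` to the underlying seed constructor.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
invisible(writeTileDBArray(X, path=path, attr="WHEE"))

# The backend was created as a directory at the requested path.
dir.exists(path)

# Reopen the array, naming the attribute that holds the values.
reopened <- TileDBArray(path, attr="WHEE")
```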
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.02960329 -1.29540158 1.15759206 . 0.81104791 0.63175236
## [2,] 0.17768923 -0.31868901 -0.62769622 . 0.04611615 1.31884148
## [3,] -1.77680722 -1.50645969 -0.25286785 . 1.12623296 0.23034884
## [4,] 0.34516790 -0.61698476 0.33633404 . -0.61304188 0.60868828
## [5,] 0.09845622 0.30568173 -0.67337900 . 0.33893165 -0.50958961
## ... . . . . . .
## [96,] -1.30802518 -1.01379374 1.54673867 . -0.04002928 0.82523671
## [97,] 1.40892228 0.49084067 -0.37236372 . -0.47192240 -0.42555338
## [98,] 0.09899860 -2.00073775 -0.43591968 . -0.17250494 0.07242069
## [99,] 0.03710682 -0.19182842 0.37467350 . -1.09016340 0.05797839
## [100,] -0.56508292 0.53755507 0.63980864 . 0.80973587 0.28954248
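The global path remains in effect until it is unset. The following sketch sets it, confirms that the coercion wrote to that location, and then unsets it again. It assumes that `setTileDBPath(NULL)` restores the default behaviour of writing each new backend to a fresh temporary location.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path2 <- tempfile()

# Direct the next coercion to path2.
setTileDBPath(path2)
A <- as(X, "TileDBArray")
dir.exists(path2)

# Unset the global path (assumption: NULL means "use a new temporary file").
setTileDBPath(NULL)
B <- as(X, "TileDBArray")
```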
sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.22 TileDBArray_1.19.1 DelayedArray_0.35.2
## [4] SparseArray_1.9.0 S4Arrays_1.9.1 IRanges_2.43.0
## [7] abind_1.4-8 S4Vectors_0.47.0 MatrixGenerics_1.21.0
## [10] matrixStats_1.5.0 BiocGenerics_0.55.0 generics_0.1.4
## [13] Matrix_1.7-3 BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.1
## [4] BiocManager_1.30.26 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0-1 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.49.0 tiledb_0.32.0
## [16] knitr_1.50 bookdown_0.43 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.5
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.1
## [28] lifecycle_1.0.4 data.table_1.17.6 evaluate_1.0.4
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.1 htmltools_0.5.8.1