TileDBArray 1.18.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.42208041 -0.81998193 -0.31739264 . -0.6914245 -0.4157741
## [2,] -1.72925273 -1.59879259 0.02826441 . 0.4791934 -0.8046053
## [3,] -0.49556322 -0.40791395 -0.62274378 . -1.2345380 -0.5625844
## [4,] -0.96987481 -0.49127324 0.21907759 . -1.5015192 1.3354336
## [5,] -1.27884582 -2.43228191 1.38720873 . -0.6098997 -0.6162940
## ... . . . . . .
## [96,] -1.04051850 0.83898939 0.88980784 . 0.28247424 -0.97331333
## [97,] 2.05575769 1.41655911 -0.70551665 . -0.01776516 1.17446481
## [98,] -1.93208126 -0.21641396 -0.09102164 . 0.26335305 -0.51995360
## [99,] 1.20352729 -0.16616049 -0.03638747 . 1.20963371 2.15773292
## [100,] 0.28830716 0.74449826 -1.21697664 . 0.04222004 -0.43073308
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.42208041 -0.81998193 -0.31739264 . -0.6914245 -0.4157741
## [2,] -1.72925273 -1.59879259 0.02826441 . 0.4791934 -0.8046053
## [3,] -0.49556322 -0.40791395 -0.62274378 . -1.2345380 -0.5625844
## [4,] -0.96987481 -0.49127324 0.21907759 . -1.5015192 1.3354336
## [5,] -1.27884582 -2.43228191 1.38720873 . -0.6098997 -0.6162940
## ... . . . . . .
## [96,] -1.04051850 0.83898939 0.88980784 . 0.28247424 -0.97331333
## [97,] 2.05575769 1.41655911 -0.70551665 . -0.01776516 1.17446481
## [98,] -1.93208126 -0.21641396 -0.09102164 . 0.26335305 -0.51995360
## [99,] 1.20352729 -0.16616049 -0.03638747 . 1.20963371 2.15773292
## [100,] 0.28830716 0.74449826 -1.21697664 . 0.04222004 -0.43073308
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0.000 -0.710
## [2,] 0 0 0 . -0.082 0.000
## [3,] 0 0 0 . 0.000 0.000
## [4,] 0 0 0 . 0.000 0.000
## [5,] 0 0 0 . 0.000 0.000
## ... . . . . . .
## [996,] 0.00 0.25 0.00 . 0.00 0.00
## [997,] 0.00 0.00 0.00 . 0.00 0.00
## [998,] 0.00 0.00 0.00 . 0.00 0.00
## [999,] 0.00 0.00 0.00 . 0.00 0.00
## [1000,] 0.00 0.00 0.00 . -0.59 0.00
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE TRUE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
Matrices with dimension names are also supported:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.42208041 -0.81998193 -0.31739264 . -0.6914245 -0.4157741
## GENE_2 -1.72925273 -1.59879259 0.02826441 . 0.4791934 -0.8046053
## GENE_3 -0.49556322 -0.40791395 -0.62274378 . -1.2345380 -0.5625844
## GENE_4 -0.96987481 -0.49127324 0.21907759 . -1.5015192 1.3354336
## GENE_5 -1.27884582 -2.43228191 1.38720873 . -0.6098997 -0.6162940
## ... . . . . . .
## GENE_96 -1.04051850 0.83898939 0.88980784 . 0.28247424 -0.97331333
## GENE_97 2.05575769 1.41655911 -0.70551665 . -0.01776516 1.17446481
## GENE_98 -1.93208126 -0.21641396 -0.09102164 . 0.26335305 -0.51995360
## GENE_99 1.20352729 -0.16616049 -0.03638747 . 1.20963371 2.15773292
## GENE_100 0.28830716 0.74449826 -1.21697664 . 0.04222004 -0.43073308
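To confirm how a given array ended up being stored, we can query its element type and sparsity. The following is a minimal sketch; it assumes that the `type()` and `is_sparse()` generics made available by attaching TileDBArray (via DelayedArray) work on these objects, and it uses a smaller sparse matrix than the one above purely to keep the example quick.

```r
library(TileDBArray)

# Small sparse example, mirroring the matrices used above.
Y <- Matrix::rsparsematrix(100, 50, density = 0.05)

dbl <- writeTileDBArray(Y)       # double-precision values, sparse input
lgl <- writeTileDBArray(Y > 0)   # logical values, sparse input

# type() reports the element type of the array.
type(dbl)
type(lgl)

# is_sparse() reports whether the object is treated as sparse.
is_sparse(dbl)
is_sparse(lgl)
```

These are the same two properties that appear in the header line of the printed objects.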
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.4220804 -1.7292527 -0.4955632 -0.9698748 -1.2788458 1.9354267
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.42208041 -0.81998193 -0.31739264 -0.77864795 -2.12317035
## GENE_2 -1.72925273 -1.59879259 0.02826441 -0.49942406 -0.12051080
## GENE_3 -0.49556322 -0.40791395 -0.62274378 2.07874602 0.33843563
## GENE_4 -0.96987481 -0.49127324 0.21907759 0.23322130 -0.40666493
## GENE_5 -1.27884582 -2.43228191 1.38720873 -1.51607801 -0.48665113
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.84416081 -1.63996386 -0.63478529 . -1.3828490 -0.8315482
## GENE_2 -3.45850547 -3.19758519 0.05652881 . 0.9583868 -1.6092107
## GENE_3 -0.99112644 -0.81582790 -1.24548755 . -2.4690760 -1.1251689
## GENE_4 -1.93974962 -0.98254648 0.43815517 . -3.0030383 2.6708673
## GENE_5 -2.55769163 -4.86456383 2.77441745 . -1.2197995 -1.2325879
## ... . . . . . .
## GENE_96 -2.08103700 1.67797879 1.77961568 . 0.56494848 -1.94662666
## GENE_97 4.11151537 2.83311822 -1.41103330 . -0.03553032 2.34892962
## GENE_98 -3.86416252 -0.43282793 -0.18204327 . 0.52670609 -1.03990720
## GENE_99 2.40705459 -0.33232098 -0.07277493 . 2.41926742 4.31546584
## GENE_100 0.57661432 1.48899652 -2.43395328 . 0.08444008 -0.86146616
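When the delayed values are eventually needed, they can be realized either in memory or in a new TileDB backend. The sketch below assumes that `as.matrix()` is an acceptable way to pull a small result into memory and that `writeTileDBArray()` accepts a DelayedMatrix as input; the object names are illustrative only.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
out <- as(X, "TileDBArray")

# Subsetting and arithmetic are recorded but not yet executed.
delayed <- out[1:5, 1:5] * 2

# Realize the delayed operations as an ordinary in-memory matrix.
in_memory <- as.matrix(delayed)

# Alternatively, write the realized values to a new TileDB array.
on_disk <- writeTileDBArray(delayed)
```

The first form suits results that comfortably fit in memory; the second keeps the result file-backed.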
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## 8.317195 -7.295107 3.420845 12.134166 -2.774949 3.946618 9.833469
## SAMP_8 SAMP_9 SAMP_10
## -11.246291 -14.306348 5.382222
out %*% runif(ncol(out))
## [,1]
## GENE_1 -2.862336262
## GENE_2 -1.486860315
## GENE_3 -0.762394569
## GENE_4 -2.333332880
## GENE_5 -4.139541154
## GENE_6 1.130064446
## GENE_7 0.603940411
## GENE_8 1.371111819
## GENE_9 -0.404649609
## GENE_10 1.519117336
## GENE_11 0.149070060
## GENE_12 -1.141147197
## GENE_13 1.543042847
## GENE_14 2.757353672
## GENE_15 0.787361158
## GENE_16 2.133765219
## GENE_17 1.742619890
## GENE_18 2.786061340
## GENE_19 0.510542207
## GENE_20 -0.861016392
## GENE_21 1.509929674
## GENE_22 -2.341675116
## GENE_23 -0.364609090
## GENE_24 -0.496749743
## GENE_25 1.057919469
## GENE_26 -0.101875281
## GENE_27 -0.232245916
## GENE_28 1.561076774
## GENE_29 1.173465898
## GENE_30 0.008762422
## GENE_31 -2.345439933
## GENE_32 2.083953616
## GENE_33 -0.380315154
## GENE_34 -1.051053209
## GENE_35 -0.556541483
## GENE_36 -1.023836872
## GENE_37 3.219199371
## GENE_38 1.073030178
## GENE_39 0.906215850
## GENE_40 -0.237298629
## GENE_41 -2.100307635
## GENE_42 1.573443524
## GENE_43 0.418662962
## GENE_44 -3.280889224
## GENE_45 -0.606036481
## GENE_46 1.716556963
## GENE_47 -2.650309548
## GENE_48 -1.437885891
## GENE_49 2.539851990
## GENE_50 0.872974841
## GENE_51 -2.181501118
## GENE_52 -0.228025439
## GENE_53 1.552188722
## GENE_54 -1.784278690
## GENE_55 0.925502221
## GENE_56 0.802466778
## GENE_57 0.777043762
## GENE_58 0.889454425
## GENE_59 1.324499458
## GENE_60 -0.631886740
## GENE_61 0.192179741
## GENE_62 1.837000146
## GENE_63 1.209056266
## GENE_64 -3.028111854
## GENE_65 -0.442788421
## GENE_66 -0.733184271
## GENE_67 -0.570309275
## GENE_68 -0.858859601
## GENE_69 -1.265489477
## GENE_70 -2.475987752
## GENE_71 -1.333153194
## GENE_72 -0.624253934
## GENE_73 -2.103275447
## GENE_74 -1.058433964
## GENE_75 -0.443166579
## GENE_76 -0.526981821
## GENE_77 -2.630555547
## GENE_78 0.515073001
## GENE_79 1.020852440
## GENE_80 -1.682661524
## GENE_81 -1.577422419
## GENE_82 1.745203457
## GENE_83 0.621289205
## GENE_84 1.412342877
## GENE_85 1.935200014
## GENE_86 2.803063159
## GENE_87 1.382126311
## GENE_88 1.673387513
## GENE_89 0.443779757
## GENE_90 -1.457297688
## GENE_91 -1.235342346
## GENE_92 1.926874465
## GENE_93 1.024861265
## GENE_94 0.658884389
## GENE_95 -1.089767973
## GENE_96 -0.439531060
## GENE_97 -0.366492460
## GENE_98 -3.113327262
## GENE_99 2.693032887
## GENE_100 -0.925686493
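As a sanity check, the results of these operations can be compared against the same calculations on the original in-memory matrix. This is only a sketch: the vector `v` is a hypothetical name introduced here so that the same random weights are used on both sides, and `all.equal()` is used rather than `identical()` to allow for small floating-point differences.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
out <- as(X, "TileDBArray")

# Fix the weights once so both products use the same vector.
v <- runif(ncol(out))

# Column sums from the TileDB-backed array versus the ordinary matrix.
sums_ok <- all.equal(colSums(out), colSums(X))

# Matrix-vector product from the TileDB-backed array versus the ordinary matrix.
prod_ok <- all.equal(as.matrix(out %*% v), X %*% v)
```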
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.4612936 0.6101172 -0.7018026 . -0.91425329 1.31180765
## [2,] -1.3526912 1.7736933 0.1169059 . 0.62245856 0.66210159
## [3,] 0.6406656 -0.4439037 -0.7101705 . 1.73141390 0.79068206
## [4,] 0.2140898 -0.3343304 -0.7792249 . -0.49891645 0.57644445
## [5,] -0.7522347 0.2784878 0.9905263 . -0.04111726 0.20947468
## ... . . . . . .
## [96,] -1.262845531 0.636011496 1.783595484 . -1.01874948 2.39215758
## [97,] -0.138271268 0.105865241 -2.236847461 . -0.01086105 -0.45781723
## [98,] -0.949875495 -0.001761688 0.448228570 . 0.64970090 1.07507071
## [99,] 0.978588682 0.783394640 0.685812449 . -1.33874591 -1.74213708
## [100,] -0.800292344 0.825885621 1.532486477 . -0.71564986 -1.13253152
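Knowing the path is mostly useful for reopening the same backend later. The sketch below assumes that the `TileDBArray()` constructor accepts a path to an existing TileDB array and, when no attribute is named, picks up the only attribute present; `reopened` is an illustrative name.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)
path <- tempfile()
writeTileDBArray(X, path = path, attr = "WHEE")

# Reopen the existing array from its location instead of rewriting it.
reopened <- TileDBArray(path)

dim(reopened)
```

No data are copied here; the new object simply points at the array already stored at `path`.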
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.4612936 0.6101172 -0.7018026 . -0.91425329 1.31180765
## [2,] -1.3526912 1.7736933 0.1169059 . 0.62245856 0.66210159
## [3,] 0.6406656 -0.4439037 -0.7101705 . 1.73141390 0.79068206
## [4,] 0.2140898 -0.3343304 -0.7792249 . -0.49891645 0.57644445
## [5,] -0.7522347 0.2784878 0.9905263 . -0.04111726 0.20947468
## ... . . . . . .
## [96,] -1.262845531 0.636011496 1.783595484 . -1.01874948 2.39215758
## [97,] -0.138271268 0.105865241 -2.236847461 . -0.01086105 -0.45781723
## [98,] -0.949875495 -0.001761688 0.448228570 . 0.64970090 1.07507071
## [99,] 0.978588682 0.783394640 0.685812449 . -1.33874591 -1.74213708
## [100,] -0.800292344 0.825885621 1.532486477 . -0.71564986 -1.13253152
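Because these settings are global, it is worth checking and resetting them once they are no longer needed. The following sketch assumes that `getTileDBPath()` returns the currently set path and that calling `setTileDBPath(NULL)` unsets it; `current` and `stored` are illustrative names.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol = 10)

path2 <- tempfile()
setTileDBPath(path2)

# Inspect the global path that coercion will use.
current <- getTileDBPath()

stored <- as(X, "TileDBArray")  # backend is written to path2

# Unset the global path so later coercions do not reuse it.
setTileDBPath(NULL)
```

Leaving the path set would point later coercions at the same location, so unsetting it after use is a reasonable habit.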
sessionInfo()
## R version 4.5.0 RC (2025-04-04 r88126)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.21 TileDBArray_1.18.0 DelayedArray_0.34.0
## [4] SparseArray_1.8.0 S4Arrays_1.8.0 IRanges_2.42.0
## [7] abind_1.4-8 S4Vectors_0.46.0 MatrixGenerics_1.20.0
## [10] matrixStats_1.5.0 BiocGenerics_0.54.0 generics_0.1.3
## [13] Matrix_1.7-3 BiocStyle_2.36.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.48.0 tiledb_0.30.2
## [16] knitr_1.50 bookdown_0.43 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.4
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.0
## [28] lifecycle_1.0.4 data.table_1.17.0 evaluate_1.0.3
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.0 htmltools_0.5.8.1