TileDBArray 1.19.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.03225181 -1.98511321 -0.16819424 . -0.5829637 -1.3590788
## [2,] 0.44867887 -0.93880954 0.16947891 . 0.9494439 1.4771147
## [3,] -0.33366682 -0.85202228 1.79619053 . 0.3432725 -1.6005889
## [4,] 0.06657190 1.30510829 0.80795759 . 0.2251172 0.8772670
## [5,] 0.05026136 -1.96817600 0.82593130 . 0.6916550 -1.8566116
## ... . . . . . .
## [96,] 1.09050641 0.93040889 -1.54171752 . -0.5015762 0.2982801
## [97,] 2.22984230 1.37210150 -0.14799809 . 1.3630392 0.8025892
## [98,] 0.42263154 0.88769191 -0.47985647 . 0.7197962 0.4434052
## [99,] -0.56784921 -0.46173206 0.05890026 . 1.5013391 -1.0544084
## [100,] -0.86367577 -0.31397800 -0.47033194 . -1.0250520 -0.2927141
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.03225181 -1.98511321 -0.16819424 . -0.5829637 -1.3590788
## [2,] 0.44867887 -0.93880954 0.16947891 . 0.9494439 1.4771147
## [3,] -0.33366682 -0.85202228 1.79619053 . 0.3432725 -1.6005889
## [4,] 0.06657190 1.30510829 0.80795759 . 0.2251172 0.8772670
## [5,] 0.05026136 -1.96817600 0.82593130 . 0.6916550 -1.8566116
## ... . . . . . .
## [96,] 1.09050641 0.93040889 -1.54171752 . -0.5015762 0.2982801
## [97,] 2.22984230 1.37210150 -0.14799809 . 1.3630392 0.8025892
## [98,] 0.42263154 0.88769191 -0.47985647 . 0.7197962 0.4434052
## [99,] -0.56784921 -0.46173206 0.05890026 . 1.5013391 -1.0544084
## [100,] -0.86367577 -0.31397800 -0.47033194 . -1.0250520 -0.2927141
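As a quick sanity check on the two routes above, the following sketch realizes each result back into memory and compares it with the original matrix. The variable names (`written`, `coerced`, `roundtrip_ok`) are illustrative only, and `unname()` is used on the assumption that dimnames handling should not affect the comparison.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
written <- writeTileDBArray(X)    # explicit write
coerced <- as(X, "TileDBArray")   # coercion route

# Both results are DelayedArray objects backed by TileDB.
both_delayed <- is(written, "DelayedArray") && is(coerced, "DelayedArray")

# Realizing either object in memory should give back the original values.
roundtrip_ok <- isTRUE(all.equal(unname(as.matrix(written)), X)) &&
    isTRUE(all.equal(unname(as.matrix(coerced)), X))
c(both_delayed=both_delayed, roundtrip_ok=roundtrip_ok)
```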
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0.00 0.00 -0.22 . 0 0
## [2,] 0.00 0.00 0.00 . 0 0
## [3,] 0.00 0.00 0.00 . 0 0
## [4,] 0.00 0.00 0.00 . 0 0
## [5,] 0.00 0.00 0.00 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
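To confirm that sparsity carries over to the backend, one option is to query `is_sparse()` on the result. This is a minimal sketch: the object names are hypothetical, and the non-zero count is compared after realizing the full matrix in memory, which is only sensible for small examples like this one.

```r
library(TileDBArray)

Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
sparse_tdb <- writeTileDBArray(Y)
dense_tdb <- writeTileDBArray(matrix(rnorm(100), ncol=10))

# is_sparse() reports whether the object is treated as sparse.
sparse_flag <- is_sparse(sparse_tdb)
dense_flag <- is_sparse(dense_tdb)
c(sparse=sparse_flag, dense=dense_flag)

# Compare the number of non-zero values after reading the data back.
nnz_match <- sum(as.matrix(sparse_tdb) != 0) == Matrix::nnzero(Y)
nnz_match
```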
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
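The example above only shows the logical case. A comparable sketch for a dense integer matrix is given below, using `type()` to inspect what the backend reports; the Poisson counts and the object names are made up for illustration.

```r
library(TileDBArray)

Z <- matrix(rpois(1000, lambda=5), ncol=10)   # integer counts
L <- Z > 5                                    # logical matrix

int_tdb <- writeTileDBArray(Z)
lgl_tdb <- writeTileDBArray(L)

# type() reports the element type of each array-like object.
types <- c(integer=type(int_tdb), logical=type(lgl_tdb))
types
```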
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.03225181 -1.98511321 -0.16819424 . -0.5829637 -1.3590788
## GENE_2 0.44867887 -0.93880954 0.16947891 . 0.9494439 1.4771147
## GENE_3 -0.33366682 -0.85202228 1.79619053 . 0.3432725 -1.6005889
## GENE_4 0.06657190 1.30510829 0.80795759 . 0.2251172 0.8772670
## GENE_5 0.05026136 -1.96817600 0.82593130 . 0.6916550 -1.8566116
## ... . . . . . .
## GENE_96 1.09050641 0.93040889 -1.54171752 . -0.5015762 0.2982801
## GENE_97 2.22984230 1.37210150 -0.14799809 . 1.3630392 0.8025892
## GENE_98 0.42263154 0.88769191 -0.47985647 . 0.7197962 0.4434052
## GENE_99 -0.56784921 -0.46173206 0.05890026 . 1.5013391 -1.0544084
## GENE_100 -0.86367577 -0.31397800 -0.47033194 . -1.0250520 -0.2927141
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.03225181 0.44867887 -0.33366682 0.06657190 0.05026136 0.51303408
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.03225181 -1.98511321 -0.16819424 -1.52931734 1.12191707
## GENE_2 0.44867887 -0.93880954 0.16947891 -0.61776926 1.23057783
## GENE_3 -0.33366682 -0.85202228 1.79619053 -0.59776037 -0.76853164
## GENE_4 0.06657190 1.30510829 0.80795759 -1.12309783 -0.81928879
## GENE_5 0.05026136 -1.96817600 0.82593130 0.43911650 0.82178063
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.06450362 -3.97022641 -0.33638848 . -1.1659274 -2.7181576
## GENE_2 0.89735774 -1.87761908 0.33895781 . 1.8988878 2.9542294
## GENE_3 -0.66733363 -1.70404456 3.59238107 . 0.6865449 -3.2011779
## GENE_4 0.13314379 2.61021658 1.61591519 . 0.4502344 1.7545341
## GENE_5 0.10052272 -3.93635201 1.65186261 . 1.3833100 -3.7132231
## ... . . . . . .
## GENE_96 2.1810128 1.8608178 -3.0834350 . -1.0031523 0.5965602
## GENE_97 4.4596846 2.7442030 -0.2959962 . 2.7260784 1.6051785
## GENE_98 0.8452631 1.7753838 -0.9597129 . 1.4395924 0.8868104
## GENE_99 -1.1356984 -0.9234641 0.1178005 . 3.0026781 -2.1088168
## GENE_100 -1.7273515 -0.6279560 -0.9406639 . -2.0501041 -0.5854281
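One way to see the delayed behaviour is to inspect the pending operations with `showtree()` from DelayedArray and then realize the result explicitly. The sketch below assumes a fresh matrix without dimension names; `doubled`, `backend_unchanged` and `delayed_ok` are illustrative names.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

doubled <- out * 2   # no values are computed at this point
showtree(doubled)    # prints the pending operations on top of the TileDB seed

# The backend still holds the original values ...
backend_unchanged <- isTRUE(all.equal(unname(as.matrix(out)), X))
# ... and the arithmetic is applied only when the values are realized.
delayed_ok <- isTRUE(all.equal(unname(as.matrix(doubled)), X * 2))
c(backend_unchanged=backend_unchanged, delayed_ok=delayed_ok)
```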
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 11.5033875 -16.3050636 -4.8087907 -11.9326725 0.6475357 12.3112273
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## 10.1665076 11.6831104 -7.6547832 -3.1915441
out %*% runif(ncol(out))
## [,1]
## GENE_1 -3.38266185
## GENE_2 -0.53736470
## GENE_3 -1.74143063
## GENE_4 2.72167302
## GENE_5 -0.18343882
## GENE_6 2.79843648
## GENE_7 -0.62086713
## GENE_8 -1.71048623
## GENE_9 1.56828451
## GENE_10 -2.26764369
## GENE_11 1.18955762
## GENE_12 2.70818496
## GENE_13 2.20384263
## GENE_14 0.19918471
## GENE_15 1.94649391
## GENE_16 -1.02878808
## GENE_17 1.34774477
## GENE_18 1.87019204
## GENE_19 2.60851277
## GENE_20 0.18422150
## GENE_21 1.85979818
## GENE_22 -1.20705328
## GENE_23 -1.43526213
## GENE_24 -3.53221265
## GENE_25 -2.06502690
## GENE_26 0.33234080
## GENE_27 -1.65604634
## GENE_28 0.61522827
## GENE_29 -2.93387346
## GENE_30 -0.52054469
## GENE_31 0.79971275
## GENE_32 -2.54339198
## GENE_33 2.19829407
## GENE_34 -2.07908553
## GENE_35 -0.49112991
## GENE_36 0.54024492
## GENE_37 -1.22852426
## GENE_38 0.05050532
## GENE_39 1.37261797
## GENE_40 1.49445845
## GENE_41 -0.03166287
## GENE_42 -0.91444792
## GENE_43 -0.24203390
## GENE_44 -1.75738452
## GENE_45 -0.32747019
## GENE_46 -0.05859003
## GENE_47 2.19830484
## GENE_48 -0.60252806
## GENE_49 -1.38344841
## GENE_50 -1.77606518
## GENE_51 -0.33801589
## GENE_52 -0.22699818
## GENE_53 -1.41174926
## GENE_54 -0.01310152
## GENE_55 2.26931386
## GENE_56 -0.31691755
## GENE_57 -1.69912060
## GENE_58 0.07456965
## GENE_59 0.83237548
## GENE_60 2.20054123
## GENE_61 -0.64644827
## GENE_62 -0.39995221
## GENE_63 1.78644757
## GENE_64 -0.18088395
## GENE_65 -0.09950084
## GENE_66 3.58507512
## GENE_67 -1.07652991
## GENE_68 -3.02862181
## GENE_69 -3.26188629
## GENE_70 4.28791154
## GENE_71 -0.01791742
## GENE_72 0.23479931
## GENE_73 -0.79580470
## GENE_74 -1.78920681
## GENE_75 0.01253172
## GENE_76 -1.96343593
## GENE_77 0.47570404
## GENE_78 2.76447172
## GENE_79 -3.19521207
## GENE_80 -0.73761657
## GENE_81 -0.09415035
## GENE_82 3.64012976
## GENE_83 -3.81928846
## GENE_84 -0.58808997
## GENE_85 -0.67723559
## GENE_86 -2.47354167
## GENE_87 1.60056275
## GENE_88 0.59322412
## GENE_89 -0.39412562
## GENE_90 2.51657062
## GENE_91 0.10656019
## GENE_92 3.04347537
## GENE_93 0.58944626
## GENE_94 3.06363129
## GENE_95 0.16574714
## GENE_96 -1.14600524
## GENE_97 2.36925088
## GENE_98 0.49304428
## GENE_99 -1.54642071
## GENE_100 -0.44419454
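These results should agree with the same operations on the in-memory matrix. The following sketch checks this for the column sums and the matrix product, using a tolerance-based comparison via `all.equal()`; the names `sums_match` and `prod_match` are hypothetical.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Column sums computed on the TileDB-backed object versus in memory.
sums_match <- isTRUE(all.equal(unname(colSums(out)), colSums(X)))

# Matrix product with an ordinary numeric vector.
v <- runif(ncol(out))
prod_match <- isTRUE(all.equal(unname(as.matrix(out %*% v)), X %*% v))

c(sums_match=sums_match, prod_match=prod_match)
```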
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.08416578 0.24746891 -0.45183260 . 0.72946531 0.36978673
## [2,] 1.22608190 0.50066748 1.44851546 . 0.89631789 -1.86008018
## [3,] 2.41332955 1.53556750 0.56941043 . 1.28770260 0.60006026
## [4,] 0.61303357 -1.27826051 0.18947935 . 0.51684269 0.07106174
## [5,] 0.21765866 0.11642487 0.19462652 . 0.66070389 1.04520525
## ... . . . . . .
## [96,] -0.67012471 1.11782445 0.37809463 . 0.918363655 0.009736818
## [97,] -0.03839270 -0.68017137 -0.30813983 . -0.251659444 -0.781723706
## [98,] 1.85337619 -0.04497988 -0.44850164 . 0.631919476 0.646699245
## [99,] 0.51783635 0.76005926 -0.11271946 . 1.086652280 -0.578675032
## [100,] 0.41789853 0.40758726 -1.13778819 . 1.496327776 0.029800446
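Knowing the path and attribute name is useful when the backend needs to be reopened later. The sketch below assumes that the `TileDBArray()` constructor accepts a path and an `attr=` argument, as described in the package documentation; `reopened` and `reopen_ok` are illustrative names.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the existing backend by pointing at the same path and attribute.
reopened <- TileDBArray(path, attr="WHEE")
reopen_ok <- isTRUE(all.equal(unname(as.matrix(reopened)), X))
reopen_ok
```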
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.08416578 0.24746891 -0.45183260 . 0.72946531 0.36978673
## [2,] 1.22608190 0.50066748 1.44851546 . 0.89631789 -1.86008018
## [3,] 2.41332955 1.53556750 0.56941043 . 1.28770260 0.60006026
## [4,] 0.61303357 -1.27826051 0.18947935 . 0.51684269 0.07106174
## [5,] 0.21765866 0.11642487 0.19462652 . 0.66070389 1.04520525
## ... . . . . . .
## [96,] -0.67012471 1.11782445 0.37809463 . 0.918363655 0.009736818
## [97,] -0.03839270 -0.68017137 -0.30813983 . -0.251659444 -0.781723706
## [98,] 1.85337619 -0.04497988 -0.44850164 . 0.631919476 0.646699245
## [99,] 0.51783635 0.76005926 -0.11271946 . 1.086652280 -0.578675032
## [100,] 0.41789853 0.40758726 -1.13778819 . 1.496327776 0.029800446
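Because the setting is global, it is probably wise to unset it once the coercion is done, so that a later coercion does not try to write to the same location. The sketch below assumes that `getTileDBPath()` returns the current setting and that `setTileDBPath(NULL)` unsets it; `path3`, `current` and `stored` are hypothetical names.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

path3 <- tempfile()
setTileDBPath(path3)
current <- getTileDBPath()       # query the global setting
stored <- as(X, "TileDBArray")   # backend is written to path3

# Unset the global so that later coercions do not reuse this location
# (assumption: passing NULL restores the default behaviour).
setTileDBPath(NULL)

c(path_was_set=identical(current, path3), backend_exists=file.exists(path3))
```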
sessionInfo()
## R version 4.5.0 beta (2025-04-02 r88102)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.21 TileDBArray_1.19.0 DelayedArray_0.35.0
## [4] SparseArray_1.9.0 S4Arrays_1.9.0 IRanges_2.43.0
## [7] abind_1.4-8 S4Vectors_0.47.0 MatrixGenerics_1.21.0
## [10] matrixStats_1.5.0 BiocGenerics_0.55.0 generics_0.1.3
## [13] Matrix_1.7-3 BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.6.0 jsonlite_2.0.0 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.14
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] RcppCCTZ_0.2.13 XVector_0.49.0 tiledb_0.30.2
## [16] knitr_1.50 bookdown_0.43 bslib_0.9.0
## [19] rlang_1.1.6 cachem_1.1.0 xfun_0.52
## [22] sass_0.4.10 bit64_4.6.0-1 cli_3.6.4
## [25] spdl_0.0.5 digest_0.6.37 grid_4.5.0
## [28] lifecycle_1.0.4 data.table_1.17.0 evaluate_1.0.3
## [31] nanotime_0.3.12 zoo_1.8-14 rmarkdown_2.29
## [34] tools_4.5.0 htmltools_0.5.8.1