NVBIO is a library of reusable components developed by NVIDIA Corporation to accelerate bioinformatics applications using CUDA technology. Although specifically designed to harness the computational power of NVIDIA GPUs, many of its components are fully cross-platform and can be used in both host C++ and device CUDA code.
The purpose of NVBIO is twofold. It serves as a robust foundation for developing modern GPU-focused applications, ensuring that core computations delegated to the library automatically benefit from advances in GPU technology. At the same time, it provides valuable resources for designing novel bioinformatics algorithms tailored to massively parallel architectures.
In addition to its core components, NVBIO includes a suite of applications built on top of the library. Among these is nvBowtie, a re-engineered implementation of the widely recognized Bowtie2 short-read aligner [1]. Unlike many prototypes, nvBowtie is built as an industrial-grade aligner, replicating most of Bowtie2’s original features while adding enhancements such as efficient support for direct BAM output, with future plans for CRAM support.
nvBowtie is designed to fully exploit the massive parallelism of modern GPUs, delivering significantly higher alignment throughput without compromising accuracy, or achieving even greater accuracy in the same amount of time. For example, its performance was compared to Bowtie2 using an Illumina HiSeq 2000 dataset (the first 10 million reads of ERR161544) and an IonProton dataset, applying both end-to-end and local alignment methods. The results demonstrate an impressive 99.98% agreement at high MAPQ scores. For comparison, Bowtie2 tests were performed using 20 CPU threads with default alignment settings.
Figure: benchmark of nvBowtie speedup relative to Bowtie2 (benchmark-nvbowtie-speedup.png).
This package provides an R wrapper for nvBio/nvBowtie, offering user-friendly interfaces specifically designed for R users. To maximize efficiency, the indexing and alignment functions (nvBWT and nvBowtie) are implemented in C++ and seamlessly integrated into R using the system2 function. This integration is fully transparent to the user, ensuring that the package is easy to use while providing high-performance features optimized to take full advantage of your machine’s capabilities.
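To illustrate the mechanism, here is a minimal sketch of how an R function can delegate work to an external program through system2. The helper name run_external is hypothetical and is not part of RbowtieCuda. The example launches Rscript only because it is available wherever R is installed; it stands in for a compiled tool.

```r
# Hypothetical helper (not part of RbowtieCuda): run an external program
# and return its exit status, which is what system2() returns when the
# program's output is not captured.
run_external <- function(binary, args = character()) {
  system2(binary, args = args, stdout = FALSE, stderr = FALSE)
}

# Rscript ships with R, so it serves as a stand-in for a compiled tool.
rscript <- file.path(R.home("bin"), "Rscript")
status <- run_external(rscript, "--version")
status  # an exit status of 0 means the command completed successfully
```

A wrapper built this way only needs to assemble the argument vector and interpret the returned status code.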
Additionally, an experimental implementation of the Wavefront Alignment (WFA) method [2] is included and can be enabled with the --wfa parameter during execution.
Please note that this package requires an NVIDIA graphics card for proper functionality.
For detailed installation steps, please refer to the INSTALL file included with the package.
RbowtieCuda is compatible with CUDA version 10 and later.
To install the latest version of RbowtieCuda, ensure that you are running the most up-to-date version of R. As RbowtieCuda is part of the Bioconductor project, you can easily install it along with its dependencies by following these steps:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("RbowtieCuda")
As with any other R package, you need to load RbowtieCuda each time before using it, like this:
library(RbowtieCuda)
## RbowtieCuda is distributed under the BSD 3-Clause License.
## Please read the LICENSE file carefully before use.
##
## Note:
## - Redistribution and use (source/binary, with/without modification) are permitted
## under certain conditions.
## - The name of NVIDIA CORPORATION and its contributors may not be used to endorse
## or promote derived products without written permission.
## - This software is provided "AS IS", without warranty of any kind.
nvBWT is an application developed as part of the NVBIO library, designed to perform BWT-based reference indexing for nvBowtie and potentially other FM-index-based applications. When provided with one or more FASTA files, nvBWT generates both the forward and reverse Burrows-Wheeler Transform (BWT), along with a 2-bit packed representation of the sequences. In addition, it produces several auxiliary indices to support efficient alignment and querying.
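To make the transform concrete, the following toy sketch computes the BWT of a short string by sorting all of its rotations. This is the naive textbook construction, shown only for illustration; it is not how nvBWT builds its index, and the function name bwt_naive is hypothetical.

```r
# Naive Burrows-Wheeler Transform: append a terminator, sort all rotations,
# and read off the last column. Only suitable for very short strings.
bwt_naive <- function(s) {
  s <- paste0(s, "$")               # "$" sorts before A, C, G, T in byte order
  n <- nchar(s)
  rotations <- vapply(
    seq_len(n),
    function(i) paste0(substr(s, i, n), substr(s, 1, i - 1)),
    character(1)
  )
  sorted <- sort(rotations, method = "radix")  # radix sort uses C-locale order
  paste(substr(sorted, n, n), collapse = "")
}

forward_bwt <- bwt_naive("GATTACA")
forward_bwt  # "ACTGA$TA"

# The reverse BWT is the same transform applied to the reversed sequence.
reversed <- paste(rev(strsplit("GATTACA", "")[[1]]), collapse = "")
reverse_bwt <- bwt_naive(reversed)
```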
td <- tempdir()
fa_file <- system.file(package="RbowtieCuda", "extdata", "bt2", "refs", "lambda_virus.fa")
nvBWT(myinput=fa_file, output=file.path(td, "index"), options="")
## [1] 0
This command generates the following files:
index.pac
index.rpac
index.bwt
index.rbwt
index.sa
index.rsa
index.ann
index.amb
Warning: if you run the command in a directory that already contains these files, they will be deleted and new files will be generated.
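Because existing index files are replaced, it can be useful to check for them before indexing. The sketch below lists which of the eight files named above already exist for a given prefix. The helper names index_files and existing_index_files are hypothetical and are not provided by RbowtieCuda.

```r
# Hypothetical helpers: build the expected index file names for a prefix
# and report which of them are already present on disk.
index_files <- function(prefix) {
  paste0(prefix, c(".pac", ".rpac", ".bwt", ".rbwt",
                   ".sa", ".rsa", ".ann", ".amb"))
}

existing_index_files <- function(prefix) {
  candidates <- index_files(prefix)
  candidates[file.exists(candidates)]
}

prefix <- file.path(tempdir(), "index_demo")
found <- existing_index_files(prefix)
if (length(found) > 0) {
  message("These files would be overwritten: ",
          paste(basename(found), collapse = ", "))
}
```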
nvBWT supports the following command options:
-v | --verbosity int (0-6) [5] // select the verbosity level
-m | --max-length int [inf] // clamp input length
-b | --byte-packing // output a byte-encoded .pac file
-w | --word-packing // output a word-encoded .wpac file (more efficient)
-c | --crc // compute CRCs
-d | --device // select a cuda device
nvBowtie is a GPU-accelerated re-engineering of Bowtie2, one of the most widely used short-read aligners. Completely rewritten from scratch, nvBowtie retains many of the key features of Bowtie2, though not all functionalities are replicated.
Designed to fully exploit the massive parallelism of modern GPUs, nvBowtie achieves significantly higher alignment throughput without sacrificing accuracy—or offers even greater accuracy within the same time frame. Despite its focus on performance, nvBowtie is carefully designed to align closely with Bowtie2 in terms of specificity and sensitivity, maintaining the same level of reliability for users.
To harness the computational power of modern processor architectures, nvBowtie re-implements the algorithms underlying Bowtie2 but adopts a fundamentally different approach. While Bowtie2 is optimized to process one read at a time—using multiple CPU threads to handle different reads simultaneously—nvBowtie operates on large batches of reads, treating their alignment as a pipeline. This pipeline consists of many relatively simple but highly parallel stages, each optimized for execution on GPUs. In several stages, the parallelism extends far beyond the read level, processing multiple candidate hits for each read simultaneously, enabling a much finer granularity of parallel computation.
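The batch-oriented idea can be pictured with a toy example in which each stage processes every read before the next stage starts. This is only a conceptual sketch on the CPU with made-up data: the reference, the reads, the exact-match seeding, and the Hamming-distance scoring are all simplifying assumptions and do not reflect nvBowtie's actual stages or data structures.

```r
# Toy data (assumed for illustration only).
reference <- "ACGTACGTTAGCCGATAGCTAGGCTA"
reads <- c("TAGCCGAT", "GCTAGGCT", "ACGTTAGC")
seed_len <- 4

# Stage 1: extract one seed per read, for the whole batch at once.
seeds <- substr(reads, 1, seed_len)

# Stage 2: find candidate positions of every seed (non-overlapping exact matches).
# Every seed in this toy batch occurs in the reference, so no "no hit" case arises.
candidates <- lapply(seeds, function(s) {
  as.integer(gregexpr(s, reference, fixed = TRUE)[[1]])
})

# Stage 3: score every candidate of every read by Hamming distance.
hamming <- function(read, pos) {
  window <- substr(reference, pos, pos + nchar(read) - 1)
  if (nchar(window) < nchar(read)) return(NA_integer_)  # runs off the reference
  sum(strsplit(read, "")[[1]] != strsplit(window, "")[[1]])
}
scores <- Map(function(r, ps) vapply(ps, function(p) hamming(r, p), integer(1)),
              reads, candidates)

# Stage 4: keep the best-scoring candidate for each read.
best <- mapply(function(ps, sc) ps[which.min(sc)], candidates, scores)
best  # best 1-based reference position per read
```

Note that stages 2 and 3 work on several candidate hits per read, which is the finer-grained level of parallelism described above.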
Several new features have been added to nvBowtie. You can now perform alignments using the WFA method by including the --wfa (or --scoring wfa) parameter. The WFA method requires a large amount of GPU memory, so we recommend an NVIDIA card with 8 GB or more. Please note that this feature is still experimental: it currently supports only end-to-end alignments and does not yet allow customization of scoring parameters. By default, it uses the following scoring: match:0, mismatch:1, gap_open:1 and gap_ext:1.
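As a worked example of these default penalties, the sketch below scores an alignment given as an extended CIGAR string (operations =, X, I, D). It assumes the gap-affine convention of the WFA paper [2], where a gap of length n costs gap_open + n * gap_ext; whether nvBowtie applies exactly this convention internally is an assumption. The function name wfa_penalty is hypothetical.

```r
# Hypothetical helper: total penalty of an alignment under gap-affine scoring.
# Accepts only extended CIGAR operations: "=" (match), "X" (mismatch),
# "I" (insertion) and "D" (deletion). Matches cost 0.
wfa_penalty <- function(cigar, mismatch = 1, gap_open = 1, gap_ext = 1) {
  lens <- as.integer(regmatches(cigar, gregexpr("[0-9]+", cigar))[[1]])
  ops  <- regmatches(cigar, gregexpr("[=XID]", cigar))[[1]]
  stopifnot(length(lens) == length(ops))  # fails on unsupported operations
  cost <- ifelse(ops == "X", mismatch * lens,
          ifelse(ops %in% c("I", "D"), gap_open + gap_ext * lens, 0))
  sum(cost)
}

# 50 matches, 1 mismatch, 20 matches, a 3-base deletion, 29 matches:
# 1 (mismatch) + (1 + 3 * 1) (gap) = 5
penalty <- wfa_penalty("50=1X20=3D29=")
penalty
```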
Additionally, the --cache-writes parameter optimizes disk write operations, resulting in faster alignments. This functionality requires 4 GB of RAM and is limited to paired-end alignments.
In the example below, read_1 and read_2 are raw paired-end read files in FASTQ format. Using an nvBWT index, nvBowtie maps these reads to the reference genome. The resulting alignments are stored in a BAM file, with its file path specified by the output parameter.
read_1 <- system.file(package="RbowtieCuda", "extdata", "bt2", "reads", "reads_1.fastq")
read_2 <- system.file(package="RbowtieCuda", "extdata", "bt2", "reads", "reads_2.fastq")
nvBowtie(file.path(td, "index"), file.path(td, "my_result.bam"), options="", seq1=read_1, seq2=read_2)
## [1] 1
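The options argument is left empty in the call above. The sketch below shows one way to assemble a non-empty value from flags listed by nvBowtie_usage(). It assumes that options accepts a single space-separated string, as on a command line; the helper name make_options is hypothetical. The nvBowtie call itself is left commented out because it requires the package and a CUDA-capable GPU.

```r
# Hypothetical helper: collapse individual flags into one options string.
make_options <- function(...) paste(c(...), collapse = " ")

# Flags taken from the nvBowtie usage listing.
opts <- make_options("--very-sensitive", "--maxins 800", "--upto 100000")
opts

# With RbowtieCuda loaded and an index built, the call might then look like:
# nvBowtie(file.path(td, "index"), file.path(td, "my_result.bam"),
#          options = opts, seq1 = read_1, seq2 = read_2)
```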
nvBowtie does not automatically generate the .bai index files that are typically associated with .bam files.
These index files are essential for visualizing .bam files in tools such as the Integrative Genomics Viewer (IGV).
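Fortunately, this issue can be easily resolved using the Rsamtools package, which includes the required functionality. For example, if you have generated a file named results.bam, you can sort it by coordinate and then create the corresponding index file. Writing the sorted output under a new name avoids overwriting the input file:
library(Rsamtools)
# Sort by coordinate; Rsamtools appends ".bam" to the destination prefix
sortBam("results.bam", "results_sorted")
# Index the sorted file; this writes results_sorted.bam.bai
indexBam("results_sorted.bam")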
You can customize the alignment process by adjusting the available options, enabling you to optimize performance and accuracy according to your specific needs:
nvBowtie_usage()
## [1] "options:"
## [1] "General:"
## [1] " --verbosity int [5] verbosity level"
## [1] " --upto | -u int [-1] maximum number of reads to process"
## [1] " --trim3 | -3 int [0] trim the first N bases of 3'"
## [1] " --trim5 | -5 int [0] trim the first N bases of 5'"
## [1] " --nofw [false] do not align the forward strand"
## [1] " --norc [false] do not align the reverse-complemented strand"
## [1] " --device int [0] select the given cuda device(s) (e.g. --device 0 --device 1 ...)"
## [1] " --file-ref [false] load reference from file"
## [1] " --server-ref [false] load reference from server"
## [1] " --phred33 [true] qualities are ASCII characters equal to Phred quality + 33"
## [1] " --phred64 [false] qualities are ASCII characters equal to Phred quality + 64"
## [1] " --solexa-quals [false] qualities are in the Solexa format"
## [1] " --rg-id string add the RG-ID field of the SAM output header"
## [1] " --rg string,val add an RG-TAG field of the SAM output header"
## [1] " --cache-writes bool [false] speed up writes on disk"
## [1] "Paired-End:"
## [1] " --ff [false] paired mates are forward-forward"
## [1] " --fr [true] paired mates are forward-reverse"
## [1] " --rf [false] paired mates are reverse-forwardd"
## [1] " --rr [false] paired mates are reverse-reverse"
## [1] " --minins | -I int [0] minimum insert length"
## [1] " --maxins | -X int [500] maximum insert length"
## [1] " --overlap [true] allow overlapping mates"
## [1] " --no-mixed [false] only report paired alignments"
## [1] " --ungapped-mates | -ug perform ungapped mate alignment"
## [1] "Seeding:"
## [1] " --seed-len | -L int [22] seed lengths"
## [1] " --seed-freq | -i {G|L|S},x,y seed interval, as x + y*func(read-len) (G=log,L=linear,S=sqrt)"
## [1] " --max-hits int [100] maximum amount of seed hits"
## [1] " --max-reseed | -R int [2] number of reseeding rounds"
## [1] " --top bool [false] explore top seed entirely"
## [1] " --N bool [false] allow substitution in seed"
## [1] "Extension:"
## [1] " --mode {best,best-exact,all} [best] alignment mode\n"
## [1] " --all | -a [false] perform all-mapping (i.e. find and report all alignments)"
## [1] " --local [false] perform local alignment"
## [1] " --rand [true] randomized seed hit selection"
## [1] " --no-rand [false] do not randomize seed hit selection"
## [1] " --max-dist int [15] maximum edit distance"
## [1] " --max-effort-init int [15] initial maximum number of consecutive extension failures"
## [1] " --max-effort | -D int [15] maximum number of consecutive extension failures"
## [1] " --min-ext int [30] minimum number of extensions per read"
## [1] " --max-ext int [400] maximum number of extensions per read"
## [1] " --very-fast apply the very-fast presets"
## [1] " --fast apply the fast presets"
## [1] " --sensitive apply the sensitive presets"
## [1] " --very-sensitive apply the very-sensitive presets"
## [1] " --very-fast-local apply the very-fast presets"
## [1] " --fast-local apply the fast presets"
## [1] " --sensitive-local apply the sensitive presets"
## [1] " --very-sensitive-local apply the very-sensitive presets"
## [1] "Scoring:"
## [1] " --scoring {sw|ed|wfa} [ed] Smith-Waterman / Edit-Distance / Wfa scoring"
## [1] " --score-min {G|L|S},x,y minimum score function, as x + y*func(read-len)"
## [1] " --ma int match bonus"
## [1] " --mp int,int mismatch min/max penalties"
## [1] " --np int N penalty"
## [1] " --rdg int,int read open/extension gap penalties"
## [1] " --rfg int,int reference open/extension gap penalties"
## [1] "Alternative:"
## [1] " --wfa Activate wavefront algorithm"
## [1] "Reporting:"
## [1] " --mapQ-filter | -Q int [0] minimum mapQ threshold"
## [1] ""
## [1] ""
## [1] "Default values are indicated in brackets []."
## [1] ""
## [1] "The '--scoring-scheme filename' option allows to provide a custom Smith-Waterman scoring"
## [1] "scheme through a text file, where each line must contain a token value pair."
## [1] " The tokens and default values are reported below:"
## [1] "* match 0 // local alignment: 2"
## [1] "* mm-penalty-min 2"
## [1] "* mm-penalty-max 6"
## [1] "* N-penalty-min 1"
## [1] "* N-penalty-max 1"
## [1] "* score-min-const -0.6 // local alignment: 0"
## [1] "* score-min-coeff -0.6 // local alignment: 10"
## [1] "* score-min-type linear // local alignment: log"
## [1] "* N-ceil-const 0"
## [1] "* N-ceil-coeff 0.15"
## [1] "* read-gap-const 5"
## [1] "* read-gap-coeff 3"
## [1] "* ref-gap-const 5"
## [1] "* ref-gap-coeff 3"
## [1] "* gap-free 5"
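Several options above are expressed as a function of the read length, x + y*func(read-len). As a worked example, the sketch below evaluates the minimum-score threshold using the end-to-end defaults from the listing (linear, constant -0.6, coefficient -0.6). It assumes that "log" means the natural logarithm, as in Bowtie2; the function name min_score is hypothetical.

```r
# Hypothetical helper: evaluate const + coeff * func(read_len).
min_score <- function(read_len, type = c("linear", "log", "sqrt"), const, coeff) {
  type <- match.arg(type)
  f <- switch(type,
              linear = read_len,
              log    = log(read_len),   # natural logarithm (assumption)
              sqrt   = sqrt(read_len))
  const + coeff * f
}

# End-to-end defaults: -0.6 + (-0.6 * 100) = -60.6 for a 100-base read.
threshold_100 <- min_score(100, "linear", const = -0.6, coeff = -0.6)
threshold_100

# The same function on the command line would be written as: --score-min L,-0.6,-0.6
```

Under end-to-end scoring with a match bonus of 0, an alignment of a 100-base read is therefore accepted only if its accumulated penalties do not exceed 60.6.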
You can run the unit tests bundled with the program to verify that it works correctly on your system:
nvbio_tests()
## [1] 0
And you can obtain version information by executing the following command:
nvBowtie_version()
## [1] 0
sessionInfo()
## R Under development (unstable) (2025-02-19 r87757)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc-gpu/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.21-bioc-gpu/R/lib/libRlapack.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RbowtieCuda_1.0.1
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 R6_2.6.1 fastmap_1.2.0 xfun_0.51
## [5] cachem_1.1.0 knitr_1.49 htmltools_0.5.8.1 rmarkdown_2.29
## [9] lifecycle_1.0.4 cli_3.6.4 sass_0.4.9 jquerylib_0.1.4
## [13] compiler_4.5.0 tools_4.5.0 evaluate_1.0.3 bslib_0.9.0
## [17] yaml_2.3.10 rlang_1.1.5 jsonlite_1.9.0
We would like to thank Ismael Galve Roperh for his assistance.
The main contributors of the original NVBIO are:
Jacopo Pantaleoni - jpantaleoni@nvidia.com
Nuno Subtil - nsubtil@nvidia.com
RbowtieCuda developers:
Samuel Simon Sanchez - samsimon@ucm.es
Franck RICHARD - franck.richard@winstars.net
The maintainer of the RbowtieCuda package is Franck RICHARD
A dedicated website with helpful resources for the RbowtieCuda package is available here, and a GitHub copy here.
[1] Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357-359.
[2] Marco-Sola, S., Moure, J. C., Moreto, M., et al. (2021). Fast gap-affine pairwise alignment using the wavefront algorithm. Bioinformatics, 37, 456-463.