An Introduction to RbowtieCuda

Introduction

NVBIO is a library of reusable components developed by NVIDIA Corporation to accelerate bioinformatics applications using CUDA technology. Although specifically designed to harness the computational power of NVIDIA GPUs, many of its components are fully cross-platform and can be used in both host C++ and device CUDA code.

The purpose of NVBIO is twofold. It serves as a robust foundation for developing modern GPU-focused applications, ensuring that core computations delegated to the library automatically benefit from advances in GPU technology. At the same time, it provides valuable resources for designing novel bioinformatics algorithms tailored to massively parallel architectures.

In addition to its core components, NVBIO includes a suite of applications built on top of the library. Among these is nvBowtie, a re-engineered implementation of the widely recognized Bowtie2 short-read aligner ¹. Unlike many prototypes, nvBowtie is built as an industrial-grade aligner, replicating most of Bowtie2’s original features while adding enhancements such as efficient support for direct BAM output, with future plans for CRAM support.

Performance

nvBowtie is designed to fully exploit the massive parallelism of modern GPUs, delivering significantly higher alignment throughput at the same accuracy, or higher accuracy in the same amount of time. Its performance was compared to Bowtie2 on an Illumina HiSeq 2000 dataset (the first 10 million reads of ERR161544) and an IonProton dataset, using both end-to-end and local alignment. The two aligners show 99.98% agreement at high MAPQ scores. The Bowtie2 runs used 20 CPU threads with default alignment settings.

knitr::include_graphics("benchmark-nvbowtie-speedup.png")

Figure: nvBowtie speed-up benchmark (benchmark-nvbowtie-speedup.png)

RbowtieCuda

This package provides an R wrapper for NVBIO/nvBowtie, with interfaces designed for R users. The indexing and alignment programs (nvBWT and nvBowtie) are implemented in C++ and called from R through the system2 function. This integration is transparent to the user, so the package stays easy to use while providing high-performance features that take full advantage of your machine's capabilities.
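The system2 pattern mentioned above can be sketched as follows. This is only an illustration, not the package's internal code: run_tool is a hypothetical helper, and the R executable stands in for a compiled program so that the example runs on any machine.

```r
# Hypothetical helper: run an external program and return its exit status.
# This is NOT how RbowtieCuda is implemented internally; it only shows the
# general system2() pattern.
run_tool <- function(binary, args = character()) {
  status <- system2(binary, args = args, stdout = FALSE, stderr = FALSE)
  as.integer(status)
}

# Stand-in for a compiled aligner: the R executable of the current session.
r_bin <- file.path(R.home("bin"), "R")
status <- run_tool(r_bin, "--version")
status  # an exit status of 0 conventionally signals success
```

By convention an exit status of 0 means the program finished without error. Whether the integers printed by the package functions in this vignette are such exit statuses is an assumption.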

Additionally, an experimental implementation of the Wavefront Alignment (WFA) method ² is included and can be accessed by using the --wfa parameter during execution.

Please note that this package requires an NVIDIA graphics card for proper functionality.
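Before running the examples, it can help to check that an NVIDIA driver appears to be installed. The sketch below is a heuristic only: has_nvidia_smi is a hypothetical helper (not part of RbowtieCuda), and it assumes the nvidia-smi utility that ships with the NVIDIA driver is on the PATH.

```r
# Hypothetical helper (not part of RbowtieCuda): TRUE when the nvidia-smi
# utility can be found on the PATH. This is a heuristic only; it does not
# prove that a usable CUDA device is present.
has_nvidia_smi <- function() {
  unname(nzchar(Sys.which("nvidia-smi")))
}

gpu_tool_found <- has_nvidia_smi()
if (!gpu_tool_found) {
  message("nvidia-smi was not found; the GPU examples may not run on this machine.")
}
```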

Additional Installation Instructions

For detailed installation steps, please refer to the INSTALL file included with the package.

RbowtieCuda is compatible with CUDA version 10 or later.

An Example Workflow Using RbowtieCuda

Installation

To install the latest version of RbowtieCuda, ensure that you are running the most up-to-date version of R. As RbowtieCuda is part of the Bioconductor project, you can easily install it along with its dependencies by following these steps:

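# Install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("RbowtieCuda")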

As with any other R package, you need to load RbowtieCuda each time before using it, like this:

library(RbowtieCuda)

## RbowtieCuda is distributed under the BSD 3-Clause License.
## Please read the LICENSE file carefully before use.
## 
## Note:
## - Redistribution and use (source/binary, with/without modification) are permitted
##   under certain conditions.
## - The name of NVIDIA CORPORATION and its contributors may not be used to endorse
##   or promote derived products without written permission.
## - This software is provided "AS IS", without warranty of any kind.

nvBWT: Building BWT Indices for Reference FASTA Files

nvBWT is an application developed as part of the NVBIO library, designed to perform BWT-based reference indexing for nvBowtie and potentially other FM-index-based applications. When provided with one or more FASTA files, nvBWT generates both the forward and reverse Burrows-Wheeler Transform (BWT), along with a 2-bit packed representation of the sequences. In addition, it produces several auxiliary indices to support efficient alignment and querying.
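To make the Burrows-Wheeler Transform concrete, here is a toy version in base R that sorts all rotations of a short string. It is only a sketch: naive_bwt is a hypothetical function, and nvBWT itself uses GPU algorithms that scale to whole genomes, which this does not.

```r
# Toy Burrows-Wheeler Transform by sorting all rotations of the input.
# naive_bwt is a hypothetical illustration, usable only on short strings.
naive_bwt <- function(s) {
  s <- paste0(s, "$")                       # append the end-of-text sentinel
  n <- nchar(s)
  rotations <- vapply(seq_len(n), function(i) {
    paste0(substr(s, i, n), substr(s, 1, i - 1))
  }, character(1))
  # method = "radix" sorts in the C locale, so "$" sorts before letters
  sorted <- sort(rotations, method = "radix")
  # the BWT is the last column of the sorted rotation matrix
  paste(substr(sorted, n, n), collapse = "")
}

naive_bwt("GATTACA")  # "ACTGA$TA"
```

The reverse BWT mentioned above corresponds to applying the same transform to the reversed sequence.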

Example:

td <- tempdir()
fa_file <- system.file(package="RbowtieCuda", "extdata", "bt2", "refs", "lambda_virus.fa")
nvBWT(myinput=fa_file, output=file.path(td, "index"), options="")

## [1] 0

This command generates the following files:

index.pac
index.rpac
index.bwt
index.rbwt
index.sa
index.rsa
index.ann
index.amb

Warning: if you run the command in a directory that already contains these files, they will be deleted and new files will be generated.
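Given the warning above, you may want to check for existing index files before calling nvBWT. The sketch below uses only base R; existing_index_files is a hypothetical helper, and the demonstration runs against an empty file in a temporary directory rather than a real index.

```r
# File extensions listed above for an nvBWT index.
index_extensions <- c("pac", "rpac", "bwt", "rbwt", "sa", "rsa", "ann", "amb")

# Hypothetical helper: return the index files that already exist for a prefix.
existing_index_files <- function(prefix) {
  paths <- paste0(prefix, ".", index_extensions)
  paths[file.exists(paths)]
}

# Demonstration in a scratch directory with one simulated leftover file.
demo_dir <- tempfile("index_demo_")
dir.create(demo_dir)
demo_prefix <- file.path(demo_dir, "index")
file.create(paste0(demo_prefix, ".bwt"))

found <- existing_index_files(demo_prefix)
if (length(found) > 0) {
  message("These files would be replaced: ", paste(basename(found), collapse = ", "))
}
```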

nvBWT supports the following command options:

-v       | --verbosity     int (0-6) [5]     // select the verbosity level
-m       | --max-length    int       [inf]   // clamp input length
-b       | --byte-packing                    // output a byte-encoded .pac file
-w       | --word-packing                    // output a word-encoded .wpac file (more efficient)
-c       | --crc                             // compute CRCs
-d       | --device                          // select a cuda device

nvBowtie Alignments:

nvBowtie is a GPU-accelerated re-engineering of Bowtie2, one of the most widely used short-read aligners. Completely rewritten from scratch, nvBowtie retains many of the key features of Bowtie2, though not all functionalities are replicated.

Designed to fully exploit the massive parallelism of modern GPUs, nvBowtie achieves significantly higher alignment throughput without sacrificing accuracy, or greater accuracy within the same time frame. Despite this focus on performance, nvBowtie is designed to match Bowtie2 closely in specificity and sensitivity.

To harness the computational power of modern processor architectures, nvBowtie re-implements the algorithms underlying Bowtie2 but adopts a fundamentally different approach. Bowtie2 is optimized to process one read at a time, using multiple CPU threads to handle different reads simultaneously. nvBowtie instead operates on large batches of reads, treating their alignment as a pipeline. This pipeline consists of many relatively simple but highly parallel stages, each optimized for execution on GPUs. In several stages, the parallelism extends far beyond the read level: multiple candidate hits for each read are processed simultaneously, enabling a much finer granularity of parallel computation.
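The batch-and-pipeline idea can be illustrated with a toy example in base R. It does not reflect nvBowtie's actual stages or data structures. The reference, the reads, and both stage functions are invented for illustration, and each stage simply operates on a whole batch of reads at once.

```r
# Invented toy data: a short reference and a handful of reads.
reference <- "ACGTACGTTAGCCGATAGCTAGGCTA"
reads <- c("TAGCCGAT", "GGCTA", "ACGTT", "TTTTTTTT", "CGATAGCT")

# Stage 1: extract a fixed-length seed from every read in the batch.
extract_seeds <- function(batch, seed_len = 4) substr(batch, 1, seed_len)

# Stage 2: find the first exact occurrence of every seed (-1 means no hit).
locate_seeds <- function(seeds, ref) {
  vapply(seeds, function(s) as.integer(regexpr(s, ref, fixed = TRUE)),
         integer(1), USE.NAMES = FALSE)
}

# Split the reads into batches and push each batch through both stages.
batch_size <- 2
batches <- split(reads, ceiling(seq_along(reads) / batch_size))
hits <- unlist(lapply(batches, function(b) {
  locate_seeds(extract_seeds(b), reference)
}), use.names = FALSE)
hits
```

On a GPU, the work inside each stage would be spread across thousands of threads, whereas this sketch runs it sequentially.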

New features:

We have added several new features to nvBowtie. You can now perform alignments with the WFA method by passing the --wfa (or --scoring wfa) parameter. The WFA method requires a large amount of memory on the graphics card, so we recommend an NVIDIA card with 8 GB or more. Please note that this feature is still experimental: it currently supports only end-to-end alignments and does not yet allow customization of the scoring parameters. By default, it uses the following scoring: match: 0, mismatch: 1, gap_open: 1 and gap_ext: 1.
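As a worked example of the default WFA scoring, the sketch below computes the penalty of an alignment under the usual gap-affine convention from the WFA paper, in which a gap of length n costs gap_open + n * gap_ext. Whether nvBowtie applies exactly this convention is an assumption, and wfa_penalty is a hypothetical helper.

```r
# Hypothetical helper: gap-affine penalty with the default WFA scores above.
# Assumes a gap of length n costs gap_open + n * gap_ext (WFA paper convention).
wfa_penalty <- function(mismatches, gap_lengths = integer(),
                        mismatch = 1, gap_open = 1, gap_ext = 1) {
  mismatches * mismatch + sum(gap_open + gap_ext * gap_lengths)
}

# Two mismatches plus one gap of length 3: 2*1 + (1 + 3*1) = 6
wfa_penalty(mismatches = 2, gap_lengths = 3)

# A perfect match costs nothing, since match = 0
wfa_penalty(mismatches = 0)
```

Lower penalties indicate better alignments under this scheme, with 0 corresponding to an exact match.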

Additionally, the --cache-writes parameter optimizes disk write operations, resulting in faster alignments. This feature requires 4 GB of RAM and is limited to paired-end alignments.

Example:

read_1 and read_2 are raw paired-end read files in FASTQ format. Using an nvBWT index, these reads are mapped to the reference genome by calling nvBowtie. The resulting alignments are stored in a BAM file whose path is given as the second argument.

read_1 <- system.file(package="RbowtieCuda", "extdata", "bt2", "reads", "reads_1.fastq")
read_2 <- system.file(package="RbowtieCuda", "extdata", "bt2", "reads", "reads_2.fastq")
nvBowtie(file.path(td, "index"), file.path(td, "my_result.bam"), options="", seq1=read_1, seq2=read_2)

## [1] 1
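After an alignment run, a quick sanity check is to confirm that the output file exists and starts with the gzip magic bytes (0x1f 0x8b), since BAM files are stored in the gzip-compatible BGZF format. The sketch below is not a full BAM validation. looks_gzip_compressed is a hypothetical helper, and the demonstration uses small temporary files instead of a real alignment.

```r
# Hypothetical helper: TRUE when the file starts with the gzip magic bytes.
# This only checks the compression header, not that the content is valid BAM.
looks_gzip_compressed <- function(path) {
  if (!file.exists(path) || file.size(path) < 2) return(FALSE)
  magic <- readBin(path, what = "raw", n = 2)
  identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Demonstration with a small gzip file and a plain-text file.
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "wb")
writeBin(charToRaw("demo"), con)
close(con)

txt_path <- tempfile(fileext = ".txt")
writeLines("plain text", txt_path)

looks_gzip_compressed(gz_path)   # TRUE
looks_gzip_compressed(txt_path)  # FALSE

# Assumed usage on the example output:
# looks_gzip_compressed(file.path(td, "my_result.bam"))
```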

Indexing BAM Files Produced by nvBowtie

nvBowtie does not automatically generate the .bai index files that are typically associated with .bam files.

These index files are essential for visualizing .bam files in tools such as the Integrative Genomics Viewer (IGV).

Fortunately, this issue can be easily resolved using the Rsamtools package, which includes the required functionality. For example, if you have generated a file named results.bam, you can create the corresponding index file with a simple command.

You only need to run the following in R:

library(Rsamtools)
# Sort into a new file (sortBam appends ".bam" to the destination prefix)
sortBam("results.bam", "results_sorted")
indexBam("results_sorted.bam")

Options and Version of the nvBowtie Aligner

You can customize the alignment process by adjusting the available options, enabling you to optimize performance and accuracy according to your specific needs:

nvBowtie_usage()

## [1] "options:"
## [1] "General:"
## [1] "  --verbosity         int                    [5]        verbosity level"
## [1] "  --upto       | -u   int                    [-1]       maximum number of reads to process"
## [1] "  --trim3      | -3   int                    [0]        trim the first N bases of 3'"
## [1] "  --trim5      | -5   int                    [0]        trim the first N bases of 5'"
## [1] "  --nofw                                     [false]    do not align the forward strand"
## [1] "  --norc                                     [false]    do not align the reverse-complemented strand"
## [1] "  --device            int                    [0]        select the given cuda device(s) (e.g. --device 0 --device 1 ...)"
## [1] "  --file-ref                                 [false]    load reference from file"
## [1] "  --server-ref                               [false]    load reference from server"
## [1] "  --phred33                                  [true]     qualities are ASCII characters equal to Phred quality + 33"
## [1] "  --phred64                                  [false]    qualities are ASCII characters equal to Phred quality + 64"
## [1] "  --solexa-quals                             [false]    qualities are in the Solexa format"
## [1] "  --rg-id             string                            add the RG-ID field of the SAM output header"
## [1] "  --rg                string,val                        add an RG-TAG field of the SAM output header"
## [1] "  --cache-writes      bool                   [false]    speed up writes on disk"
## [1] "Paired-End:"
## [1] "  --ff                                       [false]    paired mates are forward-forward"
## [1] "  --fr                                       [true]     paired mates are forward-reverse"
## [1] "  --rf                                       [false]    paired mates are reverse-forwardd"
## [1] "  --rr                                       [false]    paired mates are reverse-reverse"
## [1] "  --minins     |  -I  int                    [0]        minimum insert length"
## [1] "  --maxins     |  -X  int                    [500]      maximum insert length"
## [1] "  --overlap                                  [true]     allow overlapping mates"
## [1] "  --no-mixed                                 [false]    only report paired alignments"
## [1] "  --ungapped-mates | -ug                                perform ungapped mate alignment"
## [1] "Seeding:"
## [1] "  --seed-len   | -L   int                    [22]       seed lengths"
## [1] "  --seed-freq  | -i   {G|L|S},x,y                       seed interval, as x + y*func(read-len) (G=log,L=linear,S=sqrt)"
## [1] "  --max-hits          int                    [100]      maximum amount of seed hits"
## [1] "  --max-reseed | -R   int                    [2]        number of reseeding rounds"
## [1] "  --top               bool                   [false]    explore top seed entirely"
## [1] "  --N                 bool                   [false]    allow substitution in seed"
## [1] "Extension:"
## [1] "  --mode              {best,best-exact,all}  [best]     alignment mode\n"
## [1] "  --all        | -a                          [false]    perform all-mapping (i.e. find and report all alignments)"
## [1] "  --local                                    [false]    perform local alignment"
## [1] "  --rand                                     [true]     randomized seed hit selection"
## [1] "  --no-rand                                  [false]    do not randomize seed hit selection"
## [1] "  --max-dist          int                    [15]       maximum edit distance"
## [1] "  --max-effort-init   int                    [15]       initial maximum number of consecutive extension failures"
## [1] "  --max-effort | -D   int                    [15]       maximum number of consecutive extension failures"
## [1] "  --min-ext           int                    [30]       minimum number of extensions per read"
## [1] "  --max-ext           int                    [400]      maximum number of extensions per read"
## [1] "  --very-fast                                           apply the very-fast presets"
## [1] "  --fast                                                apply the fast presets"
## [1] "  --sensitive                                           apply the sensitive presets"
## [1] "  --very-sensitive                                      apply the very-sensitive presets"
## [1] "  --very-fast-local                                     apply the very-fast presets"
## [1] "  --fast-local                                          apply the fast presets"
## [1] "  --sensitive-local                                     apply the sensitive presets"
## [1] "  --very-sensitive-local                                apply the very-sensitive presets"
## [1] "Scoring:"
## [1] "  --scoring           {sw|ed|wfa}            [ed]       Smith-Waterman / Edit-Distance / Wfa scoring"
## [1] "  --score-min         {G|L|S},x,y                       minimum score function, as x + y*func(read-len)"
## [1] "  --ma                int                               match bonus"
## [1] "  --mp                int,int                           mismatch min/max penalties"
## [1] "  --np                int                               N penalty"
## [1] "  --rdg               int,int                           read open/extension gap penalties"
## [1] "  --rfg               int,int                           reference open/extension gap penalties"
## [1] "Alternative:"
## [1] "  --wfa                                                 Activate wavefront algorithm"
## [1] "Reporting:"
## [1] "  --mapQ-filter | -Q  int                    [0]        minimum mapQ threshold"
## [1] ""
## [1] ""
## [1] "Default values are indicated in brackets []."
## [1] ""
## [1] "The '--scoring-scheme filename' option allows to provide a custom Smith-Waterman scoring"
## [1] "scheme through a text file, where each line must contain a token value pair."
## [1] " The tokens and default values are reported below:"
## [1] "*  match               0        // local alignment: 2"
## [1] "*  mm-penalty-min      2"
## [1] "*  mm-penalty-max      6"
## [1] "*  N-penalty-min       1"
## [1] "*  N-penalty-max       1"
## [1] "*  score-min-const     -0.6     // local alignment: 0"
## [1] "*  score-min-coeff     -0.6     // local alignment: 10"
## [1] "*  score-min-type      linear   // local alignment: log"
## [1] "*  N-ceil-const        0"
## [1] "*  N-ceil-coeff        0.15"
## [1] "*  read-gap-const      5"
## [1] "*  read-gap-coeff      3"
## [1] "*  ref-gap-const       5"
## [1] "*  ref-gap-coeff       3"
## [1] "*  gap-free            5"
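The --score-min option and the score-min-* tokens above describe a threshold of the form x + y*func(read-len). The sketch below evaluates that formula with the listed defaults. min_score is a hypothetical helper, and it assumes that G denotes the natural logarithm, which the usage text does not specify.

```r
# Hypothetical helper: evaluate x + y * func(read_len) for a score-min function.
# Assumes "G" means the natural logarithm; "L" is linear and "S" is square root.
min_score <- function(type, x, y, read_len) {
  f <- switch(type,
              G = log(read_len),
              L = read_len,
              S = sqrt(read_len),
              stop("type must be one of 'G', 'L' or 'S'"))
  x + y * f
}

# End-to-end defaults listed above (linear, const -0.6, coeff -0.6), 100 bp read:
min_score("L", -0.6, -0.6, 100)  # -0.6 + (-0.6 * 100) = -60.6

# Local-alignment defaults listed above (log, const 0, coeff 10), 100 bp read:
min_score("G", 0, 10, 100)       # 10 * log(100), about 46.05
```

Under these assumptions, an end-to-end alignment of a 100 bp read would need a score of at least -60.6 to be considered valid.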

You can run the program's unit tests with the following command to check that it works correctly:

nvbio_tests()

## [1] 0

You can obtain version information with the following command:

nvBowtie_version()

## [1] 0

Session Information

sessionInfo()

## R Under development (unstable) (2025-02-19 r87757)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc-gpu/R/lib/libRblas.so 
## LAPACK: /home/biocbuild/bbs-3.21-bioc-gpu/R/lib/libRlapack.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] RbowtieCuda_1.0.1
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.51        
##  [5] cachem_1.1.0      knitr_1.49        htmltools_0.5.8.1 rmarkdown_2.29   
##  [9] lifecycle_1.0.4   cli_3.6.4         sass_0.4.9        jquerylib_0.1.4  
## [13] compiler_4.5.0    tools_4.5.0       evaluate_1.0.3    bslib_0.9.0      
## [17] yaml_2.3.10       rlang_1.1.5       jsonlite_1.9.0

Acknowledgement

We would like to thank Ismael Galve Roperh for his assistance.

Credits

The main contributors of the original NVBIO are:

Jacopo Pantaleoni - jpantaleoni@nvidia.com
Nuno Subtil - nsubtil@nvidia.com

RbowtieCuda developers:

Samuel Simon Sanchez - samsimon@ucm.es
Franck RICHARD - franck.richard@winstars.net

The maintainer of the RbowtieCuda package is Franck RICHARD.

Website

A dedicated website with helpful resources for the RbowtieCuda package is available here, and a GitHub copy here.

References

[1] Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), 357-359.

[2] Marco-Sola, S., Moure, J. C., Moreto, M., et al. (2021). Fast gap-affine pairwise alignment using the wavefront algorithm. Bioinformatics, 37, 456-463.
