The rhinotypeR package is designed to simplify the genotyping of rhinoviruses using the VP4/2 genomic region. Having worked on rhinoviruses for a few years, I noticed that assigning genotypes after sequencing was particularly laborious and required several manual interventions. We therefore developed this package to streamline the process: it lets a user download prototype sequences, calculate pairwise genetic distances, and compare those distances to prototype strains for genotype assignment. It also provides visualization options such as frequency plots and simple phylogenetic trees.
You can install rhinotypeR from Bioconductor using:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("rhinotypeR")
library(rhinotypeR)
The getPrototypeSeqs function downloads the prototype sequences required for genotyping. These should then be combined with the newly generated sequences, aligned using suitable alignment software, and imported into R. For example, to download the prototype sequences to the Desktop directory, one can run:
getPrototypeSeqs("~/Desktop")
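The combining step can be done with any tool that concatenates FASTA files. Below is a minimal base R sketch, assuming both the prototype file and your own sequences are plain FASTA text. The file names and toy sequences are hypothetical stand-ins, written to a temporary directory so the snippet runs on its own:

```r
# Hypothetical stand-ins for the downloaded prototypes and your new sequences
work_dir <- tempdir()
prototype_path <- file.path(work_dir, "prototypes_example.fasta")
new_seqs_path  <- file.path(work_dir, "my_sequences_example.fasta")
writeLines(c(">proto1_A1", "ACGTACGTAC", ">proto2_B3", "ACGTTCGTAC"), prototype_path)
writeLines(c(">sample1", "ACGAACGTAC"), new_seqs_path)

# FASTA is plain text, so combining two files is a simple concatenation
combined_path <- file.path(work_dir, "combined_unaligned.fasta")
writeLines(c(readLines(new_seqs_path), readLines(prototype_path)), combined_path)

# Count the headers as a sanity check before aligning the combined file
n_records <- sum(grepl("^>", readLines(combined_path)))
n_records
```

The combined file is what goes into the alignment software; the aligned output is the file that is then read into R.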
Use the Biostrings package to read the FASTA file containing the aligned sequences. This extracts the sequences and their header information; store the result in an object for downstream analysis.
sequences <- Biostrings::readDNAStringSet(system.file("extdata", "input_aln.fasta", package="rhinotypeR"))
The SNPeek function visualizes single nucleotide polymorphisms (SNPs) in the sequences, with a selected sequence acting as the reference. To specify the reference sequence, move it to the bottom of the alignment before importing into R. Substitutions are color-coded by nucleotide:
A = green
T = red
C = blue
G = yellow
SNPeek(sequences)
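To make the idea concrete, here is a minimal base R sketch of the comparison behind such a plot, assuming substitutions are called by comparing each sequence to the last one in the alignment. The toy alignment is hypothetical, and this is an illustration rather than the SNPeek implementation itself:

```r
# Toy alignment (hypothetical); the last sequence plays the role of the reference
aln <- c(query1 = "ACGTACGT", query2 = "ACCTACGA", reference = "ACGTACGT")
chars <- do.call(rbind, strsplit(aln, ""))

ref     <- chars[nrow(chars), ]
queries <- chars[-nrow(chars), , drop = FALSE]

# TRUE wherever a query base differs from the reference base at that position
is_snp <- t(t(queries) != ref)
hits   <- which(is_snp, arr.ind = TRUE)

# Color scheme taken from the text above
snp_colors <- c(A = "green", T = "red", C = "blue", G = "yellow")
bases <- unname(queries[hits])
snp_table <- data.frame(
  sequence = rownames(queries)[hits[, "row"]],
  position = unname(hits[, "col"]),
  base     = bases,
  color    = unname(snp_colors[bases]),
  stringsAsFactors = FALSE
)
snp_table
```

Each row of `snp_table` corresponds to one colored mark in a SNP plot: the sequence it belongs to, its alignment position, and the substituted base.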
The pairwiseDistances function calculates genetic distances between sequences using a specified evolutionary model.
distances <- pairwiseDistances(sequences, model = "p-distance", gapDeletion = TRUE)
A subset of the resulting distance matrix looks like:
## AF343653.1_B26 MT177836.1 MT177837.1 AY040242.1_B97
## AF343653.1_B26 0.0000000 0.2435897 0.2435897 0.2243590
## MT177836.1 0.2435897 0.0000000 0.0000000 0.1185897
## MT177837.1 0.2435897 0.0000000 0.0000000 0.1185897
## AY040242.1_B97 0.2243590 0.1185897 0.1185897 0.0000000
## AF343654.1_B27 0.2147436 0.1698718 0.1698718 0.1794872
## AF343654.1_B27 AY040239.1_B93 AY040240.1_B84
## AF343653.1_B26 0.2147436 0.2435897 0.2115385
## MT177836.1 0.1698718 0.1506410 0.1923077
## MT177837.1 0.1698718 0.1506410 0.1923077
## AY040242.1_B97 0.1794872 0.1634615 0.2083333
## AF343654.1_B27 0.0000000 0.1185897 0.1891026
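For intuition, the p-distance between two aligned sequences is the proportion of compared sites at which they differ. The base R sketch below computes it on a hypothetical toy alignment, assuming that gap deletion means dropping every alignment column that contains a gap in any sequence; that interpretation of gapDeletion = TRUE is an assumption, not something confirmed here.

```r
# Toy aligned sequences (hypothetical, equal length; "-" marks a gap)
aln <- c(seq1 = "ACGTACGTAC", seq2 = "ACGTTCGTAC", seq3 = "AC-TACGAAC")
chars <- do.call(rbind, strsplit(aln, ""))

# Assumed gap handling: remove columns with a gap in any sequence
keep  <- colSums(chars == "-") == 0
chars <- chars[, keep, drop = FALSE]

# p-distance = number of differing sites / number of compared sites
n <- nrow(chars)
pdist <- matrix(0, n, n, dimnames = list(names(aln), names(aln)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    pdist[i, j] <- mean(chars[i, ] != chars[j, ])
  }
}
round(pdist, 4)
```

After the gapped column is removed, nine sites remain, so one mismatch gives 1/9 and two mismatches give 2/9. The result is symmetric with zeros on the diagonal, like the matrix returned by pairwiseDistances.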
The assignTypes function assigns genotypes to the sequences by comparing genetic distances to prototype strains.
genotypes <- assignTypes(sequences, model = "p-distance", gapDeletion = TRUE, threshold = 0.105)
head(genotypes)
## query assignedType distance reference
## MT177836.1 MT177836.1 unassigned NA AY040242.1_B97
## MT177837.1 MT177837.1 unassigned NA AY040242.1_B97
## MT177838.1 MT177838.1 B99 0.08974359 AF343652.1_B99
## MT177793.1 MT177793.1 B42 0.08012821 AY016404.1_B42
## MT177794.1 MT177794.1 B106 0.05769231 KP736587.1_B106
## MT177795.1 MT177795.1 B106 0.05769231 KP736587.1_B106
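The output above suggests a nearest-prototype rule with a cutoff: each query is matched to its closest prototype, and a type is assigned only when that distance falls below the threshold. The base R sketch below illustrates that rule under stated assumptions. The distance values and sequence names are made up, the strict `<` comparison is a guess, and the type label is assumed to be the part of the prototype name after the last underscore.

```r
# Hypothetical query-by-prototype distances (rows = queries, columns = prototypes)
d <- matrix(
  c(0.090, 0.210,
    0.170, 0.119),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("query1", "query2"), c("protoX_B99", "protoY_B97"))
)
threshold <- 0.105

# Closest prototype for each query and the distance to it
nearest   <- apply(d, 1, which.min)
min_dist  <- d[cbind(seq_len(nrow(d)), nearest)]
reference <- colnames(d)[nearest]

# Assign the prototype's type only when the distance is under the threshold
assigned <- min_dist < threshold
result <- data.frame(
  query        = rownames(d),
  assignedType = ifelse(assigned, sub(".*_", "", reference), "unassigned"),
  distance     = ifelse(assigned, min_dist, NA),
  reference    = reference,
  stringsAsFactors = FALSE
)
result
```

As in the real output, an unassigned query still reports its nearest prototype in the reference column, which helps when judging whether it is a borderline case or a potentially new type.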
The plotFrequency function visualizes the frequency of assigned genotypes. This function uses the output of assignTypes as input.
plotFrequency(genotypes)
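The tabulation behind a frequency plot can be sketched in base R as follows. It uses only the six assigned types shown in the head(genotypes) output above; whether plotFrequency displays unassigned sequences, and how it orders the bars, is not assumed here.

```r
# Assigned types copied from the head(genotypes) output above
assigned_types <- c("unassigned", "unassigned", "B99", "B42", "B106", "B106")

# Count how many sequences fall into each type
type_counts <- sort(table(assigned_types), decreasing = TRUE)
type_counts

# A plain bar chart of the counts
barplot(type_counts, las = 2, xlab = "", ylab = "Number of sequences",
        main = "Assigned genotypes (six-sequence subset)")
```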
The plotDistances function visualizes pairwise genetic distances in a heatmap. This function uses the output of pairwiseDistances as input.
plotDistances(distances)
The plotTree function plots a simple phylogenetic tree. This function uses the output of pairwiseDistances as input.
# sub-sample
sampled_distances <- distances[1:30,1:30]
plotTree(sampled_distances, hang = -1, cex = 0.6, main = "A simple tree", xlab = "", ylab = "Genetic distance")
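A comparable tree can be drawn from any distance matrix with base R hierarchical clustering. The sketch below uses the 4 x 4 corner of the distance matrix printed earlier. Complete linkage is an assumption made for illustration, not a statement about how plotTree builds its tree.

```r
# Distances copied from the first four rows and columns of the matrix shown earlier
ids <- c("AF343653.1_B26", "MT177836.1", "MT177837.1", "AY040242.1_B97")
m <- matrix(c(
  0.0000000, 0.2435897, 0.2435897, 0.2243590,
  0.2435897, 0.0000000, 0.0000000, 0.1185897,
  0.2435897, 0.0000000, 0.0000000, 0.1185897,
  0.2243590, 0.1185897, 0.1185897, 0.0000000
), nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Cluster on the distances (linkage method is an assumed choice)
hc <- hclust(as.dist(m), method = "complete")

# Draw the dendrogram; hang = -1 aligns all labels at the baseline
plot(hc, hang = -1, cex = 0.6, main = "A simple tree",
     xlab = "", ylab = "Genetic distance")
```

The two identical sequences (MT177836.1 and MT177837.1) join at height 0, and AY040242.1_B97 joins them next at 0.1185897, which matches what the distance matrix implies.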
The rhinotypeR package simplifies the process of genotyping rhinoviruses and analyzing their genetic data. By automating various steps and providing visualization tools, it enhances the efficiency and accuracy of rhinovirus epidemiological studies.
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rhinotypeR_1.0.0
##
## loaded via a namespace (and not attached):
## [1] crayon_1.5.3 httr_1.4.7 cli_3.6.3
## [4] knitr_1.48 rlang_1.1.4 xfun_0.48
## [7] highr_0.11 UCSC.utils_1.2.0 jsonlite_1.8.9
## [10] S4Vectors_0.44.0 Biostrings_2.74.0 htmltools_0.5.8.1
## [13] sass_0.4.9 stats4_4.4.1 rmarkdown_2.28
## [16] evaluate_1.0.1 jquerylib_0.1.4 fastmap_1.2.0
## [19] IRanges_2.40.0 lifecycle_1.0.4 GenomeInfoDb_1.42.0
## [22] compiler_4.4.1 XVector_0.46.0 digest_0.6.37
## [25] R6_2.5.1 GenomeInfoDbData_1.2.13 bslib_0.8.0
## [28] tools_4.4.1 zlibbioc_1.52.0 BiocGenerics_0.52.0
## [31] cachem_1.1.0