We illustrate the CRISPR base editing (CRISPRbe) functionalities of
crisprDesign by designing and characterizing gRNAs
targeting the gene IQSEC3 using the cytidine base editor BE4max (Koblan et al. 2018).
We first load the BE4max BaseEditor object available in
crisprBase:
## Class: BaseEditor
## CRISPR Nuclease name: SpCas9
## Target type: DNA
## Metadata: list of length 2
## PAMs: NGG, NAG, NGA
## Weights: 1, 0.2593, 0.0694
## Spacer length: 20
## PAM side: 3prime
## Distance from PAM: 0
## Prototype protospacers: 5'--SSSSSSSSSSSSSSSSSSSS[NGG]--3', 5'--SSSSSSSSSSSSSSSSSSSS[NAG]--3', 5'--SSSSSSSSSSSSSSSSSSSS[NGA]--3'
## Base editor name: BE4max
## Editing strand: original
## Maximum editing weight: C2T at position -15
The editing probabilities of the base editor BE4max are stored in a
matrix where rows correspond to the different nucleotide substitutions,
and columns correspond to the genomic coordinate relative to the PAM
site. The editingWeights function from
crisprBase allows us to retrieve those probabilities. One
can see that C to T editing is optimal around 15 nucleotides upstream of
the PAM site for the BE4max base editor:
## -36 -35 -34 -33 -32 -31 -30 -29 -28 -27 -26 -25 -24
## 0.007 0.007 0.008 0.018 0.010 0.020 0.014 0.012 0.023 0.013 0.024 0.022 0.034
## -23 -22 -21 -20 -19 -18 -17 -16 -15 -14 -13 -12 -11
## 0.022 0.021 0.035 0.058 0.162 0.318 0.632 0.903 1.000 0.870 0.620 0.314 0.163
## -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
## 0.100 0.056 0.033 0.019 0.018 0.024 0.017 0.005 0.002 0.001
To learn how to build a object for your own base editor enzyme, see the package vignette.
Next, we load the grListExample object in
crisprDesign that contains gene coordinates in hg38 for
exons of all human IQSEC3 isoforms. The object was obtained by
converting an Ensembl TxDb object into a
GRangesList object using the TxDb2GRangesList
convenience function in crisprDesign.
The queryTxObject function allows us to query such
objects for a specific gene and feature. Here, we obtain a
GRanges object containing the first exon of the IQSEC3
gene:
gr <- queryTxObject(txObject=grListExample,
featureType="cds",
queryColumn="gene_symbol",
queryValue="IQSEC3")and retain the first exon only:
findSpacers is the main function to obtain a list of all
possible spacer sequences targeting protospacers located in the target
DNA sequence(s). If a GRanges object is provided as input,
a BSgenome object (object containing sequences of a
reference genome) will need to be provided as well:
library(BSgenome.Hsapiens.UCSC.hg38)
bsgenome <- BSgenome.Hsapiens.UCSC.hg38
guideSet <- findSpacers(gr,
bsgenome=bsgenome,
crisprNuclease=BE4max,
strict_overlap=FALSE)
guideSet## GuideSet object with 129 ranges and 5 metadata columns:
## seqnames ranges strand | protospacer pam
## <Rle> <IRanges> <Rle> | <DNAStringSet> <DNAStringSet>
## spacer_1 chr12 66884 + | GCAGGCGGCGGCGGGCGGCA TGG
## spacer_2 chr12 66893 - | CGCGCACCGGATTCTCCAGC AGG
## spacer_3 chr12 66896 + | GGGCGGCATGGAGAGCCTGC TGG
## spacer_4 chr12 66905 + | GGAGAGCCTGCTGGAGAATC CGG
## spacer_5 chr12 66906 - | AGGTAGAGCACGGCGCGCAC CGG
## ... ... ... ... . ... ...
## spacer_125 chr12 67434 - | TCTCCTCTCCCCGCTCACTC AGG
## spacer_126 chr12 67442 + | GAGCAGGAGACCTGAGTGAG CGG
## spacer_127 chr12 67443 + | AGCAGGAGACCTGAGTGAGC GGG
## spacer_128 chr12 67444 + | GCAGGAGACCTGAGTGAGCG GGG
## spacer_129 chr12 67449 + | AGACCTGAGTGAGCGGGGAG AGG
## pam_site cut_site region
## <numeric> <numeric> <character>
## spacer_1 66884 66881 region_1
## spacer_2 66893 66896 region_1
## spacer_3 66896 66893 region_1
## spacer_4 66905 66902 region_1
## spacer_5 66906 66909 region_1
## ... ... ... ...
## spacer_125 67434 67437 region_1
## spacer_126 67442 67439 region_1
## spacer_127 67443 67440 region_1
## spacer_128 67444 67441 region_1
## spacer_129 67449 67446 region_1
## -------
## seqinfo: 711 sequences (1 circular) from hg38 genome
## crisprNuclease: SpCas9
The argument set to FALSE enables spacer sequences to be not-strictly
overlapping with the CDS region; this is useful for base editing design
as the editing window can extend beyond the protospacer sequence region.
The function returns a GuideSet object that stores genomic
coordinates for all spacer sequences found in the regions provided by
gr. The GuideSet object is an extension of a
GenomicRanges object that stores additional information
about gRNAs. For the sake of time, we will retain only two gRNAs:
## GuideSet object with 2 ranges and 5 metadata columns:
## seqnames ranges strand | protospacer pam
## <Rle> <IRanges> <Rle> | <DNAStringSet> <DNAStringSet>
## spacer_50 chr12 67138 - | TTCCTGGGCCCTGGCGGCGG CGG
## spacer_51 chr12 67141 - | AGGTTCCTGGGCCCTGGCGG CGG
## pam_site cut_site region
## <numeric> <numeric> <character>
## spacer_50 67138 67141 region_1
## spacer_51 67141 67144 region_1
## -------
## seqinfo: 711 sequences (1 circular) from hg38 genome
## crisprNuclease: SpCas9
The function addEditedAlleles finds, characterizes, and
scores predicted edited alleles for each gRNA, for a chosen transcript.
It requires a transcript-specific annotation that can be obtained using
the function getTxInfoDataFrame. Here, we will perform the
analysis using the main isoform of IQSEC3 (transcript id
ENST00000538872).
We first get the transcript table for ENST00000538872,
txid <- "ENST00000538872"
txTable <- getTxInfoDataFrame(tx_id=txid,
txObject=grListExample,
bsgenome=bsgenome,
extend=30)
head(txTable)## DataFrame with 6 rows and 10 columns
## chr pos nuc aa aa_number exon pos_plot
## <character> <numeric> <character> <character> <integer> <integer> <integer>
## 1 chr12 66737 G NA NA NA 1
## 2 chr12 66738 G NA NA NA 2
## 3 chr12 66739 C NA NA NA 3
## 4 chr12 66740 A NA NA NA 4
## 5 chr12 66741 G NA NA NA 5
## 6 chr12 66742 T NA NA NA 6
## pos_mrna pos_cds region
## <integer> <integer> <character>
## 1 NA NA Upstream
## 2 NA NA Upstream
## 3 NA NA Upstream
## 4 NA NA Upstream
## 5 NA NA Upstream
## 6 NA NA Upstream
The argument extend specifies the number of nucleotides
upstream and downstream of the exons to include. This is useful to
characterize gRNAs overlapping splice junctions. The region
column indicates where the location of the nucleotide: 3UTR, 5UTR, CDS,
Intron, and Upstream and Downstream of the CDS.
We are ready to add predicted alleles to the GuideSet
object:
editingWindow <- c(-20,-8)
guideSet <- addEditedAlleles(guideSet,
baseEditor=BE4max,
txTable=txTable,
editingWindow=editingWindow,
minEditingWeight = 0,
minMutationScore = 0.3)## [addEditedAlleles] Obtaining edited alleles at each gRNA target site.
## [addEditedAlleles] Adding functional consequences to alleles.
## [addEditedAlleles] Adding summary to GuideSet.
The editingWindow argument specifies the window of
editing that we are interested in. When not provided, it uses the
default window provided in the BaseEditor object. Note that
providing large windows can exponentially increase computing time as the
number of possible alleles grows exponentially.
The minEditingWeight specifies the minimum editing
weight required for an allele to be listed as a predicted allele.
Default is 0. A higher threshold can be used to omit alleles with low
probabilities.
The mutationScore specifies a minimum predicted
probability for labeling an allele with a predicted variant. Alleles
with scores lower than this threshold will be labeled as “not_editing”.
Default of 0.3.
For each gRNA, a predicted alleles annotation is stored and can be
retrieved using the editedAlleles accessor function. As an
example, let’s retrieve the predicted alleles for the first gRNA:
## DataFrame with 99 rows and 9 columns
## seq score variant positions changes
## <DNAStringSet> <numeric> <character> <character> <character>
## spacer_50 TTCTTGGGCCCTG 0.2021425 silent 91 NA
## spacer_50 TTTTTGGGCCCTG 0.0975437 missense 92 E92K
## spacer_50 TTCTTGGGTCCTG 0.0960253 missense 90 A90T
## spacer_50 TTTCTGGGCCCTG 0.0495436 missense 92 E92K
## spacer_50 TTCCTGGGTCCTG 0.0487724 missense 90 A90T
## ... ... ... ... ... ...
## spacer_50 TTCCTGGGCCATG 0.000345692 missense 89 R89M
## spacer_50 TTATTGGGCCTTG 0.000344266 nonsense 89;92 R89K;E92*
## spacer_50 TTCATGGGTCTTG 0.000341052 missense_multi 89;90;91 R89K;A90T;Q91H
## spacer_50 TTTTTGGGTACTG 0.000336996 missense_multi 89;90;92 R89S;A90T;E92K
## spacer_50 TTTTTGGGTGCTG 0.000336996 missense_multi 89;90;92 R89S;A90T;E92K
## aa n_mismatches n_nonsense n_missense
## <character> <integer> <numeric> <numeric>
## spacer_50 ARRRAAAQQQEEE 0 0 0
## spacer_50 ARRRAAAQQQKKK 1 0 1
## spacer_50 ARRRTTTQQQEEE 1 0 1
## spacer_50 ARRRAAAQQQKKK 1 0 1
## spacer_50 ARRRTTTQQQEEE 1 0 1
## ... ... ... ... ...
## spacer_50 AMMMAAAQQQEEE 1 0 1
## spacer_50 AKKKAAAQQQ*** 2 1 1
## spacer_50 AKKKTTTHHHEEE 3 0 3
## spacer_50 ASSSTTTQQQKKK 3 0 3
## spacer_50 ASSSTTTQQQKKK 3 0 3
The resulting DataFrame is ordered so that the top
predicted alleles (based on the score column) are shown
first. The score represents the likelihood of the edited
allele to occur relative to all possible edited alleles, and is
calculated using the editing weights stored in the BE4max
object. The seq column represents the edited nucleotide
sequences. They are always reported from the 5’ to 3’ direction on the
strand corresponding to the gRNA strand.
n_mismatches column indicates the number of amino
acid that differs between the edited allele and the wildtype
allele.n_nonsense and n_missense columns
indicate the number of mismatches that are nonsense and missense
mutation, respectively. Those two columns sum to the
n_mismatches column.variant column indicates the functional consequence
of the allele. There are 7 possible choices:
silent: single or multiple silent mutationsmissense: single missense mutationnonsense: single nonsense mutationmissense_multi: multiple missense mutationsnonsense_multi: multiple nonsense mutationssplice_junction: mutation in a splice junctionnot_targeting: no mutations found in CDS or splice
junctions In case an edited allele leads to multiple editing events that
have different variant label, the most detrimental mutation (splice
junction over nonsense, nonsense over missense, missense over silent) is
reported.positions column lists the amino acid numbers where
mutations occur. For alleles that are labeled as
splice_junction, it lists the closest amino acid.aa column reports the result edited amino acid
sequence.The alleles object also contains useful metadata information that can
be accessed using the metadata accessor function:
## $wildtypeAllele
## spacer_50
## "TTCCTGGGCCCTG"
##
## $start
## [1] 67146
##
## $end
## [1] 67158
##
## $chr
## [1] "chr12"
##
## $strand
## [1] "-"
##
## $editingWindow
## [1] -20 -8
##
## $wildtypeAmino
## [1] "ARRRAAAQQQEEE"
The wildtypeAllele reports the unedited nucleotide
sequence of the region specified by the editing window (with respect to
the gRNA PAM site). It is always reported from the 5’ to 3’ direction on
the strand corresponding to the gRNA strand. The start and
end specify the corresponding coordinates on the
transcript.
For both analysis and visualization purposes, it is often useful to
label gRNAs with a score and label that represents the most likely
functional consequence of that gRNA, for a given base editor. The
addEditedAlleles function described above also implements a
gRNA-level aggregate scoring scheme that adds several gRNA-level
aggregate scores to the GuideSet object:
## GuideSet object with 2 ranges and 18 metadata columns:
## seqnames ranges strand | protospacer pam
## <Rle> <IRanges> <Rle> | <DNAStringSet> <DNAStringSet>
## spacer_50 chr12 67138 - | TTCCTGGGCCCTGGCGGCGG CGG
## spacer_51 chr12 67141 - | AGGTTCCTGGGCCCTGGCGG CGG
## pam_site cut_site region
## <numeric> <numeric> <character>
## spacer_50 67138 67141 region_1
## spacer_51 67141 67144 region_1
## editedAlleles
## <list>
## spacer_50 TTCTTGGGCCCTG:0.2021425:silent:...,TTTTTGGGCCCTG:0.0975437:missense:...,TTCTTGGGTCCTG:0.0960253:missense:...,...
## spacer_51 AGGTTTTTGGGCC:0.7819826:missense:...,AGGTTTGTGGGCC:0.0503345:missense_multi:...,AGGTTTTTGGGTC:0.0465862:missense_multi:...,...
## score_missense_single score_missense_multi score_nonsense_multi
## <numeric> <numeric> <numeric>
## spacer_50 0.436801 0.173223 0
## spacer_51 0.900358 0.153410 0
## score_nonsense_single score_silent score_splice_junction
## <numeric> <numeric> <numeric>
## spacer_50 0.00777293 0.262366 0
## spacer_51 0.02991694 0.000000 0
## score_missense score_nonsense maxVariant maxVariantScore
## <numeric> <numeric> <character> <numeric>
## spacer_50 0.610024 0.00777293 missense 0.610024
## spacer_51 1.053768 0.02991694 missense 1.053768
## aaChanges aaPos
## <character> <character>
## spacer_50 E92K(0.31);A90T(0.31.. 92
## spacer_51 E92K(1);Q91H(0.09);A.. 92
## -------
## seqinfo: 711 sequences (1 circular) from hg38 genome
## crisprNuclease: SpCas9
score_missense_single: sum of probability scores across
all alleles with a single missense mutationscore_nonsense_single: sum of probability scores across
all alleles with a single nonsense mutationscore_missense_multi: sum of probability scores across
all alleles with multiple missense mutationsscore_nonsense_multi: sum of probability scores across
all alleles with multiple nonsense mutationsscore_silent: sum of probability scores across all
alleles with only silent mutationsscore_splice_junction: sum of probability scores across
all alleles with a splice junction mutationscore_missense: score_missense_single and
score_missense_multiple added togetherscore_nonsense: score_nonsense_single and
score_nonsense_multiple added togethermaxVariantScore: maximum score across all score
columnsmaxVariant: variant label for the maximum scoreaapos: amino acid position for the top predicted allele
for the variant category that has the maximum scoreFor both gRNAs, the highest scores are missense, and therefore the
maxVariant is set to missense.
## R version 4.5.2 (2025-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.79.1
## [3] rtracklayer_1.71.3 BiocIO_1.21.0
## [5] Biostrings_2.79.4 XVector_0.51.0
## [7] GenomicRanges_1.63.1 GenomeInfoDb_1.47.2
## [9] Seqinfo_1.1.0 IRanges_2.45.0
## [11] S4Vectors_0.49.0 BiocGenerics_0.57.0
## [13] generics_0.1.4 crisprDesign_1.13.9
## [15] crisprBase_1.15.0 BiocStyle_2.39.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 bitops_1.0-9
## [3] httr2_1.2.2 biomaRt_2.67.1
## [5] rlang_1.1.7 magrittr_2.0.4
## [7] Rbowtie_1.51.0 matrixStats_1.5.0
## [9] compiler_4.5.2 RSQLite_2.4.5
## [11] GenomicFeatures_1.63.1 png_0.1-8
## [13] vctrs_0.7.1 txdbmaker_1.7.3
## [15] stringr_1.6.0 pkgconfig_2.0.3
## [17] crayon_1.5.3 fastmap_1.2.0
## [19] dbplyr_2.5.1 Rsamtools_2.27.0
## [21] rmarkdown_2.30 tzdb_0.5.0
## [23] UCSC.utils_1.7.1 bit_4.6.0
## [25] xfun_0.56 randomForest_4.7-1.2
## [27] cachem_1.1.0 cigarillo_1.1.0
## [29] jsonlite_2.0.0 progress_1.2.3
## [31] blob_1.3.0 DelayedArray_0.37.0
## [33] BiocParallel_1.45.0 prettyunits_1.2.0
## [35] parallel_4.5.2 R6_2.6.1
## [37] VariantAnnotation_1.57.1 bslib_0.10.0
## [39] stringi_1.8.7 reticulate_1.44.1
## [41] jquerylib_0.1.4 Rcpp_1.1.1
## [43] SummarizedExperiment_1.41.0 knitr_1.51
## [45] readr_2.1.6 Matrix_1.7-4
## [47] tidyselect_1.2.1 abind_1.4-8
## [49] yaml_2.3.12 codetools_0.2-20
## [51] curl_7.0.0 lattice_0.22-7
## [53] tibble_3.3.1 Biobase_2.71.0
## [55] KEGGREST_1.51.1 evaluate_1.0.5
## [57] crisprScoreData_1.15.0 BiocFileCache_3.1.0
## [59] ExperimentHub_3.1.0 pillar_1.11.1
## [61] BiocManager_1.30.27 filelock_1.0.3
## [63] MatrixGenerics_1.23.0 crisprScore_1.15.2
## [65] RCurl_1.98-1.17 BiocVersion_3.23.1
## [67] hms_1.1.4 glue_1.8.0
## [69] maketools_1.3.2 tools_4.5.2
## [71] AnnotationHub_4.1.0 sys_3.4.3
## [73] GenomicAlignments_1.47.0 buildtools_1.0.0
## [75] XML_3.99-0.20 grid_4.5.2
## [77] AnnotationDbi_1.73.0 restfulr_0.0.16
## [79] cli_3.6.5 rappdirs_0.3.4
## [81] S4Arrays_1.11.1 dplyr_1.1.4
## [83] crisprBowtie_1.15.0 sass_0.4.10
## [85] digest_0.6.39 SparseArray_1.11.10
## [87] rjson_0.2.23 memoise_2.0.1
## [89] htmltools_0.5.9 lifecycle_1.0.5
## [91] httr_1.4.7 bit64_4.6.0-1