Single-cell RNA sequencing has become a common approach for tracing the developmental processes of cells. However, recording lineage with exogenous barcodes is more direct than inferring it from expression profiles, and as gene-editing technology matures, combining gene editing with exogenous barcodes can generate richer dynamic information for single cells. In this application note, we introduce an R package, LinTInd, for reconstructing a tree from alleles generated by the genome-editing tool CRISPR over a moderate time period, based on the order in which editing occurs. For scRNA-seq data, LinTInd can also quantify the similarity between each pair of clusters in three ways.
Via GitHub
devtools::install_github("mana-W/LinTInd")
Via Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("LinTInd")
library(LinTInd)
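The input for LinTInd consists of three required files (the sequence file of reads, the reference sequence, and the cut-site positions) and one optional file (the cell-type annotation):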
# Locate the example files shipped with the package
data<-system.file("extdata","CB_UMI",package = 'LinTInd')
fafile<-system.file("extdata","V3.fasta",package = 'LinTInd')
cutsite<-system.file("extdata","V3.cutSites",package = 'LinTInd')
celltype<-system.file("extdata","celltype.tsv",package = 'LinTInd')
# Read the reads table, the reference, the cut sites and the cell-type annotation
data<-read.table(data,sep="\t",header=TRUE)
ref<-ReadFasta(fafile)
cutsite<-read.table(cutsite,col.names = c("indx","start","end"))
celltype<-read.table(celltype,header=TRUE,stringsAsFactors=FALSE)
For the sequence file, only the column containing the read sequences is required; the cell barcodes and UMIs are both optional.
head(data,3)
## Read.ID
## 1 @A01045:289:HM7K3DRXX:2:2101:9896:1031
## 2 @A01045:289:HM7K3DRXX:2:2101:13367:1031
## 3 @A01045:289:HM7K3DRXX:2:2101:9959:1047
## Read.Seq
## 1 GAACGCGTAGGATAACATGGCCATCATCAAGGAGTTCTCATGCGCTTCAAGGTGCACATGGTTTATTGGAGCCGTACATGAACTGAGGTTAAGGACAGGATGTCCCAGGCGTAGGTAATTGGCCCCCTGCCCTTCGCCTGGGTTATAAGCTTCGGGTTTAAACGGGCCCTGGGGGTGGCATCCCTGTGACCCCTCCCCAGTGCCTCTCCTGGCCCTGGAAGTTGCCACTCCAGTGCCCACCAGCCTTGTC
## 2 GAACGCGTAGGATAACATGGCCATCATCAAGGAGTTCTCATGCGCTTCAAGGTGCACATGGTTTATTGGAGCCGTACATGAACTGAGGTTAAGGACAGGATGTCCCAGGCGTAGGTAATTGGCCCCCTGCCCTTCGCCTGGGTTATAAGCTTCGGGTTTAAACGGGCCCTGGGGGTGGCATCCCTGTGACCCCTCCCCAGTGCCTCTCCTGGCCCTGGAAGTTGCCACTCCAGTGCCCACCAGCCTTGTC
## 3 GAACGCGTAGGATAACATGGCCATCATCAAGGAGTTCTCATGCGCTTCAAGGTGCACATGGTTTATTGGAGCCGTACATGAACTGAGGTTAAGGACAGGATGTCCCAGGCGTAGGTAATTGGCCCCCTGCCCTTCGCCTGGGTTATAAGCTTCGGGTTTAAACGGGCCCTGGGGGTGGCATCCCTGTGACCCCTCCCCAGTGCCTCTCCTGGCCCTGGAAGTTGCCACTCCAGTGCCCACCAGCCTTGTC
## Cell.BC UMI
## 1 GAAGGGTAGCCTCAGC CTTCTCCCGA
## 2 ACCCTCACAAGACTGG TGTAATTTTT
## 3 GAAGGGTAGCCTCAGC CTTCTCCCGA
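Before running the pipeline it can help to check which columns a reads table provides and how many reads support each cell barcode/UMI pair. The sketch below uses a toy table with shortened sequences; it assumes the column names shown in the output above (`Read.Seq`, `Cell.BC`, `UMI`), and `check_input()` is a hypothetical helper, not part of LinTInd.

```r
# Toy reads table: barcodes copied from the output above, sequences shortened
reads <- data.frame(
  Read.Seq = c("GAACGCGTAGG", "GAACGCGTAGG", "GAACGCGTAGG"),
  Cell.BC  = c("GAAGGGTAGCCTCAGC", "ACCCTCACAAGACTGG", "GAAGGGTAGCCTCAGC"),
  UMI      = c("CTTCTCCCGA", "TGTAATTTTT", "CTTCTCCCGA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop if the read column is missing and
# report which of the optional columns are present
check_input <- function(reads) {
  if (!"Read.Seq" %in% names(reads)) stop("a Read.Seq column is required")
  c(Cell.BC = "Cell.BC" %in% names(reads), UMI = "UMI" %in% names(reads))
}
optional <- check_input(reads)

# Number of reads supporting each (cell barcode, UMI) pair
reads_per_umi <- aggregate(Read.Seq ~ Cell.BC + UMI, data = reads, FUN = length)
names(reads_per_umi)[3] <- "n_reads"
reads_per_umi
```

In this toy table, reads 1 and 3 share both the cell barcode and the UMI, so they collapse into a single pair supported by two reads.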
ref
## $scarfull
## 333-letter DNAString object
## seq: GAACGCGTAGGATAACATGGCCATCATCAAGGAGTT...GGAAGTTGCCACTCCAGTGCCCACCAGCCTTGTCCT
cutsite
## indx start end
## 1 0 39 267
## 2 1 1 23
## 3 2 28 50
## 4 3 55 77
## 5 4 82 104
## 6 5 109 131
## 7 6 136 158
## 8 7 163 185
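A quick sanity check of the cut-site table is to confirm that every interval is well formed and to look at the interval widths. The sketch below rebuilds the table printed above by hand. It assumes the coordinates are 1-based and inclusive, and that the row with `indx` 0 describes a different kind of interval from rows 1 to 7; both are assumptions, since the text does not define the columns.

```r
# Cut-site table copied from the output above
cutsite <- data.frame(
  indx  = 0:7,
  start = c(39, 1, 28, 55, 82, 109, 136, 163),
  end   = c(267, 23, 50, 77, 104, 131, 158, 185)
)

# Every interval must have start <= end
stopifnot(all(cutsite$start <= cutsite$end))

# Width of each interval, assuming inclusive coordinates
cutsite$width <- cutsite$end - cutsite$start + 1

# Rows 1 to 7: spacing between the end of one interval and the start of the next
targets <- cutsite[cutsite$indx != 0, ]
gaps <- targets$start[-1] - targets$end[-nrow(targets)] - 1

cutsite
gaps
```

Under these assumptions, rows 1 to 7 all have the same width and are evenly spaced, while row 0 spans a much longer interval.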
head(celltype,3)
## Cell.BC Cell.type
## 1 AAGCGAGTCTTCTGTA A
## 2 AATCGACTCGTAGTGT A
## 3 ACATGCAGTCCACACG A
In the first step, we use FindIndel()
to align the reads to the reference and find indels; the function IndelForm()
then generates an array-form string for each read.
scarinfo<-FindIndel(data=data,scarfull=ref,scar=cutsite,indel.coverage="All",type="test",cln=1)
scarinfo<-IndelForm(scarinfo,cln=1)
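To make the two ideas in this step concrete, the following base-R sketch locates indels in a pair of already-aligned toy sequences and then writes one label per target site. It is only an illustration: `find_indels_toy()` and `array_form_toy()` are hypothetical helpers, and the label format (`<width><type>+<position>`, `NONE` for an unedited site, sites joined by `_`) is an assumption made for this example, not necessarily the format produced by FindIndel() or IndelForm().

```r
# Locate indels in two aligned strings of equal length ("-" marks a gap)
find_indels_toy <- function(ref_aln, read_aln) {
  ref_chars  <- strsplit(ref_aln, "")[[1]]
  read_chars <- strsplit(read_aln, "")[[1]]
  stopifnot(length(ref_chars) == length(read_chars))
  # Classify each alignment column: D = deletion, I = insertion, M = aligned base
  state <- ifelse(read_chars == "-", "D", ifelse(ref_chars == "-", "I", "M"))
  runs   <- rle(state)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  # Reference coordinate of each column; inside an insertion this stays at
  # the last reference base before the inserted bases
  ref_pos <- cumsum(ref_chars != "-")
  keep <- runs$values != "M"
  data.frame(type = runs$values[keep],
             ref_start = ref_pos[starts[keep]],
             width = runs$lengths[keep],
             stringsAsFactors = FALSE)
}

# Hypothetical encoding: one label per target site, joined by "_"
array_form_toy <- function(indels, sites) {
  # Deletions cover `width` reference bases, insertions a single position
  indel_end <- indels$ref_start +
    ifelse(indels$type == "D", indels$width - 1, 0)
  labels <- vapply(seq_len(nrow(sites)), function(i) {
    hit <- indels$ref_start <= sites$end[i] & indel_end >= sites$start[i]
    if (!any(hit)) return("NONE")
    paste0(indels$width[hit], indels$type[hit], "+", indels$ref_start[hit],
           collapse = "&")
  }, character(1))
  paste(labels, collapse = "_")
}

sites  <- data.frame(start = c(1, 6), end = c(5, 10))
indels <- find_indels_toy("ACGTACGT--AC", "AC--ACGTTTAC")
indels
array_form_toy(indels, sites)
```

Here the read carries a 2-base deletion starting at reference position 3 and a 2-base insertion after position 8, so each of the two toy sites receives one label.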
Then, for single-cell sequencing, we define a final array-form string for each cell with IndelIdents()
; three methods are provided, selected through the method.use parameter.
For bulk sequencing, this step generates a “cell barcode” for each read.
cellsinfo<-IndelIdents(scarinfo,method.use="umi.num",cln=1)
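As an illustration of what a per-cell choice based on UMI counts could look like, the sketch below picks, for each cell, the array-form string supported by the largest number of distinct UMIs. This is our reading of the name `"umi.num"` and is an assumption; the helper `consensus_by_umi()`, the column name `scar`, and the tie-breaking rule are hypothetical and may differ from what IndelIdents() does.

```r
# Toy per-read table: cell barcode, UMI and array-form string (placeholders)
reads <- data.frame(
  Cell.BC = c("c1", "c1", "c1", "c1", "c1", "c2"),
  UMI     = c("u1", "u1", "u1", "u2", "u3", "u4"),
  scar    = c("A",  "A",  "A",  "B",  "B",  "C"),
  stringsAsFactors = FALSE
)

consensus_by_umi <- function(reads) {
  # Keep one row per distinct (cell, UMI, string) so PCR duplicates count once
  uniq <- unique(reads[, c("Cell.BC", "UMI", "scar")])
  # Count the UMIs supporting each string within each cell
  counts <- aggregate(UMI ~ Cell.BC + scar, data = uniq, FUN = length)
  names(counts)[3] <- "n_umi"
  # Best-supported string first within each cell; ties broken alphabetically
  counts <- counts[order(counts$Cell.BC, -counts$n_umi, counts$scar), ]
  counts[!duplicated(counts$Cell.BC), ]
}

per_cell <- consensus_by_umi(reads)
per_cell
```

In cell `c1`, string `A` has more reads (three) but only one UMI, whereas `B` is supported by two UMIs, so a UMI-based choice selects `B` where a read-based choice would select `A`.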
After defining the indels for each cell, we can use IndelPlot()
to visualise them.
IndelPlot(cellsinfo = cellsinfo)
We can use the function TagProcess()
to extract indels for cells/reads. The parameter Cells is optional.
tag<-TagProcess(cellsinfo$info,Cells=celltype)
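Conceptually, extracting indels per cell means turning one string per cell into one row per cell and indel, optionally joined with the cell annotation. The base-R sketch below shows this on toy data. The string format (`_` between sites, `NONE` for an unedited site), the column name `scar` and the helper `extract_tags_toy()` are assumptions for illustration and do not describe the internals or the output of TagProcess().

```r
# Toy per-cell array-form strings in a hypothetical format
cells <- data.frame(
  Cell.BC = c("c1", "c2", "c3"),
  scar    = c("2D+3_NONE", "2D+3_2I+8", "NONE_NONE"),
  stringsAsFactors = FALSE
)
# Optional annotation, with the same columns as the celltype table above
celltype <- data.frame(
  Cell.BC   = c("c1", "c2", "c3"),
  Cell.type = c("A", "B", "B"),
  stringsAsFactors = FALSE
)

extract_tags_toy <- function(cells) {
  # Split each string into its per-site labels
  pieces <- strsplit(cells$scar, "_", fixed = TRUE)
  long <- data.frame(Cell.BC = rep(cells$Cell.BC, lengths(pieces)),
                     tag = unlist(pieces),
                     stringsAsFactors = FALSE)
  # Drop unedited sites and repeated labels within a cell
  unique(long[long$tag != "NONE", ])
}

# Join with the annotation only when it is available
tags <- merge(extract_tags_toy(cells), celltype, by = "Cell.BC")
tags
```

Cell `c3` carries no edits in this toy example, so it contributes no rows to the result.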
If an annotation is provided for each cell, we can also use TagDist()
to calculate the relationship between each pair of groups in one of three ways, selected through the method parameter.
The heatmap of this result will be saved as a PDF file.
tag_dist=TagDist(tag,method = "Jaccard")
## Using Cell.type as value column: use value.var to override.
## Aggregation function missing: defaulting to length
tag_dist
## A B C D E
## A 1.0000000 0.4925373 0.2794118 0.2985075 0.2058824
## B 0.4925373 1.0000000 0.5588235 0.6060606 0.4117647
## C 0.2794118 0.5588235 1.0000000 0.9047619 0.7500000
## D 0.2985075 0.6060606 0.9047619 1.0000000 0.6666667
## E 0.2058824 0.4117647 0.7500000 0.6666667 1.0000000
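The matrix above is symmetric with ones on the diagonal, as expected for a similarity measure. For readers unfamiliar with the Jaccard index, the sketch below computes it between groups represented as sets of tags, using the standard definition |A ∩ B| / |A ∪ B|. The toy sets and the helper `jaccard_matrix()` are hypothetical, and how TagDist() derives its sets from the data is not shown here and may differ.

```r
# Pairwise Jaccard index between named sets of tags
jaccard_matrix <- function(tag_sets) {
  groups <- names(tag_sets)
  m <- matrix(0, length(groups), length(groups),
              dimnames = list(groups, groups))
  for (i in groups) {
    for (j in groups) {
      inter <- length(intersect(tag_sets[[i]], tag_sets[[j]]))
      uni   <- length(union(tag_sets[[i]], tag_sets[[j]]))
      # The index is undefined when both sets are empty
      m[i, j] <- if (uni == 0) NA else inter / uni
    }
  }
  m
}

# Toy tag sets for three groups
tag_sets <- list(A = c("t1", "t2", "t3"),
                 B = c("t2", "t3", "t4"),
                 C = c("t4", "t5"))
m <- jaccard_matrix(tag_sets)
m
```

Groups `A` and `B` share two of four distinct tags, giving 0.5, while `A` and `C` share none, giving 0.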
In the last part, we can use BuildTree()
to generate an array-generation tree.
treeinfo<-BuildTree(tag)
## Using Cell.num as value column: use value.var to override.
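One way to picture a tree built from the order in which edits occur is to treat each array as a set of edits and attach it to the largest array whose edits it fully contains. The sketch below implements that toy rule in base R. It is an illustration under this stated assumption, not the algorithm used by BuildTree(); `find_parent()` and the edit names are hypothetical.

```r
# Toy arrays, each described by the set of edits it carries
edit_sets <- list(
  root = character(0),
  a = c("e1"),
  b = c("e1", "e2"),
  c = c("e1", "e3"),
  d = c("e1", "e2", "e4")
)

# Parent = the largest array whose edits are a proper subset of the child's
find_parent <- function(child, sets) {
  target <- sets[[child]]
  is_ancestor <- vapply(sets, function(s) {
    length(s) < length(target) && all(s %in% target)
  }, logical(1))
  cand <- names(sets)[is_ancestor]
  if (length(cand) == 0) return(NA_character_)
  # Ties are resolved by taking the first candidate of maximal size
  cand[which.max(lengths(sets[cand]))]
}

parents <- vapply(names(edit_sets), find_parent, character(1),
                  sets = edit_sets)
parents
```

In this example `d` attaches to `b` rather than to `a` or `root`, because `b` is the most edited array whose edits are all present in `d`.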
Finally, we can use the function PlotTree()
to visualise the tree created before.
plotinfo<-PlotTree(treeinfo = treeinfo,data.extract = "TRUE",annotation = "TRUE")
## Using tags as id variables
## ℹ invalid tbl_tree object. Missing column: parent,node.
## ℹ invalid tbl_tree object. Missing column: parent,node.
## ℹ invalid tbl_tree object. Missing column: parent,node.
## ℹ invalid tbl_tree object. Missing column: parent,node.
plotinfo$p
sessionInfo()
## R version 4.5.0 RC (2025-04-04 r88126)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] LinTInd_1.12.0 S4Vectors_0.46.0 BiocGenerics_0.54.0
## [4] generics_0.1.3 ggplot2_3.5.2
##
## loaded via a namespace (and not attached):
## [1] stringdist_0.9.15 gtable_0.3.6 xfun_0.52
## [4] bslib_0.9.0 htmlwidgets_1.6.4 rlist_0.4.6.2
## [7] lattice_0.22-7 vctrs_0.6.5 tools_4.5.0
## [10] yulab.utils_0.2.0 tibble_3.2.1 pkgconfig_2.0.3
## [13] pheatmap_1.0.12 data.table_1.17.0 ggnewscale_0.5.1
## [16] ggplotify_0.1.2 RColorBrewer_1.1-3 lifecycle_1.0.4
## [19] GenomeInfoDbData_1.2.14 stringr_1.5.1 compiler_4.5.0
## [22] farver_2.1.2 treeio_1.32.0 Biostrings_2.76.0
## [25] munsell_0.5.1 data.tree_1.1.0 ggtree_3.16.0
## [28] ggfun_0.1.8 GenomeInfoDb_1.44.0 htmltools_0.5.8.1
## [31] sass_0.4.10 yaml_2.3.10 lazyeval_0.2.2
## [34] pillar_1.10.2 crayon_1.5.3 jquerylib_0.1.4
## [37] tidyr_1.3.1 cachem_1.1.0 nlme_3.1-168
## [40] tidyselect_1.2.1 aplot_0.2.5 digest_0.6.37
## [43] stringi_1.8.7 reshape2_1.4.4 dplyr_1.1.4
## [46] purrr_1.0.4 labeling_0.4.3 cowplot_1.1.3
## [49] fastmap_1.2.0 grid_4.5.0 colorspace_2.1-1
## [52] cli_3.6.4 magrittr_2.0.3 patchwork_1.3.0
## [55] ape_5.8-1 withr_3.0.2 scales_1.3.0
## [58] UCSC.utils_1.4.0 pwalign_1.4.0 rmarkdown_2.29
## [61] XVector_0.48.0 httr_1.4.7 networkD3_0.4.1
## [64] igraph_2.1.4 evaluate_1.0.3 knitr_1.50
## [67] IRanges_2.42.0 gridGraphics_0.5-1 rlang_1.1.6
## [70] Rcpp_1.0.14 glue_1.8.0 tidytree_0.4.6
## [73] jsonlite_2.0.0 plyr_1.8.9 R6_2.6.1
## [76] fs_1.6.6