This vignette exemplifies how to perform single molecule sorting based on prior TF binding site annotation as per Sönmezer et al., 2021 and Kleinendorst & Barzaghi et al., 2021.
Sorting by single TF requires designing three bins with coordinates [-35;-25], [-15;+15], [+25,+35] relative to the center of the TF motif.
Here we exemplify the process manually for clarity.
RegionOfInterest = GRanges("chr12", IRanges(20464551, 20465050))
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_3.qs", package="SingleMoleculeFootprinting"))
motif_center = start(IRanges::resize(TFBSs, 1, "center"))
SortingBins = c(
GRanges("chr1", IRanges(motif_center-35, motif_center-25)),
GRanges("chr1", IRanges(motif_center-15, motif_center+15)),
GRanges("chr1", IRanges(motif_center+25, motif_center+35))
)
PlotAvgSMF(
MethGR = Methylation[[1]], RegionOfInterest = RegionOfInterest,
TFBSs = TFBSs, SortingBins = SortingBins
)
## No sorted reads passed...plotting counts from all reads
Sorting molecules involves averaging, for each molecule, the binary methylation values for the cytosines overlapping those bins.
Reducing those averages to integers summarises each molecule to three binary digits.
We sort and plot molecules based on these digits.
The whole process is wrapped in the SortReadsBySingleTF
function.
SortedReads = SortReadsBySingleTF(MethSM = Methylation[[2]], TFBS = TFBSs)
## Designing sorting bins
## Collecting summarized methylation for bins
## Subsetting those reads that cover all bins
## Summarizing reads into patterns
## Splitting reads by pattern
N.b.: custom bins can be used by through the argument bins
.
The function returns a list with one item per sample.
Each of these is itself a list of up to eight items, one per possible combination of three binary digits, i.e. 000, 001, 010, etc.
Each of these items contains the IDs of the molecules sorted.
N.b.: patterns with 0 molecules will not be reported.
lapply(SortedReads$SMF_MM_TKO_DE_, head, 2)
## $`000`
## [1] "D00404:273:H5757BCXY:1:1102:2767:29201"
## [2] "D00404:273:H5757BCXY:1:1109:17973:76532"
##
## $`001`
## [1] "D00404:273:H5757BCXY:1:2107:14462:35292"
## [2] "D00404:273:H5757BCXY:1:2116:11877:22847"
##
## $`010`
## [1] "D00404:283:HCFT7BCXY:2:2206:12683:36087"
## [2] "NB501735:9:HLG5VBGX2:2:13108:18702:4995"
##
## $`011`
## [1] "D00404:273:H5757BCXY:1:2103:13490:80813"
## [2] "D00404:273:H5757BCXY:1:2210:11790:19673"
##
## $`100`
## [1] "D00404:273:H5757BCXY:1:1102:2692:28832"
## [2] "D00404:273:H5757BCXY:1:1108:7836:31800"
##
## $`101`
## [1] "D00404:273:H5757BCXY:1:1101:16782:14912"
## [2] "D00404:273:H5757BCXY:1:1103:15983:34280"
##
## $`110`
## [1] "D00404:273:H5757BCXY:1:1203:5101:74304"
## [2] "D00404:273:H5757BCXY:2:2111:8077:43320"
##
## $`111`
## [1] "D00404:273:H5757BCXY:1:1106:20721:33897"
## [2] "D00404:273:H5757BCXY:1:1107:16458:47084"
The number of molecules per pattern can be checked using lenghts
.
lengths(SortedReads$SMF_MM_TKO_DE_)
## 000 001 010 011 100 101 110 111
## 216 72 6 19 301 2372 51 932
Here most molecules show the 101 pattern.
These patterns are not immediately human readable. For convenience we hard-coded a biological interpretation in the function SingleTFStates
.
SingleTFStates()
## $bound
## [1] "101"
##
## $accessible
## [1] "111"
##
## $closed
## [1] "000" "100" "001"
##
## $unassigned
## [1] "010" "110" "011"
This function can be used together with the function StateQuantification
, to compute the frequencies of the biological states associated with single TF binding.
The function StateQuantificationBySingleTF
hard-codes the states
argument for convenience.
StateQuantification(SortedReads = SortedReads, states = SingleTFStates())
## # A tibble: 4 × 4
## Sample State Counts Freqs
## <chr> <chr> <int> <dbl>
## 1 SMF_MM_TKO_DE_ bound 2372 59.8
## 2 SMF_MM_TKO_DE_ accessible 932 23.5
## 3 SMF_MM_TKO_DE_ closed 589 14.8
## 4 SMF_MM_TKO_DE_ unassigned 76 1.91
Sorted molecules can be visualized with the PlotSM
function.
PlotSM(MethSM = Methylation[[2]], RegionOfInterest = RegionOfInterest, sorting.strategy = "classical", SortedReads = SortedReads)
## Arranging reads according to classical sorting.strategy
## Inferring sorting was performed by single TF
## Warning: Removed 80 rows containing missing values or values outside the scale range
## (`geom_tile()`).
A corresponding barplot can be obtained through the StateQuantificationPlot
or the SingleTFStateQuantificationPlot
which hard-codes the states
parameter.
Similarly to single TFs, sorting by TF motif clusters requires designing bins. Here we draw a [-7;+7] bin around each TF motif, a [-35;-25] bin relative to the most upstream TF motif and a [+25,+35] bin relative to the center of the most downstream TFBS, for a total of \(n+2\) bins, with \(n\) being the number of TF motifs.
Here we exemplify the process manually for clarity.
RegionOfInterest = GRanges("chr6", IRanges(88106000, 88106500))
Methylation = qs::qread(system.file("extdata", "Methylation_4.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_1.qs", package="SingleMoleculeFootprinting"))
motif_center_1 = start(IRanges::resize(TFBSs[1], 1, "center"))
motif_center_2 = start(IRanges::resize(TFBSs[2], 1, "center"))
SortingBins = c(
GRanges("chr6", IRanges(motif_center_1-35, motif_center_1-25)),
GRanges("chr6", IRanges(motif_center_1-7, motif_center_1+7)),
GRanges("chr6", IRanges(motif_center_2-7, motif_center_2+7)),
GRanges("chr6", IRanges(motif_center_2+25, motif_center_2+35))
)
PlotAvgSMF(
MethGR = Methylation[[1]], RegionOfInterest = RegionOfInterest, TFBSs = TFBSs, SortingBins = SortingBins
)
## No sorted reads passed...plotting counts from all reads