Introduction

This vignette exemplifies how to perform single molecule sorting based on prior TF binding site annotation as per Sönmezer et al., 2021 and Kleinendorst & Barzaghi et al., 2021.

Loading libraries

suppressWarnings(library(SingleMoleculeFootprinting))
suppressWarnings(library(BSgenome.Mmusculus.UCSC.mm10))

Sorting by single TF

Sorting by single TF requires designing three bins with coordinates [-35;-25], [-15;+15], [+25,+35] relative to the center of the TF motif.
Here we exemplify the process manually for clarity.

RegionOfInterest = GRanges("chr12", IRanges(20464551, 20465050))
Methylation = qs::qread(system.file("extdata", "Methylation_3.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_3.qs", package="SingleMoleculeFootprinting"))

motif_center = start(IRanges::resize(TFBSs, 1, "center"))
SortingBins = c(
  GRanges("chr1", IRanges(motif_center-35, motif_center-25)),
  GRanges("chr1", IRanges(motif_center-15, motif_center+15)),
  GRanges("chr1", IRanges(motif_center+25, motif_center+35))
)

PlotAvgSMF(
  MethGR = Methylation[[1]], RegionOfInterest = RegionOfInterest, 
  TFBSs = TFBSs, SortingBins = SortingBins
  )
## No sorted reads passed...plotting counts from all reads

Sorting molecules involves averaging, for each molecule, the binary methylation values for the cytosines overlapping those bins.
Reducing those averages to integers summarises each molecule to three binary digits.
We sort and plot molecules based on these digits.
The whole process is wrapped in the SortReadsBySingleTF function.

SortedReads = SortReadsBySingleTF(MethSM = Methylation[[2]], TFBS = TFBSs)
## Designing sorting bins
## Collecting summarized methylation for bins
## Subsetting those reads that cover all bins
## Summarizing reads into patterns
## Splitting reads by pattern

N.b.: custom bins can be used by through the argument bins.

The function returns a list with one item per sample.
Each of these is itself a list of up to eight items, one per possible combination of three binary digits, i.e. 000, 001, 010, etc.
Each of these items contains the IDs of the molecules sorted.
N.b.: patterns with 0 molecules will not be reported.

lapply(SortedReads$SMF_MM_TKO_DE_, head, 2)
## $`000`
## [1] "D00404:273:H5757BCXY:1:1102:2767:29201" 
## [2] "D00404:273:H5757BCXY:1:1109:17973:76532"
## 
## $`001`
## [1] "D00404:273:H5757BCXY:1:2107:14462:35292"
## [2] "D00404:273:H5757BCXY:1:2116:11877:22847"
## 
## $`010`
## [1] "D00404:283:HCFT7BCXY:2:2206:12683:36087"
## [2] "NB501735:9:HLG5VBGX2:2:13108:18702:4995"
## 
## $`011`
## [1] "D00404:273:H5757BCXY:1:2103:13490:80813"
## [2] "D00404:273:H5757BCXY:1:2210:11790:19673"
## 
## $`100`
## [1] "D00404:273:H5757BCXY:1:1102:2692:28832"
## [2] "D00404:273:H5757BCXY:1:1108:7836:31800"
## 
## $`101`
## [1] "D00404:273:H5757BCXY:1:1101:16782:14912"
## [2] "D00404:273:H5757BCXY:1:1103:15983:34280"
## 
## $`110`
## [1] "D00404:273:H5757BCXY:1:1203:5101:74304"
## [2] "D00404:273:H5757BCXY:2:2111:8077:43320"
## 
## $`111`
## [1] "D00404:273:H5757BCXY:1:1106:20721:33897"
## [2] "D00404:273:H5757BCXY:1:1107:16458:47084"

The number of molecules per pattern can be checked using lenghts.

lengths(SortedReads$SMF_MM_TKO_DE_)
##  000  001  010  011  100  101  110  111 
##  216   72    6   19  301 2372   51  932

Here most molecules show the 101 pattern.

These patterns are not immediately human readable. For convenience we hard-coded a biological interpretation in the function SingleTFStates.

SingleTFStates()
## $bound
## [1] "101"
## 
## $accessible
## [1] "111"
## 
## $closed
## [1] "000" "100" "001"
## 
## $unassigned
## [1] "010" "110" "011"

This function can be used together with the function StateQuantification, to compute the frequencies of the biological states associated with single TF binding.
The function StateQuantificationBySingleTF hard-codes the states argument for convenience.

StateQuantification(SortedReads = SortedReads, states = SingleTFStates())
## # A tibble: 4 × 4
##   Sample         State      Counts Freqs
##   <chr>          <chr>       <int> <dbl>
## 1 SMF_MM_TKO_DE_ bound        2372 59.8 
## 2 SMF_MM_TKO_DE_ accessible    932 23.5 
## 3 SMF_MM_TKO_DE_ closed        589 14.8 
## 4 SMF_MM_TKO_DE_ unassigned     76  1.91

Sorted molecules can be visualized with the PlotSM function.

PlotSM(MethSM = Methylation[[2]], RegionOfInterest = RegionOfInterest, sorting.strategy = "classical", SortedReads = SortedReads)
## Arranging reads according to classical sorting.strategy
## Inferring sorting was performed by single TF
## Warning: Removed 80 rows containing missing values or values outside the scale range
## (`geom_tile()`).

A corresponding barplot can be obtained through the StateQuantificationPlot or the SingleTFStateQuantificationPlot which hard-codes the states parameter.

StateQuantificationPlot(SortedReads = SortedReads, states = SingleTFStates())

Sorting by TF clusters

Similarly to single TFs, sorting by TF motif clusters requires designing bins. Here we draw a [-7;+7] bin around each TF motif, a [-35;-25] bin relative to the most upstream TF motif and a [+25,+35] bin relative to the center of the most downstream TFBS, for a total of \(n+2\) bins, with \(n\) being the number of TF motifs.
Here we exemplify the process manually for clarity.

RegionOfInterest = GRanges("chr6", IRanges(88106000, 88106500))
Methylation = qs::qread(system.file("extdata", "Methylation_4.qs", package="SingleMoleculeFootprinting"))
TFBSs = qs::qread(system.file("extdata", "TFBSs_1.qs", package="SingleMoleculeFootprinting"))

motif_center_1 = start(IRanges::resize(TFBSs[1], 1, "center"))
motif_center_2 = start(IRanges::resize(TFBSs[2], 1, "center"))
SortingBins = c(
  GRanges("chr6", IRanges(motif_center_1-35, motif_center_1-25)),
  GRanges("chr6", IRanges(motif_center_1-7, motif_center_1+7)),
  GRanges("chr6", IRanges(motif_center_2-7, motif_center_2+7)),
  GRanges("chr6", IRanges(motif_center_2+25, motif_center_2+35))
)

PlotAvgSMF(
  MethGR = Methylation[[1]], RegionOfInterest = RegionOfInterest, TFBSs = TFBSs, SortingBins = SortingBins
)
## No sorted reads passed...plotting counts from all reads