Two datasets were used to produce the figure in the paper:
Both datasets were aligned with BISCUIT (Zhou et al. 2024). To produce Bismark BEDgraph files (Krueger and Andrews 2011), we used a Python script to convert the beta and coverage columns to percentage methylation, methylated read count, and unmethylated read count columns.
import sys
import gzip

bedfile = sys.argv[1]  # gzipped input BED file


def bed2cov(line):
    # Input columns: chromosome, start, end, beta, coverage.
    chrom, start, end, beta, cov = line.rstrip('\n').split('\t')
    percent, meth, unmeth = convert(float(beta), int(cov))
    # The start coordinate is written to both position columns.
    return '\t'.join([chrom, start, start, str(percent), str(meth), str(unmeth)])


def convert(beta, cov):
    # Derive percent methylation and read counts from beta and coverage.
    percent = round(beta * 100)
    meth = round(beta * cov)
    unmeth = cov - meth
    return [percent, meth, unmeth]


with gzip.open(bedfile, 'rt') as bed:
    for line in bed:
        print(bed2cov(line))

Using GNU parallel:
Then, using the iscream.paper package at https://github.com/huishenlab/iscream.paper, we ran the
benchmarks and produced the figures.
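
As a quick check of the beta and coverage conversion in the script above, here is a minimal worked example. It repeats the script's two functions so that it runs on its own. The input record (chr1, position 10468, beta 0.750, coverage 8) is a made-up illustration, not a line from either dataset.

```python
def convert(beta, cov):
    # Same arithmetic as the conversion script: percent methylation,
    # methylated read count, and unmethylated read count.
    percent = round(beta * 100)
    meth = round(beta * cov)
    unmeth = cov - meth
    return percent, meth, unmeth


def bed2cov(line):
    # Input columns: chromosome, start, end, beta, coverage.
    chrom, start, end, beta, cov = line.rstrip('\n').split('\t')
    percent, meth, unmeth = convert(float(beta), int(cov))
    return '\t'.join([chrom, start, start, str(percent), str(meth), str(unmeth)])


# Hypothetical input record, for illustration only.
example = "chr1\t10468\t10469\t0.750\t8\n"
converted = bed2cov(example)
print(converted)
```

For this record, a beta of 0.75 at coverage 8 gives 75 percent methylation, 6 methylated reads and 2 unmethylated reads. Python's built-in round rounds exact halves to the nearest even integer, so a beta of 0.5 at coverage 5 yields 2 methylated reads rather than 3.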