CRISPR screens represent a promising technology to systematically evaluate gene functions. We have previously developed two algorithms, MAGeCK (Wei Li and Liu. 2014) and MAGeCK-VISPR (Wei Li and Liu. 2015), to analyze CRISPR screen data. The two algorithms use a negative binomial distribution to model variances of sgRNA read counts and allow users to perform read-count mapping, normalization and QC, as well as to identify positively and negatively selected genes in the screens. Here, we developed MAGeCKFlute for accurate identification of gene hits and associated pathways. MAGeCKFlute is distinguished from other currently available tools by its comprehensive pipeline, which contains a series of functions for analyzing CRISPR screen data. This vignette explains how to use MAGeCKFlute to perform quality control (QC), normalization, batch effect removal, gene hit identification and downstream functional enrichment analysis for CRISPR screens.
knitr::opts_chunk$set(tidy=FALSE, cache=TRUE,
dev="png", message=FALSE, error=FALSE, warning=TRUE)
Note: if you use MAGeCKFlute in published research, please cite: Binbin Wang, Mei Wang, Wubing Zhang. “Integrative analysis of pooled CRISPR genetic screens using MAGeCKFlute.” Nature Protocols (2019), doi: 10.1038/s41596-018-0113-7.
if(!"MAGeCKFlute" %in% installed.packages()) BiocManager::install("MAGeCKFlute")
if(!"clusterProfiler" %in% installed.packages()) BiocManager::install("clusterProfiler")
if(!"ggplot2" %in% installed.packages()) BiocManager::install("ggplot2")
library(MAGeCKFlute)
library(clusterProfiler)
library(ggplot2)
The MAGeCK (mageck test
) uses Robust Rank Aggregation (RRA) for robust identification of CRISPR-screen hits, and outputs the summary results at both sgRNA and gene level. Before performing the downstream analysis, please make sure you have got the gene summary and sgRNA summary results from mageck test
. MAGeCKFlute incorporates an example datasets (Deng Pan 2018), which study the gene functions in T cell mediate tumor killing by performing CRISPR screens in B16F10 cell lines co-cultured with Pmel-1 T cells. There are three samples in the data, including Pmel1_Input (B16F10 cells without T cell co-culture), Pmel1_Ctrl (B16F10 cells co-cultured with control T cells), and Pmel1 (B16F10 cells co-cultured with antigen specific T cells). We compared Pmel1
with Pmel1_Ctrl
using mageck test
, which identifies genes associated with T cell mediated tumor killing.
MAGeCKFlute processes MAGeCK RRA results (“gene_summary” and “sgrna_summary”) with the function FluteRRA, which identifies positively and negatively selected genes and performs functional analysis.
## path to the gene summary file (required)
file1 = file.path(system.file("extdata", package = "MAGeCKFlute"),
"testdata/rra.gene_summary.txt")
## path to the sgRNA summary file (optional)
file2 = file.path(system.file("extdata", package = "MAGeCKFlute"),
"testdata/rra.sgrna_summary.txt")
# Run FluteRRA with only gene summary file
FluteRRA(file1, proj="Pmel1", organism="mmu", outdir = "./")
# Run FluteRRA with both gene summary file and sgRNA summary file
FluteRRA(file1, file2, proj="Pmel1", organism="mmu", outdir = "./")
incorporateDepmap
will allow users to compare the selected genes in the input data and Depmap data, which could clean the selected genes by eliminating lethal genes (false positive hits in some CRISPR screens) and then perform downstream enrichment analysis. You can set the incorporateDepmap
to be TRUE
if you want to include this analysis. Users could specify Depmap data of specific cell lines by setting the cell_lines
and lineages
.FluteRRA(file1, proj="Pmel1", organism="mmu", incorporateDepmap = TRUE, outdir = "./")
omitEssential
allows users to eliminate all common essential genes selected in the Depmap from all of the downstream analysis.FluteRRA(gdata, proj="Pmel1", organism="mmu", omitEssential = TRUE, outdir = "./")
Hints: all figures and intermediate data are saved into local directory “./MAGeCKFlute_Test/”, and all figures are integrated into file “FluteRRA_Test.pdf”.
For more available parameters in FluteRRA
, please read the documentation using the command ?FluteRRA
.
The MAGeCK-VISPR (mageck mle
) utilizes a maximum likelihood estimation (MLE) for robust identification of CRISPR screen hits. It outputs beta scores and the associated statistics in multiple conditions. The beta score describes how a gene is selected: a positive beta score indicates positive selection, and a negative beta score indicates negative selection. Before using FluteMLE
, you should first get gene summary result from MAGeCK-VISPR (mageck mle
). MAGeCKFlute uses the same dataset as before for demonstration. Using mageck mle
, we removed the baseline effect (plasmid sample) from all the three samples, including Pmel1_Input (B16F10 cells without T cell co-culture), Pmel1_Ctrl (B16F10 cells co-cultured with control T cells), and Pmel1 (B16F10 cells co-cultured with antigen specific T cells).
MAGeCKFlute processes MAGeCK MLE results (“gene_summary”) with the function FluteMLE, which performs QC and data normalization based on the beta score, identifies essential genes by comparing control and treatment samples, and finally performs functional enrichment analysis.
file3 = file.path(system.file("extdata", package = "MAGeCKFlute"),
"testdata/mle.gene_summary.txt")
FluteMLE(file3, treatname="Pmel1", ctrlname="Pmel1_Ctrl", proj="Pmel1", organism="mmu")
incorporateDepmap
will allow users to compare the selected genes in the input data and Depmap data, which could clean the selected genes by eliminating lethal genes (false positive hits in some CRISPR screens) and then perform downstream enrichment analysis. You can set the incorporateDepmap
to be TRUE
if you want to include this analysis. Users could specify Depmap data of specific cell lines by setting the cell_lines
and lineages
.## Take Depmap screen as control
FluteMLE(gdata, treatname="Pmel1_Ctrl", ctrlname="Depmap", proj="Pmel1", organism="mmu", incorporateDepmap = TRUE)
omitEssential
allows users to eliminate all common essential genes selected in the Depmap from all of the downstream analysis.FluteMLE(gdata, treatname="Pmel1", ctrlname="Pmel1_Ctrl", proj="Pmel1", organism="mmu", omitEssential = TRUE)
Hint: All pipeline results are written into local directory “./MAGeCKFlute_Test/”, and all figures are integrated into file “FluteMLE_Test.pdf”.
For more available parameters in FluteMLE
, please read the documentation using the command ?FluteMLE
.
MAGeCK/MAGeCK-VISPR outputs a count summary file, which summarizes some basic QC scores at raw count level, including map ratio, Gini index, and NegSelQC. MAGeCKFlute incorporates an example datasets (Ophir Shalem* 2014) for demonstration.
file4 = file.path(system.file("extdata", package = "MAGeCKFlute"),
"testdata/countsummary.txt")
countsummary = read.delim(file4, check.names = FALSE)
head(countsummary)
## File Label Reads Mapped Percentage
## 1 ../data/GSC_0131_Day23_Rep1.fastq.gz day23_r1 62818064 39992777 0.6366
## 2 ../data/GSC_0131_Day0_Rep2.fastq.gz day0_r2 47289074 31709075 0.6705
## 3 ../data/GSC_0131_Day0_Rep1.fastq.gz day0_r1 51190401 34729858 0.6784
## 4 ../data/GSC_0131_Day23_Rep2.fastq.gz day23_r2 58686580 37836392 0.6447
## TotalsgRNAs Zerocounts GiniIndex NegSelQC NegSelQCPval
## 1 64076 57 0.08510 0 1
## 2 64076 17 0.07496 0 1
## 3 64076 14 0.07335 0 1
## 4 64076 51 0.08587 0 1
## NegSelQCPvalPermutation NegSelQCPvalPermutationFDR NegSelQCGene
## 1 1 1 0
## 2 1 1 0
## 3 1 1 0
## 4 1 1 0
# Gini index
BarView(countsummary, x = "Label", y = "GiniIndex",
ylab = "Gini index", main = "Evenness of sgRNA reads")
# Missed sgRNAs
countsummary$Missed = log10(countsummary$Zerocounts)
BarView(countsummary, x = "Label", y = "Missed", fill = "#394E80",
ylab = "Log10 missed gRNAs", main = "Missed sgRNAs")
# Read mapping
MapRatesView(countsummary)
# Or
countsummary$Unmapped = countsummary$Reads - countsummary$Mapped
gg = reshape2::melt(countsummary[, c("Label", "Mapped", "Unmapped")], id.vars = "Label")
gg$variable = factor(gg$variable, levels = c("Unmapped", "Mapped"))
gg = gg[order(gg$Label, gg$variable), ]
p = BarView(gg, x = "Label", y = "value", fill = "variable",
position = "stack", xlab = NULL, ylab = "Reads", main = "Map ratio")
p + scale_fill_manual(values = c("#9BC7E9", "#1C6DAB"))
For CRISPR/Cas9 screens with two experimental conditions, MAGeCK-RRA is available for identification of essential genes. In MAGeCK-RRA results, the sgRNA summary and gene summary file summarizes the statistical significance of positive selections and negative selections at sgRNA level and gene level.
file1 = file.path(system.file("extdata", package = "MAGeCKFlute"),
"testdata/rra.gene_summary.txt")
gdata = ReadRRA(file1)
head(gdata)
## id Score FDR
## 1 Cd274 -3.4698 0.002475
## 2 Psmb8 -3.7260 0.002475
## 3 Rela -3.2413 0.008251
## 4 Otulin -3.6018 0.008663
## 5 Ikbkb -3.4256 0.010891
## 6 Tcea1 -3.4236 0.042079
file2 = file.path(system.file("extdata", package = "MAGeCKFlute"),
"testdata/rra.sgrna_summary.txt")
sdata = ReadsgRRA(file2)
head(sdata)
## sgrna Gene LFC FDR
## 1 AAACATATAGTGTACCTCTA Jak1 10.8910 0
## 2 TCCGAACCGAATCATCACTG Jak1 10.9170 0
## 3 TGAATAAATCCATCAGACAG Jak1 10.5970 0
## 4 GGATAGACGCCCAGCCACTG Stat1 9.9921 0
## 5 TTAATGACGAGCTCGTGGAG Stat1 9.2728 0
## 6 GAAAAGCAAGCGTAATCTCC Stat1 8.7931 0
gdata$HumanGene = TransGeneID(gdata$id, fromType = "symbol", toType = "symbol",
fromOrg = "mmu", toOrg = "hsa")
sdata$HumanGene = TransGeneID(sdata$Gene, fromType = "symbol", toType = "symbol",
fromOrg = "mmu", toOrg = "hsa")
## Remove missing or duplicate human genes
idx = duplicated(gdata$HumanGene)|is.na(gdata$HumanGene)
gdata = gdata[!idx, ]
depmap_similarity = ResembleDepmap(gdata, symbol = "HumanGene", score = "Score")
head(depmap_similarity)
gdata = OmitCommonEssential(gdata, symbol = "HumanGene")
sdata = OmitCommonEssential(sdata, symbol = "HumanGene")
gdata$LogFDR = -log10(gdata$FDR)
p1 = ScatterView(gdata, x = "Score", y = "LogFDR", label = "id",
model = "volcano", top = 5)
print(p1)
# Or
p2 = VolcanoView(gdata, x = "Score", y = "FDR", Label = "id")
print(p2)