The multiWGCNA R package is a WGCNA-based procedure designed for RNA-seq datasets with two biologically meaningful variables. We developed this package because we’ve found that the experimental design for network analysis can be ambiguous. For example, should I analyze my treatment and wildtype samples separately or together? We find that it is informative to perform all the possible design combinations, and subsequently integrate these results. For more information, please see Tommasini and Fogel. BMC Bioinformatics. 2023.
In this vignette, we will be showing how users can perform a full start-to-finish multiWGCNA analysis. We will be using the regional autism data from Voineagu et al. 2011 (https://www.nature.com/articles/nature10110). This dataset has two sample traits: 1) autism versus control, and 2) temporal cortex versus frontal cortex.
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("multiWGCNA")
The development version of multiWGCNA can be installed from GitHub like this:
if (!require("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_github("fogellab/multiWGCNA")
Load multiWGCNA library
library(multiWGCNA)
Since this is purely an example of how to run the entire analysis from start to finish, we are going to limit our analysis to 1500 randomly selected probes.
Input data:
# Download data from the ExperimentHub
library(ExperimentHub)
eh = ExperimentHub()
# Note: this requires the SummarizedExperiment package to be installed
eh_query = query(eh, c("multiWGCNAdata"))
autism_se = eh_query[["EH8219"]]
#> see ?multiWGCNAdata and browseVignettes('multiWGCNAdata') for documentation
#> loading from cache
# Collect the metadata in the sampleTable
sampleTable = colData(autism_se)
# Randomly sample 2000 genes from the expression matrix
set.seed(1)
autism_se = autism_se[sample(rownames(autism_se), 1500),]
# Check the data
assays(autism_se)[[1]][1:5, 1:5]
#> GSM706412 GSM706413 GSM706414 GSM706415 GSM706416
#> ILMN_1672121 11.034264 10.446682 11.473705 11.732849 11.43105
#> ILMN_2151368 10.379812 9.969130 9.990030 9.542288 10.26247
#> ILMN_1757569 9.426955 9.050024 9.347505 9.235251 9.38837
#> ILMN_2400219 12.604047 12.886037 12.890658 12.446960 12.98925
#> ILMN_2222101 12.385019 12.748229 12.418027 11.690253 13.10915
sampleTable
#> DataFrame with 58 rows and 3 columns
#> Sample Status Tissue
#> <character> <character> <character>
#> GSM706412 GSM706412 autism FC
#> GSM706413 GSM706413 autism FC
#> GSM706414 GSM706414 autism FC
#> GSM706415 GSM706415 autism FC
#> GSM706416 GSM706416 autism FC
#> ... ... ... ...
#> GSM706465 GSM706465 controls TC
#> GSM706466 GSM706466 controls TC
#> GSM706467 GSM706467 controls TC
#> GSM706468 GSM706468 controls TC
#> GSM706469 GSM706469 controls TC
# Define our conditions for trait 1 (disease) and 2 (brain region)
conditions1 = unique(sampleTable[,2])
conditions2 = unique(sampleTable[,3])
Also, let’s find best trait for each module and identify outlier modules (ie modules driven by a single sample).
Let’s use power = 10 since Voineagu et al. 2011 used that for all their networks. They also constructed “unsigned” networks and a mergeCutHeight of 0.10 so that modules with similar expression are merged. This step may take a while.
# Construct the combined networks and all the sub-networks (autism only, controls only, FC only, and TC only)
autism_networks = constructNetworks(autism_se, sampleTable, conditions1, conditions2,
networkType = "unsigned", power = 10,
minModuleSize = 40, maxBlockSize = 25000,
reassignThreshold = 0, minKMEtoStay = 0.7,
mergeCutHeight = 0.10, numericLabels = TRUE,
pamRespectsDendro = FALSE, verbose=3)
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Calculating module eigengenes block-wise from all genes
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..Working on block 1 .
#> TOM calculation: adjacency..
#> ..will not use multithreading.
#> Fraction of slow calculations: 0.000000
#> ..connectivity..
#> ..matrix multiplication (system BLAS)..
#> ..normalization..
#> ..done.
#> ....clustering..
#> ....detecting modules..
#> ....calculating module eigengenes..
#> ....checking kME in modules..
#> ..removing 163 genes from module 1 because their KME is too low.
#> ..removing 126 genes from module 2 because their KME is too low.
#> ..removing 59 genes from module 3 because their KME is too low.
#> ..removing 18 genes from module 4 because their KME is too low.
#> ..removing 28 genes from module 5 because their KME is too low.
#> ..removing 24 genes from module 6 because their KME is too low.
#> ..removing 9 genes from module 7 because their KME is too low.
#> ..merging modules that are too close..
#> mergeCloseModules: Merging modules whose distance is less than 0.1
#> Calculating new MEs...
#> softConnectivity: FYI: connecitivty of genes with less than 20 valid samples will be returned as NA.
#> ..calculating connectivities..
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Calculating module eigengenes block-wise from all genes
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..Working on block 1 .
#> TOM calculation: adjacency..
#> ..will not use multithreading.
#> Fraction of slow calculations: 0.000000
#> ..connectivity..
#> ..matrix multiplication (system BLAS)..
#> ..normalization..
#> ..done.
#> ....clustering..
#> ....detecting modules..
#> ....calculating module eigengenes..
#> ....checking kME in modules..
#> ..removing 139 genes from module 1 because their KME is too low.
#> ..removing 82 genes from module 2 because their KME is too low.
#> ..removing 95 genes from module 3 because their KME is too low.
#> ..removing 37 genes from module 4 because their KME is too low.
#> ..removing 35 genes from module 5 because their KME is too low.
#> ..removing 36 genes from module 6 because their KME is too low.
#> ..removing 27 genes from module 7 because their KME is too low.
#> ..merging modules that are too close..
#> mergeCloseModules: Merging modules whose distance is less than 0.1
#> Calculating new MEs...
#> softConnectivity: FYI: connecitivty of genes with less than 10 valid samples will be returned as NA.
#> ..calculating connectivities..
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Calculating module eigengenes block-wise from all genes
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..Working on block 1 .
#> TOM calculation: adjacency..
#> ..will not use multithreading.
#> Fraction of slow calculations: 0.000000
#> ..connectivity..
#> ..matrix multiplication (system BLAS)..
#> ..normalization..
#> ..done.
#> ....clustering..
#> ....detecting modules..
#> ....calculating module eigengenes..
#> ....checking kME in modules..
#> ..removing 212 genes from module 1 because their KME is too low.
#> ..removing 107 genes from module 2 because their KME is too low.
#> ..removing 65 genes from module 3 because their KME is too low.
#> ..removing 51 genes from module 4 because their KME is too low.
#> ..removing 50 genes from module 5 because their KME is too low.
#> ..removing 40 genes from module 6 because their KME is too low.
#> ..merging modules that are too close..
#> mergeCloseModules: Merging modules whose distance is less than 0.1
#> Calculating new MEs...
#> softConnectivity: FYI: connecitivty of genes with less than 10 valid samples will be returned as NA.
#> ..calculating connectivities..
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Calculating module eigengenes block-wise from all genes
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..Working on block 1 .
#> TOM calculation: adjacency..
#> ..will not use multithreading.
#> Fraction of slow calculations: 0.000000
#> ..connectivity..
#> ..matrix multiplication (system BLAS)..
#> ..normalization..
#> ..done.
#> ....clustering..
#> ....detecting modules..
#> ....calculating module eigengenes..
#> ....checking kME in modules..
#> ..removing 164 genes from module 1 because their KME is too low.
#> ..removing 138 genes from module 2 because their KME is too low.
#> ..removing 86 genes from module 3 because their KME is too low.
#> ..removing 33 genes from module 4 because their KME is too low.
#> ..removing 35 genes from module 5 because their KME is too low.
#> ..removing 34 genes from module 6 because their KME is too low.
#> ..removing 19 genes from module 7 because their KME is too low.
#> ..merging modules that are too close..
#> mergeCloseModules: Merging modules whose distance is less than 0.1
#> Calculating new MEs...
#> softConnectivity: FYI: connecitivty of genes with less than 11 valid samples will be returned as NA.
#> ..calculating connectivities..
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> Calculating module eigengenes block-wise from all genes
#> Flagging genes and samples with too many missing values...
#> ..step 1
#> ..Working on block 1 .
#> TOM calculation: adjacency..
#> ..will not use multithreading.
#> Fraction of slow calculations: 0.000000
#> ..connectivity..
#> ..matrix multiplication (system BLAS)..
#> ..normalization..
#> ..done.
#> ....clustering..
#> ....detecting modules..
#> ....calculating module eigengenes..
#> ....checking kME in modules..
#> ..removing 146 genes from module 1 because their KME is too low.
#> ..removing 119 genes from module 2 because their KME is too low.
#> ..removing 52 genes from module 3 because their KME is too low.
#> ..removing 31 genes from module 4 because their KME is too low.
#> ..removing 34 genes from module 5 because their KME is too low.
#> ..removing 12 genes from module 6 because their KME is too low.
#> ..removing 13 genes from module 7 because their KME is too low.
#> ..removing 10 genes from module 8 because their KME is too low.
#> ..merging modules that are too close..
#> mergeCloseModules: Merging modules whose distance is less than 0.1
#> Calculating new MEs...
#> softConnectivity: FYI: connecitivty of genes with less than 9 valid samples will be returned as NA.
#> ..calculating connectivities..
For calculating significance of overlap, we use the hypergeometric test (also known as the one-sided Fisher exact test). We’ll save the results in a list. Then, let’s take a look at the mutual best matches between autism modules and control modules.
# Save results to a list
results=list()
results$overlaps=iterate(autism_networks, overlapComparisons, plot=TRUE)
#>
#> #### comparing combined and autism ####
#>
#> #### comparing combined and controls ####
#>
#> #### comparing combined and FC ####
#>
#> #### comparing combined and TC ####