1 Learning latent variable graphical models

It can be shown that unobserved, latent variables introduce artifacts into empirical estimates of OTU-OTU associations. These effects can be removed from the network by treating the inverse covariance selection problem as a sparse + low-rank decomposition (SPIEC-EASI slr), where the sparse component are the associations encoded as a conditional independence graph, and the low-rank components are the isolated latent effects.

Please see the preprint and the manuscript Synapse project or github repository for more details.

To demonstrate this in action we can show that removing latent effects improves a consistency measure between round 1 and round 2 of the American Gut project data.

library(SpiecEasi)
library(phyloseq)
data(hmp2)
data(amgut1.filt)
data(amgut2.filt.phy)

First we fit the networks, assuming that there are 10 latent components in the dataset:

se1.mb.amgut <- spiec.easi(amgut1.filt, method='mb', lambda.min.ratio=1e-2,
                          nlambda=20, pulsar.params=list(rep.num=20, ncores=4))
se2.mb.amgut <- spiec.easi(amgut2.filt.phy, method='mb', lambda.min.ratio=1e-2,
                          nlambda=20, pulsar.params=list(rep.num=20, ncores=4))

se1.slr.amgut <- spiec.easi(amgut1.filt, method='slr', r=10, lambda.min.ratio=1e-2,
                          nlambda=20, pulsar.params=list(rep.num=20, ncores=4))
se2.slr.amgut <- spiec.easi(amgut2.filt.phy, method='slr', r=10, lambda.min.ratio=1e-2,
                          nlambda=20, pulsar.params=list(rep.num=20, ncores=4))

Then we compare the consistency between the edge sets within each method using the Jaccard index:

otu1 <- colnames(amgut1.filt)
otu2 <- taxa_names(amgut2.filt.phy)
edge.diss(getRefit(se1.mb.amgut), getRefit(se2.mb.amgut), 'jaccard', otu1, otu2)
# [1] 0.2524272
edge.diss(getRefit(se1.slr.amgut), getRefit(se2.slr.amgut), 'jaccard', otu1, otu2)
# [1] 0.2333333

Consistency should be a bit better for the slr networks.

Construct the robust PCA from amgut2 data:

# Use cached X and L if available, otherwise extract from objects
if (!exists("X") || !exists("L")) {
  X <- se2.slr.amgut$est$data
  L <- se2.slr.amgut$est$resid[[getOptInd(se2.slr.amgut)]]
}
pca <- robustPCA(X, L)

We can also check the correlation between AGP meta-data and the latent factors (scores of the robust PCA):

age <- as.numeric(as.character(sample_data(amgut2.filt.phy)$AGE))
# Warning: NAs introduced by coercion
bmi <- as.numeric(as.character(sample_data(amgut2.filt.phy)$BMI))
# Warning: NAs introduced by coercion
depth <- colSums(otu_table(amgut2.filt.phy))

cor(age, pca$scores, use='pairwise')
#          [,1]        [,2]        [,3]       [,4]        [,5]         [,6]
# [1,] 0.173432 -0.08552127 -0.03510872 0.05100723 -0.03731325 -0.001999364
#           [,7]       [,8]        [,9]       [,10]
# [1,] 0.2704554 -0.1544349 -0.01230709 -0.02630695
cor(bmi, pca$scores, use='pairwise')
#           [,1]        [,2]       [,3]        [,4]        [,5]       [,6]
# [1,] 0.1543099 -0.02068402 0.05674442 0.007696536 -0.01501243 0.09462176
#              [,7]       [,8]      [,9]      [,10]
# [1,] 0.0006481053 -0.1362486 0.1268026 -0.1738632
cor(depth, pca$scores, use='pairwise')
#           [,1]      [,2]       [,3]       [,4]      [,5]      [,6]       [,7]
# [1,] 0.2104897 0.5770217 0.08547699 -0.3663146 0.1357491 0.1297332 0.05664281
#             [,8]      [,9]       [,10]
# [1,] 0.002289043 0.1283744 0.003160008

1.1 Key differences from standard SPIEC-EASI

The slr method in SpiecEasi implements the sparse + low-rank decomposition approach, which:

  1. Removes latent effects: By decomposing the inverse covariance into sparse and low-rank components, we can isolate and remove the effects of unobserved variables.

  2. Improves consistency: Networks inferred with latent variable correction tend to be more consistent across different datasets or time points.

  3. Parameter r: Specifies the number of latent components to model. This should be chosen based on domain knowledge or cross-validation.

  4. Computational considerations: The slr method is more computationally intensive than standard methods, so consider using parallel processing options.

Session info:

sessionInfo()
# R Under development (unstable) (2025-11-04 r88984)
# Platform: aarch64-apple-darwin20
# Running under: macOS Ventura 13.7.8
# 
# Matrix products: default
# BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
# LAPACK: /Library/Frameworks/R.framework/Versions/4.6-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
# 
# locale:
# [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# time zone: America/New_York
# tzcode source: internal
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] phyloseq_1.55.0  igraph_2.2.1     Matrix_1.7-4     SpiecEasi_1.99.3
# [5] BiocStyle_2.39.0
# 
# loaded via a namespace (and not attached):
#  [1] gtable_0.3.6        shape_1.4.6.1       xfun_0.54          
#  [4] bslib_0.9.0         ggplot2_4.0.0       rhdf5_2.55.8       
#  [7] Biobase_2.71.0      lattice_0.22-7      rhdf5filters_1.23.0
# [10] vctrs_0.6.5         tools_4.6.0         generics_0.1.4     
# [13] biomformat_1.39.0   stats4_4.6.0        parallel_4.6.0     
# [16] tibble_3.3.0        cluster_2.1.8.1     pkgconfig_2.0.3    
# [19] huge_1.3.5          data.table_1.17.8   RColorBrewer_1.1-3 
# [22] S7_0.2.0            S4Vectors_0.49.0    lifecycle_1.0.4    
# [25] farver_2.1.2        compiler_4.6.0      stringr_1.6.0      
# [28] Biostrings_2.79.2   tinytex_0.57        Seqinfo_1.1.0      
# [31] codetools_0.2-20    permute_0.9-8       htmltools_0.5.8.1  
# [34] sass_0.4.10         yaml_2.3.10         glmnet_4.1-10      
# [37] pillar_1.11.1       crayon_1.5.3        jquerylib_0.1.4    
# [40] MASS_7.3-65         cachem_1.1.0        vegan_2.7-2        
# [43] magick_2.9.0        iterators_1.0.14    foreach_1.5.2      
# [46] nlme_3.1-168        tidyselect_1.2.1    digest_0.6.38      
# [49] stringi_1.8.7       dplyr_1.1.4         reshape2_1.4.5     
# [52] bookdown_0.45       splines_4.6.0       ade4_1.7-23        
# [55] fastmap_1.2.0       grid_4.6.0          cli_3.6.5          
# [58] magrittr_2.0.4      dichromat_2.0-0.1   survival_3.8-3     
# [61] ape_5.8-1           scales_1.4.0        rmarkdown_2.30     
# [64] XVector_0.51.0      multtest_2.67.0     pulsar_0.3.11      
# [67] VGAM_1.1-13         evaluate_1.0.5      knitr_1.50         
# [70] IRanges_2.45.0      mgcv_1.9-4          rlang_1.1.6        
# [73] Rcpp_1.1.0          glue_1.8.0          BiocManager_1.30.26
# [76] BiocGenerics_0.57.0 jsonlite_2.0.0      R6_2.6.1           
# [79] Rhdf5lib_1.33.0     plyr_1.8.9