CatsCradle Spatial Vignette

Anna Laddach and Michael Shapiro

2025-02-13

BiocStyle

Introduction

Here we describe the tools that CatsCradle offers for exploiting the spatial relationships in spatial transcriptomics data.

We will be using a subset of a Xenium data set that profiles the mouse hippocampus available from 10x genomics (https://www.10xgenomics.com/datasets/fresh-frozen-mouse-brain-for-xenium-explorer-demo-1-standard)

Here we visualise this subset coloured by cell type. Here cell clusters (Louvain cluster) have been assigned cell type identities using RCTD along with a reference dataset (https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1) followed by minimal manual curation. Please note these assignments are not definitive and are for illustratory purporses only.

library(Seurat,quietly=TRUE)
library(CatsCradle,quietly=TRUE)
getExample = make.getExample()
smallXenium = getExample('smallXenium')
ImageDimPlot(smallXenium, cols = "polychrome", size = 1)

We will want to answer questions like the following:

Is the expression of a particular gene localised? Who are the immediate neighbours of a given cell? Do certain types of cells tend to co-localise? Is the spatial association of two cell types, (say, glia and macrophages) statistically significant? For any pair of neighbouring cells, are they engaged in ligand-repceptor interactions? Can we classify different types of tissue neighbourhoods? Are different neighbourhood types characterised by the cell types found in them? The genes expressed in them? The ligand-receptor interactions taking place in them?

Neighbourhoods

A key concept here is that of a neighbourhood. A neighbourhood is a spatially contiguous set of cells. In every case we will consider, a neighbourhood is a set of cells centred on a particular cell.

The simplest sort of neighbourhood consists of a cell together with its immediate neighbours. We compute these as a Delaunay triangulation using the centroids of the cells:

centroids = GetTissueCoordinates(smallXenium)
rownames(centroids) = centroids$cell
clusters = smallXenium@active.ident
delaunayNeighbours = computeNeighboursDelaunay(centroids)
head(delaunayNeighbours)
#>   nodeA nodeB
#> 1 16307 16316
#> 2 16314 16317
#> 3 16296 16299
#> 4 16299 16307
#> 5 16309 16316
#> 6 16309 16314

Two cells are neighbours if they appear on the same row. The neighbourhood of the cell 16307 consists of the following 7 cells:

idx = (delaunayNeighbours$nodeA == 16307 |
       delaunayNeighbours$nodeB == 16307)
nbhd = unique(c(delaunayNeighbours$nodeA[idx],
                delaunayNeighbours$nodeB[idx]))
nbhd        
#> [1] "16307" "16299" "16303" "16298" "16296" "16316" "16317"

We can compute extended neighbourhoods by considering (say) each cell’s neighbours, their neighbours, their neighbours and their neighbours. Here we have such an extended neighbourhood as a list and again as a data frame with each row giving a pair of cells in a common extended neighbourhood. In this case we are building an extended neighbourhood of combinatorial radius 4.

extendedNeighboursList = getExtendedNBHDs(delaunayNeighbours, 4)
#> radius 2
#> radius 3
#> radius 4
extendedNeighbours = collapseExtendedNBHDs(extendedNeighboursList, 4)

Now the extended neighbourhood of cell 16307 consists of 92 cells:

idx = (extendedNeighbours$nodeA == 16307 |
       extendedNeighbours$nodeB == 16307)
nbhd = unique(c(extendedNeighbours$nodeA[idx],
                extendedNeighbours$nodeB[idx]))
length(nbhd)        
#> [1] 92

We can also create neighbours based on Euclidean distance in the tissue.

euclideanNeighbours = computeNeighboursEuclidean(centroids,
threshold=20)

We see two reasons for looking at these extended neighbourhoods. One is that while the Delaunay triangulation might be more appropriate for looking at cell to cell interactions involving direct contact, extended neighbourhoods might be more appropriate for studying interactions based on diffusible ligands. The second is that when we go on to characterise types of neighbourhoods, these extended neighbourhoods exhibit less noise.

We offer two viewpoints as to what is “going on” in a neighbourhood. One is that a neighbourhood is characterised by the cell types that are found in it. In this view, just as a cell expresses genes, a neighbourhood “expresses” cell types. A second viewpoint is that a neighbourhood also expresses genes. That is, for each neighbourhood, we can compute the total gene expression across all the cells in that neighbourhood.

In this vignette we will focus on the first viewpoint. This is not because we think the first viewpoint is more important or because we necessarily expect it to be more fruitful. It’s because the second viewpoint so directly parallels standard Seurat single cell transcriptomics analyses that it requires less explanation. So before we move on to the neighbourhoods / cell types viewpoint, we give an overview of the neighbourhoods / aggregate gene expression viewpoint.

Neighbourhoods as characterised by gene expression

Here we create a Seurat object based on the aggregate gene expression in each of our extended neighbourhoods and display the clustering of the neighbourhoods (called aggregation_clusters) both on the tissue plot and on the resulting UMAP.

agg = aggregateGeneExpression(smallXenium,extendedNeighbours,
                                    verbose=FALSE)
#> Warning: Data is of class matrix. Coercing to dgCMatrix.
smallXenium$aggregateNBHDClusters = agg@active.ident
ImageDimPlot(smallXenium,group.by='aggregateNBHDClusters',cols='polychrome')

Since neighbourhoods here are indexed by individual cells, the neighbourhood gene aggregation Seurat object is formally identical to a standard Seurat object. In particular, this allows the standard sorts of analyses including clustering of neighbourhoods into types; dimension reduction using PCA, UMAP, tSNE; discovery of marker genes for each neighbourhood type; plotting of aggregate gene expression on neighbourhood UMAP; use of CatsCradle to discover novel clustering of genes based on their expression across the different neighbourhoods.

Neighbourhoods as characterised by cell types

Calculation of neighbourhood celltype composition

Given any of the spatial graphs describing neighbourhoods calculated above, we can now calculate the cell type composition of neighbourhoods.

NBHDByCTMatrix = computeNBHDByCTMatrix(delaunayNeighbours, clusters)

In the resulting matrix neighbourhoods are rows and cell types are columns. The values in the matrix indicate the number of cells of a given type within a neighbourhood.

Let’s do the same for our extended neighbourhoods.

NBHDByCTMatrixExtended = 
  computeNBHDByCTMatrix(extendedNeighbours, clusters)

Analysis of contact based interactions between cell types

We can go on to calculate a matrix which gives the fraction of contacts cells of a given type make with cells of another cell type.

cellTypesPerCellTypeMatrix = 
  computeCellTypesPerCellTypeMatrix(NBHDByCTMatrix,clusters)

Rows and columns both correspond to cell types, but they are playing different roles. For a given row (say, cell type A) the entry in column B represents the fraction of cells in neighbourhoods of cells of type A that are of type B.

We can then display this as a force directed graph. Here we choose only to display contact based interactions that constitute at least 5% of a cell type’s interactions. Of note, this graph is directed as, for example, 50% of cell type A’s interactions might be with cell type B, but only 5% of cell type B’s interactions might be with cell type A.

colours = DiscretePalette(length(levels(clusters)), palette = "polychrome")
names(colours) = levels(clusters)

cellTypesPerCellTypeGraphFromCellMatrix(cellTypesPerCellTypeMatrix, 
                                    minWeight = 0.05, colours = colours)

#> IGRAPH 56ca6cd DNW- 16 55 -- 
#> + attr: coords (g/n), name (v/c), color (v/c), weight (e/n), width
#> | (e/n)
#> + edges from 56ca6cd (vertex names):
#>  [1] 0_Oligo->3_Endo       0_Oligo->5_Astro      0_Oligo->10_Astro    
#>  [4] 0_Oligo->11_Oligo     0_Oligo->13_Microglia 0_Oligo->14_Ependymal
#>  [7] 3_Endo ->0_Oligo      3_Endo ->5_Astro      3_Endo ->10_Astro    
#> [10] 3_Endo ->11_Oligo     4_Astro->0_Oligo      4_Astro->3_Endo      
#> [13] 4_Astro->5_Astro      4_Astro->6_VLMC       4_Astro->7_Granule   
#> [16] 4_Astro->10_Astro     4_Astro->11_Oligo     4_Astro->14_Ependymal
#> [19] 4_Astro->17_Lamp5     4_Astro->20_VLMC      5_Astro->3_Endo      
#> + ... omitted several edges

Here arrows are directed from rows to columns. Thus, we see an arrow from 12_Pvalb to 18_pyramidal because neighbourhoods of cells of type 12_Pvalb are composed (across the dataset) of ~15% cells of type 18_pyramidal. We do not see an arrow from 18_pyramidal to 12_Pvalb because neighbourhoods of cells of type 18_pyramidal have only 3% cells of type 12_Pvalb, which falls below the chosen cutoff.

It’s worth pointing out the following. The number of edges from cells of type A to cells of type B is the same as the number of edges from cells of type B to cells of type A. Thus the matrix of counts of these edges is symmetric. However, numbers of cells of types A and B are not necessarily equal, and this accounts for the assymmetry in the fractions.

library(pheatmap,quietly=TRUE)
pheatmap(cellTypesPerCellTypeMatrix)

Let’s do the same for our extended nighbourhoods.

cellTypesPerCellTypeMatrixExtended = computeCellTypesPerCellTypeMatrix(NBHDByCTMatrixExtended,clusters)

cellTypesPerCellTypeGraphFromCellMatrix(cellTypesPerCellTypeMatrixExtended,
minWeight = 0.05, colours = colours)

#> IGRAPH 9b5c991 DNW- 16 62 -- 
#> + attr: coords (g/n), name (v/c), color (v/c), weight (e/n), width
#> | (e/n)
#> + edges from 9b5c991 (vertex names):
#>  [1] 0_Oligo->3_Endo       0_Oligo->10_Astro     0_Oligo->11_Oligo    
#>  [4] 0_Oligo->14_Ependymal 3_Endo ->0_Oligo      3_Endo ->5_Astro     
#>  [7] 3_Endo ->6_VLMC       3_Endo ->10_Astro     3_Endo ->11_Oligo    
#> [10] 3_Endo ->14_Ependymal 4_Astro->0_Oligo      4_Astro->5_Astro     
#> [13] 4_Astro->6_VLMC       4_Astro->7_Granule    4_Astro->10_Astro    
#> [16] 4_Astro->11_Oligo     4_Astro->14_Ependymal 4_Astro->20_VLMC     
#> [19] 5_Astro->6_VLMC       5_Astro->11_Oligo     6_VLMC ->10_Astro    
#> + ... omitted several edges

We can also calculate p values (upper tail, one-sided) for whether cell types are more commonly neighbours than expected by chance. To do this we compare the actual spatial neighbour graph to randomised spatial neighbour graphs where edges are randomised but the degree of each vertice is preserved. As this is carried out on the level of counts of undirected edges between celltypes the p value matrix is symmetric.

cellTypesPerCellTypePValues = computeNeighbourEnrichment(delaunayNeighbours, 
                                          clusters, nSim=100,verbose = FALSE)
#> Warning in simGraph$nodeB[selfEdge] <- nodeB[(selfEdge + 1)%%n]: number of
#> items to replace is not a multiple of replacement length
#> Warning in simGraph$nodeB[(selfEdge + 1)%%n] <- nodeB[selfEdge]: number of
#> items to replace is not a multiple of replacement length

Let’s plot -log10(pvalue)

cellTypesPerCellTypePValuesNegLog = -log10(cellTypesPerCellTypePValues)
pheatmap(cellTypesPerCellTypePValuesNegLog)

Analysis of neighbourhoods based on cell type composition

As mentioned above we can create Seurat objects of neighbourhoods where the underlying counts matrix gives the the number of cells of each type in a given neighbourhood. We can now perform dimensionality reduction and clustering based on neighbourhood composition. As the dimensionality of the feature space is relatively low (number of cell types) we calculate the UMAP using features rather than PCs.

NBHDByCTSeurat = computeNBHDVsCTObject(NBHDByCTMatrix,verbose=FALSE)
#> Warning: Feature names cannot have underscores ('_'), replacing with dashes
#> ('-')
#> Warning: Data is of class matrix. Coercing to dgCMatrix.
#> Warning in svd.function(A = t(x = object), nv = npcs, ...): You're computing
#> too large a percentage of total singular values, use a standard svd instead.
#> Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
#> 
#> Number of nodes: 4261
#> Number of edges: 165647
#> 
#> Running Louvain algorithm...
#> Maximum modularity in 10 random starts: 0.9825
#> Number of communities: 17
#> Elapsed time: 0 seconds

Add cell type information to the neighbourhoodSeurat object.

NBHDByCTSeurat$cellType = clusters

Visualise neighbourhood clusters.

DimPlot(NBHDByCTSeurat, group.by = c("cellType"), cols = "polychrome", reduction = "umap")

DimPlot(NBHDByCTSeurat, group.by = c("neighbourhood_clusters"), cols = "polychrome", reduction = "umap")

We can now add information on neighbourhood clusters to our original Xenium object and visualise these on the tissue.

smallXenium$NBHDCluster = NBHDByCTSeurat@active.ident
ImageDimPlot(smallXenium, group.by = "NBHDCluster", size = 1, cols = "polychrome")