% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/nnSVG.R
\name{nnSVG}
\alias{nnSVG}
\title{nnSVG}
\usage{
nnSVG(
  input,
  spatial_coords = NULL,
  X = NULL,
  assay_name = "logcounts",
  n_neighbors = 10,
  order = "AMMD",
  n_threads = 1,
  BPPARAM = NULL,
  verbose = FALSE
)
}
\arguments{
\item{input}{\code{SpatialExperiment} or \code{numeric} matrix: Input data,
which can either be a \code{SpatialExperiment} object or a \code{numeric}
matrix of values. If it is a \code{SpatialExperiment} object, it is assumed
to have an \code{assay} slot containing either logcounts (e.g. from the
\code{scran} package) or deviance residuals (e.g. from the \code{scry}
package), and a \code{spatialCoords} slot containing spatial coordinates of
the measurements. If it is a \code{numeric} matrix, the values are assumed
to already be normalized and transformed (e.g. logcounts), formatted as
\code{rows = genes} and \code{columns = spots}, and a separate
\code{numeric} matrix of spatial coordinates must also be provided with the
\code{spatial_coords} argument.}

\item{spatial_coords}{\code{numeric} matrix: Matrix containing columns of
spatial coordinates, formatted as \code{rows = spots}. This must be
provided if \code{input} is provided as a \code{numeric} matrix of values,
and is ignored if \code{input} is provided as a \code{SpatialExperiment}
object. Default = NULL.}

\item{X}{\code{numeric} matrix: Optional design matrix containing columns of
covariates per spatial location, e.g. known spatial domains. Number of rows
must match the number of spatial locations. Default = NULL, which fits an
intercept-only model.}

\item{assay_name}{\code{character}: If \code{input} is provided as a
\code{SpatialExperiment} object, this argument selects the name of the
\code{assay} slot in the input object containing the preprocessed gene
expression values. For example, \code{logcounts} for log-transformed
normalized counts from the \code{scran} package, or
\code{binomial_deviance_residuals} for deviance residuals from the
\code{scry} package. Default = \code{"logcounts"}, or ignored if
\code{input} is provided as a \code{numeric} matrix of values.}

\item{n_neighbors}{\code{integer}: Number of nearest neighbors for fitting
the nearest-neighbor Gaussian process (NNGP) model with BRISC. Default = 10,
which we recommend for most datasets. Higher values (e.g. 15) may give
slightly improved likelihood estimates in some datasets at the expense of
slower runtime, and lower values (e.g. 5) give faster runtime at the expense
of reduced performance.}

\item{order}{\code{character}: Ordering scheme to use for ordering
coordinates with BRISC. Default = \code{"AMMD"} for "approximate maximum
minimum distance", which is recommended for datasets with more than 65
spots. For very small datasets (n <= 65), \code{"Sum_coords"} can be used
instead. See BRISC documentation for details.}

\item{n_threads}{\code{integer}: Number of threads for parallelization.
Default = 1. We recommend setting this equal to the number of cores
available (if working on a laptop or desktop) or around 10 or more (if
working on a compute cluster).}

\item{BPPARAM}{\code{BiocParallelParam}: Optional argument for
parallelization, provided for advanced users of \code{BiocParallel} who need
additional flexibility on some operating systems. If provided, this should
be an instance of \code{BiocParallelParam}. For most users, the recommended
option is to use the \code{n_threads} argument instead. Default = NULL, in
which case \code{n_threads} is used.}

\item{verbose}{\code{logical}: Whether to display verbose output for model
fitting and parameter estimation from \code{BRISC}. Default = FALSE.}
}
\value{
If the input was provided as a \code{SpatialExperiment} object, the
  output values are returned as additional columns in the \code{rowData} slot
  of the input object. If the input was provided as a \code{numeric} matrix
  of values, the output is returned as a \code{numeric} matrix. The output
  values include spatial variance parameter estimates, likelihood ratio (LR)
  statistics, effect sizes (proportion of spatial variance), p-values, and
  multiple testing adjusted p-values.
}
\description{
Function to run 'nnSVG' method to identify spatially variable genes (SVGs) in
spatially-resolved transcriptomics data.
}
\details{
The 'nnSVG' method is based on nearest-neighbor Gaussian processes (Datta et
al. 2016) and uses the BRISC algorithm (Saha and Datta 2018) for model
fitting and parameter estimation. The method scales linearly with the number
of spatial locations, and can be applied to datasets containing thousands or
more spatial locations. For more details on the method, see our paper.

This function runs 'nnSVG' for a full dataset. The function fits a separate
model for each gene, using parallelization with BiocParallel for faster
runtime. The parameter estimates from BRISC (sigma.sq, tau.sq, phi) for each
gene are stored in 'Theta' in the BRISC output.

Note that the method and this function are designed for a single tissue
section. For an example of how to run nnSVG in a dataset consisting of
multiple tissue sections, see the tutorial in the nnSVG package vignette.

'nnSVG' performs inference on the spatial variance parameter estimates
(sigma.sq) using a likelihood ratio (LR) test against a simpler linear model
without spatial terms (i.e. without sigma.sq or phi). The estimated LR
statistics can then be used to rank SVGs. P-values are calculated from the LR
statistics using the asymptotic chi-squared distribution with 2 degrees of
freedom, and multiple testing adjusted p-values are calculated using the
Benjamini-Hochberg method. We also calculate an effect size, defined as the
proportion of spatial variance, 'prop_sv = sigma.sq / (sigma.sq + tau.sq)'.

The function assumes the input is provided either as a
\code{SpatialExperiment} object or a \code{numeric} matrix of values. If the
input is a \code{SpatialExperiment} object, it is assumed to contain an
\code{assay} slot containing either log-transformed normalized counts (also
known as logcounts, e.g. from the \code{scran} package) or deviance residuals
(e.g. from the \code{scry} package), which have been preprocessed, quality
controlled, and filtered to remove low-quality spatial locations. If the
input is a \code{numeric} matrix of values, these values are assumed to
already be normalized and transformed (e.g. logcounts).
}
\examples{
library(SpatialExperiment)
library(STexampleData)
library(scran)


### Example 1
### for more details see extended example in vignette

# load example dataset from STexampleData package
spe1 <- Visium_humanDLPFC()

# preprocessing steps
# keep only spots over tissue
spe1 <- spe1[, colData(spe1)$in_tissue == 1]
# skip spot-level quality control (already performed in this dataset)
# filter low-expressed and mitochondrial genes
spe1 <- filter_genes(spe1)
# calculate logcounts using library size factors
spe1 <- computeLibraryFactors(spe1)
spe1 <- logNormCounts(spe1)

# select small number of genes for fast runtime in this example
set.seed(123)
ix <- c(
  which(rowData(spe1)$gene_name \%in\% c("PCP4", "NPY")), 
  sample(seq_len(nrow(spe1)), 2)
)
spe1 <- spe1[ix, ]

# run nnSVG
set.seed(123)
spe1 <- nnSVG(spe1)

# show results
rowData(spe1)
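
# Sketch (an assumption, not taken from the package vignette): the same genes
# can also be supplied as a plain numeric matrix together with a matrix of
# spatial coordinates, as described for the 'input' and 'spatial_coords'
# arguments. The object names 'mat1', 'coords1', and 'res1' are illustrative.
# This continues from the 'spe1' object created above.

# extract logcounts (rows = genes, columns = spots) as an ordinary matrix
mat1 <- as.matrix(assay(spe1, "logcounts"))
# extract spatial coordinates (rows = spots)
coords1 <- spatialCoords(spe1)

# run nnSVG on the matrix input
set.seed(123)
res1 <- nnSVG(mat1, spatial_coords = coords1)

# for matrix input, the results are returned as a numeric matrix
head(res1)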


### Example 2: With covariates
### for more details see extended example in vignette

# load example dataset from STexampleData package
spe2 <- SlideSeqV2_mouseHPC()

# preprocessing steps
# remove spots with NA cell type labels
spe2 <- spe2[, !is.na(colData(spe2)$celltype)]
# skip spot-level quality control (already performed in this dataset)
# filter low-expressed and mitochondrial genes
spe2 <- filter_genes(
  spe2, filter_genes_ncounts = 1, filter_genes_pcspots = 1, 
  filter_mito = TRUE
)
# calculate logcounts using library size normalization
spe2 <- computeLibraryFactors(spe2)
spe2 <- logNormCounts(spe2)

# select small number of genes for fast runtime in this example
set.seed(123)
ix <- c(
  which(rowData(spe2)$gene_name \%in\% c("Cpne9", "Rgs14")), 
  sample(seq_len(nrow(spe2)), 2)
)
spe2 <- spe2[ix, ]

# create model matrix for cell type labels
X <- model.matrix(~ colData(spe2)$celltype)

# run nnSVG with covariates
set.seed(123)
spe2 <- nnSVG(spe2, X = X)

# show results
rowData(spe2)
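

### Illustration: effect size and p-value calculations (base R only)
### sketch using made-up numbers, not output from nnSVG

# The values and object names below are hypothetical and chosen only to
# illustrate the calculations described in the Details section. They do not
# come from a fitted model and are not the package's internal code.
sigma_sq_demo <- c(0.8, 0.1, 0.02)   # spatial variance estimates
tau_sq_demo <- c(0.2, 0.4, 0.98)     # nonspatial variance estimates
lr_stat_demo <- c(150.3, 12.6, 0.7)  # likelihood ratio statistics

# effect size: proportion of spatial variance
prop_sv_demo <- sigma_sq_demo / (sigma_sq_demo + tau_sq_demo)

# p-values from the chi-squared distribution with 2 degrees of freedom
pval_demo <- pchisq(lr_stat_demo, df = 2, lower.tail = FALSE)

# multiple testing adjusted p-values using the Benjamini-Hochberg method
padj_demo <- p.adjust(pval_demo, method = "BH")

data.frame(prop_sv_demo, lr_stat_demo, pval_demo, padj_demo)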

}
