% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/linkMultipleVariants.R
\name{linkMultipleVariants}
\alias{linkMultipleVariants}
\title{Process an experiment with multiple variable sequences}
\usage{
linkMultipleVariants(combinedDigestParams = list(), ...)
}
\arguments{
\item{combinedDigestParams}{A named list of arguments to
\code{digestFastqs} for the combined ("naive") run.}

\item{...}{Additional arguments, each of which must be a named list of
arguments to \code{digestFastqs} for one of the separate runs (processing
each variable sequence in turn). Each list can additionally contain the
arguments \code{collapseMaxDist}, \code{collapseMinScore} and
\code{collapseMinRatio}, which will be passed on to
\code{collapseMutantsBySimilarity}.}
}
\value{
A list with the following elements: 
\itemize{
\item countAggregated - a \code{tibble} with columns corresponding to 
    each of the variable sequences, and a column with the total observed 
    read count for the combination.
\item convSeparate - a list of conversion tables from the respective 
    separate runs.
\item outCombined - the \code{digestFastqs} output for the combined run.
}
}
\description{
This function enables the processing of data sets with multiple
variable sequences that may need to be handled in different
ways. An example is a barcode association experiment
with two variable sequences (the barcode and the biological variant)
that need to be processed differently, e.g. in terms of matching to
wildtype sequences or collapsing of similar sequences.
In contrast, while \code{digestFastqs} allows the specification
of multiple variable sequences (within each of the forward and reverse
reads), they will be concatenated and processed as a single unit.
}
\details{
\code{linkMultipleVariants} processes the input in the following way:
\itemize{
\item First, run \code{digestFastqs} with the parameters provided
    in \code{combinedDigestParams}. Typically, this will be a
    "naive" counting run, where the frequencies of all observed
    variants are tabulated. The variable sequences
    within the forward and reverse reads, respectively, will be
    processed as a single sequence. 
\item Next, run \code{digestFastqs} with each of the additional
    parameter sets provided (\code{...}). Each of these should
    correspond to a single variable sequence from the combined
    run (i.e., if there are two Vs in the element specifications
    in the combined run, there should be two additional
    parameter sets provided, each corresponding to the
    processing of one variable sequence part). It is assumed
    that the order of the additional arguments corresponds to the
    order of the variable sequences in the combined run, in such a way
    that if the variable sequences extracted in each of the separate
    runs are concatenated in the order that the parameter sets are
    provided to \code{linkMultipleVariants}, they will form the variable
    sequence extracted in the combined run.
\item The result of each of the separate runs is a 'conversion table',
    containing the final set of identified sequence variants as well
    as all individual sequences corresponding to each of them. This
    is then combined with the count table from the combined, "naive"
    run in order to create an aggregated count table. More precisely,
    each sequence in the combined run is split into its constituent
    variable sequences, and each variable sequence is then matched to
    the output from the corresponding separate run, from which the final
    feature ID (mutant name, or collapsed sequence) is extracted and used
    to replace the original sequence in the combined count table. Once all
    the matches are done,
    rows with NAs (where no match could be found in the separate run)
    are removed and the counts are aggregated across all identical
    combinations of variable sequences.
}
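
The splitting, matching and aggregation in the last step can be sketched
in base R as follows. This is only a minimal illustration on toy data,
assuming fixed-length variable sequences; the object and column names
(\code{combined}, \code{convBarcode}, \code{convVariant}, \code{nbrReads})
are hypothetical and reflect neither the internal implementation nor the
actual layout of the conversion tables.

\preformatted{
## toy count table from the combined run (hypothetical data);
## each sequence is a 2-nt barcode followed by a 3-nt variant
combined <- data.frame(sequence = c("AAGGG", "ATGGG", "AAGGC", "CCTTT"),
                       nbrReads = c(5, 2, 3, 1))

## toy conversion tables: original sequence -> final feature ID
convBarcode <- c(AA = "AA", AT = "AA")      # AT is collapsed into AA
convVariant <- c(GGG = "WT", GGC = "mut1")  # CC and TTT have no match

## split each combined sequence into its constituent parts
barcode <- substr(combined$sequence, 1, 2)
variant <- substr(combined$sequence, 3, 5)

## replace each part by its feature ID (NA where no match exists)
agg <- data.frame(barcode = unname(convBarcode[barcode]),
                  variant = unname(convVariant[variant]),
                  nbrReads = combined$nbrReads)

## drop rows with NAs and sum counts over identical combinations
agg <- agg[stats::complete.cases(agg), ]
agg <- stats::aggregate(nbrReads ~ barcode + variant, data = agg, FUN = sum)
}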

In order to define the \code{elementsForward} and \code{elementsReverse}
arguments for the separate runs, a strategy that often works is to simply
copy the arguments from the combined run, and successively replace all
but one of the 'V's by 'S'. This will effectively process one variable
sequence at a time, while keeping all other elements of the reads
consistent (since this can affect e.g. filtering criteria). Note that
to process individual variable sequences in the reverse read, you also
need to swap the 'forward' and 'reverse' specifications (since
\code{digestFastqs} requires a forward read).
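
As a sketch of this replacement strategy, the element specification for
each separate run can be derived from the combined one by turning all but
one 'V' into 'S'. The helper \code{keepOneV} is hypothetical and not part
of the package; it assumes that the elements are given as a single string
such as \code{"SVCV"}.

\preformatted{
keepOneV <- function(elements, keep) {
    chars <- strsplit(elements, "")[[1]]
    vPos <- which(chars == "V")
    ## turn every 'V' except the keep-th one into 'S'
    chars[vPos[-keep]] <- "S"
    paste(chars, collapse = "")
}
keepOneV("SVCV", keep = 1)  ## "SVCS"
keepOneV("SVCV", keep = 2)  ## "SSCV"
}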
}
\examples{
fqFile <- system.file("extdata", "cisInput_1.fastq.gz", 
                      package = "mutscan")
out <- linkMultipleVariants(
    combinedDigestParams = list(fastqForward = fqFile, 
                                elementsForward = "SVCV", 
                                elementLengthsForward = c(1, 10, 18, 96)),
    # the first variable sequence is the UMI
    umi = list(fastqForward = fqFile, elementsForward = "SVCS",
               elementLengthsForward = c(1, 10, 18, 96)),
    # the second variable sequence is the amplicon variant
    var = list(fastqForward = fqFile, elementsForward = "SSCV",
               elementLengthsForward = c(1, 10, 18, 96), 
               collapseMaxDist = 3, collapseMinScore = 1)
)
# conversion tables
lapply(out$convSeparate, head)
# aggregated count table
head(out$countAggregated)

}
\author{
Charlotte Soneson, Michael Stadler
}
