\name{h5dapply}
\alias{h5dapply}
\alias{h5dapply,numeric-method}
\alias{h5dapply,IRanges-method}
\alias{h5dapply,GRanges-method}

\title{h5dapply}
\description{
This is the central function of the h5vc package. It applies a function
along common dimensions of the datasets in a tally file.
}
\usage{
\S4method{h5dapply}{numeric}( ..., blocksize, range)
\S4method{h5dapply}{GRanges}( ..., group, range)
\S4method{h5dapply}{IRanges}( ..., range)
}
\arguments{
  \item{blocksize}{The size of the blocks in which to process the data (integer) }
  \item{...}{Further parameters to be handed over to \code{FUN}}
  \item{range}{The range along the specified dimensions that should be
  processed; this limits the apply operation to a specific region
  or set of samples. Optional (defaults to the whole chromosome). This can be a \code{GRanges} object, an \code{IRanges} object or a numeric vector of length 2 (i.e. \code{c(start, stop)})}
  \item{group}{The group (location) within the HDF5 file. When \code{range} is \code{numeric} or \code{IRanges}, this has to point to the location of the chromosome, e.g. \code{/ExampleTally/Chr7}. When \code{range} is a \code{GRanges} object, the chromosome information is encoded in the \code{GRanges} object directly and \code{group} should only point to the root group of the study, e.g. \code{/ExampleTally}}
}
\details{

Additional function parameters are:
\describe{
\item{filename}{ The name of a tally file to process }
\item{group}{The name of a group in that tally file }

\item{FUN}{The function to apply to each block, defaults to
  \code{function(x) x}, which returns the data as is (a list of
  arrays) }

\item{names}{The names of the datasets to extract,
  e.g. \code{c("Counts","Coverages")} - optional (defaults to all datasets)}
\item{dims}{The dimension to apply along for each dataset, in the same
  order as \code{names}. These should correspond to compatible
  dimensions between the datasets. Optional (defaults to the genomic position dimension)}

\item{samples}{Character vector of sample names - must match contents of sampleData stored in the \code{tallyFile}}
\item{sampleDimMap}{A list mapping dataset names to their respective sample dimensions - default provides values for "Counts", "Coverages", "Deletions" and "Reference"}
\item{verbose}{Boolean flag that controls the amount of messages being
  printed by \code{h5dapply}}
\item{BPPARAM}{BPPARAM object to be passed to the \code{\link[BiocParallel]{bplapply}} call used to apply \code{FUN} to the blocks - see \code{BiocParallel} documentation for details; if this is \code{NULL} a normal \code{lapply} will be used instead of \code{\link[BiocParallel]{bplapply}}.}
}
This function applies the function \code{FUN} to blocks along a specified
axis within the given tally file, group and datasets. For each block it
creates a list of arrays (one for each dataset) and processes that list
with \code{FUN}.

This is by far the most essential and powerful function within this
package since it allows the user to execute their own analysis functions
on the tallies stored within the HDF5 tally file.

The supplied function \code{FUN} must accept a parameter \code{data} or \code{...} (the former is the expected behaviour), which \code{h5dapply} supplies for each block. This structure is a \code{list} with one slot for each dataset specified in the \code{names} argument to \code{h5dapply}, containing the array corresponding to the current block in the given dataset. Furthermore, the slot \code{h5dapplyInfo} is reserved and contains another \code{list} with the following content:

\describe{
\item{\code{Blockstart}}{An integer specifying the starting position of the current block (in the dimension specified by the \code{dims} argument to \code{h5dapply})}

\item{\code{Blockend}}{An integer specifying the end position of the current block (in the dimension specified by the \code{dims} argument to \code{h5dapply})}

\item{\code{Datasets}}{A \code{data.frame} as returned by \code{\link[rhdf5]{h5ls}}, listing all datasets present in the other slots of \code{data} with their group, name, dimensions, number of dimensions (\code{DimCount}) and the dimension that is used for splitting into blocks (\code{PosDim})}

\item{\code{Group}}{The name of the group as specified by the \code{group} argument to \code{h5dapply}}
}

}
\value{
A list with one entry per block, which is the result of applying
\code{FUN} to the datasets specified in the parameter \code{names}
within the block.
}
\author{
Paul Pyl
}

\examples{
  # loading library and example data
  library(h5vc)
  tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  # check the available samples and sampleData
  print(sampleData)
  data <- h5dapply( #extracting coverage using h5dapply
    filename = tallyFile,
    group = "/ExampleStudy/16",
    blocksize = 1000,
    FUN = function(x) rowSums(x$Coverages),
    names = c( "Coverages" ),
    range = c(29000000,29010000),
    verbose = TRUE
    )
  coverages <- do.call( rbind, data )
  colnames(coverages) <- sampleData$Sample[order(sampleData$Column)]
  coverages
  #Subsetting by Sample
  sampleData <- sampleData[sampleData$Patient == "Patient5",]
  data <- h5dapply( #extracting coverage using h5dapply
    filename = tallyFile,
    group = "/ExampleStudy/16",
    blocksize = 1000,
    FUN = function(x) rowSums(x$Coverages),
    names = c( "Coverages" ),
    range = c(29000000,29010000),
    samples = sampleData$Sample,
    verbose = TRUE
    )
  coverages <- do.call( rbind, data )
  colnames(coverages) <- sampleData$Sample[order(sampleData$Column)]
  coverages
  #Using GRanges and IRanges
  library(GenomicRanges)
  library(IRanges)
  granges <- GRanges(
    c(rep("16", 10), rep("22", 10)),
    ranges = IRanges(
      start = c(seq(29000000,29009000, 1000), seq(39000000,39009000, 1000)),
      width = 1000
    )
  )
  data <- h5dapply( #extracting coverage using h5dapply
    filename = tallyFile,
    group = "/ExampleStudy",
    blocksize = 1000,
    FUN = function(x) rowSums(x$Coverages),
    names = c( "Coverages" ),
    range = granges,
    verbose = TRUE
    )
  lapply( data, function(x) do.call(rbind, x) )
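  # Sketch: a FUN that reads the reserved h5dapplyInfo slot described in
  # the details section. blockSummary and mockBlock are hypothetical names.
  # mockBlock only imitates the list handed to FUN for a single block; its
  # values and the assumed layout of Coverages (sample x strand x position)
  # are made up for illustration, so this part runs without a tally file.
  blockSummary <- function(data) {
    info <- data$h5dapplyInfo
    data.frame(
      Group = info$Group,               # group the block was read from
      Start = info$Blockstart,          # first position of the block
      End = info$Blockend,              # last position of the block
      MeanCoverage = mean(data$Coverages)
    )
  }
  mockBlock <- list(
    Coverages = array(2L, dim = c(6, 2, 1000)),
    h5dapplyInfo = list(
      Blockstart = 29000000,
      Blockend = 29000999,
      Group = "/ExampleStudy/16"
    )
  )
  blockSummary(mockBlock)
  # The same function could then be passed to h5dapply as FUN = blockSummary,
  # and the per-block results combined with do.call( rbind, ... ) as above.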
}