% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Brick_functions.R
\name{Create_many_Bricks}
\alias{Create_many_Bricks}
\title{Create the entire HDF5 structure and load the bintable}
\usage{
Create_many_Bricks(
    BinTable,
    bin_delim = "\\t",
    col_index = c(1, 2, 3),
    impose_discontinuity = TRUE,
    hdf_chunksize = NULL,
    output_directory = NA,
    file_prefix = NA,
    remove_existing = FALSE,
    link_existing = FALSE,
    experiment_name = NA,
    resolution = NA,
    type = c("both", "cis", "trans")
)
}
\arguments{
\item{BinTable}{\strong{Required}
A string containing the path to the file to load as the binning table for
the Hi-C experiment. The number of entries per chromosome defines the
dimension of the associated Hi-C data matrices. For example, if chr1
contains 250 entries in the binning table, the \emph{cis} Hi-C data matrix for
chr1 will be expected to contain 250 rows and 250 cols. Similarly, if the
same binning table contains 150 entries for chr2, the \emph{trans} Hi-C
matrix for chr1 vs chr2 will be expected to contain 250 rows and
150 cols.

There are no constraints on the bintable format. As long as the table is
in a delimited format, the corresponding table columns can be outlined with
the associated parameters. The columns of importance are chr, start and end.

It is recommended to always use binning tables where the end and start of
consecutive ranges are not the same. If they are the same, this may lead to
\strong{unexpected behaviour} when using the GenomicRanges "any" overlap
function.}

\item{bin_delim}{\strong{Optional}. Defaults to tabs.
A character vector of length 1 specifying the delimiter used in the file
containing the binning table.}

\item{col_index}{\strong{Optional}. Default c(1, 2, 3).
A numeric vector of length 3 containing the indexes of the required
columns in the binning table. The first index corresponds to the chr
column, the second to the start column and the third to the end column.}

\item{impose_discontinuity}{\strong{Optional}. Default TRUE.
If TRUE, a check is run to make sure that the end and start coordinates of
consecutive entries are not the same within each chromosome.}

\item{hdf_chunksize}{\strong{Optional}.
A numeric vector of length 1. If provided, the HDF dataset will use this
value as the chunk size, for all matrices. By default, the ChunkSize is
set to matrix dimensions/100.}

\item{output_directory}{\strong{Required}
A string specifying the location where the HDF files will be created.}

\item{file_prefix}{\strong{Required}
A string specifying the prefix that is concatenated to the hdf files stored
in the output_directory.}

\item{remove_existing}{\strong{Optional}. Default FALSE.
If TRUE, will remove the HDF file with the same name and create a new one.
By default, it will not replace existing files.}

\item{link_existing}{\strong{Optional}. Default FALSE.
If TRUE, will re-add the HDF file with the same name.
By default, this parameter is set to FALSE.}

\item{experiment_name}{\strong{Optional}.
If provided, this will be the experiment name for the BrickContainer.}

\item{resolution}{\strong{Required}.
A value of length 1 of class character or numeric specifying the resolution
of the Hi-C data loaded.}

\item{type}{\strong{Optional}. Default both.
A value from one of both, cis, trans specifying the type of matrices to load.
both will load cis (intra-chromosomal, e.g. chr1 vs chr1) and trans
(inter-chromosomal, e.g. chr1 vs chr2) Hi-C matrices, whereas cis and trans
will load only cis or only trans Hi-C matrices.}
}
\value{
This function will generate the target Brick file. Upon completion,
the function will return an object of class BrickContainer.
}
\description{
\code{Create_many_Bricks} creates the HDF file and returns a BrickContainer
}
\details{
This function creates the complete HDF data structure, loads the binning
table associated to the Hi-C experiment, creates a 2D matrix
layout for all specified chromosome pairs and creates a json file for the
project. At the end, this function will return an S4 object of class
BrickContainer. \strong{Please note}, the binning table must be a
discontinuous one (first range end != second range start),
as range overlaps using the "any" form will routinely identify adjacent
ranges with the same end and start as overlapping. Therefore, this
criterion is enforced as default behaviour.

The structure of the HDF file is as follows:
The structure contains three major groups which are then hierarchically
nested with other groups to finally lead to the corresponding datasets.
\itemize{
\item Base.matrices - \strong{group} For storing Hi-C matrices
\itemize{
\item chromosome - \strong{group}
\item chromosome - \strong{group}
\itemize{
\item attributes - \strong{attribute}
\itemize{
\item Filename - Name of the file
\item Min - min value of Hi-C matrix
\item Max - max value of Hi-C matrix
\item sparsity - specifies if this is a sparse matrix
\item distance - max distance of data from main diagonal
\item Done - specifies if a matrix has been loaded
}
\item matrix - \strong{dataset} - contains the matrix
\item chr1_bin_coverage - \strong{dataset} - proportion of row
cells with values greater than 0
\item chr1_row_sums - \strong{dataset} - total sum of all values
in a row
\item chr2_col_sums - \strong{dataset} - total sum of all values
in a col
\item chr2_bin_coverage - \strong{dataset} - proportion of col
cells with values greater than 0
\item sparsity - \strong{dataset} - proportion of non-zero cells
near the diagonal
}
}
\item Base.ranges - \strong{group}, Ranges tables for quick and easy
access. Additional ranges tables are added here under separate group names.
\itemize{
\item Bintable - \strong{group} - The main binning table associated
to a Brick.
\itemize{
\item ranges - \strong{dataset} - Contains the three main columns
chr, start and end.
\item offsets - \strong{dataset} - first occurrence of any given
chromosome in the ranges dataset.
\item lengths - \strong{dataset} - Number of occurrences of that
chromosome
\item chr.names - \strong{dataset} - What chromosomes are present
in the given ranges table.
}
}
\item Base.metadata - \strong{group}, A place to store metadata info
\itemize{
\item chromosomes - \strong{dataset} - Metadata information
specifying the chromosomes present in this particular Brick file.
\item other metadata tables.
}
}

Keep in mind that if the end coordinates and start coordinates of adjacent
ranges are not separated by at least a value of 1, then
impose_discontinuity = TRUE will likely cause an error to occur.
This may seem strict, but GenomicRanges by default will consider an
overlap of 1 bp as an overlap. Therefore, to be certain that ranges which
should not be retrieved are not targeted during retrieval operations, a
check is run to make sure that adjacent ends and starts are not
overlapping.
To load continuous ranges, use impose_discontinuity = FALSE.
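
As a rough illustration of the condition described above (a sketch in base
R, not the check used internally by the package), the hypothetical helper
\code{is_discontinuous} below assumes a table that is sorted by chr and
start and has at least two rows. It reports whether every start is greater
than the preceding end on the same chromosome:
\preformatted{
# Hypothetical helper for illustration only; not part of HiCBricks.
is_discontinuous <- function(bins) {
    n <- nrow(bins)
    # Compare each row with the row before it.
    same_chr <- bins$chr[-1] == bins$chr[-n]
    touching <- same_chr & bins$start[-1] <= bins$end[-n]
    !any(touching)
}

# Made-up bins: ends and following starts differ by 1.
discontinuous_bins <- data.frame(
    chr = c("chr1", "chr1", "chr1", "chr2", "chr2"),
    start = c(1, 100001, 200001, 1, 100001),
    end = c(100000, 200000, 300000, 100000, 200000)
)
# Made-up bins: each end equals the following start.
continuous_bins <- data.frame(
    chr = c("chr1", "chr1", "chr1"),
    start = c(0, 100000, 200000),
    end = c(100000, 200000, 300000)
)

is_discontinuous(discontinuous_bins)  # TRUE
is_discontinuous(continuous_bins)     # FALSE
}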

Also note that col_index determines which columns to use for chr, start
and end. Therefore, the original binning table may have 10 or 20 columns,
but only the three columns named by col_index are used, in the order chr,
start and end.
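
The column selection can be pictured with a small base R sketch. The table
and its extra columns (\code{id}, \code{score}) are made up for
illustration, and the subsetting shown is an assumption about the effect of
the parameter, not the package's own code. For a file laid out like this
table, one would pass \code{col_index = c(2, 4, 5)}:
\preformatted{
# Made-up binning table with extra columns around chr, start and end.
bin_table <- data.frame(
    id = 1:3,
    chr = c("chr1", "chr1", "chr2"),
    score = c(0.5, 0.1, 0.9),
    start = c(1, 100001, 1),
    end = c(100000, 200000, 100000)
)

# Positions of the chr, start and end columns, in that order.
col_index <- c(2, 4, 5)
ranges <- bin_table[, col_index]
colnames(ranges) <- c("chr", "start", "end")
ranges
}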
}
\examples{
Bintable.path <- system.file(file.path("extdata", "Bintable_100kb.bins"), 
package = "HiCBricks")
out_dir <- file.path(tempdir(), "Creator_test")
dir.create(out_dir)
My_BrickContainer <- Create_many_Bricks(BinTable = Bintable.path, 
    bin_delim = " ", output_directory = out_dir, file_prefix = "Test", 
    experiment_name = "Vignette Test", resolution = 100000, 
    remove_existing = TRUE)
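
# Illustrative sketch with a made-up binning table (not package data):
# the number of bins per chromosome sets the expected matrix dimensions,
# as described for the BinTable argument. All names below are hypothetical.
toy_bins <- data.frame(
    chr = rep(c("chr1", "chr2"), times = c(5, 3)),
    start = c(seq(1, 400001, by = 100000), seq(1, 200001, by = 100000)),
    end = c(seq(100000, 500000, by = 100000),
        seq(100000, 300000, by = 100000))
)
# Count the bins on each chromosome as a named integer vector.
bins_per_chr <- vapply(split(toy_bins$start, toy_bins$chr), length,
    integer(1))
# Expected rows and cols of the cis matrix for chr1 vs chr1: 5 and 5.
cis_dim <- c(bins_per_chr[["chr1"]], bins_per_chr[["chr1"]])
# Expected rows and cols of the trans matrix for chr1 vs chr2: 5 and 3.
trans_dim <- c(bins_per_chr[["chr1"]], bins_per_chr[["chr2"]])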

}
