% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/genome_download_helper.R
\name{get_noncoding_rna}
\alias{get_noncoding_rna}
\title{Download genome (fasta), annotation (GTF) and contaminants}
\usage{
get_noncoding_rna(ncRNA, output.dir, organism, gunzip)
}
\arguments{
\item{ncRNA}{logical or character, default FALSE (not used, no download).
If TRUE or a defined path, ncRNA is used as a contaminant reference.
If TRUE, will try to find ncRNA sequences in the gtf file, usually represented as
lncRNA (long noncoding RNAs). Will let you know if no ncRNA sequences were found in
the gtf.\cr If none are found, try character input:\cr
Alternatives: "auto":
will try to find the ncRNA file on NONCODE from the organism name
(Homo sapiens -> human, etc.). "auto" will not work for all organisms;
in that case you must specify the name used by
NONCODE, go to the link below to find it.
If not "auto" / "", it must be a character vector
with the species common name (not the scientific name): Homo sapiens is human,
Rattus norvegicus is rat, etc. The ncRNA sequences to filter out are then
downloaded from the NONCODE online server. If you cannot find the
common name, see: http://www.noncode.org/download.php/}

\item{output.dir}{directory to save downloaded data}

\item{organism}{scientific name of organism, Homo sapiens,
Danio rerio, Mus musculus, etc. See \code{biomartr:::get.ensembl.info()}
for full list of supported organisms.}

\item{gunzip}{logical, default TRUE, uncompress downloaded files
that are zipped when downloaded, should be TRUE!}
}
\value{
a named character vector of paths to the downloaded genome and gtf,
 and additional contaminants if used. If merge_contaminants is TRUE, will not
 give individual fasta files for the contaminants, but only the merged one.
}
\description{
This function automatically downloads (if the files do not already exist)
genomes and contaminants specified for genome alignment.
By default, it will use the Ensembl reference.
Upon completion, the function will store
a file called \code{file.path(output.dir, "outputs.rds")} with
the output paths of your completed genome/annotation downloads.
For most non-model nonvertebrate organisms, you need
my fork of biomartr for it to work:
\code{remotes::install_github("Roleren/biomartr")}
If you misspelled something or the function crashed, delete the wrong files and
run again.\cr
Set remake = TRUE to do it all over again.\cr
}
\details{
Some files that are made after download:\cr
- A fasta index for the genome\cr
- A TxDb to speed up GTF/GFF reading\cr
- Separate or merged contaminant files\cr
Files that can be made:\cr
- Gene symbols (hgnc, etc)\cr
- Uniprot ids (For name of protein structures)\cr
If you want a custom genome or gtf from your hard drive, assign existing
paths like this: \cr
annotation <- getGenomeAndAnnotation(GTF = "path/to/gtf.gtf",
genome = "path/to/genome.fasta")\cr
}
\examples{

## Get Saccharomyces cerevisiae genome and gtf (create txdb for R)
#getGenomeAndAnnotation("Saccharomyces cerevisiae", tempdir(), assembly_type = "toplevel")
## Download and add pseudo 5' UTRs
#getGenomeAndAnnotation("Saccharomyces cerevisiae", tempdir(), assembly_type = "toplevel",
#  pseudo_5UTRS_if_needed = 100)
## Get Danio rerio genome and gtf (create txdb for R)
#getGenomeAndAnnotation("Danio rerio", tempdir())

output.dir <- "/Bio_data/references/zebrafish"
## Get Danio rerio and phiX contaminants to deplete during alignment
#getGenomeAndAnnotation("Danio rerio", output.dir, phix = TRUE)
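## Sketch of the ncRNA options described under Arguments (left commented out,
## since they download data). This assumes getGenomeAndAnnotation() accepts
## an ncRNA argument and passes it on; check its usage before relying on it.
## ncRNA = TRUE: look for ncRNA sequences in the gtf
#getGenomeAndAnnotation("Danio rerio", output.dir, ncRNA = TRUE)
## ncRNA = "auto": let the function guess the NONCODE name from the organism
#getGenomeAndAnnotation("Danio rerio", output.dir, ncRNA = "auto")
## ncRNA = common name: "zebrafish" here is an assumed NONCODE name,
## verify it at http://www.noncode.org/download.php/
#getGenomeAndAnnotation("Danio rerio", output.dir, ncRNA = "zebrafish")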

## Optimize for ORFik (speed up for large annotations like human or zebrafish)
#getGenomeAndAnnotation("Danio rerio", tempdir(), optimize = TRUE)

# Drosophila melanogaster (toplevel exists only)
#getGenomeAndAnnotation("drosophila melanogaster", output.dir = file.path(config["ref"],
# "Drosophila_melanogaster_BDGP6"), assembly_type = "toplevel")
## How to save malformed refseq gffs:
## First run function and let it crash:
#annotation <- getGenomeAndAnnotation(organism = "Arabidopsis thaliana",
#  output.dir = "~/Desktop/test_plant/",
#  assembly_type = "primary_assembly", db = "refseq")
## Then apply a fix (example for linux, too long rows):
# fixed_gff <- fix_malformed_gff("~/Desktop/test_plant/Arabidopsis_thaliana_genomic_refseq.gff")
## Then update the arguments:
# annotation <- c(fixed_gff, "~/Desktop/test_plant/Arabidopsis_thaliana_genomic_refseq.fna")
# names(annotation) <- c("gtf", "genome")
# Then make the txdb (for faster R use)
# makeTxdbFromGenome(annotation["gtf"], annotation["genome"], organism = "Arabidopsis thaliana")
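
## Sketch: reload the output paths stored by an earlier completed run.
## Assumes outputs.rds is a regular RDS file readable with readRDS();
## nothing is read if the file does not exist.
paths_file <- file.path(output.dir, "outputs.rds")
if (file.exists(paths_file)) {
  # Named character vector of paths, as described under Value
  annotation <- readRDS(paths_file)
  print(annotation)
}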
}
\references{
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4919035/
}
\seealso{
Other STAR: 
\code{\link{STAR.align.folder}()},
\code{\link{STAR.align.single}()},
\code{\link{STAR.allsteps.multiQC}()},
\code{\link{STAR.index}()},
\code{\link{STAR.install}()},
\code{\link{STAR.multiQC}()},
\code{\link{STAR.remove.crashed.genome}()},
\code{\link{install.fastp}()}
}
\keyword{internal}
