% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/find_ORFs.R
\name{findUORFs_exp}
\alias{findUORFs_exp}
\title{Find upstream ORFs from transcript annotation}
\usage{
findUORFs_exp(
  df,
  faFile = findFa(df),
  leaders = loadRegion(txdb, "leaders"),
  startCodon = startDefinition(1),
  stopCodon = stopDefinition(1),
  longestORF = TRUE,
  minimumLength = 0,
  overlappingCDS = FALSE,
  cage = NULL,
  extension = 1000,
  filterValue = 1,
  restrictUpstreamToTx = FALSE,
  removeUnused = FALSE,
  save_optimized = FALSE
)
}
\arguments{
\item{df}{a txdb or \code{\link{experiment}}}

\item{faFile}{FaFile of genome, default findFa(df). Default only works for
ORFik experiments, if TxDb, input manually like: findFa(genome_path)}

\item{leaders}{GRangesList, default: loadRegion(txdb, "leaders").
If you do not have any good leader annotation, a hack is to use
\code{ORFik:::groupGRangesBy(startSites(loadRegion(txdb, "cds"),
 asGR = TRUE, keep.names = TRUE, is.sorted = TRUE))}}

\item{startCodon}{(character vector) Possible START codons to search for.
Check \code{\link{startDefinition}} for helper function. Note that it is
case sensitive, so "atg" would give 0 hits for a sequence with only capital
"ATG" ORFs.}

\item{stopCodon}{(character vector) Possible STOP codons to search for.
Check \code{\link{stopDefinition}} for helper function. Note that it is
case sensitive, so "tga" would give 0 hits for a sequence with only capital
"TGA" ORFs.}

\item{longestORF}{(logical) Default TRUE. Keep only the longest ORF per
unique stopcodon: (seqname, strand, stopcodon) combination, Note: Not longest
per transcript! You can also use function
\code{\link{longestORFs}} after creation of ORFs for same result.}

\item{minimumLength}{(integer) Default is 0. Which is START + STOP = 6 bp.
Minimum length of ORF, without counting 3bps for START and STOP codons.
For example minimumLength = 8 will result in size of ORFs to be at least
START + 8*3 (bp) + STOP = 30 bases. Use this param to restrict search.}

\item{overlappingCDS}{logical, default FALSE. Include uORFs that overlap CDS.}

\item{cage}{Either a filePath for the CageSeq file as .bed .bam or .wig,
with possible compressions (".gzip", ".gz", ".bgz"), or already loaded
CageSeq peak data as GRanges or GAlignment.
NOTE: If it is a .bam file, it will add a score column by running:
convertToOneBasedRanges(cage, method = "5prime", addScoreColumn = TRUE)
The score column is then number of replicates of read, if score column is
something else, like read length, set the score column to NULL first.}

\item{extension}{The maximum number of basses upstream of the TSS to search
for CageSeq peak.}

\item{filterValue}{The minimum number of reads on cage position, for it to
be counted as possible new tss. (represented in score column in
CageSeq data) If you already filtered, set it to 0.}

\item{restrictUpstreamToTx}{a logical (FALSE). If TRUE: restrict leaders to
not extend closer than 5 bases from closest upstream leader, set this
to TRUE.}

\item{removeUnused}{logical (FALSE), if False:  (standard is to set them to
original annotation), If TRUE: remove leaders that did not have any cage
support.}

\item{save_optimized}{logical, default FALSE. If TRUE, save in the optimized
folder for the experiment. You must have made this directory before running
this function (call makeTxdbFromGenome first if not).}
}
\value{
A GRangesList of uORFs, 1 granges list element per uORF.
}
\description{
Procedure:
1. Create a new search space starting with the 5' UTRs.
2. Redefine TSS with CAGE if wanted.
3. Add the whole of CDS to search space to allow uORFs going into cds.
4. find ORFs on that search space.
5. Filter out wrongly found uORFs, if CDS is included. The CDS,
 alternative CDS, uORFs starting within the CDS etc.
}
\details{
From default a filtering process is done to remove "fake" uORFs, but only if
cds is included, since uORFs that stop on the stop codon on the CDS is not
a uORF, but an alternative cds by definition, etc.
}
\examples{
df <- ORFik.template.experiment()
# Without cds overlapping, no 5' leader extension
findUORFs_exp(df, extension = 0)
# Without cds overlapping, extends 5' leaders by 1000 (good for yeast etc)
findUORFs_exp(df)
# Include cds overlapping uorfs
findUORFs_exp(df, overlappingCDS = TRUE)
}
\seealso{
Other findORFs: 
\code{\link{findMapORFs}()},
\code{\link{findORFs}()},
\code{\link{findORFsFasta}()},
\code{\link{startDefinition}()},
\code{\link{stopDefinition}()}
}
