% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/txdbHelpers.R
\name{filterTranscripts}
\alias{filterTranscripts}
\title{Filter transcripts by lengths}
\usage{
filterTranscripts(
  txdb,
  minFiveUTR = 30L,
  minCDS = 150L,
  minThreeUTR = 30L,
  longestPerGene = TRUE,
  stopOnEmpty = TRUE,
  by = "tx",
  create.fst.version = FALSE
)
}
\arguments{
\item{txdb}{a TxDb object, an ORFik experiment object or a path to one of:
(.gtf, .gff, .gff2, .db or .sqlite).
Only in the loadRegion function: if it is a GRangesList, it is returned
as is.}

\item{minFiveUTR}{(integer) minimum length in bp of the 5' UTR for a
transcript to pass the filter. Set to NULL if the annotation has no 5' UTRs.}

\item{minCDS}{(integer) minimum length in bp of the CDS for a transcript
to pass the filter.}

\item{minThreeUTR}{(integer) minimum length in bp of the 3' UTR for a
transcript to pass the filter. Set to NULL if the annotation has no 3' UTRs.}

\item{longestPerGene}{logical (TRUE), return only the longest valid
transcript per gene. NOTE: the isoform with the longest CDS has priority;
only if CDS lengths are equal is the longest total transcript picked. So a
transcript that is shorter overall but has a longer CDS is still the one
returned.}

\item{stopOnEmpty}{logical (TRUE), stop with an error if no valid
transcripts are found.}

\item{by}{a character, default "tx". Either "tx" or "gene". Which names to
use for the output regions: transcript names ("tx") or gene names ("gene").
NOTE: this is not the same as cdsBy(txdb, by = "gene"). cdsBy would then
give only 1 CDS per gene, whereas loadRegion gives all isoforms, but with
gene names.}

\item{create.fst.version}{logical, FALSE. If TRUE, creates a .fst version
of the transcript length table (if it does not already exist),
reducing load time from ~ 15 seconds to
~ 0.01 seconds the next time you run filterTranscripts with this txdb object.
The file is stored in the
same folder as the genome this txdb was created from, with the name:\cr
\code{paste0(ORFik:::remove.file_ext(metadata(txdb)[3,2]), "_",
       gsub(" \\(.*| |:", "", metadata(txdb)[metadata(txdb)[,1] ==
        "Creation time",2]), "_txLengths.fst")}\cr
Some checks are done to verify that this is a valid location. If the txdb
data source is a repository like UCSC and not a local folder, the file will
not be made.}
}
\value{
a character vector of valid transcript names
}
\description{
Filter transcripts to those that have leaders, CDSs and trailers of at least
the given lengths. You can also keep only the longest transcript per gene.
}
\details{
If a transcript does not have a trailer, its trailer length is 0,
so it will be filtered out if you set minThreeUTR to 1 or more.
With all minimum lengths of at least 1, only transcripts with a leader,
a CDS and a trailer are returned.
You can set a minimum to 0, which keeps all transcripts regardless of the
length of that region.

If your annotation does not have leaders or trailers, set the corresponding
minimums to NULL, since 0 still requires that a column such as utr3_len
exists.
Genes with gene_id = NA will be removed.
}
\examples{
df <- ORFik.template.experiment.zf()
txdb <- loadTxdb(df)
txNames <- filterTranscripts(txdb, minFiveUTR = 1, minCDS = 30,
                             minThreeUTR = 1)
loadRegion(txdb, "mrna")[txNames]
loadRegion(txdb, "5utr")[txNames]
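
# A further sketch, assuming the template annotation contains both leaders
# and trailers (the call above already relies on this): setting the UTR
# minimums to 0 keeps transcripts regardless of UTR length, and
# longestPerGene = FALSE returns every valid isoform instead of one per gene.
txAll <- filterTranscripts(txdb, minFiveUTR = 0, minCDS = 30,
                           minThreeUTR = 0, longestPerGene = FALSE)
txLongest <- filterTranscripts(txdb, minFiveUTR = 0, minCDS = 30,
                               minThreeUTR = 0, longestPerGene = TRUE)
# Compare how many transcripts pass with and without the per-gene reduction
length(txAll)
length(txLongest)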

}
