P8008 The NCBI Eukaryotic Genome Annotation Pipeline
2016; Oxford University Press; Volume: 94; Issue: suppl_4; Language: English
DOI: 10.2527/jas2016.94supplement4184x
ISSN: 1544-7847
Authors: Françoise Thibaud‐Nissen, Michael DiCuccio, Wratko Hlavina, Avi Kimchi, P. Kitts, Terence D. Murphy, Kim D. Pruitt, A. Souvorov
Topic(s): Identification and Quantification in Food
Abstract: The National Center for Biotechnology Information (NCBI) Eukaryotic Genome Annotation Pipeline (available at www.ncbi.nlm.nih.gov/genome/annotation_euk/, verified 26 May 2016) has been used to annotate over 280 organisms, including many animals of agricultural importance. The pipeline provides content for various NCBI resources, including the Reference Sequence (RefSeq) sequence databases, Gene, BLAST databases, and the Map Viewer genome browser. The pipeline uses a modular framework for the automated execution of all annotation tasks, from the fetching of raw and curated data from public repositories to the submission of the RefSeq-accessioned annotation products to public databases. The quality of the annotation is highly dependent on the availability of evidence for the species or closely related species. Alignments of RNA-Seq, traditional transcripts, expressed sequence tags, transcript assemblies, and proteins by Splign and ProSplign all contribute to the prediction of gene models by Gnomon, an alignment- and Hidden Markov Model-based gene prediction program developed at NCBI. The RefSeq group curates genes and transcripts for several plant and animal genomes, including pig, horse, and cattle. When available for the annotated species, the curated transcripts are aligned to the genome and take precedence over similar models produced by Gnomon based on other (noncurated) evidence. High-quality annotation is also achieved by producing models that compensate for assembly issues using the aligning evidence. The final annotation product can include transcripts and proteins for which the sequence has been modified relative to the draft genome assembly to correct a truncating mismatch or frameshift, or to fully represent a gene only partially present in the genome owing to sequence gaps.
The final products of the pipeline include the annotated genomic sequences, the genes, and the transcript and protein products, which are named based on orthology to model organisms or on BLAST hits to SwissProt/UniProtKB. We aim to reannotate the organisms we maintain every 2 yr, so that the annotation incorporates recent evidence deposited in public databases and benefits from improvements in software. We produce a summary report with each annotation, containing the evidence on which the annotation is based and statistics on the annotated products. In the case of a reannotation, we also provide details about the genes and transcripts that changed. See all NCBI-annotated eukaryotes at http://www.ncbi.nlm.nih.gov/genome/annotation_euk/all/ (verified 28 May 2016).
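The precedence rule described in the abstract (curated RefSeq transcripts win over similar Gnomon models) can be illustrated with a small Python sketch. This is not NCBI's implementation: the abstract does not define "similar", so the sketch assumes it means "overlapping on the same sequence and strand". The `Model` class, the `overlaps` and `select_models` functions, and the model names and coordinates are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """A simplified gene model placed on a genomic sequence (hypothetical type)."""
    name: str
    seq_id: str
    strand: str   # "+" or "-"
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    curated: bool


def overlaps(a: Model, b: Model) -> bool:
    """Assumed similarity test: same sequence, same strand, overlapping span."""
    return (
        a.seq_id == b.seq_id
        and a.strand == b.strand
        and a.start < b.end
        and b.start < a.end
    )


def select_models(curated: list[Model], predicted: list[Model]) -> list[Model]:
    """Keep every curated model; keep a predicted model only if it does not
    overlap any curated model under the assumed similarity test."""
    kept = list(curated)
    for p in predicted:
        if not any(overlaps(p, c) for c in curated):
            kept.append(p)
    return sorted(kept, key=lambda m: (m.seq_id, m.start, m.end))


# Made-up example data, not real accessions or coordinates.
curated = [Model("curated_1", "chr1", "+", 100, 500, True)]
predicted = [
    Model("predicted_a", "chr1", "+", 150, 480, False),   # overlaps curated_1, dropped
    Model("predicted_b", "chr1", "-", 150, 480, False),   # opposite strand, kept
    Model("predicted_c", "chr1", "+", 900, 1200, False),  # no overlap, kept
]
result = select_models(curated, predicted)
print([m.name for m in result])
```

A real pipeline would compare exon structure and supporting evidence rather than simple span overlap; the sketch only shows the direction of the precedence.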