Article Open access Peer-reviewed

Gene prediction and verification in a compact genome with numerous small introns

2004; Cold Spring Harbor Laboratory Press; Volume: 14; Issue: 11; Language: English

10.1101/gr.2816704

ISSN

1549-5469

Authors

Aaron Tenney, Randall H. Brown, Charles Vaske, Jennifer K. Lodge, Tamara L. Doering, Michael R. Brent

Topic(s)

Molecular Biology Techniques and Applications

Abstract

The genomes of clusters of related eukaryotes are now being sequenced at an increasing rate, creating a need for accurate, low-cost annotation of exon–intron structures. In this paper, we demonstrate that reverse transcription-polymerase chain reaction (RT–PCR) and direct sequencing based on predicted gene structures satisfy this need, at least for single-celled eukaryotes. The TWINSCAN gene prediction algorithm was adapted for the fungal pathogen Cryptococcus neoformans by using a precise model of intron lengths in combination with ungapped alignments between the genome sequences of the two closely related Cryptococcus varieties. This approach resulted in ∼60% of known genes being predicted exactly right at every coding base and splice site. When previously unannotated TWINSCAN predictions were tested by RT–PCR and direct sequencing, 75% of targets spanning two predicted introns were amplified and produced high-quality sequence. When targets spanning the complete predicted open reading frame were tested, 72% of them amplified and produced high-quality sequence. We conclude that sequencing a small number of expressed sequence tags (ESTs) to provide training data, running TWINSCAN on an entire genome, and then performing RT–PCR and direct sequencing on all of its predictions would be a cost-effective method for obtaining an experimentally verified genome annotation.
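
The "predicted exactly right at every coding base and splice site" figure is a gene-level exact-match measure. The following is a minimal sketch of how such a fraction could be computed, not the authors' evaluation code. It assumes each gene model is given as a list of coding exon intervals on a single contig and strand; the function name `exact_match_fraction` and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: an exon is an inclusive (start, end) coordinate pair, and all
# gene models lie on the same contig and strand.
Exon = Tuple[int, int]


def exact_match_fraction(
    known: Dict[str, List[Exon]],
    predicted: List[List[Exon]],
) -> float:
    """Return the fraction of known genes reproduced exactly by a prediction.

    A known gene counts as a hit only if some prediction has the identical
    set of coding exons, so every coding base and every splice site agrees.
    """
    if not known:
        return 0.0
    # Normalize each prediction to a sorted tuple so exon order is irrelevant.
    predicted_chains = {tuple(sorted(exons)) for exons in predicted}
    hits = sum(
        1 for exons in known.values() if tuple(sorted(exons)) in predicted_chains
    )
    return hits / len(known)


# Illustrative, made-up coordinates: g2 is missed because one exon boundary
# differs by three bases.
known_genes = {
    "g1": [(100, 200), (260, 400)],
    "g2": [(1000, 1100)],
    "g3": [(5000, 5100), (5160, 5300)],
}
predictions = [
    [(100, 200), (260, 400)],
    [(1000, 1103)],
    [(5160, 5300), (5000, 5100)],
]
fraction = exact_match_fraction(known_genes, predictions)
```

Under this strict definition a prediction with a single misplaced exon boundary earns no credit, which is why exact-match rates run lower than per-exon or per-base accuracy.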

Reference(s)