Information enhancement methods for large scale sequence analysis

1993; Pergamon Press; Volume: 17; Issue: 2; Language: English

10.1016/0097-8485(93)85010-a

ISSN

1879-0763

Authors

Jean‐Michel Claverie, David J. States

Topic(s)

Algorithms and Data Compression

Abstract

The improved efficiency of similarity search programs and the affordability of even faster computers allow studies where whole sequence databases can be the target of various comparisons with increasingly large or numerous query sequences. However, the usefulness of those “brute force” methods is now limited by the time it takes an experienced scientist to sift the biologically relevant matches from overwhelming, albeit “statistically significant”, outputs. The discrepancy between statistical and biological significance has several causes: erroneous database entries, repetitive sequence elements, and the ubiquity of low complexity segments with biased composition. We present two masking methods (programs XNU and XBLAST) capable of eliminating most of the irrelevant outputs in a variety of large scale sequence analysis situations: global “all against all” database comparisons, massive partial cDNA sequencing (EST), positional cloning and genomic data analysis.
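
To make the idea of masking low complexity segments concrete, the following is a minimal illustrative sketch. It is not the XNU or XBLAST algorithm, whose details are not given in this abstract. It substitutes a simple sliding-window Shannon entropy criterion, and the function names, window length, threshold, and mask character are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residues in `window`."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2,
                        mask_char: str = "X") -> str:
    """Replace every residue covered by a low-entropy window with `mask_char`.

    Hypothetical helper: `window` and `threshold` are illustrative defaults,
    not values taken from XNU or XBLAST.
    """
    masked = list(seq)
    # Slide a fixed-length window along the sequence; sequences shorter
    # than the window are returned unchanged.
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)
```

Masked residues can then be ignored by a similarity search, so that compositionally biased stretches no longer generate statistically significant but biologically uninformative matches. Because whole windows are masked, a few residues flanking a biased segment may also be replaced.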
