Peer-reviewed article

A nucleotide composition constraint of genome sequences

2004; Elsevier BV; Volume: 28; Issue: 2; Language: English

10.1016/j.compbiolchem.2004.02.002

ISSN

1476-928X

Authors

Chun‐Ting Zhang, Ren Zhang

Topic(s)

Genomics and Phylogenetic Studies

Abstract

Let a, c, g and t denote the occurrence frequencies of A, C, G and T, respectively, in a genome. We calculated the statistical quantity $S = a^2 + c^2 + g^2 + t^2$ for each of 809 genomes (11 archaea, 42 bacteria, 3 eukaryota, 90 phages, 36 viroids and 627 viruses) and 236 plasmids. We found that the strict inequality $S < 1/3$ holds for almost all of these genomes and plasmids. Several deductions follow directly from this observation: (i) S is a kind of genome order index, negatively correlated with the Shannon H function; (ii) $S < 1/3$ suggests that each genome requires a minimal value of the Shannon H function; (iii) S would be a new biological statistical quantity, useful for describing the composition features of genomes; (iv) when Chargaff Parity Rule 2 is considered jointly, the genomic G+C content should lie between 0.211 and 0.789.
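The quantities in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`base_frequencies`, `order_index`, `shannon_entropy`, `gc_bounds`) are hypothetical, the entropy is taken in bits (the abstract does not state a base), and symbols other than A, C, G and T are simply ignored. For point (iv), the sketch assumes Chargaff Parity Rule 2 in the form a = t and c = g. Writing x for the G+C content then gives $S = x^2 - x + 1/2$, so $S < 1/3$ is equivalent to $x^2 - x + 1/6 < 0$, whose roots $(1 \pm \sqrt{1/3})/2$ are approximately 0.211 and 0.789.

```python
import math
from collections import Counter


def base_frequencies(seq):
    """Return the frequencies (a, c, g, t), ignoring non-ACGT symbols."""
    counts = Counter(ch for ch in seq.upper() if ch in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return tuple(counts[b] / total for b in "ACGT")


def order_index(freqs):
    """S = a^2 + c^2 + g^2 + t^2."""
    return sum(f * f for f in freqs)


def shannon_entropy(freqs):
    """Shannon H in bits (assumed base 2); zero frequencies contribute 0."""
    return -sum(f * math.log2(f) for f in freqs if f > 0)


def gc_bounds(s_max=1 / 3):
    """G+C range implied by S < s_max, assuming a = t and c = g.

    With x = G+C, S = x^2 - x + 1/2, so the bounds are the roots of
    x^2 - x + (1/2 - s_max) = 0.
    """
    root = math.sqrt(1 - 4 * (0.5 - s_max))
    return (1 - root) / 2, (1 + root) / 2


if __name__ == "__main__":
    freqs = base_frequencies("ACGT")   # toy sequence, not real genome data
    print(order_index(freqs))          # 0.25, the minimum possible S
    print(shannon_entropy(freqs))      # 2.0 bits, the maximum possible H
    low, high = gc_bounds()
    print(round(low, 3), round(high, 3))  # 0.211 0.789
```

The toy sequence shows the negative relationship in point (i): a uniform composition minimises S at 1/4 and maximises H at 2 bits, whereas a sequence made of a single base gives S = 1 and H = 0.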
