Article; National Production; Peer-reviewed

Information-theoretic analysis and prediction of protein atomic burials: on the search for an informational intermediate between sequence and structure

2012; Oxford University Press; Volume: 28; Issue: 21; Language: English

10.1093/bioinformatics/bts512

ISSN

1367-4811

Authors

Juliana Ribeiro Rocha, Marx G. van der Linden, Diogo C. Ferreira, Paulo Henrique Silva Marques de Azevedo, Antônio F. Pereira de Araújo

Topic(s)

Enzyme Structure and Function

Abstract

Motivation: It has recently been suggested that atomic burials, as expressed by molecular central distances, contain sufficient information to determine the tertiary structure of small globular proteins. A possible approach to structure determination from sequence could therefore involve an intermediate sequence-to-burial prediction step, whose accuracy is, however, theoretically limited by the mutual information between these two variables. We use a non-redundant set of globular protein structures to estimate the mutual information between local amino acid sequence and atomic burials. Discretizing central distances of Cα or Cβ atoms in equiprobable burial levels, we estimate the relevant mutual information measures and compare them with actual predictions obtained from a Naive Bayesian Classifier (NBC) and a Hidden Markov Model (HMM).

Results: The mutual information density for 20 amino acids and two or three burial levels was estimated to be roughly 15% of the unconditional burial entropy density. Lower estimates for the mutual information between local amino acid sequence and the burial of a single residue indicated that mutual information increases with the number of burial levels up to at least five or six levels. The prediction schemes were found to extract the available burial information from local sequence efficiently. The lower estimates for the mutual information involving single burials are consistently approached by predictions from the NBC and actually surpassed by predictions from the HMM. Near-optimal prediction by the HMM is indicated by the agreement between its density of prediction information and the corresponding density of mutual information between input and output representations.

Availability: The dataset of protein structures and the prediction implementations are available at http://www.btc.unb.br/ (in ‘Software’).

Contact: aaraujo@unb.br

Supplementary information: Supplementary data are available at Bioinformatics online.
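The two basic operations named in the abstract, discretizing central distances into equiprobable burial levels and estimating the mutual information between amino acid identity and burial level, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a simple plug-in (frequency-count) estimator for a single residue position rather than the local-sequence and density measures of the paper, and the function names, the synthetic data generator and its parameters (two residue classes, Gaussian distances) are assumptions made only for illustration.

```python
import math
import random
from collections import Counter


def equiprobable_levels(distances, n_levels):
    """Map each distance to a level 0..n_levels-1 by rank,
    so that the levels are (as nearly as possible) equally populated."""
    n = len(distances)
    order = sorted(range(n), key=lambda i: distances[i])
    levels = [0] * n
    for rank, i in enumerate(order):
        levels[i] = rank * n_levels // n
    return levels


def entropy(symbols):
    """Plug-in Shannon entropy of a sequence of hashable symbols, in bits."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def toy_dataset(n=5000, seed=0):
    """Synthetic (hypothetical) data: hydrophobic residues are drawn
    closer to the molecular center than polar ones, on average."""
    rng = random.Random(seed)
    hydrophobic, polar = "AVLIF", "KEDNS"
    residues, distances = [], []
    for _ in range(n):
        aa = rng.choice(hydrophobic + polar)
        mean = 8.0 if aa in hydrophobic else 12.0  # assumed values, in angstroms
        residues.append(aa)
        distances.append(rng.gauss(mean, 3.0))
    return residues, distances


residues, distances = toy_dataset()
for n_levels in (2, 3):
    levels = equiprobable_levels(distances, n_levels)
    mi = mutual_information(residues, levels)
    h = entropy(levels)
    # Fraction of the burial entropy that is explained by residue identity.
    print(f"{n_levels} levels: I = {mi:.3f} bits, H = {h:.3f} bits, I/H = {mi / h:.2f}")
```

The plug-in estimator is biased upward for small samples, so on real data it would need a correction or a conservative alternative; the numbers printed for the synthetic data say nothing about the values reported in the article.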
