Article Open access Peer-reviewed

Identify High-Quality Protein Structural Models by Enhanced K-Means

2017; Hindawi Publishing Corporation; Volume: 2017; Language: English

10.1155/2017/7294519

ISSN

2314-6141

Authors

Hongjie Wu, Haiou Li, Min Jiang, Cheng Chen, Qiang Lv, Chuang Wu

Topic(s)

Bioinformatics and Genomic Networks

Abstract

Background. One critical issue in protein three-dimensional structure prediction, whether by ab initio or comparative modeling, is the identification of high-quality protein structural models among the generated decoys. Clustering algorithms are currently widely used to identify near-native models; however, their performance depends on the conformational decoys supplied, and for some algorithms the accuracy declines as the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the second uses squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust than SPICKER alone, identifying models with template modeling scores better than or equal to those of SPICKER for 33 (59%) and 42 (75%) of 56 targets, respectively. Conclusions. We observed that the classic K-means algorithm performed similarly to SPICKER, a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to the results from SPICKER and classical K-means.
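
The squared-distance seeding mentioned in the abstract can be illustrated with a minimal sketch of generic K-means++ initialization followed by basic K-means iterations. This is not the authors' implementation: the function names (`squared_distance`, `kmeans_pp_seeds`, `kmeans`) are hypothetical, and the 2D points are a toy stand-in for decoys. The abstract does not state which structural distance is used between decoys, so plain squared Euclidean distance on feature vectors is assumed here. The sketch also assumes the input contains at least k distinct points.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """Choose k initial centroids by K-means++ (squared-distance) seeding.

    Assumes `points` holds at least k distinct vectors.
    """
    # The first seed is drawn uniformly at random.
    seeds = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest seed.
    d2 = [squared_distance(p, seeds[0]) for p in points]
    while len(seeds) < k:
        # Each further seed is drawn with probability proportional to d2,
        # so points far from every existing seed are favoured.
        new_seed = rng.choices(points, weights=d2, k=1)[0]
        seeds.append(new_seed)
        d2 = [min(d, squared_distance(p, new_seed)) for p, d in zip(points, d2)]
    return seeds


def kmeans(points, seeds, iterations=20):
    """Basic K-means (Lloyd iterations) starting from the given seeds."""
    centroids = [list(s) for s in seeds]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


# Toy data (illustrative only): three loose groups of 2D points.
toy_points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0),
              (5.1, 4.9), (10.0, 0.0), (9.8, 0.2)]
toy_seeds = kmeans_pp_seeds(toy_points, 3, random.Random(0))
toy_centroids, toy_labels = kmeans(toy_points, toy_seeds)
```

Under the same assumptions, the SK-means variant described in the abstract would differ only in where the seeds come from: the initial centroids would be taken from SPICKER's output rather than from `kmeans_pp_seeds`, and then passed to the same iteration step.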
