Peer-reviewed article

CUDA-enabled hierarchical Ward clustering of protein structures based on the nearest neighbour chain algorithm

2015; SAGE Publishing; Volume: 30; Issue: 2; Language: English

DOI: 10.1177/1094342015597988

ISSN: 1741-2846

Authors

Hoang-Vu Dang, Bertil Schmidt, Andreas Hildebrandt, Tuan Tu Tran, Anna Katharina Hildebrandt

Topic(s)

Microbial Metabolic Engineering and Bioproduction

Abstract

Clustering of molecular systems according to their three-dimensional structure is an important step in many bioinformatics workflows. In applications such as docking or structure prediction, many algorithms initially generate large numbers of candidate poses (or decoys), which are then clustered so that subsequent, computationally expensive evaluations can be restricted to reasonable representatives. Since the number of such candidates can easily range from thousands to millions, performing the clustering on standard central processing units (CPUs) is highly time consuming. In this paper, we analyse and evaluate different approaches to parallelizing the nearest neighbour chain algorithm on a graphics processing unit (GPU) to perform hierarchical Ward clustering of protein structures, using both atom-based root mean square deviation (RMSD) and rigid-body RMSD as molecular distances. Our CUDA implementation on a GeForce Titan GPU achieves a speedup of around one order of magnitude over a multi-threaded CPU implementation on a Core i7 2700. Furthermore, it compares favourably in runtime with ClusCo, another state-of-the-art CUDA-enabled protein structure clustering method, while achieving similar accuracy on the iTasser benchmark dataset. Our implementation has also been incorporated into the Biochemical Algorithms Library (BALL) to allow easy integration into biologists’ workflows.
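To illustrate the sequential algorithm that the paper parallelizes, here is a minimal CPU sketch of nearest neighbour chain clustering with Ward linkage. It is not the authors' CUDA code and does not attempt the GPU parallelization. It assumes plain Euclidean feature vectors stand in for structures and uses squared Euclidean distances with the Lance–Williams update for Ward's method; feeding squared RMSD values into the same update would be an assumption, since RMSD after superposition is not strictly a Euclidean distance. The function name `nn_chain_ward` and its merge-tuple format are hypothetical. Reported heights equal twice the increase in within-cluster sum of squares.

```python
def nn_chain_ward(points):
    """Hypothetical sketch: Ward clustering via the nearest neighbour chain.

    Returns merges as (cluster_a, cluster_b, height, merged_size), where new
    clusters receive ids n, n+1, ... in merge order. Heights are the
    Lance-Williams Ward distances on squared Euclidean input.
    """
    n = len(points)
    # Squared Euclidean distances between singleton clusters, keyed (low, high).
    d = {}
    for i in range(n):
        for j in range(i + 1, n):
            d[(i, j)] = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))

    def dist(a, b):
        return d[(a, b) if a < b else (b, a)]

    size = {i: 1 for i in range(n)}
    active = set(range(n))
    next_id = n
    chain = []
    merges = []

    while len(active) > 1:
        if not chain:
            chain.append(min(active))  # start a new chain anywhere
        a = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Nearest active neighbour of a; on ties prefer the previous chain
        # element, which guarantees the chain cannot cycle.
        b = min((c for c in active if c != a),
                key=lambda c: (dist(a, c), c != prev))
        if b != prev:
            chain.append(b)  # extend the chain towards closer clusters
            continue
        # a and b are reciprocal nearest neighbours: merge them.
        chain.pop()
        chain.pop()
        h = dist(a, b)
        na, nb = size[a], size[b]
        new = next_id
        next_id += 1
        active.discard(a)
        active.discard(b)
        for k in active:
            nk = size[k]
            # Lance-Williams update for Ward linkage (k < new always holds).
            d[(k, new)] = ((na + nk) * dist(k, a) + (nb + nk) * dist(k, b)
                           - nk * h) / (na + nb + nk)
        active.add(new)
        size[new] = na + nb
        merges.append((min(a, b), max(a, b), h, na + nb))
    return merges


print(nn_chain_ward([[0.0], [1.0], [5.0], [6.0]]))
```

The chain can be kept across merges because Ward linkage satisfies the reducibility property: merging a reciprocal nearest neighbour pair never makes the merged cluster closer to a remaining cluster than both parts were, so the rest of the chain stays valid. In this sketch the nearest neighbour search inside the loop is the dominant cost and is the natural candidate for data-parallel execution.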
