Deep learning on chaos game representation for proteins
2019; Oxford University Press; Volume: 36; Issue: 1; Language: English
10.1093/bioinformatics/btz493
ISSN: 1367-4811
Authors: Hannah F. Löchel, Dominic Eger, Theodor Sperlea, Dominik Heider
Topic(s): RNA and protein synthesis mechanisms
Abstract: Classification of protein sequences is a major task in bioinformatics with many applications. Various machine learning methods are applied to this problem, such as support vector machines (SVM), random forests (RF) and neural networks (NN). All of these methods require that protein sequences first be made machine-readable and comparable, for which different encodings exist. These encodings are typically based on physical or chemical properties of the sequence. However, motivated by the outstanding performance of deep neural networks (DNN) in image recognition, we used frequency matrix chaos game representation (FCGR) to encode protein sequences as images. In this study, we compare the performance of SVMs, RFs and DNNs trained on FCGR-encoded protein sequences. While the original chaos game representation (CGR) has been used mainly for genome sequence encoding and classification, we modified it to work for protein sequences as well, resulting in an n-flakes representation, an image with several icosagons. We show that all applied machine learning techniques (RF, SVM and DNN) give promising results compared to state-of-the-art methods on our benchmark datasets, with DNNs outperforming the other methods, and that FCGR is a promising new encoding method for protein sequences. Availability: https://cran.r-project.org/. Supplementary data are available at Bioinformatics online.
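To make the encoding idea in the abstract concrete, the following is a minimal Python sketch of a chaos game walk on a regular 20-gon followed by binning into a frequency matrix. It is not the authors' implementation. The function names (`nflake_ratio`, `cgr_points`, `fcgr`), the alphabetical assignment of amino acids to vertices, the grid resolution, and the use of the standard n-flake contraction ratio are all assumptions made for illustration.

```python
import math

# Assumption: the 20 standard amino acids are placed on the polygon
# vertices in alphabetical order of their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def nflake_ratio(n):
    """Contraction ratio at which the n sub-polygons of an n-flake just touch."""
    s = sum(math.cos(2 * math.pi * k / n) for k in range(1, n // 4 + 1))
    return 1.0 / (2.0 * (1.0 + s))


def cgr_points(sequence, alphabet=AMINO_ACIDS, ratio=None):
    """Run the chaos game on a regular polygon with one vertex per letter."""
    n = len(alphabet)
    if ratio is None:
        ratio = nflake_ratio(n)  # assumed default, not taken from the paper
    vertices = {
        letter: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
        for i, letter in enumerate(alphabet)
    }
    x, y = 0.0, 0.0  # start at the polygon centre
    points = []
    for residue in sequence.upper():
        if residue not in vertices:
            continue  # skip symbols outside the alphabet, e.g. X
        vx, vy = vertices[residue]
        # Contract the current point toward the vertex of this residue.
        x = vx + ratio * (x - vx)
        y = vy + ratio * (y - vy)
        points.append((x, y))
    return points


def fcgr(sequence, resolution=32, **kwargs):
    """Count CGR points per cell of a resolution x resolution grid over [-1, 1]^2."""
    grid = [[0] * resolution for _ in range(resolution)]
    for x, y in cgr_points(sequence, **kwargs):
        col = min(int((x + 1.0) / 2.0 * resolution), resolution - 1)
        row = min(int((1.0 - y) / 2.0 * resolution), resolution - 1)
        grid[row][col] += 1
    return grid
```

Calling `fcgr(sequence, resolution=32)` on a protein sequence string returns a nested list of counts that can be treated as a grayscale image and passed to a classifier. A coarser or finer `resolution` trades image size against how many distinct sequence contexts share a cell.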