Deep Learning for the Classification of Genomic Signals
2020; Hindawi Publishing Corporation; Volume: 2020; Language: English
10.1155/2020/7698590
ISSN: 1563-5147
Authors: J. Alejandro Morales, Román Saldaña, Manuel H. Santana-Castolo, Carlos E. Torres-Cerna, Ernesto Borrayo, Adriana P. Mendizábal, Hugo Vélez-Pérez, Gerardo Mendizabal-Ruiz
Topic(s): Machine Learning in Bioinformatics
Abstract: Genomic signal processing (GSP) is based on the use of digital signal processing methods for the analysis of genomic data. Convolutional neural networks (CNNs) are state-of-the-art machine learning classifiers that have been widely and successfully applied to complex problems. In this paper, we present a deep learning architecture and a method for the classification of three different functional genome types: coding regions (CDS), long noncoding regions (LNC), and pseudogenes (PSD) in genomic data, based on the use of GSP methods to convert the nucleotide sequence into a graphical representation of the information contained in it. The obtained accuracy scores of 83% and 84% when classifying CDS vs. LNC and CDS vs. PSD, respectively, indicate the feasibility of employing this methodology for the classification of these types of sequences. The model was not able to differentiate between PSD and LNC. Overall, our results indicate the feasibility of employing CNNs with GSP for the classification of these types of DNA data.
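The abstract does not state which GSP mapping or image format the authors use, so the following is only a minimal sketch of the general idea of turning a nucleotide sequence into an image-like array. It assumes a binary indicator (Voss) mapping and a short-time Fourier power spectrum, both common GSP choices; the function names `voss_indicators` and `spectrogram`, and the `window` and `step` values, are hypothetical and not taken from the paper.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def voss_indicators(seq):
    """Return a 4 x N binary matrix with one indicator row per nucleotide.

    Assumption: Voss mapping; the paper may use a different numeric encoding.
    Characters outside A, C, G, T yield an all-zero column.
    """
    seq = seq.upper()
    return np.array(
        [[1.0 if base == n else 0.0 for base in seq] for n in NUCLEOTIDES]
    )


def spectrogram(seq, window=64, step=16):
    """Build a 2-D array (frequency x position) from a DNA sequence.

    Each column is the power spectrum of one sliding window, summed over
    the four indicator signals. window and step are illustrative values.
    """
    x = voss_indicators(seq)
    if x.shape[1] < window:
        raise ValueError("sequence is shorter than the analysis window")
    taper = np.hanning(window)  # reduce spectral leakage at window edges
    columns = []
    for start in range(0, x.shape[1] - window + 1, step):
        frame = x[:, start:start + window] * taper
        power = np.abs(np.fft.rfft(frame, axis=1)) ** 2
        columns.append(power.sum(axis=0))
    # Shape: (window // 2 + 1, number_of_windows)
    return np.array(columns).T


image = spectrogram("ATGC" * 40)
print(image.shape)
```

An array like `image` could then be resized or normalized and passed to a CNN as a single-channel input; how the authors actually prepare their images is not described in this record.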