RS-MSConvNet: A Novel End-to-End Pathological Voice Detection Model
2022; Institute of Electrical and Electronics Engineers; Volume: 10; Language: English
10.1109/access.2022.3219606
ISSN: 2169-3536
Authors: Wongsathon Pathonsuwan, Khomdet Phapatanaburi, Prawit Buayai, Talit Jumphoo, Patikorn Anchuen, Monthippa Uthansakul, Peerapong Uthansakul
Topic(s): Speech and Audio Processing
Abstract: Recent studies have reported the success of the multi-scale convolutional neural network (MSConvNet) model in many classification applications, owing to its multi-scale convolution block, which extracts representations at several scales before a detection is made. However, an MSConvNet-based design for pathological voice detection has not yet been explored. In this paper, we propose RS-MSConvNet, a novel end-to-end MSConvNet model that uses raw speech for pathological voice detection. The main contribution of the proposed RS-MSConvNet method is to exploit a multi-scale convolution block, followed by a spatial-temporal feature block and a fully connected layer for classification. In addition, to further improve accuracy, we propose a novel hybrid detection model, called RS-MSConvNet-SVM, that integrates the feature extraction ability of the RS-MSConvNet model with a support vector machine (SVM) classifier. The effectiveness of the proposed models is investigated using the TORGO database. The experimental results reveal that the RS-MSConvNet model outperforms other baseline methods in the speaker-independent task. Moreover, compared with the RS-MSConvNet model, the RS-MSConvNet-SVM model obtains a further improvement in accuracy. These results show that the proposed models are useful for pathological voice detection.
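The multi-scale convolution idea mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names `conv1d_same` and `multi_scale_block`, the kernel sizes (3, 5, 7), and the moving-average kernel weights are all assumptions chosen for illustration, and a trained network would learn its kernel weights instead. The sketch only shows how parallel branches with different kernel widths can process the same raw waveform and each return a feature map of the same length.

```python
def conv1d_same(signal, kernel):
    """Slide the kernel over a zero-padded signal so the output
    has the same length as the input."""
    k = len(kernel)
    left = (k - 1) // 2
    right = k - 1 - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def multi_scale_block(signal, kernel_sizes=(3, 5, 7)):
    """Run one branch per kernel size and return one feature map per branch.

    The kernel sizes are hypothetical; the abstract does not state them.
    """
    branches = []
    for k in kernel_sizes:
        # Placeholder moving-average weights; a real model learns these.
        kernel = [1.0 / k] * k
        branches.append(conv1d_same(signal, kernel))
    return branches


# A toy "raw speech" waveform (assumed values, for illustration only).
waveform = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
features = multi_scale_block(waveform)
```

In a full model of the kind the abstract describes, feature maps like these would be passed on to the later blocks and finally to a classifier (a fully connected layer, or an SVM in the hybrid variant); those stages are omitted here.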