BaDumTss: Multi-task Learning for Beatbox Transcription
2022; Springer Science+Business Media; Language: English
DOI: 10.1007/978-3-031-05981-0_14
ISSN: 1611-3349
Authors: Priya Mehta, Meet Maheshwari, Brihi Joshi, Tanmoy Chakraborty
Topic(s): Music Technology and Sound Studies
Abstract: The challenge of transcribing audio into symbolic notations is a well-known problem in music information retrieval. In this work, we explore a novel task – automatic music transcription for Beatbox sounds, also known as Vocal Percussions. As Beatbox sounds cannot be created in a synthetic manner, they inherently vary within the same speaker as well as across different speakers. To address this, we propose BaDumTss, which makes use of a pretraining strategy over a novel sequence traversal method, thereby ensuring robustness and efficiency against new Beatbox sequences. Furthermore, BaDumTss is agnostic to time-based stretches and warps, as well as amplitude changes in the Beatbox sequence. It predicts both onsets and frame-set in a multi-task manner while gaining a whopping 56% and 326% relative improvement in frame-set and onset-level F1 scores, respectively, over the best-performing baseline. We also release an annotated dataset of monophonic Beatbox sequences along with their corresponding MIDI labels, the first of its kind comprising Beatbox samples with different variations such as time-stretches, pitch shifts, and added noise.