Article | Open access | Peer-reviewed

The Power of Fragmentation: A Hierarchical Transformer Model for Structural Segmentation in Symbolic Music Generation

2023; Institute of Electrical and Electronics Engineers; Volume: 31; Language: English

10.1109/taslp.2023.3263797

ISSN

2329-9304

Authors

Guowei Wu, Shipei Liu, Xiaoya Fan

Topic(s)

Neuroscience and Music Perception

Abstract

Symbolic music generation relies on the contextual representation capabilities of the generative model, and Transformer-based models are currently the most prevalent approach. Learning contextual representations is also related to the structural elements of music, e.g., intro, verse, and chorus, which have received little attention in the literature. In this paper, we propose a hierarchical Transformer model to learn multiscale contexts in music. In the encoding phase, we first design a fragment scope localization module to segment the music into chords and sections. Then, we use a multiscale attention mechanism to learn note-, chord-, and section-level contexts. In the decoding phase, fine decoders generate sections in parallel and a coarse decoder decodes the combined music. We also design a music style normalization layer to achieve a consistent music style across the generated sections. Our model is evaluated on two open MIDI datasets. Experiments show that our model outperforms competing models on 50% (6 out of 12) and 83.3% (10 out of 12) of the quantitative metrics for short- and long-term music generation, respectively. Preliminary visual analysis also suggests its potential for following compositional rules, such as the reuse of rhythmic patterns and key melodies, which are associated with improved music quality.
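To make the hierarchical decoding idea concrete, the following is a minimal sketch, not the authors' implementation: several "fine" Transformer decoders produce per-section representations in parallel, a style normalization layer conditions each section on a shared style embedding, and a "coarse" decoder processes the concatenated result. All module names, dimensions, and the form of the style normalization (here, a LayerNorm modulated by a learned style vector) are assumptions for illustration only.

```python
# Hypothetical sketch of a hierarchical decoder with fine/coarse stages and
# style normalization, loosely following the abstract's description.
import torch
import torch.nn as nn


class StyleNorm(nn.Module):
    """Assumed style normalization: LayerNorm scaled/shifted by a style embedding."""
    def __init__(self, d_model: int, d_style: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model, elementwise_affine=False)
        self.scale = nn.Linear(d_style, d_model)
        self.shift = nn.Linear(d_style, d_model)

    def forward(self, x, style):
        # x: (batch, length, d_model); style: (batch, d_style), broadcast over time
        return self.norm(x) * (1 + self.scale(style)).unsqueeze(1) + self.shift(style).unsqueeze(1)


class HierarchicalDecoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_sections=4, d_style=32):
        super().__init__()
        make_layer = lambda: nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        # One fine decoder per structural section (intro, verse, chorus, ...)
        self.fine = nn.ModuleList(
            [nn.TransformerDecoder(make_layer(), num_layers=2) for _ in range(n_sections)]
        )
        self.style_norm = StyleNorm(d_model, d_style)
        # Coarse decoder refines the concatenated sections as one sequence
        self.coarse = nn.TransformerDecoder(make_layer(), num_layers=2)

    def forward(self, section_queries, memory, style):
        # section_queries: list of (batch, len_i, d_model) per-section inputs
        # memory: (batch, src_len, d_model) encoder output; style: (batch, d_style)
        sections = [
            self.style_norm(dec(q, memory), style)
            for dec, q in zip(self.fine, section_queries)
        ]
        combined = torch.cat(sections, dim=1)   # splice sections in temporal order
        return self.coarse(combined, memory)    # global pass over the whole piece


# Toy usage with random tensors
model = HierarchicalDecoder()
memory = torch.randn(2, 64, 256)
queries = [torch.randn(2, 16, 256) for _ in range(4)]
style = torch.randn(2, 32)
out = model(queries, memory, style)  # shape: (2, 64, 256)
```

Because each fine decoder only attends within its own section, the sections can be generated in parallel; the coarse pass and the shared style embedding are what tie them back into a single coherent piece.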
