How to Dissect a Muppet: The Structure of Transformer Embedding Spaces

Article; Open access; Peer reviewed

2022; Association for Computational Linguistics; Volume: 10; Language: English

DOI

10.1162/tacl_a_00501

ISSN

2307-387X

Authors

Timothee Mickus, Denis Paperno, Mathieu Constant

Topic(s)

Natural Language Processing Techniques

Abstract

Pretrained embeddings based on the Transformer architecture have taken the NLP community by storm. We show that they can mathematically be reframed as a sum of vector factors and showcase how to use this reframing to study the impact of each component. We provide evidence that multi-head attentions and feed-forwards are not equally useful in all downstream applications, as well as a quantitative overview of the effects of finetuning on the overall embedding space. This approach allows us to draw connections to a wide range of previous studies, from vector space anisotropy to attention weights.
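
The "sum of vector factors" idea can be illustrated with a toy residual stack. The sketch below is not the authors' decomposition, which the abstract does not spell out. It assumes a simplified network with no layer normalization and no biases, and every name in it (toy_attention, toy_feed_forward, the weight shapes) is hypothetical. Under those assumptions, the final hidden state is exactly the input plus the accumulated attention outputs plus the accumulated feed-forward outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, seq_len, n_layers = 8, 4, 3  # toy sizes, chosen arbitrarily


def toy_attention(x, w):
    # Single-head softmax self-attention with one output projection.
    # Illustrative stand-in for a multi-head attention sublayer.
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x @ w


def toy_feed_forward(x, w_in, w_out):
    # Two-layer ReLU network, without biases (an assumption of this sketch).
    return np.maximum(x @ w_in, 0.0) @ w_out


x = rng.normal(size=(seq_len, d_model))

# One running total per kind of component.
input_term = x.copy()
attention_term = np.zeros_like(x)
feed_forward_term = np.zeros_like(x)

hidden = x
for _ in range(n_layers):
    w_attn = rng.normal(scale=0.1, size=(d_model, d_model))
    w_in = rng.normal(scale=0.1, size=(d_model, 4 * d_model))
    w_out = rng.normal(scale=0.1, size=(4 * d_model, d_model))

    attn_out = toy_attention(hidden, w_attn)
    hidden = hidden + attn_out          # residual connection
    attention_term += attn_out

    ff_out = toy_feed_forward(hidden, w_in, w_out)
    hidden = hidden + ff_out            # residual connection
    feed_forward_term += ff_out

# The output embedding is the sum of the three terms.
reconstructed = input_term + attention_term + feed_forward_term
print(np.allclose(hidden, reconstructed))
```

Once the terms are separated this way, each one can be inspected or removed on its own, which is the kind of per-component study the abstract describes. A real Transformer also applies layer normalization and bias terms, so a faithful decomposition would need to account for those as well.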
