Article, Open access, Peer reviewed

The Digital Middle Ages: An Introduction

2017; University of Chicago Press; Volume: 92; Issue: S1; Language: English

10.1086/694236

ISSN

2040-8072

Authors

David J. Birnbaum, Sheila Bonde, Mike Kestemont,

Topic(s)

Language and cultural evolution

Abstract

While such hit lists are interesting in their own right, mere frequency does not reveal the complex relationships that might exist among these entities, or between them and the other words with which they are typically associated. To study and visualize these relationships, we turn to one final technique from the sphere of distributional semantics: word embeddings. Like topic modeling techniques, word embeddings build on the distributional hypothesis that words with similar meanings tend to appear in similar contexts. However, whereas topic modeling techniques are geared to finding good representations for topics and documents, word embeddings can yield much more fine-grained representations for individual words. Word embeddings represent each item in a vocabulary as a numerical vector, that is, a list of numbers that aims to characterize the word's meaning. The advantage of such a word-level model is that we can apply straightforward arithmetic to these vector representations and ask the model, for instance, to return the five words that it deems most similar to a given query term. If we apply a popular word-embeddings model (word2vec) to our wikified corpus, we can inspect the immediate semantic neighborhood of the terms listed in Table 2. Using the vector representations that we can extract for our wikified authors, we can also visualize the relationships between these authors in a dendrogram, or tree diagram. In Fig. 7, the wikified links take the form of leaves in a tree, which are eventually joined into new nodes in a branch structure. The branches reflect the distances between the representations that we obtained for these authors.
Note how the structure that arises from this tree makes sense (monarchs cluster with monarchs, philosophers with philosophers, and so on) but also offers some surprising results: Ovid and Virgil, for instance, cluster with Boccaccio, Petrarch, and Dante, instead of with other authors from antiquity, such as Cicero or Plato. Note, also, how the tree realizes at the top level what seems to be a fairly neat split between vernacular authors and nonvernacular authorities.
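
The two operations described above, querying a word's nearest neighbors and joining vectors into a tree, can be illustrated with a minimal sketch. It makes several assumptions that are not in the article: the three-dimensional vectors are invented by hand (and deliberately chosen so that the toy result echoes the grouping the article reports; they are not evidence for it), similarity is measured with cosine similarity, and the tree is built with average-linkage agglomerative clustering. The names `VECTORS`, `most_similar`, and `build_tree` are hypothetical. A real study would train the vectors with a word2vec implementation and would more likely draw the tree with a library such as SciPy's `scipy.cluster.hierarchy`.

```python
import math

# Toy vectors invented purely for illustration. They are NOT taken from the
# article's word2vec model or its wikified corpus.
VECTORS = {
    "Dante":    [0.9, 0.1, 0.0],
    "Petrarch": [0.8, 0.2, 0.1],
    "Ovid":     [0.7, 0.3, 0.1],
    "Cicero":   [0.1, 0.9, 0.2],
    "Plato":    [0.2, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, vectors, k=5):
    """Return the k entries closest to `query`, best match first."""
    scores = [
        (name, cosine(vectors[query], vec))
        for name, vec in vectors.items()
        if name != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


def build_tree(vectors):
    """Average-linkage agglomerative clustering on cosine distance.

    Returns the merges in the order they happen, each as
    (members_of_cluster_a, members_of_cluster_b, distance).
    Every merge corresponds to one internal node of a dendrogram.
    """
    clusters = [(name,) for name in vectors]  # every item starts as a leaf
    merges = []

    def distance(a, b):
        # Mean cosine distance over all cross-cluster pairs.
        pairs = [(x, y) for x in a for y in b]
        total = sum(1.0 - cosine(vectors[x], vectors[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Find the two closest clusters and join them into a new node.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        merges.append((a, b, distance(a, b)))
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(a + b)
    return merges


if __name__ == "__main__":
    print(most_similar("Dante", VECTORS, k=2))
    for left, right, dist in build_tree(VECTORS):
        print(f"{left} + {right} at distance {dist:.3f}")
```

Reading the list of merges from first to last corresponds to reading a dendrogram from its leaves toward its root: early merges join the closest items, and the final merge is the top-level split.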
