Data Driven and Psycholinguistics Motivated Approaches to Hate Speech Detection

Peer-reviewed article

2020; National Polytechnic Institute; Volume: 24; Issue: 3; Language: English

10.13053/cys-24-3-3478

ISSN

2007-9737

Authors

Samuel Caetano da Silva, Thiago Castro Ferreira, Ricelli Moreira Silva Ramos, Ivandré Paraboni

Topic(s)

Hate Speech and Cyberbullying Detection

Abstract

Computational models of hate speech detection and related tasks (e.g., detecting misogyny, racism, xenophobia, homophobia, etc.) have emerged as major Natural Language Processing (NLP) research topics in recent years. In the present work, we investigate a range of alternative implementations of three of these tasks - namely, hate speech, aggressive behaviour and target group recognition - by presenting a number of experiments involving different learning methods, including regularised logistic regression, convolutional neural networks (CNN) and deep bidirectional transformers (BERT), and using word embeddings, word n-grams, character n-grams and psycholinguistics-motivated (LIWC) features alike. Results suggest that a purely data-driven BERT model, and to some extent also a hybrid psycholinguistically informed CNN model, generally outperform the alternatives under consideration for all tasks in both English and Spanish.
