Article Open access Peer reviewed

ALBAYZIN 2016 spoken term detection evaluation: an international open competitive evaluation in Spanish

2017; Springer Nature; Volume: 2017; Issue: 1; Language: English

10.1186/s13636-017-0119-z

ISSN

1687-4722

Authors

Javier Tejedor, Doroteo T. Toledano, Paula Lopez‐Otero, Laura Docío-Fernández, L. Monzón Serrano, Inma Hernáez, Alejandro Coucheiro-Limeres, Javier Ferreiros, Júlia Olcoz, Jorge Llombart,

Topic(s)

Speech and dialogue systems

Abstract

Within search-on-speech, Spoken Term Detection (STD) aims to retrieve data from a speech repository given a textual representation of a search term. This paper presents an international open evaluation for search-on-speech based on STD in Spanish, together with an analysis of the results. The evaluation was designed so that the main results can be analyzed from several angles. The task consists of retrieving the speech files that contain the search terms and providing, for each detection, its start and end times and a score that reflects the confidence assigned to it. Two Spanish speech databases were employed: the MAVIR database, which comprises a set of talks from workshops, and the EPIC database, which comprises a set of European Parliament sessions in Spanish. We present the evaluation itself, both databases, the evaluation metric, the submitted systems, the results, and a detailed discussion. Five research groups took part in the evaluation, and ten systems were submitted in total. We compare the submitted systems and carry out an in-depth analysis based on several search term properties: term length, in-vocabulary versus out-of-vocabulary terms, single-word versus multi-word terms, and native (Spanish) versus foreign terms.
