A system for the retrieval of Italian broadcast news
2000; Elsevier BV; Volume: 32; Issue: 1-2; Language: English
10.1016/s0167-6393(00)00022-4
ISSN 1872-7182
Topic(s): Topic Modeling
Abstract: This paper presents a prototype for the retrieval of Italian broadcast news, which has been developed at ITC-irst. The architecture employs a speech recognition engine for the automatic transcription of audio news. It also features document indexing based on part-of-speech tagging of text coupled with morphological analysis, and query expansion exploiting the Italian WordNet thesaurus. Query-document matching is based on a statistical term weighting scheme. The system was tested on a 203-story collection of audio news, augmented with 9500 newspaper articles. The evaluation was based on a “known item” retrieval task and measured the impact of speech recognition errors and query expansion on retrieval performance.
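The abstract does not say which term weighting scheme or expansion procedure the system uses, so the following is only a minimal sketch of the general idea under stated assumptions: a plain TF-IDF score stands in for the statistical term weighting, a hand-written synonym dictionary stands in for the Italian WordNet thesaurus, and mean reciprocal rank is used as one common way to score a known-item task. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization; the paper uses POS tagging
    # and morphological analysis instead.
    return text.lower().split()


def build_index(docs):
    """Build per-document term counts and smoothed IDF weights.

    docs maps a document id to its (transcribed) text.
    """
    tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())  # each document counts once per term
    n = len(docs)
    idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}
    return tfs, idf


def expand(terms, synonyms):
    # Hypothetical stand-in for thesaurus-based query expansion.
    expanded = list(terms)
    for term in terms:
        for syn in synonyms.get(term, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded


def rank(query, tfs, idf, synonyms=None):
    """Return document ids sorted by descending TF-IDF score (ties by id)."""
    terms = tokenize(query)
    if synonyms:
        terms = expand(terms, synonyms)
    scores = {
        doc_id: sum(tf[t] * idf.get(t, 0.0) for t in terms)
        for doc_id, tf in tfs.items()
    }
    return sorted(scores, key=lambda d: (-scores[d], d))


def mean_reciprocal_rank(queries, tfs, idf, synonyms=None):
    """Known-item evaluation: each query has exactly one target document."""
    total = 0.0
    for query, target in queries:
        ranking = rank(query, tfs, idf, synonyms)
        total += 1.0 / (ranking.index(target) + 1)
    return total / len(queries)
```

Comparing `mean_reciprocal_rank` with and without the `synonyms` argument, or on manual versus automatic transcripts of the same stories, mirrors the kind of comparison the abstract describes, though the paper's actual metric and setup may differ.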