Book chapter Open access Peer reviewed

Error Annotation of the Arabic Learner Corpus

2013; Springer Science+Business Media; Language: English

10.1007/978-3-642-40722-2_2

ISSN

1611-3349

Authors

Abdullah Alfaifi, Eric Atwell, Ghazi M. Abuhakema

Topic(s)

Topic Modeling

Abstract

This paper introduces a new two-level error tagset, AALETA (Alfaifi Atwell Leeds Error Tagset for Arabic), for annotating the Arabic Learner Corpus (ALC). The tagset comprises six broad classes, subdivided into 37 more specific error types or subcategories, and is designed to be easily understood by Arabic corpus error annotators. AALETA is based on an existing error tagset for Arabic corpora, ARIDA, created by Abuhakema et al. [1], and on a number of other error-analysis studies. It was used to annotate texts of the Arabic Learner Corpus [2]. The paper presents the tagset's broad classes and their types or subcategories, along with an example of annotation. The understandability of AALETA was measured against that of ARIDA, and the preliminary results showed that AALETA achieved a slightly higher score. Annotators also reported that they preferred using AALETA over ARIDA.

Reference(s)