Error Annotation of the Arabic Learner Corpus
2013; Springer Science+Business Media; Language: English
DOI: 10.1007/978-3-642-40722-2_2
ISSN: 1611-3349
Authors: Abdullah Alfaifi, Eric Atwell, Ghazi M. Abuhakema
Topic(s): Topic Modeling
Abstract: This paper introduces a new two-level error tagset, AALETA (Alfaifi Atwell Leeds Error Tagset for Arabic), to be used for annotating the Arabic Learner Corpora (ALC). The new tagset comprises six broad classes, subdivided into 37 more specific error types or subcategories. It is easily understood by Arabic corpus error annotators. AALETA is based on an existing error tagset for Arabic corpora, ARIDA, created by Abuhakema et al. [1], and on a number of other error-analysis studies. It was used to annotate texts of the Arabic Learner Corpus [2]. The paper presents the tagset's broad classes and their types or subcategories, together with an example of annotation. The understandability of AALETA was measured against that of ARIDA, and the preliminary results showed that AALETA achieved a slightly higher score. Annotators also reported that they preferred using AALETA over ARIDA.