Article · Open access · Peer-reviewed

Training and Evaluating a Statistical Part of Speech Tagger for Natural Language Applications using Kepler Workflows

2012; Elsevier BV; Volume: 9; Language: English

10.1016/j.procs.2012.04.174

ISSN

1877-0509

Authors

Doug Briesch, Reginald Hobbs, Claire Jaja, Brian Kjersten, Clare R. Voss

Topic(s)

Natural Language Processing Techniques

Abstract

A core technology of natural language processing (NLP) incorporated into many text processing applications is a part-of-speech (POS) tagger, a software component that labels words in text with syntactic tags such as noun, verb, and adjective. These tags may then be used within more complex tasks such as parsing, question answering, and machine translation (MT). In this paper we describe the phases of our work training and evaluating statistical POS taggers on Arabic texts and their English translations using Kepler workflows. While the original objectives for encapsulating our research code within Kepler workflows were driven by software engineering needs to document and verify the reusability of our software, our research benefitted as well: the ease of rapid retraining and testing enabled our researchers to detect reporting discrepancies, document their source, and independently validate the correct results.
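To make the train-then-evaluate cycle concrete, the following is a minimal sketch of a statistical POS tagger in Python. It is not the taggers or the Kepler workflows used in the paper: it assumes a simple most-frequent-tag baseline, and the function names (`train`, `tag`, `accuracy`) and the tiny English data set are hypothetical, invented purely for illustration.

```python
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Learn, for each word, the tag it most often carries in the training data."""
    per_word = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, pos in sentence:
            per_word[word.lower()][pos] += 1
            all_tags[pos] += 1
    lexicon = {w: counts.most_common(1)[0][0] for w, counts in per_word.items()}
    # Unknown words fall back to the most frequent tag overall.
    default_tag = all_tags.most_common(1)[0][0]
    return lexicon, default_tag

def tag(words, lexicon, default_tag):
    """Label each word with its learned tag, or the default tag if unseen."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]

def accuracy(gold_sentences, lexicon, default_tag):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in gold_sentences:
        predicted = tag([w for w, _ in sentence], lexicon, default_tag)
        for (_, gold), (_, pred) in zip(sentence, predicted):
            correct += gold == pred
            total += 1
    return correct / total if total else 0.0

# Toy data, invented for illustration only (not from the paper's corpora).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("birds", "NOUN"), ("chase", "VERB"), ("small", "ADJ"), ("insects", "NOUN")],
]
TEST = [
    [("the", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("fast", "ADJ"), ("bird", "NOUN"), ("sleeps", "VERB")],
]

lexicon, default_tag = train(TRAIN)
score = accuracy(TEST, lexicon, default_tag)
print(f"token accuracy: {score:.3f}")
```

Because training and scoring are separate functions over explicit data sets, rerunning both after any change to the data is cheap, which is the kind of rapid retraining and retesting the abstract credits with exposing reporting discrepancies.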
