Training and Evaluating a Statistical Part of Speech Tagger for Natural Language Applications using Kepler Workflows
2012; Elsevier BV; Volume: 9; Language: English
DOI: 10.1016/j.procs.2012.04.174
ISSN: 1877-0509
Authors: Doug Briesch, Reginald Hobbs, Claire Jaja, Brian Kjersten, Clare R. Voss
Topic(s): Natural Language Processing Techniques
Abstract: A core technology of natural language processing (NLP), incorporated into many text processing applications, is the part-of-speech (POS) tagger: a software component that labels the words in a text with syntactic tags such as noun, verb, and adjective. These tags may then be used in more complex tasks such as parsing, question answering, and machine translation (MT). In this paper we describe the phases of our work training and evaluating statistical POS taggers on Arabic texts and their English translations using Kepler workflows. Although our original reasons for encapsulating our research code within Kepler workflows were software engineering needs, namely documenting and verifying the reusability of our software, our research benefited as well: the ease of rapid retraining and testing enabled our researchers to detect reporting discrepancies, document their source, and independently validate the correct results.
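The abstract does not say which statistical model the authors' taggers use. As a minimal sketch of the train-then-evaluate loop that such workflows automate, the code below assumes a most-frequent-tag (unigram) baseline scored by token accuracy. It is not the authors' tagger and does not use Kepler; the function names `train_most_frequent_tag`, `tag`, and `accuracy`, the toy English sentences, and the Penn Treebank-style tags are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical data format: a tagged sentence is a list of (word, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def train_most_frequent_tag(corpus: List[TaggedSentence]) -> Tuple[Dict[str, str], str]:
    """Learn each word's most frequent tag, plus a global fallback tag.

    Assumption: a unigram baseline, chosen only to illustrate training;
    it is not the model described in the paper.
    """
    word_tag_counts: Dict[str, Counter] = defaultdict(Counter)
    tag_counts: Counter = Counter()
    for sentence in corpus:
        for word, pos in sentence:
            word_tag_counts[word][pos] += 1
            tag_counts[pos] += 1
    lexicon = {w: counts.most_common(1)[0][0] for w, counts in word_tag_counts.items()}
    fallback = tag_counts.most_common(1)[0][0]  # overall most frequent tag, used for unseen words
    return lexicon, fallback


def tag(words: List[str], lexicon: Dict[str, str], fallback: str) -> List[str]:
    """Label each word with its learned tag, or the fallback if it was never seen."""
    return [lexicon.get(w, fallback) for w in words]


def accuracy(gold: List[TaggedSentence], lexicon: Dict[str, str], fallback: str) -> float:
    """Token-level accuracy of the tagger against gold-standard tagged sentences."""
    correct = total = 0
    for sentence in gold:
        predicted = tag([w for w, _ in sentence], lexicon, fallback)
        for (_, gold_tag), pred_tag in zip(sentence, predicted):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy training and test data (hypothetical, English only).
    train = [
        [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
        [("a", "DT"), ("cat", "NN"), ("chases", "VBZ"), ("mice", "NNS")],
        [("dog", "NN"), ("food", "NN")],
    ]
    test = [[("the", "DT"), ("bird", "NN"), ("sings", "VBZ")]]

    lexicon, fallback = train_most_frequent_tag(train)
    print(tag(["the", "bird", "sings"], lexicon, fallback))  # ['DT', 'NN', 'NN']
    print(round(accuracy(test, lexicon, fallback), 3))       # 0.667
```

Because training and evaluation are plain functions over data, rerunning the whole pipeline on a new training split is a single call; this is the kind of rapid retraining and retesting the abstract credits with exposing reporting discrepancies, though a real workflow would wrap a far stronger tagger.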