Book chapter, peer-reviewed

iDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text

2009; Springer Science+Business Media; Language: English

10.1007/978-3-642-04617-9_32

ISSN

1611-3349

Authors

Benjamin Adrian, J.J. van Hees, Ludger van Elst, Andreas Dengel

Topic(s)

Topic Modeling

Abstract

Due to the huge amount of text data on the WWW, annotating unstructured text with semantic markup is a crucial topic in Semantic Web research. This work formally analyzes the incorporation of domain ontologies into information extraction tasks in iDocument. Ontology-based information extraction exploits domain ontologies, which contain formalized and structured domain knowledge, to extract domain-relevant information from un-annotated and unstructured text. iDocument provides a pipeline architecture, an extraction template interface, and the ability to exchange domain ontologies for performing information extraction tasks. This work outlines iDocument's ontology-based architecture, the use of SPARQL queries as extraction templates, and an evaluation of iDocument in an automatic document annotation scenario.
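
To illustrate the idea of a query acting as an extraction template, here is a minimal sketch in plain Python. It is not iDocument's implementation: the toy ontology, the `ex:` names, and the functions `recognize`, `match`, and `extract` are all hypothetical, and the template is a SPARQL-like list of triple patterns rather than real SPARQL. The sketch assumes that instances are found in text by simple label lookup and that a template result is kept only when every bound instance was found in the text.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy domain ontology (illustrative only, not from the chapter).
ONTOLOGY: Set[Triple] = {
    ("ex:Adrian", "rdf:type", "ex:Person"),
    ("ex:Adrian", "rdfs:label", "Benjamin Adrian"),
    ("ex:DFKI", "rdf:type", "ex:Organization"),
    ("ex:DFKI", "rdfs:label", "DFKI"),
    ("ex:Adrian", "ex:memberOf", "ex:DFKI"),
}

# Extraction template as a SPARQL-like basic graph pattern.
# Terms starting with "?" are variables.
TEMPLATE: List[Triple] = [
    ("?person", "rdf:type", "ex:Person"),
    ("?person", "ex:memberOf", "?org"),
]


def recognize(text: str, ontology: Set[Triple]) -> Set[str]:
    """Return ontology instances whose rdfs:label occurs in the text."""
    lowered = text.lower()
    return {s for (s, p, o) in ontology
            if p == "rdfs:label" and o.lower() in lowered}


def match(pattern: List[Triple], triples: Set[Triple],
          binding: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Return all variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not pattern:
        return [binding]
    head, rest = pattern[0], pattern[1:]
    results: List[Dict[str, str]] = []
    for triple in sorted(triples):  # sorted for a deterministic result order
        candidate = dict(binding)
        ok = True
        for term, value in zip(head, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.extend(match(rest, triples, candidate))
    return results


def extract(text: str, ontology: Set[Triple],
            template: List[Triple]) -> List[Dict[str, str]]:
    """Keep template results whose bound instances all occur in the text."""
    found = recognize(text, ontology)
    return [b for b in match(template, ontology)
            if all(v in found for v in b.values())]


if __name__ == "__main__":
    print(extract("Benjamin Adrian works at DFKI.", ONTOLOGY, TEMPLATE))
```

Because the ontology and the template are passed in as arguments, swapping in a different domain ontology changes what is extracted without touching the matching code, which mirrors the exchangeability the abstract describes.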

Reference(s)