Translating noun compounds using semantic relations
2014; Elsevier BV; Volume: 32; Issue: 1; Language: English
DOI: 10.1016/j.csl.2014.09.007
ISSN: 1095-8363
Authors: Renu Balyan, Niladri Chatterjee
Topic(s): Text Readability and Simplification
Abstract: Despite a research history of more than 20 years, English to Hindi machine translation often suffers badly from incorrect translations of noun compounds. The problems are of various types, such as the absence of proper postpositions, inappropriate word order, and incorrect semantics. Existing English to Hindi machine translation systems show this vulnerability irrespective of the underlying technique. A potential solution lies in understanding the semantics of the noun compounds. The present paper proposes a scheme based on semantic relations to address this issue. The scheme works in three steps: identification of the noun compounds in a given text, determination of the semantic relationship(s) between their constituents, and selection of the right translation pattern. The scheme first provides translation patterns for the different semantic relations of 2-word noun compounds. These patterns are then used recursively to find the semantic relations and the translation patterns for 3-word and 4-word noun compounds. Frequency-based and probability-based adjacency and dependency models are used for bracketing (grouping) the constituent words of 3-word and 4-word noun compounds into 2-word noun compounds. The semantic relations and the translation patterns generated for 2-word, 3-word and 4-word noun compounds are evaluated. The proposed scheme is compared with some well-known English to Hindi translators, viz. AnglaMT, Anuvadaksh, Bing and Google, and also with the Moses baseline system. The results show significant improvement over the Moses baseline system. The scheme also performs better than the other online MT systems in terms of recall and precision.
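The bracketing step mentioned in the abstract can be illustrated with a minimal sketch of the standard frequency-based adjacency and dependency models for 3-word noun compounds. This is not the authors' implementation: the abstract does not give their formulas, so the sketch follows the commonly used formulation in which the adjacency model compares the association of (w1, w2) with that of (w2, w3), while the dependency model compares (w1, w2) with (w1, w3). The counts, the function names, and the tie-breaking rule (prefer left bracketing) are all assumptions made for illustration.

```python
from typing import Dict, Tuple, Union

# Hypothetical bigram counts, invented purely for illustration.
# A real system would estimate these from a large corpus.
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("computer", "science"): 120,
    ("science", "department"): 40,
    ("computer", "department"): 5,
    ("woman", "aid"): 2,
    ("aid", "worker"): 50,
    ("woman", "worker"): 30,
}

Bracketing = Tuple[Union[str, Tuple[str, str]], Union[str, Tuple[str, str]]]


def count(w1: str, w2: str) -> int:
    """Return the (hypothetical) corpus frequency of the pair w1 w2."""
    return BIGRAM_COUNTS.get((w1, w2), 0)


def _build(w1: str, w2: str, w3: str, left: bool) -> Bracketing:
    """Left bracketing is [[w1 w2] w3]; right bracketing is [w1 [w2 w3]]."""
    return ((w1, w2), w3) if left else (w1, (w2, w3))


def bracket_adjacency(w1: str, w2: str, w3: str) -> Bracketing:
    """Adjacency model: compare the adjacent pairs (w1, w2) and (w2, w3)."""
    # Assumption: ties are resolved in favour of left bracketing.
    return _build(w1, w2, w3, count(w1, w2) >= count(w2, w3))


def bracket_dependency(w1: str, w2: str, w3: str) -> Bracketing:
    """Dependency model: does w1 modify w2 (left) or w3 (right)?"""
    # Assumption: ties are resolved in favour of left bracketing.
    return _build(w1, w2, w3, count(w1, w2) >= count(w1, w3))


if __name__ == "__main__":
    for compound in [("computer", "science", "department"),
                     ("woman", "aid", "worker")]:
        print(compound,
              bracket_adjacency(*compound),
              bracket_dependency(*compound))
```

Once a 3-word compound has been grouped this way, each resulting pair can be treated as a 2-word compound, which is what allows the 2-word translation patterns to be applied recursively as the abstract describes. A probability-based variant would replace the raw counts with normalised association scores.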