Detecting complex predicates in Hindi using POS projection across parallel corpora

Authors: Amitabha Mukerjee, Ankit Soni, Achla M. Raina

Keywords: Natural language processing, Artificial intelligence, Process (engineering), Hindi, Parallel corpora, Verb, Speech recognition, Sequence, Parsing, Computer science, Projection (relational algebra), Identification (information)

Abstract: Complex Predicates or CPs are multiword complexes functioning as single verbal units. CPs are particularly pervasive in Hindi and other Indo-Aryan languages, but a usage account driven by corpus-based identification of these constructs has not been possible, since single-language systems based on rules or statistical approaches require reliable tools (POS taggers, parsers, etc.) that are unavailable for Hindi. This paper highlights the development of the first such database, based on the simple idea of projecting POS tags across an English-Hindi parallel corpus. The CP types considered include adjective-verb (AV), noun-verb (NV), adverb-verb (Adv-V), and verb-verb (VV) composites. CPs are hypothesized where a verb in English is projected onto a multi-word sequence in Hindi. While this process misses some CPs, those that are detected appear to be more reliable (83% precision, 46% recall). The resulting database lists instances of 1439 CPs in 4400 sentences.
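The detection heuristic in the abstract (a CP is hypothesized where an English verb projects onto a multi-word Hindi sequence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `find_cp_candidates`, the data layout, the requirement that the Hindi span be contiguous, and the toy sentence are all hypothetical. The word alignment is taken as given; in practice it would come from a statistical word aligner.

```python
from typing import List, Tuple

# Hypothetical data layout (an assumption, not taken from the paper):
#   en_tagged: English tokens paired with Penn Treebank POS tags
#   hi_tokens: Hindi tokens (romanized here for readability)
#   alignment: (english_index, hindi_index) links from some word aligner
Alignment = List[Tuple[int, int]]


def find_cp_candidates(
    en_tagged: List[Tuple[str, str]],
    hi_tokens: List[str],
    alignment: Alignment,
) -> List[Tuple[str, List[str]]]:
    """Return (English verb, Hindi word sequence) pairs where one English
    verb is aligned to a contiguous run of two or more Hindi tokens."""
    candidates = []
    for en_idx, (word, tag) in enumerate(en_tagged):
        # Penn Treebank verb tags all start with "VB" (VB, VBD, VBZ, ...).
        if not tag.startswith("VB"):
            continue
        # Collect the Hindi positions this English verb projects onto.
        hi_idxs = sorted({h for e, h in alignment if e == en_idx})
        if len(hi_idxs) < 2:
            continue  # single-word projection: no CP hypothesized
        # Assumption: the aligned Hindi positions must form one contiguous span.
        if hi_idxs[-1] - hi_idxs[0] + 1 != len(hi_idxs):
            continue
        candidates.append((word, [hi_tokens[i] for i in hi_idxs]))
    return candidates


if __name__ == "__main__":
    # Toy example: "He helped me" ~ "usne meri madad kii"
    # ("madad karnaa", literally "do help", is a noun-verb CP).
    en = [("He", "PRP"), ("helped", "VBD"), ("me", "PRP")]
    hi = ["usne", "meri", "madad", "kii"]
    links = [(0, 0), (1, 2), (1, 3), (2, 1)]
    print(find_cp_candidates(en, hi, links))
```

Running it prints `[('helped', ['madad', 'kii'])]`. The sketch only proposes candidates; it does not assign the AV, NV, Adv-V, or VV type. Requiring a contiguous span is a simplifying choice made here; it would discard CPs whose parts are separated by other words or whose alignment is incomplete.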

