Authors: Amitabha Mukerjee, Ankit Soni, Achla M. Raina
Keywords: Natural language processing, Artificial intelligence, Process (engineering), Hindi, Parallel corpora, Verb, Speech recognition, Sequence, Parsing, Computer science, Projection (relational algebra), Identification (information)
Abstract: Complex Predicates (CPs) are multiword complexes that function as single verbal units. CPs are particularly pervasive in Hindi and other Indo-Aryan languages, but a usage account driven by corpus-based identification of these constructs has not been possible, since single-language systems based on rules and statistical approaches require reliable tools (POS taggers, parsers, etc.) that are unavailable for Hindi. This paper highlights the development of the first such database, based on the simple idea of projecting POS tags across an English-Hindi parallel corpus. The CP types considered include adjective-verb (AV), noun-verb (NV), adverb-verb (Adv-V), and verb-verb (VV) composites. CPs are hypothesized where a verb in English is projected onto a multi-word sequence in Hindi. While this process misses some CPs, those detected appear to be more reliable (83% precision, 46% recall). The resulting database lists usage instances of 1439 CPs in 4400 sentences.
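The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hypothesize_cps`, the use of Penn Treebank-style tags (verbs start with `VB`), the precomputed word alignment given as index pairs, and the requirement that the Hindi span be contiguous are all assumptions made for illustration. The example sentence is a hand-built toy, not data from the paper.

```python
from collections import defaultdict


def hypothesize_cps(en_tagged, hi_tokens, alignment):
    """Return CP candidates as (english_verb, hindi_token_tuple) pairs.

    en_tagged : list of (word, pos) pairs for the English sentence
                (assumed Penn Treebank-style tags; verbs start with "VB").
    hi_tokens : list of Hindi tokens (transliterated here).
    alignment : iterable of (en_index, hi_index) word-alignment links,
                assumed to come from an external aligner.
    """
    # Collect, for every English token, the Hindi positions it aligns to.
    targets = defaultdict(set)
    for en_i, hi_i in alignment:
        targets[en_i].add(hi_i)

    candidates = []
    for en_i, (word, pos) in enumerate(en_tagged):
        if not pos.startswith("VB"):
            continue  # only English verbs can project a CP hypothesis
        hi_idx = sorted(targets.get(en_i, ()))
        # Assumption: a candidate needs at least two Hindi tokens
        # that form one contiguous span.
        is_multiword = len(hi_idx) >= 2
        is_contiguous = is_multiword and hi_idx[-1] - hi_idx[0] == len(hi_idx) - 1
        if is_contiguous:
            candidates.append((word, tuple(hi_tokens[i] for i in hi_idx)))
    return candidates


# Toy example: "He helped me" / "usne meri madad ki".
en = [("He", "PRP"), ("helped", "VBD"), ("me", "PRP")]
hi = ["usne", "meri", "madad", "ki"]
links = [(0, 0), (1, 2), (1, 3), (2, 1)]

print(hypothesize_cps(en, hi, links))
```

In this toy case the English verb "helped" aligns to the two adjacent Hindi tokens "madad ki", so it is proposed as a noun-verb (NV) candidate. A verb aligned to a single Hindi token yields no hypothesis, which is one way such a procedure can miss CPs and is consistent with the lower recall reported above.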