Multiword expressions: hard going or plain sailing?

Authors: Paul Rayson, Scott Piao, Serge Sharoff, Stefan Evert, Begoña Villada Moirón

DOI: 10.1007/S10579-009-9105-0

Keywords: Semantics, Term (time), Computational linguistics, Multiword expression, Chen, Speech recognition, Interpretation (logic), Computer science, Noun compounds, Phraseology, Subject (grammar), Linguistics

Abstract: Over the past two decades or so, Multi-Word Expressions (MWEs; also called Multi-word Units) have been an increasingly important concern for Computational Linguistics and Natural Language Processing (NLP). The term MWE has been used to refer to various types of linguistic units and expressions, including idioms, noun compounds, phrasal verbs, light verbs and other habitual collocations. However, while there is no universally agreed definition as yet, most researchers use the term to refer to those frequently occurring units which are subject to a certain level of semantic opaqueness, or non-compositionality. Non-compositional MWEs pose tough challenges for automatic analysis because their interpretation cannot be achieved by directly combining the semantics of their constituents, thereby causing a "pain in the neck for NLP" (Sag et al. 2001). In fact, MWEs have been studied in Phraseology under the term phraseological unit. But in the early 1990s, MWEs started receiving increasing attention in corpus-based computational linguistics and NLP. Early influential work on MWEs includes Smadja (1993), Dagan and Church (1994), Wu (1997), Daille (1995), Wermter Chen McEnery Michiels Dufour (1998). These studies address the treatment of MWEs and their applications in practical NLP and information systems. A milestone

References (13)
S. S. L. Piao, R. Garside, P. Rayson, O. Mudraya, D. Archer, Andrew Wilson, A. M. McEnery, A large semantic lexicon for corpus annotation (2005)
B. Babych, S. Piao, P. Rayson, O. Mudraya, A. Wilson, Developing a Russian semantic tagger for automatic semantic annotation (2006)
T. McEnery, P. Rayson, D. E. Archer, S. L. Piao, The UCREL Semantic Analysis System. European Language Resources Association (2004)
Ivan A. Sag, Timothy Baldwin, Francis Bond, Ann Copestake, Dan Flickinger, Multiword Expressions: A Pain in the Neck for NLP. International Conference on Computational Linguistics, pp. 1-15 (2002), 10.1007/3-540-45715-1_1
Roger Garside, Geoffrey N. Leech, Anthony M. McEnery, Corpus Annotation: Linguistic Information from Computer Text Corpora (1997)
S. Piao, P. Rayson, J.P. Juntunen, L. Löfberg, K. Varantola, A. Nykanen, A semantic tagger for the Finnish language (2005)
Amela Ćurković, Eva Lokotar, Klara Bilić Meštrić, Sylviane Granger and Fanny Meunier, eds. 2008. Phraseology. An Interdisciplinary Perspective. Amsterdam – Philadelphia: John Benjamins. Jezikoslovlje, vol. 11, pp. 101-115 (2010)
Frank Smadja, Retrieving collocations from text: Xtract. Computational Linguistics, vol. 19, pp. 143-177 (1993)
Ido Dagan, Ken Church, Termight: Identifying and Translating Technical Terminology. Conference on Applied Natural Language Processing, pp. 34-40 (1994), 10.3115/974358.974367