Automatic extraction of collocations: a new Web-based method

Author: Jean-Pierre Colson

DOI:

Keywords:

Abstract: The automatic extraction of collocations from large corpora or the Internet poses a daunting challenge to computational linguistics. Indeed, previous statistical methods based on bigrams have shown their limitations, and there is, moreover, no theoretical consensus on the extension of parametric methods to trigrams or higher n-grams. This is a key issue, because the automatic extraction of significant n-grams has important implications for computer-aided translation, translation quality assessment, automated text correction, terminology and lexicography. This paper reports promising results that were obtained by using a totally different approach to the extraction of collocations of any size. Instead of having recourse to statistical scores, the method is based on testing proximity algorithms that corroborate the native speaker’s competence about existing collocations. It is argued that compound phraseology in the broad sense can be captured by linguistic co-occurrence phenomena. This is made possible by a subtle manipulation of the Application Programming Interface (API) of a Web search engine, in this case Yahoo. The algorithm presented here, the Web Proximity Measure (WPR), has been tested on 4,000 collocations mentioned in traditional dictionaries and on 340,000 collocations extracted from the 1T ‘Google n-grams’. The results show precision and recall scores above 0.9.

References (15)
Marco Baroni, Sabrina Bisi, Using Cooccurrence Statistics and the Web to Discover Synonyms in a Technical Language. Language Resources and Evaluation, pp. 1725-1728 (2004)
Marco Baroni, S. Vegnaduzzo, Identifying subjective adjectives through web-based mutual information. Proceedings of KONVENS 2004, pp. 17-24 (2004)
Michael Hoey, Patterns of Lexis in Text (1991)
Peter D. Turney, Mining the web for synonyms: PMI-IR versus LSA on TOEFL. European Conference on Machine Learning, pp. 491-502 (2001), 10.1007/3-540-44795-4_42
Kenneth Ward Church, Patrick Hanks, Word association norms, mutual information, and lexicography. Computational Linguistics, vol. 16, pp. 22-29 (1990), 10.5555/89086.89095
Jean-Pierre Colson, Cross-linguistic phraseological studies: An overview. John Benjamins Publishing Company, pp. 191-206 (2008)
Douglas Biber, Susan Conrad, Viviana Cortes, If you look at …: Lexical Bundles in University Teaching and Textbooks. Applied Linguistics, vol. 25, pp. 371-405 (2004), 10.1093/APPLIN/25.3.371
Terence Odlin, John Sinclair, Corpus, Concordance, Collocation. The Modern Language Journal, vol. 78, pp. 407- (1994), 10.2307/330144