Author: Jean-Pierre Colson
DOI:
Keywords:
Abstract: The automatic extraction of collocations from large corpora or the Internet poses a daunting challenge to computational linguistics. Indeed, previous statistical methods based on bigrams have shown their limitations, and there is besides no theoretical consensus on extending parametric scores to trigrams and higher n-grams. This is a key issue, because the extraction of significant n-grams has important implications for computer-aided translation, translation quality assessment, automated text correction, terminology and lexicography. This paper reports promising results that were obtained by using a totally different approach to n-grams of any size. Instead of having recourse to statistical scores, the method relies on testing proximity algorithms that corroborate the native speaker’s competence about existing collocations. It is argued that compounds and phraseology in the broad sense can be captured as linguistic co-occurrence phenomena. This is made possible by a subtle manipulation of the Application Programming Interface (API) of a Web search engine, in this case Yahoo. The algorithm presented here, the Web Proximity Rate (WPR), has been tested on 4,000 collocations mentioned in traditional dictionaries and on 340,000 n-grams extracted from the 1T ‘Google n-grams’ corpus. The results show precision and recall scores superior to 0.9.
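The abstract does not give the WPR formula itself. As a purely illustrative sketch of how a proximity-based collocation score might contrast with no parametric statistics, one could compare the hit count of an exact-phrase query against the hit counts of its component words. The function name, the ratio used, and all hit counts below are assumptions for illustration, not Colson’s actual WPR definition; hit counts are passed in directly rather than fetched from a search-engine API so the example is self-contained.

```python
# Hypothetical proximity-style collocation score (NOT the paper's WPR).
# In the paper, counts would come from a Web search engine API (Yahoo);
# here they are supplied as arguments so the sketch is runnable offline.

def proximity_score(phrase_hits: int, word_hits: list[int]) -> float:
    """Ratio of exact-phrase hits to the rarest component word's hits.

    An illustrative assumption only: a high ratio suggests the words
    co-occur far more often than chance, i.e. the n-gram behaves like
    a collocation. Works for n-grams of any size, since word_hits may
    hold counts for two, three, or more component words.
    """
    rarest = min(word_hits)
    if rarest == 0:
        return 0.0
    return phrase_hits / rarest

# Invented toy counts: a strong collocation vs. a weak combination.
print(proximity_score(900_000, [50_000_000, 2_000_000]))  # -> 0.45
print(proximity_score(8_000, [30_000_000, 2_000_000]))    # -> 0.004
```

A ratio-based score like this one sidesteps the bigram-specific parametric assumptions the abstract criticises: the same computation applies unchanged to trigrams and longer n-grams, which is the property the WPR approach exploits.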