Author: Jean-Pierre Colson
DOI:
Keywords:
Abstract: The automatic extraction of collocations from large corpora or the Internet poses a daunting challenge to computational linguistics. Indeed, previous statistical methods based on bigrams have shown their limitations, and there is besides no theoretical consensus on extending parametric scores to trigrams and higher n-grams. This is a key issue, because the extraction of significant n-grams has important implications for computer-aided translation, translation quality assessment, automated text correction, terminology and lexicography. This paper reports promising results that were obtained by using a totally different approach to n-grams of any size. Instead of having recourse to statistical scores, the method relies on testing proximity algorithms that corroborate the native speaker’s competence about existing collocations. It is argued that compounds and phraseology in the broad sense can be captured as linguistic co-occurrence phenomena. This is made possible by a subtle manipulation of the Application Programming Interface (API) of a Web search engine, in this case Yahoo. The algorithm presented here, the Web Proximity Rate (WPR), has been tested on 4,000 collocations mentioned in traditional dictionaries and on 340,000 n-grams extracted from the 1T ‘Google n-grams’ corpus. The results show precision and recall scores superior to 0.9.
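The abstract does not give the WPR formula itself. As a purely illustrative sketch of how a proximity-based collocation score might contrast with no parametric statistics, one could compare the hit count of an exact-phrase query against the hit counts of its component words. The function name, the ratio used, and all hit counts below are assumptions for illustration, not Colson’s actual WPR definition; hit counts are passed in directly rather than fetched from a search-engine API so the example is self-contained.

```python
# Hypothetical proximity-style collocation score (NOT the paper's WPR).
# In the paper, counts would come from a Web search engine API (Yahoo);
# here they are supplied as arguments so the sketch is runnable offline.

def proximity_score(phrase_hits: int, word_hits: list[int]) -> float:
    """Ratio of exact-phrase hits to the rarest component word's hits.

    An illustrative assumption only: a high ratio suggests the words
    co-occur far more often than chance, i.e. the n-gram behaves like
    a collocation. Works for n-grams of any size, since word_hits may
    hold counts for two, three, or more component words.
    """
    rarest = min(word_hits)
    if rarest == 0:
        return 0.0
    return phrase_hits / rarest

# Invented toy counts: a strong collocation vs. a weak combination.
print(proximity_score(900_000, [50_000_000, 2_000_000]))  # -> 0.45
print(proximity_score(8_000, [30_000_000, 2_000_000]))    # -> 0.004
```

A ratio-based score like this one sidesteps the bigram-specific parametric assumptions the abstract criticises: the same computation applies unchanged to trigrams and longer n-grams, which is the property the WPR approach exploits.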