Addressing polysemy in bilingual lexicon extraction from comparable corpora

Authors: Nikola Ljubešić, Darja Fišer, Ozren Kubelka

Abstract: This paper presents an approach to extracting translation equivalents from comparable corpora for polysemous nouns. As opposed to the standard approaches that build a single context vector for all occurrences of a given headword, we first disambiguate the headword with third-party sense taggers and then build a separate context vector for each sense of the headword. Since state-of-the-art word sense disambiguation tools are still far from perfect, we also tried to improve the results by combining the sense assignments provided by two different sense taggers. Evaluation of the results shows that we outperform the baseline (0.473) in all the settings we experimented with, even when using only one sense tagger, and that the best-performing results are indeed obtained by taking into account the intersection of both sense taggers (0.720).
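The per-sense context vectors described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sense_context_vectors`, the window size, the sense labels, and the two toy taggers are all assumptions made for illustration. A real setup would call actual third-party sense taggers and would then compare the resulting vectors across languages. The sketch keeps only occurrences on which both taggers agree, which is one simple reading of "intersection".

```python
from collections import Counter, defaultdict


def sense_context_vectors(sentences, headword, tagger_a, tagger_b, window=2):
    """Build one context-count vector per sense of `headword`.

    `tagger_a` and `tagger_b` are hypothetical callables that take
    (tokens, index) and return a sense label or None. Only occurrences
    where both return the same label are counted.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != headword:
                continue
            sense_a = tagger_a(tokens, i)
            sense_b = tagger_b(tokens, i)
            if sense_a is None or sense_a != sense_b:
                continue  # taggers disagree or abstain: skip this occurrence
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            vectors[sense_a].update(left + right)
    return dict(vectors)


# Toy stand-ins for real sense taggers (assumptions, not real tools).
def toy_tagger_a(tokens, i):
    return "bank%finance" if "money" in tokens else "bank%river"


def toy_tagger_b(tokens, i):
    if "money" in tokens:
        return "bank%finance"
    if "river" in tokens:
        return "bank%river"
    return None  # abstains when it sees no cue word


sentences = [
    ["the", "bank", "lends", "money"],
    ["the", "river", "bank", "was", "muddy"],
    ["a", "bank", "stood", "there"],
]
vectors = sense_context_vectors(sentences, "bank", toy_tagger_a, toy_tagger_b)
```

In this toy run the third sentence contributes nothing, because the second tagger abstains. The baseline described in the abstract would correspond to pooling all three occurrences into a single vector for the headword.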
