Improving query translation for cross-language information retrieval using statistical models

Authors: Jianfeng Gao, Jian-Yun Nie, Endong Xun, Jian Zhang, Ming Zhou

DOI: 10.1145/383952.383966

Keywords: Noun phrase; Synchronous context-free grammar; Evaluation of machine translation; Artificial intelligence; Machine translation software usability; Machine translation; Natural language processing; Transfer-based machine translation; Rule-based machine translation; Information retrieval; Example-based machine translation; Computer science; Phrase; Computer-assisted translation; Cross-language information retrieval; Query expansion

Abstract: Dictionaries have often been used for query translation in cross-language information retrieval (CLIR). However, we are faced with the problem of ambiguity, i.e. multiple translations stored in a dictionary for a word. In addition, word-by-word translation is not precise enough. In this paper, we explore several methods to improve previous dictionary-based query translation. First, as many noun phrases as possible are recognized and translated as a whole by using statistical models and phrase translation patterns. Second, the best word translations are selected based on the cohesion of the translation words. Our experimental results on the TREC English-Chinese CLIR collection show that these techniques result in significant improvements over simple dictionary approaches, and achieve even better performance than a high-quality machine translation system.
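
Illustration: the cohesion-based selection step mentioned in the abstract can be pictured with a minimal Python sketch. This is not the authors' model. It assumes a hypothetical bilingual dictionary (`toy_dictionary`) and a hypothetical table of pairwise cohesion scores (`toy_cohesion`), both invented here, and it simply picks the combination of candidate translations whose total pairwise cohesion is highest by exhaustive search.

```python
from itertools import combinations, product


def select_translations(query, dictionary, cohesion):
    """Pick one translation per query word, maximizing total pairwise cohesion.

    query      -- list of source-language words
    dictionary -- maps each source word to its candidate translations
    cohesion   -- maps frozenset({t1, t2}) to a score (assumed, e.g. from co-occurrence)
    """
    candidate_lists = [dictionary[word] for word in query]
    best_choice, best_score = None, float("-inf")
    # Exhaustive search: feasible only for short queries.
    for choice in product(*candidate_lists):
        score = sum(
            cohesion.get(frozenset(pair), 0.0)
            for pair in combinations(choice, 2)
        )
        if score > best_score:
            best_choice, best_score = choice, score
    return dict(zip(query, best_choice))


# Toy data, invented for illustration only (not from the paper).
toy_dictionary = {"bank": ["银行", "河岸"], "interest": ["利息", "兴趣"]}
toy_cohesion = {
    frozenset(("银行", "利息")): 0.9,
    frozenset(("河岸", "兴趣")): 0.1,
}

result = select_translations(["bank", "interest"], toy_dictionary, toy_cohesion)
print(result)
```

With these toy scores, the financial senses of "bank" and "interest" are chosen together because they are the most cohesive pair. Exhaustive search grows exponentially with query length, so a practical system would likely need an approximation.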
