Authors: Tomáš Brychcín, Miloslav Konopík
DOI: 10.1016/j.csl.2013.05.001
Keywords:
Abstract: Language models are crucial for many tasks in NLP (Natural Language Processing), and n-grams are the best way to build them. Huge effort is being invested in improving n-gram language models. By introducing external information (morphology, syntax, partitioning into documents, etc.) into the models, a significant improvement can be achieved. The models can, however, also be improved with no external information, and smoothing is an excellent example of such an improvement. In this article we show another improvement that also requires no external information. We examine patterns found in large corpora by building semantic spaces (HAL, COALS, BEAGLE and others described in this article). These semantic spaces have never been tested in language modeling before. Our method uses clustering to build classes for a class-based language model. The class-based model is then coupled with a standard n-gram model to create a very effective language model. Our experiments show that our models reduce perplexity and improve accuracy with no external information added. Training of our models is fully unsupervised. Our methods are also tested on highly inflectional languages, which are particularly hard to model, and results are provided for five different settings of the number of classes. The tests are accompanied by machine translation experiments that prove the ability of the proposed models to improve the performance of a real-world application.
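The abstract outlines a concrete pipeline: cluster words by their semantic-space vectors (HAL, COALS, BEAGLE), use the clusters as classes in a class-based language model, and couple that model with a standard n-gram model. Below is a minimal Python sketch of that coupling, assuming the common class-based bigram formulation P(w_i | w_{i-1}) = P(c(w_i) | c(w_{i-1})) * P(w_i | c(w_i)) and linear interpolation with the n-gram model; it is an illustration, not the authors' implementation, and all identifiers (train_class_bigram, word2class, lam, ngram_prob) are hypothetical.

from collections import Counter

def train_class_bigram(tokens, word2class):
    # Counts for a class-based bigram model:
    #   class_bigram[(c_prev, c_curr)] -- class-to-class transitions
    #   class_count[c]                 -- total occurrences of each class
    #   word_count[w]                  -- total occurrences of each word
    class_bigram, class_count, word_count = Counter(), Counter(), Counter()
    for prev, curr in zip(tokens, tokens[1:]):
        class_bigram[(word2class[prev], word2class[curr])] += 1
    for w in tokens:
        class_count[word2class[w]] += 1
        word_count[w] += 1
    return class_bigram, class_count, word_count

def class_prob(w_prev, w, word2class, class_bigram, class_count, word_count):
    # P(w | w_prev) = P(c(w) | c(w_prev)) * P(w | c(w))
    c_prev, c = word2class[w_prev], word2class[w]
    p_cc = class_bigram[(c_prev, c)] / max(class_count[c_prev], 1)
    p_wc = word_count[w] / max(class_count[c], 1)
    return p_cc * p_wc

def interpolated_prob(w_prev, w, lam, ngram_prob, class_model_prob):
    # Couple the class model with a standard n-gram model by linear
    # interpolation: lam * P_ngram + (1 - lam) * P_class.
    return lam * ngram_prob(w_prev, w) + (1.0 - lam) * class_model_prob(w_prev, w)

# Toy usage: in the paper's setting word2class would come from clustering
# semantic-space vectors; here the class assignment is hand-made.
tokens = "the cat sat on the mat and a dog sat on a rug".split()
word2class = {"the": 0, "a": 0, "cat": 1, "mat": 1, "dog": 1, "rug": 1,
              "sat": 2, "on": 3, "and": 4}
counts = train_class_bigram(tokens, word2class)
p_mix = interpolated_prob("the", "dog", 0.7,
                          lambda a, b: 0.0,  # stand-in for a smoothed n-gram model
                          lambda a, b: class_prob(a, b, word2class, *counts))

Even though the word bigram "the dog" never occurs in the toy corpus, the class model assigns it a nonzero probability (here 0.25) because the class transition from the determiner-like class to the noun-like class is frequent; this generalization over unseen n-grams is what lets the interpolated model reduce perplexity without any external information.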