Authors: Tomáš Brychcín, Miloslav Konopík
DOI: 10.1016/j.csl.2013.05.001
Keywords:
Abstract: Language models are crucial for many tasks in NLP (Natural Language Processing), and n-grams are the best way to build them. Huge effort is being invested in improving n-gram language models. By introducing external information (morphology, syntax, partitioning into documents, etc.) into the models, a significant improvement can be achieved. The models can, however, also be improved with no external information, and smoothing is an excellent example of such an improvement. In this article we show another improvement that also requires no external information. We examine patterns found in large corpora by building semantic spaces (HAL, COALS, BEAGLE and others described in this article). These semantic spaces have never been tested in language modeling before. Our method uses clustering to build classes for a class-based language model. The class-based model is then coupled with a standard n-gram model to create a very effective language model. Our experiments show that our models reduce perplexity and improve accuracy with no external information added. Training of our models is fully unsupervised. Our methods are also tested on highly inflectional languages, which are particularly hard to model, and results are provided for five different settings of the number of classes. The tests are accompanied by machine translation experiments that prove the ability of the proposed models to improve the performance of a real-world application.
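The abstract outlines a concrete pipeline: cluster words by their semantic-space vectors (HAL, COALS, BEAGLE), use the clusters as classes in a class-based language model, and couple that model with a standard n-gram model. Below is a minimal Python sketch of that coupling, assuming the common class-based bigram formulation P(w_i | w_{i-1}) = P(c(w_i) | c(w_{i-1})) * P(w_i | c(w_i)) and linear interpolation with the n-gram model; it is an illustration, not the authors' implementation, and all identifiers (train_class_bigram, word2class, lam, ngram_prob) are hypothetical.

from collections import Counter

def train_class_bigram(tokens, word2class):
    # Counts for a class-based bigram model:
    #   class_bigram[(c_prev, c_curr)] -- class-to-class transitions
    #   class_count[c]                 -- total occurrences of each class
    #   word_count[w]                  -- total occurrences of each word
    class_bigram, class_count, word_count = Counter(), Counter(), Counter()
    for prev, curr in zip(tokens, tokens[1:]):
        class_bigram[(word2class[prev], word2class[curr])] += 1
    for w in tokens:
        class_count[word2class[w]] += 1
        word_count[w] += 1
    return class_bigram, class_count, word_count

def class_prob(w_prev, w, word2class, class_bigram, class_count, word_count):
    # P(w | w_prev) = P(c(w) | c(w_prev)) * P(w | c(w))
    c_prev, c = word2class[w_prev], word2class[w]
    p_cc = class_bigram[(c_prev, c)] / max(class_count[c_prev], 1)
    p_wc = word_count[w] / max(class_count[c], 1)
    return p_cc * p_wc

def interpolated_prob(w_prev, w, lam, ngram_prob, class_model_prob):
    # Couple the class model with a standard n-gram model by linear
    # interpolation: lam * P_ngram + (1 - lam) * P_class.
    return lam * ngram_prob(w_prev, w) + (1.0 - lam) * class_model_prob(w_prev, w)

# Toy usage: in the paper's setting word2class would come from clustering
# semantic-space vectors; here the class assignment is hand-made.
tokens = "the cat sat on the mat and a dog sat on a rug".split()
word2class = {"the": 0, "a": 0, "cat": 1, "mat": 1, "dog": 1, "rug": 1,
              "sat": 2, "on": 3, "and": 4}
counts = train_class_bigram(tokens, word2class)
p_mix = interpolated_prob("the", "dog", 0.7,
                          lambda a, b: 0.0,  # stand-in for a smoothed n-gram model
                          lambda a, b: class_prob(a, b, word2class, *counts))

Even though the word bigram "the dog" never occurs in the toy corpus, the class model assigns it a nonzero probability (here 0.25) because the class transition from the determiner-like class to the noun-like class is frequent; this generalization over unseen n-grams is what lets the interpolated model reduce perplexity without any external information.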