Method and apparatus for creating a language model and kana-kanji conversion

Authors: Yoshiharu Sato, Miyuki Seki, Maeda Rie

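Abstract: A method for creating a language model capable of preventing the deterioration in quality caused by the conventional back-off to unigram. Parts-of-speech with the same display and reading are obtained from a storage device (206). A cluster (204) is created by combining the obtained parts-of-speech, and the created cluster is stored in the storage device (206). In addition, when an instruction (214) for dividing a cluster is inputted, the cluster stored in the storage device (206) is divided (210) in accordance with the inputted instruction (212). Two of the stored clusters are combined (218), and the probability of occurrence of the combined clusters in a text corpus is calculated (222). The combined clusters are associated with a bigram indicating the calculated probability and stored in the storage device.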
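
The abstract compresses the procedure into a few clauses, so a small illustration may help. The Python sketch that follows is one possible reading of it, not the patented implementation. It assumes a hypothetical token format of (display, reading, part-of-speech), represents a cluster as a display/reading pair plus a set of part-of-speech tags, and estimates the cluster bigram as a plain maximum-likelihood conditional probability P(c2 | c1) with no smoothing or back-off. The record does not state the actual estimator or data structures, and the function names build_clusters, divide_cluster, find_cluster and cluster_bigrams are invented for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical token format: (display, reading, part-of-speech).
Token = Tuple[str, str, str]
# Hypothetical cluster format: (display, reading, set of POS tags).
Cluster = Tuple[str, str, FrozenSet[str]]
ClusterMap = Dict[Tuple[str, str], List[Cluster]]


def build_clusters(lexicon: Iterable[Token]) -> ClusterMap:
    """Combine every part-of-speech that shares a display and reading."""
    tags: Dict[Tuple[str, str], set] = defaultdict(set)
    for display, reading, pos in lexicon:
        tags[(display, reading)].add(pos)
    return {(d, r): [(d, r, frozenset(t))] for (d, r), t in tags.items()}


def divide_cluster(clusters: ClusterMap, key: Tuple[str, str],
                   part: Iterable[str]) -> None:
    """Split the clusters for `key` so the tags in `part` form their own cluster."""
    part_set = frozenset(part)
    divided: List[Cluster] = []
    for display, reading, pos_set in clusters[key]:
        moved = pos_set & part_set
        rest = pos_set - moved
        # Keep only non-empty halves, so an unaffected cluster stays whole.
        divided.extend((display, reading, s) for s in (moved, rest) if s)
    clusters[key] = divided


def find_cluster(clusters: ClusterMap, token: Token) -> Cluster:
    """Return the cluster that contains the token's part-of-speech."""
    display, reading, pos = token
    for cluster in clusters[(display, reading)]:
        if pos in cluster[2]:
            return cluster
    raise KeyError(token)


def cluster_bigrams(clusters: ClusterMap,
                    corpus: List[List[Token]]) -> Dict[Tuple[Cluster, Cluster], float]:
    """Maximum-likelihood P(c2 | c1), normalized over successors seen after c1."""
    pair_counts: Counter = Counter()
    history_counts: Counter = Counter()
    for sentence in corpus:
        seq = [find_cluster(clusters, tok) for tok in sentence]
        for c1, c2 in zip(seq, seq[1:]):
            pair_counts[(c1, c2)] += 1
            history_counts[c1] += 1
    return {pair: n / history_counts[pair[0]] for pair, n in pair_counts.items()}


if __name__ == "__main__":
    # Toy data: 今日 (きょう) appears both as a noun and as an adverb.
    lexicon = [("今日", "きょう", "noun"), ("今日", "きょう", "adverb"),
               ("は", "は", "particle"), ("晴れ", "はれ", "noun")]
    corpus = [[("今日", "きょう", "noun"), ("は", "は", "particle"),
               ("晴れ", "はれ", "noun")],
              [("今日", "きょう", "adverb"), ("晴れ", "はれ", "noun")]]
    clusters = build_clusters(lexicon)
    print(cluster_bigrams(clusters, corpus))
    # An instruction to divide the cluster, separating the adverb reading.
    divide_cluster(clusters, ("今日", "きょう"), ["adverb"])
    print(cluster_bigrams(clusters, corpus))
```

In this sketch the noun and adverb uses of 今日 first share one cluster, so their bigram counts are pooled; after the division each keeps its own statistics. Pooling entries that share a display and reading is presumably what lets rare parts-of-speech avoid the unigram back-off the abstract mentions, but that motivation is inferred from the abstract rather than demonstrated here.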

References (20)
Shigeki Umeda, Masayuki Morohasi. "Japanese language sentence dividing method and apparatus." 1989.
Yun-Cheng Ju, Hsiao-Wuen Hon. "Multi-modal entry of ideogrammatic languages." 2002.
Masahiro Wada, Shigeki Kuga, Taro Morishita, Hiroyuki Kanza. "System for registering new words by using linguistically comparable reference words." 1990.
Christopher H. Pratley, Kentaro Urata, Erik J. Rucker, David C. Oliver. "Method for converting a phonetic character string into the text of an Asian language." 1998.
Jianfeng Gao, Hisami Suzuki. "Long distance dependency in language modeling: an empirical study." International Joint Conference on Natural Language Processing, pp. 396-405, 2004. DOI: 10.1007/978-3-540-30211-7_42
Lucian Galescu, Eric K. Ringger. "Augmented-word language model." 2002.
J.W. Miller, F. Alleva. "Evaluation of a language model using a clustered model backoff." International Conference on Spoken Language Processing, vol. 1, pp. 390-393, 1996. DOI: 10.1109/ICSLP.1996.607136
Jianfeng Gao, Hisami Suzuki, Yang Wen. "Exploiting Headword Dependency and Predictive Clustering for Language Modeling." Empirical Methods in Natural Language Processing, pp. 248-256, 2002. DOI: 10.3115/1118693.1118725