Method and system of selecting word sequence for text written in language without word boundary markers

Author: Neng Dai

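Abstract: The present disclosure discloses a method and apparatus of selecting a word sequence for text written in a language without word boundary markers, in order to solve the problem of excessively large computation load when selecting an optimal word sequence in existing technologies. The method disclosed includes: segmenting a segment of the text to obtain different word sequences; determining a common word boundary for the word sequences; and performing optimal word sequence selection for the portions of the word sequences prior to the common word boundary. Because the selection is performed on the portions prior to the common word boundary, shorter independent units can be obtained, thus reducing the computation load of word segmentation.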
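The idea in the abstract can be illustrated with a minimal Python sketch, under stated assumptions. The toy lexicon, its probabilities, the unigram log-probability score, and every function name below are hypothetical. Common word boundaries are found here with a forward/backward reachability pass over the word lattice, which is one possible way to compute them and not necessarily the patented procedure. Once the common boundaries are known, candidate readings are compared only inside each chunk between two consecutive boundaries, never across the whole text.

```python
from math import log

# Hypothetical toy lexicon: word -> unigram probability (illustrative values only).
LEXICON = {
    "研究": 0.02, "研究生": 0.005, "生命": 0.01, "命": 0.001,
    "起源": 0.004, "起": 0.002, "源": 0.001,
}

def word_edges(text, lexicon):
    """All (start, end) character spans of text that are dictionary words."""
    n = len(text)
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)
            if text[i:j] in lexicon]

def common_boundaries(text, lexicon):
    """Offsets where every complete segmentation places a word boundary.

    Returns None if the text cannot be fully segmented with the lexicon.
    """
    n = len(text)
    edges = word_edges(text, lexicon)
    fwd = [False] * (n + 1)          # fwd[p]: text[:p] can be fully segmented
    fwd[0] = True
    for i, j in sorted(edges):       # edges ending at i are processed before those starting at i
        if fwd[i]:
            fwd[j] = True
    bwd = [False] * (n + 1)          # bwd[p]: text[p:] can be fully segmented
    bwd[n] = True
    for i, j in sorted(edges, key=lambda e: e[1], reverse=True):
        if bwd[j]:
            bwd[i] = True
    if not fwd[n]:
        return None
    # A word lies on some complete segmentation iff it is reachable from both ends.
    live = [(i, j) for i, j in edges if fwd[i] and bwd[j]]
    # An offset is a common boundary iff no such word spans across it.
    crossed = {p for i, j in live for p in range(i + 1, j)}
    return [p for p in range(1, n + 1) if p not in crossed]

def segmentations(text, lexicon):
    """Enumerate every split of text into dictionary words (exponential in general)."""
    if not text:
        return [[]]
    return [[text[:k]] + rest
            for k in range(1, len(text) + 1) if text[:k] in lexicon
            for rest in segmentations(text[k:], lexicon)]

def log_prob(words, lexicon):
    """Assumed scoring model: sum of unigram log-probabilities."""
    return sum(log(lexicon[w]) for w in words)

def select_word_sequence(text, lexicon):
    """Choose the best reading independently within each chunk between common boundaries."""
    cuts = common_boundaries(text, lexicon)
    if cuts is None:
        return None
    best, start = [], 0
    for end in cuts:
        chunk = text[start:end]
        # Candidates are compared only inside this shorter, independent unit.
        best += max(segmentations(chunk, lexicon),
                    key=lambda ws: log_prob(ws, lexicon))
        start = end
    return best

if __name__ == "__main__":
    print(common_boundaries("研究生命起源", LEXICON))     # [4, 6]
    print(select_word_sequence("研究生命起源", LEXICON))  # ['研究', '生命', '起源']
```

In this example the text has four complete segmentations, but every one of them has a boundary after 生命 (offset 4), so 研究生命 and 起源 are resolved separately. With k independent ambiguous regions of m readings each, comparing whole sequences scores m^k candidates, while chunk-wise selection scores only about m·k.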