A character-net based Chinese text segmentation method

Authors: Lixin Zhou, Qun Liu

DOI: 10.3115/1118735.1118752

Keywords: Computer science, Key (cryptography), Ambiguity, Text segmentation, Character (computing), Pattern recognition, Artificial intelligence, Segmentation, Identification (information)

Abstract: The segmentation of Chinese texts is a key process in information processing. The main difficulties are ambiguous character strings and unknown words. To obtain a correct result, the first step is to identify all possible candidate words in a text. In this paper, a data structure, the Chinese-character-net, is put forward; then, based on the character-net, a new algorithm is presented for obtaining the candidate words. The paper gives experimental results. Finally, the characteristics of the algorithm are analysed.
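The abstract does not specify how the character-net is built, so the following is only a minimal sketch of the candidate-identification step it describes, using an ordinary dictionary trie as a stand-in. The lexicon, the `END` marker, and the function names `build_trie` and `candidate_words` are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon; a real system would load a full dictionary.
LEXICON = ["研究", "研究生", "生命", "命", "起源", "的"]

# End-of-word marker. The empty string can never equal a single character
# of the input text, so it cannot collide with a trie edge.
END = ""


def build_trie(words: List[str]) -> Dict:
    """Build a character trie in which each edge is one character."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def candidate_words(text: str, trie: Dict) -> List[Tuple[int, int, str]]:
    """Return every (start, end, word) span of text that is a lexicon word."""
    spans = []
    for start in range(len(text)):
        node = trie
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break  # no lexicon word continues with this character
            if END in node:
                spans.append((start, end + 1, text[start:end + 1]))
    return spans


trie = build_trie(LEXICON)
spans = candidate_words("研究生命的起源", trie)
for span in spans:
    print(span)
```

The overlapping spans for 研究, 研究生, 生命 and 命 are the kind of ambiguous character string the abstract mentions: a later disambiguation step has to choose between 研究/生命 and 研究生/命. A plain dictionary lookup like this one finds only words in the lexicon, so it does not address unknown words.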

References (3)
Richard Sproat, Chilin Shih, William Gale, Nancy Chang. A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics, vol. 22, pp. 377-404 (1996)
Jin Guo. Critical tokenization and its properties. Computational Linguistics, vol. 23, pp. 569-596 (1997)
Zhou Lixin. Research of segmentation of Chinese texts in Chinese search engine. Systems, Man and Cybernetics, vol. 4, pp. 2627-2631 (2001). DOI: 10.1109/ICSMC.2001.972960