ACTS: An automatic Chinese text segmentation system for full text retrieval

Authors: Zimin Wu, Gwyneth Tseng

DOI: 10.1002/(SICI)1097-4571(199503)46:2<83::AID-ASI2>3.0.CO;2-0

Abstract: Text segmentation is a prerequisite for text retrieval systems. Chinese texts cannot be readily segmented into words because they do not contain word boundaries. ACTS is an automatic Chinese text segmentation prototype for Chinese full text retrieval. It applies partial syntactic analysis, that is, the analysis of morphemes, words, and phrases. The idea was originally largely inspired by experiments on English morpheme and phrase-analysis-based text retrieval, which are particularly germane to Chinese, because neither Chinese nor English texts have morpheme and phrase boundaries. ACTS is built on the hypothesis that Chinese words and phrases exceeding two characters can be characterized by a grammar that describes the concatenation behavior of the morphological and syntactic categories of their formatives. This is examined through three procedures: (1) Segmentation: texts are divided into one and two character segments by matching against a dictionary; (2) Category disambiguation: the syntactic categories of segments are determined according to context; (3) Parsing: the segments are analyzed based on the grammar, and subsequently combined into compound and complex words for indexing and retrieval. The experimental results, based on a small sample of 30 texts, show that most significant words and phrases in these texts can be extracted with a high degree of accuracy. © 1995 John Wiley & Sons, Inc.
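
The segmentation procedure (1) can be illustrated with a minimal sketch. The abstract does not state which matching strategy ACTS uses, so the left-to-right greedy lookup below is an assumption, and the function name `segment` and the toy dictionary are hypothetical, chosen only for illustration.

```python
def segment(text, dictionary):
    """Split text into one- and two-character segments by dictionary lookup.

    Assumption: a simple left-to-right greedy match. The abstract does not
    say which matching strategy ACTS actually uses.
    """
    segments = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in dictionary:
            # A two-character dictionary entry starts here: keep it whole.
            segments.append(pair)
            i += 2
        else:
            # No two-character entry: fall back to a single character.
            segments.append(text[i])
            i += 1
    return segments


# Hypothetical toy dictionary of two-character words.
toy_dictionary = {"中文", "文本", "检索"}
print(segment("中文文本检索", toy_dictionary))
```

In this sketch, characters not covered by a two-character dictionary entry stay as single-character segments, which is where the later disambiguation and parsing procedures described in the abstract would take over.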
