Author: J. Ponte
DOI:
Keywords: Task (project management), Delimiter, Search engine indexing, Automaton, Segmentation, Information retrieval, Text segmentation, Computer science, Natural language processing, Human–computer information retrieval, Visual Word, Artificial intelligence
Abstract: Many languages, such as Chinese, are written without interword delimiters. For these languages, a segmenter is a required pre-processing step for information retrieval systems. We describe USeg, a platform for word segmentation designed to fulfill the requirements imposed by this task. USeg is based on an underlying probabilistic automaton which serves as a simple language model. A description of the proposed model(s), the implementation issues of the models, and experimental results are presented. The experiments show that a fairly simple model can produce reasonable results, do so quickly enough to be useful for indexing in an information retrieval system, and be re-targeted to new languages without a great deal of human effort.
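
The abstract does not specify the structure of USeg's automaton, so the following is only a minimal sketch of the general idea of segmenting with a simple language model. It assumes a much simpler stand-in: a unigram word model decoded by dynamic programming. The function `segment`, its parameters `max_word_len` and `unknown_prob`, and the lexicon `toy_probs` with its probabilities are all hypothetical and invented for illustration; none of them come from the paper.

```python
import math


def segment(text, word_probs, max_word_len=4, unknown_prob=1e-8):
    """Return the most probable split of `text` under a unigram word model.

    Illustrative stand-in only, not USeg's actual model.
    `word_probs` maps known words to probabilities. Any single character
    missing from the lexicon is accepted with probability `unknown_prob`,
    so every input has at least one valid segmentation.
    """
    n = len(text)
    # best[i] = (log-probability of the best split of text[:i],
    #            start index of the last word in that split)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_probs:
                prob = word_probs[word]
            elif len(word) == 1:
                prob = unknown_prob  # fallback for unseen characters
            else:
                continue  # unknown multi-character strings are not words
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, start)

    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        start = best[i][1]
        words.append(text[start:i])
        i = start
    return words[::-1]


# Hypothetical toy lexicon; the probabilities are made up for the example.
toy_probs = {"信息": 0.02, "检索": 0.02, "系统": 0.02, "信": 0.001, "息": 0.001}

print(segment("信息检索系统", toy_probs))  # ['信息', '检索', '系统']
```

Because the only language-specific input in this sketch is the lexicon, retargeting it to another language would amount to supplying a different `word_probs` table, which is one plausible reading of the low-effort retargeting the abstract describes.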