USeg: A Retargetable Word Segmentation Procedure for Information Retrieval

Author: J. Ponte

DOI:

Keywords: Task (project management), Delimiter, Search engine indexing, Automaton, Segmentation, Information retrieval, Text segmentation, Computer science, Natural language processing, Human–computer information retrieval, Visual Word, Artificial intelligence

Abstract: Many languages, such as Chinese, are written without interword delimiters. For these languages, a segmenter is a required pre-processing step for information retrieval systems. We describe USeg, a platform for word segmentation designed to fulfill the requirements imposed by the task. USeg is based on an underlying probabilistic automaton which serves as a simple language model. A description of the proposed model(s), implementation issues of the models, and experimental results are presented. The experiments show that a fairly simple model can produce reasonable results, can do so quickly enough to be useful for indexing in an information retrieval system, and can be re-targeted to new languages without a great deal of human effort.
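To make the idea of segmenting with a simple probabilistic language model concrete, the following is a minimal sketch and not USeg's actual model, which the abstract does not specify. It assumes a unigram word model (a hypothetical `word_probs` dictionary of word probabilities) and finds the most probable split by dynamic programming. The function name `segment`, the parameters `max_len` and `unknown_prob`, and the toy probabilities are all illustrative assumptions.

```python
import math


def segment(text, word_probs, max_len=4, unknown_prob=1e-8):
    """Return the most probable split of `text` under a unigram word model.

    Assumption: `word_probs` maps known words to probabilities. Any single
    character not in the dictionary is allowed as a word with `unknown_prob`,
    so every input can be segmented.
    """
    n = len(text)
    # best[i] = (log-probability, start index of the last word) for text[:i]
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in word_probs:
                p = word_probs[piece]
            elif len(piece) == 1:
                p = unknown_prob  # fallback for unseen single characters
            else:
                continue  # unseen multi-character strings are not words
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Toy probabilities, invented purely for illustration.
toy_probs = {"信息": 0.3, "检索": 0.3, "系统": 0.2, "信": 0.05, "息": 0.05}
print(segment("信息检索系统", toy_probs))
```

Under this sketch, retargeting to another language would amount to supplying a different `word_probs` table; whether USeg is retargeted in the same way is not stated in the abstract.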
