An Unsupervised Algorithm for Segmenting Categorical Timeseries into Episodes

Authors: Paul Cohen, Brent Heeringa, Niall M. Adams

DOI: 10.1007/3-540-45728-3_5

Abstract: This paper describes an unsupervised algorithm for segmenting categorical time series into episodes. The Voting-Experts algorithm first collects statistics about the frequency and boundary entropy of ngrams, then passes a window over the series and has two "expert methods" decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in four languages. It also segments robot sensor data into subsequences that represent episodes in the life of the robot. We claim that Voting-Experts finds meaningful episodes in categorical time series because it exploits statistical characteristics of meaningful episodes.
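The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the z-score standardization within each ngram length, the default `window` and `threshold` values, and the local-maximum rule for placing cuts are all assumptions made for illustration; the abstract states only that ngram frequency and boundary entropy are collected and that two experts vote inside a sliding window.

```python
import math
from collections import Counter, defaultdict


def ngram_stats(seq, max_len):
    """Count every ngram up to max_len and the symbols that follow it."""
    counts = Counter()
    followers = defaultdict(Counter)
    for i in range(len(seq)):
        for n in range(1, max_len + 1):
            if i + n > len(seq):
                break
            gram = seq[i:i + n]
            counts[gram] += 1
            if i + n < len(seq):
                followers[gram][seq[i + n]] += 1
    return counts, followers


def boundary_entropy(gram, followers):
    """Shannon entropy (bits) of the symbol that follows `gram`."""
    nxt = followers.get(gram)
    if not nxt:
        return 0.0
    total = sum(nxt.values())
    return -sum(c / total * math.log2(c / total) for c in nxt.values())


def standardize(values):
    """Z-score each value against ngrams of the same length (assumed step)."""
    by_len = defaultdict(list)
    for gram, v in values.items():
        by_len[len(gram)].append(v)
    stats = {}
    for n, vals in by_len.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats[n] = (mean, math.sqrt(var) or 1.0)  # avoid dividing by zero
    return {g: (v - stats[len(g)][0]) / stats[len(g)][1]
            for g, v in values.items()}


def vote_boundaries(seq, window=3, threshold=1):
    """Return cut positions k, meaning a boundary just before seq[k]."""
    counts, followers = ngram_stats(seq, window)
    freq = standardize({g: float(c) for g, c in counts.items()})
    ent = standardize({g: boundary_entropy(g, followers) for g in counts})
    votes = [0] * (len(seq) + 1)
    for start in range(len(seq) - window + 1):
        win = seq[start:start + window]
        splits = range(1, window)  # both parts of the window stay non-empty
        # Frequency expert: prefer the split whose two parts are both common.
        best_f = max(splits, key=lambda k: freq[win[:k]] + freq[win[k:]])
        # Entropy expert: prefer the split after the least predictable prefix.
        best_e = max(splits, key=lambda k: ent[win[:k]])
        votes[start + best_f] += 1
        votes[start + best_e] += 1
    # Assumed rule: cut where votes reach the threshold and peak locally.
    return [k for k in range(1, len(seq))
            if votes[k] >= threshold
            and votes[k] > votes[k - 1] and votes[k] >= votes[k + 1]]


def segment(seq, window=3, threshold=1):
    """Split seq into episodes at the voted boundaries."""
    edges = [0] + vote_boundaries(seq, window, threshold) + [len(seq)]
    return [seq[a:b] for a, b in zip(edges, edges[1:])]
```

A call such as `segment("thedogsawthecatthecatsawthedog", window=4, threshold=2)` returns a list of substrings that concatenate back to the input. On a sequence this short the statistics are too sparse for the cuts to be expected to match word boundaries.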

References (11)
Minos N. Garofalakis, Kyuseok Shim, Rajeev Rastogi. SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. Very Large Data Bases, pp. 223-234 (1999).
Craig G. Nevill-Manning, Ian H. Witten. Compression and Explanation using Hierarchical Grammars. The Computer Journal, vol. 40, pp. 103-116 (1997). DOI: 10.1093/COMJNL/40.2_AND_3.103
C. G. Nevill-Manning, I. H. Witten. Identifying hierarchical structure in sequences: a linear-time algorithm. Journal of Artificial Intelligence Research, vol. 7, pp. 67-82 (1997). DOI: 10.1613/JAIR.374
David M. Magerman, Mitchell P. Marcus. Parsing a natural language using mutual information statistics. National Conference on Artificial Intelligence, pp. 984-989 (1990).
U. M. Fayyad. Data mining and knowledge discovery: making sense out of data. IEEE Intelligent Systems, vol. 11, pp. 20-25 (1996). DOI: 10.1109/64.539013
Gary M. Weiss, Haym Hirsh. Learning to predict rare events in event sequences. Knowledge Discovery and Data Mining, pp. 359-363 (1998).
Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo. Discovery of Frequent Episodes in Event Sequences. Data Mining and Knowledge Discovery, vol. 1, pp. 259-289 (1997). DOI: 10.1023/A:1009748302351
W. J. Teahan, Yingying Wen, Rodger McNab, Ian H. Witten. A compression-based algorithm for Chinese word segmentation. Computational Linguistics, vol. 26, pp. 375-393 (2000). DOI: 10.1162/089120100561746
Lillian Lee, Rie Kubota Ando. Mostly-unsupervised statistical segmentation of Japanese: applications to kanji. North American Chapter of the Association for Computational Linguistics, pp. 241-248 (2000).