Statistical significance of episodes with general partial orders

Authors: Avinash Achar, P.S. Sastry

DOI: 10.1016/J.INS.2014.09.063

Keywords: Class (philosophy), Node (circuits), Statistics, Partially ordered set, Statistical hypothesis testing, Event type, Mathematics, Injective function, Statistical significance, Pruning (decision trees)

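Abstract: Frequent episode discovery is one of the methods used for temporal pattern discovery in sequential data. An episode is a partially ordered set of nodes with each node associated with an event type. For more than a decade, algorithms existed for episode discovery only when the associated partial order is total (serial episode) or trivial (parallel episode). Recently, the literature has seen algorithms for discovering episodes with general partial orders. In frequent pattern mining, the threshold beyond which a pattern is inferred to be interesting is typically user-defined and arbitrary. One way of addressing this issue in the pattern mining literature has been based on the framework of statistical hypothesis testing. This paper presents a method of assessing the statistical significance of episode patterns with general partial orders. A method is proposed to calculate thresholds, on the non-overlapped frequency, beyond which an episode pattern would be inferred to be statistically significant. The method is first explained for the case of injective episodes with general partial orders. An injective episode is one where event-types are not allowed to repeat. Later it is pointed out how the method can be extended to the class of all episodes. The significance threshold calculations for general partial order episodes proposed here also generalize the existing significance results for serial episodes. Through simulation studies, the usefulness of these statistical thresholds in pruning uninteresting patterns is illustrated.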
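
To make the notion of non-overlapped frequency concrete, the following is a minimal Python sketch for the simplest case only: an injective serial episode (a total order with no repeated event types). It is not the paper's algorithm for general partial orders, and it does not compute the significance thresholds. The function name `count_non_overlapped` and the sample sequence are hypothetical, and events are assumed to arrive in strict time order.

```python
def count_non_overlapped(events, episode):
    """Count non-overlapped occurrences of a serial episode.

    events  : event types in time order (assumed: one event per time tick)
    episode : event types in the required order (assumed: injective,
              i.e. no event type repeats within the episode)
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")

    count = 0
    position = 0  # index of the next event type the automaton waits for
    for event_type in events:
        if event_type == episode[position]:
            position += 1
            if position == len(episode):
                # One full occurrence completed; start over so the next
                # occurrence cannot overlap with this one.
                count += 1
                position = 0
    return count


# Hypothetical event stream and episode A -> B -> C.
sample_events = list("ABCABXCAB")
frequency = count_non_overlapped(sample_events, ["A", "B", "C"])
print(frequency)
```

A frequency computed this way would then be compared with a threshold; in the setting the abstract describes, that threshold comes from a statistical hypothesis test rather than from an arbitrary user choice.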
