Minimum description length principle: generators are preferable to closed patterns

作者: Jian Pei , Guozhu Dong , Jinyan Li , Limsoon Wong , Haiquan Li

DOI:

关键词: Minimum description lengthSet (abstract data type)AlgorithmEquivalence classCardinalityContrast (statistics)Inductive reasoningOrder (group theory)Computer scienceGenerator (mathematics)

摘要: The generators and the unique closed pattern of an equivalence class itemsets share a common set transactions. are minimal ones among equivalent itemsets, while is maximum one. As generator usually smaller than in cardinality, by Minimum Description Length Principle, preferable to inductive inference classification. To efficiently discover frequent from large dataset, we develop depth-first algorithm called Gr-growth. idea novel contrast traditional breadth-first bottom-up generator-mining algorithms. Our extensive performance study shows that Gr-growth significantly faster (an order or even two orders magnitudes when support thresholds low) existing mining It can be also state-of-the-art itemset algorithms such as FPclose CLOSET+.

参考文章(18)
Mohammed Javeed Zaki, Ching-Jiu Hsiao, CHARM : An Efficient Algorithm for Closed Itemset Mining siam international conference on data mining. pp. 457- 473 ,(2002)
Peter D. Grünwald, In Jae Myung, Mark A. Pitt, Advances in Minimum Description Length: Theory and Applications MIT Press. ,(2005)
Viet Phan Luong, The Closed Keys Base of Frequent Itemsets data warehousing and knowledge discovery. pp. 181- 190 ,(2002) , 10.1007/3-540-46145-0_18
Nicolas Pasquier, Yves Bastide, Rafik Taouil, Lotfi Lakhal, Discovering Frequent Closed Itemsets for Association Rules international conference on database theory. ,vol. 1540, pp. 398- 416 ,(1999) , 10.1007/3-540-49257-7_25
Marzena Kryszkiewicz, Henryk Rybiński, Marcin Gajek, Dataless Transitions Between Concise Representations of Frequent Patterns intelligent information systems. ,vol. 22, pp. 41- 70 ,(2004) , 10.1023/A:1025828729955
J. Rissanen, Paper: Modeling by shortest data description Automatica. ,vol. 14, pp. 465- 471 ,(1978) , 10.1016/0005-1098(78)90005-5
Jiawei Han, Jian Pei, Yiwen Yin, Mining frequent patterns without candidate generation international conference on management of data. ,vol. 29, pp. 1- 12 ,(2000) , 10.1145/335191.335372
Feng Pan, Gao Cong, Anthony K. H. Tung, Jiong Yang, Mohammed J. Zaki, Carpenter: finding closed patterns in long biological datasets knowledge discovery and data mining. pp. 637- 642 ,(2003) , 10.1145/956750.956832
Yves Bastide, Rafik Taouil, Nicolas Pasquier, Gerd Stumme, Lotfi Lakhal, Mining frequent patterns with counting inference ACM SIGKDD Explorations Newsletter. ,vol. 2, pp. 66- 75 ,(2000) , 10.1145/380995.381017