Mining Association Rules of Simple Conjunctive Queries

作者: Heikki Mannila , Bart Goethals , Wim Le Page

DOI:

关键词:

摘要: The discovery of recurring patterns in databases is one the main topics data mining and many efficient solutions have been developed for relatively simple classes collections. Indeed, most frequent pattern or association rule algorithms work on so called transaction databases. Not only itemsets, but also more complex such as trees, graphs, arbitrary relational structures, consisting a set transactions are used. For example, tree case [2], every database contains tree, presented algorithm tries to find all subtrees occurring within transactions. these classes, specialized exist discover them efficiently. motivation works potentially high business value discovered [1]. Unfortunately, not suited be converted into transactional format even if this would possible, lot information implicitly encoded model lost after conversion. In talk we consider by combining pairs queries which could reveal interesting properties database. Intuitively, pose two that second query specific than first query. Then, number tuples output both almost same, discovery. To illustrate, well known Internet Movie Database containing possible about movies, actors everything related that, following queries: first, ask starred movie genre ‘drama’; then, ‘drama’, (possibly different) ‘comedy’. Now suppose answer consists 1000 actors, 900 actors. Obviously, answers do necessarily any significant insights themselves, when combined, it reveals starring ‘drama’ movies typically (with probability 90%) star ‘comedy’ movie. Of course, found preprocessing database, creating each actor genres he she appeared in. Similarly, like: 77% Ben Affleck, Matt Damon, posing asking Affleck Damon. Again, using methods, time, should differently preprocessed order pattern. Furthermore, impossible preprocess once way above they essentially counting different type example. general, looking Q1, Q2, Q1 asks satisfying certain condition Q2 those condition. When turns out size close learned actually satisfy condition, specified Q2. Clearly, findings given Towards goal, new class conjunctive over databases, define associations notion containment. We propose an completely novel algorithm, Conqueror, efficiently generating pruning search space queries. illustrate next kinds patterns, our able functional dependencies, inclusion their variants, very recently studied conditional turn useful cleaning purposes.

参考文章(16)
Heikki Mannila, A. Inkeri Verkamo, Hannu Toivonen, Efficient algorithms for discovering association rules knowledge discovery and data mining. pp. 181- 192 ,(1994)
Victor Vianu, Serge Abiteboul, Richard Hull, Foundations of databases ,(1994)
Akihiro Inokuchi, Takashi Washio, Hiroshi Motoda, An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data european conference on principles of data mining and knowledge discovery. pp. 13- 23 ,(2000) , 10.1007/3-540-45372-5_2
Ashok K. Chandra, Philip M. Merlin, Optimal implementation of conjunctive queries in relational data bases symposium on the theory of computing. pp. 77- 90 ,(1977) , 10.1145/800105.803397
Jyrki Kivinen, Heikki Mannila, Approximate inference of functional dependencies from relations international conference on database theory. ,vol. 149, pp. 129- 149 ,(1995) , 10.1016/0304-3975(95)00028-U
Bart Goethals, Jan Van den Bussche, Relational Association Rules: Getting Warmer Lecture Notes in Computer Science. ,vol. 2447, pp. 125- 139 ,(2002) , 10.1007/3-540-45728-3_10
Mohammed J. Zaki, Efficiently mining frequent trees in a forest Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02. pp. 71- 80 ,(2002) , 10.1145/775047.775058
Luc Dehaspe, Hannu Toivonen, Discovery of frequent DATALOG patterns Data Mining and Knowledge Discovery. ,vol. 3, pp. 7- 36 ,(1999) , 10.1023/A:1009863704807
Heikki Mannila, Hannu Toivonen, Levelwise Search and Borders of Theories in KnowledgeDiscovery Data Mining and Knowledge Discovery. ,vol. 1, pp. 241- 258 ,(1997) , 10.1023/A:1009796218281
M. Kuramochi, G. Karypis, Frequent subgraph discovery international conference on data mining. pp. 313- 320 ,(2001) , 10.1109/ICDM.2001.989534