Authors: Heikki Mannila, Bart Goethals, Wim Le Page
DOI:
Keywords:
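Abstract: The discovery of recurring patterns in databases is one of the main topics in data mining, and many efficient solutions have been developed for relatively simple classes of patterns and data collections. Indeed, most frequent pattern or association rule mining algorithms work on so-called transaction databases. Not only for itemsets, but also for more complex patterns such as trees, graphs, or arbitrary relational structures, databases consisting of a set of transactions are used. For example, in the tree case [2], every transaction in the database contains a tree, and the presented algorithm tries to find all subtrees occurring frequently within these transactions. For these pattern classes, specialized algorithms exist to discover them efficiently. The motivation for these works is the potentially high business value of the discovered patterns [1]. Unfortunately, many relational databases are not suited to be converted into a transactional format, and even if this would be possible, a lot of the information implicitly encoded in the relational model would be lost after conversion. In this talk we consider patterns obtained by combining pairs of queries which could reveal interesting properties of the database. Intuitively, we pose two queries such that the second query is more specific than the first query. Then, if the number of tuples in the output of both queries is almost the same, we have a potentially interesting discovery. To illustrate, consider the well-known Internet Movie Database, containing almost all possible information about movies, actors and everything related to that, and the following queries: first, we ask for all actors that have starred in a movie of the genre ‘drama’; then, we ask for all actors that have starred in a movie of the genre ‘drama’ and that have also starred in a (possibly different) movie of the genre ‘comedy’. Now suppose the answer to the first query consists of 1000 actors, and the answer to the second query consists of 900 actors. Obviously, these answers do not necessarily provide any significant insights by themselves, but when combined, they reveal that actors starring in ‘drama’ movies typically (with a probability of 90%) also star in a ‘comedy’ movie. Of course, this pattern could also have been found by first preprocessing the database, creating for each actor a transaction containing the set of genres of the movies he or she appeared in. Similarly, a pattern like "77% of the movies starring Ben Affleck also star Matt Damon" could be found by posing one query asking for all movies starring Ben Affleck and a second query asking for all movies starring both Ben Affleck and Matt Damon. Again, this pattern could also be found using frequent itemset mining methods, but this time the database should be preprocessed differently in order to find it. Furthermore, it is impossible to preprocess the database only once in such a way that both of the above patterns would be found, as they are essentially counting a different type of transaction in each example (actors in the first, movies in the second). In general, we are looking for pairs of queries Q1, Q2, such that Q1 asks for a set of tuples satisfying a certain condition and Q2 asks for those tuples that also satisfy a more specific condition.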
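When it turns out that the sizes of the outputs of both queries are close to each other, we have learned that the tuples satisfying the condition of Q1 actually also satisfy the more specific condition specified in Q2. Towards this goal, we consider a new class of conjunctive queries over relational databases, and define associations using the notion of query containment. We propose a completely novel algorithm, Conqueror, for efficiently generating and pruning the search space of these queries. We illustrate that, next to many different kinds of interesting patterns, our algorithm is also able to discover functional dependencies, inclusion dependencies, and their variants, such as the very recently studied conditional functional dependencies, which turn out to be useful for data cleaning purposes.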
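The query-pair idea described in the abstract can be sketched on a toy database. The following is a minimal illustration only, not the Conqueror algorithm: the table names `starred` and `genre`, the sample rows, and the helper `confidence` are all assumptions made for this example. It evaluates a general query Q1 and a more specific query Q2 with Python's standard `sqlite3` module and reports the ratio of their answer sizes.

```python
import sqlite3

def confidence(conn, q1, q2):
    """Return |Q2| / |Q1|: the fraction of Q1's answers that also satisfy Q2.

    Assumes Q2 is contained in Q1 (every answer of Q2 is an answer of Q1).
    """
    n1 = conn.execute(f"SELECT COUNT(*) FROM ({q1})").fetchone()[0]
    n2 = conn.execute(f"SELECT COUNT(*) FROM ({q2})").fetchone()[0]
    return n2 / n1 if n1 else 0.0

# Hypothetical toy schema and data (not taken from the Internet Movie Database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE starred (actor TEXT, movie TEXT)")
conn.execute("CREATE TABLE genre (movie TEXT, genre TEXT)")
conn.executemany("INSERT INTO genre VALUES (?, ?)", [
    ("m1", "drama"), ("m2", "comedy"), ("m3", "drama"), ("m4", "action"),
])
conn.executemany("INSERT INTO starred VALUES (?, ?)", [
    ("ann", "m1"), ("ann", "m2"),
    ("bob", "m3"), ("bob", "m2"),
    ("cat", "m1"), ("cat", "m2"),
    ("dan", "m3"), ("dan", "m4"),
    ("eve", "m2"),
])

# Q1: actors who starred in some drama.
Q1 = """
SELECT DISTINCT s.actor
FROM starred s JOIN genre g ON s.movie = g.movie
WHERE g.genre = 'drama'
"""

# Q2: actors who starred in some drama and in some (possibly different) comedy.
Q2 = """
SELECT DISTINCT s1.actor
FROM starred s1
JOIN genre g1 ON s1.movie = g1.movie
JOIN starred s2 ON s1.actor = s2.actor
JOIN genre g2 ON s2.movie = g2.movie
WHERE g1.genre = 'drama' AND g2.genre = 'comedy'
"""

conf = confidence(conn, Q1, Q2)
print(f"Q1 => Q2 holds with confidence {conf:.0%}")
```

In this toy data, four actors appear in a drama and three of them also appear in a comedy, so the ratio is 3/4. With the figures quoted in the abstract (1000 and 900 actors), the same computation would give 90%. Note that both queries count actors; the Ben Affleck example would instead count movies, which is why a single transactional preprocessing cannot cover both patterns.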