Scoring the Data Using Association Rules

Authors: Bing Liu, Yiming Ma, Ching Kian Wong, Philip S. Yu

DOI: 10.1023/A:1021931008240

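Abstract: In many data mining applications, the objective is to select data cases of a target class. For example, in direct marketing, marketers want to select likely buyers of a particular product for promotion. In such applications, it is often too difficult to predict who will definitely be in the target class (e.g., the buyer class) because the data used for modeling is often very noisy and has a highly imbalanced class distribution. Traditionally, classification systems are used to solve this problem. Instead of classifying each data case to a definite class (e.g., buyer or non-buyer), a classification system is modified to produce a class probability estimate (or a score) for the data case to indicate the likelihood that the data case belongs to the target class (e.g., the buyer class). However, existing classification systems only aim to find a subset of the regularities or rules that exist in data. This subset of rules gives only a partial picture of the domain. In this paper, we show that the target selection problem can be mapped to association rule mining to provide a more powerful solution to the problem. Since association rule mining aims to find all rules in data, it is thus able to give a complete picture of the underlying relationships in the domain. The complete set of rules enables us to assign a more accurate class probability estimate to each data case. This paper proposes an effective and efficient technique to compute class probability estimates using association rules. Experiment results using public domain data and real-life application data show that, in general, the new technique performs markedly better than the state-of-the-art classification system C4.5, boosted C4.5, and the Naive Bayesian system.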
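The idea of scoring a case from the rules that cover it can be illustrated with a minimal sketch. It is not the paper's scoring formula, which the abstract does not give. It assumes a two-class problem (target vs. non-target) and uses a support-weighted average of the matching rules' target-class probabilities. The `Rule` type, the `score` function, the attribute names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A class association rule: conditions -> class (hypothetical structure)."""
    conditions: frozenset  # "attribute=value" items that must all hold
    target: bool           # True if the rule predicts the target class
    support: float         # fraction of training cases covered by the rule
    confidence: float      # fraction of covered cases in the predicted class


def score(case: frozenset, rules: list, default: float = 0.0) -> float:
    """Support-weighted average target-class probability over matching rules.

    Assumption: with two classes, a rule predicting the non-target class with
    confidence c implies a target-class probability of 1 - c.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rule in rules:
        if rule.conditions <= case:  # every condition of the rule holds
            p_target = rule.confidence if rule.target else 1.0 - rule.confidence
            weighted_sum += rule.support * p_target
            total_weight += rule.support
    # Cases covered by no rule fall back to the default score.
    return weighted_sum / total_weight if total_weight > 0 else default


# Illustrative rules and cases; the values are made up for the example.
rules = [
    Rule(frozenset({"age=30-40", "owns_home=yes"}), True, 0.05, 0.30),
    Rule(frozenset({"owns_home=yes"}), False, 0.40, 0.95),
    Rule(frozenset({"age=30-40"}), True, 0.10, 0.10),
]
cases = {
    "a": frozenset({"age=30-40", "owns_home=yes"}),
    "b": frozenset({"owns_home=yes"}),
    "c": frozenset({"age=20-30"}),
}

# Rank cases by score so that the most likely target-class cases come first.
ranked = sorted(cases, key=lambda name: score(cases[name], rules), reverse=True)
```

Ranking by score rather than assigning a hard class label is what lets a marketer pick the top fraction of cases for promotion, even when no single case can be confidently classified as a buyer.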
