Using Optimization-Based Classification Method for Massive Datasets

Authors: Yong Shi, Zhengxin Chen, Yi Peng, Gang Kou

Keywords: Computation, Classification methods, Machine learning, Training set, Data mining, Linear programming, Computer science, Artificial intelligence

Abstract: Optimization-based algorithms, such as Multi-Criteria Linear Programming (MCLP), have shown their effectiveness in classification. Nevertheless, due to the limitations of computation power and memory, it is difficult to apply MCLP, or similar optimization methods, to huge datasets. As the size of today's databases is continuously increasing, it is highly important that data mining algorithms are able to perform their functions regardless of dataset size. The objectives of this paper are: (1) to propose a new stratified random sampling and majority-vote ensemble approach, and (2) to compare this approach with plain MCLP (which uses only part of the training set) and with See5 (a decision-tree-based classification tool designed to analyze substantial datasets) on the KDD99 and KDD2004 datasets. The results indicate that this approach not only has the potential to handle arbitrary-size datasets, but also outperforms plain MCLP and achieves accuracy comparable to See5.
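The general idea of combining stratified random sampling with a majority-vote ensemble can be illustrated with a minimal Python sketch. It is not the authors' implementation: the base learner here is a simple nearest-centroid classifier standing in for MCLP (which would need an LP solver), and the function names, the `fraction` and `n_members` parameters, and their default values are all assumptions chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, labels, fraction, rng):
    """Draw `fraction` of the records from each class separately."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    chosen = []
    for idxs in by_class.values():
        k = max(1, round(fraction * len(idxs)))  # keep at least one per class
        chosen.extend(rng.sample(idxs, k))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def fit_centroids(rows, labels):
    """Stand-in base learner (NOT MCLP): one mean vector per class."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroids(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


def train_ensemble(rows, labels, n_members=5, fraction=0.2, seed=0):
    """Train each member on its own small stratified sample of the data."""
    rng = random.Random(seed)
    return [fit_centroids(*stratified_sample(rows, labels, fraction, rng))
            for _ in range(n_members)]


def majority_vote(ensemble, x):
    """Predict the class that receives the most votes across members."""
    votes = Counter(predict_centroids(m, x) for m in ensemble)
    return votes.most_common(1)[0][0]
```

Because each member only ever sees a small sample, the memory needed per training run depends on the sample size rather than on the full dataset, which is the property the abstract points to. With two classes, an odd `n_members` avoids tied votes.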
