Using Optimization-Based Classification Method for Massive Datasets

Authors: Yong Shi, Zhengxin Chen, Yi Peng, Gang Kou

DOI:

Keywords: Computation, Classification methods, Machine learning, Training set, Data mining, Linear programming, Computer science, Artificial intelligence

Abstract: Optimization-based algorithms, such as Multi-Criteria Linear Programming (MCLP), have shown their effectiveness in classification. Nevertheless, due to the limitations of computation power and memory, it is difficult to apply MCLP, or similar optimization methods, to huge datasets. As the size of today's databases is continuously increasing, it is highly important that data mining algorithms are able to perform their functions regardless of dataset size. The objectives of this paper are: (1) to propose a new stratified random sampling and majority-vote ensemble approach, and (2) to compare this approach with plain MCLP (which uses only part of the training set) and with See5 (a decision-tree-based classification tool designed to analyze substantial datasets) on the KDD99 and KDD2004 datasets. The results indicate that this approach not only has the potential to handle arbitrary-size datasets, but also outperforms plain MCLP and achieves accuracy comparable to See5.
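The general idea named in the abstract, stratified random sampling followed by a majority vote over several classifiers, can be sketched as below. This is only an illustration under stated assumptions and not the authors' implementation: the base learner is a hypothetical nearest-centroid classifier standing in for MCLP, and the names `stratified_sample`, `train_ensemble`, `majority_vote`, `n_models`, and `n_per_class` are invented for this example.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(X, y, n_per_class, rng):
    """Draw up to n_per_class rows from each class, without replacement."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    idx = []
    for label in sorted(by_class):
        members = by_class[label]
        idx.extend(rng.sample(members, min(n_per_class, len(members))))
    return [X[i] for i in idx], [y[i] for i in idx]


class NearestCentroid:
    """Hypothetical stand-in base classifier (NOT MCLP)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        # One mean vector per class.
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, row):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
        return min(sorted(self.centroids), key=dist2)


def train_ensemble(X, y, n_models, n_per_class, seed=0):
    """Train each model on its own small stratified sample of the data."""
    rng = random.Random(seed)
    return [
        NearestCentroid().fit(*stratified_sample(X, y, n_per_class, rng))
        for _ in range(n_models)
    ]


def majority_vote(models, row):
    """Return the label predicted by the largest number of models."""
    votes = Counter(m.predict_one(row) for m in models)
    return votes.most_common(1)[0][0]
```

Because each model sees only a small per-class sample, the memory needed to train any single model is bounded by the sample size rather than by the full dataset. With two classes, an odd `n_models` avoids tied votes.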


References (11)

Yong Shi, Morgan Wise, Ming Luo, Yachen Lin. Data Mining in Credit Card Portfolio Management: A Multiple Criteria Decision Making Approach. Multiple Criteria Decision Making, pp. 427-436 (2001). DOI: 10.1007/978-3-642-56680-6_39

Thomas G. Dietterich. Ensemble Methods in Machine Learning. Multiple Classifier Systems, pp. 1-15 (2000). DOI: 10.1007/3-540-45014-9_1

Louisa Lam. Classifier Combinations: Implementations and Theoretical Issues. Multiple Classifier Systems, pp. 77-86 (2000). DOI: 10.1007/3-540-45014-9_7

Gabriele Zenobi, Pádraig Cunningham. An Approach to Aggregating Ensembles of Lazy Learners That Supports Explanation. Lecture Notes in Computer Science, vol. 2416, pp. 436-447 (2002). DOI: 10.1007/3-540-46119-1_32

L.I. Kuncheva. Clustering-and-selection model for classifier combination. International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, vol. 1, pp. 185-188 (2000). DOI: 10.1109/KES.2000.885788

Richard Maclin, David Opitz. Popular ensemble methods: an empirical study. Journal of Artificial Intelligence Research, vol. 11, pp. 169-198 (1999). DOI: 10.1613/JAIR.614

P. S. Bradley, Usama M. Fayyad, O. L. Mangasarian. Mathematical Programming for Data Mining: Formulations and Challenges. INFORMS Journal on Computing, vol. 11, pp. 217-238 (1999). DOI: 10.1287/IJOC.11.3.217

Eric Bauer, Ron Kohavi. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants. Machine Learning, vol. 36, pp. 105-139 (1999). DOI: 10.1023/A:1007515423169

S.J. Stolfo, Wei Fan, Wenke Lee, A. Prodromidis, P.K. Chan. Cost-based modeling for fraud and intrusion detection: results from the JAM project. DARPA Information Survivability Conference and Exposition, vol. 2, pp. 130-144 (2000). DOI: 10.1109/DISCEX.2000.821515

10.

B. Parhami. Voting algorithms. IEEE Transactions on Reliability (1994).
