Authors: Yong Shi, Zhengxin Chen, Yi Peng, Gang Kou
DOI:
Keywords: Computation, Classification methods, Machine learning, Training set, Data mining, Linear programming, Computer science, Artificial intelligence
Abstract: Optimization-based algorithms, such as Multi-Criteria Linear Programming (MCLP), have shown their effectiveness in classification. Nevertheless, due to limitations of computation power and memory, it is difficult to apply MCLP, or similar optimization methods, to huge datasets. As the size of today's databases keeps increasing, it is highly important that data mining algorithms are able to perform their functions regardless of dataset size. The objectives of this paper are: (1) to propose a new stratified random sampling and majority-vote ensemble approach, and (2) to compare this approach with plain MCLP (which uses only part of the training set) and See5 (a decision-tree-based classification tool designed to analyze substantial datasets) on the KDD99 and KDD2004 datasets. The results indicate that the proposed approach not only has the potential to handle arbitrary-size datasets, but also outperforms plain MCLP and achieves accuracy comparable to See5.
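The ensemble idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: each base learner here is a toy nearest-centroid classifier standing in for an MCLP solver, and the data, sampling fraction, and number of base learners are all assumed for the example. Each learner is trained on a stratified random sample (the same fraction drawn from every class), and predictions are combined by majority vote.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(data, frac, rng):
    """Draw a stratified random sample: the same fraction from each class."""
    by_class = defaultdict(list)
    for x, y in data:
        by_class[y].append((x, y))
    sample = []
    for rows in by_class.values():
        k = max(1, int(len(rows) * frac))
        sample.extend(rng.sample(rows, k))
    return sample

def train_centroid(sample):
    """Toy base learner: per-class mean of a 1-D feature
    (a stand-in for solving an MCLP model on the stratum)."""
    sums, counts = defaultdict(float), Counter()
    for x, y in sample:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Assign x to the class with the nearest centroid."""
    return min(model, key=lambda y: abs(x - model[y]))

def ensemble_predict(models, x):
    """Majority vote across all base learners."""
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Synthetic two-class data: class 0 around 0.0, class 1 around 4.0.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(500)] + \
       [(rng.gauss(4.0, 1.0), 1) for _ in range(500)]

# Train 9 base learners, each on a 10% stratified sample.
models = [train_centroid(stratified_sample(data, 0.1, rng)) for _ in range(9)]
print(ensemble_predict(models, -0.5), ensemble_predict(models, 4.5))
```

Because every stratum preserves the class proportions of the full dataset, each base learner sees a representative (but small) training set, which is what lets the approach scale to datasets too large to optimize over directly.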