Mining Data with Rare Events: A Case Study

Authors: Maria P. Tzamtzi, Fotis N. Koumboulis

DOI: 10.1109/ICTAI.2007.130

Abstract: The performance of classification models can be negatively impacted if the data on which they are trained contains very rare events. While recent research has investigated the issue of class imbalance, few if any studies address issues related to handling extreme imbalance (rare events), where minority examples account for as little as 0.1% of the training data. This work investigates the effect of dataset size and class distribution when examples from one class are rare. In addition, we compare the improvement achieved by acquiring additional data with that achieved by applying sampling. Our results demonstrate that sampling is effective at alleviating the problem
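
The abstract does not say which sampling technique was used, so the following is only a minimal sketch of one common option, random undersampling of the majority class, applied to a rare-event class distribution like the 0.1% case mentioned above. The function name `undersample_majority`, its parameters, and the synthetic labels are assumptions made for illustration and do not come from the paper.

```python
import random
from collections import Counter


def undersample_majority(labels, minority_label, target_minority_fraction, seed=0):
    """Return sorted indices that keep every minority example and a random
    subset of majority examples, so that the minority class makes up roughly
    `target_minority_fraction` of the result.

    Hypothetical helper for illustration; not the method from the paper.
    """
    f = target_minority_fraction
    if not 0 < f < 1:
        raise ValueError("target_minority_fraction must be strictly between 0 and 1")

    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]

    # Solve n_min / (n_min + n_keep) = f for n_keep, capped at what is available.
    n_keep = min(len(majority), round(len(minority) * (1 - f) / f))
    kept_majority = rng.sample(majority, n_keep)
    return sorted(minority + kept_majority)


# Assumed synthetic data: 10 rare events among 10,000 examples (0.1% minority).
labels = [1] * 10 + [0] * 9990
idx = undersample_majority(labels, minority_label=1, target_minority_fraction=0.10)
balanced = Counter(labels[i] for i in idx)
print(balanced)
```

Undersampling discards majority examples rather than inventing new minority ones, which keeps the sketch dependency-free; oversampling the minority class would be an equally valid reading of "sampling" here.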
