On classifier behavior in the presence of mislabeling noise

Authors: Katsiaryna Mirylenka, George Giannakopoulos, Le Minh Do, Themis Palpanas

DOI: 10.1007/S10618-016-0484-8

Abstract: Machine learning algorithms perform differently in settings with varying levels of training set mislabeling noise. Therefore, the choice of the right algorithm for a particular learning problem is crucial. The contribution of this paper is towards two dual problems: first, comparing algorithm behavior; and second, choosing learning algorithms for noisy settings. We present the "sigmoid rule" framework, which can be used to choose the most appropriate learning algorithm depending on the properties of noise in a classification problem. The framework uses an existing model of the expected performance of learning algorithms as a sigmoid function of the signal-to-noise ratio in the training instances. We study the characteristics of the sigmoid function using five representative non-sequential classifiers, namely, Naive Bayes, kNN, SVM, a decision tree classifier, and a rule-based classifier, and three widely used sequential classifiers based on hidden Markov models, conditional random fields, and recursive neural networks. Based on the sigmoid parameters, we define a set of intuitive criteria that are useful for comparing the behavior of learning algorithms in the presence of noise. Furthermore, we show that there is a connection between these parameters and the characteristics of the underlying dataset, showing that we can estimate the expected performance over a dataset regardless of the underlying algorithm. The framework is applicable to concept drift scenarios, including modeling user behavior over time, and mining of noisy time series of evolving nature.
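The idea of modeling expected performance as a sigmoid function of the signal-to-noise ratio can be illustrated with a minimal sketch. The abstract does not give the exact parametrization, so everything here is an assumption made for illustration: the function names, the parameters `m`, `M`, `b`, `c`, `d`, their values, and the smoothed log-ratio definition of the signal-to-noise ratio.

```python
import math


def signal_to_noise(n_correct, n_mislabeled):
    """Assumed SNR: log ratio of correctly labeled to mislabeled
    training instances, with +1 smoothing to avoid division by zero."""
    return math.log((1 + n_correct) / (1 + n_mislabeled))


def sigmoid_performance(snr, m, M, b, c, d):
    """Hypothetical sigmoid model: performance rises from a floor m
    toward a ceiling M as the signal-to-noise ratio grows.
    b, c, d control the shape, slope, and horizontal shift."""
    return m + (M - m) / (1.0 + b * math.exp(-c * (snr - d)))


# Made-up parameters for one imaginary classifier (not from the paper).
params = dict(m=0.5, M=0.9, b=1.0, c=1.0, d=0.0)

# Expected performance as more of 1000 training labels are flipped.
mislabeled_counts = (0, 100, 300, 500, 700, 1000)
curve = [
    (k, sigmoid_performance(signal_to_noise(1000 - k, k), **params))
    for k in mislabeled_counts
]

for k, perf in curve:
    print(f"{k:4d} mislabeled -> expected performance {perf:.3f}")
```

Under these assumptions, two algorithms could be compared by fitting such a curve to each one's measured performance at several noise levels and then contrasting the fitted floor, ceiling, and slope.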
