Authors: Katsiaryna Mirylenka, George Giannakopoulos, Le Minh Do, Themis Palpanas
DOI: 10.1007/S10618-016-0484-8
Keywords:
Abstract: Machine learning algorithms perform differently in settings with varying levels of training set mislabeling noise. Therefore, the choice of the right algorithm for a particular learning problem is crucial. The contribution of this paper is towards two, dual problems: first, comparing algorithm behavior; and second, choosing learning algorithms for noisy settings. We present the "sigmoid rule" framework, which can be used to choose the most appropriate learning algorithm depending on the properties of noise in a classification problem. The framework uses an existing model of the expected performance of learning algorithms as a sigmoid function of the signal-to-noise ratio in the training instances. We study the characteristics of the sigmoid function using five representative non-sequential classifiers, namely, Naive Bayes, kNN, SVM, a decision tree classifier, and a rule-based classifier, and three widely used sequential classifiers based on hidden Markov models, conditional random fields and recursive neural networks. Based on the sigmoid parameters we define a set of intuitive criteria that are useful for comparing the behavior of learning algorithms in the presence of noise. Furthermore, we show that there is a connection between these parameters and the characteristics of the underlying dataset, showing that we can estimate the expected performance over a dataset regardless of the underlying algorithm. The framework is applicable to concept drift scenarios, including modeling user behavior over time, and mining of noisy time series of evolving nature.