Pitfalls in benchmarking data stream classification and how to avoid them

Authors: Albert Bifet, Jesse Read, Indrė Žliobaitė, Bernhard Pfahringer, Geoff Holmes

DOI: 10.1007/978-3-642-40988-2_30

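Abstract: Data stream classification plays an important role in modern data analysis, where data arrives in a stream and needs to be mined in real time. In the data stream setting the underlying distribution from which this data comes may be changing and evolving, and so classifiers that can update themselves during operation are becoming the state-of-the-art. In this paper we show that data streams may have an important temporal component, which currently is not considered in the evaluation and benchmarking of data stream classifiers. We demonstrate how a naive classifier considering the temporal component only outperforms a lot of current state-of-the-art classifiers on real data streams that have temporal dependence, i.e. data is autocorrelated. We propose to evaluate data stream classifiers taking into account temporal dependence, and introduce a new evaluation measure, which provides a more accurate gauge of data stream classifier performance. In response to the temporal dependence issue we propose a generic wrapper for data stream classifiers, which incorporates the temporal component into the attribute space.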
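The three ideas in the abstract (a naive temporal baseline, an evaluation measure that accounts for temporal dependence, and a wrapper that adds the temporal component to the attributes) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names are hypothetical, the "naive classifier" is taken to be one that repeats the previously observed label, the score is assumed to be a kappa-style ratio normalised against that baseline, and the wrapper is assumed to append recent true labels to each feature vector. It is not presented as the authors' exact definitions.

```python
from typing import Any, Hashable, List, Sequence


def no_change_predictions(labels: Sequence[Hashable], first_guess: Hashable) -> List[Hashable]:
    """Assumed naive baseline: predict that each label equals the previous true label."""
    preds: List[Hashable] = []
    previous = first_guess
    for y in labels:
        preds.append(previous)
        previous = y  # the true label is revealed only after the prediction is made
    return preds


def accuracy(labels: Sequence[Hashable], preds: Sequence[Hashable]) -> float:
    """Fraction of positions where the prediction matches the true label."""
    if len(labels) != len(preds) or len(labels) == 0:
        raise ValueError("labels and preds must be non-empty and of equal length")
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def baseline_normalised_score(p_classifier: float, p_baseline: float) -> float:
    """Assumed kappa-style score: 1 is perfect, 0 ties the baseline, negative is worse."""
    if p_baseline >= 1.0:
        raise ValueError("baseline is already perfect; the score is undefined")
    return (p_classifier - p_baseline) / (1.0 - p_baseline)


def augment_with_previous_labels(
    features: Sequence[Sequence[Any]],
    labels: Sequence[Hashable],
    order: int,
    pad: Any = None,
) -> List[List[Any]]:
    """Assumed wrapper step: append the `order` most recent true labels to each instance."""
    augmented: List[List[Any]] = []
    for t, x in enumerate(features):
        history = [labels[t - k] if t - k >= 0 else pad for k in range(1, order + 1)]
        augmented.append(list(x) + history)
    return augmented


# Toy autocorrelated stream: labels arrive in runs.
stream_labels = ["a", "a", "a", "b", "b", "b", "a", "a"]
baseline_preds = no_change_predictions(stream_labels, first_guess="a")
baseline_acc = accuracy(stream_labels, baseline_preds)

# A majority-class predictor looks acceptable on plain accuracy ...
majority_acc = accuracy(stream_labels, ["a"] * len(stream_labels))
# ... but scores below zero once it is compared against the no-change baseline.
majority_score = baseline_normalised_score(majority_acc, baseline_acc)

augmented = augment_with_previous_labels([[1.0], [2.0], [3.0]], ["a", "b", "a"], order=1)
```

On a stream whose labels arrive in runs, the no-change baseline is hard to beat, which is why a score normalised against it can turn negative for a classifier whose plain accuracy looks reasonable.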
