Data Complexity Measures and Nearest Neighbor Classifiers: A Practical Analysis for Meta-learning

Authors: G. D. C. Cavalcanti, T. I. Ren, B. A. Vale

DOI: 10.1109/ICTAI.2012.150

Keywords: Artificial intelligence; Machine learning; Large margin nearest neighbor; Pattern recognition; Data set; Classifier (UML); Data complexity; Computer science; k-nearest neighbors algorithm

Abstract: The accuracy of a classifier is affected by the properties of the data set used to train it. Nearest neighbor classifiers are known for being simple and accurate in several domains, but their behavior is strongly dependent on data complexity. On the other hand, there are complexity measures that aim to describe data sets. This work aims to show how complexity measures can be used to efficiently predict the accuracy of the Nearest Neighbor classifier. Seven complexity measures and seventeen real datasets were used in the experimental study. Each measure is analyzed individually in order to find a relationship between its value and the Nearest Neighbor accuracy on a given dataset. No single measure is good enough on its own. However, the combination of these measures provides a powerful tool.
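The abstract does not say which seven measures were used or how they were combined, so the following is only an illustrative sketch of the general idea, not the authors' method. It computes two measures defined by Ho and Basu (2002), F1 (maximum Fisher's discriminant ratio) and N2 (ratio of intra-class to inter-class nearest-neighbor distance), together with leave-one-out 1-NN accuracy. It then correlates N2 with accuracy over a series of synthetic datasets. The function names, the choice of F1 and N2, and the Gaussian data generator are all assumptions made for illustration.

```python
import math
import random
import statistics

# Illustrative sketch: function names are hypothetical, and F1/N2 are two
# measures from Ho & Basu (2002) picked for illustration, not necessarily
# among the seven used in the paper. Distances are Euclidean (math.dist, 3.8+).


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of the 1-NN rule (ties go to the lower index)."""
    correct = 0
    for i, xi in enumerate(X):
        others = (k for k in range(len(X)) if k != i)
        j = min(others, key=lambda k: math.dist(xi, X[k]))
        correct += int(y[j] == y[i])
    return correct / len(X)


def n2_ratio(X, y):
    """N2: summed intra-class NN distance / summed inter-class NN distance.

    Assumes at least two classes and at least two points per class.
    Larger values indicate a harder problem for nearest neighbor rules.
    """
    intra = inter = 0.0
    for i, xi in enumerate(X):
        intra += min(math.dist(xi, X[k]) for k in range(len(X))
                     if k != i and y[k] == y[i])
        inter += min(math.dist(xi, X[k]) for k in range(len(X)) if y[k] != y[i])
    return intra / inter


def f1_fisher(X, y):
    """F1 (two-class case): max over features of (m1 - m2)^2 / (v1 + v2).

    Larger values indicate at least one feature that separates the classes well.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles two-class problems only")
    best = 0.0
    for f in range(len(X[0])):
        a = [x[f] for x, c in zip(X, y) if c == labels[0]]
        b = [x[f] for x, c in zip(X, y) if c == labels[1]]
        num = (statistics.fmean(a) - statistics.fmean(b)) ** 2
        den = statistics.pvariance(a) + statistics.pvariance(b)
        # A zero-variance feature with distinct class means separates perfectly.
        ratio = num / den if den > 0 else (math.inf if num > 0 else 0.0)
        best = max(best, ratio)
    return best


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def make_blobs(sep, n=30, seed=0):
    """Hypothetical synthetic data: two unit-variance Gaussian classes in 2-D
    whose means differ by `sep` along the first axis."""
    rng = random.Random(seed)
    X, y = [], []
    for label, shift in ((0, 0.0), (1, sep)):
        for _ in range(n):
            X.append((rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    n2s, accs = [], []
    for k in range(8):
        X, y = make_blobs(sep=0.5 * k, seed=k)
        n2s.append(n2_ratio(X, y))
        accs.append(loo_1nn_accuracy(X, y))
        print(f"sep={0.5 * k:.1f}  F1={f1_fisher(X, y):.2f}  "
              f"N2={n2s[-1]:.2f}  1-NN acc={accs[-1]:.2f}")
    print("corr(N2, 1-NN accuracy) =", round(pearson(n2s, accs), 3))
```

Run as a script, this prints each measure next to the 1-NN accuracy for every synthetic dataset, then prints the correlation between N2 and accuracy. That correlation is the kind of per-measure relationship the abstract describes. The abstract reports that no single measure suffices, which suggests using several measures together as inputs to a predictive model. That combining step is not shown here because the abstract does not specify it.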

References (16)
Ramón A. Mollineda, J. Salvador Sánchez, José M. Sotoca, Data Characterization for Effective Prototype Selection. Pattern Recognition and Image Analysis, pp. 27-34 (2005). DOI: 10.1007/11492542_4
Nathalie Japkowicz, Shaju Stephen, The class imbalance problem: A systematic study. Intelligent Data Analysis, vol. 6, pp. 429-449 (2002). DOI: 10.3233/IDA-2002-6504
Cristiano de Santana Pereira, George D. C. Cavalcanti, Prototype Selection for Handwritten Connected Digits Classification. International Conference on Document Analysis and Recognition, pp. 1021-1025 (2009). DOI: 10.1109/ICDAR.2009.186
J. S. Sánchez, R. A. Mollineda, J. M. Sotoca, An analysis of how training data complexity affects the nearest neighbor classifiers. Pattern Analysis and Applications, vol. 10, pp. 189-201 (2007). DOI: 10.1007/S10044-007-0061-2
Tin Kam Ho, E. B. Mansilla, On classifier domains of competence. International Conference on Pattern Recognition, vol. 1, pp. 136-139 (2004). DOI: 10.1109/ICPR.2004.648
Cristiano de Santana Pereira, George D. C. Cavalcanti, Handwritten connected digits detection: An approach using instance selection. 2011 18th IEEE International Conference on Image Processing, pp. 2613-2616 (2011). DOI: 10.1109/ICIP.2011.6116201
Dennis L. Wilson, Asymptotic Properties of Nearest Neighbor Rules Using Edited Data. IEEE Transactions on Systems, Man, and Cybernetics, vol. 2, pp. 408-421 (1972). DOI: 10.1109/TSMC.1972.4309137
Cristiano de Santana Pereira, George D. C. Cavalcanti, Instance selection algorithm based on a Ranking Procedure. The 2011 International Joint Conference on Neural Networks, pp. 2409-2416 (2011). DOI: 10.1109/IJCNN.2011.6033531
T. Cover, P. Hart, Nearest neighbor pattern classification. IEEE Transactions on Information Theory, vol. 13, pp. 21-27 (1967). DOI: 10.1109/TIT.1967.1053964
Tin Kam Ho, M. Basu, Complexity measures of supervised classification problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 289-300 (2002). DOI: 10.1109/34.990132