Leveraging k-NN for generic classification boosting

Authors: Paolo Piro, Richard Nock, Frank Nielsen, Michel Barlaud

DOI: 10.1016/J.NEUCOM.2011.07.026

Keywords: Data mining, Exponential function, Bayes' theorem, Boosting (machine learning), Computer science, Machine learning, Artificial intelligence

Abstract: Voting rules relying on k-nearest neighbors (k-NN) are an effective tool in countless machine learning techniques. Thanks to its simplicity, k-NN classification is very attractive to practitioners, as it enables good performances in several practical applications. However, it suffers from various drawbacks, like sensitivity to "noisy" instances and poor generalization properties when dealing with sparse high-dimensional data. In this paper, we tackle the k-NN classification problem at its core by providing a novel boosting approach. Namely, we propose a supervised learning algorithm, called Universal Nearest Neighbors (UNN), that induces a leveraged k-NN rule globally minimizing a surrogate risk upper bounding the empirical misclassification rate over the training data. Interestingly, this surrogate risk can be arbitrarily chosen from a class of Bregman loss functions, including the familiar exponential, logistic and squared losses. Furthermore, we show that UNN makes it possible to efficiently filter a dataset, keeping only a small fraction of the examples. Experimental results on the synthetic Ripley's dataset show that such a filtering strategy is able to reject "noisy" examples and yields a classification error close to the optimal Bayes error. Experiments on standard UCI datasets show significant improvements over the current state of the art.
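
The abstract describes a leveraged k-NN rule of the form H(x) = sum of alpha_j * y_j over the k nearest training examples j of x, where the leveraging coefficients alpha_j are fit by minimizing a surrogate risk, e.g. the exponential loss (1/m) * sum_i exp(-y_i * H(x_i)), which upper-bounds the empirical misclassification rate. Below is a minimal sketch of this idea for binary labels in {-1, +1} under the exponential surrogate; the coordinate-wise closed-form update and all function names are illustrative assumptions, not the authors' exact UNN procedure.

```python
import numpy as np

def knn_indices(X, x, k):
    """Indices of the k rows of X nearest to x (Euclidean distance)."""
    return np.argsort(np.linalg.norm(X - x, axis=1))[:k]

def fit_leveraged_knn(X, y, k=5, n_rounds=3, eps=1e-8):
    """Fit leveraging coefficients alpha[j] for a k-NN voting rule.

    Coordinate-wise minimization of the exponential surrogate risk
    (1/m) * sum_i exp(-y_i * H(x_i)), where H(x) = sum of alpha[j] * y[j]
    over the k nearest training examples of x. Labels y are in {-1, +1}.
    Illustrative sketch only, not the published UNN algorithm.
    """
    m = len(y)
    alpha = np.zeros(m)
    w = np.ones(m)  # current weights exp(-y_i * H(x_i)); H = 0 initially
    nbrs = [knn_indices(X, X[i], k + 1)[1:] for i in range(m)]  # skip self
    # Reciprocal neighborhoods: R[j] = {i : j is among the k-NN of x_i}.
    R = [[] for _ in range(m)]
    for i in range(m):
        for j in nbrs[i]:
            R[j].append(i)
    for _ in range(n_rounds):
        for j in range(m):  # leverage training example j
            if not R[j]:
                continue
            w_plus = w[[i for i in R[j] if y[i] == y[j]]].sum()
            w_minus = w[[i for i in R[j] if y[i] != y[j]]].sum()
            # Closed-form coordinate step for the exponential loss,
            # smoothed by eps to avoid division by zero (AdaBoost-like).
            delta = 0.5 * np.log((w_plus + eps) / (w_minus + eps))
            alpha[j] += delta
            for i in R[j]:  # only points having j as a neighbor are affected
                w[i] *= np.exp(-y[i] * y[j] * delta)
    return alpha

def predict(X_train, y_train, alpha, x, k=5):
    """Sign of the leveraged k-NN vote at a query point x."""
    idx = knn_indices(X_train, x, k)
    vote = float(np.sum(alpha[idx] * y_train[idx]))
    return 1 if vote >= 0 else -1
```

With X and y as NumPy arrays, training examples whose coefficient stays near zero contribute nothing to the vote and can be discarded (e.g. keep = np.abs(alpha) > 1e-3), which mirrors the dataset-filtering behavior the abstract reports; the threshold here is an assumption for illustration only.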

References (13)
B. D. Ripley, Neural Networks and Related Methods for Classification, Journal of the Royal Statistical Society: Series B (Methodological), vol. 56, pp. 409-437, 1994. DOI: 10.1111/J.2517-6161.1994.TB01990.X
Robert E. Schapire, Yoav Freund, Peter Bartlett, Wee Sun Lee, Boosting the margin: a new explanation for the effectiveness of voting methods, Annals of Statistics, vol. 26, pp. 1651-1686, 1998. DOI: 10.1214/AOS/1024691352
Peter L. Bartlett, Michael I. Jordan, Jon D. McAuliffe, Convexity, Classification, and Risk Bounds, Journal of the American Statistical Association, vol. 101, pp. 138-156, 2006. DOI: 10.1198/016214505000000907
P. Hart, The condensed nearest neighbor rule (Corresp.), IEEE Transactions on Information Theory, vol. 14, pp. 515-516, 1968. DOI: 10.1109/TIT.1968.1054155
Manfred K. Warmuth, Jun Liao, Gunnar Rätsch, Totally corrective boosting algorithms that maximize the margin, Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 1001-1008, 2006. DOI: 10.1145/1143844.1143970
Robert E. Schapire, Yoram Singer, Improved boosting algorithms using confidence-rated predictions, Conference on Learning Theory, vol. 37, pp. 80-91, 1998. DOI: 10.1145/279943.279960
Christopher C. Holmes, Niall M. Adams, Likelihood inference in nearest-neighbour classification models, Biometrika, vol. 90, pp. 99-112, 2003. DOI: 10.1093/BIOMET/90.1.99
T. Cover, P. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, vol. 13, pp. 21-27, 1967. DOI: 10.1109/TIT.1967.1053964
J. R. Quinlan, Induction of Decision Trees, Machine Learning, vol. 1, pp. 81-106, 1986. DOI: 10.1023/A:1022643204877
Lionel Cucala, Jean-Michel Marin, Christian P. Robert, D. M. Titterington, A Bayesian Reassessment of Nearest-Neighbor Classification, Journal of the American Statistical Association, vol. 104, pp. 263-273, 2009. DOI: 10.1198/JASA.2009.0125