Leveraging k-NN for generic classification boosting

Authors: Paolo Piro, Richard Nock, Frank Nielsen, Michel Barlaud

DOI: 10.1016/J.NEUCOM.2011.07.026

Keywords: Data mining, Exponential function, Bayes' theorem, Boosting (machine learning), Computer science, Machine learning, Artificial intelligence

Abstract: Voting rules relying on k-nearest neighbors (k-NN) are an effective tool in countless machine learning techniques. Thanks to its simplicity, k-NN classification is very attractive to practitioners, as it enables good performances in several practical applications. However, it suffers from various drawbacks, like sensitivity to "noisy" instances and poor generalization properties when dealing with sparse high-dimensional data. In this paper, we tackle the k-NN classification problem at its core by providing a novel boosting approach. Namely, we propose a supervised learning algorithm, called Universal Nearest Neighbors (UNN), that induces a leveraged k-NN rule globally minimizing a surrogate risk upper bounding the empirical misclassification rate over the training data. Interestingly, this surrogate risk can be arbitrarily chosen from a class of Bregman loss functions, including the familiar exponential, logistic and squared losses. Furthermore, we show that UNN makes it possible to efficiently filter a dataset, keeping only a small fraction of the examples. Experimental results on the synthetic Ripley's dataset show that such a filtering strategy is able to reject "noisy" examples and yields a classification error close to the optimal Bayes error. Experiments on standard UCI datasets show significant improvements over the current state of the art.
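
The abstract describes a leveraged k-NN rule of the form H(x) = sum of alpha_j * y_j over the k nearest training examples j of x, where the leveraging coefficients alpha_j are fit by minimizing a surrogate risk, e.g. the exponential loss (1/m) * sum_i exp(-y_i * H(x_i)), which upper-bounds the empirical misclassification rate. Below is a minimal sketch of this idea for binary labels in {-1, +1} under the exponential surrogate; the coordinate-wise closed-form update and all function names are illustrative assumptions, not the authors' exact UNN procedure.

```python
import numpy as np

def knn_indices(X, x, k):
    """Indices of the k rows of X nearest to x (Euclidean distance)."""
    return np.argsort(np.linalg.norm(X - x, axis=1))[:k]

def fit_leveraged_knn(X, y, k=5, n_rounds=3, eps=1e-8):
    """Fit leveraging coefficients alpha[j] for a k-NN voting rule.

    Coordinate-wise minimization of the exponential surrogate risk
    (1/m) * sum_i exp(-y_i * H(x_i)), where H(x) = sum of alpha[j] * y[j]
    over the k nearest training examples of x. Labels y are in {-1, +1}.
    Illustrative sketch only, not the published UNN algorithm.
    """
    m = len(y)
    alpha = np.zeros(m)
    w = np.ones(m)  # current weights exp(-y_i * H(x_i)); H = 0 initially
    nbrs = [knn_indices(X, X[i], k + 1)[1:] for i in range(m)]  # skip self
    # Reciprocal neighborhoods: R[j] = {i : j is among the k-NN of x_i}.
    R = [[] for _ in range(m)]
    for i in range(m):
        for j in nbrs[i]:
            R[j].append(i)
    for _ in range(n_rounds):
        for j in range(m):  # leverage training example j
            if not R[j]:
                continue
            w_plus = w[[i for i in R[j] if y[i] == y[j]]].sum()
            w_minus = w[[i for i in R[j] if y[i] != y[j]]].sum()
            # Closed-form coordinate step for the exponential loss,
            # smoothed by eps to avoid division by zero (AdaBoost-like).
            delta = 0.5 * np.log((w_plus + eps) / (w_minus + eps))
            alpha[j] += delta
            for i in R[j]:  # only points having j as a neighbor are affected
                w[i] *= np.exp(-y[i] * y[j] * delta)
    return alpha

def predict(X_train, y_train, alpha, x, k=5):
    """Sign of the leveraged k-NN vote at a query point x."""
    idx = knn_indices(X_train, x, k)
    vote = float(np.sum(alpha[idx] * y_train[idx]))
    return 1 if vote >= 0 else -1
```

With X and y as NumPy arrays, training examples whose coefficient stays near zero contribute nothing to the vote and can be discarded (e.g. keep = np.abs(alpha) > 1e-3), which mirrors the dataset-filtering behavior the abstract reports; the threshold here is an assumption for illustration only.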

References (13)
B. D. Ripley, Neural Networks and Related Methods for Classification, Journal of the Royal Statistical Society: Series B (Methodological), vol. 56, pp. 409-437, 1994. DOI: 10.1111/J.2517-6161.1994.TB01990.X
Robert E. Schapire, Yoav Freund, Peter Bartlett, Wee Sun Lee, Boosting the margin: a new explanation for the effectiveness of voting methods, Annals of Statistics, vol. 26, pp. 1651-1686, 1998. DOI: 10.1214/AOS/1024691352
Peter L. Bartlett, Michael I. Jordan, Jon D. McAuliffe, Convexity, Classification, and Risk Bounds, Journal of the American Statistical Association, vol. 101, pp. 138-156, 2006. DOI: 10.1198/016214505000000907
P. Hart, The condensed nearest neighbor rule (Corresp.), IEEE Transactions on Information Theory, vol. 14, pp. 515-516, 1968. DOI: 10.1109/TIT.1968.1054155
Manfred K. Warmuth, Jun Liao, Gunnar Rätsch, Totally corrective boosting algorithms that maximize the margin, Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 1001-1008, 2006. DOI: 10.1145/1143844.1143970
Robert E. Schapire, Yoram Singer, Improved boosting algorithms using confidence-rated predictions, Conference on Learning Theory, vol. 37, pp. 80-91, 1998. DOI: 10.1145/279943.279960
Christopher C. Holmes, Niall M. Adams, Likelihood inference in nearest-neighbour classification models, Biometrika, vol. 90, pp. 99-112, 2003. DOI: 10.1093/BIOMET/90.1.99
T. Cover, P. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, vol. 13, pp. 21-27, 1967. DOI: 10.1109/TIT.1967.1053964
J. R. Quinlan, Induction of Decision Trees, Machine Learning, vol. 1, pp. 81-106, 1986. DOI: 10.1023/A:1022643204877
Lionel Cucala, Jean-Michel Marin, Christian P. Robert, D. M. Titterington, A Bayesian Reassessment of Nearest-Neighbor Classification, Journal of the American Statistical Association, vol. 104, pp. 263-273, 2009. DOI: 10.1198/JASA.2009.0125