Authors: Paolo Piro, Richard Nock, Frank Nielsen, Michel Barlaud
DOI: 10.1016/J.NEUCOM.2011.07.026
Keywords: Data mining, Exponential function, Bayes' theorem, Boosting (machine learning), Computer science, Machine learning, Artificial intelligence
Abstract: Voting rules relying on k-nearest neighbors (k-NN) are an effective tool in countless machine learning techniques. Thanks to its simplicity, k-NN classification is very attractive to practitioners, as it achieves good performance in several practical applications. However, it suffers from various drawbacks, such as sensitivity to "noisy" instances and poor generalization properties when dealing with sparse high-dimensional data. In this paper, we tackle the problem at its core by providing a novel k-NN boosting approach. Namely, we propose a supervised learning algorithm, called Universal Nearest Neighbors (UNN), that induces a leveraged k-NN rule by globally minimizing a surrogate risk upper bounding the empirical misclassification rate over the training data. Interestingly, the surrogate risk can be arbitrarily chosen from a class of Bregman loss functions, including the familiar exponential, logistic and squared losses. Furthermore, we show that UNN makes it possible to efficiently filter a dataset, keeping only a small fraction of its examples. Experimental results on synthetic Ripley's data show that such a filtering strategy is able to reject "noisy" examples and yields a classification error close to the optimal Bayes error. Experiments on standard UCI datasets show significant improvements over the current state of the art.
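To make the leveraged k-NN rule concrete, below is a minimal Python sketch of a UNN-style training loop under the exponential surrogate loss. This is a sketch under stated assumptions, not the authors' implementation: the function names (knn_indices, fit_unn, predict), the round-robin choice of which example to leverage, the smoothing constant eps, and the AdaBoost-like closed-form update 0.5*log(W+/W-) are all choices made here for illustration; the abstract only fixes the overall scheme of a leveraged k-NN vote trained by minimizing a surrogate (here exponential) loss.

```python
# Toy UNN-style leveraged k-NN boosting with the exponential surrogate loss.
# Hypothetical names and parameter choices; not the paper's reference code.
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each example (self excluded),
    using Euclidean distance."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def fit_unn(X, y, k=5, T=200, eps=1e-10):
    """Greedily fit per-example leveraging coefficients alpha.
    y must take values in {-1, +1}. Each round picks one training example j
    (round-robin, an assumption of this sketch), updates alpha[j] in closed
    form for the exponential loss, and reweights the examples that have j
    among their k nearest neighbors."""
    n = len(y)
    nn = knn_indices(X, k)
    # reciprocal neighborhood: rec[j] = examples having j as a neighbor
    rec = [np.where((nn == j).any(axis=1))[0] for j in range(n)]
    w = np.ones(n)       # per-example weights induced by the exponential loss
    alpha = np.zeros(n)  # leveraging coefficients of the training examples
    for t in range(T):
        j = t % n
        idx = rec[j]
        if len(idx) == 0:
            continue
        agree = y[idx] == y[j]
        wp, wm = w[idx][agree].sum(), w[idx][~agree].sum()
        delta = 0.5 * np.log((wp + eps) / (wm + eps))
        alpha[j] += delta
        # weight update restricted to the reciprocal neighbors of j
        w[idx] *= np.exp(-delta * y[idx] * y[j])
    return alpha, nn

def predict(Xtr, ytr, alpha, Xte, k=5):
    """Leveraged k-NN vote: sign of the alpha-weighted labels of the
    k nearest training examples."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return np.sign((alpha[nn] * ytr[nn]).sum(axis=1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    alpha, _ = fit_unn(X, y)
    print("training accuracy:", (predict(X, y, alpha, X) == y).mean())
```

In this sketch, training examples whose leveraging coefficient stays at or below zero contribute nothing (or negatively) to the vote, so discarding them gives one simple reading of the dataset-filtering strategy mentioned in the abstract.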