Discussion: Local Rademacher complexities and oracle inequalities in risk minimization

Authors: Gilles Blanchard, Pascal Massart

DOI: 10.1214/009053606000001037

Keywords: Class (set theory); Statistical learning theory; Random variable; Independent and identically distributed random variables; Mathematical economics; Mathematics; Mathematical optimization; Feature vector; Ranking; Empirical process; Empirical risk minimization

Abstract: In this magnificent paper, Professor Koltchinskii offers general and powerful performance bounds for empirical risk minimization, a fundamental principle of statistical learning theory. Since the elegant pioneering work of Vapnik and Chervonenkis in the early 1970s, various such bounds have been known that relate the performance of empirical minimizers to combinatorial and geometrical features of the class over which the minimization is performed. This area of research has been a rich source of motivation and a major field of applications for empirical process theory. The appearance of advanced concentration inequalities in the 1990s, primarily thanks to Talagrand's influential work, provoked major advances in both theory and applications and led to a much deeper understanding of some basic phenomena. The paper under discussion develops a new methodology, iterative localization, which, with the help of such concentration inequalities, is able to explain most of the recent results and to go significantly beyond them in many cases. The main setup behind Koltchinskii's approach is based on classical problems such as binary classification and regression: given a sample (Xi, Yi), i = 1, ..., n, of independent and identically distributed pairs of random variables (where the Xi take their values in a feature space X and the Yi are, say, real-valued), the goal is to find a function f : X → R whose risk, defined in terms of the expected value of an appropriately chosen loss function, is as small as possible. In the remaining part of this discussion we point out how this methodology can be used to study a seemingly different model, motivated by nonparametric ranking problems, which has received increasing attention in the machine learning literature. Indeed, in several applications, such as the search engine problem or credit screening, the goal is to learn how to rank, or score, observations rather than just classify them. In this case, performance measures involve pairs of observations, as seen, for instance, in the AUC (Area Under the ROC Curve) criterion.
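To make the two notions of risk concrete, here is a minimal sketch in standard notation; the loss ℓ and the pairwise 0-1 ranking loss below are illustrative choices, not formulas quoted from the discussion itself. In the classification/regression setting, the risk and its empirical counterpart are

\[
R(f) = \mathbb{E}\bigl[\ell(f(X), Y)\bigr],
\qquad
R_n(f) = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f(X_i), Y_i\bigr),
\]

while in the ranking setting a natural AUC-type risk involves pairs of observations,

\[
L(f) = \mathbb{P}\bigl\{(f(X) - f(X'))(Y - Y') < 0\bigr\},
\qquad
L_n(f) = \frac{2}{n(n-1)} \sum_{i<j} \mathbf{1}\bigl\{(f(X_i) - f(X_j))(Y_i - Y_j) < 0\bigr\},
\]

where (X', Y') denotes an independent copy of (X, Y). Note that L_n(f) is a U-statistic of order two rather than a sum of independent terms, which is precisely why extending the localization machinery to the ranking model calls for additional tools from the theory of U-processes.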
