Authors: Sébastien Loustau, Michael Chichignoud
DOI:
Keywords:
Abstract: The problem of adaptive noisy clustering is investigated. Given a set of noisy observations $Z_i=X_i+\epsilon_i$, $i=1,...,n$, the goal is to design clusters associated with the law of the $X_i$'s, which has unknown density $f$ with respect to the Lebesgue measure. Since we observe a corrupted sample, a direct approach such as the popular {\it $k$-means} is not suitable in this case. In this paper, we propose a noisy $k$-means minimization, which is based on the $k$-means loss function and a deconvolution estimator of the density $f$. In particular, this approach suffers from the dependence on a bandwidth involved in the deconvolution kernel. Fast rates of convergence for the excess risk are proposed for a particular choice of the bandwidth, which depends on the smoothness of the density $f$. Then, we turn to the main issue of the paper: the data-driven choice of the bandwidth. We state an adaptive upper bound for a new selection rule, called ERC (Empirical Risk Comparison). This selection rule is based on Lepski's principle, where empirical risks associated with different bandwidths are compared. Finally, we illustrate that this adaptive rule can be used in many statistical problems of $M$-estimation where the empirical risk depends on a nuisance parameter.