Issue |
ESAIM: PS
Volume 24, 2020
|
|
---|---|---|
Page(s) | 69 - 99 | |
DOI | https://doi.org/10.1051/ps/2019021 | |
Published online | 27 February 2020 |
An adaptive multiclass nearest neighbor classifier*
1
National Research University Higher School of Economics,
20 Myasnitskaya ulitsa,
101000
Moscow, RF.
2
Institute for Information Transmission Problems RAS,
Bolshoy Karetny per. 19,
127051
Moscow, RF.
3
Weierstrass Institute and Humboldt University,
Mohrenstr. 39,
10117
Berlin, Germany.
** Corresponding author: npuchkin@hse.ru
Received:
23
December
2018
Accepted:
24
October
2019
We consider a problem of multiclass classification, where the training sample S_n={(Xi,Yi)}ni=1 is generated from the model ℙ(Y = m|X = x) = ηm(x), 1 ≤ m ≤ M, and η1(x), …, ηM(x) are unknown α-Holder continuous functions. Given a test point X, our goal is to predict its label. A widely used k-nearest-neighbors classifier constructs estimates of η1(X), …, ηM(X) and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter k, which may become tricky in some situations. We fix several integers n1, …, nK, compute corresponding nk-nearest-neighbor estimates for each m and each nk and apply an aggregation procedure. We study an algorithm, which constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter α and to the margin and establish rates of convergence under mild assumptions.
Mathematics Subject Classification: 62H30 / 62G08
Key words: Multiclass classification / k nearest neighbors / adaptive procedures
© EDP Sciences, SMAI 2020
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.