Issue | ESAIM: PS, Volume 21, 2017
---|---
Page(s) | 412 - 451
DOI | https://doi.org/10.1051/ps/2017005
Published online | 08 January 2018
Slope heuristics and V-Fold model selection in heteroscedastic regression using strongly localized bases
CREST, ENSAI, Campus de Ker-Lann, rue Blaise Pascal, BP 37203, 35172 Bruz Cedex, France.
fabien.navarro@ensai.fr; adrien.saumard@ensai.fr
Received: 25 August 2016
Revised: 14 February 2017
Accepted: 8 March 2017
We investigate the optimality for model selection of the so-called slope heuristics, V-fold cross-validation and V-fold penalization in a heteroscedastic regression context with random design. We consider a new class of linear models that we call strongly localized bases and that generalize histograms, piecewise polynomials and compactly supported wavelets. We derive sharp oracle inequalities that prove the asymptotic optimality of the slope heuristics – when the optimal penalty shape is known – and of V-fold penalization. Furthermore, V-fold cross-validation seems to be suboptimal for a fixed value of V, since it asymptotically recovers the oracle learned from a sample size equal to a fraction 1 − V⁻¹ of the original amount of data. Our results are based on genuine concentration inequalities for the true and empirical excess risks that are of independent interest. Our experiments show the good behavior of the slope heuristics for the selection of linear wavelet models. Furthermore, V-fold cross-validation and V-fold penalization have comparable efficiency.
Mathematics Subject Classification: 62G08 / 62G09
Key words: Nonparametric regression / heteroscedastic noise / random design / model selection / cross-validation / wavelets
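As an illustration of the V-fold cross-validation procedure discussed in the abstract, the following is a minimal sketch, not the authors' implementation or experimental setup. It selects the dimension of a regular histogram model (one of the special cases of strongly localized bases named above) on simulated heteroscedastic data with random design. The regression function, the noise level, the candidate dimensions and all function names are assumptions made for this example only.

```python
import numpy as np


def fit_histogram(x, y, n_bins):
    """Least-squares fit on a regular histogram model on [0, 1): bin means."""
    idx = np.minimum((x * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(idx, weights=y, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    # Empty bins are set to 0 (an arbitrary convention for this sketch).
    return np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)


def predict_histogram(coefs, x):
    """Evaluate the piecewise-constant estimator at the points x."""
    n_bins = len(coefs)
    idx = np.minimum((x * n_bins).astype(int), n_bins - 1)
    return coefs[idx]


def vfold_cv_select(x, y, dims, V=5, seed=0):
    """Return the dimension minimizing the V-fold cross-validation criterion."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), V)
    crit = []
    for d in dims:
        errs = []
        for k in range(V):
            test = folds[k]
            # Each estimator is trained on the other V - 1 folds only.
            train = np.concatenate([folds[j] for j in range(V) if j != k])
            coefs = fit_histogram(x[train], y[train], d)
            resid = y[test] - predict_histogram(coefs, x[test])
            errs.append(np.mean(resid ** 2))
        crit.append(np.mean(errs))
    crit = np.array(crit)
    return dims[int(np.argmin(crit))], crit


# Toy heteroscedastic data with random design (assumed for illustration).
rng = np.random.default_rng(1)
n, V = 500, 5
x = rng.uniform(0, 1, n)
sigma = 0.2 + 0.5 * x  # noise level depends on x
y = np.sin(2 * np.pi * x) + sigma * rng.standard_normal(n)

dims = [1, 2, 4, 8, 16, 32, 64]
best_dim, crit = vfold_cv_select(x, y, dims, V=V)
train_size = n - n // V  # each fit uses (1 - 1/V) * n observations
print(best_dim, train_size)
```

Each estimator in the loop is fitted on n(1 − 1/V) observations, which is the reduced sample size the abstract refers to when it describes V-fold cross-validation as suboptimal for fixed V.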
© EDP Sciences, SMAI, 2017