PhD dissertation of Sylvain Arlot -- Resampling and model selection

Resampling and Model selection


I completed my Ph.D. in Mathematics at University Paris-Sud (Orsay). My advisor was Pascal Massart.

My Ph.D. thesis was awarded the 2011 Marie-Jeanne Laurent-Duhamel prize by the French Statistical Society (SFDS).

Final version of the manuscript: [pdf] Note that it is written in English, except for the first chapter (Chapter 2 is a shorter introduction, in English).
Extended table of contents of the manuscript [pdf]
Slides of the Ph.D. defense: [pdf]

The Ph.D. manuscript, the slides of the defense, and an abstract of my Ph.D. are also available at TEL.

Abstract

This thesis belongs to the fields of non-parametric statistics and statistical learning. Its goal is to provide an accurate understanding of several resampling or model selection methods, from a non-asymptotic viewpoint.
The main advance of this thesis is the accurate calibration of model selection procedures, in order to make them optimal in practice for prediction. We study V-fold cross-validation (very commonly used, but poorly understood in theory, in particular regarding the choice of V) and several penalization procedures. We propose methods for accurately calibrating some penalties, for both their general shape and their multiplicative constants. The use of resampling makes it possible to solve hard problems, in particular regression with a variable noise level. We prove non-asymptotic theoretical results on these methods, such as oracle inequalities and adaptivity properties. These results rely in particular on some concentration inequalities.
We also consider the problem of confidence regions and multiple testing when the data are high-dimensional, with general and unknown correlations. Using resampling methods, we can get rid of the curse of dimensionality and "learn" these correlations. We mainly propose two procedures, and prove for each a non-asymptotic control of its level.

Keywords

Non-parametric statistics ; statistical learning ; resampling ; non-asymptotic ; V-fold cross-validation ; bootstrap ; model selection ; penalization ; nonparametric regression ; adaptivity ; heteroscedastic ; confidence regions ; multiple testing

AMS Classification

62G09 ; 62M20 ; 62G08 ; 62J02 ; 62G15 ; 62G10

Ph.D. defense board of examiners

Mr. Patrice BERTAIL ; CREST and University Paris-X (Examiner)
Mr. Philippe BERTHET ; University Rennes-I (Examiner)
Mr. Gilles BLANCHARD ; Fraunhofer FIRST, Berlin (Examiner)
Mr. Stéphane BOUCHERON ; University Paris-VII (President)
Mr. Olivier CATONI ; CNRS and University Paris-VI (Examiner)
Mr. Pascal MASSART ; University Paris-Sud XI (Advisor)

Ph.D. reviewers

Mr. Peter L. BARTLETT ; University of California, Berkeley
Mr. Yuhong YANG ; University of Minnesota