Master2 Data Sciences

Theoretical guidelines for high-dimensional data analysis

These lectures are offered as part of the M2 Data Sciences.


08/11: amphi Monge, Ecole Polytechnique, 14h-18h
15/11: amphi Monge, Ecole Polytechnique, 14h-18h
29/11: amphi Monge, Ecole Polytechnique, 14h-18h
06/12: amphi Monge, Ecole Polytechnique, 14h-18h
13/12: amphi Monge, Ecole Polytechnique, 14h-18h


This course is not suited as training for a PhD in mathematical statistics. If that is your goal, you should instead follow the course Statistiques en grande dimension from the Master2 StatML and PS and MSV.


Goal of the lectures: The lectures will be based on recent research papers (How to read a paper?). Attendance at the lectures is mandatory and is taken into account in the final evaluation.

Lecture | Paper(s) | Slides | Further reading
1. False discoveries, multiple testing, online issues and links to bandit problems | Paper 1 | Slides | Reliability of scientific findings? Quality Preserving Databases? Online FDR control
2. Strengths and weaknesses of the Lasso | Paper 1 | Slides | No free computational lunch
3. Adaptive data analysis | Paper 1 | Slides | Kaggle overfitting
4. Curse of dimensionality, robust PCA, theoretical limits | Paper 1 (suppl. material) | Slides | Robust PCA
5. Robust learning | Paper 1 | Slides | Learning with Median Of Means


The reports must be sent by email by February 15 in a zip file including:
- the report in pdf format (8 to 12 pages)
- the source code for the numerics
Look at the instructions for your report!