Master2 Data Sciences


Theoretical guidelines for high-dimensional data analysis


These lectures are offered in the M2 Data Sciences program.

Schedule


28/11: room 0A1, Institut de Mathématiques d'Orsay, 14h-18h
05/12: room 0A1, Institut de Mathématiques d'Orsay, 14h-18h
12/12: room 0A1, Institut de Mathématiques d'Orsay, 14h-18h
19/12: room 0A1, Institut de Mathématiques d'Orsay, 14h-18h
09/01: room 0A1, Institut de Mathématiques d'Orsay, 14h-18h

Access

The Institut de Mathématiques d'Orsay is located in Building 307 on the Orsay campus, a 5-minute walk from the RER station Orsay-Ville.
Access plan

Warning!

This course is intended as a theoretical reflection on the practice of data analysis. It is not suited as training for a PhD in mathematical statistics. For that purpose, you should instead follow the course Statistiques en grande dimension (high-dimensional statistics) offered in the Master2 programs StatML, PS and MSV.

Program

Goal of the lectures: The lectures are based on recent research papers (How to read a paper?). The list of research papers below may change before the lectures start. Attendance at the lectures is mandatory and counts toward the final evaluation.


Lecture | Topic | Paper(s) | Slides | Further reading
1 | False discoveries, multiple testing, online issues and links to bandit problems | Paper 1 | Slides | Reliability of scientific findings?; Quality Preserving Databases?; Online FDR control
2 | Strengths and weaknesses of the Lasso | Paper 1 | Slides | No free computational lunch
3 | Adaptive data analysis | Paper 1 | Slides | Kaggle overfitting
4 | Curse of dimensionality, robust PCA, theoretical limits | Paper 1 (suppl. material) | Slides | Robust PCA
5 | Robust learning | Paper 1 | Slides | Learning with Median of Means


Evaluation

Reports must be sent by email by February 15 as a zip file containing:
- the report in PDF format (8 to 12 pages)
- the source code for the numerical experiments
Please read the instructions for your report carefully!