Reading materials will be updated frequently once the course starts.

## General Introduction to Deep Learning

**Readings**

- Deep Learning Book: A Comprehensive Introduction to Deep Learning
- An Introductory Article by LeCun, Bengio, and Hinton Published in *Nature*
- History and Development of Neural Networks
- An Overview from the Statistical Perspective

**Online resources**

## Lecture 1

**Readings**

- Emergence of Simple-cell Receptive Field Properties by Learning a Sparse Code for Natural Images
- ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
- Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG)
- Going Deeper with Convolutions (GoogLeNet)
- Deep Residual Learning for Image Recognition (ResNet)
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Visualizing and Understanding Convolutional Neural Networks
- Understanding Deep Learning Requires Rethinking Generalization

**Blogs**

**Videos**

## Lecture 2

**Readings**

- A. Achille and S. Soatto, Emergence of Invariance and Disentanglement in Deep Representations, JMLR 2018, https://arxiv.org/pdf/1706.01350.pdf
- A. Achille and S. Soatto, Where is the Information in a Deep Neural Network? https://arxiv.org/pdf/1905.12213.pdf
- (optional) A. Achille et al., The Information Complexity of Learning Tasks, their Structure and their Distance https://arxiv.org/pdf/1904.03292.pdf
- A. Achille, M. Rovere and S. Soatto, Critical Learning Periods in Deep Neural Networks, ICLR 2019, https://arxiv.org/pdf/1711.08856.pdf
- (optional) A. Achille, G. Mbeng and S. Soatto, Dynamics and Reachability of Learning Tasks, https://arxiv.org/abs/1810.02440
- A. Achille et al., Task2Vec: Task Embedding for Meta-Learning, ICCV 2019, https://arxiv.org/pdf/1902.03545.pdf
- A. Golatkar et al., Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence, NeurIPS 2019, https://arxiv.org/pdf/1905.13277.pdf

## Lecture 3

**Readings**

- Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
- Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
- Learning One-hidden-layer Neural Networks with Landscape Design

## Lecture 4

**Readings**

- Deep Neural Networks as Gaussian Processes
- Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
- Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

## Lecture 5

**Readings**

- A Mean Field View of the Landscape of Two-Layers Neural Networks
- Mean-Field Theory of Two-Layers Neural Networks: Dimension-Free Bounds and Kernel Limit
- Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks
- On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
- Convex Neural Networks

## Lecture 6

**Readings**

- Neural Tangent Kernel: Convergence and Generalization in Neural Networks
- Random Features for Large-Scale Kernel Machines
- Limitations of Lazy Training of Two-layers Neural Networks

## Lecture 7

**Readings**

- Towards Deep Learning Models Resistant to Adversarial Attacks
- Robustness May Be at Odds with Accuracy
- Intriguing Properties of Neural Networks
- Explaining and Harnessing Adversarial Examples

## Lecture 8

**Readings**

- In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
- Characterizing Implicit Bias in Terms of Optimization Geometry
- The Implicit Bias of Gradient Descent on Separable Data

## Lecture 9

**Readings**

- Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks
- Qualitatively Characterizing Neural Network Optimization Problems

## Lecture 10

**Readings**

- The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size
- Identifying and Attacking the Saddle Point Problem in High-dimensional Non-convex Optimization
- Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

## Extra Readings (To Be Discussed)

- Mastering the Game of Go with Deep Neural Networks and Tree Search by Silver et al.
- Auto-Encoding Variational Bayes by Kingma and Welling
- Generative Adversarial Networks by Goodfellow et al.
- Understanding Deep Learning Requires Rethinking Generalization by Zhang et al.
- Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? by Giryes et al.
- Robust Large Margin Deep Neural Networks by Sokolic et al.
- Tradeoffs between Convergence Speed and Reconstruction Accuracy in Inverse Problems by Giryes et al.
- Understanding Trainable Sparse Coding via Matrix Factorization by Moreau and Bruna
- Why are Deep Nets Reversible: A Simple Theory, With Implications for Training by Arora et al.
- Stable Recovery of the Factors From a Deep Matrix Product and Application to Convolutional Network by Malgouyres and Landsberg
- Optimal Approximation with Sparsely Connected Deep Neural Networks by Bölcskei et al.
- Convolutional Rectifier Networks as Generalized Tensor Decompositions by Cohen and Shashua
- Emergence of Invariance and Disentanglement in Deep Representations by Achille and Soatto
- Deep Learning and the Information Bottleneck Principle by Tishby and Zaslavsky