Reading materials will be updated frequently as the course progresses.

General Introduction to Deep Learning

Readings

  1. Deep Learning Book: A Comprehensive Introduction to Deep Learning
  2. An Introductory Article by LeCun, Bengio, and Hinton Published in *Nature*
  3. History and Development of Neural Networks
  4. An Overview from the Statistical Perspective

Online resources

  1. Online Tutorials
  2. Videos of Turing Lectures by Geoffrey Hinton and Yann LeCun

Lecture 1

Readings

  1. Emergence of Simple-cell Receptive Field Properties
  2. ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
  3. Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG)
  4. Going Deeper with Convolutions (GoogLeNet)
  5. Deep Residual Learning for Image Recognition (ResNet)
  6. Dropout: A Simple Way to Prevent Neural Networks from Overfitting
  7. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
  8. Visualizing and Understanding Convolutional Networks
  9. Understanding Deep Learning Requires Rethinking Generalization

Blogs

  1. An Intuitive Guide to Deep Network Architectures
  2. Neural Network Architectures

Videos

  1. Deep Visualization Toolbox

Lecture 2

Readings

  1. A. Achille and S. Soatto, Emergence of Invariance and Disentanglement in Deep Representations, JMLR 2018, https://arxiv.org/pdf/1706.01350.pdf
  2. A. Achille and S. Soatto, Where is the Information in a Deep Neural Network?, https://arxiv.org/pdf/1905.12213.pdf
  3. (optional) A. Achille et al., The Information Complexity of Learning Tasks, their Structure and their Distance, https://arxiv.org/pdf/1904.03292.pdf
  4. A. Achille, M. Rovere and S. Soatto, Critical Learning Periods in Deep Neural Networks, ICLR 2019, https://arxiv.org/pdf/1711.08856.pdf
  5. (optional) A. Achille, G. Mbeng and S. Soatto, Dynamics and Reachability of Learning Tasks, https://arxiv.org/abs/1810.02440
  6. A. Achille et al., Task2Vec: Task Embedding for Meta-Learning, ICCV 2019, https://arxiv.org/pdf/1902.03545.pdf
  7. A. Golatkar et al., Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence, NeurIPS 2019, https://arxiv.org/pdf/1905.13277.pdf

Lecture 3

Readings

  1. Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
  2. Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
  3. Learning One-hidden-layer Neural Networks with Landscape Design

Lecture 4

Readings

  1. Deep Neural Networks as Gaussian Processes
  2. Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
  3. Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

Lecture 5

Readings

  1. A Mean Field View of the Landscape of Two-Layers Neural Networks
  2. Mean-Field Theory of Two-Layers Neural Networks: Dimension-Free Bounds and Kernel Limit
  3. Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks
  4. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
  5. Convex Neural Networks

Lecture 6

Readings

  1. Neural Tangent Kernel: Convergence and Generalization in Neural Networks
  2. Random Features for Large-Scale Kernel Machines
  3. Limitations of Lazy Training of Two-layers Neural Networks

Lecture 7

Readings

  1. Towards Deep Learning Models Resistant to Adversarial Attacks
  2. Robustness May Be at Odds with Accuracy
  3. Intriguing Properties of Neural Networks
  4. Explaining and Harnessing Adversarial Examples

Lecture 8

Readings

  1. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
  2. Characterizing Implicit Bias in Terms of Optimization Geometry
  3. The Implicit Bias of Gradient Descent on Separable Data

Lecture 9

Readings

  1. Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks
  2. Qualitatively Characterizing Neural Network Optimization Problems

Lecture 10

Readings

  1. The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size
  2. Identifying and Attacking the Saddle Point Problem in High-dimensional Non-convex Optimization
  3. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

Extra readings to be discussed
