On the mean field theory and the tangent kernel theory for neural networks

Deep neural networks trained with stochastic gradient algorithms often achieve near vanishing training error and generalize well on test data. This empirical success in both optimization and generalization is, however, quite surprising from a theoretical point of view, mainly due to the non-convexity and overparameterization of deep neural networks.

In this lecture, I will present the mean field theory and the tangent kernel theory of the training dynamics of neural networks, and discuss their benefits and shortcomings in terms of both optimization and generalization. I will then analyze the generalization error of linearized neural networks, highlighting two interesting phenomena: the staircase and the double descent. Finally, I will present challenges and open problems in the analysis of deep neural networks.
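
As a rough sketch of the two regimes mentioned above (the notation here is mine, not taken from the lecture): for a two-layer network with N neurons,
\[
f_N(x; \theta) = \frac{1}{N} \sum_{i=1}^{N} \sigma_*(x; \theta_i) \;\longrightarrow\; f(x; \rho) = \int \sigma_*(x; \theta)\, \rho(\mathrm{d}\theta),
\]
the mean field theory [1-3] describes gradient descent, as N grows large, as a Wasserstein gradient flow over the distribution \(\rho\) of the neuron parameters. The tangent kernel theory [4] instead studies a regime in which training stays close to the linearization around the initialization \(\theta_0\),
\[
f(x;\theta) \approx f(x;\theta_0) + \langle \nabla_\theta f(x;\theta_0),\, \theta - \theta_0 \rangle,
\qquad
K(x, x') = \langle \nabla_\theta f(x;\theta_0),\, \nabla_\theta f(x';\theta_0) \rangle,
\]
so that gradient descent effectively performs kernel regression with the neural tangent kernel \(K\).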

Song Mei

References

  1. Mei, Montanari, and Nguyen. A mean field view of the landscape of two-layers neural networks. Proceedings of the National Academy of Sciences 115 (2018): E7665–E7671.
  2. Rotskoff and Vanden-Eijnden. Neural networks as interacting particle systems: Asymptotic convexity of the loss landscape and universal scaling of the approximation error. arXiv:1805.00915.
  3. Chizat and Bach. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport. Advances in Neural Information Processing Systems, 2018, pp. 3036–3046.
  4. Jacot, Gabriel, and Hongler. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Advances in Neural Information Processing Systems, 2018, pp. 8571–8580.
  5. Belkin, Hsu, Ma, and Mandal. Reconciling modern machine learning practice and the bias-variance trade-off. Proceedings of the National Academy of Sciences 116 (2019): 15849–15854.
  6. Bach. Breaking the Curse of Dimensionality with Convex Neural Networks. Journal of Machine Learning Research 18 (2017): 629–681.
  7. Ghorbani, Mei, Misiakiewicz, and Montanari. Linearized two-layers neural networks in high dimension. arXiv:1904.12191.
  8. Hastie, Montanari, Rosset, and Tibshirani. Surprises in High-Dimensional Ridgeless Least Squares Interpolation. arXiv:1903.08560.
  9. Mei and Montanari. The generalization error of random features regression: Precise asymptotics and double descent curve. arXiv:1908.05355.
