One of the main challenges in developing a robust and predictive theory of deep learning is that we do not have a comprehensive mathematical formalism with which to describe the problem. In this talk, I will discuss aspects of random matrix theory and free probability theory and argue that these tools should be considered as important elements of such a formalism. I will present three case studies showcasing the power of these tools, demonstrating that they can provide robust characterizations of the loss landscape, accurately predict model performance, and significantly reduce training time.
Jeffrey Pennington is a Research Scientist at Google Brain, New York City. Before that, he was a postdoctoral fellow at Stanford University, where he was a member of the Natural Language Processing (NLP) group in the Stanford Artificial Intelligence Laboratory. He received his Ph.D. in theoretical particle physics from Stanford University while working at the SLAC National Accelerator Laboratory.
Jeffrey’s research interests are multidisciplinary, ranging from the development of calculational techniques in perturbative quantum field theory to the vector representation of words and phrases in NLP to the study of trainability and expressivity in deep learning. Recently, his work has focused on building a set of theoretical tools with which to study deep neural networks. Leveraging techniques from random matrix theory and free probability, Jeffrey has investigated the geometry of neural network loss surfaces and the learning dynamics of very deep neural networks. He has also developed a new framework for applying random matrix theory to problems with nonlinear dependencies, such as deep learning.