Deep Convnets from First Principles: Generative Models, Dynamic Programming, and EM

A grand challenge in machine learning is the development of computational algorithms that match or outperform humans in perceptual inference tasks such as visual object and speech recognition. The key factor complicating such tasks is the presence of numerous nuisance variables, for instance, the unknown object position, orientation, and scale in object recognition or the unknown voice pronunciation, pitch, and speed in speech recognition. Recently, a new breed of deep learning algorithms has emerged for high-nuisance inference tasks; these algorithms are constructed from many layers of alternating linear and nonlinear processing units and are trained using large-scale optimization and massive amounts of training data. The recent success of deep learning systems is impressive: they now routinely yield pattern recognition systems with near- or super-human capabilities. But a fundamental question remains: why do they work? Intuitions abound, but a coherent framework for understanding, analyzing, and synthesizing real-world deep learning architectures has remained elusive. We answer this question for convnets by developing a new probabilistic framework built on a generative model that explicitly captures variation due to nuisance variables. The graphical structure of the model enables it to be learned from data using classical expectation-maximization (EM) techniques. Furthermore, by relaxing the generative model to a discriminative one, we can recover two of the current leading deep learning systems, deep convolutional neural networks (DCNs) and random decision forests (RDFs), thus unifying hierarchical generative models with deep neural nets. Our work provides insights into the successes and shortcomings of current deep architectures, as well as a principled route to their improvement. For instance, convnets admit a dynamic programming interpretation for solving a particular inference task.
We also define a new class of generative convnets that can learn, semi-supervised or fully unsupervised, from largely unlabeled data, and that can also perform top-down inference. Using this model, we report state-of-the-art performance for semi-supervised learning on the MNIST and SVHN benchmarks.
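To make the dynamic programming interpretation concrete, here is a minimal toy sketch (my own illustration, not code from the talk): in the generative view, an input contains a template shifted by an unknown translation, a nuisance variable. Max-marginalizing over that latent translation is a one-step dynamic program, and it reproduces the familiar convolution, ReLU, max-pool pipeline. All names and the 1-D setup below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

template = np.array([1.0, -1.0, 1.0])   # a learned filter / class template
signal = rng.normal(size=16)            # 1-D stand-in for an image
signal[5:8] += 3 * template             # embed the template at translation t = 5

# Convolution: template-match score at every candidate translation t.
scores = np.array([signal[t:t + 3] @ template
                   for t in range(len(signal) - 2)])

# ReLU: max over the binary on/off nuisance (object present vs. absent,
# with the absent state scoring zero).
activations = np.maximum(scores, 0.0)

# Max-pooling: max over the translation nuisance within each pooling
# window -- the dynamic-programming max-marginalization step.
pool = 4
pooled = np.array([activations[i:i + pool].max()
                   for i in range(0, len(activations) - pool + 1, pool)])

# The argmax recovers the inferred value of the nuisance variable.
best_t = int(np.argmax(scores))
print(best_t, pooled)
```

The point of the sketch is that each feedforward stage can be read as max-product inference over one latent nuisance variable, which is why a single forward pass suffices for this particular inference task.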

Readings for this lecture

Ankit Patel

Ankit Patel is currently an Assistant Professor at the Baylor College of Medicine in the Dept. of Neuroscience, and at Rice University in the Dept. of Electrical and Computer Engineering. Ankit is broadly interested in the intersection between machine learning and computational neuroscience, two areas essential for understanding and building truly intelligent systems, with a focus on the low-level mechanisms by which learned representations work. He works with neuroscientists to build a bridge between artificial and real neuronal networks, using theories and experiments about artificial nets to help understand and make testable predictions about real brain circuits.

Ankit returned to academia after spending six years in industry, building real-time inference systems trained on large-scale data for ballistic missile defense (MIT Lincoln Laboratory) and for high-frequency trading. He received his graduate and undergraduate degrees in Computer Science and Applied Mathematics from Harvard University.