This talk focuses on training deep neural networks (DNNs). Because current DNNs have an enormous number of parameters, using the Hessian matrix or a full approximation to it in a second-order method is prohibitive, both in terms of memory requirements and computational cost per iteration. Hence, to be practical, layer-wise block-diagonal approximations to these matrices are usually used. Here we describe second-order quasi-Newton (QN), natural gradient (NG), and generalized Gauss-Newton (GGN) methods of this type that are competitive with, and often outperform, first-order methods. These methods include those that use layer-wise (i) Kronecker-factored BFGS and L-BFGS QN approximations, (ii) tensor normal covariance and (iii) mini-block Fisher matrix approximations, and (iv) Sherman-Morrison-Woodbury based variants of NG and GGN methods.
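To make the Kronecker-factored, layer-wise idea concrete, the sketch below (my illustration in NumPy, not the exact algorithms from the talk) preconditions one fully-connected layer's gradient with a curvature block approximated as a Kronecker product of two small factors, so only matrices of the layer's input and output dimensions are ever inverted. The function name, arguments, and damping constant are illustrative assumptions.

```python
import numpy as np

def kfac_precondition(grad_W, acts, grad_preacts, damping=1e-3):
    """Illustrative Kronecker-factored preconditioning of one layer's
    weight gradient (a sketch of the general idea, not the talk's methods).

    grad_W:       (m, n) gradient of the loss w.r.t. the layer's weight matrix
    acts:         (batch, n) inputs a to the layer
    grad_preacts: (batch, m) gradients s of the loss w.r.t. the pre-activations
    """
    batch = acts.shape[0]
    # Kronecker factors A ~ E[a a^T] and S ~ E[s s^T], each damped for
    # invertibility. The layer's curvature block is approximated as A (x) S,
    # whose inverse applied to vec(G) is vec(S^{-1} G A^{-1}) -- an
    # O(m^3 + n^3) cost instead of O((mn)^3) for the full block.
    A = acts.T @ acts / batch + damping * np.eye(acts.shape[1])
    S = grad_preacts.T @ grad_preacts / batch + damping * np.eye(grad_preacts.shape[1])
    return np.linalg.solve(S, grad_W) @ np.linalg.inv(A)
```

The key point this illustrates is why layer-wise Kronecker factorizations make second-order updates affordable: the small factors `S` and `A` are inverted instead of the full `mn x mn` block.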
Donald Goldfarb is the Avanessians Professor in the IEOR Department at Columbia University. He is internationally recognized for the design and analysis of efficient and practical algorithms for solving classes of optimization problems, including the BFGS method, the steepest-edge simplex algorithms, and the Goldfarb-Idnani algorithm. Goldfarb received a BSChE from Cornell University, and an M.S. and a Ph.D. from Princeton University. He spent two years as a post-doc at the Courant Institute. In 1968, he co-founded the CS Department at the City College of New York, serving 14 years on its faculty. During the 1979–80 academic year, he was a Visiting Professor in the CS and ORIE Departments at Cornell University. In 1982, Goldfarb joined the IEOR Department at Columbia, serving as Chair from 1984 to 2002. He also served as Interim Dean of Columbia’s School of Engineering and Applied Science during the 1994–95 and 2012–13 academic years.
Goldfarb is a SIAM Fellow. He was awarded the INFORMS John von Neumann Theory Prize in 2017, the Khachiyan Prize in 2013, and the INFORMS Prize for Research Excellence in the Interface between OR and CS in 1995, and was listed in The World’s Most Influential Scientific Minds, 2014, as being among the 99 most cited mathematicians between 2002 and 2012. Goldfarb has served as an editor-in-chief of Mathematical Programming, an editor of the SIAM Journal on Numerical Analysis and the SIAM Journal on Optimization, and as an associate editor of Mathematics of Computation, Operations Research, and Mathematical Programming Computation.