Faster Algorithms for Deep Learning?
The last 10 years have seen a revolution in stochastic gradient methods, with variance-reduced methods like SAG/SVRG provably achieving faster convergence rates than all previous methods. These methods give dramatic speedups in a variety of applications, but they have had virtually no impact on the practice of training deep models. We hypothesize that this is due to the over-parameterized nature of modern deep learning models: the models are so powerful that they could fit every training example with zero error (at least in theory). Such over-parameterization nullifies the benefits of variance reduction because, in some sense, it leads to "easier" optimization problems. In this work, we present algorithms specifically designed for over-parameterized models. This leads to methods that provably achieve Nesterov acceleration, methods that automatically tune the step size as they learn, and methods that achieve superlinear convergence using second-order information.
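To make the variance-reduction idea concrete, here is a minimal sketch of an SVRG-style update on a least-squares problem. The problem setup, step size, and epoch count are illustrative assumptions, not part of the talk; the targets are noise-free so the model can interpolate the data, mimicking the over-parameterized regime the abstract describes.

```python
import numpy as np

# Illustrative least-squares problem with noise-free (interpolating) targets.
rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
b = A @ w_true  # zero-noise labels: the minimum loss is exactly zero

def grad_i(w, i):
    # Gradient of the single-example loss 0.5 * (a_i^T w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    # Gradient of the average loss over all n examples
    return A.T @ (A @ w - b) / n

w = np.zeros(d)
step = 0.01
for epoch in range(30):
    w_snap = w.copy()
    g_snap = full_grad(w_snap)  # full gradient at the snapshot point
    for _ in range(n):
        i = rng.integers(n)
        # SVRG estimate: stochastic gradient corrected by the snapshot,
        # which keeps the estimate unbiased while shrinking its variance
        g = grad_i(w, i) - grad_i(w_snap, i) + g_snap
        w -= step * g

print(np.linalg.norm(A @ w - b))  # residual shrinks toward zero
```

Note that in this interpolating setting even plain SGD with a constant step size converges linearly, which is one way to see why variance reduction brings little extra benefit for over-parameterized models.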
About Prof. Mark Schmidt
Mark Schmidt is an associate professor in the Department of Computer Science at the University of British Columbia. His research focuses on machine learning and numerical optimization. He is a Canada Research Chair, an Alfred P. Sloan Fellow, and a CIFAR Canada AI Chair with the Alberta Machine Intelligence Institute (Amii), and he was awarded the most recent SIAM/MOS Lagrange Prize in Continuous Optimization together with Nicolas Le Roux and Francis Bach.
Mark's research page: https://www.cs.ubc.ca/~schmidtm/
The seminar will be held online via Zoom on May 27th at 17h CET.
To attend the seminar, please send an email to [email protected]