
Deep Learning without Poor Local Minima

dc.date.accessioned | 2016-05-24T20:45:08Z
dc.date.accessioned | 2018-11-26T22:27:36Z
dc.date.available | 2016-05-24T20:45:08Z
dc.date.available | 2018-11-26T22:27:36Z
dc.date.issued | 2016-05-23
dc.identifier.uri | http://hdl.handle.net/1721.1/102665
dc.identifier.uri | http://repository.aust.edu.ng/xmlui/handle/1721.1/102665
dc.description.abstract | In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. For an expected loss function of a deep nonlinear neural network, we prove the following statements under the independence assumption adopted from recent work: 1) the function is non-convex and non-concave, 2) every local minimum is a global minimum, 3) every critical point that is not a global minimum is a saddle point, and 4) the property of saddle points differs for shallow networks (with three layers) and deeper networks (with more than three layers). Moreover, we prove that the same four statements hold for deep linear neural networks with any depth, any widths and no unrealistic assumptions. As a result, we present an instance for which we can answer the following question: how difficult is it to directly train a deep model in theory? It is more difficult than training classical machine learning models (because of the non-convexity), but not too difficult (because of the nonexistence of poor local minima and the property of the saddle points). We note that even though we have advanced the theoretical foundations of deep learning, there is still a gap between theory and practice. | en_US
dc.format.extent | 26 p. | en_US
dc.rights | Creative Commons Attribution 4.0 International | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/
dc.subject | Optimization | en_US
dc.subject | Neural Network | en_US
dc.subject | Machine Learning | en_US
dc.subject | High Dimension | en_US
dc.subject | Convex | en_US
dc.subject | Non-convex | en_US
dc.subject | Local minimum | en_US
dc.subject | Global minimum | en_US
dc.subject | Saddle point | en_US
dc.subject | Critical point | en_US
dc.title | Deep Learning without Poor Local Minima | en_US
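
The abstract's first three statements can be illustrated on a toy case. The sketch below is not taken from the paper: it assumes the smallest possible linear network, two scalar weights fitting the target 1, with squared loss L(w1, w2) = (w1*w2 - 1)^2. The function names (`loss`, `grad`, `hessian_eigenvalues`) are hypothetical and chosen only for this illustration. In this example the only critical points are the curve w1*w2 = 1 (global minima with zero loss) and the origin, which is a saddle point.

```python
import math


def loss(w1, w2):
    """Squared loss of a scalar two-layer linear network fitting the target 1."""
    return (w1 * w2 - 1.0) ** 2


def grad(w1, w2):
    """Gradient of the loss with respect to (w1, w2)."""
    r = w1 * w2 - 1.0
    return (2.0 * r * w2, 2.0 * r * w1)


def hessian_eigenvalues(w1, w2):
    """Eigenvalues (smallest, largest) of the symmetric 2x2 Hessian [[a, b], [b, d]]."""
    a = 2.0 * w2 ** 2
    d = 2.0 * w1 ** 2
    b = 4.0 * w1 * w2 - 2.0
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return (mean - radius, mean + radius)


# Non-convexity: (1, 1) and (-1, -1) both have zero loss,
# yet their midpoint (0, 0) has strictly larger loss.
midpoint_loss = loss(0.0, 0.0)

# The origin is a critical point (zero gradient) whose Hessian has one
# negative and one positive eigenvalue, so it is a saddle, not a minimum.
origin_grad = grad(0.0, 0.0)
origin_eigs = hessian_eigenvalues(0.0, 0.0)

# Every point with w1 * w2 == 1 is a critical point with zero loss,
# which is the global minimum value.
minimum_loss = loss(2.0, 0.5)
minimum_grad = grad(2.0, 0.5)
```

This only checks a two-parameter instance by hand. The paper's claims concern deep linear networks of any depth and width, and deep nonlinear networks under its independence assumption, none of which this sketch attempts to verify.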


Files in this item

Files | Size | Format | View
MIT-CSAIL-TR-2016-005.pdf | 344.5Kb | application/pdf | View/Open


Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International