Deep Learning without Poor Local Minima

dc.date.accessioned	2016-05-24T20:45:08Z
dc.date.accessioned	2018-11-26T22:27:36Z
dc.date.available	2016-05-24T20:45:08Z
dc.date.available	2018-11-26T22:27:36Z
dc.date.issued	2016-05-23
dc.identifier.uri	http://hdl.handle.net/1721.1/102665
dc.identifier.uri	http://repository.aust.edu.ng/xmlui/handle/1721.1/102665
dc.description.abstract	In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. For an expected loss function of a deep nonlinear neural network, we prove the following statements under the independence assumption adopted from recent work: 1) the function is non-convex and non-concave, 2) every local minimum is a global minimum, 3) every critical point that is not a global minimum is a saddle point, and 4) the property of saddle points differs for shallow networks (with three layers) and deeper networks (with more than three layers). Moreover, we prove that the same four statements hold for deep linear neural networks of any depth and any widths, without unrealistic assumptions. As a result, we present an instance for which we can answer the following question: how difficult is it, in theory, to directly train a deep model? It is more difficult than training classical machine learning models (because of the non-convexity), but not too difficult (because of the nonexistence of poor local minima and the property of the saddle points). We note that even though we have advanced the theoretical foundations of deep learning, there is still a gap between theory and practice.	en_US
dc.format.extent	26 p.	en_US
dc.rights	Creative Commons Attribution 4.0 International	en
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject	Optimization	en_US
dc.subject	Neural Network	en_US
dc.subject	Machine Learning	en_US
dc.subject	High Dimension	en_US
dc.subject	Convex	en_US
dc.subject	Non-convex	en_US
dc.subject	Local minimum	en_US
dc.subject	Global minimum	en_US
dc.subject	Saddle point	en_US
dc.subject	Critical point	en_US
dc.title	Deep Learning without Poor Local Minima	en_US
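Statements 1) to 3) of the abstract can be seen on the smallest possible deep linear network. The sketch below is a toy illustration under stated assumptions, not the paper's construction or proof: it assumes a hypothetical linear network with one hidden unit, y_hat = w2 * w1 * x, trained with squared loss on the single example (x, y) = (1, 1). Its critical points are the hyperbola w1 * w2 = 1 (loss 0, global minima) and the origin (loss 1), and the Hessian at the origin has a negative eigenvalue, so the origin is a saddle point rather than a poor local minimum. The helper names `loss`, `grad`, `hessian`, and `classify` are hypothetical.

```python
import numpy as np

# Hypothetical toy model (an assumption, not from the paper):
# y_hat = w2 * w1 * x with squared loss on the single example (x, y) = (1, 1).

def loss(w1: float, w2: float) -> float:
    return (w2 * w1 - 1.0) ** 2

def grad(w1: float, w2: float) -> np.ndarray:
    r = w2 * w1 - 1.0  # residual y_hat - y
    return np.array([2.0 * r * w2, 2.0 * r * w1])

def hessian(w1: float, w2: float) -> np.ndarray:
    off = 4.0 * w1 * w2 - 2.0  # mixed partial d^2 L / (dw1 dw2)
    return np.array([[2.0 * w2 ** 2, off], [off, 2.0 * w1 ** 2]])

def classify(w1: float, w2: float, tol: float = 1e-9) -> str:
    """Classify a point using the gradient and the Hessian eigenvalues."""
    if not np.allclose(grad(w1, w2), 0.0, atol=tol):
        return "not critical"
    eig = np.linalg.eigvalsh(hessian(w1, w2))
    if eig.min() < -tol:
        return "saddle"  # a direction of negative curvature exists
    if loss(w1, w2) <= tol:
        return "global minimum"  # the loss is nonnegative, so 0 is its lowest value
    return "inconclusive"  # a PSD Hessian alone does not settle degenerate cases

print(classify(2.0, 0.5), loss(2.0, 0.5))  # global minimum 0.0
print(classify(0.0, 0.0), loss(0.0, 0.0))  # saddle 1.0

# Non-convex: the midpoint of (1, 1) and (-1, -1) lies above the chord.
print(loss(0.0, 0.0) > 0.5 * (loss(1.0, 1.0) + loss(-1.0, -1.0)))  # True
# Non-concave: the midpoint of (0, 0) and (2, 2) lies below the chord.
print(loss(1.0, 1.0) < 0.5 * (loss(0.0, 0.0) + loss(2.0, 2.0)))  # True
```

The Hessian at every global minimum here is only positive semidefinite (one zero eigenvalue along the hyperbola), which is why the sketch certifies those points by their zero loss rather than by curvature alone.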

Files in this item

File	Size	Format
MIT-CSAIL-TR-2016-005.pdf	344.5 KB	application/pdf

This item appears in the following Collection(s)

Computer Science and Artificial Intelligence Lab (CSAIL)

Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International