On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
dc.date.accessioned | 2004-10-20T20:49:46Z | |
dc.date.available | 2004-10-20T20:49:46Z | |
dc.date.issued | 1993-08-01 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/7205 | |
dc.description.abstract | Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong. | en_US |
dc.format.extent | 15 p. | en_US |
dc.format.extent | 77605 bytes | |
dc.format.extent | 356324 bytes | |
dc.language.iso | en_US | |
dc.subject | reinforcement learning | en_US |
dc.subject | stochastic approximation | en_US |
dc.subject | convergence | en_US |
dc.subject | dynamic programming | en_US |
dc.title | On the Convergence of Stochastic Iterative Dynamic Programming Algorithms | en_US |
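The abstract mentions the Q-learning algorithm of Watkins (1989) as one of the DP-based learning rules covered by the convergence theorem. As a minimal, hedged sketch (the toy MDP, variable names, and step-size schedule below are illustrative choices, not taken from the paper), the tabular Q-learning update moves each entry toward a sampled Bellman target with a decaying step size satisfying the standard stochastic-approximation (Robbins-Monro) conditions:

```python
def q_update(Q, s, a, r, s_next, alpha, gamma=0.9):
    # Move Q[s][a] toward the sampled Bellman target
    # r + gamma * max_a' Q[s_next][a'].
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

# Toy example (illustrative): a single absorbing state yielding reward 1
# per step, so the fixed point is Q* = 1 / (1 - gamma) = 10.
# Step sizes alpha_n = n^(-0.6) satisfy the Robbins-Monro conditions
# (sum alpha_n diverges, sum alpha_n^2 converges) assumed by
# stochastic-approximation convergence results.
Q = [[0.0]]
for n in range(1, 2001):
    q_update(Q, s=0, a=0, r=1.0, s_next=0, alpha=n ** -0.6)
```

With these step sizes the single entry `Q[0][0]` approaches the fixed point 10; with a constant step size it would instead oscillate in a neighborhood of it, which is why the decay conditions matter for the convergence result the paper proves.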
Files in this item
Files | Size | Format | View
---|---|---|---
AIM-1441.pdf | 356.3Kb | application/pdf | View/Open
AIM-1441.ps.Z | 77.60Kb | application/octet-stream | View/Open