Exploration in Gradient-Based Reinforcement Learning
Metadata field | Value | Language
---|---|---
dc.date.accessioned | 2004-10-04T14:37:39Z | |
dc.date.accessioned | 2018-11-24T10:11:48Z | |
dc.date.available | 2004-10-04T14:37:39Z | |
dc.date.available | 2018-11-24T10:11:48Z | |
dc.date.issued | 2001-04-03 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/6076 | |
dc.identifier.uri | http://repository.aust.edu.ng/xmlui/handle/1721.1/6076 | |
dc.description.abstract | Gradient-based policy search is an alternative to value-function-based methods for reinforcement learning in non-Markovian domains. One apparent drawback of policy search is its requirement that all actions be 'on-policy'; that is, that there be no explicit exploration. In this paper, we provide a method for using importance sampling to allow any well-behaved directed exploration policy during learning. We show both theoretically and experimentally that using this method can achieve dramatic performance improvements. | en_US |
dc.format.extent | 5594043 bytes | |
dc.format.extent | 516972 bytes | |
dc.language.iso | en_US | |
dc.title | Exploration in Gradient-Based Reinforcement Learning | en_US |
Files in this item
Files | Size | Format
---|---|---
AIM-2001-003.pdf | 516.9 KB | application/pdf
AIM-2001-003.ps | 5.594 MB | application/postscript
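
The abstract describes correcting gradient-based policy search with importance sampling so that actions can be drawn from a separate, directed exploration policy while the gradient estimate remains with respect to the learned policy. The sketch below illustrates that general idea only; it is not the memo's exact algorithm. It applies an importance-weighted REINFORCE-style update on a hypothetical two-armed bandit, where actions come from an assumed uniform exploration policy `mu` and each update is reweighted by the likelihood ratio between the learned softmax policy and `mu`. All names, parameters, and the problem setup are assumptions made for illustration.

```python
# Minimal sketch (illustrative assumptions throughout, not the memo's exact method):
# importance-sampled policy-gradient learning on a two-armed bandit.
# Actions are sampled from a fixed exploration policy mu, and each
# gradient term is reweighted by pi_theta(a) / mu(a) so the off-policy
# estimate remains unbiased for the learned policy's gradient.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])   # hypothetical expected reward of each arm
theta = np.zeros(2)                 # parameters of the learned softmax policy
mu = np.array([0.5, 0.5])           # assumed exploration policy (uniform here)
alpha = 0.1                         # learning rate

def pi(theta):
    """Softmax policy over the two arms."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

for step in range(2000):
    probs = pi(theta)
    a = rng.choice(2, p=mu)                  # act with the exploration policy
    r = rng.normal(true_means[a], 0.1)       # observe a noisy reward
    w = probs[a] / mu[a]                     # importance weight pi_theta(a) / mu(a)
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                    # gradient of log softmax at the chosen arm
    theta += alpha * w * r * grad_log_pi     # importance-weighted REINFORCE-style update

print("learned policy:", pi(theta))          # should favour the higher-reward arm
```

In practice, such reweighted estimates are usually combined with a baseline or other variance-reduction technique; the memo's contribution, per the abstract, is showing how and when this kind of off-policy reweighting yields large performance improvements for gradient-based policy search.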