Exploration in Gradient-Based Reinforcement Learning

Unknown author (2001-04-03)

Gradient-based policy search is an alternative to value-function-based methods for reinforcement learning in non-Markovian domains. One apparent drawback of policy search is its requirement that all actions be 'on-policy'; that is, that there be no explicit exploration. In this paper, we provide a method for using importance sampling to allow any well-behaved directed exploration policy during learning. We show both theoretically and experimentally that using this method can achieve dramatic performance improvements.
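The importance-sampling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the report's algorithm: it only shows, for a hypothetical one-step, two-action problem, how rewards gathered under an exploration (behavior) policy can be reweighted to estimate the expected reward of a different target policy. The function name, the probabilities, and the rewards are all assumptions chosen for illustration.

```python
import random


def importance_sampled_return(target_probs, behavior_probs, rewards, n_samples, rng):
    """Estimate the target policy's expected reward from behavior-policy samples."""
    actions = list(range(len(rewards)))
    total = 0.0
    for _ in range(n_samples):
        # Act with the exploration (behavior) policy, not the target policy.
        a = rng.choices(actions, weights=behavior_probs)[0]
        # Reweight by the likelihood ratio so the average is unbiased
        # for the target policy (requires behavior_probs[a] > 0).
        weight = target_probs[a] / behavior_probs[a]
        total += weight * rewards[a]
    return total / n_samples


# Hypothetical setup: the target policy prefers action 0, while the
# exploration policy tries both actions equally often.
target_probs = [0.9, 0.1]
behavior_probs = [0.5, 0.5]
rewards = [1.0, 0.0]  # assumed deterministic reward per action

estimate = importance_sampled_return(
    target_probs, behavior_probs, rewards, 20000, random.Random(0)
)
# Exact expected reward under the target policy, for comparison.
true_value = sum(p * r for p, r in zip(target_probs, rewards))
print(estimate, true_value)
```

The estimate stays unbiased only if the exploration policy gives nonzero probability to every action the target policy might take, which is one plausible reading of the "well-behaved" condition in the abstract.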

URI: http://hdl.handle.net/1721.1/6076
http://repository.aust.edu.ng/xmlui/handle/1721.1/6076

Files:

AIM-2001-003.ps (5.334Mb)

AIM-2001-003.pdf (504.8Kb)

Collections:

Computer Science and Artificial Intelligence Lab (CSAIL)