
Automatic shaping and decomposition of reward functions

dc.date.accessioned: 2007-02-13T19:01:57Z
dc.date.accessioned: 2018-11-24T10:25:19Z
dc.date.available: 2007-02-13T19:01:57Z
dc.date.available: 2018-11-24T10:25:19Z
dc.date.issued: 2007-02-13
dc.identifier.uri: http://hdl.handle.net/1721.1/35890
dc.identifier.uri: http://repository.aust.edu.ng/xmlui/handle/1721.1/35890
dc.description.abstract: This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a method that learns a shaped reward function given a set of state and temporal abstractions. Next, we consider decomposition of the per-timestep reward in multieffector problems, in which the overall agent can be decomposed into multiple units that are concurrently carrying out various tasks. We show by example that to find a good reward decomposition, it is often necessary to first shape the rewards appropriately. We then give a function approximation algorithm for solving both problems together. Standard reinforcement learning algorithms can be augmented with our methods, and we show experimentally that in each case, significantly faster learning results.
dc.format.extent: 8 p.
dc.title: Automatic shaping and decomposition of reward functions
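
The shaping step the abstract describes is conventionally grounded in potential-based reward shaping (Ng, Harada, and Russell, 1999), where each per-step reward is augmented with gamma * Phi(s') - Phi(s) for some potential function Phi; because this term telescopes along trajectories, the shaped problem keeps the same optimal policies while giving the learner denser feedback. The sketch below is a minimal Python illustration of that mechanism only, assuming a toy 5x5 gridworld and a hand-written Manhattan-distance potential; the report instead learns its shaping function from state and temporal abstractions, which this sketch does not attempt.

    import random

    GRID = 5                              # 5x5 gridworld, goal in the far corner
    GOAL = (GRID - 1, GRID - 1)
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1    # discount, learning rate, exploration

    def step(state, action):
        """Deterministic move clipped to the grid; reward 1 only at the goal."""
        x, y = state
        dx, dy = action
        nxt = (min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1))
        return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

    def potential(state):
        """Hypothetical potential: negative Manhattan distance to the goal."""
        return -(abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1]))

    def q_learning(episodes=300, shaped=True):
        """Tabular Q-learning, optionally with potential-based shaping."""
        Q = {}
        for _ in range(episodes):
            s, done = (0, 0), False
            while not done:
                if random.random() < EPS:
                    a = random.choice(ACTIONS)
                else:
                    a = max(ACTIONS, key=lambda act: Q.get((s, act), 0.0))
                s2, r, done = step(s, a)
                if shaped:
                    # F(s, s') = gamma * Phi(s') - Phi(s) preserves optimal policies.
                    r += GAMMA * potential(s2) - potential(s)
                best_next = 0.0 if done else max(Q.get((s2, act), 0.0)
                                                 for act in ACTIONS)
                Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (
                    r + GAMMA * best_next - Q.get((s, a), 0.0))
                s = s2
        return Q

Comparing episode lengths under q_learning(shaped=True) versus q_learning(shaped=False) illustrates the kind of learning speedup the report measures, though the report's experiments use learned rather than hand-specified shaping functions.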


Files in this item

File                           Size      Format
MIT-CSAIL-TR-2007-010.pdf      267.9Kb   application/pdf
MIT-CSAIL-TR-2007-010.ps       818.7Kb   application/postscript

