Relational Envelope-based Planning

Unknown author (2007-12-31)

This thesis proposes a synthesis of logic and probability for solving stochastic sequential decision-making problems. We address two main questions: How can we take advantage of logical structure to speed up planning in a principled way? And how can probability inform the production of a more robust, yet still compact, policy? As inspiration, consider a mobile robot acting in the world: it faces a wide variety of sensory data and uncertainty in its action outcomes. Or consider a logistics planning system: it must deliver a large number of objects to the right place at the right time. Many interesting sequential decision-making domains involve large state spaces, large stochastic action sets, and time pressure to act. In this work, we show how structured representations of the environment's dynamics can constrain and speed up the planning process. We start with a problem domain described in a probabilistic logical description language. Our technique first identifies the most parsimonious representation that permits solution of the described problem. Next, we take advantage of the structured problem description to dynamically partition the action space into a set of equivalence classes with respect to this minimal representation. The partitioned action space contains fewer distinct actions, which can yield significant gains in planning efficiency. We then develop an anytime technique to elaborate on this initial plan. Our approach uses the envelope MDP framework, which creates a Markov decision process out of a subset of the possible state space. This strategy lets an agent begin acting quickly within a restricted part of the full state space, as informed by the original plan, and judiciously expand its envelope as resources permit. Finally, we show how the representation space itself can be elaborated within the anytime framework.
This approach balances the need to respond to time pressure with the need to produce the most robust policies possible. We present experimental results in some synthetic planning domains and in a simulated military logistics domain.
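
The envelope idea summarized above can be illustrated with a small sketch. It is not taken from the thesis: the toy domain, the function names, the `OUT` placeholder state, and the choice to give `OUT` a fixed value are all assumptions made for illustration. The sketch restricts a tabular MDP to a subset of states, redirects any probability mass that leaves the subset to a single absorbing state, solves the restricted problem, and then repeats after the envelope is expanded by one state.

```python
OUT = "OUT"  # hypothetical absorbing state standing in for everything outside the envelope


def build_envelope_mdp(transitions, envelope):
    """Restrict `transitions` to `envelope`; mass leaving the envelope goes to OUT.

    transitions: {state: {action: {next_state: probability}}}
    """
    restricted = {}
    for s in envelope:
        restricted[s] = {}
        for a, outcomes in transitions.get(s, {}).items():
            dist = {}
            for s2, p in outcomes.items():
                target = s2 if s2 in envelope else OUT
                dist[target] = dist.get(target, 0.0) + p
            restricted[s][a] = dist
    return restricted


def value_iteration(mdp, reward, gamma=0.9, out_value=0.0, sweeps=200):
    """In-place value iteration; OUT keeps a fixed value (an assumption of this sketch)."""
    values = {s: 0.0 for s in mdp}
    values[OUT] = out_value
    for _ in range(sweeps):
        for s, actions in mdp.items():
            if actions:
                best = max(
                    sum(p * values[s2] for s2, p in dist.items())
                    for dist in actions.values()
                )
                values[s] = reward.get(s, 0.0) + gamma * best
            else:
                # States with no actions are treated as terminal.
                values[s] = reward.get(s, 0.0)
    return values


# Toy domain (illustrative only): the nominal plan is s0 -> s1 -> goal,
# but each step may slip into a state the initial plan never visits.
transitions = {
    "s0": {"go": {"s1": 0.8, "slip": 0.2}},
    "s1": {"go": {"goal": 0.9, "slip": 0.1}},
    "slip": {"recover": {"s0": 1.0}},
    "goal": {},
}
reward = {"goal": 1.0}

# Initial envelope: only the states on the nominal plan.
small_mdp = build_envelope_mdp(transitions, {"s0", "s1", "goal"})
small_values = value_iteration(small_mdp, reward)

# Expanded envelope: also model the slip state and its recovery action.
large_mdp = build_envelope_mdp(transitions, {"s0", "s1", "goal", "slip"})
large_values = value_iteration(large_mdp, reward)
```

In this toy setting the agent can act on `small_values` immediately and switch to `large_values` once the larger envelope has been solved. How to choose which states to add, and what value to assign to `OUT`, are design decisions that this sketch leaves open.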