Commit 72ccde4f (parent 73a43656) authored by Rafael Fernandes Cunha: abstract
This repository contains the C++ implementation accompanying the AAAI-25 conference …
**"Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach"**
The centralized training for decentralized execution paradigm emerged as the state-of-the-art approach to $\epsilon$-optimally solving decentralized partially observable Markov decision processes. However, scalability remains a significant issue.
This paper presents a novel and more scalable alternative, namely the sequential-move centralized training for decentralized execution.
First, it allows a central planner to reason about sufficient sequential-move statistics instead of prior simultaneous-move ones.
Next, it proves that $\epsilon$-optimal value functions are piecewise linear and convex in such sufficient sequential-move statistics.
Finally, it reduces the complexity of the backup operators from double exponential to polynomial, at the expense of longer planning horizons.
Experiments on two- as well as many-agent domains from the literature against $\epsilon$-optimal simultaneous-move solvers confirm the superiority of our novel approach.
This paradigm opens the door for efficient planning and reinforcement learning methods for multi-agent systems.
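For readers less familiar with the term, the C++ sketch below illustrates what it means for a value function to be piecewise linear and convex (PWLC): its value at a statistic is the maximum over a finite set of linear functions, often called α-vectors. The statistic is assumed here to be a finite vector of weights, and the names `Statistic`, `AlphaVector`, `linear_value`, and `pwlc_value` are hypothetical, chosen for illustration only; none of this is taken from the repository's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical types for illustration; not the repository's data structures.
// A statistic is assumed to be a vector of weights over a finite support.
using Statistic = std::vector<double>;
// One linear piece of the value function, i.e. one alpha-vector.
using AlphaVector = std::vector<double>;

// Value of a single linear piece: the dot product <alpha, s>.
double linear_value(const AlphaVector& alpha, const Statistic& s) {
    assert(alpha.size() == s.size());
    double v = 0.0;
    for (std::size_t k = 0; k < s.size(); ++k) {
        v += alpha[k] * s[k];
    }
    return v;
}

// A PWLC function is the pointwise maximum of finitely many linear pieces.
// Returns -infinity when no pieces are given.
double pwlc_value(const std::vector<AlphaVector>& pieces, const Statistic& s) {
    double best = -std::numeric_limits<double>::infinity();
    for (const auto& alpha : pieces) {
        best = std::max(best, linear_value(alpha, s));
    }
    return best;
}
```

For example, with the two pieces `{1.0, 0.0}` and `{0.0, 1.0}`, the statistic `{0.25, 0.75}` has value `max(0.25, 0.75) = 0.75`. Since each piece is linear and a maximum of linear functions is convex, `pwlc_value` is convex in `s`; in the general POMDP literature, this is the structure that lets a value function be stored as a finite set of vectors.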
## Overview