diff --git a/README.md b/README.md
index a33de51a26cccb6a21a2411f88a0706864415039..eadfca8d70310e93725542d014c5840abd58c02d 100644
--- a/README.md
+++ b/README.md
@@ -10,17 +10,10 @@ This repository contains the C++ implementation accompanying the AAAI-25 confere
 The centralized training for decentralized execution paradigm emerged as the state-of-the-art approach to $\epsilon$-optimally solving decentralized partially observable Markov decision processes.
 However, scalability remains a significant issue.
 This paper presents a novel and more scalable alternative, namely the sequential-move centralized training for decentralized execution.
-
-This paradigm further pushes the applicability of \citeauthor{bellman}'s principle of optimality, raising three new properties.
-
 First, it allows a central planner to reason upon sufficient sequential-move statistics instead of prior simultaneous-move ones.
-
 Next, it proves that $\epsilon$-optimal value functions are piecewise linear and convex in such sufficient sequential-move statistics.
-
 Finally, it drops the complexity of the backup operators from double exponential to polynomial at the expense of longer planning horizons.
-Besides, it makes it easy to use single-agent methods, \eg SARSA algorithm enhanced with these findings, while still preserving convergence guarantees.
-
 Experiments on two- as well as many-agent domains from the literature against $\epsilon$-optimal simultaneous-move solvers confirm the superiority of our novel approach.
 This paradigm opens the door for efficient planning and reinforcement learning methods for multi-agent systems.