Q-learning is one of the most popular model-free reinforcement learning algorithms. By suitably extending Q-learning, this paper proposes a multi-agent cooperative reinforcement learning algorithm with shared experience tuples, suited to multi-agent cooperative teams. The algorithm adopts a new knowledge representation for state-action pairs that reduces the state-action space, and it improves learning efficiency through similarity transformations and the sharing of experience tuples among agents. Finally, the algorithm is applied to the hunter-prey pursuit domain. Experimental results show that the algorithm speeds up the process of multiple hunters cooperating to capture prey, facilitates the successful execution of cooperative tasks, and improves the cooperation efficiency of the multi-agent team, demonstrating its effectiveness.
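The two mechanisms named above can be illustrated with a minimal Python sketch. Note that this is a hypothetical toy construction, not the paper's exact formulation: the `canonical` folding, the grid actions, and the learning-rate/discount values are all assumptions standing in for the paper's similarity transformation and shared experience tuples; only the one-step Q-learning update itself is standard.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9     # learning rate and discount (assumed values)
ACTIONS = range(4)          # up, down, left, right on a toy grid

def canonical(state):
    """Fold symmetric relative positions (dx, dy) onto one representative.

    Reflecting both axes to non-negative offsets lets mirror-image states
    share a single Q entry -- a stand-in for the paper's similarity
    transformation that reduces the state-action space."""
    dx, dy = state
    return (abs(dx), abs(dy))

def q_update(q, s, a, r, s_next):
    """Standard one-step Q-learning update, applied on canonical states."""
    s, s_next = canonical(s), canonical(s_next)
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])

def replay_shared(q_tables, shared_tuples):
    """Every hunter replays the pooled (s, a, r, s') tuples of the team,
    so experience gathered by one agent updates all Q-tables."""
    for q in q_tables:
        for s, a, r, s_next in shared_tuples:
            q_update(q, s, a, r, s_next)

# Two hunters; a tuple observed by one is learned from by both.
hunters = [defaultdict(float), defaultdict(float)]
shared = [((2, -1), 0, 1.0, (1, -1))]   # prey offset shrinks, reward 1.0
replay_shared(hunters, shared)
```

After one pass, both hunters hold the same updated entry for the canonical state `(2, 1)`, even though only one of them "experienced" the tuple, which is the intended effect of sharing.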