This paper investigated how to learn optimal action policies in cooperative multiagent systems when the agents' rewards are random variables, and proposed a general two-stage learning algorithm for cooperative multiagent decision processes. In the first stage, the algorithm estimates the average immediate rewards. In the second stage, it treats these learned rewards as the agents' immediate action rewards and uses them to learn the optimal action policies. It was proved that the learning algorithm finds the optimal policies in stochastic environments. The extension of the algorithm to stochastic Markov decision processes was also discussed.
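The two-stage idea can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the abstract: a single-state cooperative game in which all agents share one stochastic reward, a fixed number of samples per joint action in stage one, and greedy selection with a shared lexicographic tie-break in stage two. The function names `estimate_average_rewards` and `greedy_joint_policy`, as well as the example payoffs, are hypothetical. This is an illustration of the general technique, not the authors' algorithm.

```python
import itertools
import random
from typing import Callable, Dict, Sequence, Tuple

JointAction = Tuple[int, ...]


def estimate_average_rewards(
    action_counts: Sequence[int],
    sample_reward: Callable[[JointAction], float],
    samples_per_action: int,
) -> Dict[JointAction, float]:
    """Stage 1 (hypothetical): estimate the mean immediate reward of every joint action."""
    averages: Dict[JointAction, float] = {}
    for joint in itertools.product(*(range(n) for n in action_counts)):
        mean = 0.0
        for k in range(1, samples_per_action + 1):
            r = sample_reward(joint)
            mean += (r - mean) / k  # incremental sample mean
        averages[joint] = mean
    return averages


def greedy_joint_policy(averages: Dict[JointAction, float]) -> JointAction:
    """Stage 2 (hypothetical): treat the learned averages as deterministic rewards
    and pick the best joint action. Iterating in sorted order makes max() return the
    lexicographically smallest maximiser, so every agent breaks ties the same way."""
    return max(sorted(averages), key=lambda a: averages[a])


# Example with made-up payoffs: two agents, two actions each, Gaussian reward noise.
rng = random.Random(0)
MEANS = {(0, 0): 10.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 7.0}


def noisy_reward(joint: JointAction) -> float:
    return MEANS[joint] + rng.gauss(0.0, 5.0)


averages = estimate_average_rewards([2, 2], noisy_reward, samples_per_action=2000)
policy = greedy_joint_policy(averages)
print(policy)  # (0, 0): agent 0 plays action 0, agent 1 plays action 0
```

The shared tie-break matters in the cooperative setting. Each agent can compute the same greedy joint action from the same learned table and then play only its own component, without further communication. For the Markov decision process extension, the learned averages would presumably take the place of the random immediate reward in a value-learning update. That step is not shown here.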