Markov decision processes (MDPs) have been studied by mathematicians, probabilists, operations researchers, and engineers since the late 1950s. In an MDP, a stochastic, dynamic system is controlled by a 'policy' selected by a decision-maker/controll