Based on the TD-error criterion, the state-action space of a Q-learning system is coarsely partitioned into two classes, positive and negative. To describe the uncertainty of this classification and to avoid the loss of learning accuracy caused by a simple hard classification, a probabilistic support vector classification machine (PSVCM) is used so that the classification of each sample carries both a qualitative interpretation and a quantitative evaluation. The input to the PSVCM is the continuous state of the system together with a discrete action, and its output is a class label accompanied by a probability value. Weighting the discrete actions that the PSVCM judges to be positive by their probability values and summing them yields the Q-learning control policy over a continuous action space. Simulation results for the boat-docking problem show that, compared with Q-learning based on a traditional support vector classification machine, the proposed method not only effectively handles Q-learning control of nonlinear systems with continuous states and continuous actions, but also achieves control performance that is insensitive to the setting of the initial action.
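The probability-weighted action synthesis described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the paper's stated method: the sign-based labeling rule in `td_label`, the 0.5 positive-class threshold, the normalization of the weights so they sum to one, and the `fallback` value used when no action is judged positive. The function names are hypothetical, and the probabilities are passed in directly rather than produced by a trained PSVCM.

```python
from typing import Sequence


def td_label(td_error: float) -> int:
    """Label a state-action sample by its TD error.

    Assumption: a non-negative TD error marks the positive class (+1),
    a negative TD error marks the negative class (-1).
    """
    return 1 if td_error >= 0.0 else -1


def continuous_action(
    actions: Sequence[float],
    pos_probs: Sequence[float],
    threshold: float = 0.5,
    fallback: float = 0.0,
) -> float:
    """Blend discrete actions into one continuous action.

    actions   -- candidate discrete actions for the current state
    pos_probs -- positive-class probability for each action, as a
                 probabilistic classifier would output for (state, action)
    threshold -- assumed cutoff above which an action counts as positive
    fallback  -- assumed action returned when no action is positive
    """
    # Keep only the actions judged to belong to the positive class.
    positive = [(a, p) for a, p in zip(actions, pos_probs) if p > threshold]
    if not positive:
        return fallback
    # Probability-weighted sum, normalized so the weights add up to 1
    # (the normalization is an assumption of this sketch).
    total = sum(p for _, p in positive)
    return sum(a * p for a, p in positive) / total


if __name__ == "__main__":
    # Three candidate control actions with illustrative probabilities.
    print(continuous_action([-10.0, 0.0, 10.0], [0.2, 0.6, 0.9]))
```

In this sketch the first action is discarded because its probability falls below the threshold, and the remaining two are blended in proportion to their probabilities, so the result lies between them and closer to the more probable one.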