This paper presents a new online adaptive dynamic fuzzy Q-reinforcement-learning algorithm. The system evaluates each decision it makes against the feedback received from the environment, assigns rewards or penalties, updates its Q-values, and adjusts the structure and parameters of the fuzzy controller online. The action output of each rule is determined from the current environment state and the Q-values obtained through fuzzy reinforcement learning, and fuzzy inference combines the rule actions into a continuous control output. An extended greedy search strategy ensures that every candidate action of every rule is explored early in learning, preventing the algorithm from becoming trapped in a local optimum. Combining eligibility traces with a meta-learning rule further improves the learning rate. A real-time implementation on an embedded platform and comparisons with related studies verify the superiority of the algorithm.
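The abstract does not give the algorithmic details, but the core mechanism it describes (per-rule Q-values, greedy exploration of each rule's candidate actions, a continuous output produced by firing-strength-weighted defuzzification, and a temporal-difference update propagated through eligibility traces) can be illustrated with a minimal sketch. Everything below, including the class name FuzzyQLearner, the parameter values, and the specific trace update, is an assumption made for illustration and is not the authors' implementation.

```python
import numpy as np

class FuzzyQLearner:
    """Minimal fuzzy Q-learning sketch (illustrative, not the paper's method).

    Each fuzzy rule holds Q-values for a shared set of candidate discrete
    actions. The continuous control output is the firing-strength-weighted
    sum of the actions chosen (epsilon-greedily) in each rule, and the
    Q-values are updated with a TD error spread over eligibility traces.
    """

    def __init__(self, n_rules, candidate_actions,
                 alpha=0.05, gamma=0.95, lam=0.9, epsilon=0.1):
        self.actions = np.asarray(candidate_actions, dtype=float)
        self.q = np.zeros((n_rules, len(candidate_actions)))   # Q per (rule, action)
        self.e = np.zeros_like(self.q)                          # eligibility traces
        self.alpha, self.gamma, self.lam, self.eps = alpha, gamma, lam, epsilon

    def act(self, firing):
        """Choose an action index per rule (epsilon-greedy) and blend the
        chosen actions into one continuous output via normalized firing strengths."""
        firing = np.asarray(firing, dtype=float)
        firing = firing / (firing.sum() + 1e-12)
        chosen = np.empty(len(firing), dtype=int)
        for i in range(len(firing)):
            if np.random.rand() < self.eps:
                chosen[i] = np.random.randint(self.q.shape[1])  # explore
            else:
                chosen[i] = int(np.argmax(self.q[i]))           # exploit
        u = float(np.dot(firing, self.actions[chosen]))          # defuzzified output
        return u, chosen, firing

    def update(self, firing, chosen, reward, next_firing):
        """TD(lambda)-style update of the rule/action Q-values, using the
        firing strengths and chosen actions returned by act()."""
        next_firing = np.asarray(next_firing, dtype=float)
        next_firing = next_firing / (next_firing.sum() + 1e-12)
        idx = np.arange(len(chosen))
        q_now = float(np.dot(firing, self.q[idx, chosen]))
        q_next = float(np.dot(next_firing, self.q.max(axis=1)))  # greedy next-state value
        td_error = reward + self.gamma * q_next - q_now
        # Decay old traces, then reinforce the (rule, action) pairs just used.
        self.e *= self.gamma * self.lam
        self.e[idx, chosen] += firing
        self.q += self.alpha * td_error * self.e
```

A typical control loop under these assumptions would compute the rule firing strengths from the measured state, call act to obtain the control signal, apply it to the plant, observe the reward, and then call update with the firing strengths of the next state.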