Dynamic Programming and Policy Evaluation

Dynamic Programming divides the original problem into subproblems and then completes the whole task by recursively solving these subproblems. The key idea of DP, and of reinforcement learning generally, is the use of value functions to organize and structure the search for good policies. It assumes full knowledge of the environment: we are given the state space, the action space, the transition structure, the reward structure, and the discount factor.

We start with policy evaluation: given the MDP and an arbitrary policy π, we use the Bellman equation to recursively calculate the state-value function.
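For reference, the Bellman expectation equation for the state-value function is usually written as below. The notation is an assumption, since the post does not fix its symbols: p(s', r | s, a) denotes the transition dynamics, γ the discount factor, and π(a | s) the probability of taking action a in state s. The second line turns the same equation into an update rule, which is the form iterative policy evaluation typically uses.

```latex
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_\pi(s') \right]

v_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_k(s') \right]
```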

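The policy evaluation algorithm turns the Bellman equation into an iterative update: starting from an arbitrary value estimate, it sweeps over all states and replaces each state's value with the expected immediate reward plus the discounted value of the successor states.

The stopping criterion is that the state-value function changes by only a very small amount between sweeps.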
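The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the post's own code: the layout `P[s][a] = [(prob, next_state, reward), ...]`, the function name `policy_evaluation`, the threshold `theta`, and the two-state toy MDP are all assumptions made up for the example.

```python
def policy_evaluation(P, policy, gamma=0.9, theta=1e-8):
    """Iteratively estimate the state values of `policy`.

    P[s][a] is a list of (prob, next_state, reward) tuples (assumed layout).
    policy[s][a] is the probability of taking action a in state s.
    """
    V = {s: 0.0 for s in P}  # arbitrary initial estimate
    while True:
        delta = 0.0  # largest change seen in this sweep
        for s in P:
            v_new = sum(
                policy[s][a] * prob * (reward + gamma * V[s_next])
                for a in P[s]
                for prob, s_next, reward in P[s][a]
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update: later states see the new value
        if delta < theta:  # stop once the values barely change
            return V


# Hypothetical two-state MDP, used only to exercise the function.
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 0.0)], "go": [(1.0, "A", 0.0)]},
}
policy = {s: {"stay": 0.5, "go": 0.5} for s in P}
V = policy_evaluation(P, policy)
print(V)
```

Updating `V` in place rather than keeping a separate copy from the previous sweep is one common variant; both converge to the same values when the discount factor is below 1.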

The example is a GridWorld puzzle: the task is to reach a grey cell while collecting as much reward as possible. The policy is uniform over the four possible actions (up, down, left, right), each taken with probability 25%.

Under this policy the agent behaves like a random walk, and policy evaluation gives the value of each cell under that behavior.
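A sketch of how such values might be computed for a small grid is shown below. The post does not give the grid details, so everything specific here is an assumption: a 4 x 4 grid, terminal (grey) cells in the top-left and bottom-right corners, a reward of -1 per move, no discounting, and moves off the grid leaving the agent where it is. The function name `evaluate_random_policy` is likewise hypothetical.

```python
def evaluate_random_policy(n=4, gamma=1.0, theta=1e-8):
    """Evaluate the uniform random policy on an n x n grid.

    Assumed setup (not from the post): the top-left and bottom-right cells
    are terminal, every move has reward -1, and a move that would leave the
    grid keeps the agent in its current cell.
    """
    V = [[0.0] * n for _ in range(n)]
    terminals = {(0, 0), (n - 1, n - 1)}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    while True:
        delta = 0.0
        for r in range(n):
            for c in range(n):
                if (r, c) in terminals:
                    continue  # terminal values stay at 0
                v_new = 0.0
                for dr, dc in moves:
                    # Clamp to the grid so off-grid moves stay in place.
                    nr = min(max(r + dr, 0), n - 1)
                    nc = min(max(c + dc, 0), n - 1)
                    v_new += 0.25 * (-1.0 + gamma * V[nr][nc])
                delta = max(delta, abs(v_new - V[r][c]))
                V[r][c] = v_new
        if delta < theta:
            return V


V = evaluate_random_policy()
for row in V:
    print([round(v, 1) for v in row])
```

Under these assumed settings each value is the negative of the expected number of steps the random walk needs to reach a terminal cell, so cells far from the grey corners come out more negative than cells next to them.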


Original article: https://www.cnblogs.com/rhyswang/p/11161983.html
