zoukankan      html  css  js  c++  java
  • Monte Carlo Control

    Problem of State-Value Function

    Similar as Policy Iteration in Model-Based Learning, Generalized Policy Iteration will be used in Monte Carlo Control. In Policy Iteration, we keep doing Policy Evaluation and Policy Improvement untill our policy converging to Optimal Policy.

    Every time when we improve the policy, the action that gives the best return(reward+value function of the next state) will be picked.

    The problem of this algorithm if we directly transfering to Monte Carlo is: it is based on the Transition Matrix.

    Monte Carlo Control based on Q function

    The idea of Policy Iteration can be used to Estimite Action-Value Function, and it is very useful for Model-Free problem. The process of choosing actions does not depend on State-Value function, because the return from a specific action is given by Monte Carlo estimation.

    Q function can be updated by:

    When we improve the policy, we just pick the action that produce the maximum Q value.

    Exploration-exploitation Dilemma and ε-Greedy Exploration:

    In Model-Based Policy Iteration algorithm, we update all State-Value function within a single policy evaluation process, so that we can choose the best actions from the whole action space  whiled improving policies. Nevertheless, Monte Carlo Learning only updates the Action-Value functions whose actions were taken on the previous episode. So there are probabily some actions having better returns than the actions we have tried. Sometimes we need to give them a trial. We call that problem the Exploration-Exploitation Delemma.

    It is necessary to try some new opened restaurant, rather than going to the usual place every day. 

    ε-Greedy Exploration is the algorithm that gives the agent probability=ε to choose randomly actions and 1-ε to stay on the optimal action.

  • 相关阅读:
    如何修改自定义Webpart的标题?(downmoon)
    vs2003 和vs2005下的发送SMTP邮件
    Entity Framework 4.1 之八:绕过 EF 查询映射
    Entity Framework 4.1 之七:继承
    Entity Framework 4.1 之四:复杂类型
    Entity Framework 4.1 之三 : 贪婪加载和延迟加载
    MVC2 强类型的 HTML Helper
    EF Code First 和 ASP.NET MVC3 工具更新
    Entity Framework 4.1 之六:乐观并发
    Entity Framework 4.1 之一 : 基础
  • 原文地址:https://www.cnblogs.com/rhyswang/p/11258273.html
Copyright © 2011-2022 走看看