  • Monte Carlo Control

    Problem of State-Value Function

    As with Policy Iteration in Model-Based Learning, Monte Carlo Control uses Generalized Policy Iteration. In Policy Iteration, we alternate between Policy Evaluation and Policy Improvement until the policy converges to the Optimal Policy.

    Each time we improve the policy, we pick the action that gives the best expected return (the immediate reward plus the value function of the next state).

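    The problem with transferring this algorithm directly to Monte Carlo is that it relies on the Transition Matrix: computing the expected value of the next state requires the transition probabilities, which are unknown in a model-free setting.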
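
    To make the dependence explicit, greedy improvement over a State-Value function can be written as below. The notation is standard MDP notation and is an assumption, not taken from the original: R(s,a) is the expected immediate reward, γ is the discount factor, and P(s'|s,a) is the transition probability.

```latex
\pi'(s) = \arg\max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s') \Big]
```

    The sum over s' cannot be evaluated without P, which a model-free agent does not have.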

    Monte Carlo Control Based on the Q Function

    The idea of Policy Iteration can instead be applied to estimating the Action-Value Function, which is very useful for Model-Free problems. Choosing an action then no longer depends on the State-Value function, because the return from a specific action is given directly by Monte Carlo estimation.

    Q function can be updated by:
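
    One standard way to write this update is the incremental-mean form below. The symbols are assumptions, not taken from the original: N(S_t, A_t) is the number of times the state-action pair has been visited, and G_t is the return sampled from time t onward.

```latex
\begin{aligned}
N(S_t, A_t) &\leftarrow N(S_t, A_t) + 1 \\
Q(S_t, A_t) &\leftarrow Q(S_t, A_t) + \frac{1}{N(S_t, A_t)} \big( G_t - Q(S_t, A_t) \big)
\end{aligned}
```

    With this form, Q(s,a) is the running average of the returns observed after taking action a in state s.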

    When we improve the policy, we simply pick the action that produces the maximum Q value.
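
    The evaluation and improvement steps can be sketched in Python as follows. This is a minimal every-visit sketch under stated assumptions, not the original author's code: the function names update_q_from_episode and greedy_action, the dictionary-based Q and N tables, and the two-step toy episode are all hypothetical.

```python
from collections import defaultdict


def update_q_from_episode(Q, N, episode, gamma=1.0):
    """Every-visit Monte Carlo update of Q from one finished episode.

    episode is a list of (state, action, reward) tuples in time order.
    Q and N are hypothetical tables keyed by (state, action).
    """
    G = 0.0
    # Walk backwards so G accumulates the discounted return from each step.
    for state, action, reward in reversed(episode):
        G = reward + gamma * G
        N[(state, action)] += 1
        # Incremental mean: move Q toward the sampled return G.
        Q[(state, action)] += (G - Q[(state, action)]) / N[(state, action)]
    return Q


def greedy_action(Q, state, actions):
    """Policy improvement: pick the action with the largest Q value."""
    return max(actions, key=lambda a: Q[(state, a)])


Q = defaultdict(float)  # unseen pairs start at 0.0
N = defaultdict(int)    # visit counts
episode = [("s0", "right", 0.0), ("s1", "right", 1.0)]  # toy data
update_q_from_episode(Q, N, episode)
best = greedy_action(Q, "s0", ["left", "right"])
```

    No transition probabilities appear anywhere: the returns come from sampled episodes, and the improvement step only compares Q values.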

    Exploration-Exploitation Dilemma and ε-Greedy Exploration

    In the Model-Based Policy Iteration algorithm, we update the State-Value function of every state within a single policy evaluation pass, so we can choose the best action from the whole action space while improving the policy. Monte Carlo Learning, however, only updates the Action-Value estimates for the actions that were actually taken in the previous episode. So there are probably some untried actions with better returns than the ones we have tried, and sometimes we need to give them a trial. Balancing this against sticking with the best known action is called the Exploration-Exploitation Dilemma.

    As an everyday analogy, it is worth trying a newly opened restaurant now and then, rather than going to the usual place every day.

    ε-Greedy Exploration is the algorithm in which the agent chooses an action at random with probability ε and stays with the currently best (greedy) action with probability 1-ε.
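
    The selection rule can be sketched in a few lines of Python. This is an illustrative sketch, not the original author's code: the function name epsilon_greedy_action, the dictionary Q keyed by (state, action), and the example values are assumptions.

```python
import random


def epsilon_greedy_action(Q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    Q is a hypothetical dict mapping (state, action) to an estimated value;
    pairs that are missing are treated as 0.0.
    """
    if rng.random() < epsilon:
        # Explore: uniform choice over all actions (may also hit the greedy one).
        return rng.choice(actions)
    # Exploit: action with the highest current Q estimate.
    return max(actions, key=lambda a: Q.get((state, a), 0.0))


actions = ["left", "right"]
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}  # toy values
always_greedy = epsilon_greedy_action(Q, "s0", actions, epsilon=0.0)
always_random = epsilon_greedy_action(Q, "s0", actions, epsilon=1.0)
```

    Because the random branch samples uniformly over all actions, the greedy action is actually chosen with probability 1-ε+ε/|A|, where |A| is the number of actions.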

  • Original article: https://www.cnblogs.com/rhyswang/p/11258273.html