  • CS294-112 Deep Reinforcement Learning, Fall semester (Berkeley): No.6 Value functions introduction, No.7 Advanced Q-learning


     

     

     

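    Understand that correlated samples cause problems for online Q-learning, and how parallel workers mitigate them: transitions gathered from several workers at once are less correlated than consecutive transitions from a single trajectory.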

    Another solution is a replay buffer, which fully utilizes the fact that Q-learning is off-policy: stored transitions can be reused even though an older policy collected them.
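    A replay buffer can be sketched in a few lines. This is a minimal illustration rather than the course's implementation; the class name `ReplayBuffer`, its methods, and the transition layout `(state, action, reward, next_state, done)` are assumptions made for this example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples.

    Hypothetical helper for illustration only.
    """

    def __init__(self, capacity, seed=None):
        # A bounded deque evicts the oldest transition once capacity is reached.
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement mixes transitions from
        # different times, which breaks the correlation between
        # consecutive samples.
        return self.rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)
```

    Because the sampled transitions need not come from the current policy, this only works for off-policy methods such as Q-learning.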

     

     

     

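    There is still a problem: Q-learning is not true gradient descent, because the target value depends on the Q-function's own parameters and no gradient flows through it.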
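    Written out, the update looks as follows. The symbols (parameters \phi, step size \alpha, discount \gamma, sample index i) are notation assumed here for illustration, since the notes themselves do not fix any.

```latex
\phi \leftarrow \phi - \alpha \, \frac{dQ_\phi}{d\phi}(s_i, a_i)
\Big( Q_\phi(s_i, a_i) - y_i \Big),
\qquad
y_i = r(s_i, a_i) + \gamma \max_{a'} Q_\phi(s'_i, a')
```

    The target y_i changes whenever \phi changes, yet it is treated as a constant when differentiating, so the step is not the gradient of any fixed objective.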

     

     

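    Split the Q-function into two parts: a target network, whose parameters are held fixed for a while, and an evolving network that is trained against the targets it produces.

    This sacrifices some learning speed in exchange for more stable convergence.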
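    The two-part idea can be sketched with Q-tables standing in for the two networks. This is an assumption-laden toy, not the lecture's code: the function names, the tabular representation, and the `sync_every` parameter are all made up for this example.

```python
import copy


def q_update_with_target(q, q_target, transition, alpha=0.1, gamma=0.99):
    """One Q-learning step whose bootstrap value comes from the frozen copy."""
    s, a, r, s_next, done = transition
    bootstrap = 0.0 if done else max(q_target[s_next])
    y = r + gamma * bootstrap         # target is computed from q_target, not q
    q[s][a] += alpha * (y - q[s][a])  # move the evolving estimate toward y
    return q


def train(transitions, n_states, n_actions, sync_every=4):
    """Run updates over a list of transitions, syncing the target periodically."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    q_target = copy.deepcopy(q)
    for step, tr in enumerate(transitions, start=1):
        q_update_with_target(q, q_target, tr)
        if step % sync_every == 0:
            # Hard update: copy the evolving table into the target table.
            q_target = copy.deepcopy(q)
    return q, q_target
```

    Between syncs the targets do not move, so each stretch of updates resembles ordinary regression; the cost is that new information reaches the targets only every `sync_every` steps.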

     

     

     

     

     

     

     

    Q-value overestimation in natural (vanilla) DQN
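    One source of this overestimation is that taking a max over noisy estimates is biased upward: the expected max is at least the max of the expected values. The small simulation below illustrates that effect only; the function names, the Gaussian noise model, and the default parameters are assumptions for this sketch.

```python
import random


def max_of_noisy_estimates(true_values, noise_std, rng):
    """Add zero-mean Gaussian noise to each value and return the max."""
    noisy = [v + rng.gauss(0.0, noise_std) for v in true_values]
    return max(noisy)


def average_max(true_values, noise_std=1.0, trials=10000, seed=0):
    """Monte Carlo estimate of E[max_a (Q(a) + noise_a)]."""
    rng = random.Random(seed)
    total = sum(
        max_of_noisy_estimates(true_values, noise_std, rng)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    # All four actions truly have value 0, so the true max is 0.
    true_q = [0.0, 0.0, 0.0, 0.0]
    print("true max:", max(true_q))
    print("average max of noisy estimates:", average_max(true_q))
```

    Even though every individual estimate is unbiased, the averaged max comes out clearly above the true max of 0, and the Q-learning target bootstraps from exactly such a max.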

     

     

     

     

     

     

     

     

    Trouble arises in the left-or-right dilemma of steering around a tree to avoid hitting it.

     

     

     

     

     

     

     

     

     

  • Original post: https://www.cnblogs.com/ecoflex/p/9094123.html