A First Look at Actor-Critic

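What Is Actor-Critic
- I previously picked up the basic concepts of Actor-Critic from the videos by Hung-yi Lee (李宏毅) and 莫烦Python.
- Now I am reading the Actor-Critic paper to understand it further.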
Critic-Only and Actor-Only
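- The algorithms that predate this paper are either critic-only or actor-only.
- Actor-only: uses policy gradient, and estimates the gradient by simulation.
  
  Drawbacks: mainly problems with estimation.
  
  The gradient estimators may have a large variance.
  
  A new gradient is estimated independently of past estimates. In other words, the gradient estimate makes little use of past experience, so there is not much "learning" going on.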
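- Critic-only: relies on value function approximation alone. It learns an approximate solution to the Bellman equation and hopes that this yields a near-optimal policy.
  
  Advantages:
  
  It may succeed in constructing a "good" approximation of the value function.
  
  It converges faster than actor-only methods (due to variance reduction).
  
  Drawbacks:
  
  It lacks reliable guarantees on the near-optimality of the resulting policy.
  
  Convergence is guaranteed in very limited settings.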
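
To make the variance point a bit more concrete, here is a minimal sketch, not taken from the paper. It assumes a hypothetical two-armed bandit with Gaussian rewards (means 10 and 11) and a softmax policy with equal preferences, and compares a single-sample score-function gradient estimate with and without subtracting a value-like baseline. The baseline 10.5 is a hand-picked constant standing in for what a learned value estimate might provide; all names and numbers are assumptions for illustration.

```python
import math
import random
import statistics


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gradient(prefs, means, baseline, rng):
    """One single-sample estimate of d E[reward] / d prefs[0]."""
    probs = softmax(prefs)
    action = 0 if rng.random() < probs[0] else 1
    reward = rng.gauss(means[action], 1.0)
    # d log pi(action) / d prefs[0] for a softmax policy
    score = (1.0 if action == 0 else 0.0) - probs[0]
    return score * (reward - baseline)


def estimate_stats(baseline, n=20000, seed=0):
    """Mean and sample variance of n independent gradient estimates."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]      # hypothetical policy parameters
    means = [10.0, 11.0]    # hypothetical mean reward of each arm
    samples = [sample_gradient(prefs, means, baseline, rng) for _ in range(n)]
    return statistics.mean(samples), statistics.variance(samples)


mean_raw, var_raw = estimate_stats(baseline=0.0)     # no baseline
mean_base, var_base = estimate_stats(baseline=10.5)  # value-like baseline
print("no baseline:   mean=%.3f var=%.3f" % (mean_raw, var_raw))
print("with baseline: mean=%.3f var=%.3f" % (mean_base, var_base))
```

In this setup the exact gradient is 0.5 * 0.5 * (10 - 11) = -0.25. Both estimators target that value, but the one without a baseline should be far noisier, because every sample is scaled by a reward of about 10 regardless of which arm was pulled.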
Brief Introduction of Actor-Critic
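- Use the policy to interact with the environment.
  
  Critic: from the results of the interaction, learn the value estimates with TD or MC.
  
  Actor: then use these value estimates in the policy gradient to update the policy.
  
  Interact with the environment again using the updated policy.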
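
The loop above can be sketched as a tiny tabular example. This is only an illustration under my own assumptions, not the algorithm from the paper: the corridor environment, the `step` function, the learning rates, and the use of a state-value critic with a one-step TD error are all hypothetical choices made to keep the code short.

```python
import math
import random

N_STATES = 4        # states 0..3; state 3 is the terminal goal (assumed toy task)
GAMMA = 0.9         # discount factor (assumed)
ALPHA_CRITIC = 0.1  # critic learning rate (assumed)
ALPHA_ACTOR = 0.5   # actor learning rate (assumed)


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action):
    """Hypothetical corridor: action 0 moves left, action 1 moves right."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=2000, max_steps=100, seed=0):
    rng = random.Random(seed)
    values = [0.0] * N_STATES                      # critic: V(s)
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: softmax preferences
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # 1. Interact with the environment using the current policy.
            probs = softmax(prefs[state])
            action = 0 if rng.random() < probs[0] else 1
            next_state, reward, done = step(state, action)
            # 2. Critic: one-step TD update of V(state).
            target = reward + (0.0 if done else GAMMA * values[next_state])
            td_error = target - values[state]
            values[state] += ALPHA_CRITIC * td_error
            # 3. Actor: policy-gradient step weighted by the TD error.
            for a in range(2):
                grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
                prefs[state][a] += ALPHA_ACTOR * td_error * grad_log_pi
            if done:
                break
            # 4. Continue interacting with the updated policy.
            state = next_state
    prob_right = [softmax(p)[1] for p in prefs[:-1]]
    return values, prob_right


values, prob_right = train()
print("V(s):", [round(v, 2) for v in values])
print("P(right | s):", [round(p, 2) for p in prob_right])
```

Since only reaching the goal is rewarded, the probability of moving right should grow in every non-terminal state as training proceeds. Swapping the TD target for a full Monte Carlo return would give the MC variant mentioned above.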
Code
- See the code from 莫烦Python.
Reference
- Actor-Critic Algorithms https://proceedings.neurips.cc/paper/1999/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html
- Hung-yi Lee (李宏毅), videos on Actor-Critic

Original article: https://www.cnblogs.com/xuwanwei/p/15720895.html
