What Are Experience Replay and Separate Target Networks

What Are Experience Replay and the Separate Target Network
- Two techniques, described in a paper I read recently, for dealing with RL networks that are unstable or even diverge.
  
  RL with a non-linear function approximator can be unstable or even diverge.
  
  In RL, it is common to use a neural network as the function approximator.
- Notes from reading Human-level control through deep reinforcement learning.
Experience Replay

Goals
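- Problem:
  
  Most ML methods assume the data are IID.
  
  In continuous online RL training, however, consecutive samples are strongly correlated, which can cause the network to get stuck in a local minimum.
- Benefits:
  
  Smooths out learning and avoids oscillations or divergence in the parameters.
  
  Randomizing the samples breaks these correlations and therefore reduces the variance of the updates.
  
  Experience replay makes the training task look more like ordinary supervised learning, which simplifies debugging and testing the algorithm.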
Method
- A buffer stores past experiences <s, a, r, s'>.
- At each update, experiences are sampled uniformly at random from the buffer and used for that update.
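The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation from the paper: the class name `ReplayBuffer`, its method names, and the `capacity` parameter are assumptions made for this note.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next) experiences (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest experience once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        """Store one experience <s, a, r, s'>."""
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Draw batch_size experiences uniformly at random, without replacement."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because a sampled batch mixes experiences from many different time steps, consecutive (and therefore correlated) transitions rarely end up in the same update.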
Separate Target Network

Goals and Strengths
- To improve the stability of the method.
- Reduces oscillations or divergence of the policy.
Method
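- Every C updates, the Q network is cloned to obtain a target network Q'; Q' is used to generate the targets (the assumed ground-truth values).
- The target network holds the older parameters; the prediction network is the one being updated.
- Generating the targets with older parameters adds a delay between the time an update to Q is made and the time that update affects the targets, making divergence or oscillations much more unlikely.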
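As a rough sketch of the mechanism, the snippet below uses a tabular Q (a list of per-state action values) in place of a neural network so that it stays self-contained. The function names `td_target` and `sync_if_due`, the `period` parameter (playing the role of C), and the default `gamma` are assumptions for illustration; terminal-state handling is left out.

```python
import copy


def td_target(q_target, r, s_next, gamma=0.99):
    """Target y = r + gamma * max_a' Q'(s', a'), read from the frozen table Q'."""
    return r + gamma * max(q_target[s_next])


def sync_if_due(step, q_online, q_target, period):
    """Every `period` updates, return a clone of Q as the new Q'; otherwise keep Q'."""
    if step % period == 0:
        return copy.deepcopy(q_online)
    return q_target
```

Between two syncs the targets depend only on Q', so an update to Q cannot shift the values it is being regressed toward until the next clone.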
Algorithm
- Use experience replay together with a separate target network.
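To show how the two pieces fit together, here is a toy sketch, assuming a tabular Q and a hypothetical five-state chain environment instead of the neural network and Atari setup of the paper. All names and hyperparameter values (`step`, `train`, `sync_every`, `lr`, `epsilon`, and so on) are illustrative assumptions, and a `done` flag is stored alongside <s, a, r, s'> so that terminal transitions use y = r.

```python
import copy
import random
from collections import deque

N_STATES, N_ACTIONS = 5, 2   # toy chain: action 0 = left, action 1 = right
GOAL = N_STATES - 1


def step(s, a):
    """Hypothetical toy environment: reward 1 on reaching the right end."""
    s_next = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    done = s_next == GOAL
    return s_next, (1.0 if done else 0.0), done


def train(n_steps=5000, capacity=500, batch_size=16, sync_every=50,
          gamma=0.9, lr=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # online Q
    q_target = copy.deepcopy(q)                        # target Q'
    buffer = deque(maxlen=capacity)                    # replay memory
    s = 0
    for t in range(1, n_steps + 1):
        # Epsilon-greedy action from the online Q, breaking ties at random.
        if rng.random() < epsilon:
            a = rng.randrange(N_ACTIONS)
        else:
            best = max(q[s])
            a = rng.choice([i for i in range(N_ACTIONS) if q[s][i] == best])
        s_next, r, done = step(s, a)
        buffer.append((s, a, r, s_next, done))
        s = 0 if done else s_next

        # Update Q on a uniformly sampled batch, with targets taken from Q'.
        if len(buffer) >= batch_size:
            for bs, ba, br, bs_next, bdone in rng.sample(list(buffer), batch_size):
                y = br if bdone else br + gamma * max(q_target[bs_next])
                q[bs][ba] += lr * (y - q[bs][ba])

        # Every sync_every updates, clone Q into Q'.
        if t % sync_every == 0:
            q_target = copy.deepcopy(q)
    return q
```

Calling `train()` returns the learned table; on this chain the greedy action in every non-goal state should come out as "right", since that is the only way to reach the reward.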
Reference
- Human-level control through deep reinforcement learning

Original post: https://www.cnblogs.com/xuwanwei/p/15723759.html
