Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation


A better-shaped reward function gives the robot fast feedback, and learning can be much faster.
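To make the reward-shaping point concrete, here is a minimal sketch. It assumes a toy 1-D task where the agent must reach a goal position, and it uses potential-based shaping as one illustrative way to give denser feedback; the lecture notes do not say which shaping method was meant. `GOAL`, `potential`, and `shaped_reward` are hypothetical names.

```python
GOAL = 10.0  # assumed goal position on a 1-D line


def sparse_reward(next_pos):
    """Reward only on reaching the goal: feedback arrives very rarely."""
    return 1.0 if abs(next_pos - GOAL) < 0.5 else 0.0


def potential(pos):
    """Assumed potential: the closer to the goal, the higher the value."""
    return -abs(GOAL - pos)


def shaped_reward(pos, next_pos, gamma=0.99):
    """Sparse reward plus a potential-based shaping term.

    The shaping term gamma * potential(next_pos) - potential(pos) is
    positive for steps toward the goal and negative for steps away,
    so every transition carries a learning signal.
    """
    return sparse_reward(next_pos) + gamma * potential(next_pos) - potential(pos)


if __name__ == "__main__":
    print("toward goal:", shaped_reward(2.0, 3.0))
    print("away from goal:", shaped_reward(2.0, 1.0))
    print("sparse only:", sparse_reward(3.0))
```

With the sparse reward alone, both steps above would score zero; the shaped version tells them apart immediately.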

OpenAI Baselines

rllab



Use multiple tasks and multiple seeds to test robustness.



Don't trust the result of a single trial; it could just be a lucky run, unless the improvement is huge.
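A minimal sketch of the multi-seed comparison described above. `run_trial` is a hypothetical stand-in for a real training run, and the score means and noise level inside it are made-up assumptions used only to show the bookkeeping.

```python
import random
import statistics


def run_trial(algorithm, seed):
    """Hypothetical stand-in for one training run; returns a final score."""
    rng = random.Random(f"{algorithm}-{seed}")  # independent noise per run
    true_mean = 100.0 if algorithm == "baseline" else 105.0  # assumed values
    return true_mean + rng.gauss(0.0, 10.0)  # assumed seed-to-seed noise


def summarize(algorithm, seeds):
    """Mean and sample standard deviation of the score across seeds."""
    scores = [run_trial(algorithm, s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


seeds = list(range(10))
base_mean, base_std = summarize("baseline", seeds)
new_mean, new_std = summarize("variant", seeds)

if __name__ == "__main__":
    print(f"baseline: {base_mean:.1f} +/- {base_std:.1f}")
    print(f"variant:  {new_mean:.1f} +/- {new_std:.1f}")
    # A gap much smaller than the spread across seeds is weak evidence.
    print(f"gap: {new_mean - base_mean:.1f}")
```

Comparing the gap between the two means against the spread across seeds is a quick sanity check before believing an improvement.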

KL = 0.1 is a small update

KL = 10 is a large update
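For reference, a minimal sketch of how such a KL number can be computed for a discrete action distribution. It assumes the quantity of interest is KL(old policy || new policy) in nats at a single state; in practice this is usually averaged over a batch of states, and the lecture notes do not specify the direction. The example distributions are made up.

```python
import math


def categorical_kl(p_old, p_new):
    """KL(p_old || p_new) in nats for two discrete action distributions."""
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)


old = [0.5, 0.3, 0.2]           # action probabilities before the update
small_step = [0.45, 0.33, 0.22]  # policy barely moved
large_step = [0.01, 0.01, 0.98]  # policy changed almost completely

if __name__ == "__main__":
    print("small update KL:", categorical_kl(old, small_step))
    print("large update KL:", categorical_kl(old, large_step))
```

Logging this value after every policy update makes it easy to spot steps that changed the policy far more than intended.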

DQN is not effective enough on many problems, especially continuous control problems.

But that doesn't mean it's a bad algorithm.

So you shouldn't expect one algorithm to solve everything without tuning, at least for now.

Batch norm, dropout, or big networks? No, we use 2 layers with 64 units each.

At least for now, these techniques are not well suited to RL.
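A minimal sketch of what a "2 layers with 64 units" policy network looks like, written in plain Python so it runs without any framework. The tanh activation, the uniform initialization, and the names `make_policy` and `forward` are assumptions for illustration; the notes only give the layer count and width.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """One dense layer: small uniform weights (assumed scheme), zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Apply weights and biases, then an optional elementwise activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def make_policy(obs_dim, act_dim, hidden=64, seed=0):
    """Two hidden layers of `hidden` units plus a linear output layer."""
    rng = random.Random(seed)
    sizes = [obs_dim, hidden, hidden, act_dim]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(policy, obs):
    """Map an observation to action outputs (tanh hidden units assumed)."""
    h = obs
    for layer in policy[:-1]:
        h = dense(h, layer, math.tanh)
    return dense(h, policy[-1])


if __name__ == "__main__":
    policy = make_policy(obs_dim=4, act_dim=2)
    print(forward(policy, [0.1, -0.2, 0.3, 0.0]))
```

There is no batch norm or dropout anywhere in this sketch, which matches the advice in the notes to start small.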

If you don't care much about sample complexity, policy gradient (PG) methods are probably the way to go.

With Q-learning it is less obvious what is going on inside, while PG is just gradient descent.

DQN and its relatives work well on games with images as input, while policy-based methods work better on continuous control tasks such as locomotion.

https://en.wikipedia.org/wiki/Sample_complexity

"I use random initialization of hyperparameters..."

(audience laughs)


Original source: https://www.cnblogs.com/ecoflex/p/8977582.html
