Deep RL Bootcamp Lecture 8 Derivative Free Methods

In derivative-free optimization (DFO), you don't try to exploit any problem structure.
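To make the black-box idea concrete, here is a minimal sketch of hill-climbing random search. It only ever calls the objective and compares scores, so it uses no gradients and no knowledge of how the score is produced. The names `random_search` and `toy_return` are hypothetical, and `toy_return` is an assumed stand-in for an episode return, not anything from the lecture.

```python
import random

def toy_return(theta):
    # Hypothetical stand-in for an episode return; best value is 0 at (1, -2).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)

def random_search(objective, theta, iterations=200, sigma=0.1, seed=0):
    """Perturb the parameters and keep a candidate only if it scores higher."""
    rng = random.Random(seed)
    best_score = objective(theta)
    for _ in range(iterations):
        # Gaussian perturbation of every parameter; the objective is a black box.
        candidate = [t + rng.gauss(0.0, sigma) for t in theta]
        score = objective(candidate)
        if score > best_score:
            theta, best_score = candidate, score
    return theta, best_score

start = [0.0, 0.0]
theta, score = random_search(toy_return, start)
```

The returned score can never be lower than the starting score, because a candidate is accepted only when it improves on the best value seen so far.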

Low-dimensional policy

30 degrees of freedom

120 parameters to tune

Keep the positive results, in a smooth way.
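The notes do not say what "in a smooth way" means. One possible reading, and it is only an assumption, is that samples are weighted by how well they did instead of being kept or dropped by a hard cutoff. The sketch below averages sampled parameters with exponential weights on their returns. `soft_update`, `toy_return`, and the `temperature` parameter are hypothetical names chosen for this illustration.

```python
import math
import random

def toy_return(theta):
    # Hypothetical stand-in for an episode return; best value is 0 at (1, -2).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)

def soft_update(mean, objective, rng, population=50, sigma=0.3, temperature=1.0):
    """Sample around `mean`, then average the samples with exp(return) weights."""
    samples = [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(population)]
    returns = [objective(s) for s in samples]
    best = max(returns)  # subtract the max for numerical stability
    weights = [math.exp((r - best) / temperature) for r in returns]
    total = sum(weights)
    # Better samples pull the new mean harder; no sample is discarded outright.
    return [
        sum(w * s[i] for w, s in zip(weights, samples)) / total
        for i in range(len(mean))
    ]

rng = random.Random(0)
mean = [0.0, 0.0]
for _ in range(40):
    mean = soft_update(mean, toy_return, rng)
```

A lower `temperature` makes the weighting behave more like a hard "keep only the best" rule, while a higher one moves it toward a plain average of all samples.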

How do evolutionary methods work well in high-dimensional settings?

If you normalize the data well, evolutionary methods can work well on MuJoCo, even with random search.
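The notes do not spell out how the data is normalized. A common choice, assumed here, is to standardize each observation dimension with running statistics before it reaches the policy. The sketch below uses Welford's online algorithm; the class name `RunningNormalizer` and its methods are hypothetical.

```python
import math

class RunningNormalizer:
    """Tracks a per-dimension running mean and variance (Welford's algorithm)."""

    def __init__(self, dim, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, obs):
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        if self.count < 2:
            # Too few samples for a variance estimate: only center the data.
            return [x - m for x, m in zip(obs, self.mean)]
        out = []
        for x, m, m2 in zip(obs, self.mean, self.m2):
            std = math.sqrt(m2 / (self.count - 1))
            out.append((x - m) / (std + self.eps))
        return out

norm = RunningNormalizer(dim=2)
for obs in ([1.0, 100.0], [2.0, 200.0], [3.0, 300.0]):
    norm.update(obs)
scaled = norm.normalize([3.0, 300.0])
```

Both dimensions end up on the same scale even though the raw values differ by a factor of 100, so a single perturbation size in random search affects them comparably.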

These methods can always get stuck in local minima.

The humanoid has 200k parameters to tune, and it is learned by an evolutionary method.

The four videos actually show four different local minima; once the method gets stuck in one, it can never get out of it.
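A toy illustration of the same effect, unrelated to the humanoid experiment itself: a greedy derivative-free search started from four different points settles into four different valleys and stays there. The 1-D `cost` function and the `local_descent` helper are hypothetical and chosen only to make the behavior easy to see.

```python
import math

def cost(x):
    # Hypothetical 1-D cost with several valleys of different depth.
    return 0.01 * x * x - math.cos(x)

def local_descent(x, step=0.01, max_iters=10_000):
    """Greedy derivative-free descent: move to a neighbor only if it lowers the cost."""
    for _ in range(max_iters):
        best = min((x - step, x + step), key=cost)
        if cost(best) >= cost(x):
            break  # no neighbor improves: stuck in a local minimum
        x = best
    return x

starts = [-7.0, -1.0, 5.0, 12.0]
ends = [local_descent(x0) for x0 in starts]
```

Each run stops in the valley nearest its starting point, and only the run started at -1.0 reaches the deepest one. Escaping would require accepting a worse cost at some step, which this greedy rule never does.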

Evolutionary methods are roughly 10 times worse than action-space policy gradients.

Evolutionary methods are hard to tune because, until recently, people had not gotten them to work with deep nets.

Original post: https://www.cnblogs.com/ecoflex/p/8979721.html
