CS294-112 Deep Reinforcement Learning (UC Berkeley, 2017) No. 4: Learning policies by imitating optimal controllers


There are some problems: the model may not match reality, and gradients can explode.

So the dynamics can be quite messy, and backpropagating through them can be quite problematic.

For example, there can be sudden changes in velocity, and the system can be stochastic, so gradient descent can be tough.
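A minimal sketch of why gradients can explode when backpropagating through dynamics over many time steps. It assumes a toy linear system x_{t+1} = A x_t (the matrices and the helper `jacobian_norms` are made up for illustration, not taken from the lecture), where the gradient of x_t with respect to x_0 is simply A^t.

```python
import numpy as np

def jacobian_norms(A, horizon):
    """Spectral norm of d x_t / d x_0 = A^t for t = 1..horizon."""
    J = np.eye(A.shape[0])
    norms = []
    for _ in range(horizon):
        J = A @ J  # chain rule: one more step through the dynamics
        norms.append(np.linalg.norm(J, 2))
    return norms

# Hypothetical dynamics: one has an eigenvalue above 1, the other does not.
A_unstable = np.array([[1.2, 0.0], [0.0, 0.9]])
A_stable = np.array([[0.8, 0.0], [0.0, 0.9]])

norms_unstable = jacobian_norms(A_unstable, 50)
norms_stable = jacobian_norms(A_stable, 50)

print("unstable, t=50:", norms_unstable[-1])  # grows roughly like 1.2**t
print("stable,   t=50:", norms_stable[-1])    # shrinks roughly like 0.9**t
```

With an eigenvalue above 1 the gradient norm grows geometrically with the horizon, and with all eigenvalues below 1 it vanishes instead. Real dynamics are nonlinear, so this only shows the mechanism, not the behaviour of any particular system.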

Can we apply this trajectory optimization method to optimize a policy?



GPS: guided policy search

In this case, the observation o_t comes from the camera and the joint velocities.
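A rough sketch of the supervised "imitate the controller" step only, not the full guided policy search algorithm. The assumptions are all hypothetical: a teacher controller u = -K x that sees the full state x, a linear observation map C standing in for camera and joint-velocity features, and a linear policy fit by least squares so that it predicts the teacher's action from o_t alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical teacher: a linear state-feedback controller u = -K x.
K = np.array([[1.0, 0.5]])
# Hypothetical observation map: the policy sees o = C x + noise, not x.
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Collect (observation, teacher action) pairs from sampled states.
states = rng.normal(size=(500, 2))
actions = -states @ K.T                                   # shape (500, 1)
observations = states @ C.T + 0.01 * rng.normal(size=(500, 3))

# Supervised learning: least-squares fit of a linear policy u = o W.
W, *_ = np.linalg.lstsq(observations, actions, rcond=None)

def policy(o):
    """Learned policy: maps observations to actions."""
    return o @ W

# Compare teacher and learned policy on a new state.
test_state = np.array([[0.3, -0.2]])
teacher_action = -test_state @ K.T
student_action = policy(test_state @ C.T)
print("teacher:", teacher_action, "student:", student_action)
```

The point of the split is that the teacher can use the full state at training time, while the learned policy only needs o_t at test time. Guided policy search additionally goes back and adapts the trajectory optimizer to the policy, which this sketch leaves out.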







https://katefvision.github.io/katefSlides/imitate_controlers_katef.pdf


Original post: https://www.cnblogs.com/ecoflex/p/9078801.html
