  • CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley): No. 1 Introduction, No. 2 Supervised Learning and Imitation

    I got this wrong earlier: I should have been watching the Fall 2017 course, but ended up watching the spring one.

     

     

     

     

     

     

     

     

     

     

     

           

     

      A neural network controls a virtual robot by imitating human motion.

      

     

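    Domain shift causes supervised learning to fail in imitation learning: small mistakes take the policy into states that the training data never covered, and the errors compound from there.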

     

    The human expert labels the observations with the correct action, e.g. "turn left!!!" (step 3)
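
    The "step 3" above presumably refers to the labelling step of a DAgger-style data aggregation loop (the slide itself is missing from these notes). Below is a minimal sketch on a toy 1-D steering problem. `expert_action`, the linear policy, and the dynamics are all hypothetical stand-ins, not the setup used in the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_action(x):
    # Hypothetical expert: steer back toward x = 0, with a bounded command.
    return -np.clip(x, -1.0, 1.0)

def fit_policy(states, actions):
    # Supervised learning step: least-squares fit of a ~ w * x + b.
    features = np.stack([states, np.ones_like(states)], axis=1)
    coef, *_ = np.linalg.lstsq(features, actions, rcond=None)
    return coef

def run_policy(coef, x0, steps=20):
    # Roll out the learned policy and record the states it actually visits.
    visited, x = [], x0
    for _ in range(steps):
        visited.append(x)
        action = coef[0] * x + coef[1]
        x = x + action + 0.1 * rng.normal()  # toy dynamics (assumed)
    return np.array(visited)

def dagger(iterations=5):
    # Initial demonstrations only cover states close to the centre.
    states = rng.uniform(-0.5, 0.5, size=20)
    actions = expert_action(states)
    for _ in range(iterations):
        coef = fit_policy(states, actions)           # 1. train on the dataset
        visited = run_policy(coef, x0=3.0)           # 2. run the policy
        labels = expert_action(visited)              # 3. expert labels visited states
        states = np.concatenate([states, visited])   # 4. aggregate
        actions = np.concatenate([actions, labels])
    return fit_policy(states, actions), states, actions

policy, states, actions = dagger()
print(policy, states.shape)
```

    The point of the loop is that the training set ends up containing the states the learned policy itself visits, not only the ones the expert visited. The cost is that step 3 needs a human to label every new batch.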

     

     

     

     

     

      

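    We don't want the average of the two expected behaviors. When the actions are discrete, the model works well, because the output can put probability on each behavior separately.

      However, with continuous actions the output is a single Gaussian, which ends up centered on the average of the two behaviors.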
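
      A small numeric illustration of the averaging problem, with made-up demonstration data (half of the demonstrations steer left, half steer right at the same observation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert actions at one observation: -1 = left, +1 = right.
actions = np.concatenate([np.full(50, -1.0), np.full(50, 1.0)])
actions += 0.05 * rng.normal(size=actions.size)

# Continuous case: the maximum-likelihood mean of a single Gaussian is the
# sample mean, which lands between the two behaviors.
gaussian_mean = actions.mean()

# Discrete case: bin the actions into {left, right} and fit a categorical
# distribution, which keeps both behaviors.
labels = (actions > 0).astype(int)
probs = np.bincount(labels, minlength=2) / labels.size

print("single-Gaussian mean:", gaussian_mean)
print("P(left), P(right):", probs)
```

      The Gaussian mean is close to 0 (go straight), an action that no demonstration ever took, while the categorical fit keeps half of the probability on each side.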

     solution:

     

     

    Add a noise input to the network here.

    The drawback is that an implicit density model is harder to train.

    It is recommended to look at VAEs, GANs, and Stein variational gradient descent, which are three methods for training implicit density models.

    Upside: capable of mimicking any form of function

    Downside: much more complex to implement
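
    To make the noise-input idea concrete, here is a hand-written (not trained) stand-in for such a network. `implicit_policy` is a hypothetical function; it only shows how a deterministic map of (observation, noise) can produce a two-mode action distribution, and says nothing about how to train one.

```python
import numpy as np

rng = np.random.default_rng(0)

def implicit_policy(obs, noise):
    # Deterministic function of the observation and a noise input.
    # The sign of the noise selects the mode: left (-1) or right (+1).
    direction = np.where(noise < 0.0, -1.0, 1.0)
    return direction * (1.0 + 0.1 * obs)

noise = rng.normal(size=1000)
samples = implicit_policy(0.0, noise)

# Different noise draws give different actions for the same observation.
print(np.unique(samples))
```

    There is no explicit probability density anywhere in this code, only a sampler, which is why this kind of model is called implicit.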

     

    The second network samples its output conditioned on the value sampled from the first network.
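
    A rough sketch of that conditional sampling, assuming each network outputs a distribution over a few discrete bins for one action dimension. `first_net` and `second_net` are hypothetical lookup tables standing in for real networks.

```python
import numpy as np

rng = np.random.default_rng(0)
BINS = np.array([-1.0, 0.0, 1.0])  # discretized values for each action dimension

def first_net(obs):
    # Stand-in for p(a1 | obs): either left or right, never straight.
    return np.array([0.5, 0.0, 0.5])

def second_net(obs, a1_index):
    # Stand-in for p(a2 | obs, a1): the distribution depends on the sampled a1.
    table = np.array([
        [0.9, 0.1, 0.0],
        [1 / 3, 1 / 3, 1 / 3],
        [0.0, 0.1, 0.9],
    ])
    return table[a1_index]

def sample_action(obs):
    i1 = rng.choice(3, p=first_net(obs))        # sample the first dimension
    i2 = rng.choice(3, p=second_net(obs, i1))   # then the second, given the first
    return BINS[i1], BINS[i2]

samples = [sample_action(obs=0.0) for _ in range(200)]
print(samples[:5])
```

    Sampling one dimension at a time keeps every individual output small and discrete, while the joint distribution over both dimensions can still be multimodal.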

     

     It's time for a case study.

     

     

     

     

     

     

     

     

    This is a human with three GoPros on his head...

     

     

     

     

     

     

     

                    

     

     

     

    Robot: 300 bucks

    Game controller: 100 bucks

     

     

     

     

     

     c for cost

    r for reward (the negative of cost)
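
    Written out, using s_t and a_t for the state and action at time t (notation assumed here, since the slide is missing):

```latex
r(s_t, a_t) = -c(s_t, a_t),
\qquad
\arg\min_{\pi} \mathbb{E}_{\pi}\Big[\sum_t c(s_t, a_t)\Big]
= \arg\max_{\pi} \mathbb{E}_{\pi}\Big[\sum_t r(s_t, a_t)\Big]
```

    So minimizing expected total cost and maximizing expected total reward pick out the same policy.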

     

     "You know, there may be a bit of a cultural difference here. Americans like to believe life is about reward, but maybe Russians behave more pessimistically."

    HAhahahahahahaha....

     

     

      

      Reinforcement learning in CS (which talks about rewards) is essentially the same problem as optimal control and dynamic programming (which talk about costs).

  • Original article: https://www.cnblogs.com/ecoflex/p/9083688.html