  • CS294-112 Deep Reinforcement Learning, Fall semester (Berkeley): NO.1 Introduction, NO.2 Supervised learning and imitation

    I got this wrong earlier: I should have been watching the Fall 2017 course, but ended up watching the Spring one.


      A neural network controls a virtual robot by imitating human motion.

      

     

    Domain shift causes supervised learning to fail in imitation learning: small errors take the policy to states the training data never covered.

     

    The human expert says "turn left!!!" (step 3).
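    The notes do not name the algorithm, but "step 3" reads like the expert-relabeling step of a DAgger-style loop (train, run the learned policy, ask the expert to label the visited states, aggregate). A minimal sketch under that assumption follows; the 1-D dynamics, the linear expert, and every function name are hypothetical and not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_action(state):
    # Hypothetical expert: steer back toward the lane centre (state 0).
    return -0.5 * state

def rollout(gain, steps=20):
    # Run the learned linear policy a = gain * s and record the visited states.
    state, visited = rng.normal(), []
    for _ in range(steps):
        visited.append(state)
        state = state + gain * state + 0.1 * rng.normal()
    return np.array(visited)

def fit(states, actions):
    # Least-squares fit of the scalar gain in a = gain * s.
    return float(states @ actions / (states @ states))

# Step 1: train on the expert's own demonstrations.
states = rng.normal(size=50)
actions = expert_action(states)
gain = fit(states, actions)

for _ in range(5):
    new_states = rollout(gain)                        # step 2: run the learned policy
    new_actions = expert_action(new_states)           # step 3: expert labels those states
    states = np.concatenate([states, new_states])     # step 4: aggregate the datasets
    actions = np.concatenate([actions, new_actions])
    gain = fit(states, actions)                       # back to step 1: retrain
```

    The point of the loop is that the training set ends up containing states the learned policy actually visits, not only the states the expert visited.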


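    We don't want the average of the two valid behaviors (for example, going left or going right around an obstacle). When the actions are discrete, the model works well, because a softmax output can put probability on both options.

      However, with continuous actions and a Gaussian output, the model averages the two modes.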
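    A toy illustration of the averaging problem, assuming made-up demonstrations in which half the experts steer left (-1) and half steer right (+1) at the same fork; the data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical demonstrations at one fork: half steer left (-1), half steer right (+1).
actions = np.array([-1.0, 1.0] * 50)

# A single Gaussian fitted by maximum likelihood puts its mean at the sample mean.
gaussian_mean = actions.mean()   # the average of the two modes: go straight
gaussian_std = actions.std()

# A categorical model over {left, right} keeps both modes instead.
labels = (actions > 0).astype(int)                        # 0 = left, 1 = right
probs = np.bincount(labels, minlength=2) / labels.size    # probability of each option
```

    The Gaussian mean lands between the two modes, an action no expert ever took, while the categorical model assigns equal probability to left and right.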

     Solution:

    Add a noise input to the network here, which turns it into an implicit density model.

    The drawback is that implicit density models are harder to train.

    It is worth looking at VAEs, GANs, and Stein variational gradient descent, which are three methods for training implicit density models.

    Upside: able to mimic a distribution of any shape.

    Downside: much more complex to implement.
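    A rough sketch of what "add a noise input" means in code, assuming a tiny untrained network; the layer sizes, weights, and the `policy` function are hypothetical, and training (the hard part) is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained weights: 4-dim observation + 2-dim noise -> 1-dim action.
W1 = rng.normal(size=(6, 16))
W2 = rng.normal(size=(16, 1))

def policy(obs, noise):
    # The noise is concatenated to the observation, so the output is a
    # deterministic function of (obs, noise) but random given obs alone.
    hidden = np.tanh(np.concatenate([obs, noise]) @ W1)
    return hidden @ W2

obs = np.ones(4)
# Same observation, fresh noise each time: the outputs form a distribution over actions.
samples = np.array([policy(obs, rng.normal(size=2)) for _ in range(1000)])
```

    The network never outputs a density explicitly; the distribution is only defined through the samples it produces, which is why it is called implicit.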

     

    The second network samples its output conditioned on the value sampled from the first network.
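    This reads like sampling the action one dimension at a time, with each dimension discretized into bins. A minimal sketch under that assumption; the bin layout, the linear "networks", and all names are hypothetical and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BINS = 5
bins = np.linspace(-1.0, 1.0, N_BINS)   # hypothetical discretisation of each action dimension

# Hypothetical, untrained linear "networks".
W_first = rng.normal(size=(3, N_BINS))        # obs (3) -> logits for dimension 1
W_second = rng.normal(size=(3 + 1, N_BINS))   # obs (3) + sampled a1 -> logits for dimension 2

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def sample_action(obs):
    # First network: distribution over bins for dimension 1, then sample.
    p1 = softmax(obs @ W_first)
    a1 = bins[rng.choice(N_BINS, p=p1)]
    # Second network: sees the observation AND the sampled a1.
    p2 = softmax(np.append(obs, a1) @ W_second)
    a2 = bins[rng.choice(N_BINS, p=p2)]
    return np.array([a1, a2])

action = sample_action(np.array([0.2, -0.1, 0.5]))
```

    Because the second distribution depends on the first sample, the pair can represent a joint distribution over both dimensions while each softmax stays small.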

     

     Time for a case study.


    This is a person wearing three GoPro cameras on his head...


    Robot: 300 bucks

    Game controller: 100 bucks


     c for cost

    r for reward (the negative of cost)
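    Written out, the two conventions describe the same objective. The symbols below (state s_t, action a_t, policy parameters \theta, trajectory \tau) are assumed notation, not copied from the slides.

```latex
r(s_t, a_t) = -c(s_t, a_t)
\quad\Longrightarrow\quad
\arg\min_{\theta} \, \mathbb{E}_{\tau \sim p_\theta(\tau)}\Big[\sum_t c(s_t, a_t)\Big]
= \arg\max_{\theta} \, \mathbb{E}_{\tau \sim p_\theta(\tau)}\Big[\sum_t r(s_t, a_t)\Big]
```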

     

     You know, there may be a bit of a cultural difference here: Americans like to believe life is about reward, while Russians maybe behave more pessimistically.

    HAhahahahahahaha....

     

     

      

      Reinforcement learning in CS studies essentially the same problem as optimal control and dynamic programming; mainly the notation differs (reward versus cost).

  • Original post: https://www.cnblogs.com/ecoflex/p/9083688.html