  • CS294-112 Deep Reinforcement Learning, Fall semester (Berkeley): NO.1 Introduction, NO.2 Supervised learning and imitation

    I got this wrong earlier: I should have been watching the Fall 2017 course, but ended up watching the Spring one.

    A neural network controls a virtual robot by imitating human motion.

      

     

    Domain shift causes supervised learning to fail in imitation learning: the states the learned policy visits drift away from the states in the training data.

     

    The human expert says "turn left!!!" (step 3)
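    The notes do not name the algorithm behind "step 3". Assuming it is the expert-labeling step of a DAgger-style data aggregation loop, a minimal sketch could look like the following. The 1-D state, the linear policy, and the `expert_action` function are all hypothetical stand-ins, not anything from the lecture.

```python
import random

def expert_action(state):
    # Hypothetical expert: steer back toward the lane centre (state 0).
    return -0.5 * state

def fit_linear_policy(dataset):
    # Least-squares fit of action = w * state (no bias term).
    num = sum(s * a for s, a in dataset)
    den = sum(s * s for s, _ in dataset)
    return num / den if den > 0 else 0.0

def rollout(w, steps, rng):
    # Run the current policy and record the states it actually visits.
    state, visited = rng.uniform(-1.0, 1.0), []
    for _ in range(steps):
        visited.append(state)
        state = state + w * state + rng.gauss(0.0, 0.1)
    return visited

def dagger(iterations=5, steps=20, seed=0):
    rng = random.Random(seed)
    # Initial expert demonstrations (assumed to be given).
    dataset = [(s, expert_action(s))
               for s in (rng.uniform(-1.0, 1.0) for _ in range(20))]
    for _ in range(iterations):
        w = fit_linear_policy(dataset)               # step 1: train on the data
        states = rollout(w, steps, rng)              # step 2: run the policy
        labels = [expert_action(s) for s in states]  # step 3: expert labels them
        dataset += list(zip(states, labels))         # step 4: aggregate
    return fit_linear_policy(dataset), len(dataset)

final_w, n_samples = dagger()
```

    The point of the loop is that the expert labels states the learned policy reaches on its own, so the training data gradually covers the policy's own state distribution.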

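    We don't want the average of the two expected behaviors. When the actions are discrete, the model works well, because a categorical output can put probability on both behaviors.

    However, with continuous actions the output is a single Gaussian, which ends up fitting the average of the two.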
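    A small sketch of the averaging problem, using made-up demonstration data (half of the expert actions steer left, half steer right). The bin names and thresholds are assumptions chosen only for illustration.

```python
from collections import Counter

# Hypothetical demonstrations at one state: half steer left (-1.0), half right (+1.0).
expert_actions = [-1.0] * 50 + [1.0] * 50

# Continuous actions with a single Gaussian: the maximum-likelihood mean
# is just the average of the demonstrated actions.
gaussian_mean = sum(expert_actions) / len(expert_actions)

def to_bin(action):
    # Discretise the continuous action into three assumed bins.
    if action < -0.5:
        return "left"
    if action > 0.5:
        return "right"
    return "straight"

# Discrete actions: a categorical distribution keeps both modes.
counts = Counter(to_bin(a) for a in expert_actions)
categorical = {name: counts[name] / len(expert_actions)
               for name in ("left", "straight", "right")}

print(gaussian_mean)  # the Gaussian mean sits between the two behaviors
print(categorical)    # the categorical puts no mass on "straight"
```

    Here the Gaussian mean lands on "go straight", which no demonstration ever chose, while the categorical output splits its probability between left and right.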

    Solution:

    Add a noise input here.

    The drawback is that an implicit density model is harder to train.

    It is worth looking at VAEs, GANs, and Stein variational gradient descent, which are three methods for training implicit density models.

    Upside: can mimic any form of function.

    Downside: much more complex to implement.
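    To make the noise-input idea concrete, here is a sketch of sampling from an implicit density policy. It only shows the forward pass: `implicit_policy` is a hand-written function standing in for a trained network (an assumption), and the training step, which is the hard part, is not shown.

```python
import random

def implicit_policy(observation, noise):
    # Stand-in for a trained network: a deterministic function of
    # (observation, noise). The noise input selects the output mode.
    direction = -1.0 if noise < 0.0 else 1.0
    return direction * (1.0 + 0.1 * observation)

rng = random.Random(0)
# Same observation every time; only the noise input changes.
samples = [implicit_policy(0.0, rng.gauss(0.0, 1.0)) for _ in range(1000)]
modes = sorted(set(samples))
```

    The network itself is deterministic; the multimodal action distribution comes entirely from feeding in different noise values.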

     

    The second net samples conditionally on the value sampled from the first net.
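    A sketch of that conditional sampling for a two-dimensional action, assuming each dimension is discretized into three bins. The fixed probability tables in `first_net` and `second_net` are hypothetical stand-ins for trained networks.

```python
import random

BINS = [-1.0, 0.0, 1.0]  # assumed discretisation of each action dimension

def first_net(observation):
    # Stand-in for network 1: p(a1 | observation).
    return [0.5, 0.0, 0.5]

def second_net(observation, a1):
    # Stand-in for network 2: p(a2 | observation, a1).
    return [0.9, 0.1, 0.0] if a1 < 0 else [0.0, 0.1, 0.9]

def sample_action(observation, rng):
    # Sample the first dimension, then feed it to the second network.
    a1 = rng.choices(BINS, weights=first_net(observation))[0]
    a2 = rng.choices(BINS, weights=second_net(observation, a1))[0]
    return a1, a2

rng = random.Random(0)
actions = [sample_action(0.0, rng) for _ in range(200)]
```

    Each network only has to output a distribution over one dimension's bins, so the number of outputs grows linearly with the action dimension rather than exponentially.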

     

    Time for a case study.

    This is a human with three GoPros on his head...

    Robot: 300 bucks

    Game controller: 100 bucks

    c for cost

    r for reward (the negative of cost)
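    Written out, the relation between the two conventions is as follows. The state and action symbols $s_t$, $a_t$ and the policy parameters $\theta$ are assumed notation, not taken from the notes.

```latex
\[
  r(s_t, a_t) = -\,c(s_t, a_t),
  \qquad
  \min_{\theta} \; \mathbb{E}\Big[\sum_{t} c(s_t, a_t)\Big]
  \;\Longleftrightarrow\;
  \max_{\theta} \; \mathbb{E}\Big[\sum_{t} r(s_t, a_t)\Big]
\]
```

    Minimizing expected total cost and maximizing expected total reward therefore pick out the same policy.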

     

    You know, there may be a bit of a cultural difference here: Americans like to believe life is about reward, but maybe Russians behave more pessimistically.

    Hahahaha...

    Reinforcement learning in CS addresses essentially the same problem as optimal control; the two differ mainly in notation (reward versus cost).

  • Original post: https://www.cnblogs.com/ecoflex/p/9083688.html