  • CS294-112 Deep Reinforcement Learning, Fall Semester (Berkeley): NO.1 Introduction, NO.2 Supervised learning and imitation

    I made a mistake earlier: I should have been watching the Fall 2017 course, but watched the spring course instead.

    A neural network controls a virtual robot by imitating human motion.

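    Domain shift causes supervised learning to fail in imitation learning: small mistakes lead the policy into states absent from the training data, where it makes even larger mistakes.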

     

    The human expert labels the visited observation: "turn left!!!" (step 3, which appears to be the labeling step of DAgger)
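
    The "step 3" above seems to refer to the DAgger (dataset aggregation) loop: train on expert data, run the policy, ask the expert to label the states the policy visited, then aggregate and retrain. Below is a minimal sketch of that loop on a made-up 1D problem. The names `expert_action`, `rollout`, `fit`, and `dagger`, the linear policy, and the dynamics are all assumptions for illustration, not the course's code.

```python
import numpy as np

def expert_action(state):
    # Hypothetical expert: steer back toward the origin.
    return -0.5 * state

def rollout(weight, rng, steps=20):
    """Run the linear policy a = weight * s and return the visited states."""
    state, visited = rng.normal(), []
    for _ in range(steps):
        visited.append(state)
        # Assumed toy dynamics: next state = state + action + small noise.
        state = state + weight * state + 0.1 * rng.normal()
    return np.array(visited)

def fit(states, actions):
    """Least-squares fit of a = weight * s (the supervised learning step)."""
    return float(states @ actions / (states @ states))

def dagger(iterations=5, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: train on the initial expert demonstrations.
    states = rng.normal(size=20)
    actions = expert_action(states)
    weight = fit(states, actions)
    for _ in range(iterations):
        new_states = rollout(weight, rng)          # Step 2: run the policy
        new_actions = expert_action(new_states)    # Step 3: expert labels them
        states = np.concatenate([states, new_states])    # Step 4: aggregate
        actions = np.concatenate([actions, new_actions])
        weight = fit(states, actions)
    return weight, len(states)
```

    Because the toy expert is itself linear, the fitted weight matches the expert's. The point of the sketch is only the data flow: the labels are collected on states drawn from the policy's own distribution, which is what addresses the domain shift.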

     

     

     

     

     

      

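    We don't want the average of the two expected behaviors (for example, the midpoint between turning left and turning right). When the actions are discrete, the model works well, because it can assign probability to each behavior separately.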

    With continuous actions, however, the output is a single Gaussian, whose mean lands between the two behaviors.
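
    A small numerical sketch of this averaging problem, assuming made-up demonstrations in which half of the experts steer left (about -1) and half steer right (about +1) at the same state. The data and the bin edges are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstrations at one state: two distinct expert behaviors.
actions = np.concatenate([
    rng.normal(-1.0, 0.05, size=500),  # steer left
    rng.normal(+1.0, 0.05, size=500),  # steer right
])

# Maximum-likelihood fit of a single Gaussian: sample mean and standard deviation.
mean, std = actions.mean(), actions.std()

# Discretize the same actions into three bins (left, straight, right)
# and fit a categorical distribution instead.
bins = np.array([-1.5, -0.5, 0.5, 1.5])
counts, _ = np.histogram(actions, bins=bins)
probs = counts / counts.sum()

print("Gaussian mean:", mean, "std:", std)
print("Categorical probabilities (left, straight, right):", probs)
```

    The Gaussian mean sits near zero, an action that no expert ever took, while the categorical fit keeps the probability mass on the left and right bins.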

    Solution:

    Add a noise input to the network here, which turns it into an implicit density model.

    The drawback is that implicit density models are harder to train.

    It is recommended to look at VAEs, GANs, and Stein variational gradient descent, which are three methods for training implicit density models.

    Upside: able to mimic any form of function (any output distribution)

    Downside: much more complex to implement
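
    To make the noise-input idea concrete, here is a minimal sketch of the forward pass only, with a hand-picked function standing in for a trained network. The name `policy`, the `gain` parameter, and the way the state scales the action are assumptions; actually training such a model requires one of the methods mentioned above.

```python
import numpy as np

def policy(state, noise, gain=10.0):
    """Stand-in for a network that takes the state AND a noise sample.

    The noise is squashed through a steep tanh, so its sign selects a mode
    (left or right) while the hypothetical state sets the steering magnitude.
    """
    magnitude = 1.0 / (1.0 + state)  # assumed: state is a distance to an obstacle
    return magnitude * np.tanh(gain * noise)

rng = np.random.default_rng(0)
# Same state every time; only the noise input changes between samples.
samples = policy(0.0, rng.normal(size=1000))
```

    Although the function is deterministic, sampling the noise yields actions clustered near -1 and +1 rather than a single bell curve, which is the behavior a Gaussian output cannot express.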

     

    The second network samples its action dimension conditioned on the value already sampled from the first network.
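
    The conditional sampling described above can be sketched as follows, assuming a two-dimensional action whose dimensions are each discretized into three bins. The lookup tables in `first_net` and `second_net` are hypothetical stand-ins for trained networks.

```python
import numpy as np

BINS = np.array([-1.0, 0.0, 1.0])  # assumed discretization of each action dimension

def first_net(state):
    """Stand-in for the first network: p(a1 | s) over the bins."""
    return np.array([0.5, 0.0, 0.5])

def second_net(state, a1_bin):
    """Stand-in for the second network: p(a2 | s, a1) over the bins."""
    table = np.array([
        [0.0, 0.0, 1.0],  # a1 = -1  ->  a2 = +1
        [0.0, 1.0, 0.0],  # a1 =  0  ->  a2 =  0
        [1.0, 0.0, 0.0],  # a1 = +1  ->  a2 = -1
    ])
    return table[a1_bin]

def sample_action(state, rng):
    # Sample the first dimension, then feed that sample into the second network.
    a1_bin = rng.choice(len(BINS), p=first_net(state))
    a2_bin = rng.choice(len(BINS), p=second_net(state, a1_bin))
    return BINS[a1_bin], BINS[a2_bin]

rng = np.random.default_rng(0)
actions = [sample_action(0.0, rng) for _ in range(100)]
```

    Sampling one dimension at a time keeps each softmax small (three bins here) while still capturing dependence between dimensions: in this toy table the second dimension always takes the opposite sign of the first.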

     

    It's time for a case study.

    This is a human with three GoPro cameras on his head...

    Robot: 300 bucks

    Game controller: 100 bucks

    c stands for cost

    r stands for reward (the negative of the cost)
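
    Written out, the relation between the two conventions is as follows. The state and action symbols, the policy parameter \theta, and the expectation over trajectories are notation assumed here for illustration:

```latex
r(s_t, a_t) = -c(s_t, a_t), \qquad
\min_{\theta} \; \mathbb{E}\Big[\sum_{t} c(s_t, a_t)\Big]
\;=\; -\max_{\theta} \; \mathbb{E}\Big[\sum_{t} r(s_t, a_t)\Big]
```

    Minimizing the expected total cost and maximizing the expected total reward therefore select the same policy.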

     

    You know, there may be a bit of a cultural difference here: Americans like to believe that life is about reward, while Russians perhaps behave more pessimistically.

    Hahahahahahaha....

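    Reinforcement learning in CS and optimal control address the same problem under different notation: reinforcement learning, following dynamic programming, works with rewards r, while optimal control works with costs c.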

  • Original article: https://www.cnblogs.com/ecoflex/p/9083688.html