  • Deep RL Bootcamp Lecture 7: SVG, DDPG, and Stochastic Computation Graphs

     

     

     

    ^ is the square root of epsilon

     

     

    a simplified version of the hard version

    a smoother way to find the correct solution

     

    the first term is the REINFORCE term, and the second term is the grad log probability of our loss
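
A minimal sketch of how a gradient estimate can split into a REINFORCE (score function) term and a direct gradient-of-the-loss term. The toy graph is an assumption for illustration and is not taken from the lecture: a parameter theta feeds a stochastic node b ~ Bernoulli(sigmoid(theta)), and the cost is (b - theta)^2, so theta influences the cost both through the sampling of b and directly. The names sigmoid, exact_gradient, and estimate_gradient are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def exact_gradient(theta):
    """Analytic d/dtheta of E_b[(b - theta)^2], b ~ Bernoulli(sigmoid(theta))."""
    p = sigmoid(theta)
    dp = p * (1.0 - p)  # derivative of sigmoid(theta)
    # E[cost] = p * (1 - theta)^2 + (1 - p) * theta^2
    return dp * ((1.0 - theta) ** 2 - theta ** 2) + 2.0 * (theta - p)


def estimate_gradient(theta, num_samples=200_000, seed=0):
    """Monte Carlo estimate, returned as (score_term, direct_term)."""
    rng = random.Random(seed)
    p = sigmoid(theta)
    score_sum = 0.0
    direct_sum = 0.0
    for _ in range(num_samples):
        b = 1.0 if rng.random() < p else 0.0  # sample the stochastic node
        cost = (b - theta) ** 2
        # REINFORCE term: grad_theta log p(b; theta) * cost.
        # For a Bernoulli with logit theta, grad log p(b) = b - p.
        score_sum += (b - p) * cost
        # Direct term: d cost / d theta with the sample b held fixed.
        direct_sum += -2.0 * (b - theta)
    return score_sum / num_samples, direct_sum / num_samples


if __name__ == "__main__":
    theta = 0.3  # arbitrary illustrative value
    score_term, direct_term = estimate_gradient(theta)
    print("REINFORCE term:", score_term)
    print("direct term:   ", direct_term)
    print("sum:           ", score_term + direct_term)
    print("exact gradient:", exact_gradient(theta))
```

Under these assumptions, neither term alone matches the exact gradient; only their sum does, up to sampling noise.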

     

    b is a stochastic node 

     

     

          

    further formula derivations are omitted.

  • Original article: https://www.cnblogs.com/ecoflex/p/8977893.html