  • Study Notes | CMU 10703: Deep Reinforcement Learning and Control, Spring 2017

    Homepage

    Warm up

    Schedule

    Date | Topics | Lecturer | Readings | Additional Material
    Wed Jan 18 | Course Introduction | Katerina | |
    Mon Jan 23 | Intro to MDPs, POMDPs | Katerina | Sutton & Barto Ch 3 |
    Wed Jan 25 | Solving known MDPs: Dynamic Programming, Value Iteration, Policy Iteration, Policy Evaluation | Katerina | Sutton & Barto Ch 4 |
    Mon Jan 30 | Monte Carlo Learning: value function estimation and optimization | Russ | Sutton & Barto Ch 5 |
    Wed Feb 1 | Temporal Difference Learning: value function estimation and optimization, Q learning, SARSA | Russ | Sutton & Barto Ch 6 |
    Mon Feb 6 | Planning and Learning (1): Tabular methods, Dyna, Monte Carlo Tree Search | Katerina | Sutton & Barto Ch 8 | A Survey of Monte Carlo Tree Search Methods http://www.cameronius.com/cv/mcts-survey-master.pdf
    Wed Feb 8 | Value function approximation, Deep Learning, Convnets, backpropagation | Russ | |
    Mon Feb 13 | Value function approximation, Deep Learning, Convnets, backpropagation | Russ | |
    Wed Feb 15 | Deep Q Learning: Double Q learning, replay memory | Russ | |
    Mon Feb 20 | Policy Gradients (1): REINFORCE, Natural Policy gradients, Variance reduction in gradient estimation, Actor-Critic, Deep Actor-Critic, TRPO | Russ | Sutton & Barto Ch 13 |
    Wed Feb 22 | Policy Gradients (2) | Russ | |
    Mon Feb 27 | Policy Gradients (3) | Russ | |
    Wed Mar 1 | Closer look at Continuous Actions, Variational Autoencoders, multimodal stochastic policies | Russ | |
    Mon Mar 6 | Exploration (1) | Katerina | | Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models https://arxiv.org/abs/1507.00814; Variational Information Maximizing Exploration https://arxiv.org/abs/1605.09674; visitation counts, hashing
    Wed Mar 8 | Imitation learning (1): mimicking experts, behaviour cloning | Katerina | | An Invitation to Imitation http://www.ri.cmu.edu/publication_view.html?pub_id=7891; Generative adversarial imitation learning https://arxiv.org/abs/1606.03476
    Mon Mar 13 | Spring break! | | |
    Wed Mar 15 | Spring break! | | |
    Mon Mar 20 | Imitation learning (2): Learning reward functions from demonstration, IOC, IRL | | | A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning http://www.jmlr.org/proceedings/papers/v15/ross11a/ross11a.pdf; Generative adversarial imitation learning https://arxiv.org/abs/1606.03476; Maximum entropy inverse reinforcement learning http://www.aaai.org/Papers/AAAI/2008/AAAI08-227.pdf; Learning to search: Functional gradient techniques for imitation learning http://www.ri.cmu.edu/publication_view.html?pub_id=6410
    Wed Mar 22 | Intro to optimal control, Differential Dynamic Programming, LQR, iterative-LQR | Katerina | |
    Mon Mar 27 | Imitation learning (3): learning from optimal controllers, self trials | Katerina | | End-to-End Training of Deep Visuomotor Policies https://arxiv.org/pdf/1504.00702.pdf; PLATO: Policy Learning using Adaptive Trajectory Optimization https://arxiv.org/pdf/1603.00622v3.pdf
    Wed Mar 29 | Planning and Learning (2): Learning Forward/Backward Models from experience, Planning with learned forward models, simulation to real world adaptation | Katerina | | SE3-Nets: Learning Rigid Body Motion using Deep Neural Networks https://arxiv.org/pdf/1606.02378v2.pdf
    Mon Apr 3 | Planning and Learning (3) | | |
    4 | Case studies: Alpha Go, deep math | Katerina | |
    Mon Apr 10 | Modular / Hierarchical RL (1): compositionality, temporal abstraction | | |
    Wed Apr 12 | Modular / Hierarchical RL (2): Multi-task learning, curriculum learning | Russ | |
    Mon Apr 17 | Exploration (2): Learning and exploration in 3D environments, Long Term Memory | Russ | |
    Wed Apr 19 | Learning Motor Control: inspiration from Psychology | | Sutton & Barto Ch 14, 15 |
    Mon Apr 24 | Frontiers/Open Problems | Katerina | |
    Wed Apr 26 | Project Presentations | | |
    Mon May 1 | Project Presentations | | |
    Wed May 3 | Project Presentations | | |

    Log

    Week 1:

    Jan 18 - Introduction

    Week 2:

    Jan 23 - Intro to MDPs, POMDPs

    • Slide
    • Sutton & Barto Ch 3
      • 3.1, 3.2, 3.3: 1/23/2017;

    Jan 25 - Solving known MDPs: Dynamic Programming, Value Iteration, Policy Iteration, Policy Evaluation

    • Slide
    • Sutton & Barto Ch 4
      • 4.1: 1/25/2017;
    • Implement Markov Decision Processes in Python
      • AIMA Python file: mdp.py (code taken from Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig)
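
As a companion to the Jan 25 topics, here is a minimal sketch of value iteration on a known MDP. It is self-contained and does not use the API of the AIMA mdp.py file. The two-state MDP, its rewards, and the names TRANSITIONS, q_value, value_iteration and greedy_policy are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical two-state MDP, invented for this example (not from mdp.py).
# transitions[state][action] -> list of (probability, next_state, reward)
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

TRANSITIONS: Transitions = {
    "A": {
        "stay": [(1.0, "A", 0.0)],
        "go": [(0.8, "B", 1.0), (0.2, "A", 0.0)],
    },
    "B": {
        "stay": [(1.0, "B", 2.0)],
        "go": [(1.0, "A", 0.0)],
    },
}


def q_value(transitions: Transitions, values: Dict[str, float],
            state: str, action: str, gamma: float) -> float:
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2])
               for p, s2, r in transitions[state][action])


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tol: float = 1e-8) -> Dict[str, float]:
    """Apply the Bellman optimality backup until the largest change is below tol."""
    values = {s: 0.0 for s in transitions}
    while True:
        new_values = {
            s: max(q_value(transitions, values, s, a, gamma)
                   for a in transitions[s])
            for s in transitions
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            return values


def greedy_policy(transitions: Transitions, values: Dict[str, float],
                  gamma: float = 0.9) -> Dict[str, str]:
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {
        s: max(transitions[s],
               key=lambda a: q_value(transitions, values, s, a, gamma))
        for s in transitions
    }


if __name__ == "__main__":
    V = value_iteration(TRANSITIONS)
    print(V)
    print(greedy_policy(TRANSITIONS, V))
```

For this toy MDP with gamma = 0.9, the fixed point can be checked by hand: staying in B gives V(B) = 2 / (1 - 0.9) = 20, and choosing "go" in A gives V(A) = 15.2 / 0.82, roughly 18.54, so the greedy policy is "go" in A and "stay" in B.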

  • Original post: https://www.cnblogs.com/casperwin/p/6295396.html