  • Reinforcement learning: related resources

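      Recently, for reasons I'd rather not go into, I needed to build a small reinforcement learning example in a hurry, even though I knew nothing about reinforcement learning beforehand. I ended up using someone else's code, but while searching for it I came across quite a few good reinforcement learning resources, so I'm bookmarking them here for later study.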

    [1] How can the Q-learning process be explained with a simple example?

      https://www.zhihu.com/question/26408259
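      The core of the Q-learning process asked about in [1] is a single update rule. In the simplified form used by the tutorial in [2] and by the Python code below (no separate learning rate, γ = 0.8, all Q values starting at zero, and taking action a means moving into room a, so the next state s' equals a), the first two updates toward the goal room 5 work out as follows, using the reward matrix R from that code:

$$
\begin{aligned}
Q(s, a) &= R(s, a) + \gamma \max_{a'} Q(s', a'), \qquad s' = a \\
Q(1, 5) &= R(1, 5) + 0.8 \cdot \max_{a'} Q(5, a') = 100 + 0.8 \cdot 0 = 100 \\
Q(3, 1) &= R(3, 1) + 0.8 \cdot \max_{a'} Q(1, a') = 0 + 0.8 \cdot 100 = 80
\end{aligned}
$$

      The second line assumes Q(1, 5) = 100 has already been learned, which is how the reward of the goal room propagates backwards one door at a time.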

    [2] The simplest example explaining the Q-learning process

      http://mnemstudio.org/path-finding-q-learning-tutorial.htm

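      Note: the site also provides code, but unfortunately it is all written in C++ and Java, which I can't follow. Still, it looks like a good resource site.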

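      A Chinese translation of this tutorial is available as the blog post 最简单的讲解Q-Learning过程的例子 ("The simplest example explaining the Q-learning process").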

      Someone has also reproduced this tutorial in Python:

      https://github.com/JasonQSY/ML-Weekly/blob/master/P5-Reinforcement-Learning/Q-learning/Q-Learning-Get-Started.ipynb

    The code is as follows:

    import random

    import numpy as np

    # --- initialization ---
    # 6 rooms (states 0-5); room 5 is the goal. Taking action a means
    # moving to room a, so the next state always equals the action.
    q = np.zeros((6, 6))

    # Reward matrix R[state, action]: -1 = no door, 0 = door,
    # 100 = door leading directly into the goal room 5.
    r = np.array([[-1, -1, -1, -1, 0, -1],
                  [-1, -1, -1, 0, -1, 100],
                  [-1, -1, -1, 0, -1, -1],
                  [-1, 0, 0, -1, 0, -1],
                  [0, -1, -1, 0, -1, 100],
                  [-1, 0, -1, -1, 0, 100]])

    gamma = 0.8  # discount factor

    # --- training ---
    for episode in range(100):
        # start each episode in a random room and walk until the goal is reached
        state = random.randint(0, 5)
        while state != 5:
            # collect all valid actions (doors, i.e. r >= 0) and pick one at random
            valid_actions = [a for a in range(6) if r[state, a] >= 0]
            next_state = random.choice(valid_actions)
            # Q(s, a) = R(s, a) + gamma * max_a' Q(s', a')
            q[state, next_state] = r[state, next_state] + gamma * q[next_state].max()
            state = next_state

    # --- verification ---
    for episode in range(10):
        print(f"episode: {episode + 1}")

        # random initial room
        state = random.randint(0, 5)
        print(f"the robot starts in room {state}.")
        steps = 0
        while state != 5:
            # guard against an endless loop
            if steps > 20:
                print("fails")
                break

            # move to one of the actions with the maximal Q value, breaking ties at random
            q_max = q[state].max()
            q_max_actions = [a for a in range(6) if q[state, a] == q_max]
            next_state = random.choice(q_max_actions)

            print(f"the robot goes to room {next_state}.")
            state = next_state
            steps += 1
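      The notebook above learns Q by random exploration. As a cross-check, here is a minimal sketch that swaps in a different technique: instead of sampling episodes, it applies the same update rule to every valid (state, action) pair at once and repeats these full sweeps until Q stops changing (a value-iteration-style computation), then reads off the greedy route from a given room. It assumes, as in the notebook, that the goal row is never updated. The function names `sweep_q` and `greedy_path` and the constants `GAMMA` and `GOAL` are hypothetical, not taken from the tutorial or the notebook.

```python
import numpy as np

# Reward matrix from the rooms example: R[s, a] = -1 means "no door",
# 0 means a door, 100 means a door that leads into the goal room 5.
R = np.array([
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])
GAMMA = 0.8  # hypothetical name for the notebook's gamma
GOAL = 5     # hypothetical name for the goal room


def sweep_q(r, gamma, goal, tol=1e-9, max_sweeps=1000):
    """Apply Q(s, a) = R(s, a) + gamma * max_a' Q(a, a') to every valid
    (s, a) pair simultaneously, repeating until Q stops changing."""
    q = np.zeros(r.shape, dtype=float)
    valid = r >= 0
    valid[goal, :] = False  # like the notebook, never update from the goal room
    for _ in range(max_sweeps):
        new_q = np.zeros_like(q)
        # Taking action a means moving to room a, so the best value of the
        # next state is the row maximum of q[a]; broadcast it across columns.
        best_next = q.max(axis=1)
        new_q[valid] = (r + gamma * best_next[np.newaxis, :])[valid]
        if np.max(np.abs(new_q - q)) < tol:
            return new_q
        q = new_q
    return q


def greedy_path(q, r, start, goal, max_steps=20):
    """Follow the highest-Q door from `start` until `goal` (or max_steps moves)."""
    path = [start]
    state = start
    while state != goal and len(path) <= max_steps:
        # Only consider real doors (R >= 0); np.argmax breaks ties by
        # picking the lowest room number.
        scores = np.where(r[state] >= 0, q[state], -np.inf)
        state = int(np.argmax(scores))
        path.append(state)
    return path


if __name__ == "__main__":
    Q = sweep_q(R, GAMMA, GOAL)
    print(np.round(Q, 1))
    for room in range(6):
        print(room, greedy_path(Q, R, room, GOAL))
```

      Under these assumptions the sweep settles at Q(1, 5) = Q(4, 5) = 100, Q(0, 4) = Q(3, 1) = Q(3, 4) = 80, Q(1, 3) = Q(2, 3) = Q(4, 0) = Q(4, 3) = 64 and Q(3, 2) = 51.2, and the greedy route from room 2 is [2, 3, 1, 5]. Because each update in the notebook can only raise a Q value toward this fixed point, its trained matrix should end up at or close to the same numbers after enough episodes. One design difference: `greedy_path` only considers real doors and breaks ties deterministically, whereas the notebook's verification loop compares all six actions and breaks ties at random.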

    [3] This author's blog has a series on reinforcement learning:

      http://www.algorithmdog.com/ml/rl-series

    [4] http://blog.csdn.net/u012192662/article/category/6394979

      At a glance, it seems reasonably well written.

      

  • Original post: https://www.cnblogs.com/huanjing/p/6817941.html