Reinforcement Learning Resources

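Recently, for reasons best left unsaid, I had to build a small reinforcement learning example in a hurry, despite knowing nothing about reinforcement learning beforehand. I ended up using someone else's code, but while searching for it I found many good reinforcement learning resources, so I'm noting them here for later study.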

【1】How can the Q-learning process be explained step by step with a simple example?

　　https://www.zhihu.com/question/26408259
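For reference, the standard one-step Q-learning update, for a move from state s to state s' via action a with reward r, learning rate α and discount factor γ, is:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

With α = 1 this reduces to Q(s, a) ← r + γ max_{a'} Q(s', a'), which is the form the Python code in this post uses (`q[state, next_state] = r[state, next_state] + gamma * q[next_state].max()`).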

【2】The simplest example explaining the Q-learning process

　　http://mnemstudio.org/path-finding-q-learning-tutorial.htm

Note: the site also provides code, but unfortunately it is all in C++ and Java, which I can't read. Still, it looks like a good resource site.

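A Chinese translation of this tutorial is available as a blog post, 最简单的讲解Q-Learning过程的例子 ("The simplest example explaining the Q-learning process").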
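As a worked example of the simplified rule (learning rate 1), take γ = 0.8 and the reward matrix R used in the code in this post, where room 5 is the goal and Q starts at all zeros. The first line updates the move from room 1 into the goal; the second assumes Q(1, 5) has already been set to 100:

```latex
\begin{aligned}
Q(1,5) &= R(1,5) + 0.8 \cdot \max\{Q(5,1),\, Q(5,4),\, Q(5,5)\} = 100 + 0.8 \cdot 0 = 100 \\
Q(3,1) &= R(3,1) + 0.8 \cdot \max\{Q(1,3),\, Q(1,5)\} = 0 + 0.8 \cdot 100 = 80
\end{aligned}
```

Each value propagates one room further from the goal and is discounted by another factor of 0.8.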

Someone has also reproduced the tutorial above in Python:

　　https://github.com/JasonQSY/ML-Weekly/blob/master/P5-Reinforcement-Learning/Q-learning/Q-Learning-Get-Started.ipynb

The code is as follows:

import numpy as np
import random

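# initialization: Q starts at zero; r[s, a] is the reward for moving from room s to room a
# (-1 = no door, 0 = door, 100 = door into the goal room 5)
q = np.zeros((6, 6))

r = np.array([[-1, -1, -1, -1, 0, -1],
              [-1, -1, -1, 0, -1, 100],
              [-1, -1, -1, 0, -1, -1],
              [-1, 0, 0, -1, 0, -1],
              [0, -1, -1, 0, -1, 100],
              [-1, 0, -1, -1, 0, 100]])

gamma = 0.8  # discount factor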

# training
for i in range(100):
    # one episode: start in a random room and walk until reaching the goal room 5
    state = random.randint(0, 5)
    while state != 5:
        # choose a random valid action (r >= 0, i.e. there is a door)
        r_pos_action = [action for action in range(6) if r[state, action] >= 0]
        next_state = random.choice(r_pos_action)
        # Q update with learning rate 1: immediate reward plus discounted best future value
        q[state, next_state] = r[state, next_state] + gamma * q[next_state].max()
        state = next_state
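Because the rooms and rewards are deterministic, one way to see where the random episodes are heading is to skip the sampling and apply the same update to every valid (state, action) pair, pass after pass, until nothing changes. This is a synchronous sweep (closer to value iteration than to sampled Q-learning), swapped in here purely for illustration; the helper name `sweep_until_stable` is hypothetical. As in the training loop above, the goal row is never updated.

```python
import numpy as np

# Reward matrix copied from the post: -1 = no door, 0 = door,
# 100 = door into the goal room 5. The action index is the room moved to.
R = np.array([
    [-1, -1, -1, -1, 0, -1],
    [-1, -1, -1, 0, -1, 100],
    [-1, -1, -1, 0, -1, -1],
    [-1, 0, 0, -1, 0, -1],
    [0, -1, -1, 0, -1, 100],
    [-1, 0, -1, -1, 0, 100],
])
GAMMA = 0.8
GOAL = 5


def sweep_until_stable(r, gamma, goal, max_sweeps=100):
    """Hypothetical helper: apply Q(s,a) = R(s,a) + gamma * max Q(a, .)
    to every valid (s, a) pair until Q stops changing.

    The goal row stays at zero, mirroring the episodic loop in the post,
    which stops as soon as it reaches the goal.
    """
    q = np.zeros(r.shape, dtype=float)
    for _ in range(max_sweeps):
        new_q = q.copy()
        for s in range(r.shape[0]):
            if s == goal:
                continue
            for a in np.flatnonzero(r[s] >= 0):
                # Moving through door a lands in room a.
                new_q[s, a] = r[s, a] + gamma * q[a].max()
        if np.array_equal(new_q, q):
            return q
        q = new_q
    return q


Q = sweep_until_stable(R, GAMMA, GOAL)
print(Q)
```

The sweep settles after a few passes: 100 for the two moves into room 5 (from rooms 1 and 4), 80 for moves into a room one step from the goal, 64 for moves into a room two steps away, and 51.2 for Q(3, 2). Given enough episodes, the random training loop should drift toward the same matrix.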

# verify
for i in range(10):
    # one episode
    print("episode: " + str(i + 1))

    # random initial state
    state = random.randint(0, 5)
    print("the robot starts in room " + str(state) + ".")
    count = 0
    while state != 5:
        # prevent an endless loop
        if count > 20:
            print('fails')
            break

        # choose randomly among the actions with the maximal Q-value
        q_max = q[state].max()
        q_max_action = [action for action in range(6) if q[state, action] == q_max]
        next_state = random.choice(q_max_action)

        print("the robot goes to room " + str(next_state) + ".")
        state = next_state
        count = count + 1
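The code above overwrites Q(s, a) with the new target, which is Q-learning with learning rate 1 (workable here because moves and rewards are deterministic). This sketch, which is not from the original post, keeps an explicit learning rate `alpha` and replaces the verification loop's manual tie-breaking with `np.argmax` over valid moves only, so the greedy walk can never pass through a wall. The helpers `train_q` and `greedy_path` and the settings α = 0.5, 500 episodes and seed 0 are hypothetical choices for illustration.

```python
import numpy as np

# Reward matrix copied from the post: -1 = no door, 0 = door,
# 100 = door into the goal room 5. The action index is the room moved to.
R = np.array([
    [-1, -1, -1, -1, 0, -1],
    [-1, -1, -1, 0, -1, 100],
    [-1, -1, -1, 0, -1, -1],
    [-1, 0, 0, -1, 0, -1],
    [0, -1, -1, 0, -1, 100],
    [-1, 0, -1, -1, 0, 100],
])
GAMMA = 0.8
GOAL = 5


def train_q(r, gamma, goal, alpha=0.5, episodes=500, seed=0):
    """Hypothetical helper: tabular Q-learning with learning rate alpha.

    Moves are chosen uniformly among valid doors, as in the post;
    alpha=1.0 gives exactly the post's update rule.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(r.shape, dtype=float)
    for _ in range(episodes):
        state = int(rng.integers(r.shape[0]))
        while state != goal:
            action = int(rng.choice(np.flatnonzero(r[state] >= 0)))
            target = r[state, action] + gamma * q[action].max()
            # Move Q(s, a) a fraction alpha of the way toward the target.
            q[state, action] += alpha * (target - q[state, action])
            state = action  # the chosen door leads to room `action`
    return q


def greedy_path(q, r, start, goal, max_steps=20):
    """Hypothetical helper: follow the highest-Q valid move from `start`.

    Invalid moves are masked with -inf so argmax can never pick them.
    """
    path = [start]
    state = start
    while state != goal and len(path) <= max_steps:
        masked = np.where(r[state] >= 0, q[state], -np.inf)
        state = int(np.argmax(masked))
        path.append(state)
    return path


Q = train_q(R, GAMMA, GOAL)
for start in range(6):
    print(start, greedy_path(Q, R, start, GOAL))
```

On this map every start should reach room 5 in at most three moves. Masking with -inf, rather than relying on unvisited entries staying at zero, is what keeps the walk on real doors.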

【3】This author's blog has a series on reinforcement learning:

　　http://www.algorithmdog.com/ml/rl-series

【4】http://blog.csdn.net/u012192662/article/category/6394979

At a quick glance, it seems reasonably well written.

Original article: https://www.cnblogs.com/huanjing/p/6817941.html
