Reinforcement Learning: An Introduction Reading Notes (1): Introduction



The basic idea behind learning and intelligence: learning from interaction.

Definition of RL:

RL is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal.

RL problems: a learning agent interacting over time with its environment to achieve a goal.

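(Three ingredients, sensation, action, and goal: the agent must be able to sense the state of its environment, take actions that affect that state, and have one or more goals relating to the state of the environment.)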
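The interaction just described (sense a state, act, receive a reward, repeat) can be sketched as a loop. This is a minimal illustration under stated assumptions: `LineWorld` is a hypothetical toy environment (positions 0 to 4 on a line, with the goal at position 4), and `run_episode`, `always_right`, and `random_policy` are illustrative names, not an API from the book or any library.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..size-1 on a line; the last position is the goal."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the line
        self.state = max(0, min(self.size - 1, self.state + action))
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0  # the goal is communicated only through the reward
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: sense the state, act, then observe the reward and the next state."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # the agent acts on the state it senses
        state, reward, done = env.step(action)  # the action changes the environment's state
        total_reward += reward
        if done:
            break
    return total_reward


always_right = lambda state: +1
random_policy = lambda state: random.choice([-1, +1])

print(run_episode(LineWorld(), always_right))  # 1.0
```

Swapping `always_right` for `random_policy` runs the same loop with a different policy; the environment and the loop itself stay unchanged.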

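Two distinguishing features:

1. Trial-and-error search: the learner is not told which actions to take; it must discover which actions yield the most reward by trying them.

2. Delayed reward: an action affects not only the immediate reward but also the next situation and, through it, all subsequent rewards.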
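To make delayed reward concrete, the sketch below computes, for each time step of an episode, the sum of all rewards from that step to the end. This assumes an undiscounted, episodic setting, and the reward sequence is a hypothetical example: an early action can receive zero immediate reward and still be credited with the win that arrives later.

```python
def returns(rewards):
    """For each time step t, the sum of rewards from t to the end of the episode (undiscounted)."""
    g = 0.0
    out = []
    for r in reversed(rewards):  # accumulate from the last step backwards
        g += r
        out.append(g)
    return list(reversed(out))


# Hypothetical episode: nothing is rewarded until the final step (e.g. a game won at the end)
episode_rewards = [0.0, 0.0, 0.0, 1.0]
print(returns(episode_rewards))  # [1.0, 1.0, 1.0, 1.0]
```

The first action's immediate reward is 0, yet the total reward that follows it is 1.0. Assigning that later reward back to earlier actions is exactly the difficulty delayed reward creates.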

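Four elements of RL:

1. Policy: defines the learning agent's way of behaving at a given time, i.e. a mapping from perceived states of the environment to actions.

2. Reward signal: defines the goal of an RL problem; it indicates what is good in an immediate sense.

3. Value function: specifies what is good in the long run. The value of a state is the total reward an agent can expect to accumulate over the future, starting from that state.

4. Model of the environment (optional, used only by model-based methods): mimics the behavior of the environment; given a state and an action, the model predicts the resulting next state and reward.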
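The four elements can be illustrated together on a tiny problem. The sketch below is illustrative only and rests on several assumptions: a hypothetical 5-state chain (states 0 to 4, goal 4); value estimates that are hand-picked numbers rather than learned ones; and a policy that plans one step ahead through the model, which is just one possible model-based policy.

```python
# Hypothetical 5-state chain: states 0..4, state 4 is the (terminal) goal.
GOAL = 4
ACTIONS = (-1, +1)  # move left or right


def reward_signal(next_state):
    # Reward signal: immediate desirability, +1 only when the goal is reached
    return 1.0 if next_state == GOAL else 0.0


def model(state, action):
    # Model of the environment: predicts (next_state, reward) without acting in the real environment
    next_state = max(0, min(GOAL, state + action))
    return next_state, reward_signal(next_state)


# Value function: long-run desirability of each state.
# Hand-picked for illustration, not learned; the terminal state has no future reward.
value = {0: 0.2, 1: 0.4, 2: 0.6, 3: 0.8, GOAL: 0.0}


def policy(state):
    # Policy: maps a state to an action, here by one-step lookahead through the model
    def lookahead(action):
        next_state, reward = model(state, action)
        return reward + value[next_state]

    return max(ACTIONS, key=lookahead)


print([policy(s) for s in range(GOAL)])  # [1, 1, 1, 1]
```

Note how the reward alone cannot guide the agent from state 0, where every immediate reward is 0; the value estimates are what make "move right" look better long before the goal is in reach.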

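Comparison with other learning methods:

1. RL differs from supervised learning, which is learning from a training set of labeled examples provided by a knowledgeable external supervisor.

2. RL differs from unsupervised learning, which is mainly about finding structure hidden in collections of unlabeled data. RL may look like a kind of unsupervised learning (since it does not rely on examples of correct behavior), but the two are different: RL tries to maximize a reward signal rather than to find hidden structure. In addition, time plays a central role in RL: the agent acts sequentially, and the effects of its feedback unfold over time.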

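3. Other characteristics of RL:

(1) the trade-off between exploration and exploitation is a challenge specific to RL;

(2) it considers not isolated subproblems but the whole problem of a goal-directed agent interacting with an uncertain environment;

(3) it is interdisciplinary, drawing on mathematics, psychology, neuroscience, and more.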
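One simple way to handle the exploration/exploitation trade-off is the epsilon-greedy rule, which the book develops in its chapter on multi-armed bandits. The sketch below assumes a hypothetical k-armed bandit whose arms pay Gaussian rewards (unit variance) around means unknown to the agent; the function names and parameters are illustrative.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore (random arm); otherwise exploit the best current estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def run_bandit(true_means, epsilon, steps=2000, seed=0):
    # Hypothetical k-armed bandit: arm a pays a reward drawn from N(true_means[a], 1)
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # estimated value of each arm
    n = [0] * k    # how many times each arm has been pulled
    total = 0.0
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        r = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]  # incremental sample-average update
        total += r
    return total / steps, q


avg_greedy, _ = run_bandit([0.2, 0.5, 1.0], epsilon=0.0)
avg_explore, _ = run_bandit([0.2, 0.5, 1.0], epsilon=0.1)
print(avg_greedy, avg_explore)
```

With `epsilon=0.0` the agent only exploits and can settle on whichever arm happened to look good first; a small positive `epsilon` gives up some reward on random pulls in exchange for better estimates of every arm. The exact averages depend on the seed.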

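Comparison with evolutionary methods (e.g. genetic algorithms):

Evolutionary methods can be effective when (1) the space of policies is small, or a lot of time is available for the search, or (2) the learning agent cannot sense the complete state of its environment.

In most cases, however, RL works better, because it learns from the details of each individual interaction with the environment (for example, which states are visited and which actions are selected), whereas evolutionary methods judge a policy only by its overall outcome.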

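Worked example: tic-tac-toe.

The chapter analyzes how the game could be approached with different methods (e.g. minimax, dynamic programming, evolutionary methods, and RL).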
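The RL approach to tic-tac-toe described in the book's first chapter can be sketched as follows. It keeps a table of values (estimated probabilities of winning), plays mostly greedy moves with occasional random exploratory moves, and after each greedy move applies the temporal-difference update V(S_t) <- V(S_t) + alpha * [V(S_{t+1}) - V(S_t)]. This is a minimal sketch under these assumptions: the agent plays X and moves first, the opponent plays uniformly random legal moves, only boards right after X's moves (plus a final O-win board) are valued, and the names `TDPlayer` and `play_game`, as well as `alpha = 0.1` and `epsilon = 0.1`, are illustrative choices.

```python
import random

# The eight winning lines on a board stored as a 9-character string, indices 0..8
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def initial_value(board):
    # Initial estimates as in the book: 1 for an X win, 0 for an O win or a draw, 0.5 otherwise
    w = winner(board)
    if w == "X":
        return 1.0
    if w == "O" or " " not in board:
        return 0.0
    return 0.5


class TDPlayer:
    """Plays X; values[board] estimates the probability of winning from that board."""

    def __init__(self, alpha=0.1, epsilon=0.1, seed=0):
        self.values = {}
        self.alpha, self.epsilon = alpha, epsilon
        self.rng = random.Random(seed)

    def value(self, board):
        if board not in self.values:
            self.values[board] = initial_value(board)
        return self.values[board]

    def choose(self, board):
        """Return (afterstate, was_greedy) for X's next move."""
        empty = [i for i, c in enumerate(board) if c == " "]
        afterstates = [board[:i] + "X" + board[i + 1:] for i in empty]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(afterstates), False  # exploratory move
        return max(afterstates, key=self.value), True   # greedy move

    def update(self, state, next_state):
        # TD update: V(S_t) <- V(S_t) + alpha * [V(S_{t+1}) - V(S_t)]
        v = self.value(state)
        self.values[state] = v + self.alpha * (self.value(next_state) - v)


def play_game(player, rng):
    board, prev = " " * 9, None  # prev: X's previous afterstate
    while True:
        after, greedy = player.choose(board)
        if prev is not None and greedy:
            player.update(prev, after)  # exploratory moves trigger no update
        prev = board = after
        w = winner(board)
        if w is not None or " " not in board:
            return w or "draw"
        # Assumed opponent: O plays a uniformly random legal move
        i = rng.choice([j for j, c in enumerate(board) if c == " "])
        board = board[:i] + "O" + board[i + 1:]
        if winner(board) == "O":
            player.update(prev, board)  # back up the loss (value 0) into X's last afterstate
            return "O"


player, opponent_rng = TDPlayer(), random.Random(1)
results = [play_game(player, opponent_rng) for _ in range(5000)]
print("X win rate over the last 1000 games:", results[-1000:].count("X") / 1000)
```

Unlike an evolutionary method, which would score a whole policy only by its final win rate, this learner adjusts the value of individual positions during every game, which is the contrast the chapter draws. The printed win rate depends on the seeds.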

Early history of RL:

Omitted.

Original post: https://www.cnblogs.com/HappyLion-ve/p/9811646.html