[Machine Learning] Introduction to Decision Trees (1)

Contents

1. Decision tree representation

2. ID3: a top-down learning algorithm

3. Expressiveness of decision trees

4. Bias of ID3

5. Best attributes

Gain(S, A): information gain

6. Dealing with overfitting

I. Introduction to Decision Trees

1.1 Steps

1. Pick the best attribute.

2. Ask a question about that attribute.

3. Follow the answer path.

4. Repeat from step 1 until you reach an answer.
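
The steps above can be sketched as a loop that walks a tree from the root to a leaf. This is only a minimal illustration: the tree, its attribute names ("raining", "weekend") and the answers are hypothetical, and the tree is assumed to be already built, so "pick the best attribute" is represented by whichever attribute each node stores.

```python
# Hypothetical, hand-built tree: internal nodes are dicts, leaves are answers.
tree = {
    "attribute": "raining",
    "branches": {
        True: "stay in",
        False: {
            "attribute": "weekend",
            "branches": {True: "go swimming", False: "go to work"},
        },
    },
}


def classify(node, example):
    """Ask questions and follow answer paths until a leaf is reached."""
    while isinstance(node, dict):
        answer = example[node["attribute"]]  # ask the question
        node = node["branches"][answer]      # follow the answer path
    return node                              # a leaf is the final answer


print(classify(tree, {"raining": False, "weekend": True}))
```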

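1.2 Expressiveness of decision trees
- A AND B
- A XOR B (exclusive OR, abbreviated XOR)

How to read XOR:

1. The mathematical symbol for XOR is "⊕"; it is addition modulo 2.

2. Mnemonic: XOR outputs 1 exactly when the two inputs differ.

3. When people say "or" in everyday English, they usually mean "either ... or", which is XOR in mathematical terms.

E.g. "Do you want to go swimming, or do you want to go to the movies?"

You pick one of the two, so the output is 1 when the inputs differ. You cannot go to both places at once, so the output is 0 when the inputs are the same.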
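
Both functions can be written as small decision trees. The sketch below is an illustration, not part of the original notes; the function names `and_tree` and `xor_tree` are made up for this example. Each `if` plays the role of a node that tests one attribute.

```python
from itertools import product


def and_tree(a, b):
    """A AND B: the 'A is false' branch is already a leaf."""
    if a:
        return b       # A is true, so the answer depends on B
    return False       # A is false, no need to ask about B


def xor_tree(a, b):
    """A XOR B: both branches of A still have to ask about B."""
    if a:
        return not b   # A is true: output is 1 only if B is false
    return b           # A is false: output is 1 only if B is true


# Check every row of the truth table, including the modulo-2 reading of XOR.
for a, b in product([False, True], repeat=2):
    assert and_tree(a, b) == (a and b)
    assert xor_tree(a, b) == (a != b)
    assert xor_tree(a, b) == bool((int(a) + int(b)) % 2)
```

Note that the XOR tree needs a test on B under both branches of A, while the AND tree can stop early on one side.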

II. The ID3 Decision Tree Algorithm
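
The outline names Gain(S, A), the information gain, as the way to choose the best attribute. The notes do not spell out the formula, so the sketch below assumes the usual textbook definition: the entropy of S minus the size-weighted entropy of the subsets of S produced by splitting on A. The helper names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(examples, attribute, target):
    """Gain(S, A): entropy of S minus the weighted entropy after splitting on A."""
    labels = [e[target] for e in examples]
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(examples, attributes, target):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: gain(examples, a, target))


# Toy data (made up): the label y copies attribute a, while b is irrelevant.
data = [
    {"a": 0, "b": 0, "y": 0},
    {"a": 0, "b": 1, "y": 0},
    {"a": 1, "b": 0, "y": 1},
    {"a": 1, "b": 1, "y": 1},
]
print(best_attribute(data, ["a", "b"], "y"))
```

In this toy data, splitting on a separates the labels perfectly, while splitting on b leaves both subsets as mixed as before, so a is chosen.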

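III. Other Considerations

3.1 When do we stop?

1. What about noise?

2. Overfitting

The tree grows too large and too complex, which violates Occam's razor.

3. Which methods help avoid overfitting?

1) Cross-validation

2) Pruning: shrinking the decision tree

3) Output: vote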
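
One way to read "output: vote" is that a leaf which still holds examples with mixed labels (for instance after pruning, or because of noise) answers with the most common label. That reading is an assumption, and the function name below is hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label among the examples at a leaf."""
    return Counter(labels).most_common(1)[0][0]


# A noisy leaf: two examples say "yes", one says "no".
print(majority_vote(["yes", "no", "yes"]))
```

On a tie, `Counter.most_common` keeps the labels in the order they were first seen, so the earliest label wins.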

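3.2 Does it make sense to ask about the same attribute again on the same path?

Answer:

1. For discrete attributes, no: the earlier test already fixed the value, so asking again adds no information.

2. For continuous attributes, yes.

For example, with the attribute age and the node "20<age<30?":

if the answer is no, the age attribute still needs to be asked about again,
e.g. node: age<20?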
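
The age example can be sketched as two tests on the same attribute along one path. The returned group names are made up for illustration.

```python
def age_group(age):
    """Test the continuous attribute age twice on the same path."""
    if 20 < age < 30:      # first node: 20<age<30?
        return "between 20 and 30"
    # The answer was "no", which only says age <= 20 or age >= 30,
    # so a second question about age is still informative.
    if age < 20:           # second node: age<20?
        return "under 20"
    return "exactly 20, or 30 and over"


print(age_group(25), age_group(15), age_group(40), sep=" | ")
```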

Original article: https://www.cnblogs.com/Neo007/p/8257295.html
