Decision tree and the ID3 algorithm
Data mining deals with four main problems: clustering, classification, regression, and dimensionality reduction. This post focuses on decision trees for classification: from data whose classes are already known, we build a decision tree, and then use the tree to decide which class each new point belongs to.

In a decision tree, each internal node is a decision (a test on one attribute), and each leaf is a final decision (the class assigned to a point). Take the following figure as an example:

Figure 1
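A tree of this kind can be represented with two node types: internal nodes that test an attribute, and leaves that hold a class label. The Python below is a minimal sketch under our own assumptions; the class names `Leaf` and `Node`, the function `classify`, and the `outlook`/`windy` attributes and labels are hypothetical and are not taken from the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """A leaf holds the final decision (a class label)."""
    label: str


@dataclass
class Node:
    """An internal node tests one attribute and routes to one child per value."""
    attribute: str
    children: Dict[str, "Tree"] = field(default_factory=dict)


Tree = Union[Leaf, Node]


def classify(tree: Tree, sample: Dict[str, str]) -> str:
    """Follow the decisions from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[sample[tree.attribute]]
    return tree.label


# Hypothetical two-level tree, for illustration only.
tree = Node("outlook", {
    "sunny": Node("windy", {"yes": Leaf("stay in"), "no": Leaf("go out")}),
    "rainy": Leaf("stay in"),
})

print(classify(tree, {"outlook": "sunny", "windy": "no"}))  # go out
```

Classifying a new point is just a walk from the root to a leaf, so its cost is proportional to the depth of the tree, which is one reason shallow trees are preferred.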

Splitting the dataset arbitrarily can make the decision tree unnecessarily tall, as in Figure 2. To avoid this, we use the ID3 algorithm, which splits the dataset into subsets whose information entropy is as small as possible.

Figure 2
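The quantities behind ID3 are usually defined as follows, in the E(S) notation used in the examples below: p_i is the fraction of the samples in S that belong to class i, A is the attribute (or split) being considered, and S_v is the subset of S on which A takes the value v. Logarithms are base 2, which is why an evenly mixed two-class set has entropy 1.

```latex
E(S) = -\sum_{i} p_i \log_2 p_i
\qquad
\operatorname{Gain}(S, A) = E(S) - \sum_{v \in \operatorname{Values}(A)} \frac{|S_v|}{|S|}\, E(S_v)
```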

Among all possible splits, ID3 picks the one that maximizes the information gain, that is, the drop in entropy from the parent set to the size-weighted entropy of its subsets.
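Applied recursively, this selection rule gives the usual ID3 procedure. The Python below is a minimal sketch for categorical attributes only, assuming a tree stored as nested dicts (our own choice); the helper names `information_gain`, `id3`, and `predict` and the small weather-style dataset are hypothetical illustrations, not from the original post. Recursion stops when a subset is pure, or falls back to a majority vote when no attributes remain.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Union

Row = Dict[str, str]
Tree = Union[str, Dict[str, object]]  # a class label, or {"attribute": ..., "children": {...}}


def entropy(labels: List[str]) -> float:
    """Base-2 Shannon entropy of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(rows: List[Row], labels: List[str], attribute: str) -> float:
    """Entropy of the parent minus the size-weighted entropy of each subset."""
    n = len(rows)
    subsets: Dict[str, List[str]] = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


def id3(rows: List[Row], labels: List[str], attributes: List[str]) -> Tree:
    """Grow a tree by splitting on the attribute with maximum information gain."""
    if len(set(labels)) == 1:      # pure subset: make a leaf
        return labels[0]
    if not attributes:             # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = id3([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return {"attribute": best, "children": children}


def predict(tree: Tree, row: Row) -> str:
    """Walk from the root to a leaf, following the row's attribute values."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attribute"]]]
    return tree


# Hypothetical training data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["go out", "stay in", "stay in", "stay in", "go out", "go out"]

tree = id3(rows, labels, ["outlook", "windy"])
print(predict(tree, {"outlook": "sunny", "windy": "yes"}))  # stay in
```

On this toy data `outlook` has the larger gain (about 0.667 bits versus about 0.082 for `windy`), so it becomes the root, and only the mixed `sunny` branch is split again on `windy`.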

Examples:
1. 8-bit strings:
dataset1 {11110000, 10101010, 11000011}

dataset2 {00000000, 11111111}

Each string in dataset1 contains four 1s and four 0s, so the entropy of its bits is (logarithms base 2):

    E(S1) = -(4/8)*log2(4/8) - (1-4/8)*log2(1-4/8) = 1

Each string in dataset2 consists of a single repeated bit, so its entropy is:

    E(S2) = -(8/8)*log2(8/8) = 0
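The arithmetic above can be checked with a short Python sketch, assuming (as the example does) that each 8-bit string is treated as a sample of eight bits and that logarithms are base 2. The function name `entropy` is our own.

```python
from collections import Counter
from math import log2


def entropy(symbols) -> float:
    """Base-2 Shannon entropy of the empirical distribution of `symbols`."""
    n = len(symbols)
    # sum of p * log2(1/p), written with counts so that p = c / n
    return sum((c / n) * log2(n / c) for c in Counter(symbols).values())


dataset1 = ["11110000", "10101010", "11000011"]
dataset2 = ["00000000", "11111111"]

# Treat each string as a sample of 8 bits and measure its entropy.
print([entropy(s) for s in dataset1])  # [1.0, 1.0, 1.0]
print([entropy(s) for s in dataset2])  # [0.0, 0.0]
```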

2. 8 points: 2 are red and 6 are blue. Split the dataset by the line x = 2.
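The post does not list the coordinates of the eight points, so the sketch below uses hypothetical coordinates, chosen only so that 2 points are red and 6 are blue; only the parent entropy (about 0.811 bits for 2 red versus 6 blue) follows directly from the example. The function `gain_for_threshold` and the convention that points with x < 2 go to the left subset are assumptions for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels) -> float:
    """Base-2 Shannon entropy of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def gain_for_threshold(points, threshold: float) -> float:
    """Information gain of splitting (x, y, label) points by the line x = threshold.

    Points with x < threshold go left and the rest go right (an assumed convention).
    """
    labels = [label for _, _, label in points]
    left = [label for x, _, label in points if x < threshold]
    right = [label for x, _, label in points if x >= threshold]
    n = len(points)
    weighted = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - weighted


# Hypothetical coordinates: 2 red and 6 blue points, as in the example.
points = [
    (1.0, 1.0, "red"), (1.5, 3.0, "red"),
    (2.5, 1.0, "blue"), (3.0, 2.0, "blue"), (3.5, 3.0, "blue"),
    (4.0, 1.5, "blue"), (4.5, 2.5, "blue"), (5.0, 1.0, "blue"),
]

print(round(entropy([p[2] for p in points]), 3))  # 0.811
print(round(gain_for_threshold(points, 2.0), 3))  # 0.811
```

With these made-up coordinates the line x = 2 separates the colours perfectly, so both subsets have entropy 0 and the gain equals the parent entropy; a line that left red and blue points mixed on one side would score lower.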

When classifying the data above, the decision tree could look like this:

References:

http://www.marmakoide.org/download/teaching/dm/dm-decision-trees.pdf

More information: a decision tree demo using the ID3 algorithm.

Original article: https://www.cnblogs.com/sunshinewill/p/2985734.html