  • Decision Tree

    Decision Tree builds classification or regression models in the form of a tree structure. It breaks the dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed at the same time.

    Decision Tree learning uses a top-down recursive method. The basic idea is to construct a tree along which information entropy declines as fast as possible, so that the entropy of the instances in each leaf node is zero. Each internal node of the tree corresponds to an attribute, and each leaf node corresponds to a class label. A fitted tree is therefore just a set of nested if/else rules, as the small sketch below illustrates.
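
    A minimal sketch of this structure (the attribute names, thresholds, and labels here are invented for the example):

    ```python
    # A hypothetical fitted tree for a toy "play tennis" dataset, written out as
    # nested if/else tests: internal nodes test an attribute, leaves return a label.
    def predict(sample: dict) -> str:
        if sample["outlook"] == "sunny":      # internal node: attribute "outlook"
            if sample["humidity"] > 70:       # internal node: attribute "humidity"
                return "no"                   # leaf node: class label
            return "yes"
        if sample["outlook"] == "rain":
            return "no" if sample["windy"] else "yes"
        return "yes"                          # outlook == "overcast"

    print(predict({"outlook": "sunny", "humidity": 85, "windy": False}))  # -> "no"
    ```
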
    Advantages:

    • Decision Trees are easy to explain. They result in a set of rules and follow the same approach humans generally take when making decisions.
    • The interpretation of even a complex Decision Tree can be simplified through visualization, so it can be understood by everyone.
    • They have almost no hyper-parameters.

    Information Gain

    • The entropy of a random variable $X$ with distribution $P(X = x_i) = p_i$ is:

      $H(X) = -\sum_{i=1}^{n} p_i \log p_i$

    • By the definition of information entropy, we can calculate the empirical entropy of a dataset $D$:

      $H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$

      where $|D|$ is the number of samples and $|C_k|$ is the number of samples belonging to class $C_k$.
    • We can also calculate the empirical conditional entropy of $D$ given a feature $A$ that partitions $D$ into subsets $D_1, \dots, D_n$:

      $H(D|A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|}$

    • From these two quantities we can calculate the information gain:

      $g(D, A) = H(D) - H(D|A)$

    • Information gain ratio:

      $g_R(D, A) = \frac{g(D, A)}{H_A(D)}$, where $H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}$

    • Gini index:

      $\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$

      For binary classification, with $p$ the probability of the positive class:

      $\mathrm{Gini}(p) = 2p(1 - p)$

      On the condition of feature $A$ splitting $D$ into two parts $D_1$ and $D_2$ (as in CART's binary splits):

      $\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|} \mathrm{Gini}(D_1) + \frac{|D_2|}{|D|} \mathrm{Gini}(D_2)$
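
    As a concrete check of these formulas, here is a small Python sketch (the toy data is made up for the example) that computes the empirical entropy, conditional entropy, information gain, and Gini index for a categorical feature:

    ```python
    import math
    from collections import Counter

    def entropy(labels):
        """Empirical entropy H(D) of a list of class labels (log base 2)."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def conditional_entropy(feature, labels):
        """Empirical conditional entropy H(D|A) for a categorical feature A."""
        n = len(labels)
        h = 0.0
        for value in set(feature):
            subset = [y for x, y in zip(feature, labels) if x == value]
            h += (len(subset) / n) * entropy(subset)
        return h

    def information_gain(feature, labels):
        """g(D, A) = H(D) - H(D|A)."""
        return entropy(labels) - conditional_entropy(feature, labels)

    def gini(labels):
        """Gini(p) = 1 - sum_k p_k^2."""
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    # Toy data: does "outlook" help predict "play"?
    outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
    play    = ["no",    "no",    "yes",      "yes",  "no",   "yes"]
    print(information_gain(outlook, play))  # 0.667 > 0: the split reduces entropy
    print(gini(play))                       # 0.5: impurity of the whole dataset
    ```
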

    Three Building Algorithms

    • ID3: maximizing information gain
    • C4.5: maximizing the information gain ratio
    • CART
      • Regression Tree: minimizing the squared error.
      • Classification Tree: minimizing the Gini index.
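
    In practice these criteria are usually available off the shelf. For instance, scikit-learn's tree module implements an optimized version of CART, and the split criterion can be switched. A minimal sketch, assuming scikit-learn is installed (criterion names as in recent versions):

    ```python
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    X, y = load_iris(return_X_y=True)

    # Classification: impurity measured by the Gini index (the CART default)...
    clf_gini = DecisionTreeClassifier(criterion="gini").fit(X, y)
    # ...or by entropy, i.e. splits chosen by information gain, as in ID3/C4.5.
    clf_entropy = DecisionTreeClassifier(criterion="entropy").fit(X, y)

    # Regression: impurity measured by the squared error.
    reg = DecisionTreeRegressor(criterion="squared_error").fit(X, y)
    ```
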

    Decision Tree Algorithm Pseudocode

    • Place the best attribute of the dataset at the root of the tree. How the best attribute is selected is shown in Three Building Algorithms above.
    • Split the training set into subsets by the values of the best attribute.
    • Repeat steps 1 and 2 on each subset until every branch of the tree ends in a leaf node. A runnable sketch of this recursion follows the list.
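
    The recursion maps almost directly onto code. Below is a minimal ID3-style sketch for categorical features (the helper names and toy data are my own; stopping is simplified to pure nodes or exhausted attributes):

    ```python
    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(rows, labels, attr):
        """g(D, A): entropy reduction from splitting the rows on attribute attr."""
        n = len(labels)
        gain = entropy(labels)
        for value in {row[attr] for row in rows}:
            subset = [y for row, y in zip(rows, labels) if row[attr] == value]
            gain -= (len(subset) / n) * entropy(subset)
        return gain

    def build_tree(rows, labels, attrs):
        # Stop: pure node, or no attributes left -> leaf with the majority label.
        if len(set(labels)) == 1 or not attrs:
            return Counter(labels).most_common(1)[0][0]
        # Step 1: place the best attribute at the root (here: max information gain).
        best = max(attrs, key=lambda a: information_gain(rows, labels, a))
        tree = {best: {}}
        # Step 2: split on its values, then repeat on each subset.
        for value in {row[best] for row in rows}:
            idx = [i for i, row in enumerate(rows) if row[best] == value]
            tree[best][value] = build_tree([rows[i] for i in idx],
                                           [labels[i] for i in idx],
                                           [a for a in attrs if a != best])
        return tree

    rows = [{"outlook": "sunny", "windy": False}, {"outlook": "sunny", "windy": True},
            {"outlook": "rain", "windy": False}, {"outlook": "overcast", "windy": True}]
    labels = ["yes", "no", "yes", "yes"]
    print(build_tree(rows, labels, ["outlook", "windy"]))
    ```
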

    Random Forest

    Random Forest classifiers work around the limitations of a single decision tree (notably its tendency to overfit) by creating a whole bunch of decision trees (hence 'forest'), each trained on a random subset of the training samples (bagging, drawn with replacement) and of the features (drawn without replacement). The trees then vote together to produce the final result.
    In one word, Random Forest builds on CART and adds randomness, in three ways (the first two are illustrated in the sketch after the list):

    • Randomness 1: train each tree on a subset of the training set selected by bagging (sampling with replacement).

    • Randomness 2: train each tree on a subset of the features (sampling without replacement). For example, select 10 features out of the 100 features in the dataset.

    • Randomness 3: add new features through low-dimensional projections of the original features.
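
    A minimal sketch of the first two kinds of randomness (the function names and parameters are my own invention; a production implementation such as scikit-learn's RandomForestClassifier handles all of this internally):

    ```python
    import numpy as np
    from collections import Counter
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    def fit_forest(X, y, n_trees=25, n_features=2):
        """Tiny bagged forest: each tree sees a bootstrap sample of the rows
        and a random subset of the columns."""
        forest = []
        for _ in range(n_trees):
            # Randomness 1: bagging -- sample rows with replacement.
            rows = rng.integers(0, len(X), size=len(X))
            # Randomness 2: sample a feature subset without replacement.
            cols = rng.choice(X.shape[1], size=n_features, replace=False)
            tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
            forest.append((tree, cols))
        return forest

    def predict_forest(forest, X):
        """The trees vote together: majority vote per sample."""
        votes = np.array([tree.predict(X[:, cols]) for tree, cols in forest])
        return np.array([Counter(v).most_common(1)[0][0] for v in votes.T])

    X, y = load_iris(return_X_y=True)
    forest = fit_forest(X, y)
    print((predict_forest(forest, X) == y).mean())  # ensemble accuracy on the training set
    ```
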

    Postscript

    I wanted to show off by writing this blog in English, hoping to use it to train my writing skills; reality slapped me mercilessly in the face ( ̄ε(# ̄)

