Decision Tree Classification

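    This article shows how to classify data with decision trees using the rpart package, and how to visualize the result with rpart.plot. A decision tree (classification tree) is a widely used classification method and a form of supervised learning. We start with a set of samples, each described by a set of attributes and a class label fixed in advance. From these samples we learn a classifier that predicts the class of new, unseen objects. Because the training labels are known beforehand, this kind of machine learning is called supervised learning.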
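A classification tree "learns" by repeatedly choosing the attribute split that makes the resulting child nodes as pure as possible. For `method = "class"`, rpart's default purity criterion is Gini impurity, 1 - Σ p_k². The base-R sketch below only illustrates that idea and is not rpart's implementation. The helpers `gini()` and `split_gini()` are hypothetical names, and the six labelled rows are made-up toy values that use letter codes from the attribute list. For simplicity, each attribute level becomes its own child node, whereas rpart always makes binary splits and groups factor levels into two subsets.

```r
# Gini impurity of a set of class labels: 1 - sum(p_k^2)
gini <- function(y) {
  p <- prop.table(table(y))
  1 - sum(p^2)
}

# Weighted Gini impurity after partitioning y by the levels of attribute x.
# Simplification: one child per level (rpart itself only makes binary splits).
split_gini <- function(y, x) {
  groups  <- split(y, x)
  weights <- sapply(groups, length) / length(y)
  sum(weights * sapply(groups, gini))
}

# Made-up toy rows (e = edible, p = poisonous)
y     <- factor(c("e", "e", "e", "p", "p", "p"))
odor  <- c("n", "n", "a", "f", "f", "n")
shape <- c("x", "f", "x", "f", "x", "f")

gini(y)               # impurity before any split
split_gini(y, odor)   # weighted impurity after splitting on odor
split_gini(y, shape)  # weighted impurity after splitting on shape
```

On this toy data the root impurity is 0.5. Splitting on odor lowers the weighted impurity to 2/9, while splitting on shape only lowers it to 4/9, so a tree would choose odor first.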

    The test data is the Mushroom Data Set. Its attributes are as follows:

    Attribute Information:

    1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
    2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
    3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
    4. bruises?: bruises=t, no=f
    5. odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
    6. gill-attachment: attached=a, descending=d, free=f, notched=n
    7. gill-spacing: close=c, crowded=w, distant=d
    8. gill-size: broad=b, narrow=n
    9. gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
    10. stalk-shape: enlarging=e, tapering=t
    11. stalk-root: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
    12. stalk-surface-above-ring: fibrous=f, scaly=y, silky=k, smooth=s
    13. stalk-surface-below-ring: fibrous=f, scaly=y, silky=k, smooth=s
    14. stalk-color-above-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
    15. stalk-color-below-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
    16. veil-type: partial=p, universal=u
    17. veil-color: brown=n, orange=o, white=w, yellow=y
    18. ring-number: none=n, one=o, two=t
    19. ring-type: cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z
    20. spore-print-color: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y
    21. population: abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y
    22. habitat: grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d
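      The download link for the data is given in the article 关联规则-R语言实现 ("Association Rules: Implementation in R"). In the data file, the first column is the class label (e = edible, p = poisonous), and the remaining 22 columns hold the attributes above, in order.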
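Every attribute is stored as a one-letter code, so translating codes into readable labels can help when you inspect the data or the tree. The base-R sketch below does this for one attribute. The helper `decode()` and the lookup vector `odor_codes` are hypothetical names, and the lookup is typed in by hand from the odor line of the attribute list.

```r
# Lookup table for the odor attribute, copied from the attribute list above
odor_codes <- c(a = "almond", l = "anise", c = "creosote", y = "fishy",
                f = "foul", m = "musty", n = "none", p = "pungent",
                s = "spicy")

# Translate single-letter codes (character or factor) into readable labels;
# codes not present in the lookup become NA
decode <- function(x, codes) unname(codes[as.character(x)])

decode(c("n", "f", "a"), odor_codes)
```

With the class label in the first column (X1), odor is the sixth column. Running `data_ms$X6 <- factor(decode(data_ms$X6, odor_codes))` would therefore relabel it, and the same pattern works for the other attributes.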

    Code

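    library(rpart)
    library(rpart.plot)
    # Column 1 is the class label; the file has no header row
    data_ms <- read.csv(file.choose(), header = FALSE, stringsAsFactors = TRUE)
    # header = FALSE names the columns V1..V23; rename them X1..X23 to match the code below
    names(data_ms) <- paste0("X", seq_along(data_ms))
    str(data_ms)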
    
    table(data_ms$X1)
    
       e    p 
    4208 3916 
    
    prop.table(table(data_ms$X1))
    
            e         p 
    0.5179714 0.4820286 
    
    prop.table(table(data_ms$X1,data_ms$X2),2)
    
        b         c         f         k         s         x
      e 0.8938053 0.0000000 0.5063452 0.2753623 1.0000000 0.5328228
      p 0.1061947 1.0000000 0.4936548 0.7246377 0.0000000 0.4671772
    
    fit <- rpart(X1 ~.,
                 data=data_ms,
                 method="class")
    # Visualize the classification tree
    rpart.plot(fit, type=4, extra=1, shadow.col="gray", box.col="green",
               border.col="blue", split.col="red", split.cex=1.2, main="Decision tree")
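The call `prop.table(table(data_ms$X1, data_ms$X2), 2)` uses margin 2, so each column is divided by its own total and every column sums to 1. Each entry is therefore the share of mushrooms with a given cap shape that are edible (e) or poisonous (p). In the output above, for example, every conical (c) cap is poisonous and every sunken (s) cap is edible. The small illustration below uses made-up labels and a hypothetical table `tab`.

```r
# Toy contingency table: rows = class label, columns = levels of one attribute
tab <- table(class = c("e", "e", "p", "p", "p", "e"),
             shape = c("b", "c", "c", "x", "x", "x"))

# margin = 2: divide each column by its column total
col_prop <- prop.table(tab, 2)
col_prop

# every column now sums to 1
colSums(col_prop)
```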
    

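    The resulting tree shows that the mushroom data is well suited to decision-tree classification: the rules it learns for judging whether a mushroom is poisonous are clear and easy to read.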
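A common way to check whether such rules also hold on data the tree has not seen is to hold out part of the rows, fit on the rest, and compare the predictions with the true labels. The sketch below does this with rpart's `predict(..., type = "class")` on a small synthetic data frame, not the author's data. The column names `label`, `odor` and `shape` and the 70/30 split are assumptions for illustration, and the data is generated so that odor alone determines the class.

```r
library(rpart)

set.seed(42)
n <- 200
# Hypothetical synthetic data: odor fully determines the label, shape is noise
odor  <- factor(sample(c("n", "f"), n, replace = TRUE))
shape <- factor(sample(c("x", "f", "b"), n, replace = TRUE))
label <- factor(ifelse(odor == "n", "e", "p"))
df <- data.frame(label, odor, shape)

# Hold out 30% of the rows to test the learned rules on unseen data
test_idx <- sample(n, size = round(0.3 * n))
train <- df[-test_idx, ]
test  <- df[test_idx, ]

fit  <- rpart(label ~ ., data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix and accuracy on the held-out rows
conf_mat <- table(predicted = pred, actual = test$label)
conf_mat
accuracy <- mean(pred == test$label)
accuracy
```

To apply this to the real data, replace `df` and `label` with `data_ms` and `X1`. On real data the accuracy will not necessarily be perfect.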

Original article: https://www.cnblogs.com/shangfr/p/4922896.html