  • The Road to Machine Learning: Python k-nearest-neighbors classifier (KNeighborsClassifier) for iris classification

    Learning the k-nearest-neighbors classifier API in Python.

    The source code is available in my Git repository: https://github.com/linyi0604/MachineLearning
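
For orientation before the script: a k-nearest-neighbors classifier keeps the training samples and labels a new point by majority vote among its k closest ones. The following is a minimal NumPy sketch of that rule, not scikit-learn's implementation; the function name `knn_predict` and the toy data are made up for illustration.

```python
import numpy as np
from collections import Counter


def knn_predict(x_train, y_train, x_query, k=5):
    """Label each row of x_query by majority vote among its k nearest training rows.

    Hypothetical helper for illustration only; it uses brute-force Euclidean distance.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)
    x_query = np.asarray(x_query, dtype=float)
    predictions = []
    for q in x_query:
        # Distance from the query point to every stored training point
        distances = np.sqrt(((x_train - q) ** 2).sum(axis=1))
        # Indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # Majority vote over the labels of those neighbors
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Toy data: two well-separated clusters (made up for illustration)
toy_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
toy_y = [0, 0, 0, 1, 1, 1]
toy_pred = knn_predict(toy_x, toy_y, [[0.1, 0.1], [5.0, 5.1]], k=3)
print(toy_pred)
```

Every prediction scans the whole training set, which is why the method is expensive in both time and memory. scikit-learn's `KNeighborsClassifier` defaults to `n_neighbors=5`, which is the value the script uses by calling the constructor with no arguments.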

    from sklearn.datasets import load_iris
    # sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split lives in model_selection
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import classification_report
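
    '''
    k-nearest-neighbors classifier
    Predicts a label for a new sample from the labels of the nearby training samples
    It is a non-parametric method
    Prediction is costly in time and memory, because the training set is kept and searched
    '''

    '''
    1. Prepare the data
    '''
    # Load the iris dataset
    iris = load_iris()
    # Check the size of the data
    # print(iris.data.shape)    # (150, 4)
    # Show the dataset description
    # print(iris.DESCR)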
    '''
    Iris Plants Database
    ====================

    Notes
    -----
    Data Set Characteristics:
        :Number of Instances: 150 (50 in each of three classes)
        :Number of Attributes: 4 numeric, predictive attributes and the class
        :Attribute Information:
            - sepal length in cm
            - sepal width in cm
            - petal length in cm
            - petal width in cm
            - class:
                    - Iris-Setosa
                    - Iris-Versicolour
                    - Iris-Virginica
        :Summary Statistics:

        ============== ==== ==== ======= ===== ====================
                        Min  Max   Mean    SD   Class Correlation
        ============== ==== ==== ======= ===== ====================
        sepal length:   4.3  7.9   5.84   0.83    0.7826
        sepal width:    2.0  4.4   3.05   0.43   -0.4194
        petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
        petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
        ============== ==== ==== ======= ===== ====================

        :Missing Attribute Values: None
        :Class Distribution: 33.3% for each of 3 classes.
        :Creator: R.A. Fisher
        :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
        :Date: July, 1988

    This is a copy of UCI ML iris datasets.
    http://archive.ics.uci.edu/ml/datasets/Iris

    The famous Iris database, first used by Sir R.A Fisher

    This is perhaps the best known database to be found in the
    pattern recognition literature.  Fisher's paper is a classic in the field and
    is referenced frequently to this day.  (See Duda & Hart, for example.)  The
    data set contains 3 classes of 50 instances each, where each class refers to a
    type of iris plant.  One class is linearly separable from the other 2; the
    latter are NOT linearly separable from each other.

    References
    ----------
       - Fisher,R.A. "The use of multiple measurements in taxonomic problems"
         Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
         Mathematical Statistics" (John Wiley, NY, 1950).
       - Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
         (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
       - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
         Structure and Classification Rule for Recognition in Partially Exposed
         Environments".  IEEE Transactions on Pattern Analysis and Machine
         Intelligence, Vol. PAMI-2, No. 1, 67-71.
       - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
         on Information Theory, May 1972, 431-433.
       - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II
         conceptual clustering system finds 3 classes in the data.
       - Many, many more ...

       There are 150 samples in total,
       spread evenly across 3 iris species.
       Each sample has 4 measurements: the length and width of the sepal and of the petal.
    '''

    '''
    2. Split into training and test sets
    '''
    # Hold out 25% of the samples for testing; the fixed seed makes the split reproducible
    x_train, x_test, y_train, y_test = train_test_split(iris.data,
                                                        iris.target,
                                                        test_size=0.25,
                                                        random_state=33)
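
    '''
    3. k-nearest-neighbors classifier: fit the model and predict
    '''
    # Standardize the features: fit the scaler on the training set only,
    # then apply the same transformation to the test set
    ss = StandardScaler()
    x_train = ss.fit_transform(x_train)
    x_test = ss.transform(x_test)

    # Create a k-nearest-neighbors model object
    knc = KNeighborsClassifier()
    # Fit the model on the training data
    knc.fit(x_train, y_train)
    # Predict labels for the test data
    y_predict = knc.predict(x_test)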

    '''
    4. Evaluate the model
    '''
    print("Accuracy:", knc.score(x_test, y_test))
    print("Other metrics:\n", classification_report(y_test, y_predict, target_names=iris.target_names))
    '''
    Accuracy: 0.8947368421052632
    Other metrics:
                  precision    recall  f1-score   support

         setosa       1.00      1.00      1.00         8
     versicolor       0.73      1.00      0.85        11
      virginica       1.00      0.79      0.88        19

    avg / total       0.92      0.89      0.90        38
    '''
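
The printed scores can be cross-checked by hand. The confusion matrix below is not from the original post; it is inferred from the reported precision, recall, and support. Setosa and virginica have precision 1.00 and versicolor has recall 1.00, so the only possible errors are virginica samples predicted as versicolor, and 4 such errors out of 19 match the reported numbers. A short sketch recomputing the metrics from that inferred matrix:

```python
import numpy as np

# Confusion matrix inferred from the report (rows = true class, columns = predicted class)
labels = ["setosa", "versicolor", "virginica"]
confusion = np.array([
    [8, 0, 0],    # all 8 setosa samples correct
    [0, 11, 0],   # all 11 versicolor samples correct
    [0, 4, 15],   # 4 of 19 virginica samples predicted as versicolor
])

true_positives = np.diag(confusion)
precision = true_positives / confusion.sum(axis=0)  # correct / predicted as that class
recall = true_positives / confusion.sum(axis=1)     # correct / actually that class
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = true_positives.sum() / confusion.sum()

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:>10}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}")
print(f"accuracy={accuracy:.4f}")
```

The accuracy is 34 correct predictions out of 38 test samples, which is the 0.8947 that `knc.score` reports.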
  • Original post: https://www.cnblogs.com/Lin-Yi/p/8970527.html