  • Threshold classification method

     Dataset: seeds.tsv

    15.26    14.84    0.871    5.763    3.312    2.221    5.22    Kama
    14.88    14.57    0.8811    5.554    3.333    1.018    4.956    Kama
    14.29    14.09    0.905    5.291    3.337    2.699    4.825    Kama
    13.84    13.94    0.8955    5.324    3.379    2.259    4.805    Kama
    16.14    14.99    0.9034    5.658    3.562    1.355    5.175    Kama
    14.38    14.21    0.8951    5.386    3.312    2.462    4.956    Kama
    14.69    14.49    0.8799    5.563    3.259    3.586    5.219    Kama
    14.11    14.1    0.8911    5.42    3.302    2.7    5.0    Kama
    16.63    15.46    0.8747    6.053    3.465    2.04    5.877    Kama
    16.44    15.25    0.888    5.884    3.505    1.969    5.533    Kama
    15.26    14.85    0.8696    5.714    3.242    4.543    5.314    Kama
    14.03    14.16    0.8796    5.438    3.201    1.717    5.001    Kama
    13.89    14.02    0.888    5.439    3.199    3.986    4.738    Kama
    13.78    14.06    0.8759    5.479    3.156    3.136    4.872    Kama
    13.74    14.05    0.8744    5.482    3.114    2.932    4.825    Kama
    14.59    14.28    0.8993    5.351    3.333    4.185    4.781    Kama
    13.99    13.83    0.9183    5.119    3.383    5.234    4.781    Kama
    15.69    14.75    0.9058    5.527    3.514    1.599    5.046    Kama
    14.7    14.21    0.9153    5.205    3.466    1.767    4.649    Kama
    12.72    13.57    0.8686    5.226    3.049    4.102    4.914    Kama
    14.16    14.4    0.8584    5.658    3.129    3.072    5.176    Kama
    14.11    14.26    0.8722    5.52    3.168    2.688    5.219    Kama
    15.88    14.9    0.8988    5.618    3.507    0.7651    5.091    Kama
    12.08    13.23    0.8664    5.099    2.936    1.415    4.961    Kama
    15.01    14.76    0.8657    5.789    3.245    1.791    5.001    Kama
    16.19    15.16    0.8849    5.833    3.421    0.903    5.307    Kama
    13.02    13.76    0.8641    5.395    3.026    3.373    4.825    Kama
    12.74    13.67    0.8564    5.395    2.956    2.504    4.869    Kama
    14.11    14.18    0.882    5.541    3.221    2.754    5.038    Kama
    13.45    14.02    0.8604    5.516    3.065    3.531    5.097    Kama
    13.16    13.82    0.8662    5.454    2.975    0.8551    5.056    Kama
    15.49    14.94    0.8724    5.757    3.371    3.412    5.228    Kama
    14.09    14.41    0.8529    5.717    3.186    3.92    5.299    Kama
    13.94    14.17    0.8728    5.585    3.15    2.124    5.012    Kama
    15.05    14.68    0.8779    5.712    3.328    2.129    5.36    Kama
    16.12    15.0    0.9    5.709    3.485    2.27    5.443    Kama
    16.2    15.27    0.8734    5.826    3.464    2.823    5.527    Kama
    17.08    15.38    0.9079    5.832    3.683    2.956    5.484    Kama
    14.8    14.52    0.8823    5.656    3.288    3.112    5.309    Kama
    14.28    14.17    0.8944    5.397    3.298    6.685    5.001    Kama
    13.54    13.85    0.8871    5.348    3.156    2.587    5.178    Kama
    13.5    13.85    0.8852    5.351    3.158    2.249    5.176    Kama
    13.16    13.55    0.9009    5.138    3.201    2.461    4.783    Kama
    15.5    14.86    0.882    5.877    3.396    4.711    5.528    Kama
    15.11    14.54    0.8986    5.579    3.462    3.128    5.18    Kama
    13.8    14.04    0.8794    5.376    3.155    1.56    4.961    Kama
    15.36    14.76    0.8861    5.701    3.393    1.367    5.132    Kama
    14.99    14.56    0.8883    5.57    3.377    2.958    5.175    Kama
    14.79    14.52    0.8819    5.545    3.291    2.704    5.111    Kama
    14.86    14.67    0.8676    5.678    3.258    2.129    5.351    Kama
    14.43    14.4    0.8751    5.585    3.272    3.975    5.144    Kama
    15.78    14.91    0.8923    5.674    3.434    5.593    5.136    Kama
    14.49    14.61    0.8538    5.715    3.113    4.116    5.396    Kama
    14.33    14.28    0.8831    5.504    3.199    3.328    5.224    Kama
    14.52    14.6    0.8557    5.741    3.113    1.481    5.487    Kama
    15.03    14.77    0.8658    5.702    3.212    1.933    5.439    Kama
    14.46    14.35    0.8818    5.388    3.377    2.802    5.044    Kama
    14.92    14.43    0.9006    5.384    3.412    1.142    5.088    Kama
    15.38    14.77    0.8857    5.662    3.419    1.999    5.222    Kama
    12.11    13.47    0.8392    5.159    3.032    1.502    4.519    Kama
    11.42    12.86    0.8683    5.008    2.85    2.7    4.607    Kama
    11.23    12.63    0.884    4.902    2.879    2.269    4.703    Kama
    12.36    13.19    0.8923    5.076    3.042    3.22    4.605    Kama
    13.22    13.84    0.868    5.395    3.07    4.157    5.088    Kama
    12.78    13.57    0.8716    5.262    3.026    1.176    4.782    Kama
    12.88    13.5    0.8879    5.139    3.119    2.352    4.607    Kama
    14.34    14.37    0.8726    5.63    3.19    1.313    5.15    Kama
    14.01    14.29    0.8625    5.609    3.158    2.217    5.132    Kama
    14.37    14.39    0.8726    5.569    3.153    1.464    5.3    Kama
    12.73    13.75    0.8458    5.412    2.882    3.533    5.067    Kama
    17.63    15.98    0.8673    6.191    3.561    4.076    6.06    Rosa
    16.84    15.67    0.8623    5.998    3.484    4.675    5.877    Rosa
    17.26    15.73    0.8763    5.978    3.594    4.539    5.791    Rosa
    19.11    16.26    0.9081    6.154    3.93    2.936    6.079    Rosa
    16.82    15.51    0.8786    6.017    3.486    4.004    5.841    Rosa
    16.77    15.62    0.8638    5.927    3.438    4.92    5.795    Rosa
    17.32    15.91    0.8599    6.064    3.403    3.824    5.922    Rosa
    20.71    17.23    0.8763    6.579    3.814    4.451    6.451    Rosa
    18.94    16.49    0.875    6.445    3.639    5.064    6.362    Rosa
    17.12    15.55    0.8892    5.85    3.566    2.858    5.746    Rosa
    16.53    15.34    0.8823    5.875    3.467    5.532    5.88    Rosa
    18.72    16.19    0.8977    6.006    3.857    5.324    5.879    Rosa
    20.2    16.89    0.8894    6.285    3.864    5.173    6.187    Rosa
    19.57    16.74    0.8779    6.384    3.772    1.472    6.273    Rosa
    19.51    16.71    0.878    6.366    3.801    2.962    6.185    Rosa
    18.27    16.09    0.887    6.173    3.651    2.443    6.197    Rosa
    18.88    16.26    0.8969    6.084    3.764    1.649    6.109    Rosa
    18.98    16.66    0.859    6.549    3.67    3.691    6.498    Rosa
    21.18    17.21    0.8989    6.573    4.033    5.78    6.231    Rosa
    20.88    17.05    0.9031    6.45    4.032    5.016    6.321    Rosa
    20.1    16.99    0.8746    6.581    3.785    1.955    6.449    Rosa
    18.76    16.2    0.8984    6.172    3.796    3.12    6.053    Rosa
    18.81    16.29    0.8906    6.272    3.693    3.237    6.053    Rosa
    18.59    16.05    0.9066    6.037    3.86    6.001    5.877    Rosa
    18.36    16.52    0.8452    6.666    3.485    4.933    6.448    Rosa
    16.87    15.65    0.8648    6.139    3.463    3.696    5.967    Rosa
    19.31    16.59    0.8815    6.341    3.81    3.477    6.238    Rosa
    18.98    16.57    0.8687    6.449    3.552    2.144    6.453    Rosa
    18.17    16.26    0.8637    6.271    3.512    2.853    6.273    Rosa
    18.72    16.34    0.881    6.219    3.684    2.188    6.097    Rosa
    16.41    15.25    0.8866    5.718    3.525    4.217    5.618    Rosa
    17.99    15.86    0.8992    5.89    3.694    2.068    5.837    Rosa
    19.46    16.5    0.8985    6.113    3.892    4.308    6.009    Rosa
    19.18    16.63    0.8717    6.369    3.681    3.357    6.229    Rosa
    18.95    16.42    0.8829    6.248    3.755    3.368    6.148    Rosa
    18.83    16.29    0.8917    6.037    3.786    2.553    5.879    Rosa
    18.85    16.17    0.9056    6.152    3.806    2.843    6.2    Rosa
    17.63    15.86    0.88    6.033    3.573    3.747    5.929    Rosa
    19.94    16.92    0.8752    6.675    3.763    3.252    6.55    Rosa
    18.55    16.22    0.8865    6.153    3.674    1.738    5.894    Rosa
    18.45    16.12    0.8921    6.107    3.769    2.235    5.794    Rosa
    19.38    16.72    0.8716    6.303    3.791    3.678    5.965    Rosa
    19.13    16.31    0.9035    6.183    3.902    2.109    5.924    Rosa
    19.14    16.61    0.8722    6.259    3.737    6.682    6.053    Rosa
    20.97    17.25    0.8859    6.563    3.991    4.677    6.316    Rosa
    19.06    16.45    0.8854    6.416    3.719    2.248    6.163    Rosa
    18.96    16.2    0.9077    6.051    3.897    4.334    5.75    Rosa
    19.15    16.45    0.889    6.245    3.815    3.084    6.185    Rosa
    18.89    16.23    0.9008    6.227    3.769    3.639    5.966    Rosa
    20.03    16.9    0.8811    6.493    3.857    3.063    6.32    Rosa
    20.24    16.91    0.8897    6.315    3.962    5.901    6.188    Rosa
    18.14    16.12    0.8772    6.059    3.563    3.619    6.011    Rosa
    16.17    15.38    0.8588    5.762    3.387    4.286    5.703    Rosa
    18.43    15.97    0.9077    5.98    3.771    2.984    5.905    Rosa
    15.99    14.89    0.9064    5.363    3.582    3.336    5.144    Rosa
    18.75    16.18    0.8999    6.111    3.869    4.188    5.992    Rosa
    18.65    16.41    0.8698    6.285    3.594    4.391    6.102    Rosa
    17.98    15.85    0.8993    5.979    3.687    2.257    5.919    Rosa
    20.16    17.03    0.8735    6.513    3.773    1.91    6.185    Rosa
    17.55    15.66    0.8991    5.791    3.69    5.366    5.661    Rosa
    18.3    15.89    0.9108    5.979    3.755    2.837    5.962    Rosa
    18.94    16.32    0.8942    6.144    3.825    2.908    5.949    Rosa
    15.38    14.9    0.8706    5.884    3.268    4.462    5.795    Rosa
    16.16    15.33    0.8644    5.845    3.395    4.266    5.795    Rosa
    15.56    14.89    0.8823    5.776    3.408    4.972    5.847    Rosa
    15.38    14.66    0.899    5.477    3.465    3.6    5.439    Rosa
    17.36    15.76    0.8785    6.145    3.574    3.526    5.971    Rosa
    15.57    15.15    0.8527    5.92    3.231    2.64    5.879    Rosa
    15.6    15.11    0.858    5.832    3.286    2.725    5.752    Rosa
    16.23    15.18    0.885    5.872    3.472    3.769    5.922    Rosa
    13.07    13.92    0.848    5.472    2.994    5.304    5.395    Canadian
    13.32    13.94    0.8613    5.541    3.073    7.035    5.44    Canadian
    13.34    13.95    0.862    5.389    3.074    5.995    5.307    Canadian
    12.22    13.32    0.8652    5.224    2.967    5.469    5.221    Canadian
    11.82    13.4    0.8274    5.314    2.777    4.471    5.178    Canadian
    11.21    13.13    0.8167    5.279    2.687    6.169    5.275    Canadian
    11.43    13.13    0.8335    5.176    2.719    2.221    5.132    Canadian
    12.49    13.46    0.8658    5.267    2.967    4.421    5.002    Canadian
    12.7    13.71    0.8491    5.386    2.911    3.26    5.316    Canadian
    10.79    12.93    0.8107    5.317    2.648    5.462    5.194    Canadian
    11.83    13.23    0.8496    5.263    2.84    5.195    5.307    Canadian
    12.01    13.52    0.8249    5.405    2.776    6.992    5.27    Canadian
    12.26    13.6    0.8333    5.408    2.833    4.756    5.36    Canadian
    11.18    13.04    0.8266    5.22    2.693    3.332    5.001    Canadian
    11.36    13.05    0.8382    5.175    2.755    4.048    5.263    Canadian
    11.19    13.05    0.8253    5.25    2.675    5.813    5.219    Canadian
    11.34    12.87    0.8596    5.053    2.849    3.347    5.003    Canadian
    12.13    13.73    0.8081    5.394    2.745    4.825    5.22    Canadian
    11.75    13.52    0.8082    5.444    2.678    4.378    5.31    Canadian
    11.49    13.22    0.8263    5.304    2.695    5.388    5.31    Canadian
    12.54    13.67    0.8425    5.451    2.879    3.082    5.491    Canadian
    12.02    13.33    0.8503    5.35    2.81    4.271    5.308    Canadian
    12.05    13.41    0.8416    5.267    2.847    4.988    5.046    Canadian
    12.55    13.57    0.8558    5.333    2.968    4.419    5.176    Canadian
    11.14    12.79    0.8558    5.011    2.794    6.388    5.049    Canadian
    12.1    13.15    0.8793    5.105    2.941    2.201    5.056    Canadian
    12.44    13.59    0.8462    5.319    2.897    4.924    5.27    Canadian
    12.15    13.45    0.8443    5.417    2.837    3.638    5.338    Canadian
    11.35    13.12    0.8291    5.176    2.668    4.337    5.132    Canadian
    11.24    13.0    0.8359    5.09    2.715    3.521    5.088    Canadian
    11.02    13.0    0.8189    5.325    2.701    6.735    5.163    Canadian
    11.55    13.1    0.8455    5.167    2.845    6.715    4.956    Canadian
    11.27    12.97    0.8419    5.088    2.763    4.309    5.0    Canadian
    11.4    13.08    0.8375    5.136    2.763    5.588    5.089    Canadian
    10.83    12.96    0.8099    5.278    2.641    5.182    5.185    Canadian
    10.8    12.57    0.859    4.981    2.821    4.773    5.063    Canadian
    11.26    13.01    0.8355    5.186    2.71    5.335    5.092    Canadian
    10.74    12.73    0.8329    5.145    2.642    4.702    4.963    Canadian
    11.48    13.05    0.8473    5.18    2.758    5.876    5.002    Canadian
    12.21    13.47    0.8453    5.357    2.893    1.661    5.178    Canadian
    11.41    12.95    0.856    5.09    2.775    4.957    4.825    Canadian
    12.46    13.41    0.8706    5.236    3.017    4.987    5.147    Canadian
    12.19    13.36    0.8579    5.24    2.909    4.857    5.158    Canadian
    11.65    13.07    0.8575    5.108    2.85    5.209    5.135    Canadian
    12.89    13.77    0.8541    5.495    3.026    6.185    5.316    Canadian
    11.56    13.31    0.8198    5.363    2.683    4.062    5.182    Canadian
    11.81    13.45    0.8198    5.413    2.716    4.898    5.352    Canadian
    10.91    12.8    0.8372    5.088    2.675    4.179    4.956    Canadian
    11.23    12.82    0.8594    5.089    2.821    7.524    4.957    Canadian
    10.59    12.41    0.8648    4.899    2.787    4.975    4.794    Canadian
    10.93    12.8    0.839    5.046    2.717    5.398    5.045    Canadian
    11.27    12.86    0.8563    5.091    2.804    3.985    5.001    Canadian
    11.87    13.02    0.8795    5.132    2.953    3.597    5.132    Canadian
    10.82    12.83    0.8256    5.18    2.63    4.853    5.089    Canadian
    12.11    13.27    0.8639    5.236    2.975    4.132    5.012    Canadian
    12.8    13.47    0.886    5.16    3.126    4.873    4.914    Canadian
    12.79    13.53    0.8786    5.224    3.054    5.483    4.958    Canadian
    13.37    13.78    0.8849    5.32    3.128    4.67    5.091    Canadian
    12.62    13.67    0.8481    5.41    2.911    3.306    5.231    Canadian
    12.76    13.38    0.8964    5.073    3.155    2.828    4.83    Canadian
    12.38    13.44    0.8609    5.219    2.989    5.472    5.045    Canadian
    12.67    13.32    0.8977    4.984    3.135    2.3    4.745    Canadian
    11.18    12.72    0.868    5.009    2.81    4.051    4.828    Canadian
    12.7    13.41    0.8874    5.183    3.091    8.456    5.0    Canadian
    12.37    13.47    0.8567    5.204    2.96    3.919    5.001    Canadian
    12.19    13.2    0.8783    5.137    2.981    3.631    4.87    Canadian
    11.23    12.88    0.8511    5.14    2.795    4.325    5.003    Canadian
    13.2    13.66    0.8883    5.236    3.232    8.315    5.056    Canadian
    11.84    13.21    0.8521    5.175    2.836    3.598    5.044    Canadian
    12.3    13.34    0.8684    5.243    2.974    5.637    5.063    Canadian

    Step 1: Load the data

     load.py

    import numpy as np

    def load_dataset(dataset_name):
        """Read <dataset_name>.tsv: each row holds 7 tab-separated numeric
        features followed by the variety name."""
        data = []
        label = []
        with open('{0}.tsv'.format(dataset_name), 'r') as f:
            for line in f:
                linedata = line.strip().split('\t')
                data.append([float(da) for da in linedata[:-1]])  # numeric features
                label.append(linedata[-1])                        # class label
        return np.array(data), np.array(label)
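
    A quick way to exercise load.py (a minimal sketch; it assumes seeds.tsv sits in the working directory):

    import numpy as np
    from load import load_dataset

    features, labels = load_dataset('seeds')
    print(features.shape)       # (210, 7): 210 seeds, 7 measurements each
    print(np.unique(labels))    # ['Canadian' 'Kama' 'Rosa']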

    Step 2: Design the classification model

    The threshold classification model searches all of the training data for the single best threshold, i.e. the feature column and cutoff value that give the best prediction accuracy on the training set.

    threshold.py

    import numpy as np

    def learn_model(features, labels):
        """Search every feature column and every value in it for the single
        threshold that best separates the boolean labels."""
        best_acc = -1.0
        for fi in range(features.shape[1]):      # column by column
            thresh = features[:, fi].copy()
            thresh.sort()
            for t in thresh:                     # every value in the column
                pred = (features[:, fi] > t)
                acc = (pred == labels).mean()
                if acc > best_acc:
                    best_acc = acc
                    best_fi = fi
                    best_t = t
        print('model -> best_fi, t, acc:', best_fi, best_t, best_acc)
        return best_t, best_fi

    def apply_model(features, model):
        t, fi = model
        return features[:, fi] > t

    def accuracy(features, labels, model):
        predictions = apply_model(features, model)
        return (predictions == labels).mean()    # True where prediction and label agree
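
    A quick sanity check of learn_model on made-up data (a sketch; the tiny two-column feature matrix and boolean labels below are invented purely to show the calls):

    import numpy as np
    from threshold import learn_model, apply_model, accuracy

    # Column 0 separates the classes cleanly; column 1 is noise.
    toy_features = np.array([[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]])
    toy_labels = np.array([False, False, True, True])

    model = learn_model(toy_features, toy_labels)       # should pick column 0
    print(apply_model(toy_features, model))             # [False False  True  True]
    print(accuracy(toy_features, toy_labels, model))    # 1.0 on this toy set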
        

    Step 3: Test the model's prediction accuracy

    Here we use 10-fold cross-validation: the samples are split into 10 parts, and in each round one part is held out as the test set while the remaining 9 parts are used for training. The advantage of this method is that it makes full use of the available samples; the drawback is the extra computation. The striped fold split used by the script is sketched below.
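
    A minimal sketch of the masking trick the script relies on (training[fold::10] = False holds out every 10th sample, starting at index fold), shown here on 10 dummy samples:

    import numpy as np

    n = 10                               # pretend there are only 10 samples
    for fold in range(3):                # show the first 3 of the 10 folds
        training = np.ones(n, bool)
        training[fold::10] = False       # hold out every 10th sample, offset by `fold`
        testing = ~training
        print(fold, training.astype(int))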

    seeds_threshold.py

    from load import load_dataset
    import numpy as np
    from threshold import learn_model, accuracy, apply_model

    features, labels = load_dataset('seeds')
    labels = (labels == 'Canadian')   # True for Canadian seeds, False for the rest
    sumacc = 0.0
    for fold in range(10):
        print('Cross-validation round', fold + 1)
        training = np.ones(len(features), bool)
        training[fold::10] = False    # hold out every 10th sample for testing
        testing = ~training
        model = learn_model(features[training], labels[training])
        acc = accuracy(features[testing], labels[testing], model)
        print('Test-set accuracy {0:.1%}'.format(acc))
        sumacc += acc
    sumacc /= 10
    print('Mean test-set accuracy {0:.1%}'.format(sumacc))

    Run seeds_threshold.py

    With that, a simple threshold classification model has been built.

    In the fifth cross-validation round the accuracy is 81%, and the learned threshold is t = 4.308 on feature column fi = 5; with this threshold we can predict whether a given seed is Canadian.
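
    For instance, that model can be applied directly with apply_model (a sketch reusing the fi = 5, t = 4.308 values quoted above; note that learn_model returns the pair as (t, fi)):

    from load import load_dataset
    from threshold import apply_model

    features, labels = load_dataset('seeds')
    model = (4.308, 5)                       # (t, fi) reported by the run above
    is_canadian = apply_model(features, model)
    print((is_canadian == (labels == 'Canadian')).mean())   # accuracy over the whole file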

    Now, building on this, we compute the classification threshold for each of the three seed varieties:

    seeds.py

    from load import load_dataset
    import numpy as np
    from threshold import learn_model, accuracy, apply_model

    features, rawlabels = load_dataset('seeds')
    labelset = set(rawlabels)
    print(labelset)

    for label in labelset:
        print(label)
        labels = (rawlabels == label)      # one-vs-rest: True for the current variety
        sumacc = 0.0
        bestacc = 0.0
        for fold in range(10):
            print('Cross-validation round', fold + 1)
            training = np.ones(len(features), bool)
            training[fold::10] = False
            testing = ~training
            model = learn_model(features[training], labels[training])
            acc = accuracy(features[testing], labels[testing], model)
            print('Test-set accuracy {0:.1%}'.format(acc))
            if acc > bestacc:              # keep the model from the best-scoring fold
                bestacc = acc
                bestmodel = model
            sumacc += acc
        sumacc /= 10
        print('Mean test-set accuracy {0:.1%}'.format(sumacc))
        print('Best model:', bestmodel)

    This yields a classification threshold for each of the three varieties.

    From these thresholds a classification tree can be drawn.

    Let data denote a sample to classify; data has 7 elements, one for each of the 7 features.

    Classification tree:
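
    The original post showed the tree as a figure. A minimal sketch of what the decision logic could look like in code: the Canadian cutoff (column 5, t = 4.308) comes from the run above, while the Rosa cutoff and its column are purely hypothetical placeholders (substitute the values printed by seeds.py):

    def classify(data, canadian_t=4.308, rosa_t=15.5):
        """data: the 7 feature values of one seed, in the order used in seeds.tsv.
        canadian_t is the threshold found above; rosa_t (on column 0) is a placeholder."""
        if data[5] > canadian_t:      # column 5 above its threshold -> Canadian
            return 'Canadian'
        elif data[0] > rosa_t:        # hypothetical: large value in column 0 -> Rosa
            return 'Rosa'
        else:
            return 'Kama'

    print(classify([15.26, 14.84, 0.871, 5.763, 3.312, 2.221, 5.22]))   # first row of seeds.tsv -> 'Kama'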

  • Original post: https://www.cnblogs.com/yuanzhenliu/p/5468074.html