  • Machine Learning: Model Evaluation Metrics

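    In a binary classification problem every sample is either positive or negative. On the test set, the number of positive samples that the classifier labels positive is written TP (true positive), and the number it labels negative is written FN (false negative). The number of negative samples that the classifier labels negative is written TN (true negative), and the number it labels positive is written FP (false positive). As shown in the figure, groups A and B each contain 100 samples in total.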
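    The four counts can be tallied directly from true and predicted labels. The sketch below is only an illustration: the function name `confusion_counts` and the label arrays are made up, and labels are assumed to be encoded as 1 for positive and 0 for negative.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels where 1 = positive, 0 = negative."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))      # positives predicted positive
    fn = int(np.sum(y_true & ~y_pred))     # positives predicted negative
    fp = int(np.sum(~y_true & y_pred))     # negatives predicted positive
    tn = int(np.sum(~y_true & ~y_pred))    # negatives predicted negative
    return tp, fn, fp, tn

# Made-up labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(tp, fn, fp, tn)
```

    TP + FN always equals the number of positive samples, and FP + TN the number of negative samples.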

    Precision: TP/(TP+FP)

    Recall: TP/(TP+FN)

    False alarm rate: 1 - TP/(TP+FP) = FP/(TP+FP)

    True positive rate: TPR = TP/(TP+FN)

    False positive rate: FPR = FP/(FP+TN)
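    As a quick worked example, the definitions above can be evaluated from the four counts. This is a minimal sketch: the helper name `binary_metrics` and the counts (100 positive and 100 negative samples) are hypothetical, and every denominator is assumed to be non-zero.

```python
def binary_metrics(tp, fn, fp, tn):
    """Evaluate the metrics defined above from the four confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm": 1.0 - precision,   # equals fp / (tp + fp)
        "tpr": recall,                    # TPR has the same formula as recall
        "fpr": fp / (fp + tn),
    }

# Hypothetical counts: 100 positive samples and 100 negative samples.
m = binary_metrics(tp=80, fn=20, fp=10, tn=90)
for name, value in m.items():
    print(name, round(value, 4))
```

    Note that recall and TPR are the same quantity, while precision and FPR have different denominators: precision is taken over predicted positives, FPR over actual negatives.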

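    The ROC curve plots the false positive rate on the horizontal axis and the true positive rate on the vertical axis.

    A good classifier keeps the false positive rate low and the true positive rate high. Ideally its ROC curve lies close to the line y = 1, which is the same as making the area under the curve as large as possible.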
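    One way to see how the curve and its area arise is to sweep a decision threshold over the classifier scores and record (FPR, TPR) at each step. The sketch below assumes a sample is predicted positive when its score is at or above the threshold; the names `roc_points` and `auc_trapezoid` and the scores are invented for illustration.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by sweeping a threshold from high to low."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    # Start above every score so the curve begins at (0, 0).
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    tpr = np.array([np.mean(scores[labels] >= t) for t in thresholds])
    fpr = np.array([np.mean(scores[~labels] >= t) for t in thresholds])
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule (fpr must be non-decreasing)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Made-up scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
fpr, tpr = roc_points(scores, labels)
auc = auc_trapezoid(fpr, tpr)
print(auc)
```

    A classifier that ranks every positive sample above every negative one reaches an area of 1, while random scoring gives an area close to 0.5.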

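    Example:

    Generate two groups of normally distributed samples, with labels marking one group as positive and the other as negative. The data file is available here:

    Link: https://pan.baidu.com/s/1X4hHygzSQHB3f8_kepxE8A
    Extraction code: 6uvg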
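    If the download is not available, a stand-in file with the layout the script expects (sample value in the first column, 0/1 label in the second) could be generated along these lines. The means, standard deviations, seed, and the 100-samples-per-class size are all assumptions chosen for illustration; they are not the parameters of the original data.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the file is reproducible
n_per_class = 100                # assumed group size

# Assumed distribution parameters, NOT those of the original data set.
positive = rng.normal(loc=15.0, scale=3.0, size=n_per_class)
negative = rng.normal(loc=10.0, scale=2.0, size=n_per_class)

values = np.concatenate([positive, negative])
labels = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
data = np.column_stack([values, labels])   # column 0: value, column 1: label

np.savetxt("data.txt", data, fmt="%.6f")

# Read the file back the same way the script does.
loaded = np.loadtxt("data.txt")
print(loaded.shape)
```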

    # -*- coding: utf-8 -*-
    """
    Fit a normal distribution to each class of a labelled 1-D sample,
    choose a decision threshold, and plot the resulting ROC curve.
    """
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats
    
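    def floatrange(start, stop, steps):
        """Return `steps` evenly spaced values from start to stop, inclusive."""
        return np.linspace(start, stop, steps)
    
    # Load the data: column 0 holds the sample value, column 1 the label (1 = positive, 0 = negative)
    data = np.loadtxt('data.txt')
    
    # Estimate the normal-distribution parameters of each class
    totalCount = len(data[:,0])
    positiveCount = int(np.sum(data[:,1]))
    negativeCount = totalCount - positiveCount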
    
    # Mean and standard deviation of the positive samples
    positiveIndex= np.where(data[:,1] ==1)
    positiveSum = np.sum(data[positiveIndex,0])
    positive_u =positiveSum / positiveCount
    positive_derta =np.sqrt(np.sum(np.square(data[positiveIndex,0] - positive_u )) / positiveCount)
    
    # Mean and standard deviation of the negative samples
    negativeIndex= np.where(data[:,1] ==0)
    negativeSum = np.sum(data[negativeIndex,0])
    negative_u =negativeSum / negativeCount
    negative_derta =np.sqrt(np.sum(np.square(data[negativeIndex,0] - negative_u )) / negativeCount)
    
    # Plot the fitted probability density curves (red: positive, blue: negative)
    x = floatrange(2,25,1000)
    print(positive_u,positive_derta)
    pd = np.exp(-1.0*np.square(x-positive_u) / (2*np.square(positive_derta))) /(positive_derta*np.sqrt(2*np.pi))
    nd = np.exp(-1.0*np.square(x-negative_u) / (2*np.square(negative_derta))) /(negative_derta*np.sqrt(2*np.pi))
    plt.figure(1)
    plt.plot(x,pd,'r')   
    plt.plot(x,nd,'b') 
        
    
    # Build a frozen normal distribution for each class
    positiveFun = stats.norm(positive_u,positive_derta)
    negativeFun = stats.norm(negative_u,negative_derta)
    
    positiveValue = positiveFun.cdf(x)
    negativeValue = negativeFun.cdf(x)
    
    
    # True positive rate and false positive rate when samples above threshold x are predicted positive
    positiveRate = 1 -positiveFun.cdf(x)
    negativeRate = 1 -negativeFun.cdf(x)
    
    # Threshold: pick the x that minimises (false negative rate + false positive rate)
    disvalue = positiveFun.cdf(x) + 1 - negativeFun.cdf(x)
    indexvalue = int(np.argmin(disvalue))   # index of the first minimum
    
    xvalue = x[indexvalue]
    
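    # Confusion matrix at the chosen threshold, estimated from the fitted distributions
    # (row 0: actual positive, row 1: actual negative)
    positivevalue = 1 - positiveFun.cdf(xvalue)   # TPR at the threshold
    negativevalue = 1 - negativeFun.cdf(xvalue)   # FPR at the threshold
    v00 = int(positivevalue * positiveCount)      # TP
    v01 = positiveCount - v00                     # FN
    v10 = int(negativevalue * negativeCount)      # FP
    v11 = negativeCount - v10                     # TN
    print("disvalue:",xvalue)
    print("positiverate:",positivevalue,"negativerate:",negativevalue)
    print(v00,",",v01)
    print(v10,",",v11)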
    
    
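    # Mark the threshold on the density plot with a green vertical line
    xdis = [xvalue,xvalue] 
    ydis = [0,0.2]  
    plt.plot(xdis,ydis,'g')
    
    # ROC curve
    plt.figure(2)
    plt.plot(negativeRate,positiveRate,'r')
    plt.xlabel('FPR')
    plt.ylabel('TPR')
    plt.show()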

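    Running the script prints the chosen threshold, the TPR and FPR at that threshold, and the confusion matrix. It also draws two figures: the fitted class densities with the threshold marked, and the ROC curve.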
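    Because the script models each class as a normal distribution, the area under its ROC curve can also be checked against the standard closed form for two normal score distributions, Φ((μ₊ − μ₋) / √(σ₊² + σ₋²)). The sketch below compares that value with a numerical integration of the curve. The parameter values are hypothetical stand-ins for the fitted `positive_u`, `positive_derta`, `negative_u`, and `negative_derta`, and the grid is made deliberately wide so both rates run from 1 down to 0.

```python
import numpy as np
from scipy import stats

# Hypothetical fitted parameters (stand-ins for the values the script estimates).
positive_u, positive_derta = 15.0, 3.0
negative_u, negative_derta = 10.0, 2.0

# Thresholds covering essentially the whole range of both distributions.
x = np.linspace(-20.0, 50.0, 20001)
tpr = 1 - stats.norm(positive_u, positive_derta).cdf(x)
fpr = 1 - stats.norm(negative_u, negative_derta).cdf(x)

# x is ascending, so fpr is descending; reverse both to integrate left to right.
f, t = fpr[::-1], tpr[::-1]
auc_numeric = float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2.0))

# Closed form: probability that a positive sample exceeds a negative one.
auc_closed = float(stats.norm.cdf(
    (positive_u - negative_u) / np.hypot(positive_derta, negative_derta)))

print(auc_numeric, auc_closed)
```

    The further apart the two class means are relative to their spreads, the closer the area gets to 1.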

  • Original article: https://www.cnblogs.com/kabe/p/10512474.html