  • ROC curve evaluation and anomaly filtering

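    1. For details, see https://www.cnblogs.com/mdevelopment/p/9456486.html

    Review of the ROC curve:

    The ROC curve shows how well an ADS (anomaly detection system) discriminates between normal points and anomalous points. It plots the TPR (true positive rate, i.e. recall) as a function of the FPR (false positive rate).

    The larger the area under the curve (AUC), the closer the curve lies to the horizontal asymptote TPR = 1, and the better the ADS performs.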
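To make the TPR/FPR definition concrete, here is a minimal pure-Python sketch that builds the ROC points by sweeping a threshold over the scores and integrates them with the trapezoid rule. The helper names `roc_points` and `trapezoid_auc` and the toy data are made up for illustration; this is not how scikit-learn implements `roc_curve`, and it assumes both classes are present in `labels`.

```python
def roc_points(scores, labels):
    """Return the ROC curve as a list of (FPR, TPR) points (hypothetical helper).

    Each distinct score is used as a threshold: a point is predicted
    anomalous when its score >= threshold. Assumes labels contain both
    at least one positive (1) and at least one negative.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y != 1)
        points.append((fp / negatives, tp / positives))  # (FPR, TPR)
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data, invented for illustration: label 1 marks an anomaly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(scores, labels)
auc = trapezoid_auc(curve)
print(curve)
print(auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, so the 0.75 here reflects the single normal point (score 0.4) that outranks an anomaly (score 0.35).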

    # Metric functions are called as metrics.<metric_function_name>(parameters).
    from sklearn import metrics

    def evaluate(scores, labels):
        """
        It returns the AUC and PR-AUC scores.
        :param scores: list<float> | the anomaly scores predicted by CellPAD.
        :param labels: list<float> | the true labels.
        :return: the auc, prauc.
        """
        # Coordinates of the ROC curve:
        #   TPR = TP / (TP + FN) = recall (true positive rate, sensitivity)
        #   FPR = FP / (FP + TN) (false positive rate, 1 - specificity)
        fpr, tpr, _ = metrics.roc_curve(labels, scores, pos_label=1)
        # Points of the precision-recall curve.
        precision, recall, _ = metrics.precision_recall_curve(labels, scores, pos_label=1)
        # Area under the ROC curve; a larger AUC means better performance.
        auc = metrics.auc(fpr, tpr)
        # Area under the precision-recall curve (PR-AUC).
        pruc = metrics.auc(recall, precision)
        return auc, pruc

    2. detect_anomaly()

    def detect_anomaly(self, predicted_series, practical_series):
        """
        It calculates the drop ratio of each point by comparing the predicted value and practical value.
        Then it runs filter_anomaly() function to filter out the anomalies by the parameter "rule".
        :param predicted_series: the predicted values of a KPI series
        :param practical_series: the practical values of a KPI series
        :return: drop_ratios, drop_labels and drop_scores
        """
        drop_ratios = []
        for i in range(len(practical_series)):
            # drop ratio = (practical - predicted) / (predicted + 1e-7);
            # the 1e-7 term (10 to the power of -7) keeps the division defined
            # when the predicted value is 0.
            dp = (practical_series[i] - predicted_series[i]) / (predicted_series[i] + 1e-7)
            drop_ratios.append(dp)

        # A negative ratio (practical below predicted) is a drop: its magnitude
        # becomes the score. Every other point gets a score of 0.
        drop_scores = []
        for r in drop_ratios:
            if r < 0:
                drop_scores.append(-r)
            else:
                drop_scores.append(0.0)

        drop_labels = self.filter_anomaly(drop_ratios)
        return drop_ratios, drop_labels, drop_scores
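As a worked example of the drop-ratio arithmetic, the sketch below applies the same formula to three hand-picked points. The standalone helper `drop_ratio_and_score` and the KPI values are hypothetical and exist only for illustration; they are not part of CellPAD.

```python
def drop_ratio_and_score(predicted, practical, eps=1e-7):
    """Return (drop_ratio, drop_score) for one point (hypothetical helper).

    eps mirrors the 1e-7 term in detect_anomaly(); it keeps the division
    defined when the predicted value is exactly 0.
    """
    ratio = (practical - predicted) / (predicted + eps)
    # Only drops (negative ratios) receive a non-zero score.
    score = -ratio if ratio < 0 else 0.0
    return ratio, score


# Hypothetical KPI values, chosen only for illustration.
predicted = [100.0, 100.0, 100.0]
practical = [100.0, 120.0, 40.0]
results = [drop_ratio_and_score(p, a) for p, a in zip(predicted, practical)]
for (ratio, score), actual in zip(results, practical):
    print(f"practical={actual:6.1f}  ratio={ratio:+.3f}  score={score:.3f}")
```

The point that rises from 100 to 120 has a ratio of about +0.2 but a score of 0, while the point that falls from 100 to 40 has a ratio of about -0.6 and a score of about 0.6: only drops count as anomaly evidence.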

    3. filter_anomaly(), called from step 2

    # Assumes `import numpy as np` at module level.
    def filter_anomaly(self, drop_ratios):
        """
        It calculates the threshold for each approach (rule) and then calls filter_by_threshold().
        - gauss: threshold = mean - self.sigma * std
        - threshold: the given threshold variable
        - proportion: threshold = sort_scores[threshold_index]
        :param drop_ratios: list<float> | a measure of predicted drop anomaly degree
        :return: list<bool> | the drop labels
        """
        if self.rule == 'gauss':
            mean = np.mean(drop_ratios)
            # np.std returns the population standard deviation, not the variance.
            std = np.std(drop_ratios)
            # threshold = mean - sigma * standard deviation
            threshold = mean - self.sigma * std
            drop_labels = self.filter_by_threshold(drop_ratios, threshold)
            return drop_labels

        if self.rule == "threshold":
            threshold = self.threshold
            drop_labels = self.filter_by_threshold(drop_ratios, threshold)
            return drop_labels

        if self.rule == "proportion":
            # Sort in ascending order, so the largest drops come first.
            sort_scores = sorted(np.array(drop_ratios))
            threshold_index = int(len(drop_ratios) * self.proportion)
            threshold = sort_scores[threshold_index]
            drop_labels = self.filter_by_threshold(drop_ratios, threshold)
            return drop_labels
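To compare the three rules side by side, the following sketch computes each threshold on the same toy series using only the standard library. The function `compute_threshold`, its default parameter values, and the data are assumptions made for illustration; `statistics.pstdev` is used because it is the population standard deviation, matching the default behaviour of `np.std`.

```python
import statistics


def compute_threshold(drop_ratios, rule, sigma=3.0, threshold=-0.5, proportion=0.05):
    """Return the cut-off used by one of the three rules (hypothetical helper).

    The default parameter values are illustrative, not taken from CellPAD.
    """
    if rule == "gauss":
        # Population standard deviation, like np.std with its default ddof=0.
        return statistics.mean(drop_ratios) - sigma * statistics.pstdev(drop_ratios)
    if rule == "threshold":
        return threshold
    if rule == "proportion":
        ordered = sorted(drop_ratios)  # ascending: largest drops first
        index = int(len(ordered) * proportion)
        return ordered[index]
    raise ValueError(f"unknown rule: {rule!r}")


# Toy drop ratios, invented for illustration; -0.9 is the obvious outlier.
drop_ratios = [0.0, 0.1, -0.1, 0.05, -0.05, 0.02, -0.02, 0.0, 0.01, -0.9]
thresholds = {
    "gauss": compute_threshold(drop_ratios, "gauss", sigma=2.0),
    "threshold": compute_threshold(drop_ratios, "threshold", threshold=-0.5),
    "proportion": compute_threshold(drop_ratios, "proportion", proportion=0.2),
}
flagged = {rule: [r for r in drop_ratios if r < t] for rule, t in thresholds.items()}
for rule, t in thresholds.items():
    print(rule, round(t, 3), flagged[rule])
```

Two details are worth noting. Because the comparison is strict (`r < threshold`), the proportion rule flags the values that sort before index `int(n * proportion)`, here the two lowest of ten. And with `proportion = 1.0` the index equals the list length, so the lookup raises `IndexError`, in this sketch as in the code above.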

    4. filter_by_threshold(), called from step 3

    def filter_by_threshold(self, drop_scores, threshold):
        """
        It judges whether a point is an anomaly by comparing its drop score and the threshold.
        :param drop_scores: list<float> | a measure of predicted drop anomaly degree.
        :param threshold: float | the threshold to filter out anomalies.
        :return: list<bool> | a list of labels where a point with a "true" label is an anomaly.
        """
        drop_labels = []
        for r in drop_scores:
            # A point is anomalous only if it lies strictly below the threshold.
            if r < threshold:
                drop_labels.append(True)
            else:
                drop_labels.append(False)
        return drop_labels
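The loop above can also be written as a single list comprehension. The sketch below is a standalone version (no `self`) written for illustration, with made-up input values, and it shows the boundary behaviour of the strict comparison.

```python
def filter_by_threshold(drop_scores, threshold):
    """Flag every score strictly below the threshold (standalone sketch)."""
    return [r < threshold for r in drop_scores]


# Made-up values; -0.5 sits exactly on the threshold.
labels = filter_by_threshold([-0.9, -0.5, -0.1, 0.0], threshold=-0.5)
print(labels)  # [True, False, False, False]
```

A value equal to the threshold is not flagged, which matters for the "proportion" rule, where the threshold is itself one of the observed drop ratios.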

  • Original article: https://www.cnblogs.com/0211ji/p/13294596.html