ROC Curve Evaluation and Anomaly Removal


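1. For details, see https://www.cnblogs.com/mdevelopment/p/9456486.html

Review of the ROC curve:

The ROC curve shows how well an ADS (anomaly detection system) discriminates between normal points and anomalous points. It plots the TPR (true positive rate, i.e. recall) as a function of the FPR (false positive rate).

The larger the area under the curve (AUC), the closer the curve lies to the horizontal asymptote TPR = 1, and the better the ADS performs.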
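To make the relation between the threshold, the curve, and the AUC concrete, here is a minimal pure-Python sketch. The helper names `roc_points` and `trapezoid_auc` and the toy scores and labels are made up for illustration; they are not part of CellPAD or scikit-learn, and the sketch assumes a point is flagged when its score is at or above the threshold.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) lists obtained by sweeping the threshold over all scores."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    # Lower the threshold step by step; a point is flagged when score >= threshold.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y != 1)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    return sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2 for i in range(1, len(x)))


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # hypothetical anomaly scores
labels = [1, 1, 0, 1, 0, 0]              # 1 = anomaly, 0 = normal
fpr, tpr = roc_points(scores, labels)
roc_auc = trapezoid_auc(fpr, tpr)
print(roc_auc)
```

On this toy data 8 of the 9 (anomaly, normal) pairs are ranked correctly, which matches the area of 8/9 that the trapezoid rule yields.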

def evaluate(scores, labels):
    """
    It returns the auc and prauc scores.
    :param scores: list<float> | the anomaly scores predicted by CellPAD.
    :param labels: list<float> | the true labels.
    :return: the auc, prauc.
    """
    # Metrics are called as metrics.<metric_function_name>(parameters).
    from sklearn import metrics

    # Compute the ROC curve coordinates (FPR on the x-axis, TPR on the y-axis):
    #   TPR = TP / (TP + FN) = recall (true positive rate, sensitivity)
    #   FPR = FP / (FP + TN) (false positive rate, 1 - specificity)
    fpr, tpr, thresholds = metrics.roc_curve(labels, scores, pos_label=1)

    # Compute the points of the precision-recall curve.
    precision, recall, thresholds = metrics.precision_recall_curve(labels, scores, pos_label=1)

    # metrics.auc(x, y) is the area under the curve through the points (x, y);
    # a larger ROC AUC indicates better performance.
    auc = metrics.auc(fpr, tpr)
    prauc = metrics.auc(recall, precision)
    return auc, prauc
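The precision-recall curve is built from (precision, recall) pairs, one per threshold. As a rough illustration of a single such pair, the sketch below computes both values by hand. The helper `precision_recall_at` and the toy data are hypothetical, and returning a precision of 1.0 when nothing is flagged is a convention assumed here, not something the original post states.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when points with score >= threshold are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y != 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Assumed convention: precision is 1.0 when no point is flagged at all.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # hypothetical anomaly scores
labels = [1, 1, 0, 1, 0, 0]              # 1 = anomaly, 0 = normal
p, r = precision_recall_at(scores, labels, threshold=0.7)
print(p, r)
```

With the threshold at 0.7, three points are flagged, two of them true anomalies, and one true anomaly is missed, so precision and recall are both 2/3.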

2. detect_anomaly() function

def detect_anomaly(self, predicted_series, practical_series):
    """
    It calculates the drop ratio of each point by comparing the predicted value and practical value.
    Then it runs filter_anomaly() function to filter out the anomalies by the parameter "rule".
    :param predicted_series: the predicted values of a KPI series
    :param practical_series: the practical values of a KPI series
    :return: drop_ratios, drop_labels and drop_scores
    """
    drop_ratios = []
    for i in range(len(practical_series)):
        # dp = (actual - predicted) / (predicted + 1e-7);
        # the 1e-7 term (10 to the power of -7) avoids division by zero.
        dp = (practical_series[i] - predicted_series[i]) / (predicted_series[i] + 1e-7)
        drop_ratios.append(dp)

    # Negative ratios (drops) become positive scores; all other points score 0.0.
    drop_scores = []
    for r in drop_ratios:
        if r < 0:
            drop_scores.append(-r)
        else:
            drop_scores.append(0.0)

    drop_labels = self.filter_anomaly(drop_ratios)
    return drop_ratios, drop_labels, drop_scores
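The drop ratio and drop score can be tried out without the surrounding class. The following standalone sketch repeats the same arithmetic on made-up numbers; the series values and the name `EPS` are assumptions for illustration only.

```python
predicted = [100.0, 100.0, 100.0, 100.0]  # hypothetical predicted KPI values
actual = [98.0, 105.0, 40.0, 100.0]       # hypothetical observed KPI values

EPS = 1e-7  # guards against division by zero when a prediction is 0

# Relative deviation of the observed value from the prediction.
drop_ratios = [(a - p) / (p + EPS) for a, p in zip(actual, predicted)]

# Only drops (negative ratios) contribute to the anomaly score.
drop_scores = [-r if r < 0 else 0.0 for r in drop_ratios]

print(drop_ratios)
print(drop_scores)
```

The third point falls 60% below its prediction and gets a score close to 0.6, while the second point, which rises above its prediction, scores 0.0.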

3. filter_anomaly() function, called from step 2

import numpy as np  # required by filter_anomaly

def filter_anomaly(self, drop_ratios):
    """
    It calculates the threshold for the selected approach (rule) and then calls filter_by_threshold().
    - gauss: threshold = mean - self.sigma * std
    - threshold: the given threshold variable
    - proportion: threshold = sort_scores[threshold_index]
    :param drop_ratios: list<float> | a measure of predicted drop anomaly degree
    :return: list<bool> | the drop labels
    """
    if self.rule == 'gauss':
        mean = np.mean(drop_ratios)
        std = np.std(drop_ratios)  # population standard deviation (not the variance)
        threshold = mean - self.sigma * std  # threshold = mean - sigma * standard deviation
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels

    if self.rule == "threshold":
        threshold = self.threshold
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels

    if self.rule == "proportion":
        sort_scores = sorted(np.array(drop_ratios))  # ascending order
        threshold_index = int(len(drop_ratios) * self.proportion)
        threshold = sort_scores[threshold_index]
        drop_labels = self.filter_by_threshold(drop_ratios, threshold)
        return drop_labels
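To see how the "gauss" and "proportion" rules produce different thresholds, here is a small standalone sketch using only the standard library. The drop ratios and the `sigma` and `proportion` settings are made-up values, and `statistics.pstdev` is used as a stand-in for `np.std`, since both compute the population standard deviation.

```python
import statistics

# Hypothetical drop ratios; one point (-0.60) is a clear drop.
drop_ratios = [-0.02, 0.05, -0.60, 0.00, -0.01, 0.03, 0.01, -0.03]

# Rule "gauss": mean minus sigma population standard deviations.
sigma = 2.0  # hypothetical setting
gauss_threshold = statistics.mean(drop_ratios) - sigma * statistics.pstdev(drop_ratios)

# Rule "proportion": the value found at the given fraction of the ascending-sorted ratios.
proportion = 0.25  # hypothetical setting
sorted_ratios = sorted(drop_ratios)
proportion_threshold = sorted_ratios[int(len(drop_ratios) * proportion)]

print(gauss_threshold, proportion_threshold)
```

On this data the gauss rule isolates only the large drop, while the proportion rule places the threshold so that the lowest 25% of points fall strictly below it. Note that a proportion of 1.0 would index past the end of the sorted list, so in practice the value has to stay below 1.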

4. filter_by_threshold() function, called from step 3

def filter_by_threshold(self, drop_scores, threshold):
    """
    It judges whether a point is an anomaly by comparing its drop score and the threshold.
    :param drop_scores: list<float> | a measure of predicted drop anomaly degree.
    :param threshold: float | the threshold to filter out anomalies.
    :return: list<bool> | a list of labels where a point with a "true" label is an anomaly.
    """
    drop_labels = []
    for r in drop_scores:
        # A point is an anomaly when its value is strictly below the threshold.
        if r < threshold:
            drop_labels.append(True)
        else:
            drop_labels.append(False)
    return drop_labels
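The same comparison can be written more compactly. The sketch below is an equivalent standalone version (without `self`) using a list comprehension; the input values and the threshold of -0.5 are hypothetical.

```python
def filter_by_threshold(drop_scores, threshold):
    """Label a point True (anomaly) when its value is strictly below the threshold."""
    return [r < threshold for r in drop_scores]


drop_labels = filter_by_threshold([-0.02, 0.05, -0.60, 0.00], threshold=-0.5)
print(drop_labels)
```

Because the comparison is strict, a value exactly equal to the threshold is not labeled as an anomaly.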


Original article: https://www.cnblogs.com/0211ji/p/13294596.html