  • Study Notes: Model selection and evaluation

    Study Notes: scikit-learn - 浩然119 - 博客园

    • https://www.cnblogs.com/pegasus923/p/9997485.html
    • 3. Model selection and evaluation — scikit-learn 0.20.3 documentation
      • https://scikit-learn.org/stable/model_selection.html#model-selection

    Accuracy paradox - Wikipedia

    • https://en.wikipedia.org/wiki/Accuracy_paradox
    • The accuracy paradox is the paradoxical finding that accuracy is not a good metric for predictive models when classifying in predictive analytics. This is because a simple model may have a high level of accuracy but be too crude to be useful. For example, if the incidence of category A is dominant, being found in 99% of cases, then predicting that every case is category A will have an accuracy of 99%. Precision and recall are better measures in such cases.[1][2] The underlying issue is that class priors need to be accounted for in error analysis. Precision and recall help, but precision too can be biased by very unbalanced class priors in the test sets.
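    To make the 99% example above concrete, here is a minimal sketch with made-up labels: a "classifier" that always predicts the dominant class reaches 99% accuracy yet has zero precision and recall for the rare class (the zero_division argument assumes scikit-learn ≥ 0.22).

```python
# Accuracy paradox sketch: 99:1 class imbalance, always predict the majority class.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(0)
y_true = np.zeros(1000, dtype=int)                      # 990 negatives ...
y_true[rng.choice(1000, size=10, replace=False)] = 1    # ... and 10 positives (1% incidence)

y_pred = np.zeros_like(y_true)                          # always predict the dominant class

print("accuracy :", accuracy_score(y_true, y_pred))                     # 0.99
print("precision:", precision_score(y_true, y_pred, zero_division=0))   # 0.0 (no positive predictions)
print("recall   :", recall_score(y_true, y_pred))                       # 0.0 (every positive is missed)
```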

    Confusion matrix - Wikipedia

    • https://en.wikipedia.org/wiki/Confusion_matrix
    • In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix,[4] is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa).[2] The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
    • It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table).
    • condition positive (P) the number of real positive cases in the data
    • condition negative (N) the number of real negative cases in the data
    • true positive (TP) eqv. with hit
    • true negative (TN) eqv. with correct rejection
    • false positive (FP) eqv. with false alarm, Type I error
    • false negative (FN) eqv. with miss, Type II error
    • sensitivity, recall, hit rate, or true positive rate (TPR)
      • TPR = TP / P = TP / (TP + FN) = 1 − FNR
    • specificity, selectivity, or true negative rate (TNR)
      • TNR = TN / N = TN / (TN + FP) = 1 − FPR
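    A minimal sketch (hypothetical labels) of how the TP / FN / FP / TN counts and the TPR / TNR formulas above can be read off sklearn.metrics.confusion_matrix:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

# scikit-learn orders the matrix by sorted labels: row = actual class, column = predicted class,
# so for binary labels [0, 1] the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)   # sensitivity / recall / hit rate
tnr = tn / (tn + fp)   # specificity / selectivity
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"TPR={tpr:.2f}  TNR={tnr:.2f}")
```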

    Sensitivity and specificity - Wikipedia

    • https://en.wikipedia.org/wiki/Sensitivity_and_specificity
    • Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function:
      • Sensitivity (also called the true positive rate, the recall, or probability of detection[1] in some fields) measures the proportion of actual positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).
      • Specificity (also called the true negative rate) measures the proportion of actual negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
    • In general, Positive = identified and negative = rejected. Therefore:
      • True positive = correctly identified
      • False positive = incorrectly identified
      • True negative = correctly rejected
      • False negative = incorrectly rejected

     
    Precision and recall - Wikipedia
    • https://en.wikipedia.org/wiki/Precision_and_recall
    • In pattern recognition, information retrieval and binary classification, precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that have been retrieved over the total amount of relevant instances. Both precision and recall are therefore based on an understanding and measure of relevance.
    • Suppose a computer program for recognizing dogs in photographs identifies 8 dogs in a picture containing 12 dogs and some cats. Of the 8 identified as dogs, 5 actually are dogs (true positives), while the rest are cats (false positives). The program's precision is 5/8 while its recall is 5/12. When a search engine returns 30 pages only 20 of which were relevant while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. So, in this case, precision is "how useful the search results are", and recall is "how complete the results are".
    • In statistics, if the null hypothesis is that all items are irrelevant (where the hypothesis is accepted or rejected based on the number selected compared with the sample size), absence of type I and type II errors (i.e., perfect sensitivity and specificity of 100% each) corresponds respectively to perfect precision (no false positives) and perfect recall (no false negatives). The above pattern recognition example contained 8 − 5 = 3 type I errors and 12 − 5 = 7 type II errors. Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. The exact relationship of sensitivity and specificity to precision depends on the percentage of positive cases in the population.
    • In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant ones, while high recall means that an algorithm returned most of the relevant results.

    • Confusion-matrix layout and derived metrics:
      • Total population = Condition positive + Condition negative
      • Predicted condition positive: True positive (TP, power) and False positive (FP, Type I error)
      • Predicted condition negative: False negative (FN, Type II error) and True negative (TN)
      • Prevalence = Σ Condition positive / Σ Total population
      • Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population
      • Positive predictive value (PPV), Precision = Σ True positive / Σ Predicted condition positive
      • False discovery rate (FDR) = Σ False positive / Σ Predicted condition positive
      • False omission rate (FOR) = Σ False negative / Σ Predicted condition negative
      • Negative predictive value (NPV) = Σ True negative / Σ Predicted condition negative
      • True positive rate (TPR), Recall, Sensitivity, probability of detection = Σ True positive / Σ Condition positive
      • False positive rate (FPR), Fall-out, probability of false alarm = Σ False positive / Σ Condition negative
      • False negative rate (FNR), Miss rate = Σ False negative / Σ Condition positive
      • Specificity (SPC), Selectivity, True negative rate (TNR) = Σ True negative / Σ Condition negative
      • Positive likelihood ratio (LR+) = TPR / FPR
      • Negative likelihood ratio (LR−) = FNR / TNR
      • Diagnostic odds ratio (DOR) = LR+ / LR−
      • F1 score = 2 · Precision · Recall / (Precision + Recall)
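    A small sketch reproducing the dog-photo example above with sklearn.metrics; the number of cats in the picture is not given in the example, so 8 cats are assumed here purely to make the labels concrete:

```python
# 12 actual dogs, 8 predicted as dogs of which 5 are correct:
# precision = 5/8, recall = 5/12, F1 = 2·P·R / (P + R) as in the formulas above.
from sklearn.metrics import precision_score, recall_score, f1_score

# 1 = dog, 0 = cat (hypothetical encoding of the example)
y_true = [1] * 5 + [0] * 3 + [1] * 7 + [0] * 5   # 12 dogs and 8 cats in the photo
y_pred = [1] * 5 + [1] * 3 + [0] * 7 + [0] * 5   # the program flags 8 animals as dogs

p  = precision_score(y_true, y_pred)   # 5 / 8  = 0.625
r  = recall_score(y_true, y_pred)      # 5 / 12 ≈ 0.417
f1 = f1_score(y_true, y_pred)          # 0.5
print(p, r, f1)
```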

    Receiver operating characteristic - Wikipedia

    • https://en.wikipedia.org/wiki/Receiver_operating_characteristic
    • A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
    • The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity, recall or probability of detection[4] in machine learning. The false-positive rate is also known as the fall-out or probability of false alarm[4] and can be calculated as (1 − specificity). It can also be thought of as a plot of the power as a function of the Type I Error of the decision rule (when the performance is calculated from just a sample of the population, it can be thought of as estimators of these quantities). The ROC curve is thus the sensitivity as a function of fall-out. In general, if the probability distributions for both detection and false alarm are known, the ROC curve can be generated by plotting the cumulative distribution function (area under the probability distribution from −∞ to the discrimination threshold) of the detection probability on the y-axis versus the cumulative distribution function of the false-alarm probability on the x-axis.
    • ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution. ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making.
    • The ROC curve was first developed by electrical engineers and radar engineers during World War II for detecting enemy objects in battlefields and was soon introduced to psychology to account for perceptual detection of stimuli. ROC analysis since then has been used in medicine, radiology, biometrics, forecasting of natural hazards,[5] meteorology,[6] model performance assessment,[7] and other areas for many decades and is increasingly used in machine learning and data mining research.
    • The ROC is also known as a relative operating characteristic curve, because it is a comparison of two operating characteristics (TPR and FPR) as the criterion changes.[8]
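    A minimal sketch of building an ROC curve by sweeping the decision threshold with scikit-learn; the synthetic dataset and LogisticRegression are only stand-ins:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]          # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_te, scores)  # TPR vs FPR across all thresholds
auc = roc_auc_score(y_te, scores)

plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
plt.plot([0, 1], [0, 1], "k--", label="chance")
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()
```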

    Machine Learning with Python: Confusion Matrix in Machine Learning with Python

    • https://www.python-course.eu/confusion_matrix.php

    Study Notes: Machine Learning Crash Course | Google Developers - 浩然119 - 博客园

    • https://www.cnblogs.com/pegasus923/p/10508444.html
    • Classification: ROC Curve and AUC  |  Machine Learning Crash Course  |  Google Developers
      • https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc  
      • A quick, beginner-friendly introduction; the fundamentals are covered very systematically.

    Precision and Recall, ROC Curves and PR Curves - 刘建平Pinard - 博客园

    • https://www.cnblogs.com/pinard/p/5993450.html
    • Mainly a conceptual introduction.

    Classification Performance Metrics in Machine Learning: ROC Curve, AUC, Accuracy, Recall - 简书

    • https://www.jianshu.com/p/c61ae11cc5f6
    • https://zhwhong.cn/2017/04/14/ROC-AUC-Precision-Recall-analysis/
    • A detailed introduction to ROC curves, the effect of the threshold, and AUC, with figures that aid understanding.

    What are the respective pros and cons of precision, recall, F1 score, ROC, and AUC? - 知乎

    • https://www.zhihu.com/question/30643044
    • Explains ROC curves, PR curves, and threshold adjustment in detail and very clearly.

    A Basic Summary of Model Evaluation Methods - AI遇见机器学习

    • https://mp.weixin.qq.com/s/nZfu90fOwfNXx3zRtRlHFA
    • Introduces the basic concepts; a small scikit-learn sketch of these methods follows this list.
    • 1. Hold-out method
    • 2. Cross-validation
      • 1. Simple cross-validation
      • 2. S-fold cross-validation
      • 3. Leave-one-out cross-validation
    • 3. Bootstrapping
    • 4. Parameter tuning and the final model
      • When working with learning algorithms we often encounter parameters that need to be set (such as the step size in gradient ascent), and different parameter configurations often affect the model's performance. This setting of an algorithm's parameters is what we usually call "parameter tuning".
      • Machine learning involves two kinds of parameters:
        • The first kind are parameters we set by hand, known as hyperparameters; there are usually no more than about ten of them.
        • The other kind are model parameters, which can be very numerous; large deep learning models may have tens of billions of them.
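    As referenced in the list above, a small scikit-learn sketch of hold-out splitting, S-fold (K-fold) cross-validation, and hyperparameter tuning; the dataset and estimator are only placeholders:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)

# 1. Hold-out: set aside a test set that is never used for fitting or tuning.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 2. S-fold cross-validation on the training portion (S = 5 here).
clf = LogisticRegression(max_iter=5000)
print("5-fold CV accuracy:", cross_val_score(clf, X_tr, y_tr, cv=5).mean())

# 4. Tune a hyperparameter (regularization strength C) by cross-validated grid search,
#    then evaluate the final model once on the held-out test set.
grid = GridSearchCV(clf, {"C": [0.01, 0.1, 1, 10]}, cv=5).fit(X_tr, y_tr)
print("best C:", grid.best_params_, " test accuracy:", grid.score(X_te, y_te))
```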

    A Thorough Understanding of Model Performance Evaluation Methods - 机器学习算法与自然语言处理

    • https://mp.weixin.qq.com/s/5kWdmi8LgdDTjJ40lqz9_A
    • Summarizes each method, with formulas and figures; a short sketch of two of the listed measures follows this list.
    • Evaluating a model requires not only effective and feasible experimental estimation methods, but also criteria for measuring its generalization ability; these criteria are called performance measures.
    • Performance measures reflect the requirements of the task: when comparing the capabilities of different models, different performance measures often lead to different verdicts. In other words, the quality of a model is relative; which model is "suitable" depends not only on the algorithm and the data but also on the task requirements. The performance measures covered here start from those task requirements and provide ways to evaluate models.
    • 1. Mean squared error
    • 2. Error rate and accuracy
    • 3. Precision and recall
    • 4. Break-Even Point (BEP) and F1
    • 5. Aggregating multiple binary confusion matrices
    • 6. ROC and AUC
    • 7. Cost-sensitive error rate and cost curves
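    A short sketch of measures 1 and 5 from the list above: mean squared error for a regression output, and macro- vs micro-averaged F1 as one way of combining several per-class binary confusion matrices; the labels are made up:

```python
from sklearn.metrics import mean_squared_error, f1_score

# 1. Mean squared error on a toy regression output.
y_true_reg = [3.0, -0.5, 2.0, 7.0]
y_pred_reg = [2.5, 0.0, 2.0, 8.0]
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))   # 0.375

# 5. Aggregating per-class (one-vs-rest) confusion matrices:
#    macro-F1 averages the per-class F1 scores, micro-F1 pools the TP/FP/FN counts.
y_true_cls = [0, 1, 2, 0, 1, 2, 0, 1, 2]
y_pred_cls = [0, 2, 1, 0, 0, 1, 0, 1, 2]
print("macro F1:", f1_score(y_true_cls, y_pred_cls, average="macro"))
print("micro F1:", f1_score(y_true_cls, y_pred_cls, average="micro"))
```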

    How to tune the threshold to get different confusion matrices?

    • Note: be careful to avoid overfitting. A minimal sketch follows this list.
    • classification - Scikit - changing the threshold to create multiple confusion matrixes - Stack Overflow
      • https://stackoverflow.com/questions/32627926/scikit-changing-the-threshold-to-create-multiple-confusion-matrixes
    • python - scikit .predict() default threshold - Stack Overflow
      • https://stackoverflow.com/questions/19984957/scikit-predict-default-threshold
    • python - How to set a threshold for a sklearn classifier based on ROC results? - Stack Overflow
      • https://stackoverflow.com/questions/41864083/how-to-set-a-threshold-for-a-sklearn-classifier-based-on-roc-results?noredirect=1&lq=1
    • python - how to set threshold to scikit learn random forest model - Stack Overflow
      • https://stackoverflow.com/questions/49785904/how-to-set-threshold-to-scikit-learn-random-forest-model
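    As noted above, here is a minimal sketch of the idea behind these threads: for binary probabilistic classifiers, scikit-learn's .predict() amounts to a fixed 0.5 cutoff on predict_proba, so a different confusion matrix comes from thresholding the probabilities yourself. The dataset and RandomForestClassifier settings are only placeholders, and the threshold should be chosen on validation data, not the test set, to avoid the overfitting noted above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]            # probability of the positive class

for threshold in (0.3, 0.5, 0.7):
    y_pred = (proba >= threshold).astype(int)    # custom cutoff instead of .predict()
    print(f"threshold={threshold}\n{confusion_matrix(y_te, y_pred)}")
```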
  • Original post: https://www.cnblogs.com/pegasus923/p/10469919.html