  • Applying the confusion matrix with the PRTools pattern recognition toolbox in MATLAB

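    Note: the toolbox functions used here (dataset, gendat, treec, confmat) come from the PRTools pattern recognition toolbox (http://www.prtools.org). The hand-written cfmatrix function later in the article credits Avinash Uppuluri in its header. All experiments were run in MATLAB.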

          The confusion matrix is a common tool in pattern recognition, and PRTools provides it directly through the function confmat. Its usage is as follows:

      [C,NE,LABLIST] = CONFMAT(LAB1,LAB2,METHOD,FID)
     
      INPUT
       LAB1        Set of labels
       LAB2        Set of labels
       METHOD      'count' (default) to count number of co-occurrences in
                   LAB1 and LAB2, 'disagreement' to count relative
                   non-co-occurrence.
       FID         Write text result to file
     
      OUTPUT
       C           Confusion matrix
       NE          Total number of errors (empty labels are neglected)
       LABLIST     Unique labels in LAB1 and LAB2

       

           First, some basic terms:

          

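           TP (True Positive): positive tuples that the classifier labels correctly.

           TN (True Negative): negative tuples that the classifier labels correctly.

           FP (False Positive): negative tuples incorrectly labeled as positive.

           FN (False Negative): positive tuples incorrectly labeled as negative.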

           TP and TN tell us when the classifier is right; FP and FN tell us when it is wrong.
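For a two-class problem the four counts can be read directly off two label vectors. A minimal plain-MATLAB sketch (no PRTools needed), assuming labels coded as 1 = positive and 0 = negative; the vectors actual and predicted are hypothetical example data:

```matlab
% Hypothetical example labels: 1 = positive, 0 = negative
actual    = [1 1 1 0 0 0 1 0];
predicted = [1 0 1 0 1 0 1 0];

TP = sum(actual == 1 & predicted == 1);   % positives correctly labeled positive
TN = sum(actual == 0 & predicted == 0);   % negatives correctly labeled negative
FP = sum(actual == 0 & predicted == 1);   % negatives wrongly labeled positive
FN = sum(actual == 1 & predicted == 0);   % positives wrongly labeled negative
fprintf('TP=%d TN=%d FP=%d FN=%d\n', TP, TN, FP, FN);
```

For these vectors it prints TP=3 TN=3 FP=1 FN=1.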

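           For a data set with M classes, the confusion matrix is a table of at least M×M entries (extra rows or columns may hold totals); the value in row i, column j is the number of tuples of class i that the classifier labeled as class j.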
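The definition translates almost directly into code. The sketch below uses only base MATLAB (ismember, accumarray) with hypothetical label vectors. Its outputs C, NE and lablist loosely mirror the C, NE and LABLIST outputs of confmat, but it only illustrates the definition and is not PRTools' implementation:

```matlab
% Hypothetical example: true labels and labels assigned by some classifier
truelab = [1 2 3 1 2 3 1 1 2 3 2 1 1 2 3];
estlab  = [1 2 3 1 2 3 1 1 1 2 2 1 2 1 3];

lablist = unique([truelab(:); estlab(:)]);   % every class that occurs
M = numel(lablist);
[~, ti] = ismember(truelab(:), lablist);     % row index = true class
[~, ei] = ismember(estlab(:), lablist);      % column index = assigned class

% C(i,j) = number of objects of class lablist(i) labeled as lablist(j)
C  = accumarray([ti ei], 1, [M M]);
NE = sum(C(:)) - trace(C);                   % off-diagonal entries are errors
disp(C);
fprintf('errors: %d\n', NE);
```

Here C is [5 1 0; 2 3 0; 0 1 3] and NE is 4. Each row of C sums to the number of objects in that class (6, 5 and 4), and everything off the diagonal is a misclassification.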

           As an example, take the Ionosphere data set from the UCI Machine Learning Repository and call the PRTools confusion-matrix function:

    (1) First, prepare the Ionosphere data: load it, scale each feature to [0, 1], and wrap it in a PRTools dataset:

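    data = load('ionosphere.txt');   % assumes a numeric file: features in columns 1..k-1, class label in column k
    [m, k] = size(data);
    data1 = zeros(m, k-1);
    for i = 1:k-1
        lo = min(data(:,i));
        hi = max(data(:,i));
        if hi > lo
            data1(:,i) = (data(:,i) - lo) / (hi - lo);   % min-max scaling to [0, 1]
        end   % constant columns (in UCI Ionosphere, attribute 2 is always 0) stay 0 instead of 0/0 = NaN
    end
    label = data(:,k);
    if min(label) == 0
        label = label + 1;   % shift 0-based class labels so they start at 1
    end
    a = dataset(data1, label);   % wrap data and labels in a PRTools dataset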

    (2) Then split the data, train a classifier, and call confmat:

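    [train,test]=gendat(a,0.5);   % random split: half of the objects for training, the rest for testing
    w=treec(train);               % train a decision-tree classifier
    conf=confmat(test*w)          % classify the test set and build its confusion matrix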

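    Output: [screenshot of the printed confusion matrix, not preserved]

    conf is the confusion matrix. Its entries follow the definition above: row i, column j counts test objects of class i that the decision tree assigned to class j. Because gendat splits the data at random, the exact counts vary from run to run.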
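gendat(a,0.5) splits the dataset at random. For readers without PRTools, here is a rough plain-MATLAB analogue of such a 50/50 hold-out split. It assumes the split is stratified (the same fraction drawn from each class); PRTools' own sampling rules may differ in detail, and the labels vector is hypothetical:

```matlab
% Hypothetical labels for 10 objects in two classes
labels = [1 1 1 1 1 1 2 2 2 2]';
frac = 0.5;                             % fraction of each class used for training
trainIdx = [];
for c = unique(labels)'
    idx = find(labels == c);            % objects belonging to class c
    idx = idx(randperm(numel(idx)));    % shuffle within the class
    nTrain = round(frac * numel(idx));  % how many of them go to training
    trainIdx = [trainIdx; idx(1:nTrain)]; %#ok<AGROW>
end
testIdx = setdiff((1:numel(labels))', trainIdx);   % everything else is test data
```

Sampling within each class keeps the class proportions of the training and test sets close to those of the full data, which makes the per-class rows of the resulting confusion matrix comparable.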

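    If you would rather not use the PRTools confmat function, you can write your own confusion-matrix function, such as the one below, and call it directly. Note that it places predicted classes in the rows and actual classes in the columns, the transpose of the convention in the definition above.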

    function [confmatrix] = cfmatrix(actual, predict, classlist, per)
    % CFMATRIX calculates the confusion matrix for any prediction 
    % algorithm that generates a list of classes to which the test 
    % feature vectors are assigned
    %
    % Outputs: confusion matrix
    %
    %                 Actual Classes
    %                   p       n
    %              ___|_____|______| 
    %    Predicted  p'|     |      |
    %      Classes  n'|     |      |
    % 
    % Inputs: 
    % 1. actual / 2. predict
    % The inputs provided are the 'actual' classes vector
    % and the 'predict'ed classes vector. The actual classes are the classes
    % to which the input feature vectors belong. The predicted classes are the 
    % class to which the input feature vectors are predicted to belong to, 
    % based on a prediction algorithm. 
    % The length of actual class vector and the predicted class vector need to 
    % be the same. If they are not the same, an error message is displayed. 
    % 3. classlist
    % The third input provides the list of all the classes {p,n,...} for which 
    % the classification is being done. All classes are numbers.
    % 4. per = 1/0 (default = 0)
    % This parameter when set to 1 provides the values in the confusion matrix 
    % as percentages. The default provides the values in numbers.
    %
    % Example:
    % >> a = [ 1 2 3 1 2 3 1 1 2 3 2 1 1 2 3];
    % >> b = [ 1 2 3 1 2 3 1 1 1 2 2 1 2 1 3];
    % >> Cf = cfmatrix(a, b);
    %
    % [Avinash Uppuluri: avinash_uv@yahoo.com: Last modified: 08/21/08]
    
    % If classlist not entered: make classlist equal to all 
    % unique elements of actual
    if (nargin < 2)
       error('Not enough input arguments.');
    elseif (nargin == 2)
        classlist = unique(actual); % default values from actual
        per = 0; % default is numbers and input 1 for percentage
    elseif (nargin == 3)
        per = 0; % default is numbers and input 1 for percentage
    end
    
    if (length(actual) ~= length(predict))
        error('First two inputs need to be vectors with equal size.');
    elseif ((size(actual,1) ~= 1) && (size(actual,2) ~= 1))
        error('First input needs to be a vector and not a matrix');
    elseif ((size(predict,1) ~= 1) && (size(predict,2) ~= 1))
        error('Second input needs to be a vector and not a matrix');
    end
    actual = actual(:);     % force column vectors so that the element-wise
    predict = predict(:);   % & below never expands a row and a column into a matrix
    n_class = length(classlist);
    % Rows = predicted class, columns = actual class (as in the diagram above)
    confmatrix = zeros(n_class);
    line_two = '----------';
    line_three = '_________|';
    for i = 1:n_class
        for j = 1:n_class
            % number of objects of actual class j that were predicted as class i
            confmatrix(i,j) = sum(predict == classlist(i) & actual == classlist(j));
        end
        line_two = [line_two, '---', num2str(classlist(i)), '-----'];
        line_three = [line_three, '__________'];
    end
    
    if (per == 1)
        confmatrix = (confmatrix ./ length(actual)).*100;
    end
    
    % output to screen
    disp('------------------------------------------');
    disp('             Actual Classes');
    disp(line_two);
    disp('Predicted|                     ');
    disp('  Classes|                     ');
    disp(line_three);
    
    for i = 1:n_class
        % label each row with the class value itself, not its index i
        temps = sprintf('       %g             ', classlist(i));
        for j = 1:n_class
            % plain concatenation: strcat would strip the padding spaces
            temps = [temps, sprintf(' |    %2.1f    ', confmatrix(i,j))];
        end
        disp(temps);
    end
    disp('------------------------------------------');

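          The confusion matrix itself is easy to understand. Several related measures follow from it (P: number of positive tuples, N: number of negative tuples, so P = TP + FN and N = TN + FP):

          Accuracy: (TP + TN) / (P + N)

          Error rate: (FP + FN) / (P + N)

          Sensitivity (recall): TP / P

          Precision: TP / (TP + FP)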
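The four measures follow directly from the counts. A short sketch with hypothetical counts (not results from the Ionosphere experiment):

```matlab
% Hypothetical counts from a two-class confusion matrix
TP = 40; FN = 10;   % positive tuples: P = TP + FN = 50
TN = 45; FP = 5;    % negative tuples: N = TN + FP = 50
P = TP + FN;
N = TN + FP;

accuracy  = (TP + TN) / (P + N);   % fraction of all tuples classified correctly
errorRate = (FP + FN) / (P + N);   % equals 1 - accuracy
recall    = TP / P;                % sensitivity: fraction of positives found
precision = TP / (TP + FP);        % fraction of predicted positives that are correct
fprintf('accuracy %.2f, error %.2f, recall %.2f, precision %.2f\n', ...
        accuracy, errorRate, recall, precision);
```

With these counts it prints accuracy 0.85, error 0.15, recall 0.80, precision 0.89. The parentheses matter: writing TP+TN/P+N without them would compute TP + (TN/P) + N.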

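          This article has used the confmat function of PRTools to give a brief introduction to the confusion matrix. Corrections are welcome if anything here is wrong.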

  • Original post: https://www.cnblogs.com/luyaoblog/p/6648040.html