zoukankan      html  css  js  c++  java
  • 离散序列的标准互信息计算(转载)

                                                                                                        离散序列的标准互信息计算

                                                                    来源:http://www.cnblogs.com/ziqiao/archive/2011/12/13/2286273.html

    一、离散序列样本

      X = [1 1 1 1 1 1   2 2 2 2 2 2    3 3 3 3 3];

      Y= [1 2 1 1 1 1   1 2 2 2 2 3    1 1 3 3 3];

    二、计算离散序列X与Y的互信息(Mutual information

      MI可以按照下面的公式(1)计算:

                                                           

    X和Y的联合分布概率p(x,y)和边缘分布律p(x)、p(y)如下:

                                                     

    其中,分子p(x,y)为x和y的联合分布概率:

                                                               p(1,1)=5/17, p(1,2)=1/17, p(1,3)=0;

                                                               p(2,1)=1/17, p(2,2)=4/17, p(2,3)=1/17;

                                                               p(3,1)=2/17, p(3,2)=0, p(3,3)=3/17;                  

    分母p(x)为x的概率函数,p(y)为y的概率函数,

                                                    对p(x):

                                                               p(1)=6/17,p(2)=6/17,p(3)=5/17  

                                                    对p(y):

                                                               p(1)=8/17,p(2)=5/17,P(3)=4/17 

    把上述概率代入公式(1),就可以算出MI。

    三、计算标准化互信息NMI(Normalized Mutual information)

      标准化互信息,即用熵做分母将MI值调整到0与1之间。一个比较多见的实现是下面所示:

                                                                 

    H(X)和H(Y)分别为X和Y的熵,H(X)计算公式如下,公式中log的底b=2。

                                                          

    例如,H(X) =  -p(1)*log2(p(1)) - -p(2)*log2(p(2)) -p(3)*log2(p(3))。

    四、计算标准互信息的MATLAB程序

    function MIhat = nmi( A, B ) %NMI Normalized mutual information
    % http://en.wikipedia.org/wiki/Mutual_information
    % http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
    % Author: http://www.cnblogs.com/ziqiao/   [2011/12/13] if length( A ) ~= length( B)
        error('length( A ) must == length( B)');
    end
    total = length(A);
    A_ids = unique(A);
    B_ids = unique(B);
    
    % Mutual information
    MI = 0;
    for idA = A_ids
        for idB = B_ids
             idAOccur = find( A == idA );
             idBOccur = find( B == idB );
             idABOccur = intersect(idAOccur,idBOccur); 
             
             px = length(idAOccur)/total;
             py = length(idBOccur)/total;
             pxy = length(idABOccur)/total;
             
             MI = MI + pxy*log2(pxy/(px*py)+eps); % eps : the smallest positive number
    
        end
    end
    
    % Normalized Mutual information
    Hx = 0; % Entropies
    for idA = A_ids
        idAOccurCount = length( find( A == idA ) );
        Hx = Hx - (idAOccurCount/total) * log2(idAOccurCount/total + eps);
    end
    Hy = 0; % Entropies
    for idB = B_ids
        idBOccurCount = length( find( B == idB ) );
        Hy = Hy - (idBOccurCount/total) * log2(idBOccurCount/total + eps);
    end
    
    MIhat = 2 * MI / (Hx+Hy);
    end
    
    % Example :  
    % (http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html)
    % A = [1 1 1 1 1 1   2 2 2 2 2 2    3 3 3 3 3];
    % B = [1 2 1 1 1 1   1 2 2 2 2 3    1 1 3 3 3];
    % nmi(A,B)% ans = 0.3646

      为了节省运行时间,将for循环用矩阵运算代替,1百万的数据量运行 1.795723second,上面的方法运行3.491053 second;  但是这种方法太占内存空间, 五百万时,利用matlab2011版本的内存设置就显示Out of memory了。

    版本一:

    function MIhat = nmi( A, B )
    %NMI Normalized mutual information
    % http://en.wikipedia.org/wiki/Mutual_information
    % http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
    % Author: http://www.cnblogs.com/ziqiao/   [2011/12/15] if length( A ) ~= length( B)
        error('length( A ) must == length( B)');
    end
    total = length(A);
    A_ids = unique(A);
    A_class = length(A_ids);
    B_ids = unique(B);
    B_class = length(B_ids);
    % Mutual information
    idAOccur = double (repmat( A, A_class, 1) == repmat( A_ids', 1, total ));
    idBOccur = double (repmat( B, B_class, 1) == repmat( B_ids', 1, total ));
    idABOccur = idAOccur * idBOccur';
    Px = sum(idAOccur') / total;
    Py = sum(idBOccur') / total;
    Pxy = idABOccur / total;
    MImatrix = Pxy .* log2(Pxy ./(Px' * Py)+eps);
    MI = sum(MImatrix(:))
    % Entropies
    Hx = -sum(Px .* log2(Px + eps),2);
    Hy = -sum(Py .* log2(Py + eps),2);
    %Normalized Mutual information
    MIhat = 2 * MI / (Hx+Hy);
    % MIhat = MI / sqrt(Hx*Hy); another version of NMIend
    
    % Example :  
    % (http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html)
    % A = [1 1 1 1 1 1   2 2 2 2 2 2    3 3 3 3 3];
    % B = [1 2 1 1 1 1   1 2 2 2 2 3    1 1 3 3 3];
    % nmi(A,B) % ans =  0.3646

    版本二:

    A = [1 1 1 1 1 1   2 2 2 2 2 2    3 3 3 3 3];
    B = [1 2 1 1 1 1   1 2 2 2 2 3    1 1 3 3 3];
     
     if length( A ) ~= length( B)
        error('length( A ) must == length( B)');
     end
     
    total = length(A);                  %17
    A_ids = unique(A);                %[1,2,3]
    A_class = length(A_ids);       % 3
    B_ids = unique(B);                 %[1,2,3]
    B_class = length(B_ids);        %3
    
    % Mutual information
    idAOccur = double (repmat( A, A_class, 1) == repmat( A_ids', 1, total ));
    idBOccur = double (repmat( B, B_class, 1) == repmat( B_ids', 1, total ));
    idABOccur = idAOccur * idBOccur';
    Px = sum(idAOccur') / total;
    Py = sum(idBOccur') / total;
    Pxy = idABOccur / total;
    miMatrix = Pxy .* log2(Pxy ./(Px' * Py)+eps);  %加上一个很小的数,log2的真数部分不能<=0.
    mi = sum(miMatrix(:));
    
    % Entropies(熵)
    Hx = -sum(Px .* log2(Px + eps),2);
    Hy = -sum(Py .* log2(Py + eps),2);
    
    %Normalized Mutual information
    nmi= 2 * mi / (Hx+Hy)   % nmi = mi / sqrt(Hx*Hy); another version of nmi
  • 相关阅读:
    Windows Store App 主题动画
    Windows Store App 过渡动画
    Windows Store App 控件动画
    Windows Store App 近期访问列表
    Windows Store App 文件选取器
    Windows Store App 访问应用内部文件
    Windows Store App 用户库文件分组
    Windows Store App 获取文件及文件夹列表
    Windows Store App 用户库文件夹操作
    Windows Store App 用户库文件操作
  • 原文地址:https://www.cnblogs.com/hezhiyao/p/7208615.html
Copyright © 2011-2022 走看看