  • Mutual Information

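    Mutual Information (MI) measures how much information two random variables share, i.e. how strongly they depend on each other. It is commonly used to compare the results of two different clustering algorithms on the same dataset: the more information the two clusterings share, the more similar they are.
    Given two clusterings X and Y of the same data, their MI is computed as: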

    \[ MI(X, Y) = \sum_{u \in U} \sum_{v \in V} p(u, v) \log \frac{p(u, v)}{p(u)\,p(v)} \]

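    where \(U = \mathrm{set}(X)\) and \(V = \mathrm{set}(Y)\) are the sets of distinct cluster labels (set() removes duplicates), \(p(u)\) is the fraction of points that X assigns to cluster \(u\), \(p(v)\) is the fraction that Y assigns to cluster \(v\), and \(p(u, v)\) is the fraction assigned to \(u\) by X and to \(v\) by Y. With \(\log_2\), MI is measured in bits.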
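A small hypothetical example makes the formula concrete. Take four points clustered by X as \(X = (1, 1, 2, 2)\), so \(p(1) = p(2) = \tfrac12\), and compare two choices of Y, each of which also has \(p(1) = p(2) = \tfrac12\). Logs are base 2, and pairs with \(p(u, v) = 0\) contribute 0 by the usual convention \(0 \log 0 = 0\).

```latex
\begin{aligned}
&\text{Case 1: } Y = (1,1,2,2),\quad p(1,1) = p(2,2) = \tfrac12,\quad p(1,2) = p(2,1) = 0 \\
&MI(X, Y) = \tfrac12 \log_2 \frac{1/2}{\tfrac12 \cdot \tfrac12}
          + \tfrac12 \log_2 \frac{1/2}{\tfrac12 \cdot \tfrac12} + 0 + 0
          = \tfrac12 \cdot 1 + \tfrac12 \cdot 1 = 1 \text{ bit} \\[4pt]
&\text{Case 2: } Y = (1,2,1,2),\quad p(u, v) = \tfrac14 = p(u)\,p(v) \text{ for all four pairs} \\
&MI(X, Y) = 4 \cdot \tfrac14 \log_2 \frac{1/4}{\tfrac12 \cdot \tfrac12} = \log_2 1 = 0
\end{aligned}
```

In the first case X fully determines Y and MI equals the entropy of X (1 bit); in the second the two labelings are independent, every ratio is 1, and MI is 0.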
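    From a probabilistic point of view, the ratio \(\frac{p(u, v)}{p(u)\,p(v)}\) measures how strongly \(u\) and \(v\) are associated: it is greater than 1 when points labeled \(u\) by X are labeled \(v\) by Y more often than chance would predict, less than 1 when this happens less often, and exactly 1 when the two events are independent (so the log term is 0). Summed over all pairs, \(MI(X, Y) \ge 0\), with equality exactly when X and Y are independent; the more closely the label patterns of the two clusterings agree, the larger MI becomes.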
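The formula translates directly into code. Below is a minimal Python sketch; the function name `mutual_information` is hypothetical (not from the original post), and it assumes both clusterings are given as equal-length sequences of hashable labels, using base-2 logs so the result is in bits. Only label pairs that actually co-occur are visited, so no term ever evaluates log 0.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """MI(X, Y) in bits between two clusterings of the same points.

    x[i] and y[i] are the cluster labels the two clusterings assign to
    point i (hypothetical helper; labels only need to be hashable).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    count_u = Counter(x)           # points per cluster u in X
    count_v = Counter(y)           # points per cluster v in Y
    count_uv = Counter(zip(x, y))  # points in cluster u of X and cluster v of Y

    mi = 0.0
    # Only co-occurring pairs appear in count_uv, so p(u, v) > 0 here;
    # pairs with p(u, v) = 0 contribute 0 (the 0 * log 0 = 0 convention).
    for (u, v), c in count_uv.items():
        p_uv = c / n
        p_u = count_u[u] / n
        p_v = count_v[v] / n
        mi += p_uv * log2(p_uv / (p_u * p_v))
    return mi


if __name__ == "__main__":
    print(mutual_information([1, 1, 2, 2], [2, 2, 1, 1]))  # 1.0 (label names do not matter)
    print(mutual_information([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2]))  # ≈ 1.585 = log2(3)
```

Comparing a clustering with itself gives its entropy H(X), which is the largest MI any Y can reach with that X; renaming the labels leaves MI unchanged.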

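    The following MATLAB code, taken from http://www.cnblogs.com/ziqiao/archive/2011/12/13/2286273.html, may help with understanding. Besides MI (computed with log2, i.e. in bits), it also computes the normalized mutual information NMI = 2·MI / (H(X) + H(Y)), where H denotes entropy.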

    function MIhat = nmi( A, B ) %NMI Normalized mutual information
    % http://en.wikipedia.org/wiki/Mutual_information
    % http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
    % Author: http://www.cnblogs.com/ziqiao/   [2011/12/13] 
    if numel(A) ~= numel(B)
        error('A and B must have the same length.');
    end
    A = A(:)'; B = B(:)';   % force row vectors so the for-loops below iterate element by element
    total = numel(A);
    A_ids = unique(A);
    B_ids = unique(B);
    
    % Mutual information
    MI = 0;
    for idA = A_ids
        for idB = B_ids
             idAOccur = find( A == idA );
             idBOccur = find( B == idB );
             idABOccur = intersect(idAOccur,idBOccur); 
             
             px = length(idAOccur)/total;
             py = length(idBOccur)/total;
             pxy = length(idABOccur)/total;
             
             if pxy > 0   % skip empty cells: 0*log2(0) is taken as 0
                 MI = MI + pxy*log2(pxy/(px*py));
             end
    
        end
    end
    
    % Normalized Mutual information
    Hx = 0; % Entropies
    for idA = A_ids
        idAOccurCount = length( find( A == idA ) );
        Hx = Hx - (idAOccurCount/total) * log2(idAOccurCount/total + eps);
    end
    Hy = 0; % Entropies
    for idB = B_ids
        idBOccurCount = length( find( B == idB ) );
        Hy = Hy - (idBOccurCount/total) * log2(idBOccurCount/total + eps);
    end
    
    MIhat = 2 * MI / (Hx+Hy);
    end
    
    % Example :  
    % (http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html)
    % A = [1 1 1 1 1 1   2 2 2 2 2 2    3 3 3 3 3];
    % B = [1 2 1 1 1 1   1 2 2 2 2 3    1 1 3 3 3];
    % nmi(A,B)
    % ans = 0.3646
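For readers without MATLAB, the same NMI computation can be sketched in Python. This is a port written for this note, not code from the cited post: `nmi` mirrors the MATLAB function name, it sums only over label pairs that actually co-occur, and raising `ValueError` when both entropies are zero (where the formula is 0/0) is an added assumption.

```python
from collections import Counter
from math import log2


def nmi(a, b):
    """Normalized MI, 2 * MI(A, B) / (H(A) + H(B)), with logs in base 2."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))

    # Mutual information, summed over label pairs that actually co-occur.
    mi = sum((c / n) * log2((c / n) / ((count_a[u] / n) * (count_b[v] / n)))
             for (u, v), c in count_ab.items())
    # Entropies of the two clusterings.
    h_a = -sum((c / n) * log2(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log2(c / n) for c in count_b.values())

    if h_a + h_b == 0:
        # Both clusterings put every point in one cluster: 0/0, NMI undefined.
        raise ValueError("NMI is undefined when both clusterings have one cluster")
    return 2 * mi / (h_a + h_b)


if __name__ == "__main__":
    # The Stanford IR-book example from the MATLAB comments above.
    A = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
    B = [1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 1, 1, 3, 3, 3]
    print(round(nmi(A, B), 4))  # 0.3646
```

Dividing by the average entropy keeps the score in [0, 1], with 1 for clusterings that are identical up to relabeling, and penalizes splitting the data into many small clusters, which raw MI does not.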
    
  • Original post: https://www.cnblogs.com/dengdan890730/p/6280051.html