zoukankan      html  css  js  c++  java
  • MATLAB聚类有效性评价指标(外部)

    MATLAB聚类有效性评价指标(外部)

    作者:凯鲁嘎吉 - 博客园 http://www.cnblogs.com/kailugaji/

    更多内容,请看:MATLAB聚类MATLAB聚类有效性评价指标(外部 成对度量)MATLAB: Clustering Algorithms

    前提:数据的真实标签已知!

    1. 归一化互信息(Normalized Mutual information)

    定义

     

    程序

    function MIhat = nmi(A, B)
    %NMI Normalized mutual information
    % A, B: 1*N;
    if length(A) ~= length(B)
        error('length( A ) must == length( B)');
    end
    N = length(A);
    A_id = unique(A);
    K_A = length(A_id);
    B_id = unique(B);
    K_B = length(B_id);
    % Mutual information
    A_occur = double (repmat( A, K_A, 1) == repmat( A_id', 1, N ));
    B_occur = double (repmat( B, K_B, 1) == repmat( B_id', 1, N ));
    AB_occur = A_occur * B_occur';
    P_A= sum(A_occur') / N;
    P_B = sum(B_occur') / N;
    P_AB = AB_occur / N;
    MImatrix = P_AB .* log(P_AB ./(P_A' * P_B)+eps);
    MI = sum(MImatrix(:));
    % Entropies
    H_A = -sum(P_A .* log(P_A + eps),2);
    H_B= -sum(P_B .* log(P_B + eps),2);
    %Normalized Mutual information
    MIhat = MI / sqrt(H_A*H_B); 

    结果

    >> A = [1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3];
    >> B = [1 2 1 1 1 1 1 2 2 2 2 3 1 1 3 3 3];
    >> MIhat = nmi(A, B)
    
    MIhat =
    
        0.3646

    2. Rand统计量(Rand index)

    定义

    程序

    function [AR,RI,MI,HI]=RandIndex(c1,c2)
    %RANDINDEX - calculates Rand Indices to compare two partitions
    % ARI=RANDINDEX(c1,c2), where c1,c2 are vectors listing the 
    % class membership, returns the "Hubert & Arabie adjusted Rand index".
    % [AR,RI,MI,HI]=RANDINDEX(c1,c2) returns the adjusted Rand index, 
    % the unadjusted Rand index, "Mirkin's" index and "Hubert's" index.
    
    if nargin < 2 || min(size(c1)) > 1 || min(size(c2)) > 1
       error('RandIndex: Requires two vector arguments')
       return
    end
    
    C=Contingency(c1,c2);	%form contingency matrix
    
    n=sum(sum(C));
    nis=sum(sum(C,2).^2);		%sum of squares of sums of rows
    njs=sum(sum(C,1).^2);		%sum of squares of sums of columns
    
    t1=nchoosek(n,2);		%total number of pairs of entities
    t2=sum(sum(C.^2));	%sum over rows & columnns of nij^2
    t3=.5*(nis+njs);
    
    %Expected index (for adjustment)
    nc=(n*(n^2+1)-(n+1)*nis-(n+1)*njs+2*(nis*njs)/n)/(2*(n-1));
    
    A=t1+t2-t3;		%no. agreements
    D=  -t2+t3;		%no. disagreements
    
    if t1==nc
       AR=0;			%avoid division by zero; if k=1, define Rand = 0
    else
       AR=(A-nc)/(t1-nc);		%adjusted Rand - Hubert & Arabie 1985
    end
    
    RI=A/t1;			%Rand 1971		%Probability of agreement
    MI=D/t1;			%Mirkin 1970	%p(disagreement)
    HI=(A-D)/t1;	%Hubert 1977	%p(agree)-p(disagree)
    
    function Cont=Contingency(Mem1,Mem2)
    
    if nargin < 2 || min(size(Mem1)) > 1 || min(size(Mem2)) > 1
       error('Contingency: Requires two vector arguments')
       return
    end
    
    Cont=zeros(max(Mem1),max(Mem2));
    
    for i = 1:length(Mem1)
       Cont(Mem1(i),Mem2(i))=Cont(Mem1(i),Mem2(i))+1;
    end
    

    程序中包含了四种聚类度量方法:Adjusted Rand index、Rand index、Mirkin index、Hubert index。

    结果

    >> A = [1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3];
    >> B = [1 2 1 1 1 1 1 2 2 2 2 3 1 1 3 3 3];
    >> [AR,RI,MI,HI]=RandIndex(A,B)
    
    AR =
    
        0.2429
    
    
    RI =
    
        0.6765
    
    
    MI =
    
        0.3235
    
    
    HI =
    
        0.3529

    3. 参考文献

    (simple) Tool for estimating the number of clusters

    Mutual information and Normalized Mutual information 互信息和标准化互信息

    Evaluation of clustering

  • 相关阅读:
    4种排序实践
    redis 应用场景和数据类型
    建立三个线程,A线程打印10次A,B线程打印10次B,C线程打印10次C
    分布式线程安全(redis、zookeeper、数据库)
    [专项]3道改错题
    kafka 业务埋点
    spring boot集成kafka
    kafka本地调试
    C语言 gets()和scanf()函数的区别
    EOF
  • 原文地址:https://www.cnblogs.com/kailugaji/p/11003974.html
Copyright © 2011-2022 走看看