PRML 3: Linear Discriminants

      As an alternative to generative models and probabilistic discriminative models, a discriminant function directly assigns a feature vector to one of K classes. One of the simplest discriminant functions for 2-class problems is $y(\vec{x})=\mathrm{sign}(\vec{w}^T\vec{x}+b)$, where $\vec{w}$ is the parameter vector to be determined and $b$ is a bias to be determined. Here $\vec{x}$ differs from the one used in the regression models, since it no longer contains a constant bias component.

      To obtain proper parameters, we can draw on a simple algorithm called the Perceptron, which guarantees that all the training data will eventually be classified correctly, provided the classes are linearly separable. This is done by iteratively minimizing an error function whose terms, one per misclassified point, take the form $-(\vec{w}^T\vec{x}_n+b)\cdot t_n$; the procedure never terminates if the problem is not linearly separable.

     function w = percept(X,t)
         % Perceptron Algorithm for Linear Classification
         % Precondition: X is a set of data columns,
         %       row vector t is the labels of X (+1 or -1)
         % Postcondition: w is the linear model parameter
         %       such that y = sign(w'* x)
         [m,n] = size(X);
         w = zeros(m,1);
         cnt = 0;    % number of consecutive correct classifications
         cur = 1;    % index of the current data item
         while (cnt<n)
             % loop until no misclassification exists
             if (t(cur)*w'*X(:,cur)<=0)
                 % error correction with learning rate (step size) 0.2
                 w = w + 0.2*t(cur)*X(:,cur);
                 cnt = 0;
             else
                 cnt = cnt+1;
             end
             cur = mod(cur,n)+1;
         end
     end
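
      For instance, assuming the convention that a constant row of ones is appended to X so that the bias is absorbed into w (an assumption of this sketch, since percept learns a single weight vector), the function can be exercised on a tiny separable dataset:

     % hypothetical usage sketch: a tiny separable dataset with a
     % constant row of ones so that the bias is absorbed into w
     X = [1 2 3 -1 -2 -3; 1 1 1 1 1 1];
     t = [1 1 1 -1 -1 -1];
     w = percept(X, t);
     disp(sign(w'*X));   % should reproduce t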

       Fisher's Linear Discriminant is another linear classifier. It strives to maximize class separation by choosing a desirable direction onto which the projected class means are as far apart as possible relative to the within-class spread. This is achieved by maximizing the Fisher criterion $J(\vec{w})=\frac{(m_2-m_1)^2}{s_1^2+s_2^2}$, where $m_1$, $m_2$ and $s_1^2$, $s_2^2$ are the means and variances of the projected data of the two classes, respectively. The maximizing direction is $\vec{w}\propto S_W^{-1}(\vec{m}_1-\vec{m}_2)$, where $S_W$ is the within-class covariance matrix and $\vec{m}_1$, $\vec{m}_2$ are the class means in the original feature space.

     function w = fisher(X,t)
        % Fisher's Linear Discriminant for 2-class problems
        % Precondition: X is a set of data columns whose last row
        %       is constantly 1 (absorbing the threshold),
        %       row vector t is the labels of X (+1 or -1)
        % Postcondition: w is the linear model parameter
        %       such that y = sign(w'* x)
        d = size(X,1)-1;    % feature dimension (excluding the constant row)
        % calculate the mean vectors of the 2 classes:
        m1 = zeros(d,1);
        m2 = zeros(d,1);
        n1 = 0; n2 = 0;
        for i = 1:size(t,2)
            if (t(1,i)>0)
                n1 = n1+1;
                m1 = m1+X(1:d,i);
            else
                n2 = n2+1;
                m2 = m2+X(1:d,i);
            end
        end
        m1 = m1/n1;
        m2 = m2/n2;
        % calculate the within-class covariance matrix:
        Sw = zeros(d);
        for i = 1:size(t,2)
            if (t(1,i)>0)
                Sw = Sw+(X(1:d,i)-m1)*(X(1:d,i)-m1)';
            else
                Sw = Sw+(X(1:d,i)-m2)*(X(1:d,i)-m2)';
            end
        end
        w = Sw\(m1-m2);     % w is proportional to inv(Sw)*(m1-m2)
        % choose a proper threshold (midway between the smallest
        % projection of class +1 and the largest projection of class -1):
        w0Min = inf;
        w0Max = -inf;
        for i = 1:size(t,2)
            y = w'*X(1:d,i);
            if (t(1,i)>0 && y+w0Max<0)
                w0Max = -y;
            elseif (t(1,i)<0 && y+w0Min>0)
                w0Min = -y;
            end
        end
        w = [w;(w0Min+w0Max)/2];
     end
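
      As a quick sanity check, a sketch of my own on synthetic data (two made-up Gaussian clusters, with a constant row appended to match the d = size(X,1)-1 convention above):

     % hypothetical sanity check on synthetic 2-D Gaussian clusters;
     % the last row of ones matches the convention d = size(X,1)-1 above
     rng(0);
     X1 = randn(2,50) + 3*ones(2,50);    % class +1, centred at ( 3, 3)
     X2 = randn(2,50) - 3*ones(2,50);    % class -1, centred at (-3,-3)
     X  = [X1 X2; ones(1,100)];
     t  = [ones(1,50), -ones(1,50)];
     w  = fisher(X, t);
     errors = sum(sign(w'*X) ~= t)       % expect 0 for well-separated clusters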

      The Support Vector Machine (SVM) is another linear discriminant classifier, whose objective is to maximize the geometric margin of the training set, i.e. $\gamma = \min_n \frac{y_n(\vec{w}^T\vec{x}_n+b)}{\lVert\vec{w}\rVert}$, where $y_n\in\{-1,+1\}$ is the label of $\vec{x}_n$. This is equivalent to the optimization problem of minimizing $\frac{1}{2}\lVert\vec{w}\rVert^2$ subject to the constraints $y_n(\vec{w}^T\vec{x}_n+b)\geq 1$ for $n=1,2,\dots,N$:

     function w = supvect(X,t)
         % Support Vector Machine for Linear Classification
         % Precondition: X is a set of data columns whose last row
         %       is constantly 1 (so the bias is absorbed into w),
         %       row vector t is the labels of X (+1 or -1)
         % Postcondition: w is the linear model parameter
         %       such that y = sign(w'* x)
         [m,n] = size(X);
         x0 = zeros(m,1);
         A = zeros(n,m);
         for i = 1:n
             % the constraint t(i)*w'*X(:,i) >= 1 becomes A(i,:)*w <= b(i)
             A(i,:) = -t(i)*X(:,i)';
         end
         b = -ones(n,1);
         % minimize (1/2)||w||^2 subject to A*w <= b (requires the
         % Optimization Toolbox; note the bias component is also penalized)
         w = fmincon(@(v) 0.5*(v'*v), x0, A, b);
     end
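
      A usage sketch analogous to the one for the perceptron (again a toy example of my own; the constant row of ones absorbs the bias):

     % hypothetical usage sketch for the primal SVM above
     X = [1 2 3 -1 -2 -3; 1 1 1 1 1 1];  % constant row absorbs the bias
     t = [1 1 1 -1 -1 -1];
     w = supvect(X, t);
     margins = t .* (w'*X)               % functional margins, all >= 1 at the optimum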

      This is also equivalent to finding $\min_{\vec{w},b}\max_{\vec{\alpha}}\ \frac{1}{2}\lVert\vec{w}\rVert^2+\sum_{n=1}^N\alpha_n[1-y_n(\vec{w}^T\vec{x}_n+b)]$, where $\alpha_n\geq 0$ for $n=1,2,\dots,N$ are Lagrange multipliers. Since the problem satisfies the Karush-Kuhn-Tucker (KKT) conditions, we can solve its dual problem instead, which turns out to be easier. Moreover, by introducing slack variables we can reformulate it with a soft margin so that it also handles datasets that are not linearly separable:

        $\min_{\vec{\alpha}}\ \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_j y_i y_j(\vec{x}_i^T\vec{x}_j)-\sum_{i=1}^N\alpha_i$

        $\text{s.t.}\ \sum_{n=1}^N\alpha_n y_n=0$  and  $0\leq\alpha_n\leq C$  for $n=1,2,\dots,N$
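
      Before turning to SMO, note that this dual is an ordinary convex quadratic program, so (assuming the Optimization Toolbox is available) it can also be handed directly to quadprog. The helper below is a sketch of my own, not code from the original post:

     function [alpha, w, b] = svmdual(X, y, C)
         % Sketch: solve the soft-margin dual with quadprog.
         % X is d-by-N (no constant row here), y is 1-by-N with entries +1/-1.
         N = size(X, 2);
         H = (y'*y) .* (X'*X);             % H(i,j) = y_i*y_j*(x_i'*x_j)
         f = -ones(N, 1);                  % objective: (1/2)*a'*H*a - sum(a)
         Aeq = y;  beq = 0;                % equality constraint: sum(alpha.*y) = 0
         lb = zeros(N, 1);  ub = C*ones(N, 1);
         alpha = quadprog(H, f, [], [], Aeq, beq, lb, ub);
         w = X * (alpha .* y');            % primal weight vector
         sv = find(alpha > 1e-6 & alpha < C - 1e-6);   % margin support vectors
         b = mean(y(sv)' - X(:, sv)' * w); % average bias over the margin SVs
     end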

      This problem can also be solved with the SMO (Sequential Minimal Optimization) algorithm, which iteratively uses heuristics to select a pair of $\alpha$'s and re-optimizes the objective with respect to just those two. As we shall see, the optimal $\vec{w}$ is a linear combination of the support vectors, so a prediction for a new point $\vec{x}_{N+1}$ needs only the support vectors: $y=\mathrm{sign}\big(\sum_{n\in SV}\alpha_n y_n(\vec{x}_n^T\vec{x}_{N+1})+b\big)$.
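
      For illustration (continuing with the hypothetical svmdual helper above, and using t for the label vector as elsewhere in this post), the support-vector prediction rule can be written as:

     function y = svmpredict(X, t, alpha, b, xnew)
         % Sketch: classify a new column vector xnew using only the
         % support vectors (those with alpha_n > 0) of the training set.
         sv = find(alpha > 1e-6);               % indices of the support vectors
         k  = X(:, sv)' * xnew;                 % inner products x_n' * x_new
         y  = sign(sum(alpha(sv) .* t(sv)' .* k) + b);
     end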

    References:

      1. Bishop, Christopher M. Pattern Recognition and Machine Learning [M]. Singapore: Springer, 2006
