  • Hung-yi Lee's Machine Learning course, Lecture 2: Classification

    1.Classification

    classification:   x->function->class n

    how to do classification?

    training data for classification:

      $(x^1, \hat{y}^1),\ (x^2, \hat{y}^2),\ (x^3, \hat{y}^3),\ (x^4, \hat{y}^4)$

    ideal alternatives:

    *function (model):

      $f(x)$: compute $g(x)$; if $g(x) > 0$, output class 1

              otherwise ($g(x) < 0$), output class 2

    *loss function

          $L(f) = \sum_n \delta\big(f(x^n) \neq \hat{y}^n\big)$      the number of times f gets an incorrect result on the training data

    *find the best function

     examples: perceptron, SVM
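
The three parts above (model, loss, search for the best function) can be illustrated with a minimal Python sketch. It is not the lecture's code: the scoring function `g`, its threshold, and the toy data are assumptions made up for this example.

```python
def g(x):
    # Hypothetical scoring function: positive for class 1, negative for class 2.
    return x - 2.5

def f(x):
    # The classifier: class 1 if g(x) > 0, otherwise class 2.
    return 1 if g(x) > 0 else 2

def zero_one_loss(classifier, data):
    # L(f) = number of training examples the classifier labels incorrectly.
    return sum(1 for x, y_hat in data if classifier(x) != y_hat)

# Toy training data (x, y_hat), invented for illustration only.
train = [(1.0, 2), (2.0, 2), (3.0, 1), (4.0, 1), (2.8, 2)]
print(zero_one_loss(f, train))
```

Because this count only changes in jumps, it cannot be minimized with gradient descent directly, which is one reason methods such as the perceptron and SVM are used for this formulation.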

    2.Gaussian distribution

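    *Gaussian distribution function      $f_{\mu,\Sigma}(x) = \dfrac{1}{(2\pi)^{D/2}} \dfrac{1}{|\Sigma|^{1/2}} \exp\!\Big(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\Big)$, where D is the dimension of x

    input: vector x     output: probability density of sampling x

    the shape of the function is determined by the mean vector $\mu$ and the covariance matrix $\Sigma$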

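    *maximum likelihood

    a Gaussian with any mean $\mu$ and covariance matrix $\Sigma$ can generate these points, but with different likelihood

    likelihood of a Gaussian with mean $\mu$ and covariance matrix $\Sigma$ = the probability of the Gaussian sampling $x^1, x^2, x^3, \dots, x^N$

    likelihood function       $L(\mu,\Sigma) = f_{\mu,\Sigma}(x^1)\, f_{\mu,\Sigma}(x^2)\, f_{\mu,\Sigma}(x^3) \cdots f_{\mu,\Sigma}(x^N)$

    find the best parameters    $\mu^*, \Sigma^* = \arg\max_{\mu,\Sigma} L(\mu,\Sigma)$     $\mu^* = \frac{1}{N}\sum_{n=1}^{N} x^n$     $\Sigma^* = \frac{1}{N}\sum_{n=1}^{N} (x^n-\mu^*)(x^n-\mu^*)^T$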
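
The closed-form maximum likelihood estimates can be computed directly. The sketch below uses only plain Python lists; the function name `mle_gaussian` and the three sample points are assumptions for illustration.

```python
def mle_gaussian(points):
    # points: list of equal-length feature vectors (lists of floats).
    n = len(points)
    d = len(points[0])
    # mu* = (1/n) * sum of all points
    mu = [sum(p[k] for p in points) / n for k in range(d)]
    # Sigma* = (1/n) * sum of (x - mu*)(x - mu*)^T  (divides by n, not n - 1)
    sigma = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n
              for j in range(d)]
             for i in range(d)]
    return mu, sigma

# Three made-up 2-D points.
points = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
mu, sigma = mle_gaussian(points)
print(mu)
print(sigma)
```

Note that the maximum likelihood covariance divides by the number of points rather than by one less, so it differs from the unbiased sample covariance that many statistics libraries return by default.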

    *classification with Gaussian distributions

    Bayes' rule     $P(C_1|x) = \dfrac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1) + P(x|C_2)P(C_2)}$

     $P(x|C_1) = f_{\mu^1,\Sigma^1}(x)$             $P(x|C_2) = f_{\mu^2,\Sigma^2}(x)$
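
As a rough sketch of this classifier, the code below evaluates a 2-D Gaussian density for each class and combines the two with Bayes' rule. The helper names, the restriction to two dimensions (which keeps the matrix inverse explicit), and the example means, covariances and priors are all assumptions for illustration.

```python
import math

def gaussian_pdf_2d(x, mu, sigma):
    # Density of a 2-D Gaussian; sigma is a 2x2 covariance matrix [[a, b], [c, d]].
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    # For D = 2 the normalizer (2*pi)^(D/2) * |Sigma|^(1/2) is 2*pi*sqrt(det).
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def posterior_c1(x, mu1, sigma1, prior1, mu2, sigma2, prior2):
    # Bayes' rule: P(C1|x) = P(x|C1)P(C1) / (P(x|C1)P(C1) + P(x|C2)P(C2))
    p1 = gaussian_pdf_2d(x, mu1, sigma1) * prior1
    p2 = gaussian_pdf_2d(x, mu2, sigma2) * prior2
    return p1 / (p1 + p2)

# Made-up class parameters: unit covariance, equal priors.
mu1, mu2 = [0.0, 0.0], [2.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(posterior_c1([1.0, 1.0], mu1, identity, 0.5, mu2, identity, 0.5))
print(posterior_c1([0.0, 0.0], mu1, identity, 0.5, mu2, identity, 0.5))
```

A point is assigned to class 1 when the returned posterior is above 0.5.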

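    *modifying the model

    use different means $\mu^1$, $\mu^2$ but the same covariance matrix $\Sigma$ for both classes; this gives fewer parameters, because the number of parameters in $\Sigma$ is proportional to the square of the dimension of x

    shared covariance           $\Sigma = \dfrac{m}{m+n}\Sigma^1 + \dfrac{n}{m+n}\Sigma^2$, where m and n are the numbers of training examples in class 1 and class 2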
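
The weighted average of the two class covariances is a one-liner per matrix entry. In this sketch the function name `shared_covariance` and the example matrices and counts are hypothetical.

```python
def shared_covariance(sigma1, m, sigma2, n):
    # Weighted average of two covariance matrices, weighted by class sizes m and n.
    w1 = m / (m + n)
    w2 = n / (m + n)
    size = len(sigma1)
    return [[w1 * sigma1[i][j] + w2 * sigma2[i][j] for j in range(size)]
            for i in range(size)]

# Made-up per-class covariances: class 1 has 3 examples, class 2 has 1.
sigma_c1 = [[1.0, 0.0], [0.0, 1.0]]
sigma_c2 = [[3.0, 1.0], [1.0, 3.0]]
sigma_shared = shared_covariance(sigma_c1, 3, sigma_c2, 1)
print(sigma_shared)
```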

    *model flaw

    if all the dimensions are assumed to be independent, the model is a Naive Bayes classifier; this assumption may not hold for the data

    *posterior probability:

    $P(C_1|x) = \dfrac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1) + P(x|C_2)P(C_2)} = \dfrac{1}{1 + \dfrac{P(x|C_2)P(C_2)}{P(x|C_1)P(C_1)}} = \dfrac{1}{1+\exp(-z)} = \sigma(z)$     (the sigmoid function)

    $z = \ln \dfrac{P(x|C_1)P(C_1)}{P(x|C_2)P(C_2)}$
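
A quick numerical check of this identity: the posterior computed from Bayes' rule should match the sigmoid of the log-ratio z. The four probability values below are arbitrary numbers chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values for P(x|C1), P(C1), P(x|C2), P(C2).
px_c1, p_c1 = 0.30, 0.6
px_c2, p_c2 = 0.10, 0.4

# Posterior from Bayes' rule.
posterior = px_c1 * p_c1 / (px_c1 * p_c1 + px_c2 * p_c2)
# Log-ratio z, then the sigmoid of z.
z = math.log((px_c1 * p_c1) / (px_c2 * p_c2))
print(posterior, sigmoid(z))
```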

    *mathematical derivation

    when both classes share the same covariance matrix $\Sigma$, z is linear in x:     $z = w \cdot x + b$
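
The linear form can be derived in a few lines. This is a sketch under the assumption that both class-conditional densities are Gaussians with means $\mu^1$, $\mu^2$ and one shared covariance matrix $\Sigma$, so that the normalizing constants and the quadratic terms in x cancel.

```latex
\begin{aligned}
z &= \ln\frac{P(x|C_1)}{P(x|C_2)} + \ln\frac{P(C_1)}{P(C_2)} \\
\ln\frac{P(x|C_1)}{P(x|C_2)}
  &= -\tfrac{1}{2}(x-\mu^1)^T\Sigma^{-1}(x-\mu^1) + \tfrac{1}{2}(x-\mu^2)^T\Sigma^{-1}(x-\mu^2) \\
  &= (\mu^1-\mu^2)^T\Sigma^{-1}x - \tfrac{1}{2}(\mu^1)^T\Sigma^{-1}\mu^1 + \tfrac{1}{2}(\mu^2)^T\Sigma^{-1}\mu^2 \\
w^T &= (\mu^1-\mu^2)^T\Sigma^{-1}, \qquad
b = -\tfrac{1}{2}(\mu^1)^T\Sigma^{-1}\mu^1 + \tfrac{1}{2}(\mu^2)^T\Sigma^{-1}\mu^2 + \ln\frac{P(C_1)}{P(C_2)}
\end{aligned}
```

Since the generative model only ever produces some w and b, one can also look for w and b directly, which is what logistic regression does.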

    3.Logistic Regression

    $P_{w,b}(C_1|x) = \sigma(z)$    $z = \ln \dfrac{P(x|C_1)P(C_1)}{P(x|C_2)P(C_2)} = w \cdot x + b$    $\sigma(z) = \dfrac{1}{1+\exp(-z)}$

    *step 1  function set:     $f_{w,b}(x) = P_{w,b}(C_1|x)$

    *step 2  loss function of Logistic Regression

    training data    x:        x^1    x^2    x^3    x^4   ...   x^N

                     class:    C1     C2     C1     C1    ...   C2

                     ŷ:        1      0      1      1     ...   0

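    assume the data is generated based on $f_{w,b}(x) = P_{w,b}(C_1|x)$

    $L(w,b) = f_{w,b}(x^1)\,\big(1-f_{w,b}(x^2)\big)\,f_{w,b}(x^3)\,f_{w,b}(x^4) \cdots \big(1-f_{w,b}(x^N)\big)$

    $L(w,b) = \prod_n f_{w,b}(x^n)^{\hat{y}^n} \big(1-f_{w,b}(x^n)\big)^{1-\hat{y}^n}$    $w^*, b^* = \arg\max_{w,b} L(w,b) = \arg\min_{w,b} \big(-\ln L(w,b)\big)$

    $-\ln L(w,b) = -\ln f_{w,b}(x^1) - \ln\big(1-f_{w,b}(x^2)\big) - \ln f_{w,b}(x^3) - \ln f_{w,b}(x^4) - \cdots - \ln\big(1-f_{w,b}(x^N)\big)$

                  $= \sum_n -\big[\hat{y}^n \ln f_{w,b}(x^n) + (1-\hat{y}^n)\ln\big(1-f_{w,b}(x^n)\big)\big]$     the cross entropy between two Bernoulli distributions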
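
To make the link between the likelihood and the cross entropy concrete, the sketch below computes both for the same parameters and shows that the cross-entropy sum equals the negative log-likelihood. The data set and the values of `w` and `b` are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def f(x, w, b):
    # f_{w,b}(x) = sigma(w . x + b), the modeled probability of class C1.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def likelihood(data, w, b):
    # Product over examples: f(x) when y_hat = 1, (1 - f(x)) when y_hat = 0.
    result = 1.0
    for x, y_hat in data:
        p = f(x, w, b)
        result *= p if y_hat == 1 else (1.0 - p)
    return result

def cross_entropy_loss(data, w, b):
    # -ln L(w, b) = sum_n -[y_hat ln f(x) + (1 - y_hat) ln(1 - f(x))]
    total = 0.0
    for x, y_hat in data:
        p = f(x, w, b)
        total += -(y_hat * math.log(p) + (1 - y_hat) * math.log(1.0 - p))
    return total

# Made-up training set of (features, y_hat) pairs and arbitrary parameters.
data = [([1.0, 2.0], 1), ([2.0, 0.5], 0), ([0.5, 1.5], 1), ([3.0, 1.0], 0)]
w, b = [-0.8, 0.6], 0.1
print(cross_entropy_loss(data, w, b), -math.log(likelihood(data, w, b)))
```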

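    *step 3  find the best function

    with $z^n = w \cdot x^n + b$:

    $\dfrac{\partial \ln f_{w,b}(x^n)}{\partial w_i} = \big(1-\sigma(z^n)\big)\, x_i^n$

    $\dfrac{\partial \ln\big(1-f_{w,b}(x^n)\big)}{\partial w_i} = -\sigma(z^n)\, x_i^n$

    $\dfrac{\partial \big(-\ln L(w,b)\big)}{\partial w_i} = \sum_n -\big(\hat{y}^n - f_{w,b}(x^n)\big)\, x_i^n$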
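
The gradient above plugs straight into gradient descent. The following is a minimal sketch, not a tuned implementation: the learning rate, the number of steps, the function name `train_logistic_regression`, and the tiny linearly separable data set are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(data, lr=0.1, steps=200):
    d = len(data[0][0])
    w = [0.0] * d
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * d
        grad_b = 0.0
        for x, y_hat in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # d(-ln L)/dw_i = sum_n -(y_hat^n - f(x^n)) * x_i^n
            for i in range(d):
                grad_w[i] += -(y_hat - p) * x[i]
            # The bias behaves like a weight on a constant input of 1.
            grad_b += -(y_hat - p)
        # Step against the gradient of -ln L.
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    # Returns 1 for class C1, 0 for class C2.
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5 else 0

# Made-up, linearly separable training set of (features, y_hat) pairs.
data = [([1.0, 2.0], 1), ([2.0, 1.0], 1), ([-1.0, -2.0], 0), ([-2.0, -1.0], 0)]
w, b = train_logistic_regression(data)
print(w, b, [predict(x, w, b) for x, _ in data])
```

The update is larger when the output is far from the target, since each example contributes in proportion to $\hat{y}^n - f_{w,b}(x^n)$.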

    4.Multi-class classification

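    *softmax

    $C_1$: $w^1, b_1$     $z_1 = w^1 \cdot x + b_1$       ->  $y_1 = e^{z_1} / \sum_j e^{z_j}$

    $C_2$: $w^2, b_2$     $z_2 = w^2 \cdot x + b_2$       ->  $y_2 = e^{z_2} / \sum_j e^{z_j}$

    $C_3$: $w^3, b_3$     $z_3 = w^3 \cdot x + b_3$       ->  $y_3 = e^{z_3} / \sum_j e^{z_j}$

    softmax     $z_i$ -> $y_i = e^{z_i} / \sum_j e^{z_j}$

    the softmax outputs behave like probabilities:  $0 < y_i < 1$,   $\sum_i y_i = 1$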
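
A small sketch of softmax in plain Python follows. Subtracting the maximum before exponentiating is a common numerical-stability trick that is not part of the notes; it leaves the result unchanged. The input scores are example values.

```python
import math

def softmax(z):
    # Subtracting max(z) does not change the result but avoids overflow in exp.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Example scores z1, z2, z3.
y = softmax([3.0, 1.0, -3.0])
print(y, sum(y))
```

Larger scores are pushed toward 1 and smaller ones toward 0, while the outputs still sum to 1.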

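    x -> $z_1, z_2, z_3$ -> softmax -> $y_1, y_2, y_3$

    loss function: cross entropy between the output y and the target $\hat{y}$:     $-\sum_i \hat{y}_i \ln y_i$

    targets:     class 1: $\hat{y} = [1\ 0\ 0]^T$     class 2: $\hat{y} = [0\ 1\ 0]^T$     class 3: $\hat{y} = [0\ 0\ 1]^T$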
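
With a one-hot target, the cross entropy reduces to minus the log of the probability assigned to the true class. The sketch below illustrates this; the output vector `y` is a made-up softmax result.

```python
import math

def cross_entropy(y, y_hat):
    # -sum_i y_hat_i * ln(y_i); terms with y_hat_i = 0 contribute nothing.
    return -sum(t * math.log(p) for p, t in zip(y, y_hat) if t > 0)

y = [0.7, 0.2, 0.1]  # hypothetical softmax output

loss_if_class1 = cross_entropy(y, [1, 0, 0])  # equals -ln(0.7)
loss_if_class3 = cross_entropy(y, [0, 0, 1])  # equals -ln(0.1)
print(loss_if_class1, loss_if_class3)
```

The loss is small when the model puts high probability on the correct class and grows as that probability approaches 0.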

    *feature transformation: Logistic Regression models can themselves be used to transform the features

    *cascading logistic regression models

    x1, x2 -> z1 -> sigmoid -> x1'

                                        x1', x2' -> z3 -> sigmoid -> y

    x1, x2 -> z2 -> sigmoid -> x2'

    feature transformation (z1, z2)          classification (z3)          each logistic regression unit is called a neuron
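
To see why cascading helps, consider four points that no single linear boundary separates: (0,1) and (1,0) in one class, (0,0) and (1,1) in the other. The sketch below uses hand-picked weights (assumptions for illustration, not learned values and not taken from the notes) so that two logistic regression units transform the features and a third classifies them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cascade(x1, x2):
    # First layer: two logistic regression units transform the features.
    x1p = sigmoid(20 * x1 + 20 * x2 - 10)    # close to 1 unless both inputs are 0
    x2p = sigmoid(-20 * x1 - 20 * x2 + 30)   # close to 1 unless both inputs are 1
    # Second layer: one logistic regression unit classifies the new features.
    return sigmoid(20 * x1p + 20 * x2p - 30)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), round(cascade(x1, x2), 3))
```

In practice the weights of all units are learned jointly rather than set by hand.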

     

  • Original article: https://www.cnblogs.com/SAM-CJM/p/13932096.html