Logistic Regression cost function and Maximum likelihood estimate

    Logistic Regression cost function

The original notation is $\hat{y}$; here we simplify it to y' because of LaTeX grammar issues.

$\text{If } y = 1:\quad p(y|x) = y'$
$\text{If } y = 0:\quad p(y|x) = 1 - y'$

Summarize: $p(y|x) = y'^{\,y} (1 - y')^{1 - y}$

This single equation expresses both cases:
$\text{If } y = 1:\quad p(y|x) = y'^{\,1} (1 - y')^{0} = y'$
$\text{If } y = 0:\quad p(y|x) = y'^{\,0} (1 - y')^{1} = 1 - y'$
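A quick numerical check (a minimal sketch in plain Python; the value 0.8 is just an illustrative prediction) shows the single formula reproducing both cases:

```python
# Check that p(y|x) = y'^y * (1 - y')^(1 - y) reduces to the two cases above.
y_hat = 0.8  # y': the model's predicted probability that y = 1 (example value)

p_if_y1 = y_hat**1 * (1 - y_hat)**(1 - 1)  # y = 1  ->  y'     = 0.8
p_if_y0 = y_hat**0 * (1 - y_hat)**(1 - 0)  # y = 0  ->  1 - y' = 0.2

print(p_if_y1, p_if_y0)  # 0.8 0.2 (up to float rounding)
```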

The log function is strictly monotonically increasing, so maximizing $\log p(y|x)$ gives the same result as maximizing $p(y|x)$. Computing the log of $p(y|x)$:

$\log p(y|x) = \log\left(y'^{\,y} (1 - y')^{1 - y}\right) = y \log y' + (1 - y) \log(1 - y') = -l(y', y)$

where $l$ denotes the loss function. Minimizing the loss function corresponds to maximizing the log of the probability. This is what the loss function on a single example looks like.
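As a minimal sketch in Python (the function name and test values are ours, not from the source), this loss rewards confident correct predictions and heavily penalizes confident wrong ones:

```python
import numpy as np

def single_example_loss(y_hat, y):
    """Cross-entropy loss l(y', y) = -(y*log(y') + (1 - y)*log(1 - y'))."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Minimizing the loss maximizes log p(y|x):
print(single_example_loss(0.9, 1))  # ~0.105: confident and correct, small loss
print(single_example_loss(0.9, 0))  # ~2.303: confident but wrong, large loss
```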

    Cost on m examples

$\log p(\text{labels in the training set}) = \log \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)})$

Because the log of a product is the sum of the logs:

$\log p(\ldots) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}) = -\sum_{i=1}^{m} l(y'^{(i)}, y^{(i)})$
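A small numeric sketch of that sum (the predictions and labels below are illustrative, assuming the loss function defined earlier):

```python
import numpy as np

# Illustrative predictions y'^(i) and labels y^(i) for m = 3 examples.
y_hat = np.array([0.9, 0.2, 0.7])
y     = np.array([1.0, 0.0, 1.0])

losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # l(y'^(i), y^(i))
log_likelihood = -losses.sum()  # log p(labels) = -sum_i l(y'^(i), y^(i))
print(losses, log_likelihood)
```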

    Maximum likelihood estimation

In statistics there is a principle called the principle of maximum likelihood estimation, which just means: choose the parameters that maximize this quantity (the log-likelihood above).

Cost function:
Because we want to minimize the cost instead of maximizing the likelihood, we get rid of the minus sign. Finally, for convenience, to make sure our quantities are better scaled, we add an extra $\frac{1}{m}$ scaling factor:

$J(w,b) = \frac{1}{m} \sum_{i=1}^{m} l(y'^{(i)}, y^{(i)})$
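Putting it together as a runnable sketch, assuming the standard logistic regression model $y' = \sigma(w^T x + b)$ (variable names and the random test data are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y):
    """J(w,b) = (1/m) * sum_i l(y'^(i), y^(i)).

    X has shape (n_features, m); y has shape (m,) with labels in {0, 1}.
    """
    m = y.shape[0]
    y_hat = sigmoid(w @ X + b)  # y'^(i) = sigmoid(w^T x^(i) + b)
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return losses.sum() / m     # the 1/m scaling factor

# With w = 0, b = 0 every prediction is 0.5, so J = log(2) ~ 0.693.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))
y = rng.integers(0, 2, size=5).astype(float)
print(cost(np.zeros(2), 0.0, X, y))
```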

To summarize: by minimizing this cost function $J(w,b)$, we are really carrying out maximum likelihood estimation, under the assumption that our training examples are i.i.d. (independent and identically distributed).

    Reference

https://mooc.study.163.com/learn/2001281002?tid=2001392029#/learn/content?type=detail&id=2001702014
Maximum Likelihood Estimate: https://blog.csdn.net/zengxiantao1994/article/details/72787849

Original post: https://www.cnblogs.com/wanghongze95/p/13842554.html