  • Logistic regression

    (Original article; please cite the source when reposting!)

    Logistic regression is used to solve classification problems.

    The model's output lies in (0,1), and 0.5 is used as the classification threshold.

    The model's output is P(y=1|x;θ): for a given input x and fixed parameters θ, the probability of the event "y = 1".

    We can therefore choose the sigmoid function 1/(1+e^(-z)), z ∈ R, whose range is (0,1); in logistic regression, z = θᵀx. (Other functions could also be chosen.)

    That is: P(y=1|x;θ) = 1/(1+e^(-z)), where z = θᵀx.
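    As a quick sanity check, the hypothesis can be sketched in R (the function name `sigmoid` and the example values of `theta` and `x` below are illustrative, not part of the full implementation later in this post):

```r
# Sigmoid hypothesis: P(y = 1 | x; theta) = 1 / (1 + exp(-theta' x))
sigmoid <- function(z) 1 / (1 + exp(-z))

theta <- c(0.5, -1.2)             # example parameters (assumed values)
x     <- c(1.0, 2.0)              # example input (first entry is the intercept term)
p_y1  <- sigmoid(sum(theta * x))  # z = theta' x
```

    Note that sigmoid(0) = 0.5, so the 0.5 threshold corresponds exactly to the boundary θᵀx = 0.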

    Choosing the learning objective function (cost function)

    The objective function is minimized with an optimization algorithm during training to find the optimal parameters.

    For a convex function, a local optimum is also a global optimum, which makes the optimization easier to solve; so when choosing the objective function, we prefer a convex one. Choose the following loss function:

    For y=1: -log[1/(1+e^(-z))]; for y=0: -log[1-1/(1+e^(-z))] (the log loss). Both functions are convex and monotone over the range of the sigmoid function.

    Combining the two losses: L(x,y;θ) = -y·log[1/(1+e^(-z))] - (1-y)·log[1-1/(1+e^(-z))]

    Since the training data consist of many sample points, the average loss over the training data is:

    L(x,y;θ) = -(1/m)∑_{i=1}^{m} { y^(i)·log[1/(1+e^(-z))] + (1-y^(i))·log[1-1/(1+e^(-z))] }, where z = θᵀx^(i) and m is the number of sample points.
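    The average loss can be sketched in R as follows (the names `avg_log_loss`, `X`, and `y` are my own, not the implementation later in this post):

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Average log loss over m samples; X is m-by-n (rows are x^(i)), y has entries in {0,1}
avg_log_loss <- function(theta, X, y) {
  h <- sigmoid(X %*% theta)  # h^(i) = P(y = 1 | x^(i); theta)
  -mean(y * log(h) + (1 - y) * log(1 - h))
}
```

    For example, with θ = 0 every h^(i) equals 0.5, so the average loss is log 2 regardless of the labels.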

    (Estimating the parameters of a logistic regression model is essentially point estimation, so maximum likelihood estimation (MLE) can be used to estimate the parameters.

    The likelihood function is: L = ∏_{i=1}^{m} { y^(i)·[1/(1+e^(-z))] + (1-y^(i))·[1-1/(1+e^(-z))] }, z = θᵀx^(i),

    and the log-likelihood is: ℓ = ∑_{i=1}^{m} { y^(i)·log[1/(1+e^(-z))] + (1-y^(i))·log[1-1/(1+e^(-z))] }, z = θᵀx^(i). Maximizing the log-likelihood is thus equivalent to minimizing the average loss above.)

    Replacing the expected loss under the population distribution with the sample average loss is justified, by the law of large numbers, when there are sufficiently many sample points.

    What if there are not enough sample points? Then substituting the average loss for the expected loss may lead to overfitting.

    The remedy is regularization: add a penalty term to the average loss function:

    L(x,y;θ) = -(1/m)∑_{i=1}^{m} { y^(i)·log[1/(1+e^(-z))] + (1-y^(i))·log[1-1/(1+e^(-z))] } + (λ/2m)∑_{j=1}^{n} θ_j², where n is the number of parameters.
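    A minimal R sketch of the regularized average loss (the names are my own), following the common convention — also used by the full code below — of leaving the intercept θ₁ unpenalized:

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Regularized average log loss; the intercept theta[1] is not penalized
reg_log_loss <- function(theta, X, y, lambda) {
  m <- nrow(X)
  h <- sigmoid(X %*% theta)
  -mean(y * log(h) + (1 - y) * log(1 - h)) +
    (lambda / (2 * m)) * sum(theta[-1]^2)
}
```

    With lambda = 0 this reduces to the plain average loss.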

    The penalty term charges the cost function for model complexity (e.g., having many parameters): it shrinks the trained values of the designated parameters (reducing their influence on the model), thereby controlling model complexity and avoiding overfitting. (Besides the 2-norm penalty added to the average loss above, other penalties can be used, such as a 1-norm penalty.)

    The first term of the cost function minimizes the model error, while the second controls the effect of model complexity; the coefficient λ ≥ 0 balances the two terms. Also, adding more sample points weakens the penalty term (as m increases, λ/2m decreases), so the penalized parameters will be allowed to grow (dropping m from the penalty coefficient's denominator removes this effect).

    This optimization problem can be solved with gradient descent or with Newton's method. Using Newton's method, the parameter update is:

    θ := θ - H⁻¹∇_θ, where H⁻¹ is the inverse of the Hessian of the average loss with respect to θ, and ∇_θ is the gradient of the average loss with respect to θ.
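    The update can be sketched generically in R (`grad` and `hess` stand for the gradient vector and Hessian matrix of the average loss; this is a sketch of a single step, not the full algorithm):

```r
# One Newton step: theta <- theta - H^{-1} %*% gradient
newton_step <- function(theta, grad, hess) {
  theta - solve(hess) %*% grad
}
```

    On a quadratic cost a single Newton step lands exactly on the minimum, which is why Newton's method typically needs only a handful of iterations on a well-behaved convex cost like this one.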

    Below is an implementation of the logistic regression algorithm in R.

##
## Function: logistic_fun_sigmoid
logistic_fun_sigmoid <- function (z)
{
    1.0 / (1.0 + exp(-z))
}

##
## Function: logistic_fun_map_feature
## maps (u, v) to all monomials u^j * v^k with j + k <= degree
logistic_fun_map_feature <- function(u, v, degree)
{
    result <- matrix(1, nrow = length(u), ncol = 1)
    for (i in 1:degree) {
        for (j in i:0) {
            result <- cbind(result, u^j * v^(i-j))
        }
    }
    result
}

##
## load data
##
logisticx <- read.table(file = "x.dat", sep = ",")
attr(logisticx, "names") <- c("u", "v")
logisticy <- read.table(file = "y.dat")

logistic_x <- as.matrix(logisticx) ;  attr(logistic_x, "dimnames")[2] <- NULL
logistic_y <- as.matrix(logisticy) ;  attr(logistic_y, "dimnames")[2] <- NULL
logistic_y_factor <- as.factor( logistic_y )

## plot the samples' data
plot( x = c(floor(min(logistic_x[,1])), ceiling(max(logistic_x[,1]))),
      y = c(floor(min(logistic_x[,2])), ceiling(max(logistic_x[,2]))),
      xlab = "u", ylab = "v", type = "n")  # set up the coordinate system
points(logistic_x[,1], logistic_x[,2], col = as.integer(logistic_y)+1, pch = as.integer(logistic_y)+1)
legend("topright", legend = c("y=1", "y=0"), col = c(2, 1), pch = c(2, 1))

##
## map the original x to a 28-feature vector (degree-6 polynomial terms)
##
logistic_x_mapped <- logistic_fun_map_feature( logistic_x[,1], logistic_x[,2], 6 )

logistic_theta_list <- list()        # fitted theta for each lambda
logistic_lamda_vec  <- c(0, 1, 10)   # regularization strengths to try

for (index_i in 1:length(logistic_lamda_vec))  {
    logistic_lamda <- logistic_lamda_vec[index_i]   # 0, 1, 10

    logistic_num_of_samples      <- dim(logistic_x_mapped)[1]
    logistic_num_of_features     <- dim(logistic_x_mapped)[2]
    logistic_theta               <- matrix(0, nrow = logistic_num_of_features, ncol = 1)  # 28 rows, 1 column
    logistic_h_theta             <- logistic_fun_sigmoid( logistic_x_mapped %*% logistic_theta )
    logistic_gradient_of_J_theta <- numeric(logistic_num_of_features)
    logistic_diag_28by28         <- diag( c(0, rep(1, logistic_num_of_features - 1)) )  # [1,1] = 0: the intercept is not penalized
    logistic_J_theta             <- 0
    logistic_J_theta_array       <- numeric(1)
    logistic_last_J_theta        <- -1
    logistic_num_of_loop         <- 0

    ## Newton's method: iterate until the cost stops changing
    while (abs(logistic_J_theta - logistic_last_J_theta) > 1e-10) {
        logistic_num_of_loop <- logistic_num_of_loop + 1
        cat("---------------------------------------------------------- loop ", logistic_num_of_loop, " begin\n")
        logistic_last_J_theta <- logistic_J_theta

        ## calculate the gradient of J(theta); the intercept term is not regularized
        logistic_gradient_of_J_theta[1] <- crossprod( (logistic_h_theta - logistic_y[,1]), logistic_x_mapped[,1] ) / logistic_num_of_samples
        for (i in 2:logistic_num_of_features) {
            logistic_gradient_of_J_theta[i] <- ( crossprod( (logistic_h_theta - logistic_y[,1]), logistic_x_mapped[,i] )
                                               + logistic_theta[i,1] * logistic_lamda ) / logistic_num_of_samples
        }

        ## calculate the Hessian matrix (reset to zero on every iteration)
        logistic_H <- matrix(0, nrow = logistic_num_of_features, ncol = logistic_num_of_features)
        for (i in 1:logistic_num_of_samples) {
            logistic_H <- logistic_h_theta[i,1] * (1 - logistic_h_theta[i,1]) * tcrossprod( as.matrix(logistic_x_mapped[i,]) ) + logistic_H
        }
        logistic_H <- (logistic_H + logistic_lamda * logistic_diag_28by28) / logistic_num_of_samples

        ## Newton update of theta
        logistic_theta <- logistic_theta - solve(logistic_H) %*% as.matrix(logistic_gradient_of_J_theta)

        ## update h(theta) with the latest theta
        logistic_h_theta <- logistic_fun_sigmoid( logistic_x_mapped %*% logistic_theta )

        ## calculate J(theta): the regularized negative average log-likelihood
        logistic_J_theta <- (
                          -sum( as.vector(log(logistic_h_theta)) * as.vector(logistic_y)
                                + as.vector(log(1 - logistic_h_theta)) * as.vector(1 - logistic_y) )
                          + sum(logistic_theta[-1]^2) * logistic_lamda / 2
                        )     /     logistic_num_of_samples

        logistic_J_theta_array[logistic_num_of_loop] <- logistic_J_theta
        cat("---------------------------------------------------------- loop ", logistic_num_of_loop, " end\n")
    }
    logistic_theta_list[[index_i]] <- logistic_theta
}


##
## plot graphics
##

## one_theta is a 28-by-1 matrix
logistic_plot_contour <- function(one_theta)
{
    u <- seq(-1, 1.5, by = .05)
    v <- seq(-1, 1.5, by = .05)
    z <- matrix(0, nrow = length(u), ncol = length(v))

    for (i in 1:length(u)) {
        for (j in 1:length(v)) {
            z[j,i] <- logistic_fun_map_feature( u[i], v[j], 6 ) %*% one_theta
        }
    }

    ## the decision boundary is the z = 0 level curve
    contour(u, v, t(z), col = "green", add = TRUE, method = "simple", nlevels = 1, lty = "solid")
}

op <- par( mfrow = c(1, 3),  # 1 x 3 pictures on one plot
           pty = "s" )
for (index_i in 1:length(logistic_theta_list))  {
    plot( x = c(-1, 1.5),
          y = c(-1, 1.5),
          xlab = "u", ylab = "v", type = "n")  # set up the coordinate system
    points(logistic_x[,1], logistic_x[,2], col = as.integer(logistic_y)+1, pch = as.integer(logistic_y)+1)
    legend("topright", legend = c("y=1", "y=0", "Decision boundary"), col = c(2, 1, "green"), pch = c(2, 1, "-"))

    logistic_theta <- logistic_theta_list[[index_i]]
    logistic_plot_contour(logistic_theta)
}
par(op)
  • Original article: https://www.cnblogs.com/activeshj/p/3898225.html