Logistic Regression
1 Classification
First, let's look at the problems that arise when linear regression is used for classification. In the figure below, adding one more training example changes the fitted hypothesis enough that existing points get misclassified; moreover, linear regression can output values far below 0 or far above 1, which makes little sense for 0/1 labels. Linear regression is therefore not well suited to classification. The logistic regression introduced below always outputs values in [0, 1] and is suitable for classification; despite its name, logistic regression is a classification algorithm.
2 Hypothesis Representation
The hypothesis for logistic regression is

$h_\theta(x) = g(\theta^T x) = \dfrac{1}{1 + e^{-\theta^T x}}$

where $\theta$ is the parameter vector and $x$ is the feature vector. The function $g(z) = \dfrac{1}{1 + e^{-z}}$ is called the logistic function (also the sigmoid function). Its graph is an S-shaped curve whose range is (0, 1).
For a classification problem, $h_\theta(x)$ can be read as the probability that the example belongs to class 1 given features $x$ and parameters $\theta$, i.e. $h_\theta(x) = P(y = 1 \mid x; \theta)$; the probability of the other class is then $1 - h_\theta(x)$.
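A minimal Octave sketch of evaluating this hypothesis directly (the values of theta and x below are made up purely for illustration):

theta = [-3; 1; 1];                  % made-up parameters
x = [1; 2.5; 1.2];                   % feature vector with x0 = 1 for the intercept
h = 1 / (1 + exp(-(theta' * x)));    % h = g(theta' * x) = P(y = 1 | x; theta)
% h is about 0.67 here, so we would predict class 1.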
3 Decision Boundary
From the shape of the logistic function we can see that $h_\theta(x) \ge 0.5$ (predict $y = 1$) whenever $\theta^T x \ge 0$, and $h_\theta(x) < 0.5$ (predict $y = 0$) whenever $\theta^T x < 0$.
The curve $\theta^T x = 0$ is called the decision boundary.
By combining polynomial features we can obtain more complex decision boundaries.
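For example, with made-up parameters theta = [-1; 0; 0; 1; 1] on the polynomial features [1, x1, x2, x1^2, x2^2], the boundary theta' * x = 0 is the circle x1^2 + x2^2 = 1. A quick Octave check:

theta = [-1; 0; 0; 1; 1];              % made-up parameters for [1, x1, x2, x1^2, x2^2]
boundary = @(x1, x2) theta' * [1; x1; x2; x1^2; x2^2];
boundary(1, 0)      % = 0  -> on the decision boundary (circle of radius 1)
boundary(0.5, 0.5)  % < 0  -> h < 0.5, predict y = 0 (inside the circle)
boundary(2, 0)      % > 0  -> h >= 0.5, predict y = 1 (outside the circle)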
4 Cost Function
The cost function has the form $J(\theta) = \dfrac{1}{m}\sum_{i=1}^{m} \mathrm{Cost}\big(h_\theta(x^{(i)}), y^{(i)}\big)$. In linear regression we used $\mathrm{Cost}(h_\theta(x), y) = \dfrac{1}{2}(h_\theta(x) - y)^2$; if we kept this cost in logistic regression, $J(\theta)$ would be non-convex (because $h_\theta(x)$ is now the sigmoid of a linear function), so gradient descent would not be guaranteed to find the global minimum. We therefore need a cost that keeps $J(\theta)$ convex.
Consider the following cost:

$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$

The two cases can be merged into a single formula:

$\mathrm{Cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$

so that

$J(\theta) = -\dfrac{1}{m}\sum_{i=1}^{m}\Big[ y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\big(1 - h_\theta(x^{(i)})\big) \Big]$
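A minimal vectorized Octave sketch of this cost (it assumes the sigmoid helper listed in the code section at the end of this post; the course version also returns the gradient, which is introduced in the next section):

function J = costFunction(theta, X, y)
  % Unregularized logistic regression cost, vectorized over all m examples.
  m = length(y);
  hx = sigmoid(X * theta);                              % m x 1 vector of predicted probabilities
  J = (-1/m) * (y' * log(hx) + (1 - y)' * log(1 - hx)); % cross-entropy cost
end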
5 Gradient Descent
Writing the cost function in this particular form can be derived from maximum likelihood estimation; other convex forms are possible as well.
Gradient descent:

Repeat {
$\theta_j := \theta_j - \alpha \dfrac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$
} (update all $\theta_j$ simultaneously)

Note that this update rule looks identical to the one derived for linear regression, but here $h_\theta(x)$ is the sigmoid of $\theta^T x$ rather than a linear function, so the two algorithms are not the same thing.
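A minimal sketch of the batch update in Octave (the function name gradientDescent is my own choice here; the learning rate and iteration count are arbitrary illustration values):

function theta = gradientDescent(X, y, theta, alpha, num_iters)
  % Batch gradient descent for unregularized logistic regression.
  m = length(y);
  for iter = 1:num_iters
    hx = sigmoid(X * theta);
    grad = (X' * (hx - y)) / m;    % vectorized partial derivatives
    theta = theta - alpha * grad;  % simultaneous update of all theta_j
  end
end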
6 Multi-class Classification: One-vs-all
Logistic regression can also be applied to multi-class problems. Suppose there are K classes: we train K classifiers, each one treating a single class as positive and all remaining classes as negative. At prediction time we run all K classifiers and assign the example to the class whose classifier outputs the highest probability.
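A rough Octave sketch of this idea (the helper name oneVsAll and the use of fminunc are my own choices for illustration; costFunctionReg and sigmoid are the functions listed in the code section below):

function all_theta = oneVsAll(X, y, K, lambda)
  % Train K one-vs-all classifiers; row k of all_theta holds the parameters
  % of the classifier that treats class k as positive and the rest as negative.
  n = size(X, 2);
  all_theta = zeros(K, n);
  options = optimset('GradObj', 'on', 'MaxIter', 400);
  for k = 1:K
    initial_theta = zeros(n, 1);
    theta = fminunc(@(t)(costFunctionReg(t, X, (y == k), lambda)), ...
                    initial_theta, options);
    all_theta(k, :) = theta';
  end
end

% Prediction: pick the class whose classifier gives the highest probability.
% [~, p] = max(sigmoid(X * all_theta'), [], 2);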
7 Regularization
7.1 Overfitting
7.2 Cost Function
We add a regularization term $\dfrac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$ to the cost function. Note that the sum runs over $\theta_1, \dots, \theta_n$, not from $\theta_0$. Here $\lambda$ is the regularization parameter; if $\lambda$ is too large, the parameters $\theta_1, \dots, \theta_n$ are driven towards 0, so that $h_\theta(x) \approx g(\theta_0)$, which leads to underfitting.
7.3 Regularized Linear Regression
Cost function:

$J(\theta) = \dfrac{1}{2m}\left[\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$

Gradient descent:

Repeat {
$\theta_0 := \theta_0 - \alpha\dfrac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_0^{(i)}$
$\theta_j := \theta_j\left(1 - \alpha\dfrac{\lambda}{m}\right) - \alpha\dfrac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)} \qquad (j = 1, \dots, n)$
}

Here $1 - \alpha\dfrac{\lambda}{m}$ is a number slightly less than 1, so each iteration shrinks $\theta_j$ a little before applying the usual gradient step.
For linear regression we also have the normal equation as an alternative to gradient descent. With the regularization term added it becomes:

$\theta = \left(X^T X + \lambda \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}\right)^{-1} X^T y$

where the matrix is $(n+1)\times(n+1)$ with a 0 in the top-left entry, so that $\theta_0$ is not regularized.
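A minimal Octave sketch of this regularized normal equation (the function name normalEqnReg is made up; X is assumed to already include the column of ones for the intercept):

function theta = normalEqnReg(X, y, lambda)
  % Regularized normal equation for linear regression.
  n = size(X, 2);
  L = eye(n);
  L(1, 1) = 0;                              % do not regularize theta_0
  theta = (X' * X + lambda * L) \ (X' * y); % closed-form solution
end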
7.4 Regularized Logistic Regression
Cost function:

$J(\theta) = -\dfrac{1}{m}\sum_{i=1}^{m}\Big[ y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\big(1 - h_\theta(x^{(i)})\big) \Big] + \dfrac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

Gradient descent:

Repeat {
$\theta_0 := \theta_0 - \alpha\dfrac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_0^{(i)}$
$\theta_j := \theta_j - \alpha\left[\dfrac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)} + \dfrac{\lambda}{m}\theta_j\right] \qquad (j = 1, \dots, n)$
}

The updates look the same as for regularized linear regression, but again $h_\theta(x)$ is the sigmoid function.
Experiment Code
Regularized logistic regression
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
hx = sigmoid(X*theta);
Jnorm = (-1/m)*(y'*log(hx) + (1-y)'*log(1-hx));
theta0 = theta(1);      % note: theta0 is not regularized
theta1 = theta(2:end);
Jreg = (lambda/(2*m))*sum(theta1.^2);
J = Jnorm + Jreg;

grad0 = (hx-y)'*X(:,1)./m;
grad1 = ((hx-y)'*X(:,2:end)./m)' + (lambda/m).*theta1;
grad = [grad0; grad1];
% =============================================================

end
Sigmoid function
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   J = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly
g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).
g = 1./(1 + exp((-1).*z));
% =============================================================
end
Feature mapping
function out = mapFeature(X1, X2)
% MAPFEATURE Feature mapping function to polynomial features
%
%   MAPFEATURE(X1, X2) maps the two input features
%   to quadratic features used in the regularization exercise.
%
%   Returns a new feature array with more features, comprising of
%   X1, X2, X1.^2, X2.^2, X1*X2, X1*X2.^2, etc..
%
%   Inputs X1, X2 must be the same size
%

degree = 6;  % polynomial terms up to degree 6
out = ones(size(X1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j);
    end
end

end
Main script
%% Machine Learning Online Class - Exercise 2: Logistic Regression
%
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the second part
%  of the exercise which covers regularization with logistic regression.
%
%  You will need to complete the following functions in this exercise:
%
%     sigmoid.m
%     costFunction.m
%     predict.m
%     costFunctionReg.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% Load Data
%  The first two columns contains the X values and the third column
%  contains the label (y).

data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);

plotData(X, y);

% Put some labels
hold on;

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

% Specified in plot order
legend('y = 1', 'y = 0')
hold off;


%% =========== Part 1: Regularized Logistic Regression ============
%  In this part, you are given a dataset with data points that are not
%  linearly separable. However, you would still like to use logistic
%  regression to classify the data points.
%
%  To do so, you introduce more features to use -- in particular, you add
%  polynomial features to our data matrix (similar to polynomial
%  regression).
%

% Add Polynomial Features

% Note that mapFeature also adds a column of ones for us, so the intercept
% term is handled
X = mapFeature(X(:,1), X(:,2));

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1
lambda = 1;

% Compute and display initial cost and gradient for regularized logistic
% regression
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);

fprintf('Cost at initial theta (zeros): %f\n', cost);

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============= Part 2: Regularization and Accuracies =============
%  Optional Exercise:
%  In this part, you will get to try different values of lambda and
%  see how regularization affects the decision boundary
%
%  Try the following values of lambda (0, 1, 10, 100).
%
%  How does the decision boundary change when you vary lambda? How does
%  the training set accuracy vary?
%

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;

% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);

% Plot Boundary
plotDecisionBoundary(theta, X, y);
hold on;
title(sprintf('lambda = %g', lambda))

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

legend('y = 1', 'y = 0', 'Decision boundary')
hold off;

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
In the experiment the value of the parameter lambda is also very important: different choices of lambda can lead to overfitting or underfitting.