zoukankan      html  css  js  c++  java
  • 机器学习--线性回归的实践

    1.鉴于之前提到的房价的问题,使用线性回归该如何解决呢?

    首先我们假设有如下的数据方便计算机进行学习

    面积卧室价格
    2140 3 400
    1600 3 330
    2400 3 369
    1416 2 232
    ... ... ...

    根据之前的演算过程(使房价与面积和卧室数目线性相关):

    hθ(x)=θ0 +θ1x1 +θ2x2 

    θ为计算时的权重,x1为房间面积,x2为我是数目。
    为了降低计算的模糊程度,将hθ(x)变成h(x)来进行计算,这时计算公式为:

    n为学习次数。


    2. 有了相关数据之后就要开始训练算法了(ex1a_linreg.m

     1 <span style="font-size:24px;">%
     2 %This exercise uses a data from the UCI repository:
     3 % Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository
     4 % http://archive.ics.uci.edu/ml
     5 % Irvine, CA: University of California, School of Information and Computer Science.
     6 %
     7 %Data created by:
     8 % Harrison, D. and Rubinfeld, D.L.
     9 % ''Hedonic prices and the demand for clean air''
    10 % J. Environ. Economics & Management, vol.5, 81-102, 1978.
    11 %
    12 addpath ../common
    13 addpath ../common/minFunc_2012/minFunc
    14 addpath ../common/minFunc_2012/minFunc/compiled
    15 
    16 % Load housing data from file.
    17 data = load('housing.data');  % housing data  506x14 
    18 data=data'; % put examples in columns  14x506  一般这里将每一个样本放在每一列
    19 
    20 % Include a row of 1s as an additional intercept feature.
    21 data = [ ones(1,size(data,2)); data ];  % 15x506    增加intercept term 
    22 
    23 % Shuffle examples. 乱序 目的在于之后能够随机选取training set和test sets
    24 data = data(:, randperm(size(data,2))); %randperm(n)用于随机生成1到n的排列
    25 
    26 % Split into train and test sets
    27 % The last row of 'data' is the median home price.
    28 train.X = data(1:end-1,1:400);   %选择前400个样本来训练,后面的样本来做测试
    29 train.y = data(end,1:400);
    30 
    31 test.X = data(1:end-1,401:end);
    32 test.y = data(end,401:end);
    33 
    34 m=size(train.X,2);  %训练样本数量
    35 n=size(train.X,1);  %每个样本的变量个数
    36 
    37 % Initialize the coefficient vector theta to random values.
    38 theta = rand(n,1); %随机生成初始theta 每个值在(0,1)之间
    39 
    40 % Run the minFunc optimizer with linear_regression.m as the objective.
    41 %
    42 % TODO:  Implement the linear regression objective and gradient computations
    43 % in linear_regression.m
    44 %
    45 tic; %Start a stopwatch timer. 开始计时
    46 options = struct('MaxIter', 200);
    47 theta = minFunc(@linear_regression, theta, options, train.X, train.y);
    48 fprintf('Optimization took %f seconds.
    ', toc); %toc Read the stopwatch timer
    49 
    50 % Run minFunc with linear_regression_vec.m as the objective.
    51 %
    52 % TODO:  Implement linear regression in linear_regression_vec.m
    53 % using MATLAB's vectorization features to speed up your code.
    54 % Compare the running time for your linear_regression.m and
    55 % linear_regression_vec.m implementations.
    56 %
    57 % Uncomment the lines below to run your vectorized code.
    58 %Re-initialize parameters
    59 %theta = rand(n,1);
    60 %tic;
    61 %theta = minFunc(@linear_regression_vec, theta, options, train.X, train.y);
    62 %fprintf('Optimization took %f seconds.
    ', toc);
    63 
    64 % Plot predicted prices and actual prices from training set.
    65 actual_prices = train.y;
    66 predicted_prices = theta'*train.X;
    67 
    68 % Print out root-mean-squared (RMS) training error.平方根误差
    69 train_rms=sqrt(mean((predicted_prices - actual_prices).^2));
    70 fprintf('RMS training error: %f
    ', train_rms);
    71 
    72 % Print out test RMS error
    73 actual_prices = test.y;
    74 predicted_prices = theta'*test.X;
    75 test_rms=sqrt(mean((predicted_prices - actual_prices).^2));
    76 fprintf('RMS testing error: %f
    ', test_rms);
    77 
    78 
    79 % Plot predictions on test data.
    80 plot_prices=true;
    81 if (plot_prices)
    82   [actual_prices,I] = sort(actual_prices); %从小到大排序价格
    83   predicted_prices=predicted_prices(I);
    84   plot(actual_prices, 'rx');
    85   hold on;
    86   plot(predicted_prices,'bx');
    87   legend('Actual Price', 'Predicted Price');
    88   xlabel('House #');
    89   ylabel('House price ($1000s)');
    90 end</span>

    3.为了保证运算的准确性我们需要对cost function函数进行运行,得到最小成本函数(本次练习时我添加的代码)

     1     % Step 1 : 计算成本函数
     2     for i = 1:m  
     3         f = f + (theta' * X(:,i) - y(i))^2;  
     4     end  
     5     f = 1/2*f;  
     6 
     7     % Step 2:计算gradient并储存在g中   
     8 
     9     for j = 1:n  
    10         for i = 1:m  
    11             g(j) = g(j) + X(j,i)*(theta' * X(:,i) - y(i));  
    12     end  
  • 相关阅读:
    SQL Server -使用表触发器记录表插入,更新,删除行数
    利用DataSet部分功能实现网站登录
    SQL Server排序的时候使null值排在最后
    大数据操作:删除和去重
    C#匿名类型序列化、反序列化
    Js调用asp.net后台代码
    C# Excel
    ajax的介绍
    MySQL数据库的知识总结
    ASP.NET MVC 入门系列教程
  • 原文地址:https://www.cnblogs.com/dinghing154/p/5566239.html
Copyright © 2011-2022 走看看