  • Machine Learning -- Week 2: Multivariate Linear Regression, Gradient Descent Improvements, Feature Scaling, Mean Normalization, Polynomial Regression, the Normal Equation and the Design Matrix

    For a problem with multiple features (say there are n features), the hypothesis should be rewritten as

    \[ h_{\theta}(x) = \theta_{0} + \theta_{1}\cdot x_{1}+ \theta_{2}\cdot x_{2}+ \theta_{3}\cdot x_{3}+\dots+ \theta_{n}\cdot x_{n} \]

    where:

    \[ x=\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\\ \vdots \\ x_{n} \end{bmatrix}\in {\Bbb R}^n \;,\; \theta=\begin{bmatrix}\theta_{1}\\ \theta_{2}\\ \theta_{3}\\ \vdots \\ \theta_{n} \end{bmatrix}\in {\Bbb R}^n \]

    For convenience of notation, let \(x_{0}=1\); then

    \[ h_{\theta}(x) = \theta_{0}\cdot x_{0} + \theta_{1}\cdot x_{1}+ \theta_{2}\cdot x_{2}+ \theta_{3}\cdot x_{3}+\dots+ \theta_{n}\cdot x_{n} \]

    \[ x=\begin{bmatrix}x_{0} \\ x_{1}\\ x_{2}\\ x_{3}\\ \vdots \\ x_{n} \end{bmatrix}\in {\Bbb R}^{n+1}\;,\; \theta=\begin{bmatrix}\theta_{0} \\ \theta_{1}\\ \theta_{2}\\ \theta_{3}\\ \vdots \\ \theta_{n} \end{bmatrix}\in {\Bbb R}^{n+1} \]

    That is:

    \[ h_{\theta}(x) = \theta^{\rm T}x \]

    Multivariate linear regression: \(h_{\theta}(x) = \theta^{\rm T}x\)
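    As a quick check, the vectorized hypothesis in Octave (a minimal sketch, assuming the first column of X is the added \(x_0 = 1\) column):

        % X: m-by-(n+1) design matrix, theta: (n+1)-by-1 parameter vector
        h = X * theta;    % m-by-1 vector of predictions h_theta(x^(i)) for all examples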

    cost function:

    \[ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2} = \frac{1}{2m} \sum_{i=1}^{m} \left(\theta^{\rm T}x^{(i)}-y^{(i)}\right)^{2} = \frac{1}{2m} \sum_{i=1}^{m} \left(\sum_{j=0}^{n} \theta_{j}x_{j}^{(i)}-y^{(i)}\right)^{2} \]

    \(\therefore\) the loop body of gradient descent becomes \(\theta_{j} := \theta_{j} - \alpha\frac{\partial}{\partial \theta_{j}}J(\theta) \qquad (j = 0,1,2,\dots,n)\)

    \(\therefore\) the gradient descent algorithm (\(n \ge 1\); simultaneously update \(\theta_{j}\) for \(j=0,1,2,3,\dots,n\)):

    \[ \text{repeat until convergence: } \{ \\ \qquad \theta_{j} := \theta_{j} - \alpha\frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)} \qquad (j = 0,1,2,\dots,n) \\ \} \]

    [Implementation note: in one of the homework problems I used \(\sum\Delta\theta_{j}^{2} > C\) as the loop-continuation condition, where \(C\) must be a very small constant (e.g. 0.0000000001), otherwise the result is inaccurate; \(\alpha\) can be comparatively large, but not too large (e.g. 0.001).]
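    A minimal Octave sketch of the cost function and the batch update above (vectorized; the function names are only illustrative):

        function J = computeCost(X, y, theta)
          % J(theta) = 1/(2m) * sum (h_theta(x^(i)) - y^(i))^2, with h = X*theta
          m = length(y);
          J = sum((X * theta - y) .^ 2) / (2 * m);
        end

        function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
          % Simultaneous update: theta := theta - alpha/m * X' * (X*theta - y)
          m = length(y);
          J_history = zeros(num_iters, 1);
          for iter = 1:num_iters
            theta = theta - (alpha / m) * (X' * (X * theta - y));
            J_history(iter) = computeCost(X, y, theta);   % track J to check convergence
          end
        end

    The product \(X^{\rm T}(X\theta - y)\) computes all \(n+1\) partial derivatives at once, so no inner loop over \(j\) is needed.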

    Tips:

    Feature Scaling

    Divide each element of x (i.e. each feature) by that feature's maximum value, obtaining a fractional number; this way differences in magnitude between features will not degrade the performance of gradient descent.

    In other words, constrain the value of each feature of x to roughly \([-1,1]\).

    Features whose range falls between roughly \([-3,3]\) and \([-\frac{1}{3}, \frac{1}{3}]\) are acceptable; features whose ranges are much larger or much smaller need feature scaling.

    Mean Normalization

    Replace \(x_{i}\) with \(x_{i}-\mu_{i}\) to make features have approximately zero mean (but do not apply this to \(x_{0} = 1\): since every \(x_0=1\), its mean cannot be 0).

    Note: that is, normalize each feature's mean to 0, where \(\mu_{i}\) is the mean of \(x_i\).

    \(e.g.:\; x_1= \frac{size-1000}{2000},\quad x_2 = \frac{\#bedrooms-2}{5},\qquad s.t.\; -0.5\le x_1\le 0.5,\; -0.5\le x_2\le 0.5\)
    Expressed as a formula:

    \[ x_i := \frac{x_i-\mu_i}{s_i} \]

    where \(\mu_i\) is the average value and \(s_i\) is the range of the feature (= max(feature) - min(feature)) or the feature's standard deviation.
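    A minimal Octave sketch of mean normalization plus scaling (here \(s_i\) is taken to be the standard deviation, and X is assumed not to contain the \(x_0 = 1\) column yet):

        function [X_norm, mu, sigma] = featureNormalize(X)
          % Subtract each feature's mean and divide by its standard deviation.
          mu = mean(X);                 % 1-by-n vector of feature means
          sigma = std(X);               % 1-by-n vector of feature standard deviations
          X_norm = (X - mu) ./ sigma;   % broadcasting over the m rows
        end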

    [Ah, OK, at this point the course explains how to choose \(\alpha\), so I don't have to work it out myself.]

    Declare convergence if \(J(\theta)\) decreases by less than \(10^{-3}\) in one iteration. (# loop termination criterion)

    To choose \(\alpha\), try \(\dots,0.001,0.003,0.01,0.03,0.1,0.3,1,\dots\) (roughly \(x_{n+1} = 3\,x_n\)). (# how to choose \(\alpha\))

    Try to pick the largest possible value, or a value just slightly smaller than the largest reasonable value you found.
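    A small Octave sketch for comparing learning rates, reusing the gradientDescent sketch above (the variable names alphas and theta0 are just illustrative): run a few dozen iterations with each candidate \(\alpha\), plot \(J(\theta)\) against the iteration number, and pick the largest \(\alpha\) for which \(J(\theta)\) still decreases quickly.

        alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1];   % candidates, roughly x3 apart
        num_iters = 50;
        for k = 1:length(alphas)
          theta0 = zeros(size(X, 2), 1);
          [theta_k, J_history] = gradientDescent(X, y, theta0, alphas(k), num_iters);
          plot(1:num_iters, J_history); hold on;      % a too-large alpha shows up as J blowing up
        end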

    Combine features where possible, e.g. use area in place of length and width.

    Polynomial Regression

    Example:

    \(\begin{aligned}h_{\theta}(x) &= \theta_0 + \theta_1\cdot x_1+ \theta_2\cdot x_2+ \theta_3\cdot x_3\\&= \theta_0 + \theta_1\cdot (size)+ \theta_2\cdot (size)^2+ \theta_3\cdot (size)^3 \end{aligned}\)

    Since the different powers of size eventually differ enormously in magnitude, they need to be normalized.

    In fact the exponents do not have to keep increasing; for functions that only increase and never decrease, one can also choose:

    \(h_{\theta}(x) = \theta_0 + \theta_1\cdot (size)+ \theta_2\cdot \sqrt{(size)}\)

    Its feature-scaling process (given ①②③):
    ① the model is \(h_{\theta}(x) = \theta_0 + \theta_1\cdot (size)+ \theta_2\cdot \sqrt{(size)}\)

    ② size ranges from 1 to 1000 (feet\(^2\))

    ③ implement this by fitting a model \(h_{\theta}(x) = \theta_0 + \theta_1\cdot x_1+ \theta_2\cdot x_2\)

    \(\therefore\) \(x_1, x_2\) should satisfy \(x_1 = \frac{size}{1000}, \quad x_2=\frac{\sqrt{(size)}}{\sqrt{1000}}\)

    One important thing to keep in mind: if you choose your features this way, then feature scaling becomes very important.
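    A minimal Octave sketch of building and scaling the \(size\) / \(\sqrt{size}\) features from the example above (the variable size_ft2 holding the raw sizes is an assumption):

        x1 = size_ft2 / 1000;                      % size scaled by its maximum (range 1..1000)
        x2 = sqrt(size_ft2) / sqrt(1000);          % sqrt(size) scaled by its maximum
        X  = [ones(length(size_ft2), 1), x1, x2];  % design matrix for h = theta0 + theta1*x1 + theta2*x2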

    Normal Equation

    It solves directly for the optimal \(\theta\).

    Essentially: take the derivatives and solve for where they equal 0.

    The approach that comes to mind first is to set each partial derivative to zero and solve: \(\frac{\partial}{\partial \theta_j}f(\theta) = 0\)

    In practice it can be done like this:

    \(X = \begin{bmatrix}x_{10} & x_{11} & x_{12} & \cdots & x_{1n} \\ x_{20} & x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m0} & x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}\quad,\quad y = \begin{bmatrix} y_1\\ y_2 \\ \vdots \\ y_m \end{bmatrix}\)

    \(\large \theta = (X^TX)^{-1}X^Ty\)

    If \(x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ x_2^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix}\), then the design matrix is \(X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ (x^{(3)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}\)

    theta = pinv(X' * X) * X' * y    % Octave: X is the design matrix, y the vector of labels
    

    (No feature scaling is needed.)
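    Putting the normal equation together in Octave (a sketch; X_features and y are assumed to hold the raw training data):

        X = [ones(size(X_features, 1), 1), X_features];   % design matrix: prepend the x0 = 1 column
        theta = pinv(X' * X) * X' * y;                     % closed-form solution, no iteration, no feature scaling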

    Comparison with Gradient Descent:

    Gradient Descent: need to choose \(\alpha\); needs many iterations; cost \(O(kn^2)\); works well when \(n\) is large.
    Normal Equation: no need to choose \(\alpha\); no need to iterate; cost \(O(n^3)\) (must compute the inverse of \(X^TX\)); slow if \(n\) is very large.

    Which to choose:

    \(\lg(n)\ge 4:\) gradient descent
    \(\lg(n)\le 4:\) normal equation

    Computing the Normal Equation requires \(X^TX\) to be invertible, but what if it is non-invertible?

    In Octave, both pinv() and inv() can compute an inverse, but pinv() (the pseudo-inverse) still produces a usable \(\theta\) even when the matrix is non-invertible.

    If \(X^TX\) is non-invertible:

    • First check whether there are any redundant features, e.g. one feature measured in feet and another that is just the same feature converted to metres; if so, delete the redundant feature (see the small sketch after this list).
    • Check whether there are too many features (e.g. \(m \le n\)). If so, either delete some features (if you can bear to use fewer) or consider using regularization.
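    As a small illustration (my own example, not from the course), a redundant feature makes \(X^TX\) singular, yet pinv() still returns a usable least-squares solution:

        X = [ones(4,1), (1:4)', 0.3048*(1:4)'];   % 3rd column is just the 2nd column converted feet -> metres
        y = [2; 4; 6; 8];
        rank(X' * X)                      % = 2 < 3, so X'X is singular and inv() is not usable
        theta = pinv(X' * X) * X' * y     % the pseudo-inverse still gives a minimum-norm solution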