  • (Original) Stanford Machine Learning (by Andrew NG) --- (week 1) Linear Regression

    Andrew NG's Machine Learning course is available at: https://www.coursera.org/course/ml

    Several new terms appear in the Linear Regression section and will come up frequently in later lessons:

    Cost Function, Linear Regression, Gradient Descent, Normal Equation, Feature Scaling, Mean Normalization

    Model Representation

    • m: number of training examples
    • x^(i): input (features) of the i-th training example
    • x_j^(i): value of feature j in the i-th training example
    • y^(i): “output” variable / “target” variable of the i-th training example
    • n: number of features
    • θ: parameters
    • Hypothesis: hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn
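
    With a column of ones prepended to X (so that x0 = 1 for every example), the hypothesis for all m examples can be computed in a single matrix product. A minimal Octave/MATLAB sketch with made-up numbers:

    % Toy data (made up): m = 3 examples, 1 feature, plus the x0 = 1 column
    X = [1 2104; 1 1416; 1 852];
    theta = [50; 0.1];        % parameters theta0 and theta1
    h = X * theta;            % h(i) = theta0 + theta1 * x1^(i), for every i at once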

    Cost Function

      IDEA: Choose θ so that hθ(x) is close to y for our training examples (x, y).

    A.Linear Regression with One Variable Cost Function

      Cost Function:    J(θ0, θ1) = 1/(2m) * Σ_{i=1..m} (hθ(x^(i)) − y^(i))^2

      Goal:    minimize J(θ0, θ1) over θ0, θ1

      Contour Plot: (contour plot of J(θ0, θ1): each ellipse is a level set of equal cost, with the minimum at the center.)

    B.Linear Regression with Multiple Variable Cost Function

      Cost Function:  J(θ) = 1/(2m) * Σ_{i=1..m} (hθ(x^(i)) − y^(i))^2, with θ = (θ0, θ1, …, θn)

      Goal:   minimize J(θ) over θ
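
    As a sanity check, J can be evaluated directly on a tiny data set. A sketch with made-up numbers, using the same variable names as the homework code at the end of this post:

    % Made-up data: 3 examples, 1 feature plus the x0 = 1 column
    X = [1 1; 1 2; 1 3];
    y = [1; 2; 3];
    m = length(y);
    theta = [0; 1];                        % fits this toy data exactly
    J = 1/(2*m) * sum((X*theta - y).^2);   % evaluates to 0 here, since h = y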


    Gradient Descent 

     Outline: start with some θ (say θ = 0) and keep changing θ to reduce J(θ), until we hopefully end up at a minimum.

     Gradient Descent Algorithm: repeat until convergence { θj := θj − (α/m) * Σ_{i=1..m} (hθ(x^(i)) − y^(i)) * x_j^(i) }, simultaneously updating θj for all j.
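
    The simultaneous update is the part that is easy to get wrong: every θj must be computed from the same old θ. The vectorized form below does this automatically (a sketch, assuming X, y, theta, alpha and m are defined as above):

    grad = (1/m) * X' * (X*theta - y);   % column vector of partial derivatives dJ/dtheta_j
    theta = theta - alpha * grad;        % all theta_j updated from the same old theta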

    The convergence of the iterations might look like the following:

    (This is a contour plot; the center is the minimum, and the blue arc shows a possible convergence path.)

    Learning Rate α:

          1) If α is too small, gradient descent can be slow to converge;

          2) If α is too large, J(θ) may not decrease on every iteration and gradient descent may fail to converge;

          3) For sufficiently small α, J(θ) should decrease on every iteration.

    Choosing the Learning Rate α: debug by plotting J(θ) against the number of iterations, trying candidate values such as 0.001, 0.003, 0.006, 0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1.0.
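
    One way to run this debugging loop, sketched in Octave/MATLAB. It assumes gradientDescentMulti also returns the per-iteration cost history J_history, as the course skeleton does; the candidate list and iteration count are arbitrary choices:

    alphas = [0.001 0.003 0.01 0.03 0.1 0.3];
    num_iters = 50;
    figure; hold on;
    for alpha = alphas
        theta0 = zeros(size(X,2), 1);
        [~, J_history] = gradientDescentMulti(X, y, theta0, alpha, num_iters);
        plot(1:num_iters, J_history);    % J should decrease; a rising curve means alpha is too large
    end
    xlabel('number of iterations'); ylabel('J(theta)');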

    “Batch” Gradient Descent: Each step of gradient descent uses all the training examples;

    “Stochastic” gradient descent: Each step of gradient descent uses only one training example.
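
    For contrast, a minimal sketch of the stochastic variant (not part of this week's homework): each update uses a single randomly chosen example, so iterations are cheap but the path to the minimum is noisy.

    % One epoch of stochastic updates (illustrative sketch only)
    for k = 1:m
        i = randi(m);                           % pick one training example at random
        err = X(i,:) * theta - y(i);            % scalar prediction error for that example
        theta = theta - alpha * err * X(i,:)';  % update from this one example only
    end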


    Normal Equation

    IDEA: Method to solve for θ analytically.

     Set the partial derivative ∂J(θ)/∂θj = 0 for every j, then solve for θ:  θ = (XᵀX)⁻¹Xᵀy

    Restriction: the Normal Equation does not work when XᵀX is non-invertible.

    PS: A matrix is invertible when it has full rank. For XᵀX this means the column vectors of X (the features) are linearly independent and the number of linearly independent row vectors (samples) exceeds the number of columns (the number of features n).
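
    In Octave/MATLAB the normal equation is a single line. Using pinv (the pseudo-inverse) instead of inv keeps the computation well behaved even when XᵀX is non-invertible:

    theta = pinv(X' * X) * X' * y;   % closed form: no iterations, no learning rate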


    Gradient Descent Algorithm VS. Normal Equation 

    Gradient Descent:

    •  Needs to choose α;
    •  Needs many iterations;
    •  Works well even when n is large (n > 1000 is appropriate).

    Normal Equation:

    •  No need to choose α;
    •  No need to iterate;
    •  Needs to compute (XᵀX)⁻¹;
    •  Slow if n is very large (n < 1000 is OK).

    Feature Scaling

    IDEA: Make sure features are on a similar scale.

    Benefit: fewer iterations are needed, which speeds up convergence.

    Example: if we want every feature roughly in the −1 ≤ xi ≤ 1 range, feature values falling in [−3, 3] or [−1/3, 1/3] are acceptable.

    Mean normalization:  xj := (xj − μj) / sj, where μj is the mean of feature j on the training set and sj is its range (max − min) or standard deviation.
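
    A sketch of how this is usually implemented; the course skeleton calls this file featureNormalize.m, and here the standard deviation is used as sj:

    function [X_norm, mu, sigma] = featureNormalize(X)
    % Scale every feature (column of X) to zero mean and unit standard deviation.
    mu = mean(X);                 % 1 x n vector of per-feature means
    sigma = std(X);               % 1 x n vector of per-feature standard deviations
    X_norm = (X - mu) ./ sigma;   % implicit expansion; use bsxfun/repmat on older versions
    end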


    HOMEWORK

    Now that the video lectures are done, it's time for the homework. Below is the core code for the Linear Regression assignment:

     1.computeCost.m/computeCostMulti.m

    J = 1/(2*m) * sum((X*theta - y).^2);   % vectorized cost J(theta) over all m examples

    2.gradientDescent.m/gradientDescentMulti.m

    h = X*theta - y;              % prediction errors for all m examples
    grad = (alpha/m) * (X' * h);  % scaled gradient of J(theta)
    theta = theta - grad;         % simultaneous update of every theta_j
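
    For context, a sketch of how these functions are driven from the assignment's main script (the file name, α and iteration count follow the course's ex1.m skeleton; treat them as assumptions):

    data = load('ex1data1.txt');              % column 1 = feature, column 2 = target
    X = [ones(size(data,1), 1), data(:,1)];   % prepend the x0 = 1 column
    y = data(:,2);
    theta = zeros(2, 1);
    alpha = 0.01; num_iters = 1500;
    theta = gradientDescent(X, y, theta, alpha, num_iters);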