https://www.bilibili.com/video/av94519857?p=5
https://www.bilibili.com/video/av94519857?p=6
https://www.bilibili.com/video/av94519857?p=7

Why does SGD (stochastic gradient descent) converge faster than GD (gradient descent)?
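One way to see this is to count updates per pass over the data. Full-batch GD averages the gradient over every example and then makes one update. SGD updates after each example, so in the same amount of gradient computation it takes many steps. Below is a minimal sketch on a hypothetical, noise-free toy problem (fit y = w * x, true w = 2). The data, learning rate, and function names are assumptions chosen for illustration.

```python
# Hypothetical toy problem: fit y = w * x to noise-free data whose true weight is w = 2.
XS = [1.0, 2.0, 3.0, 4.0, 5.0]
YS = [2.0 * x for x in XS]


def loss(w):
    """Mean squared error over the whole dataset."""
    return sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def gd_epoch(w, lr):
    """Full-batch GD: average the gradient over all examples, then update once."""
    grad = sum(2 * (w * x - y) * x for x, y in zip(XS, YS)) / len(XS)
    return w - lr * grad, 1  # (new weight, number of updates made)


def sgd_epoch(w, lr):
    """SGD: update after every single example (fixed order here, no shuffling)."""
    updates = 0
    for x, y in zip(XS, YS):
        grad = 2 * (w * x - y) * x  # gradient of this one example's squared error
        w -= lr * grad
        updates += 1
    return w, updates


lr = 0.01
w_gd, n_gd = gd_epoch(0.0, lr)
w_sgd, n_sgd = sgd_epoch(0.0, lr)
print(f"GD : {n_gd} update,  w = {w_gd:.4f}, loss = {loss(w_gd):.4f}")
print(f"SGD: {n_sgd} updates, w = {w_sgd:.4f}, loss = {loss(w_sgd):.4f}")
```

With these settings SGD ends much closer to w = 2 than GD after one epoch. Real SGD gradients are noisy and the examples are usually shuffled, so treat this as a per-epoch intuition rather than a guarantee.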

Feature Scaling
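A common form of feature scaling is z-score standardization: for each feature, subtract its mean and divide by its standard deviation, so every dimension ends up with a comparable range. When one feature spans a much larger range than another, the loss contours become long, thin ellipses and a single learning rate suits the dimensions poorly; after scaling the contours are closer to circles, so the negative gradient points more directly at the minimum. A minimal pure-Python sketch, assuming the input is a list of feature columns and using the population standard deviation (the function name is hypothetical):

```python
import statistics


def standardize(columns):
    """Z-score each feature column: subtract its mean, divide by its std.

    `columns` is a list of feature columns (one list of values per feature).
    Assumes every column has a nonzero standard deviation.
    """
    scaled = []
    for col in columns:
        mu = statistics.mean(col)
        sigma = statistics.pstdev(col)  # population standard deviation
        scaled.append([(v - mu) / sigma for v in col])
    return scaled


# Two features whose raw ranges differ by a factor of 100.
features = [[1.0, 2.0, 3.0], [100.0, 200.0, 300.0]]
scaled = standardize(features)
print(scaled)
```

After scaling, both columns have mean 0 and standard deviation 1, so here they become identical. In practice the mean and standard deviation are computed on the training data and reused to scale test data.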


The math behind GD
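The usual derivation of the GD update comes from a first-order Taylor expansion. It is sketched here for two parameters θ1, θ2 around the current point (a, b); the symbols s, u, v, d and η are notation chosen for this sketch.

```latex
% First-order Taylor expansion of L around the current point (a, b):
L(\theta_1, \theta_2) \approx s + u\,(\theta_1 - a) + v\,(\theta_2 - b),
\qquad s = L(a, b),\quad
u = \frac{\partial L(a, b)}{\partial \theta_1},\quad
v = \frac{\partial L(a, b)}{\partial \theta_2}

% Minimize this approximation inside a small circle of radius d:
(\theta_1 - a)^2 + (\theta_2 - b)^2 \le d^2

% s is constant, so minimize u\,\Delta\theta_1 + v\,\Delta\theta_2,
% where \Delta\theta_1 = \theta_1 - a and \Delta\theta_2 = \theta_2 - b.
% This inner product is smallest when (\Delta\theta_1, \Delta\theta_2) points opposite
% to (u, v) and reaches the boundary of the circle, i.e. with \eta = d / \sqrt{u^2 + v^2}:
\begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}
= \begin{bmatrix} a \\ b \end{bmatrix}
- \eta \begin{bmatrix} u \\ v \end{bmatrix}
```

The linear approximation is only accurate when d is small, and η grows with d, which is why the learning rate has to be small enough; with a large η the update can overshoot and increase the loss.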

Limitations of GD
- gets stuck at saddle points (∂L/∂w = 0)
- gets stuck at local minima (∂L/∂w = 0)
- is very slow on plateaus (∂L/∂w ≈ 0)
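All three limitations share one cause: GD only looks at the local gradient, so it stalls wherever the gradient is (nearly) zero, whether or not that point is a good minimum. A minimal sketch of two of the cases, using hypothetical toy losses chosen only for illustration: f(x, y) = x² − y² has a saddle point at the origin, and g(x) = −exp(−x²) is nearly flat far from its minimum at x = 0.

```python
import math


def saddle_grad(x, y):
    """Gradient of the toy loss f(x, y) = x**2 - y**2 (saddle point at (0, 0))."""
    return 2 * x, -2 * y


def plateau_grad(x):
    """Gradient of the toy loss g(x) = -exp(-x**2): minimum at 0, nearly flat for |x| >> 1."""
    return 2 * x * math.exp(-x * x)


def gd_2d(x, y, lr=0.1, steps=200):
    """Plain gradient descent on f."""
    for _ in range(steps):
        gx, gy = saddle_grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


def gd_1d(x, lr=0.1, steps=1000):
    """Plain gradient descent on g."""
    for _ in range(steps):
        x -= lr * plateau_grad(x)
    return x


# Saddle point: starting on y = 0, the y-gradient is always 0, so GD converges to (0, 0)
# even though f keeps decreasing along the y direction (e.g. f(0, 1) = -1).
sx, sy = gd_2d(1.0, 0.0)

# Plateau: at x = 4 the gradient is below 1e-6, so 1000 steps barely move x toward 0.
px = gd_1d(4.0)

print(f"saddle:  stopped at ({sx:.2e}, {sy:.2e}), f = {sx**2 - sy**2:.2e}")
print(f"plateau: x = {px:.6f} after 1000 steps (the minimum is at x = 0)")
```

A local minimum stalls GD for the same reason as the saddle point: the update is proportional to the gradient, which is zero there.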
