https://www.bilibili.com/video/av94519857?p=5
https://www.bilibili.com/video/av94519857?p=6
https://www.bilibili.com/video/av94519857?p=7
Why does SGD converge faster than GD?
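One common reading of this question: in a single pass over the data, GD makes one update while SGD makes one update per example, so SGD usually gets further in the same number of passes. A minimal sketch of that comparison follows. The toy data (y = 2x), the learning rate, and the names `grad`, `w_gd`, `w_sgd` are all assumptions made for illustration, not taken from the videos.

```python
import random

# Toy data (an assumption for illustration): y = 2 * x, so the best w is 2.
xs = [0.1 * i for i in range(1, 11)]
ys = [2.0 * x for x in xs]
lr = 0.1  # same learning rate for both methods

def grad(w, x, y):
    # d/dw of the per-example loss 0.5 * (w * x - y) ** 2
    return (w * x - y) * x

# GD: one update per epoch, using the gradient averaged over all examples.
w_gd = 0.0
w_gd -= lr * sum(grad(w_gd, x, y) for x, y in zip(xs, ys)) / len(xs)

# SGD: one update per example, so ten updates in the same single epoch.
w_sgd = 0.0
order = list(range(len(xs)))
random.Random(0).shuffle(order)  # fixed seed so the run is repeatable
for i in order:
    w_sgd -= lr * grad(w_sgd, xs[i], ys[i])

# Distance of each estimate from the true value w = 2 after one epoch.
print(abs(w_gd - 2.0), abs(w_sgd - 2.0))
```

On this noiseless toy problem, SGD ends the epoch closer to w = 2. On real data each SGD step is noisier, so the gain is in progress per pass, not in the accuracy of any single step.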
Feature Scaling
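A minimal sketch of one common form of feature scaling, standardization: for each feature, subtract the mean and divide by the standard deviation, so that all features have a comparable range and a single learning rate suits every direction. The function name `standardize` and the sample rows are hypothetical, chosen only for this example.

```python
from statistics import fmean, pstdev

def standardize(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    # A constant column has std 0; divide by 1.0 instead so it maps to zeros.
    stds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Two features on very different scales (made-up values).
data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = standardize(data)
```

After scaling, both columns of `scaled` hold the same values, even though the raw features differed by a factor of 100.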
The math behind GD
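The standard justification for the GD update is a first-order Taylor approximation of the loss around the current parameters. The sketch below uses assumed notation (loss L, parameters θ, step t, learning rate η), which may differ from the notation in the videos.

```latex
% First-order Taylor approximation of the loss near \theta^{t}:
L(\theta) \approx L(\theta^{t}) + \nabla L(\theta^{t})^{\top} (\theta - \theta^{t})

% Within a small ball around \theta^{t}, the approximation is smallest when
% \theta - \theta^{t} points opposite to the gradient, which gives the update:
\theta^{t+1} = \theta^{t} - \eta \, \nabla L(\theta^{t})
```

The approximation only holds close to θ^t, so the update is guaranteed to lower the loss only when η is small enough.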
Limitations of GD
- stuck at a saddle point
- stuck at a local minimum
- very slow on a plateau
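All three limitations come from the same fact: GD only sees the gradient, and the gradient is zero at a saddle point or local minimum and near zero on a plateau. A minimal sketch of the saddle-point case, using the assumed test function f(x, y) = x^2 - y^2 and a deliberately unlucky starting point (both chosen for illustration):

```python
def grad(x, y):
    # Gradient of f(x, y) = x**2 - y**2, which has a saddle point at (0, 0).
    return 2.0 * x, -2.0 * y

x, y = 1.0, 0.0  # start exactly on the x-axis, where the y-gradient is zero
lr = 0.1
for _ in range(100):
    gx, gy = grad(x, y)
    x -= lr * gx
    y -= lr * gy

# The gradient norm at the final point shows why the updates have stalled.
gx, gy = grad(x, y)
grad_norm = (gx * gx + gy * gy) ** 0.5
print(x, y, grad_norm)
```

The iterate slides to (0, 0) and stays there, even though f decreases without bound along the y direction. A stopping rule based on a small gradient norm cannot tell this point apart from a minimum.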