就我个人所知有太多的软件工程师尝试转行到数据科学家而盲目地使用机器学习框架来处理数据,例如,TensorFlow或者Apache Spark,但是对于这些框架背后的统计理论没有完全的理解。所以提起 statistical learning,这是机器学习的理论框架,是从统计学和泛函分析(functional analysis)的领域中发展出来的。
推荐的三本书:
- Intro to Statistical Learning (Hastie, Tibshirani, Witten, James)
- Doing Bayesian Data Analysis(Kruschke)
- Time Series Analysis and Applications (Shumway, Stoffer)
我在下面的这些内容上做了很多的练习:
Bayesian Analysis, Markov Chain Monte Carlo, Hierarchical Modeling, Supervised and Unsupervised Learning
推荐的课程:
Recently, I completed the Statistical Learning online course on Stanford Lagunita, which covers all the material in the Intro to Statistical Learning book I read in my Independent Study. Now being exposed to the content twice, I want to share the 10 statistical techniques from the book that I believe any data scientists should learn to be more effective in handling big datasets.