  • Notes: <Hands-on ML with Sklearn & TF> Chapter 1

    <Hands-on ML with Sklearn & TF>  Chapter 1

    1. What is ML
      1. A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
    2. What problems ML is good for
      1. problems for which existing solutions require a lot of hand-tuning or long lists of rules
      2. complex problems for which there is no good solution using a traditional approach
      3. fluctuating environments (an ML system can adapt to new data)
      4. getting insights about complex problems and large amounts of data
    3. Types of ML systems
      1. whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, reinforcement learning)
      2. whether or not they can learn incrementally on the fly (online vs. batch learning)
      3. whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model (instance-based vs. model-based learning)
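    4. Supervised/unsupervised learning
      1. supervised: the training data includes the desired solutions, called labels
        • typical tasks: classification, regression; algorithms: k-Nearest Neighbors, Linear Regression, Logistic Regression, SVMs, Decision Trees and Random Forests, Neural networks
      2. unsupervised: the training data is unlabeled
        • Clustering: k-Means, Hierarchical Cluster Analysis (HCA), Expectation Maximization
        • Visualization and dimensionality reduction: PCA, Kernel PCA, Locally-Linear Embedding (LLE), t-SNE
        • Association rule learning: Apriori, Eclat
      3. semisupervised: lots of unlabeled data plus a little labeled data
        • most algorithms combine an unsupervised step with a supervised one (unsupervised --> supervised)
      4. reinforcement: the learning system, called an agent, can
        1. observe the environment
        2. select and perform actions
        3. get rewards (or penalties, i.e. negative rewards) in return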
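Clustering is the first unsupervised task listed above. As a sketch of what a clustering algorithm does with unlabeled data, here is a minimal from-scratch k-means (Lloyd's algorithm) in NumPy; the function name `k_means` and the two-blob dataset are hypothetical, and in practice you would use Scikit-Learn's `sklearn.cluster.KMeans` instead.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Minimal k-means (Lloyd's algorithm): a from-scratch sketch, not sklearn's KMeans."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct training points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its old centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabeled data: two well-separated blobs
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=(0, 0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(5, 5), scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])   # no labels are given to the algorithm
labels, centroids = k_means(X, k=2)
print(np.round(centroids, 1))
```

The algorithm never sees which blob a point came from; it discovers the two groups from the geometry of the data alone.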
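The observe / act / get-reward loop of reinforcement learning can be sketched with the simplest possible setting, a multi-armed bandit (a swapped-in toy problem with no environment state). The three "arms", their hidden mean rewards and the epsilon-greedy strategy are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: 3 slot machines ("arms") with hidden mean rewards
TRUE_MEANS = np.array([0.2, 0.5, 0.8])

def step(action):
    """The environment returns a noisy reward for the chosen action."""
    return TRUE_MEANS[action] + rng.normal(0, 0.1)

n_actions = len(TRUE_MEANS)
q_estimates = np.zeros(n_actions)   # agent's estimated value of each action
counts = np.zeros(n_actions)        # how often each action has been tried
epsilon = 0.1                       # exploration rate (a hyperparameter)

for t in range(2000):
    # 1. Observe: a bandit has no state, so the agent only uses its own estimates
    # 2. Select and perform an action: explore at random with probability epsilon,
    #    otherwise exploit the action that currently looks best
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(q_estimates))
    reward = step(action)
    # 3. Get the reward and update the running average for that action
    counts[action] += 1
    q_estimates[action] += (reward - q_estimates[action]) / counts[action]

best_action = int(np.argmax(q_estimates))
print("estimated values:", np.round(q_estimates, 2), "-> best action:", best_action)
```

Full reinforcement learning adds states and delayed rewards, but the agent loop (act, receive reward, update its strategy) has the same shape.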
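    5. Batch/online learning
      1. batch: trained offline on all the available data; the system cannot learn incrementally, so to learn about new data it must be retrained from scratch on the full dataset (old and new data)
      2. online: the system learns incrementally from data fed sequentially, one instance or one mini-batch at a time; a big challenge is that bad data fed to the system will gradually degrade its performance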
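A minimal sketch of online learning, assuming a linear model trained with plain stochastic gradient descent: each mini-batch from a simulated stream is used for one update and then discarded, so the full dataset never has to be kept or revisited. The data generator and its true parameters are hypothetical; Scikit-Learn's `SGDRegressor.partial_fit` is the library counterpart.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_W, TRUE_B = 2.0, -1.0   # hypothetical ground truth for the demo

def data_stream(n_batches, batch_size=32):
    """Simulate data arriving over time in small mini-batches."""
    for _ in range(n_batches):
        x = rng.uniform(-1, 1, batch_size)
        y = TRUE_W * x + TRUE_B + rng.normal(0, 0.1, batch_size)
        yield x, y

w, b = 0.0, 0.0
learning_rate = 0.1   # how fast the model adapts to new data (a hyperparameter)

for x, y in data_stream(n_batches=500):
    # One gradient step on the MSE of the new mini-batch only;
    # earlier batches are never revisited
    error = (w * x + b) - y
    w -= learning_rate * 2 * np.mean(error * x)
    b -= learning_rate * 2 * np.mean(error)

print(f"learned w = {w:.2f}, b = {b:.2f}")
```

The learning rate is the knob behind the "bad data" risk: a high rate adapts quickly to new data but also forgets old data quickly and reacts strongly to noisy or faulty batches.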
    6. Instance-based/model-based learning
      1. instance-based: the system learns the examples by heart, then generalizes to new cases using a similarity measure
      2. model-based: study the data; select a model; train it on the training data; apply the model to make predictions on new cases
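The contrast between the two approaches can be sketched on a tiny hypothetical dataset: k-nearest-neighbors regression (instance-based) keeps the training examples and averages the targets of the most similar ones, while a linear model (model-based) learns two parameters and then predicts from those alone. The names `knn_predict` and `linear_predict` are illustrative; Scikit-Learn provides `sklearn.neighbors.KNeighborsRegressor` and `sklearn.linear_model.LinearRegression` for the real thing.

```python
import numpy as np

# Hypothetical 1-D training set (feature x, target y)
X_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_train = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])

def knn_predict(x_new, k=3):
    """Instance-based: average the targets of the k most similar training
    examples (similarity = absolute distance on the single feature)."""
    distances = np.abs(X_train - x_new)
    nearest = np.argsort(distances)[:k]
    return float(y_train[nearest].mean())

# Model-based: fit the parameters of y = t0 + t1 * x once; afterwards
# predictions need only t0 and t1, not the training data
t1, t0 = np.polyfit(X_train, y_train, deg=1)

def linear_predict(x_new):
    return float(t0 + t1 * x_new)

print(knn_predict(3.4), linear_predict(3.4))
```

For x = 3.4 the three nearest examples are x = 3, 4 and 2, so k-NN returns the mean of 6.2, 7.8 and 3.9.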
    7. Main challenges
      1. insufficient quantity of training data
      2. nonrepresentative training data
      3. poor-quality data
      4. irrelevant features: feature selection; feature extraction; creating new features by gathering new data
      5. overfitting: constrain the model with regularization; the amount of regularization is controlled by a hyperparameter
      6. underfitting: select a more powerful model; feed better features; reduce the constraints on the model
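To show regularization as a hyperparameter, here is a sketch using ridge regression (L2 regularization, one common choice among several) in closed form, w = (XᵀX + αI)⁻¹Xᵀy, on a hypothetical small noisy dataset with a very flexible degree-9 polynomial model. For simplicity this sketch penalizes every coefficient, including the bias term.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical small, noisy dataset: easy to overfit with a degree-9 polynomial
x = np.linspace(0, 1, 12)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

def poly_features(x, degree=9):
    """Columns x^0 ... x^degree (a very flexible model)."""
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, alpha):
    """Ridge regression in closed form: w = (X^T X + alpha * I)^-1 X^T y.
    alpha is the regularization hyperparameter (it penalizes all weights here)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

X = poly_features(x)
results = []
for alpha in (1e-8, 1e-3, 1.0):
    w = ridge_fit(X, y, alpha)
    train_mse = float(np.mean((X @ w - y) ** 2))
    results.append((alpha, float(np.linalg.norm(w)), train_mse))
    print(f"alpha={alpha:g}  ||w||={np.linalg.norm(w):12.2f}  train MSE={train_mse:.4f}")
```

Raising α shrinks the weights and makes the model fit the training data less closely; too little regularization lets it chase the noise (overfitting), too much constrains it into underfitting, which is why α is tuned on a validation set rather than on the training set.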
    8. Testing and validating
      1. a common split: 80% of the data for training, 20% for testing
      2. validation set: a model and hyperparameters tuned to perform best on the test set are unlikely to perform as well on new data, so
        1. train multiple models with various hyperparameters on the training set
        2. select the model and hyperparameters that perform best on the validation set, then run a single final test against the test set to estimate the generalization error
      3. cross-validation: the training set is split into complementary subsets; each model is trained against a different combination of these subsets and validated against the remaining parts
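The holdout workflow described above (80/20 train/test split, a validation set carved out of the training set, a single final test) might look like the following NumPy sketch. The data is hypothetical, the "hyperparameter" is the degree of a polynomial model, and `train_test_split_indices` is an illustrative helper; Scikit-Learn's `sklearn.model_selection.train_test_split` does the splitting part.

```python
import numpy as np

def train_test_split_indices(n_samples, test_ratio=0.2, seed=42):
    """Shuffle indices and hold out test_ratio of them (e.g. 20%)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_test = int(n_samples * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]   # train, test

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = 0.5 * x**3 - x + rng.normal(0, 1, x.size)   # hypothetical data

train_idx, test_idx = train_test_split_indices(len(x))   # 80% / 20%
# Carve a validation set out of the training set; the test set stays untouched
fit_pos, val_pos = train_test_split_indices(len(train_idx), seed=7)
fit_idx, val_idx = train_idx[fit_pos], train_idx[val_pos]

def mse(coeffs, idx):
    return np.mean((np.polyval(coeffs, x[idx]) - y[idx]) ** 2)

# Hyperparameter = polynomial degree: keep the one with the lowest validation error
val_errors = {d: mse(np.polyfit(x[fit_idx], y[fit_idx], d), val_idx) for d in range(1, 10)}
best_degree = min(val_errors, key=val_errors.get)

# Retrain on the whole training set, then run one final test on the test set
final = np.polyfit(x[train_idx], y[train_idx], best_degree)
print("best degree:", best_degree,
      "estimated generalization error:", round(float(mse(final, test_idx)), 3))
```

Because the test set played no part in choosing the degree, its error remains an honest estimate of performance on new data.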
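Cross-validation can be sketched as k complementary folds, each used exactly once for validation while the model trains on the others, with the k validation scores averaged. This is a from-scratch illustration on hypothetical data with assumed helper names (`k_fold_indices`, `cross_val_mse`); Scikit-Learn's `KFold` and `cross_val_score` provide the library version.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Split shuffled indices into k complementary folds and yield
    (train, validation) pairs; each fold is the validation part exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def cross_val_mse(x, y, degree, k=5):
    """Average validation MSE of a polynomial model of the given degree."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(x), k):
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        scores.append(np.mean((np.polyval(coeffs, x[val_idx]) - y[val_idx]) ** 2))
    return float(np.mean(scores))

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 100)
y = 0.5 * x**3 - x + rng.normal(0, 1, x.size)   # hypothetical data
for degree in (1, 3, 9):
    print(degree, round(cross_val_mse(x, y, degree), 3))
```

Compared with a single validation set, every training example is used for validation once, which gives a less noisy score at the cost of training the model k times.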

       Example 1-1:

    import matplotlib
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import sklearn.linear_model
    
    #load the data
    oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv",thousands=',')
    gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv",thousands=',',delimiter='\t',encoding='latin1',na_values='n/a')
    
    #prepare the data
    def prepare_country_stats(oecd_bli, gdp_per_capita):
        #get the pandas dataframe of GDP per capita and Life satisfaction
        oecd_bli = oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
        oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
        gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
        gdp_per_capita.set_index("Country", inplace=True)
        full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita, left_index=True, right_index=True)
        return full_country_stats[["GDP per capita", 'Life satisfaction']]
        
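    country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)

    # As in the book's version of this example: sort by GDP per capita and hold out
    # 7 countries (the book adds them back later to illustrate nonrepresentative data)
    country_stats = country_stats.sort_values(by="GDP per capita")
    remove_indices = [0, 1, 6, 8, 33, 34, 35]
    keep_indices = [i for i in range(len(country_stats)) if i not in remove_indices]
    country_stats = country_stats.iloc[keep_indices]
    country_stats.to_csv('country_stats.csv', encoding='utf-8')

    X = np.c_[country_stats["GDP per capita"]]
    y = np.c_[country_stats["Life satisfaction"]]

    # Visualize the data
    country_stats.plot(kind='scatter', x='GDP per capita', y='Life satisfaction')

    # Select a linear model
    lin_reg_model = sklearn.linear_model.LinearRegression()

    # Train the model
    lin_reg_model.fit(X, y)

    # Plot the fitted line (X_line avoids overwriting the training features X)
    t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
    X_line = np.linspace(0, 110000, 1000)
    plt.plot(X_line, t0 + t1 * X_line, "k")
    plt.show()

    # Make a prediction for Cyprus (GDP per capita: 22,587)
    X_new = [[22587]]
    print(lin_reg_model.predict(X_new))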

          

    The end-of-chapter exercises are quite good.

  • Original post: https://www.cnblogs.com/yaoz/p/6858417.html