  • Machine learning with Python (5): model scoring

    1. Plotting the distribution of the model's residuals

    #!/usr/bin/python
    
    import pandas as pd
    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt
    from sklearn.linear_model import Ridge, RidgeCV, ElasticNet, LassoCV, LassoLarsCV
    from sklearn.model_selection import cross_val_score
    
    
    
    train = pd.read_csv('train.csv', header=0)        # Load the train file into a dataframe
    df = pd.get_dummies(train.iloc[:,1:-1])
    df = df.fillna(df.mean())
    
    X_train = df
    y = train.price
    
    
    
    def rmse_cv(model):
        # cross_val_score returns the negated MSE, so flip the sign before taking the square root
        rmse = np.sqrt(-cross_val_score(model, X_train, y, scoring="neg_mean_squared_error", cv=3))
        return rmse
    
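    # Fit LassoCV, which picks alpha by cross-validation
    # (the default was cv=3 in older scikit-learn releases; it is 5-fold since version 0.22)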
    model_lasso = LassoCV(alphas = [0.1,1,0.001, 0.0005]).fit(X_train, y)
    
    
    
    matplotlib.rcParams['figure.figsize'] = (6.0, 6.0)
    
    # Put the model's predictions and the true values side by side as two DataFrame columns
    preds = pd.DataFrame({"preds":model_lasso.predict(X_train), "true":y})
    
    
    # Add the difference between the true and predicted values as a new column
    preds["residuals"] = preds["true"] - preds["preds"]
    
    print(preds)
    
    # Scatter plot with the predictions on the x-axis and the residuals on the y-axis
    preds.plot(x = "preds", y = "residuals",kind = "scatter")
    plt.show()
    
        

    Note: this example is for illustration only; the prediction and the plot use just a few rows of data.
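The listing above defines rmse_cv but never calls it. The following is a minimal sketch of how such a helper could be used to score a model. It assumes a synthetic dataset from make_regression as a stand-in for train.csv, and the names X_demo, y_demo, and fold_rmse are made up for this illustration; the helper also takes the data as arguments instead of reading global variables.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for train.csv (an assumption; the original data is not available)
X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)


def rmse_cv(model, X, y, cv=3):
    """Return the RMSE of each fold from k-fold cross-validation."""
    # scikit-learn reports MSE negated so that larger is better; flip the sign first
    neg_mse = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=cv)
    return np.sqrt(-neg_mse)


model = LassoCV(alphas=[1, 0.1, 0.001, 0.0005], max_iter=10000)
fold_rmse = rmse_cv(model, X_demo, y_demo)
print("RMSE per fold:", fold_rmse)
print("mean RMSE:", fold_rmse.mean())
```

The mean of the per-fold values gives a single score that can be compared across models; a lower RMSE means a better fit on held-out data.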

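    Generally, for a good model the residuals should be tightly clustered and fluctuate only slightly around 0, which indicates that the errors are small.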
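The same check can be made numerically as well as visually. This is a rough sketch, assuming a hypothetical helper residual_summary and a few hand-made values that are not taken from train.csv: a mean close to 0 with a small spread matches the pattern described above.

```python
import numpy as np
import pandas as pd


def residual_summary(y_true, y_pred):
    """Summarise residuals (true minus predicted) with a few descriptive statistics."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return pd.Series({
        "mean": residuals.mean(),                  # should be close to 0
        "std": residuals.std(),                    # spread around the mean
        "max_abs": np.abs(residuals).max(),        # worst single error
        "rmse": np.sqrt(np.mean(residuals ** 2)),  # overall error size
    })


# Tiny hand-made example (hypothetical values for illustration)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
summary = residual_summary(y_true, y_pred)
print(summary)
```

With the DataFrame from the listing above, the call would be residual_summary(preds["true"], preds["preds"]).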

  • Original article: https://www.cnblogs.com/gczr/p/6836554.html