  • An Improvement on Linear Regression: Ridge Regression

    Linear regression with L2 regularization: ridge regression

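    Ridge regression is still a form of linear regression. The difference is that an L2 regularization term is added when the regression equation is fitted, which constrains the weights and helps mitigate overfitting.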
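To make the "added regularization term" concrete, here is a minimal sketch that solves the ridge problem in closed form and compares it with sklearn's Ridge. The synthetic data, the variable names, and the choice to center the data (so the intercept is left unpenalized) are assumptions made for illustration, not part of the original post.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption: any small regression problem would do)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

alpha = 1.0

# Center the data so the intercept is handled separately and not penalized
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean

# Closed-form ridge solution: w = (Xc^T Xc + alpha * I)^-1 Xc^T yc
w_closed = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b_closed = y_mean - X_mean @ w_closed

model = Ridge(alpha=alpha).fit(X, y)
print("closed form:", w_closed, b_closed)
print("sklearn    :", model.coef_, model.intercept_)
```

With alpha set to 0 the same formula reduces to ordinary least squares; the alpha * I term is what shrinks the weights.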

    API

    sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, solver="auto", normalize=False)

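    • Linear regression with L2 regularization
    • alpha: regularization strength, also written λ
      • Typical values of λ: 0~1 or 1~10
    • solver: with "auto", the optimization method is chosen automatically based on the data
      • sag: if both the dataset and the number of features are large, choose this stochastic average gradient (SAG) optimizer
    • normalize: whether the data is normalized before fitting (this parameter was deprecated in scikit-learn 1.0 and removed in 1.2)
      • normalize=False: standardize the data with preprocessing.StandardScaler before calling fit
    • Ridge.coef_: regression weights
    • Ridge.intercept_: regression intercept (bias)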
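The parameters and attributes listed above can be tried out with a short sketch. It assumes synthetic two-feature data on very different scales and uses a Pipeline to apply StandardScaler before Ridge; the pipeline is one convenient option here, not something the original post prescribes.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Two features on very different scales (illustrative data)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
y = 3.0 * X[:, 0] + 0.002 * X[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

# Standardize first, then fit ridge regression
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0, solver="auto"))
pipe.fit(X, y)

# make_pipeline names each step after its lowercased class name
ridge = pipe.named_steps["ridge"]
print("coef_:", ridge.coef_)
print("intercept_:", ridge.intercept_)
```

Because the standardized features have zero mean, the fitted intercept comes out equal to the mean of y.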

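    Ridge corresponds to SGDRegressor(penalty='l2', loss="squared_loss"), in that both minimize an L2-penalized squared-error objective (since scikit-learn 1.0 this loss is named "squared_error"). SGDRegressor, however, implements only plain stochastic gradient descent, so Ridge is recommended (it implements SAG).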
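A rough way to see this correspondence is to fit both estimators on the same data. The sketch below assumes synthetic standardized data and leaves the SGDRegressor loss at its default (squared error), because the name of that option differs between scikit-learn versions. It also assumes the two alpha values need rescaling: SGDRegressor averages the loss over samples while Ridge sums it, so the Ridge alpha is taken as the SGD alpha times the number of samples.

```python
import numpy as np
from sklearn.linear_model import Ridge, SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(500, 3)))
y = X @ np.array([3.0, -2.0, 1.0]) + 4.0 + rng.normal(scale=0.1, size=500)

sgd_alpha = 1e-4
# penalty="l2" with the default squared-error loss
sgd = SGDRegressor(penalty="l2", alpha=sgd_alpha, max_iter=1000, random_state=0)
sgd.fit(X, y)

# Rescale alpha to account for averaged (SGD) vs. summed (Ridge) loss
ridge = Ridge(alpha=sgd_alpha * len(X)).fit(X, y)

print("SGDRegressor coef_:", sgd.coef_, "intercept_:", sgd.intercept_)
print("Ridge        coef_:", ridge.coef_, "intercept_:", ridge.intercept_)
```

The SGD result is only approximate and depends on the learning-rate schedule and stopping tolerance, whereas Ridge solves the problem directly.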

    sklearn.linear_model.RidgeCV(_BaseRidgeCV, RegressorMixin)

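    • Linear regression with L2 regularization that selects the regularization strength by cross-validation
    • coef_: regression coefficients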
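As a minimal sketch of how RidgeCV might be used, the example below passes a handful of candidate alpha values and reads back the selected one from the alpha_ attribute. The candidate list and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.7]) + rng.normal(scale=0.5, size=150)

# Candidate regularization strengths (illustrative values)
alphas = [0.01, 0.1, 1.0, 10.0]

# With cv left at its default, RidgeCV uses an efficient leave-one-out scheme
model = RidgeCV(alphas=alphas).fit(X, y)

print("selected alpha:", model.alpha_)
print("coef_:", model.coef_)
```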

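    How does changing the regularization strength affect the result?

    • The stronger the regularization, the smaller the weight coefficients
    • The weaker the regularization, the larger the weight coefficients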
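The effect described above can be observed by fitting Ridge with several alpha values and printing the overall size (L2 norm) of the weight vector. The data and the alpha grid below are illustrative assumptions; individual coefficients do not have to shrink monotonically, but the norm of the whole vector does.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4))
y = X @ np.array([4.0, -3.0, 2.0, 1.0]) + rng.normal(scale=0.5, size=80)

alphas = [0.01, 1.0, 100.0, 10000.0]
norms = []
for alpha in alphas:
    # Fit one model per regularization strength and record ||w||
    w = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(float(np.linalg.norm(w)))
    print(f"alpha={alpha:>8}: ||w|| = {norms[-1]:.4f}")
```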

    Boston house price prediction

    import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
    from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires an older release
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error


    def linear3():
        """
        Predict Boston house prices with ridge regression.
        :return: None
        """
        # 1) Load the data
        boston = load_boston()
        print("Data shape (samples, features):\n", boston.data.shape)

        # 2) Split into training and test sets
        x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=22)

        # 3) Standardize: fit the scaler on the training set only, then reuse it on the test set
        transfer = StandardScaler()
        x_train = transfer.fit_transform(x_train)
        x_test = transfer.transform(x_test)

        # 4) Fit the estimator
        estimator = Ridge(alpha=0.5, max_iter=10000)
        estimator.fit(x_train, y_train)

        # Save the model
        # joblib.dump(estimator, "my_ridge.pkl")
        # Load the model
        # estimator = joblib.load("my_ridge.pkl")

        # 5) Inspect the fitted model
        print("Ridge regression - weight coefficients:\n", estimator.coef_)
        print("Ridge regression - intercept:\n", estimator.intercept_)

        # 6) Evaluate the model
        y_predict = estimator.predict(x_test)
        print("Predicted house prices:\n", y_predict)
        error = mean_squared_error(y_test, y_predict)
        print("Ridge regression - mean squared error:\n", error)

        return None


    if __name__ == "__main__":
        # Example 3: predict Boston house prices with ridge regression
        linear3()

    Results:

    Ridge regression - weight coefficients:
     [-0.62710135  1.13221555 -0.07373898  0.74492864 -1.93983515  2.71141843
     -0.07982198 -3.27753496  2.44876703 -1.81107644 -1.74796456  0.88083243
     -3.91211699]
    Ridge regression - intercept:
     22.62137203166228
    Predicted house prices:
     [28.23082349 31.50636545 21.12739377 32.65793823 20.02076945 19.06632771
     21.106687   19.61624365 19.63161548 32.86596512 20.9946695  27.50329913
     15.55414648 19.79639417 36.88392371 18.80672342  9.38096    18.50907253
     30.67484295 24.30753141 19.0666843  34.09564382 29.80095002 17.51949727
     34.8916544  26.5394645  34.68264723 27.42856108 19.09405963 14.98997618
     30.8505874  15.81996969 37.18247113  7.85916465 16.25653448 17.15490009
      7.48867279 19.99147768 40.57329959 28.95128807 25.25723034 17.73738109
     38.75700749  6.87711291 21.78043375 25.27159224 20.45456114 20.48220948
     17.25258857 26.1375367   8.5448374  27.49204889 30.58183066 16.58438621
      9.37182303 35.52269097 32.24958654 21.87431027 17.60876103 22.08124517
     23.50114904 24.09591554 20.15605099 38.49857046 24.64026646 19.75933465
     13.91713858  6.78030217 42.04984214 21.92558236 16.8702938  22.59592875
     40.74980559 21.4284924  36.88064128 27.18855416 21.04326386 20.36536628
     25.36109432 22.27869444 31.14592486 20.39487869 23.99757481 31.54428168
     26.76210157 20.89486664 29.07215993 21.99603204 26.30599891 20.11183257
     25.47912071 24.0792631  19.89111149 16.56247916 15.22770226 18.38342191
     24.82070397 16.60156656 20.86675004 26.71162923 20.74443479 17.8825254
     24.28515984 23.37007961 21.58413976 36.79386382 15.88357121 21.47915185
     32.79931234 33.71603437 20.62134398 26.83678658 22.68850452 17.37312422
     21.67296898 21.67559608 27.66601539 25.0712154  23.73692967 14.64799906
     15.21577315  3.82030283 29.17847194 20.66853036 22.33184243 28.0180608
     28.56771983]
    Ridge regression - mean squared error:
     20.644810227653515
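The script contains commented-out lines for saving and loading the model. A minimal sketch of that round trip is shown here, assuming the standalone joblib package, a small synthetic dataset instead of the Boston data, and a temporary directory so nothing is left on disk. The file name my_ridge.pkl is taken from the script; everything else is illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=50)

estimator = Ridge(alpha=0.5).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_ridge.pkl")
    joblib.dump(estimator, path)   # save the fitted model
    restored = joblib.load(path)   # load it back

# The restored model should reproduce the original predictions
same = np.allclose(estimator.predict(X), restored.predict(X))
print("predictions match after reload:", same)
```

In a real workflow the fitted StandardScaler would need to be saved alongside the model, since new data must be transformed the same way before prediction.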
    
  • Original article: https://www.cnblogs.com/a155-/p/14410337.html