  • Kaggle House Price Prediction Notes 02: Models

    Stacked Regressions : Top 4% on LeaderBoard

    1. Commonly used machine learning libraries

    import numpy as np  # needed by rmsle_cv below (np.sqrt)
    from sklearn.linear_model import ElasticNet, Lasso, BayesianRidge, LassoLarsIC
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import RobustScaler
    from sklearn.base import BaseEstimator, TransformerMixin, RegressorMixin, clone
    from sklearn.model_selection import KFold, cross_val_score, train_test_split
    from sklearn.metrics import mean_squared_error
    import xgboost as xgb
    import lightgbm as lgb
    

    2. Cross-validation

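    # Validation function: per-fold RMSE from 5-fold cross-validation
    n_folds = 5
    
    def rmsle_cv(model):
        # Pass the KFold object itself as cv. Calling .get_n_splits() on it
        # returns only the integer 5, which silently drops shuffle/random_state.
        kf = KFold(n_folds, shuffle=True, random_state=42)
        rmse = np.sqrt(-cross_val_score(model, train.values, y_train,
                                        scoring="neg_mean_squared_error", cv=kf))
        return rmse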
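
    One detail worth checking in a helper like this is what gets passed as cv. The sketch below uses synthetic data (the arrays X and y, the Lasso alpha, and the helper name rmse_cv are illustrative assumptions, not from the post) to show that KFold.get_n_splits() returns a plain integer, and that passing an integer to cross_val_score falls back to unshuffled folds for a regressor.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-ins for the training matrix and target (assumption).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

kf = KFold(n_splits=5, shuffle=True, random_state=42)

# get_n_splits() only reports the number of folds; it is not a splitter.
n = kf.get_n_splits(X)

def rmse_cv(model, X, y, cv):
    """Return the per-fold RMSE for the given cv strategy."""
    neg_mse = cross_val_score(model, X, y,
                              scoring="neg_mean_squared_error", cv=cv)
    return np.sqrt(-neg_mse)

scores_shuffled = rmse_cv(Lasso(alpha=0.001), X, y, kf)  # shuffled folds
scores_plain = rmse_cv(Lasso(alpha=0.001), X, y, n)      # cv=5, no shuffling
print(type(n).__name__, scores_shuffled.mean(), scores_plain.mean())
```

    On data whose rows are ordered (for example by neighborhood or sale date), the two variants can give noticeably different scores, which is why the splitter object is the safer argument.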
    

    3. Using the machine learning models

    # verbosity/n_jobs replace the deprecated silent/nthread arguments
    model_xgb = xgb.XGBRegressor(colsample_bytree=0.4603, gamma=0.0468,
                                 learning_rate=0.05, max_depth=3,
                                 min_child_weight=1.7817, n_estimators=2200,
                                 reg_alpha=0.4640, reg_lambda=0.8571,
                                 subsample=0.5213, verbosity=0,
                                 random_state=7, n_jobs=-1)
    
    model_lgb = lgb.LGBMRegressor(objective='regression',num_leaves=5,
                                  learning_rate=0.05, n_estimators=720,
                                  max_bin = 55, bagging_fraction = 0.8,
                                  bagging_freq = 5, feature_fraction = 0.2319,
                                  feature_fraction_seed=9, bagging_seed=9,
                                  min_data_in_leaf =6, min_sum_hessian_in_leaf = 11)
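
    The imports in section 1 also bring in make_pipeline, RobustScaler and Lasso, which the two boosted models above do not use. As a hedged sketch of how those pieces are commonly combined: a linear model is sensitive to feature scale and outliers, so it is wrapped in a pipeline that scales with the median and IQR first. The data, the alpha value and the variable names here are illustrative assumptions, not the author's settings.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic data with a few extreme rows acting as outliers (assumption).
rng = np.random.RandomState(1)
X = rng.normal(size=(150, 3))
X[:5] *= 50.0
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=150)

# RobustScaler centers on the median and scales by the IQR, so the
# outlier rows do not dominate the scaling. alpha is illustrative.
lasso = make_pipeline(RobustScaler(), Lasso(alpha=0.0005, random_state=1))
lasso.fit(X, y)
pred = lasso.predict(X)
print(list(lasso.named_steps), pred.shape)
```

    Because the pipeline is itself an estimator, it can be passed to a cross-validation helper such as rmsle_cv in the same way as model_xgb or model_lgb.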
    

    4. Run and print the results

    score = rmsle_cv(model_xgb)
    print("Xgboost score: {:.4f} ({:.4f})\n".format(score.mean(), score.std()))
    score = rmsle_cv(model_lgb)
    print("LGBM score: {:.4f} ({:.4f})\n".format(score.mean(), score.std()))
    

    5. Model stacking

    To be continued...
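
    The post stops before showing any stacking code. As a placeholder, here is a minimal sketch of the simplest form of model combination, averaging the predictions of several base regressors, built from the BaseEstimator, RegressorMixin and clone imports in section 1. The class name AveragingModels, the synthetic data and the hyperparameters are hypothetical; this is one possible approach under those assumptions, not the author's implementation, and it is simpler than full stacking with a meta-model.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import ElasticNet, Lasso

class AveragingModels(RegressorMixin, BaseEstimator):
    """Hypothetical helper: averages the predictions of several regressors."""

    def __init__(self, models):
        self.models = models

    def fit(self, X, y):
        # Fit clones so the estimators passed in stay unfitted and reusable.
        self.models_ = [clone(m) for m in self.models]
        for m in self.models_:
            m.fit(X, y)
        return self

    def predict(self, X):
        # One column of predictions per base model, then a row-wise mean.
        preds = np.column_stack([m.predict(X) for m in self.models_])
        return preds.mean(axis=1)

# Synthetic data standing in for the training set (assumption).
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

base = (Lasso(alpha=0.01), ElasticNet(alpha=0.01, l1_ratio=0.5))
avg = AveragingModels(models=base)
avg.fit(X, y)
pred = avg.predict(X)
print(pred.shape)
```

    Since the class follows the scikit-learn estimator interface, it can be scored with cross_val_score like any single model.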

  • Original article: https://www.cnblogs.com/geoli/p/12752868.html