  • GBDT, Random Forest

    author:yangjing

    time:2018-10-22


    Gradient boosting decision tree

    1. Main idea

    The main idea behind GBDT is to combine many simple models (also known as weak learners), such as shallow trees. Each tree can provide good predictions on only part of the data, so more and more trees are added to iteratively improve performance.
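
    In additive-model form (a standard formulation, spelled out here for clarity rather than taken from the original text), the ensemble after m rounds is F_m(x) = F_{m-1}(x) + eta * h_m(x), where h_m is the m-th shallow tree, fitted to the errors (the negative gradient of the loss) of F_{m-1}, and eta is the learning rate discussed below.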

    2. Parameter settings

    Gradient boosting is a bit more sensitive to parameter settings than random forests, but can provide better accuracy if the parameters are set correctly. The main parameters are listed below; a short tuning sketch follows the list.

    • number of trees
      Increasing n_estimators also increases the model complexity, as the model has more chances to correct mistakes on the training set.
    • learning rate
      Controls how strongly each tree tries to correct the mistakes of the previous trees. A higher learning rate means each tree can make stronger corrections, allowing for more complex models.
    • max_depth
      Or, alternatively, max_leaf_nodes. Usually max_depth is set very low for gradient-boosted models, often no deeper than five splits.
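
    These parameters interact; a minimal tuning sketch follows (the specific settings are illustrative assumptions, not from the original session):

    # lowering max_depth or learning_rate reduces model complexity,
    # the usual first remedy against overfitting
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_breast_cancer

    cancer = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        cancer.data, cancer.target, random_state=0)

    # shallow stumps: each tree is weak on its own; boosting combines many
    gbrt_shallow = GradientBoostingClassifier(max_depth=1, random_state=0)
    gbrt_shallow.fit(X_train, y_train)

    # a lower learning rate corrects less per tree and is usually
    # compensated with a larger n_estimators
    gbrt_slow = GradientBoostingClassifier(learning_rate=0.01,
                                           n_estimators=500, random_state=0)
    gbrt_slow.fit(X_train, y_train)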

    3. Code

    # train a gradient-boosted classifier with default settings on the
    # breast-cancer data (100 trees, max_depth=3, learning_rate=0.1)
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_breast_cancer

    cancer = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        cancer.data, cancer.target, random_state=0)
    gbrt = GradientBoostingClassifier(random_state=0)
    gbrt.fit(X_train, y_train)
    gbrt.score(X_test, y_test)  # accuracy on the test set
    

    In [261]: X_train,X_test,y_train,y_test=train_test_split(cancer.data,cancer.target,random_state=0)
         ...: gbrt=GradientBoostingClassifier(random_state=0)
         ...: gbrt.fit(X_train,y_train)
         ...: gbrt.score(X_test,y_test)
         ...:
    Out[261]: 0.958041958041958
    
    In [262]: gbrt.feature_importances_
    Out[262]:
    array([0.01337291, 0.04201687, 0.0208666 , 0.01889077, 0.01028091,
           0.03215986, 0.02074619, 0.11678956, 0.00820024, 0.00074312,
           0.02042134, 0.00680047, 0.02023052, 0.03907398, 0.05406751,
           0.04795741, 0.02358101, 0.00934718, 0.00593481, 0.0239241 ,
           0.05354265, 0.06160083, 0.10961728, 0.07395201, 0.01867851,
           0.03842953, 0.01915824, 0.07128703, 0.01773659, 0.00059199])
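
    The importances can be visualized as a bar chart; a minimal sketch, assuming matplotlib is available (variable names follow the session above):

    import numpy as np
    import matplotlib.pyplot as plt

    n_features = cancer.data.shape[1]
    plt.barh(np.arange(n_features), gbrt.feature_importances_)  # one bar per feature
    plt.yticks(np.arange(n_features), cancer.feature_names)
    plt.xlabel("Feature importance")
    plt.tight_layout()
    plt.show()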
    
    In [263]: gbrt.learning_rate
    Out[263]: 0.1
    
    In [264]: gbrt.max_depth
    Out[264]: 3
    
    In [265]: len(gbrt.estimators_)
    Out[265]: 100
    
    In [272]: gbrt.get_params()
    Out[272]:
    {'criterion': 'friedman_mse',
     'init': None,
     'learning_rate': 0.1,
     'loss': 'deviance',
     'max_depth': 3,
     'max_features': None,
     'max_leaf_nodes': None,
     'min_impurity_decrease': 0.0,
     'min_impurity_split': None,
     'min_samples_leaf': 1,
     'min_samples_split': 2,
     'min_weight_fraction_leaf': 0.0,
     'n_estimators': 100,
     'presort': 'auto',
     'random_state': 0,
     'subsample': 1.0,
     'verbose': 0,
     'warm_start': False}
    

    Random forest
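
    A random forest reduces the overfitting of single decision trees by averaging many trees, each trained on a bootstrap sample of the data with a random subset of features considered at every split. The session below uses a small two-class dataset X, y that is not defined in the transcript; a minimal reconstruction sketch follows, where the make_moons dataset and the subplot grid are assumptions, not from the original:

    from sklearn.datasets import make_moons
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    import matplotlib.pyplot as plt

    # assumed dataset: 100 two-class samples, consistent with the y array below
    X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
    # assumed 2x3 grid of panels, e.g. one per tree plus one for the forest
    fig, axes = plt.subplots(2, 3, figsize=(12, 6))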

    In [230]: y
    Out[230]:
    array([1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0,
           0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
           1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
           0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0,
           0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0], dtype=int64)
    
    In [231]: axes.ravel()
    Out[231]: array of six matplotlib AxesSubplot objects (memory-address reprs omitted)
    
    In [232]: from sklearn.model_selection import train_test_split
    
    In [233]: X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
    
    In [234]: len(X_train)
    Out[234]: 75
    
    In [235]: forest = RandomForestClassifier(n_estimators=5, random_state=2)
    
    In [236]: forest.fit(X_train, y_train)
    Out[236]:
    RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                max_depth=None, max_features='auto', max_leaf_nodes=None,
                min_impurity_decrease=0.0, min_impurity_split=None,
                min_samples_leaf=1, min_samples_split=2,
                min_weight_fraction_leaf=0.0, n_estimators=5, n_jobs=1,
                oob_score=False, random_state=2, verbose=0, warm_start=False)
    
    In [237]: forest.score(X_test, y_test)
    Out[237]: 0.92
    
    In [238]: forest.estimators_
    Out[238]:
    [DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                 max_features='auto', max_leaf_nodes=None,
                 min_impurity_decrease=0.0, min_impurity_split=None,
                 min_samples_leaf=1, min_samples_split=2,
                 min_weight_fraction_leaf=0.0, presort=False,
                 random_state=1872583848, splitter='best'),
     DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                 max_features='auto', max_leaf_nodes=None,
                 min_impurity_decrease=0.0, min_impurity_split=None,
                 min_samples_leaf=1, min_samples_split=2,
                 min_weight_fraction_leaf=0.0, presort=False,
                 random_state=794921487, splitter='best'),
     DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                 max_features='auto', max_leaf_nodes=None,
                 min_impurity_decrease=0.0, min_impurity_split=None,
                 min_samples_leaf=1, min_samples_split=2,
                 min_weight_fraction_leaf=0.0, presort=False,
                 random_state=111352301, splitter='best'),
     DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                 max_features='auto', max_leaf_nodes=None,
                 min_impurity_decrease=0.0, min_impurity_split=None,
                 min_samples_leaf=1, min_samples_split=2,
                 min_weight_fraction_leaf=0.0, presort=False,
                 random_state=1853453896, splitter='best'),
     DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                 max_features='auto', max_leaf_nodes=None,
                 min_impurity_decrease=0.0, min_impurity_split=None,
                 min_samples_leaf=1, min_samples_split=2,
                 min_weight_fraction_leaf=0.0, presort=False,
                 random_state=213298710, splitter='best')]
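
    Each fitted sub-tree can be queried on its own, and the forest predicts by averaging the trees' class probabilities and picking the most likely class. A minimal sketch (names follow the session above):

    import numpy as np

    # accuracy of each individual tree -- typically below the ensemble's
    for i, tree in enumerate(forest.estimators_):
        print(i, tree.score(X_test, y_test))

    # the forest's predict_proba is the mean of the trees' probabilities
    proba = np.mean([tree.predict_proba(X_test) for tree in forest.estimators_],
                    axis=0)
    print(proba.argmax(axis=1)[:10])  # first ten predicted labels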
    
