  • Parameter tuning

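    When tuning with GridSearchCV, I found that the scikit-learn documentation does not clearly spell out which values the scoring parameter accepts, so I looked them up myself. They are as follows: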

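    from sklearn.cluster import DBSCAN
    from sklearn.model_selection import GridSearchCV
    parameters = {'eps': [0.3, 0.4, 0.5, 0.6], 'min_samples': [20, 30, 40]}
    db = DBSCAN(metric='cosine', algorithm='brute')  # pass it unfitted; GridSearchCV clones and fits it itself
    # Caveat: DBSCAN has no predict(), so prediction-based scorers may fail on it; the point here is the scoring string.
    grid = GridSearchCV(db, parameters, cv=5, scoring='adjusted_rand_score')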
    Scoring                       Function                          Comment
    Classification
    'accuracy'                    metrics.accuracy_score
    'average_precision'           metrics.average_precision_score
    'f1'                          metrics.f1_score                  for binary targets
    'f1_micro'                    metrics.f1_score                  micro-averaged
    'f1_macro'                    metrics.f1_score                  macro-averaged
    'f1_weighted'                 metrics.f1_score                  weighted average
    'f1_samples'                  metrics.f1_score                  by multilabel sample
    'neg_log_loss'                metrics.log_loss                  requires predict_proba support
    'precision' etc.              metrics.precision_score           suffixes apply as with 'f1'
    'recall' etc.                 metrics.recall_score              suffixes apply as with 'f1'
    'roc_auc'                     metrics.roc_auc_score
    Clustering
    'adjusted_rand_score'         metrics.adjusted_rand_score
    Regression
    'neg_mean_absolute_error'     metrics.mean_absolute_error
    'neg_mean_squared_error'      metrics.mean_squared_error
    'neg_median_absolute_error'   metrics.median_absolute_error
    'r2'                          metrics.r2_score
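
    The list of accepted scoring strings can also be read from the installed library rather than from a static table. A minimal sketch, assuming scikit-learn is installed: recent releases (1.0 and later, as far as I know) expose get_scorer_names, and the fallback to the older SCORERS dictionary is an assumption for earlier versions.

```python
# List the scoring strings that GridSearchCV(scoring=...) accepts
# in the scikit-learn version that is actually installed.
try:
    # Available in recent scikit-learn releases.
    from sklearn.metrics import get_scorer_names
    names = sorted(get_scorer_names())
except ImportError:
    # Assumed fallback for older releases that only expose the SCORERS dict.
    from sklearn.metrics import SCORERS
    names = sorted(SCORERS.keys())

for name in names:
    print(name)
```

    The printed list is usually longer than the table above, because newer versions add more scorers.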

    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

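    Later, however, in another course the instructor said that grid search is not recommended for models with many features: it is slow, and parameters that perform well on the training set do not necessarily perform well on the out-of-time validation set.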

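    The suggestion was to design the tuning objective yourself: maximize the KS on the out-of-time validation set while keeping the gap between the out-of-time validation KS and the training KS as small as possible.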
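
    For reference, KS here is the maximum gap between the true positive rate and the false positive rate over all score thresholds. A minimal pure-Python sketch of that definition follows; the function name ks_statistic is hypothetical, and in practice sklearn.metrics.roc_curve gives the same two curves.

```python
def ks_statistic(y_true, y_score):
    """Return max |TPR - FPR| over all score thresholds (labels are 0/1)."""
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute KS")

    # Walk through the samples from the highest score to the lowest.
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold, so evaluate only after the whole tie group.
        last_of_tie = i + 1 == len(pairs) or pairs[i + 1][0] != score
        if last_of_tie:
            best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


# Perfectly separated scores give the maximum possible KS.
print(ks_statistic([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```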

    Tuning method

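    • Maximize offks + 0.8 * (offks - devks), where offks is the KS on the out-of-time validation set and devks is the KS on the training set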
    import lightgbm as lgb
    import pandas as pd
    from sklearn.metrics import roc_curve
    from sklearn.model_selection import train_test_split

    data = pd.read_csv('Acard.txt')

    # Hold out the latest month (2018-11-30) as the out-of-time validation set.
    train = data[data.obs_mth != '2018-11-30'].reset_index().copy()
    val = data[data.obs_mth == '2018-11-30'].reset_index().copy()
    feature_lst = ['person_info', 'finance_info', 'credit_info', 'act_info']
    x = train[feature_lst]
    y = train['bad_ind']

    val_x = val[feature_lst]
    val_y = val['bad_ind']

    train_x, test_x, train_y, test_y = train_test_split(x, y, random_state=0, test_size=0.2)


    def lgb_test(train_x, train_y, test_x, test_y, value):
        """Fit LightGBM with n_estimators=value; return the model and its test AUC."""
        clf = lgb.LGBMClassifier(boosting_type='gbdt',
                                 objective='binary',
                                 metric='auc',
                                 learning_rate=0.1,
                                 n_estimators=value,
                                 max_depth=5,
                                 num_leaves=20,
                                 max_bin=45,
                                 min_data_in_leaf=6,
                                 bagging_fraction=0.6,
                                 bagging_freq=0,
                                 feature_fraction=0.8,
                                 # Newer LightGBM releases drop `silent`;
                                 # `verbose=-1` is the usual replacement there.
                                 silent=True)
        clf.fit(train_x, train_y,
                eval_set=[(train_x, train_y), (test_x, test_y)],
                eval_metric='auc')
        return clf, clf.best_score_['valid_1']['auc']


    def ks(y_true, y_score):
        """KS statistic: the largest gap between the TPR and FPR curves."""
        fpr, tpr, _ = roc_curve(y_true, y_score)
        return abs(fpr - tpr).max()


    # Treat `value` as the parameter being tuned and set its search range.
    min_value = 40
    max_value = 60

    # Track the best result across the whole search, so initialize before the loop.
    best_omd = -1
    best_value = -1
    best_ks = []
    for value in range(min_value, max_value + 1):
        lgb_model, lgb_auc = lgb_test(train_x, train_y, test_x, test_y, value)

        train_ks = ks(y, lgb_model.predict_proba(x)[:, 1])          # devks
        val_ks = ks(val_y, lgb_model.predict_proba(val_x)[:, 1])    # offks

        # Objective: offks + 0.8 * (offks - devks)
        omd = val_ks + 0.8 * (val_ks - train_ks)
        if omd > best_omd:
            best_omd = omd
            best_value = value
            best_ks = [train_ks, val_ks]
    print('best_value:', best_value)
    print('best_ks:', best_ks)
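
    To see what the 0.8 weight does, the objective can be tried on a couple of candidates in isolation. A small sketch follows; the function name penalized_ks, the candidate labels and the KS values are all hypothetical, made up only to illustrate the trade-off.

```python
def penalized_ks(dev_ks, off_ks, weight=0.8):
    """offks + weight * (offks - devks): reward out-of-time KS, penalize the train/validation gap."""
    return off_ks + weight * (off_ks - dev_ks)


# Hypothetical (devks, offks) pairs for two parameter settings.
candidates = {
    "A": (0.50, 0.42),  # higher offks, but a large gap to devks
    "B": (0.44, 0.41),  # slightly lower offks, much smaller gap
}

scores = {name: penalized_ks(dev, off) for name, (dev, off) in candidates.items()}
best = max(scores, key=scores.get)
print(best)
```

    With these numbers B is selected even though A has the higher out-of-time KS, because A's larger gap to the training KS costs it more than the 0.01 advantage is worth.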
  • Original article: https://www.cnblogs.com/fionacai/p/7125249.html