zoukankan      html  css  js  c++  java
  • xgboost 特征选择,筛选特征的正要性

    import pandas as pd
    import xgboost as xgb
    import operator
    from matplotlib import pylab as plt
    
    def ceate_feature_map(features):
        outfile = open('xgb.fmap', 'w')
        i = 0
        for feat in features:
            outfile.write('{0}	{1}	q
    '.format(i, feat))
            i = i + 1
    
        outfile.close()
    
    def get_data():
        train = pd.read_csv("../input/train.csv")
    
        features = list(train.columns[2:])
    
        y_train = train.Hazard
    
        for feat in train.select_dtypes(include=['object']).columns:
            m = train.groupby([feat])['Hazard'].mean()
            train[feat].replace(m,inplace=True)
    
        x_train = train[features]
    
        return features, x_train, y_train
    
    def get_data2():
        from sklearn.datasets import load_iris
        #获取数据
        iris = load_iris()
        x_train=pd.DataFrame(iris.data)
        features=["sepal_length","sepal_width","petal_length","petal_width"]
        x_train.columns=features
        y_train=pd.DataFrame(iris.target)
        return features, x_train, y_train
    
    #features, x_train, y_train = get_data()
    features, x_train, y_train = get_data2()
    ceate_feature_map(features)
    
    xgb_params = {"objective": "reg:linear", "eta": 0.01, "max_depth": 8, "seed": 42, "silent": 1}
    num_rounds = 1000
    
    dtrain = xgb.DMatrix(x_train, label=y_train)
    gbdt = xgb.train(xgb_params, dtrain, num_rounds)
    
    importance = gbdt.get_fscore(fmap='xgb.fmap')
    importance = sorted(importance.items(), key=operator.itemgetter(1))
    
    df = pd.DataFrame(importance, columns=['feature', 'fscore'])
    df['fscore'] = df['fscore'] / df['fscore'].sum()
    
    plt.figure()
    df.plot()
    df.plot(kind='barh', x='feature', y='fscore', legend=False, figsize=(16, 10))
    plt.title('XGBoost Feature Importance')
    plt.xlabel('relative importance')
    plt.gcf().savefig('feature_importance_xgb.png')
    

    根据结构分数的增益情况计算出来选择哪个特征的哪个分割点,某个特征的重要性,就是它在所有树中出现的次数之和。

    参考:https://blog.csdn.net/q383700092/article/details/53698760

     

    另外:使用xgboost,遇到一个问题

    看到网上有一个办法:
           重新新建Python文件,把你的代码拷过去;或者重命名也可以;还不行,就把代码复制到别的地方(不能在原始文件夹内),会重新编译,就正常了
           但是我觉得本质问题不是这样解决的,但临时应急还是可以的,欢迎讨论!
     
    问题根源:
    初学者或者说不太了解Python才会犯这种错误,其实只需要注意一点!不要使用任何模块名作为文件名,任何类型的文件都不可以!我的错误根源是在文件夹中使用xgboost.*的文件名,当import xgboost时会首先在当前文件中查找,才会出现这样的问题。所以,再次强调:不要用任何的模块名作为文件名!
     
     
    另外:若出现问题:

    D:ProgramPython3.5libsite-packagessklearncross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
    "This module will be removed in 0.20.", DeprecationWarning)

    先卸载原先版本的xgboost, pip uninstall xgboost

    然后下载安装新版本的xgboost,地址:https://www.lfd.uci.edu/~gohlke/pythonlibs/#xgboost

    命令:pip install xgboost-0.6-cp35-none-win_amd64.whl

  • 相关阅读:
    Ubuntu 安装.net core 设置源仓储地址
    ASP.NET Core开发-MVC 使用dotnet 命令创建Controller和View
    ubuntu安装postman
    redis相关操作
    C# StringValues 类型
    MySQL时间戳与日期格式的相互转换
    C# 生成时间戳以及时间戳转换为时间
    关于vscode 使用nuget插件相关说明
    ubuntu一些命令
    使用HBuilderX新建Uniapp项目并在不同平台上运行调试
  • 原文地址:https://www.cnblogs.com/Allen-rg/p/8868109.html
Copyright © 2011-2022 走看看