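PySpark GBTRegressor: feature importances and ranking

GBTRegressor model evaluation metrics and feature importance analysis

Official documentation: https://spark.apache.org/docs/2.2.0/api/python/_modules/pyspark/ml/regression.html

As with a random forest, once the model is trained, the following code prints the features and their importances in ranked order.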

# Print the feature indices and their importances
features_important = model.featureImportances
print(features_important)

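# Get each feature's importance in the model and print them in descending order.
# featureImportances is usually a SparseVector whose .indices lists only the
# non-zero entries, so convert it to a dense array in which position i holds
# the importance of feature index i.
vs = features_important.toArray()
print(len(vs))

# Attribute metadata on the "features" column; train is assumed to be the
# training DataFrame whose "features" column carries this metadata.
name_index = train.schema["features"].metadata["ml_attr"]["attrs"]

names = []
idxs = []

# Only numeric attributes are read here; other attribute types, if present,
# sit under their own keys in name_index.
for it in name_index['numeric']:
    names.append(it['name'])
    idxs.append(it['idx'])

fea_num = len(names)
print(fea_num)

# Look up each importance by the attribute's own index, not by list position
kv = {}
for name, idx in zip(names, idxs):
    kv[name] = vs[idx]
print(len(kv))
print(sorted(kv.items(), key=lambda el: el[1], reverse=True))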
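
The name-to-importance lookup can also be tried without a Spark session. The sketch below is a minimal illustration, not part of the original post: the helper name rank_feature_importances and the demo values are hypothetical, and it assumes the metadata has the same shape as schema["features"].metadata["ml_attr"]["attrs"], that is, a dict mapping an attribute type to a list of entries with "idx" and "name" keys.

```python
def rank_feature_importances(importances, attrs):
    """Return (name, importance) pairs sorted by importance, highest first.

    importances: sequence in which position i is the importance of feature
                 index i (for example, model.featureImportances.toArray()).
    attrs: dict mapping an attribute type to a list of
           {"idx": ..., "name": ...} entries (assumed metadata shape).
    """
    pairs = []
    for group in attrs.values():
        for attr in group:
            # Index by the attribute's own idx so names and values stay aligned
            pairs.append((attr["name"], float(importances[attr["idx"]])))
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Hypothetical values standing in for a trained model and its column metadata
demo_importances = [0.1, 0.0, 0.6, 0.3]
demo_attrs = {
    "numeric": [{"idx": 0, "name": "age"}, {"idx": 2, "name": "income"}],
    "binary": [{"idx": 1, "name": "is_member"}, {"idx": 3, "name": "has_card"}],
}

ranked = rank_feature_importances(demo_importances, demo_attrs)
for name, score in ranked:
    print(name, score)
```

With a real model, the two arguments would be model.featureImportances.toArray() and the attrs dict read from the training DataFrame's schema. Iterating over every attribute type keeps features that are not stored under the "numeric" key from being dropped.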

References

https://blog.csdn.net/zx8167107/article/details/101709245?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.channel_param

https://blog.csdn.net/qq_23860475/article/details/90766237

Original article: https://www.cnblogs.com/Allen-rg/p/13390083.html
