  • Notes on XGBoost

    1. Principles

    //TODO

    2. Python Package Scikit-Learn API

    2.1 Input

    Features fall into two kinds: continuous, such as body weight, and categorical, such as gender.

    The scikit-learn Glossary of Common Terms and API Elements has this to say:

    Categorical Feature

    A categorical or nominal feature is one that has a finite set of discrete values across the population of data. These are commonly represented as columns of integers or strings. Strings will be rejected by most scikit-learn estimators, and integers will be treated as ordinal or count-valued. For the use with most estimators, categorical variables should be one-hot encoded. Notable exceptions include tree-based models such as random forests and gradient boosting models that often work better and faster with integer-coded categorical variables. OrdinalEncoder helps encoding string-valued categorical features as ordinal integers, and OneHotEncoder can be used to one-hot encode categorical features. See also Encoding categorical features and the http://contrib.scikit-learn.org/categorical-encoding package for tools related to encoding categorical features.

    In short: tree-based models often work better and faster with integer-coded categorical variables than with one-hot encoding.

    Details: https://scikit-learn.org/stable/glossary.html#glossary
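
    As a rough illustration of the two encodings mentioned in the glossary passage, the sketch below applies scikit-learn's OrdinalEncoder and OneHotEncoder to a made-up gender column. The sample values are assumptions for demonstration only and do not come from this post.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical categorical column: one feature, four samples.
X = np.array([["female"], ["male"], ["male"], ["female"]])

# Integer coding: a single column; categories are sorted, so female=0, male=1.
ordinal = OrdinalEncoder().fit_transform(X)

# One-hot coding: one column per category. The default output is sparse,
# so it is converted to a dense array for printing.
one_hot = OneHotEncoder().fit_transform(X).toarray()

print(ordinal.ravel())
print(one_hot)
```

    The integer-coded version keeps one column per feature, while the one-hot version adds a column for every category.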

    2.2 Output

    Only two objectives are covered here, multi:softmax and multi:softprob. The official documentation describes them as follows:

    multi:softmax: set XGBoost to do multiclass classification using the softmax objective, you also need to set num_class(number of classes)
    multi:softprob: same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata * nclass matrix. The result contains predicted probability of each data point belonging to each class.
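
    To make the difference between the two outputs concrete, here is a NumPy-only sketch. It does not call XGBoost; the raw scores are invented, and the row-major layout of the flat vector is an assumption of this sketch.

```python
import numpy as np


def softmax(margins: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = margins - margins.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical raw scores for ndata=3 samples and nclass=3 classes.
margins = np.array([[2.0, 0.5, 0.1],
                    [0.2, 0.3, 3.0],
                    [0.1, 1.5, 0.4]])

# softprob-style output: a flat vector of ndata * nclass probabilities...
flat = softmax(margins).ravel()
# ...which can be reshaped into an (ndata, nclass) matrix.
proba = flat.reshape(3, 3)

# softmax-style output: only the most likely class for each sample.
labels = proba.argmax(axis=1)
print(labels)
```

    Each row of proba sums to 1, and taking the argmax of a row recovers the single label that a softmax-style output would report.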

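    This is a bit of a trap. Whichever of the two you pass when constructing the model, printing the model after fit shows the objective as multi:softprob. Yet the result of predict does not match the description above either: it is what multi:softmax would give, with predicted labels only and no probability distribution.

    The official code follows. It shows that num_class does not need to be set either, and that objective is forcibly replaced with multi:softprob. To get the probability distribution, predict with predict_proba instead.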

    self.classes_ = np.unique(y)
    self.n_classes_ = len(self.classes_)

    if self.n_classes_ > 2:
        # Switch to using a multiclass objective in the underlying XGB instance
        xgb_options["objective"] = "multi:softprob"
        xgb_options['num_class'] = self.n_classes_

    3. DEMO

    //TODO

  • Original post: https://www.cnblogs.com/0xcafe/p/10347304.html