zoukankan      html  css  js  c++  java
  • xgboost 里边的gain freq, cover

    assuming that you're using xgboost to fit boosted trees for binary classification. The importance matrix is actually a data.table object with the first column listing the names of all the features actually used in the boosted trees.

    The meaning of the importance data table is as follows:

    1. The Gain implies the relative contribution of the corresponding feature to the model calculated by taking each feature's contribution for each tree in the model. A higher value of this metric when compared to another feature implies it is more important for generating a prediction.
    2. The Cover metric means the relative number of observations related to this feature. For example, if you have 100 observations, 4 features and 3 trees, and suppose feature1 is used to decide the leaf node for 10, 5, and 2 observations in tree1, tree2 and tree3 respectively; then the metric will count cover for this feature as 10+5+2 = 17 observations. This will be calculated for all the 4 features and the cover will be 17 expressed as a percentage for all features' cover metrics.
    3. The Frequence (frequency) is the percentage representing the relative number of times a particular feature occurs in the trees of the model. In the above example, if feature1 occurred in 2 splits, 1 split and 3 splits in each of tree1, tree2 and tree3; then the weightage for feature1 will be 2+1+3 = 6. The frequency for feature1 is calculated as its percentage weight over weights of all features.

    The Gain is the most relevant attribute to interpret the relative importance of each feature.

    Gain is the improvement in accuracy brought by a feature to the branches it is on. The idea is that before adding a new split on a feature X to the branch there was some wrongly classified elements, after adding the split on this feature, there are two new branches, and each of these branch is more accurate (one branch saying if your observation is on this branch then it should be classified as 1, and the other branch saying the exact opposite).

    Cover measures the relative quantity of observations concerned by a feature.

    Frequency is a simpler way to measure the Gain. It just counts the number of times a feature is used in all generated trees. You should not use it (unless you know why you want to use it).

  • 相关阅读:
    Spring 学习7 -事务
    Spring学习 6- Spring MVC (Spring MVC原理及配置详解)
    看秒杀系统的时候看到的关于并发队列的介绍,摘抄如下
    Spring 学习 3- AOP
    Spring学习-1 框架总览
    Spring 学习 5- task 定时任务
    JAVA锁机制-可重入锁,可中断锁,公平锁,读写锁,自旋锁,
    指定链接的样式的顺序
    css方法实现div固定浏览器底端
    文件中批量搜索字符串
  • 原文地址:https://www.cnblogs.com/xinping-study/p/8780817.html
Copyright © 2011-2022 走看看