zoukankan      html  css  js  c++  java
  • kaggle教程之特征工程

    • 特征工程

        特征工程可以有效地改善模型效果,减少训练时间。

        简单的方法包括:

          1. 进行特征转换

          2. 增加语义特征

    A Guiding Principle of Feature Engineering

    For a feature to be useful, it must have a relationship to the target that your model is able to learn. Linear models, for instance, are only able to learn linear relationships. So, when using a linear model, your goal is to transform the features to make their relationship to the target linear.

    The key idea here is that a transformation you apply to a feature becomes in essence a part of the model itself. Say you were trying to predict the Price of square plots of land from the Length of one side. Fitting a linear model directly to Length gives poor results: the relationship is not linear.

    A scatterplot of Length along the x-axis and Price along the y-axis, the points increasing in a curve, with a poorly-fitting line superimposed.

    A linear model fits poorly with only Length as feature.

    If we square the Length feature to get 'Area', however, we create a linear relationship. Adding Area to the feature set means this linear model can now fit a parabola. Squaring a feature, in other words, gave the linear model the ability to fit squared features.

    • 互信息

      面对大量特征,可以使用特征效益函数对特征进行排序,选择有效的特征子集。互信息是最常用的方法,相较于相关度,互信息可以衡量各种关系,而相关度只能衡量线性关      系。互信息以不确定性(熵)的角度进行特征相关度的衡量。

      互信息的好处:

    • easy to use and interpret,
    • computationally efficient,
    • theoretically well-founded,
    • resistant to overfitting, and,
    • able to detect any kind of relationship

    互信息的下限是0,上限是正无穷,但一般不大于2.

    互信息是一种单变量尺度。

    注意:类别特征编码

    # Label encoding for categoricals
    for colname in X.select_dtypes("object"):
        X[colname], _ = X[colname].factorize()

    针对实数类型的标签使用mutual_info_regression,针对整数类型的标签使用mutual_info_classif

  • 相关阅读:
    ionic的start-y属性初始化页面
    AngularJS时间轴指令
    [ngRepeat:dupes] Duplicates in a repeater are not allowed. Use 'track by' expression to specify unique keys
    常用CSS样式
    angularjs ng-click
    不是SELECTed表达式
    跨域资源共享(CORS)问题解决方案
    AngularJS身份验证:Cookies VS Tokens
    Javascript跨域
    Vmware虚拟机centos7命令登录模式与成桌面模式切换
  • 原文地址:https://www.cnblogs.com/big-zoo/p/14477887.html
Copyright © 2011-2022 走看看