  • featuretools in practice

    Deriving features from null values still yields null values

    Code:

    import featuretools as ft
    import pandas as pd
    
    # x1 is missing in the first row
    df = pd.DataFrame(data={"x1": [None, 2, 3], 'x2': [4, 5, 6]})
    
    # Wrap the DataFrame in an EntitySet; make_index=True adds a new index column
    es = ft.EntitySet(id='es_hypernets_fit')
    es.entity_from_dataframe(entity_id='e_hypernets_ft', dataframe=df, make_index=True, index='e_hypernets_ft_index')
    # Run DFS with only the add_numeric and subtract_numeric transform primitives
    feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="e_hypernets_ft",
                                          ignore_variables={"e_hypernets_ft": []},
                                          return_variable_types="all",
                                          trans_primitives=['add_numeric', 'subtract_numeric'],
                                          max_depth=1,
                                          features_only=False,
                                          max_features=-1)
    print(feature_matrix)
    
    

    Output:

                           x1  x2  x1 + x2  x1 - x2
    e_hypernets_ft_index                           
    0                     NaN   4      NaN      NaN
    1                     2.0   5      7.0     -3.0
    2                     3.0   6      9.0     -3.0
    

    Any derived value computed from a null is itself NaN. It is therefore advisable to fill null values before deriving features.
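    A minimal sketch of filling nulls before derivation, using only pandas. The per-column median is an illustrative assumption, not a recommendation from the original experiment. The derived columns are computed by hand to mirror what add_numeric / subtract_numeric produce; featuretools itself is not called here. In practice you would pass the filled DataFrame to entity_from_dataframe as above.

```python
import pandas as pd

# Same toy data as above: x1 is missing in the first row.
df = pd.DataFrame(data={"x1": [None, 2, 3], "x2": [4, 5, 6]})

# Fill each column with its own median BEFORE deriving features.
# (Median is only an illustrative choice; mean, a constant, or a
# model-based imputer may suit your data better.)
df_filled = df.fillna(df.median())

# Emulate the add_numeric / subtract_numeric outputs on the filled data.
derived = df_filled.copy()
derived["x1 + x2"] = df_filled["x1"] + df_filled["x2"]
derived["x1 - x2"] = df_filled["x1"] - df_filled["x2"]
print(derived)
```

    With the null filled first, the derived columns no longer inherit NaN in the first row.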

    Derivation can produce abnormal values

    For a case such as n/0 (division by zero), the result is inf:

    import featuretools as ft
    import pandas as pd
    
    # x2 is 0 in the first row, so x1 / x2 divides by zero there
    df = pd.DataFrame(data={"x1": [1, 2, 3], 'x2': [0, 5, 6]})
    
    es = ft.EntitySet(id='es_hypernets_fit')
    es.entity_from_dataframe(entity_id='e_hypernets_ft', dataframe=df, make_index=True, index='e_hypernets_ft_index')
    # Run DFS with only the divide_numeric transform primitive
    feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="e_hypernets_ft",
                                          ignore_variables={"e_hypernets_ft": []},
                                          return_variable_types="all",
                                          trans_primitives=['divide_numeric'],
                                          max_depth=1,
                                          features_only=False,
                                          max_features=-1)
    print(feature_matrix)
    

    Result:

                          x1  x2  x1 / x2  x2 / x1
    e_hypernets_ft_index                          
    0                      1   0      inf      0.0
    1                      2   5      0.4      2.5
    2                      3   6      0.5      2.0
    

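    If your algorithm cannot handle extremely large or small values (such as inf or -inf), it is advisable to replace them after derivation.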
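    A minimal sketch of such a post-derivation replacement, assuming a pandas feature matrix like the one above. The matrix is rebuilt by hand with pandas division rather than by calling featuretools. Mapping inf/-inf to NaN and then filling with the column median is one possible strategy, chosen here for illustration. Clipping to a finite bound is another.

```python
import numpy as np
import pandas as pd

# Rebuild a feature matrix like the one above; x1 / x2 is inf where x2 == 0.
feature_matrix = pd.DataFrame({"x1": [1, 2, 3], "x2": [0, 5, 6]})
feature_matrix["x1 / x2"] = feature_matrix["x1"] / feature_matrix["x2"]
feature_matrix["x2 / x1"] = feature_matrix["x2"] / feature_matrix["x1"]

# Step 1: treat +inf / -inf as missing values.
cleaned = feature_matrix.replace([np.inf, -np.inf], np.nan)

# Step 2 (illustrative choice): fill those gaps with each column's median.
cleaned = cleaned.fillna(cleaned.median())
print(cleaned)
```

    After these two steps every value in the matrix is finite. The first row of x1 / x2 becomes the median of the remaining values (0.4 and 0.5).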

    These experiments were run with featuretools==0.18.1.

  • Original post: https://www.cnblogs.com/oaks/p/13565835.html