Features derived from null values are still null
Code:
import featuretools as ft
import pandas as pd

# x1 contains one null value (None is stored as NaN)
df = pd.DataFrame(data={"x1": [None, 2, 3], "x2": [4, 5, 6]})
es = ft.EntitySet(id='es_hypernets_fit')
es.entity_from_dataframe(entity_id='e_hypernets_ft', dataframe=df,
                         make_index=True, index='e_hypernets_ft_index')
# Derive x1 + x2 and x1 - x2 using transform primitives only
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="e_hypernets_ft",
                                      ignore_variables={"e_hypernets_ft": []},
                                      return_variable_types="all",
                                      trans_primitives=['add_numeric', 'subtract_numeric'],
                                      max_depth=1,
                                      features_only=False,
                                      max_features=-1)
print(feature_matrix)
Output:
                       x1  x2  x1 + x2  x1 - x2
e_hypernets_ft_index
0                     NaN   4      NaN      NaN
1                     2.0   5      7.0     -3.0
2                     3.0   6      9.0     -3.0
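Every derived column that involves a null value is NaN in that row, so it is recommended to fill null values before deriving features.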
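As a minimal sketch of filling before derivation, the nulls can be imputed with pandas before the DataFrame is passed to the EntitySet. The median fill strategy is an assumption chosen for illustration, and the addition and subtraction are done by hand with pandas here to stand in for 'add_numeric' and 'subtract_numeric'.

```python
import pandas as pd

df = pd.DataFrame(data={"x1": [None, 2, 3], "x2": [4, 5, 6]})

# Assumption: the column median is an acceptable fill value for this data.
df_filled = df.fillna(df.median())

# Hand-written stand-ins for the 'add_numeric' / 'subtract_numeric' primitives.
derived = pd.DataFrame({
    "x1 + x2": df_filled["x1"] + df_filled["x2"],
    "x1 - x2": df_filled["x1"] - df_filled["x2"],
})
print(derived)
```

With the null filled first, no derived column contains NaN. df_filled can then be passed to entity_from_dataframe in place of df.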
Derivation may produce abnormal values
For a case such as n/0, the result is inf:
import featuretools as ft
import pandas as pd

# x2 contains a zero, so x1 / x2 divides by zero in the first row
df = pd.DataFrame(data={"x1": [1, 2, 3], "x2": [0, 5, 6]})
es = ft.EntitySet(id='es_hypernets_fit')
es.entity_from_dataframe(entity_id='e_hypernets_ft', dataframe=df,
                         make_index=True, index='e_hypernets_ft_index')
# Derive x1 / x2 and x2 / x1 using the division primitive only
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="e_hypernets_ft",
                                      ignore_variables={"e_hypernets_ft": []},
                                      return_variable_types="all",
                                      trans_primitives=['divide_numeric'],
                                      max_depth=1,
                                      features_only=False,
                                      max_features=-1)
print(feature_matrix)
Result:
                      x1  x2  x1 / x2  x2 / x1
e_hypernets_ft_index
0                      1   0      inf      0.0
1                      2   5      0.4      2.5
2                      3   6      0.5      2.0
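If the algorithm cannot handle extremely large or extremely small values, it is recommended to replace them after deriving features.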
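One possible way to do the replacement, sketched with pandas and NumPy only: the division is reproduced by hand rather than through featuretools, and the final fill value of 0 is an assumption that should be adapted to the downstream algorithm.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={"x1": [1, 2, 3], "x2": [0, 5, 6]})

# Hand-written stand-in for the 'divide_numeric' primitive; 1 / 0 yields inf.
feature_matrix = df.assign(**{
    "x1 / x2": df["x1"] / df["x2"],
    "x2 / x1": df["x2"] / df["x1"],
})

# Turn +inf / -inf into NaN first, then fill.
# Assumption: 0 is a suitable replacement value for this data.
cleaned = feature_matrix.replace([np.inf, -np.inf], np.nan).fillna(0)
print(cleaned)
```

Converting inf to NaN first means the same imputation logic used for ordinary missing values can handle both cases.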
These experiments were run on featuretools==0.18.1.