  • The relationship between FeatureUnion and ColumnTransformer

    from __future__ import print_function
    from sklearn.pipeline import Pipeline, FeatureUnion
    from sklearn.compose import ColumnTransformer
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest

    iris = load_iris()
    X, y = iris.data, iris.target

    This dataset is way too high-dimensional. Better do PCA:

    pca = PCA(n_components=2)

    Maybe some original features were good, too?

    selection = SelectKBest(k=1)

    Build estimator from PCA and Univariate selection:

    combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])
    ct_f = ColumnTransformer([("pca", pca, [0,1,2,3]), ("univ_select", selection, [0,1,2,3])])

    Use combined features to transform dataset:

    X_features = combined_features.fit(X, y).transform(X)
    print("Combined space has", X_features.shape[1], "features")

    X_features2 = ct_f.fit(X, y).transform(X)
    print("Combined space has", X_features2.shape[1], "features")

    # Compare the first 20 rows of both outputs; the last column is their difference
    for i in range(20):
        print(X_features[i], "-->", X_features2[i], X_features[i] - X_features2[i])

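    TODO(yu): There are two crossing modes here, plus a fully expanded form: a. one feature processes all columns, which is repeated for each feature and the results are merged;

    b. several features are computed on one column at the same time, and then the columns are merged; c. features and columns are fully crossed into a Cartesian product, with one column-transformer tuple per pair.

    a fits the basic design and can use ColumnTransformer, but causes repeated computation (presumably of the FFT); b lets the feature computations share the FFT result, but is tightly coupled and more work to implement; c is the least economical approach with respect to the FFT.

    For convenience, choose a. Clearly c is the most flexible, since any column and any feature can be specified freely, whereas a and b are flexible along only one dimension.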
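
    The difference between modes a and c can be sketched with ColumnTransformer alone. This is only one reading of the notes above: the element-wise functions np.log1p and np.square are hypothetical stand-ins for the real feature computations (they are not FFT-based), and names such as feature_funcs and chosen_pairs are made up for illustration.

```python
from itertools import product

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.preprocessing import FunctionTransformer

X, _ = load_iris(return_X_y=True)
columns = [0, 1, 2, 3]

# Hypothetical stand-ins for the "features" in the notes (assumption, not FFT-based).
feature_funcs = {"log1p": np.log1p, "square": np.square}

# Mode a: each feature processes all columns; one tuple per feature.
mode_a = ColumnTransformer(
    [(name, FunctionTransformer(func), columns) for name, func in feature_funcs.items()]
)

# Mode c: Cartesian product of features and columns; one tuple per (feature, column) pair.
mode_c = ColumnTransformer(
    [
        (f"{name}_col{col}", FunctionTransformer(func), [col])
        for (name, func), col in product(feature_funcs.items(), columns)
    ]
)

X_a = mode_a.fit_transform(X)
X_c = mode_c.fit_transform(X)
modes_agree = np.allclose(X_a, X_c)
print("mode a:", X_a.shape, "mode c:", X_c.shape, "identical:", modes_agree)

# Only mode c can keep an arbitrary subset of (feature, column) pairs.
chosen_pairs = [("log1p", 0), ("square", 3)]
mode_c_subset = ColumnTransformer(
    [
        (f"{name}_col{col}", FunctionTransformer(feature_funcs[name]), [col])
        for name, col in chosen_pairs
    ]
)
X_subset = mode_c_subset.fit_transform(X)
print("subset of pairs:", X_subset.shape)
```

    With element-wise functions both modes produce the same matrix, so the practical difference is in how finely the pairs can be chosen and how much work is repeated, not in the result.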

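    FeatureUnion mainly solves the problem of merging several kinds of features, ColumnTransformer mainly solves the problem of specifying which columns to use, and Pipeline mainly solves the problem of chaining steps in the vertical (sequential) direction.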
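
    A minimal sketch of the three working together, assuming the same iris data. The column split (sepal columns passed through, petal columns sent to a FeatureUnion) and the step names are arbitrary choices for illustration, not something the notes prescribe.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# FeatureUnion: merge two kinds of features computed from the same input.
petal_features = FeatureUnion(
    [("pca", PCA(n_components=1)), ("univ_select", SelectKBest(k=1))]
)

# ColumnTransformer: decide which columns go to which transformer.
preprocess = ColumnTransformer(
    [
        ("sepal_raw", "passthrough", [0, 1]),    # keep sepal columns unchanged
        ("petal_union", petal_features, [2, 3]), # union applied to petal columns only
    ]
)

# Pipeline: chain preprocessing and the classifier sequentially.
model = Pipeline([("preprocess", preprocess), ("svm", SVC(kernel="linear"))])
model.fit(X, y)

X_pre = model.named_steps["preprocess"].transform(X)
train_accuracy = model.score(X, y)
print("Preprocessed shape:", X_pre.shape)
print("Training accuracy:", train_accuracy)
```

    The whole model can be tuned as one estimator, for example with GridSearchCV and parameter names such as preprocess__petal_union__pca__n_components.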

    Combining the three is very useful, but it seems that ColumnTransformer can also do what FeatureUnion does?
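
    One way to check this question numerically is to give every ColumnTransformer entry all four columns and compare its output with the FeatureUnion output. The sketch below rebuilds both objects so that it runs on its own; the variable names are chosen for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)
all_cols = [0, 1, 2, 3]

union = FeatureUnion(
    [("pca", PCA(n_components=2)), ("univ_select", SelectKBest(k=1))]
)
# Same transformers, but each one is explicitly given every column.
ct = ColumnTransformer(
    [
        ("pca", PCA(n_components=2), all_cols),
        ("univ_select", SelectKBest(k=1), all_cols),
    ]
)

X_union = union.fit_transform(X, y)
X_ct = ct.fit_transform(X, y)

outputs_match = X_union.shape == X_ct.shape and np.allclose(X_union, X_ct)
print("FeatureUnion shape:", X_union.shape)
print("ColumnTransformer shape:", X_ct.shape)
print("Outputs match:", outputs_match)
```

    On this data the two agree, which suggests that a ColumnTransformer whose entries all receive the full column list behaves like a FeatureUnion. The reverse does not hold directly: FeatureUnion always passes every column to each transformer, so restricting columns needs an extra selection step.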

  • Original article: https://www.cnblogs.com/wdmx/p/9958641.html