  • ALINK (31): Feature Engineering (10): Feature Selection (2): Chi-Square Selector (ChiSqSelectorBatchOp)

    Java class name: com.alibaba.alink.operator.batch.feature.ChiSqSelectorBatchOp

    Python class name: ChiSqSelectorBatchOp

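    Description

    Performs feature selection on table data. For each column in selectedCols, a chi-square test of independence against labelCol is computed, and columns are then kept according to selectorType.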
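    The ChiSquare and DF values in the example output further down are consistent with Pearson's chi-square test of independence, computed on each feature's raw values treated as categories. As a minimal sketch under that assumption (Alink's internal implementation may differ, for example in how it treats continuous values), the statistic and degrees of freedom can be computed in plain Python. `chi_square_stat` is a hypothetical helper, not an Alink API.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def chi_square_stat(feature: Sequence[Hashable], label: Sequence[Hashable]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for feature vs. label.

    Hypothetical helper for illustration: every distinct value is one category.
    """
    n = len(feature)
    if n == 0 or n != len(label):
        raise ValueError("feature and label must be non-empty and of equal length")
    observed = Counter(zip(feature, label))  # contingency-table cell counts
    row_totals = Counter(feature)            # count per feature value
    col_totals = Counter(label)              # count per label value
    stat = 0.0
    for f_val, f_cnt in row_totals.items():
        for l_val, l_cnt in col_totals.items():
            expected = f_cnt * l_cnt / n     # expected count under independence
            diff = observed[(f_val, l_val)] - expected
            stat += diff * diff / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


label = [True, True, False, False]                  # f_boolean from the example
print(chi_square_stat([1, 1, 2, 0], label))         # f_long   -> (4.0, 2)
print(chi_square_stat([1, 2, 2, 0], label))         # f_int    -> (2.0, 2)
print(chi_square_stat(["a", "c", "a", "c"], label)) # f_string -> (0.0, 1)
```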

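    Parameters

    | Name | Display name | Description | Type | Required? | Default |
    |---|---|---|---|---|---|
    | labelCol | Label column name | Name of the label column in the input table | String | Yes | |
    | selectedCols | Selected column names | Names of the feature columns to test | String[] | Yes | |
    | selectorType | Selector type | Selection rule; one of "NumTopFeatures", "percentile", "fpr", "fdr", "fwe" | String | No | "NumTopFeatures" |
    | numTopFeatures | Number of top features | Number of features with the smallest p-values to keep (used when selectorType is "NumTopFeatures") | Integer | No | 50 |
    | percentile | Percentile | Fraction of features to keep, ranked by p-value (used when selectorType is "percentile") | Double | No | 0.1 |
    | fpr | False positive rate | p-value threshold (used when selectorType is "fpr") | Double | No | 0.05 |
    | fdr | False discovery rate | False discovery rate threshold (used when selectorType is "fdr") | Double | No | 0.05 |
    | fwe | Family-wise error rate | Family-wise error rate threshold (used when selectorType is "fwe") | Double | No | 0.05 |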
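    The table does not state exactly how each selectorType turns p-values into a selection. The sketch below assumes the rules of Apache Spark MLlib's ChiSqSelector, which exposes the same five selector types; that Alink follows them is an assumption, not something this page confirms. `select_features` is a hypothetical helper: it ranks features by ascending p-value (ties keep column order) and then applies a top-k cut, a fraction cut, a raw p-value threshold, the Benjamini-Hochberg procedure (fdr) or a Bonferroni correction (fwe).

```python
from typing import List, Sequence


def select_features(p_values: Sequence[float], selector_type: str = "NumTopFeatures",
                    num_top_features: int = 50, percentile: float = 0.1,
                    fpr: float = 0.05, fdr: float = 0.05, fwe: float = 0.05) -> List[int]:
    """Return sorted indices of the selected features (hypothetical helper).

    Rules mirror Spark's ChiSqSelector; assumed, not confirmed, to match Alink.
    """
    n = len(p_values)
    # Stable sort by p-value: tied features keep their original column order.
    order = sorted(range(n), key=lambda i: p_values[i])
    kind = selector_type.lower()
    if kind == "numtopfeatures":
        chosen = order[:num_top_features]
    elif kind == "percentile":
        chosen = order[:int(n * percentile)]
    elif kind == "fpr":
        chosen = [i for i in order if p_values[i] < fpr]
    elif kind == "fdr":
        # Benjamini-Hochberg: keep ranks up to the largest k with p_(k) <= fdr * k / n.
        passing = [rank for rank, i in enumerate(order, start=1) if p_values[i] <= fdr * rank / n]
        chosen = order[:max(passing)] if passing else []
    elif kind == "fwe":
        # Bonferroni: compare every p-value with fwe / n.
        chosen = [i for i in order if p_values[i] < fwe / n]
    else:
        raise ValueError(f"unknown selector type: {selector_type}")
    return sorted(chosen)


cols = ["f_string", "f_long", "f_int", "f_double"]
p = [1.0, 0.1353, 0.3679, 0.3679]  # PValue of each column, taken from the example output
print([cols[i] for i in select_features(p, "NumTopFeatures", num_top_features=2)])
# ['f_long', 'f_int']
```

    Under these assumed rules, the toy data shows why numTopFeatures is the natural choice here: with four features the default percentile of 0.1 keeps int(4 × 0.1) = 0 features, and no p-value falls below the default fpr of 0.05.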

    Code examples

    Python code

    from pyalink.alink import *
    import pandas as pd
    # Start a local Alink environment with parallelism 1
    useLocalEnv(1)
    # Toy data: four feature columns and a boolean label column
    df = pd.DataFrame([
        ["a", 1, 1, 2.0, True],
        ["c", 1, 2, -3.0, True],
        ["a", 2, 2, 2.0, False],
        ["c", 0, 0, 0.0, False]
    ])
    source = BatchOperator.fromDataframe(df, schemaStr='f_string string, f_long long, f_int int, f_double double, f_boolean boolean')
    # Keep the 2 features most strongly associated with the label
    # (parentheses are required to chain methods across lines in Python)
    selector = (
        ChiSqSelectorBatchOp()
        .setSelectedCols(["f_string", "f_long", "f_int", "f_double"])
        .setLabelCol("f_boolean")
        .setNumTopFeatures(2)
    )
    selector.linkFrom(source)
    # Collect the trained model and return its summary
    modelInfo = selector.collectModelInfo()
    print(modelInfo.getColNames())

    Java code

    import org.apache.flink.types.Row;
    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.feature.ChiSqSelectorBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import com.alibaba.alink.operator.common.feature.ChisqSelectorModelInfo;
    import org.junit.Test;
    import java.util.Arrays;
    import java.util.List;
    public class ChiSqSelectorBatchOpTest {
      @Test
      public void testChiSqSelectorBatchOp() throws Exception {
        List <Row> df = Arrays.asList(
          Row.of("a", 1L, 1, 2.0, true),
          Row.of("c", 1L, 2, -3.0, true),
          Row.of("a", 2L, 2, 2.0, false),
          Row.of("c", 0L, 0, 0.0, false)
        );
        BatchOperator <?> source = new MemSourceBatchOp(df,
          "f_string string, f_long long, f_int int, f_double double, f_boolean boolean");
        ChiSqSelectorBatchOp selector = new ChiSqSelectorBatchOp()
          .setSelectedCols("f_string", "f_long", "f_int", "f_double")
          .setLabelCol("f_boolean")
          .setNumTopFeatures(2);
        selector.linkFrom(source);
        ChisqSelectorModelInfo modelInfo = selector.collectModelInfo();
        System.out.println(modelInfo.toString());
      }
    }

    Output (from the Java example)

    ------------------------- ChisqSelectorModelInfo -------------------------
    Number of Selector Features: 2
    Number of Features: 4
    Type of Selector: NumTopFeatures
    Number of Top Features: 2
    Selector Indices: 
        | ColName|ChiSquare|PValue| DF|Selected|
        |--------|---------|------|---|--------|
        |  f_long|        4|0.1353|  2|    true|
        |   f_int|        2|0.3679|  2|    true|
        |f_double|        2|0.3679|  2|   false|
        |f_string|        0|     1|  1|   false|
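    Each PValue above is the upper-tail probability of the chi-square distribution with DF degrees of freedom, evaluated at ChiSquare. The stdlib-only sketch below checks this through the regularized upper incomplete gamma function Q(DF/2, ChiSquare/2), using a power series for small arguments and a continued fraction for large ones. `chi2_sf` and its helpers are hypothetical names; in practice `scipy.stats.chi2.sf` computes the same quantity.

```python
import math


def _lower_gamma_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series (x < a + 1)."""
    term = 1.0 / a
    total = term
    ap = a
    for _ in range(10_000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) via a continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(stat: float, dof: int) -> float:
    """P(X >= stat) for X ~ chi-square(dof); hypothetical helper, not an Alink API."""
    if dof <= 0:
        raise ValueError("dof must be positive")
    if stat <= 0.0:
        return 1.0
    a, x = dof / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_cf(a, x)


# (ColName, ChiSquare, DF) rows copied from the output table above
for name, stat, dof in [("f_long", 4, 2), ("f_int", 2, 2), ("f_double", 2, 2), ("f_string", 0, 1)]:
    print(f"{name}: {chi2_sf(stat, dof):.4f}")
# f_long: 0.1353
# f_int: 0.3679
# f_double: 0.3679
# f_string: 1.0000
```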
  • Original article: https://www.cnblogs.com/qiu-hua/p/14901569.html