zoukankan      html  css  js  c++  java
  • ALINK(四十一):模型评估(六)聚类评估 (EvalClusterBatchOp)

    Java 类名:com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp

    Python 类名:EvalClusterBatchOp

    功能介绍

    聚类评估是对聚类算法的预测结果进行效果评估,支持下列评估指标。

     

     

    参数说明

    名称

    中文名称

    描述

    类型

    是否必须?

    默认值

    predictionCol

    预测结果列名

    预测结果列名

    String

     

    labelCol

    标签列名

    输入表中的标签列名

    String

     

    null

    vectorCol

    向量列名

    输入表中的向量列名

    String

     

    null

    distanceType

    距离度量方式

    距离类型

    String

     

    "EUCLIDEAN"

    代码示例

    Python 代码

    from pyalink.alink import *
    import pandas as pd
    useLocalEnv(1)
    df = pd.DataFrame([
        [0, "0 0 0"],
        [0, "0.1,0.1,0.1"],
        [0, "0.2,0.2,0.2"],
        [1, "9 9 9"],
        [1, "9.1 9.1 9.1"],
        [1, "9.2 9.2 9.2"]
    ])
    inOp = BatchOperator.fromDataframe(df, schemaStr='id int, vec string')
    metrics = EvalClusterBatchOp().setVectorCol("vec").setPredictionCol("id").linkFrom(inOp).collectMetrics()
    print("Total Samples Number:", metrics.getCount())
    print("Cluster Number:", metrics.getK())
    print("Cluster Array:", metrics.getClusterArray())
    print("Cluster Count Array:", metrics.getCountArray())
    print("CP:", metrics.getCp())
    print("DB:", metrics.getDb())
    print("SP:", metrics.getSp())
    print("SSB:", metrics.getSsb())
    print("SSW:", metrics.getSsw())
    print("CH:", metrics.getVrc())

    Java 代码

    import org.apache.flink.types.Row;
    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import com.alibaba.alink.operator.common.evaluation.ClusterMetrics;
    import org.junit.Test;
    import java.util.Arrays;
    import java.util.List;
    public class EvalClusterBatchOpTest {
      @Test
      public void testEvalClusterBatchOp() throws Exception {
        List <Row> df = Arrays.asList(
          Row.of(0, "0 0 0"),
          Row.of(0, "0.1,0.1,0.1"),
          Row.of(0, "0.2,0.2,0.2"),
          Row.of(1, "9 9 9"),
          Row.of(1, "9.1 9.1 9.1"),
          Row.of(1, "9.2 9.2 9.2")
        );
        BatchOperator <?> inOp = new MemSourceBatchOp(df, "id int, vec string");
        ClusterMetrics metrics = new EvalClusterBatchOp().setVectorCol("vec").setPredictionCol("id").linkFrom(inOp)
          .collectMetrics();
        System.out.println("Total Samples Number:" + metrics.getCount());
        System.out.println("Cluster Number:" + metrics.getK());
        System.out.println("Cluster Array:" + Arrays.toString(metrics.getClusterArray()));
        System.out.println("Cluster Count Array:" + Arrays.toString(metrics.getCountArray()));
        System.out.println("CP:" + metrics.getCp());
        System.out.println("DB:" + metrics.getDb());
        System.out.println("SP:" + metrics.getSp());
        System.out.println("SSB:" + metrics.getSsb());
        System.out.println("SSW:" + metrics.getSsw());
        System.out.println("CH:" + metrics.getVrc());
      }
    }

    运行结果

    Total Samples Number: 6
    Cluster Number: 2
    Cluster Array: ['0', '1']
    Cluster Count Array: [3.0, 3.0]
    CP: 0.11547005383792497
    DB: 0.014814814814814791
    SP: 15.588457268119896
    SSB: 364.5
    SSW: 0.1199999999999996
    CH: 12150.000000000042
  • 相关阅读:
    发送邮件程序
    T-SQL存储过程、游标
    GPS经纬度换算成XY坐标
    开博了
    你应该知道的 50 个 Python 单行代码
    想提升java知识的同学请进
    adb工具包使用方法
    红米note3刷安卓原生
    hadoop 使用和javaAPI
    django学习——url的name
  • 原文地址:https://www.cnblogs.com/qiu-hua/p/14902458.html
Copyright © 2011-2022 走看看