  • Spark: Scoring Customer Value with AHP (Analytic Hierarchy Process)

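    I. What is AHP

    RFM segments customers into value groups, but it does not differentiate customers inside a group. AHP fills that gap: it scores every customer so that customers in the same group can be told apart by value.

    For background on AHP, see https://baike.baidu.com/item/%E5%B1%82%E6%AC%A1%E5%88%86%E6%9E%90%E6%B3%95/1672?fr=aladdin and https://tellyouwhat.cn/p/ahp-users-value-score/

    AHP (the analytic hierarchy process)
    Compute an AHP score for each user, then rank the customers within each RFM group by that score. The method has three steps:
      1. Build the hierarchy model
      2. Construct the pairwise comparison matrix
      3. Compute the weight vector and run a consistency check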
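
    Steps 2 and 3 can be sketched in plain Java as follows. This is a minimal illustration, not the author's implementation: the class name AhpWeights, the pairwise judgments in main, and the choice of the row geometric mean method to approximate the weight vector are all assumptions made for the example. The random index values are the commonly cited Saaty table, and a consistency ratio below 0.1 is the usual acceptance threshold.

```java
public class AhpWeights {
    // Commonly cited Saaty random index (RI) values for matrix orders 0..9.
    private static final double[] RANDOM_INDEX =
            {0.0, 0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45};

    /** Approximates the AHP weight vector with the row geometric mean method, normalized to sum to 1. */
    public static double[] weights(double[][] m) {
        int n = m.length;
        double[] w = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double product = 1.0;
            for (int j = 0; j < n; j++) {
                product *= m[i][j];
            }
            w[i] = Math.pow(product, 1.0 / n);
            sum += w[i];
        }
        for (int i = 0; i < n; i++) {
            w[i] /= sum;
        }
        return w;
    }

    /** Consistency ratio CR = CI / RI, where CI = (lambdaMax - n) / (n - 1). */
    public static double consistencyRatio(double[][] m, double[] w) {
        int n = m.length;
        if (n < 3) {
            return 0.0; // matrices of order 1 or 2 are always consistent
        }
        // Estimate lambdaMax as the average of (M * w)_i / w_i.
        double lambdaMax = 0.0;
        for (int i = 0; i < n; i++) {
            double rowDot = 0.0;
            for (int j = 0; j < n; j++) {
                rowDot += m[i][j] * w[j];
            }
            lambdaMax += rowDot / w[i];
        }
        lambdaMax /= n;
        double ci = (lambdaMax - n) / (n - 1);
        return ci / RANDOM_INDEX[n];
    }

    public static void main(String[] args) {
        // Hypothetical judgments in R, F, M order: M is rated above F, and F above R.
        double[][] pairwise = {
                {1.0, 1.0 / 2, 1.0 / 3},
                {2.0, 1.0, 1.0 / 2},
                {3.0, 2.0, 1.0}
        };
        double[] w = weights(pairwise);
        System.out.println("weights (R, F, M): " + java.util.Arrays.toString(w));
        System.out.println("consistency ratio: " + consistencyRatio(pairwise, w));
    }
}
```

    If the consistency ratio comes out at 0.1 or above, the usual remedy is to revise the pairwise judgments rather than to use the weights as they are.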

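    Goal:
      Rank customers that fall in the same RFM value group
      Use the R, F, and M indicators from the RFM model
      Compute an AHP score for each user and rank customers of the same value group by that score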
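
    The score itself is a dot product between a customer's min-max scaled R, F, M values and the AHP weight vector. The sketch below shows that arithmetic without Spark. The class name AhpScoreSketch, the weights, and the M bounds are hypothetical; the R and F bounds are inferred from the sample rows shown later in this post (for example 46 / 373 ≈ 0.1233). One caveat, assuming R counts days since the last purchase as those rows suggest: a larger scaled R then means a less recent customer, so a positive weight on R rewards inactivity. Inverting the scaled recency, as in the last lines of main, is one possible adjustment and is not part of the original code.

```java
public class AhpScoreSketch {
    /** Min-max scales a value into [0, 1]; returns 0 when min and max coincide. */
    static double minMax(double value, double min, double max) {
        return max > min ? (value - min) / (max - min) : 0.0;
    }

    /** Dot product of the scaled RFM vector and the AHP weight vector. */
    static double score(double[] scaled, double[] weights) {
        if (scaled.length != weights.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * scaled[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical AHP weights in R, F, M order (not the author's values).
        double[] weights = {0.2, 0.3, 0.5};
        // Raw R, F, M of customer 12940 from the sample output in this post.
        double[] rfm = {46.0, 4.0, 876.29};
        // R and F bounds inferred from the sample rows; M bounds are made up.
        double[] min = {0.0, 1.0, 0.0};
        double[] max = {373.0, 248.0, 300000.0};

        double[] scaled = new double[rfm.length];
        for (int i = 0; i < rfm.length; i++) {
            scaled[i] = minMax(rfm[i], min[i], max[i]);
        }
        System.out.println("score with recency as-is: " + score(scaled, weights));

        // Optional adjustment: make a recent purchase count in the customer's favour.
        scaled[0] = 1.0 - scaled[0];
        System.out.println("score with inverted recency: " + score(scaled, weights));
    }
}
```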

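    II. Data

    The data comes from the earlier post "spark之RFM客户价值分群挖掘" (RFM customer value segmentation with Spark): https://www.cnblogs.com/little-horse/p/14014812.html

    III. Code (Spark 3.0, Java 1.8)

    The full code is in the AHP customer value scoring example at https://github.com/jiangnanboy/spark_tutorial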

    import java.util.List;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.ml.linalg.SQLDataTypes;
    import org.apache.spark.ml.linalg.Vector;
    import org.apache.spark.ml.linalg.Vectors;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.catalyst.encoders.RowEncoder;
    import org.apache.spark.sql.expressions.Window;
    import org.apache.spark.sql.functions;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.Metadata;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;
    import static org.apache.spark.sql.functions.col;

    /**
         * RFM clustering splits users into groups such as high-value, ordinary, and low-value users.
         * To order users inside one RFM group, the AHP weight vector is used to compute a final score per user:
         * the AHP score is the dot product of the user's scaled RFM vector and the weight vector.
         * @param dataset the data after RFM clustering
         * @param weightVector the AHP weight vector
         */
        public static void ahpScore(Dataset<Row> dataset, List<Double> weightVector) {
    
            /**
             * Compute the AHP score of each user:
             * +----------+------------------+--------------------+----------+--------------------+
             * |customerid|          features|      scaledfeatures|prediction|            ahpscore|
             * +----------+------------------+--------------------+----------+--------------------+
             * |     12940| [46.0,4.0,876.29]|[0.12332439678284...|         1|0.024241021827781713|
             * |     13285|[23.0,4.0,2709.12]|[0.06166219839142...|         1|0.023847531248595018|
             * |     13623| [30.0,7.0,672.44]|[0.08042895442359...|         1|0.024049650279212683|
             * |     13832|  [17.0,2.0,40.95]|[0.04557640750670...|         1|0.014321280782467466|
             * |     14450|[180.0,3.0,483.25]|[0.48257372654155...|         0| 0.04870738944845504|
             * +----------+------------------+--------------------+----------+--------------------+
             */
            dataset = dataset.map((MapFunction<Row, Row>) row -> {
                int customerID = row.getInt(0);
                Vector featureVec = (Vector) row.get(1);
                Vector scaledFeatureVec = (Vector) row.get(2);
                int prediction = row.getInt(3);
                // Dot product of the weight vector and the scaled RFM vector.
                double score = 0.0;
                for(int i = 0; i < weightVector.size(); i++) {
                    score += weightVector.get(i) * scaledFeatureVec.apply(i);
                }
                return RowFactory.create(customerID, Vectors.dense(new double[]{featureVec.apply(0), featureVec.apply(1), featureVec.apply(2)}), Vectors.dense(new double[]{scaledFeatureVec.apply(0), scaledFeatureVec.apply(1), scaledFeatureVec.apply(2)}), prediction, score);
            }, RowEncoder.apply(new StructType(new StructField[]{
                    new StructField("customerid", DataTypes.IntegerType, false, Metadata.empty()),// user id
                    new StructField("features", SQLDataTypes.VectorType(),false, Metadata.empty()),// RFM feature vector
                    new StructField("scaledfeatures", SQLDataTypes.VectorType(), false, Metadata.empty()),// RFM feature vector after min-max scaling
                    new StructField("prediction", DataTypes.IntegerType, false, Metadata.empty()),// predicted value group of the user
                    new StructField("ahpscore", DataTypes.DoubleType, false, Metadata.empty())// value score of the user
            })));
    
            /**
             * Rank users of the same value group by ahpscore
             * +----------+--------------------+--------------------+----------+------------------+----+
             * |customerid|            features|      scaledfeatures|prediction|          ahpscore|rank|
             * +----------+--------------------+--------------------+----------+------------------+----+
             * |     14646|[1.0,77.0,279489.02]|[0.00268096514745...|         1|0.7306140418787522|   1|
             * |     18102|[0.0,62.0,256438.49]|[0.0,0.2469635627...|         1|0.6609787921304062|   2|
             * |     14911|[1.0,248.0,132572...|[0.00268096514745...|         1|0.5933314030496094|   3|
             * |     17450|[8.0,55.0,187482.17]|[0.02144772117962...|         1|0.4982050472344627|   4|
             * |     14156|[9.0,66.0,113384.14]|[0.02412868632707...|         1|0.3430011157923704|   5|
             * +----------+--------------------+--------------------+----------+------------------+----+
             */
            dataset = dataset.withColumn("rank", functions.rank().over(Window.partitionBy("prediction").orderBy(col("ahpscore").desc())));
            dataset.show(5);
        }
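
    The final step partitions by prediction and ranks by ahpscore in descending order. Spark's rank() gives tied scores the same rank and leaves a gap afterwards (1, 1, 3), unlike dense_rank(). The plain Java sketch below mirrors that behaviour on arrays so it can be checked without a Spark session; the class GroupRankSketch and the sample values are hypothetical and not taken from the author's repository.

```java
import java.util.Arrays;

public class GroupRankSketch {
    /**
     * Returns ranks aligned with the input order. Ranks restart at 1 in each group,
     * higher scores rank first, ties share a rank, and the rank after a tie skips ahead.
     */
    static int[] rankWithinGroup(int[] group, double[] score) {
        int n = group.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort indices by group, then by score descending.
        Arrays.sort(order, (a, b) -> {
            if (group[a] != group[b]) {
                return Integer.compare(group[a], group[b]);
            }
            return Double.compare(score[b], score[a]);
        });

        int[] rank = new int[n];
        int positionInGroup = 0;
        for (int k = 0; k < n; k++) {
            int current = order[k];
            boolean newGroup = k == 0 || group[order[k - 1]] != group[current];
            positionInGroup = newGroup ? 1 : positionInGroup + 1;
            boolean tie = !newGroup && score[order[k - 1]] == score[current];
            rank[current] = tie ? rank[order[k - 1]] : positionInGroup;
        }
        return rank;
    }

    public static void main(String[] args) {
        int[] prediction = {1, 1, 1, 0};           // hypothetical cluster labels
        double[] ahpScore = {0.7, 0.5, 0.7, 0.1};  // hypothetical scores
        System.out.println(Arrays.toString(rankWithinGroup(prediction, ahpScore)));
    }
}
```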
  • Original post: https://www.cnblogs.com/little-horse/p/14014851.html