  • Spark Interaction (Feature Interaction: Cartesian Product Transform)

    1. Concept

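         Interaction is a Transformer. It takes vector or double-valued columns and generates a single vector column that contains the product of every combination of one value from each input column. For example, if the input consists of two vector-type columns with 3 dimensions each, the output column is a 9-dimensional vector.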
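
    The arithmetic described above can be sketched in plain Scala without Spark. This is only an illustration of the "product of all combinations" rule, not Spark's implementation; the object name `InteractionSketch` and the helper `interact` are hypothetical, and a double-valued input is modeled here as a sequence of length one.

```scala
object InteractionSketch {
  // Multiply one value from each input, over all combinations.
  // The first input varies slowest and the last input varies fastest.
  def interact(inputs: Seq[Seq[Double]]): Seq[Double] =
    inputs.foldLeft(Seq(1.0)) { (acc, v) =>
      for (a <- acc; b <- v) yield a * b
    }

  def main(args: Array[String]): Unit = {
    // Two 3-dimensional vectors give a 9-dimensional result.
    val twoVectors = interact(Seq(Seq(1.0, 2.0, 3.0), Seq(8.0, 4.0, 5.0)))
    println(twoVectors.mkString("[", ",", "]"))
    // prints [8.0,4.0,5.0,16.0,8.0,10.0,24.0,12.0,15.0]

    // A scalar input (length one) scales every product and leaves the size at 9.
    val withScalar = interact(Seq(Seq(2.0), Seq(4.0, 3.0, 8.0), Seq(7.0, 9.0, 8.0)))
    println(withScalar.mkString("[", ",", "]"))
    // prints [56.0,72.0,64.0,42.0,54.0,48.0,112.0,144.0,128.0]
  }
}
```

    The second call uses the values from the row with id1 = 2 in the example data of this post, so its result can be compared with the interactedCol value printed for that row.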

    2. Code

    package com.home.spark.ml
    
    import org.apache.spark.SparkConf
    import org.apache.spark.ml.feature.{Interaction, VectorAssembler}
    import org.apache.spark.sql.SparkSession
    
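    /**
      * @Description: Cartesian feature interaction
      * Implements the feature interaction transform. This transformer takes Double and Vector type columns and outputs a flattened vector of their feature interactions.
      * To handle interaction, any nominal features are first one-hot encoded. Then a vector of the feature cross-products is generated.
      * For example, given the input feature values Double(2) and Vector(3, 4), the output would be Vector(6, 8) if all input features were numeric.
      * If the first feature were instead nominal with four categories, the output would be Vector(0, 0, 0, 0, 3, 4, 0, 0).
      **/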
    object Ex_Interaction {
      def main(args: Array[String]): Unit = {
        val conf: SparkConf = new SparkConf(true).setMaster("local[2]").setAppName("spark ml")
        val spark = SparkSession.builder().config(conf).getOrCreate()
    
    
        val df = spark.createDataFrame(Seq(
          (1, 1, 2, 3, 8, 4, 5),
          (2, 4, 3, 8, 7, 9, 8),
          (3, 6, 1, 9, 2, 3, 6),
          (4, 10, 8, 6, 9, 4, 5),
          (5, 9, 2, 7, 10, 7, 3),
          (6, 1, 1, 4, 2, 8, 4)
        )).toDF("id1", "id2", "id3", "id4", "id5", "id6", "id7")
    
        // Assemble id2, id3, id4 into a 3-dimensional vector column.
        val assembler1 = new VectorAssembler().setInputCols(Array("id2","id3","id4")).setOutputCol("vector1")
        val assembled1 = assembler1.transform(df)
        assembled1.show(false)

        // Assemble id5, id6, id7 into a second 3-dimensional vector column.
        val assembler2 = new VectorAssembler().setInputCols(Array("id5","id6","id7")).setOutputCol("vector2")
        val assembled2 = assembler2.transform(assembled1)
        assembled2.show(false)
    
        // Interact the numeric column id1 (one dimension) with both vectors: 1 x 3 x 3 = 9 output values.
        val interaction = new Interaction().setInputCols(Array("id1","vector1","vector2")).setOutputCol("interactedCol")
        val result = interaction.transform(assembled2)
    
        result.show(false)
        spark.stop()
      }
    
    }

    Output of result.show(false):

    +---+---+---+---+---+---+---+--------------+--------------+------------------------------------------------------+
    |id1|id2|id3|id4|id5|id6|id7|vector1       |vector2       |interactedCol                                         |
    +---+---+---+---+---+---+---+--------------+--------------+------------------------------------------------------+
    |1  |1  |2  |3  |8  |4  |5  |[1.0,2.0,3.0] |[8.0,4.0,5.0] |[8.0,4.0,5.0,16.0,8.0,10.0,24.0,12.0,15.0]            |
    |2  |4  |3  |8  |7  |9  |8  |[4.0,3.0,8.0] |[7.0,9.0,8.0] |[56.0,72.0,64.0,42.0,54.0,48.0,112.0,144.0,128.0]     |
    |3  |6  |1  |9  |2  |3  |6  |[6.0,1.0,9.0] |[2.0,3.0,6.0] |[36.0,54.0,108.0,6.0,9.0,18.0,54.0,81.0,162.0]        |
    |4  |10 |8  |6  |9  |4  |5  |[10.0,8.0,6.0]|[9.0,4.0,5.0] |[360.0,160.0,200.0,288.0,128.0,160.0,216.0,96.0,120.0]|
    |5  |9  |2  |7  |10 |7  |3  |[9.0,2.0,7.0] |[10.0,7.0,3.0]|[450.0,315.0,135.0,100.0,70.0,30.0,350.0,245.0,105.0] |
    |6  |1  |1  |4  |2  |8  |4  |[1.0,1.0,4.0] |[2.0,8.0,4.0] |[12.0,48.0,24.0,12.0,48.0,24.0,48.0,192.0,96.0]       |
    +---+---+---+---+---+---+---+--------------+--------------+------------------------------------------------------+
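
    The comment at the top of the listing also mentions the nominal case, which the example data does not exercise. The plain-Scala sketch below reproduces only the arithmetic of that case: the nominal value is one-hot encoded and then crossed with the vector. The names `NominalInteractionSketch`, `oneHot`, and `interact` are hypothetical, and reading Double(2) as the zero-based category index 2 is an assumption; Spark itself takes the category information from column metadata, which is not shown here.

```scala
object NominalInteractionSketch {
  // One-hot encode a zero-based category index into numCategories slots.
  def oneHot(index: Int, numCategories: Int): Seq[Double] =
    Seq.tabulate(numCategories)(i => if (i == index) 1.0 else 0.0)

  // Product of all combinations of one value from each input,
  // with the first input varying slowest.
  def interact(inputs: Seq[Seq[Double]]): Seq[Double] =
    inputs.foldLeft(Seq(1.0)) { (acc, v) =>
      for (a <- acc; b <- v) yield a * b
    }

  def main(args: Array[String]): Unit = {
    // Numeric case: 2 * (3, 4)
    println(interact(Seq(Seq(2.0), Seq(3.0, 4.0))).mkString("[", ",", "]"))
    // prints [6.0,8.0]

    // Nominal case: category 2 of 4 becomes (0, 0, 1, 0), then is crossed with (3, 4).
    println(interact(Seq(oneHot(2, 4), Seq(3.0, 4.0))).mkString("[", ",", "]"))
    // prints [0.0,0.0,0.0,0.0,3.0,4.0,0.0,0.0]
  }
}
```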
  • Original article: https://www.cnblogs.com/asker009/p/12200941.html