  • Spark: union, intersection, subtract

    union, intersection, and subtract are all transformation operators.

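    1. union merges two datasets. Both datasets must have the same element type, and duplicates are kept. The new RDD's partition count is the sum of the partition counts of the merged RDDs (this holds when the inputs do not share a partitioner, as in the example here).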

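        // Assumes `spark` is an existing SparkSession, as in spark-shell.
        // kzc has 2 partitions and bd has 1.
        val kzc = spark.sparkContext.parallelize(List(("hive", 8), ("apache", 8), ("hive", 30), ("hadoop", 18)), 2)
        val bd = spark.sparkContext.parallelize(List(("hive", 18), ("test", 2), ("spark", 20)), 1)
        // bd's elements come first in the result, followed by kzc's.
        val result = bd.union(kzc)
        println(result.partitions.size)
        println("*******************")
        result.collect().foreach(println)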

    Result

    3
    *******************
    (hive,18)
    (test,2)
    (spark,20)
    (hive,8)
    (apache,8)
    (hive,30)
    (hadoop,18)
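    One point the output above does not show is that union keeps duplicates. The following is a minimal sketch of that behavior, not part of the original post: the names left, right, and merged are made up for illustration, and the local[2] session is an assumption (in spark-shell, getOrCreate() simply reuses the existing session).

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created only for this illustration.
val spark = SparkSession.builder().master("local[2]").appName("union-demo").getOrCreate()
val sc = spark.sparkContext

// ("hive", 8) is present in both inputs.
val left  = sc.parallelize(List(("hive", 8), ("spark", 20)), 2)
val right = sc.parallelize(List(("hive", 8), ("test", 2)), 1)

// union is a narrow transformation: no shuffle and no de-duplication.
val merged = left.union(right)
val mergedCount = merged.count()                // ("hive", 8) is counted twice
val mergedPartitions = merged.getNumPartitions  // partitions of left plus partitions of right

// distinct() shuffles the data and drops the repeated element.
val dedupedCount = merged.distinct().count()

println(s"union: $mergedCount elements in $mergedPartitions partitions; after distinct: $dedupedCount elements")
```

    If set semantics are needed, calling distinct() after union is one option, at the cost of a shuffle.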

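    2. intersection returns the elements present in both RDDs, without duplicates. By default (when spark.default.parallelism is not set), the new RDD has as many partitions as the parent RDD with the larger partition count.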

        // Assumes `spark` is an existing SparkSession, as in spark-shell.
        spark.sparkContext.setLogLevel("ERROR")
        val kzc = spark.sparkContext.parallelize(List(("hive", 8), ("apache", 8), ("hive", 30), ("hadoop", 18)), 2)
        val bd = spark.sparkContext.parallelize(List(("hive", 8), ("test", 2), ("spark", 20)), 1)
        // Only ("hive", 8) occurs in both RDDs; whole tuples are compared, not just keys.
        val result = bd.intersection(kzc)
        println(result.partitions.size)
        println("*******************")
        result.collect().foreach(println)

    Result

    2
    *******************
    (hive,8)
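    The sketch below, which is an illustration rather than code from the original post, shows two further properties of intersection: an element repeated in one input appears only once in the result, and the two-argument overload intersection(other, numPartitions) sets the output partition count explicitly. The names left, right, and common and the local[2] session are assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created only for this illustration.
val spark = SparkSession.builder().master("local[2]").appName("intersection-demo").getOrCreate()
val sc = spark.sparkContext

// ("hive", 8) appears twice on the left and once on the right.
val left  = sc.parallelize(List(("hive", 8), ("hive", 8), ("apache", 8)), 2)
val right = sc.parallelize(List(("hive", 8), ("test", 2)), 1)

// intersection shuffles both inputs and emits each common element once.
val common = left.intersection(right).collect().toList

// The overload with numPartitions fixes the number of output partitions.
val commonInFour = left.intersection(right, 4)
val fourPartitions = commonInFour.getNumPartitions

println(s"common = $common, partitions with explicit count = $fourPartitions")
```

    Because intersection involves a shuffle, choosing the partition count explicitly can be useful when the default would leave the result in too few or too many partitions.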

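    3. subtract returns the elements of the RDD it is called on that do not appear in the other RDD, in other words it removes their intersection. The new RDD has the same number of partitions as the parent RDD on which subtract is called.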

        // Assumes `spark` is an existing SparkSession, as in spark-shell.
        spark.sparkContext.setLogLevel("ERROR")
        val kzc = spark.sparkContext.parallelize(List(("hive", 8), ("apache", 8), ("hive", 30), ("hadoop", 18)), 2)
        val bd = spark.sparkContext.parallelize(List(("hive", 8), ("test", 2), ("spark", 20)), 1)
        // Removes ("hive", 8) from bd because it also occurs in kzc.
        val result = bd.subtract(kzc)
        println(result.partitions.size)
        println("*******************")
        result.collect().foreach(println)

    Result

    1
    *******************
    (test,2)
    (spark,20)
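    Two details of subtract are easy to miss: it compares whole elements rather than keys, and it is not symmetric. The following is a rough sketch under the assumption of a local[2] session; it contrasts subtract with subtractByKey, which the original post does not cover, and the names byElement, byKey, and reversed are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created only for this illustration.
val spark = SparkSession.builder().master("local[2]").appName("subtract-demo").getOrCreate()
val sc = spark.sparkContext

val kzc = sc.parallelize(List(("hive", 8), ("apache", 8), ("hive", 30), ("hadoop", 18)), 2)
val bd  = sc.parallelize(List(("hive", 18), ("test", 2), ("spark", 20)), 1)

// subtract compares whole elements: ("hive", 18) equals neither ("hive", 8)
// nor ("hive", 30), so it stays in the result.
val byElement = bd.subtract(kzc).collect().toSet

// subtractByKey compares keys only: every pair in bd whose key occurs in kzc is dropped.
val byKey = bd.subtractByKey(kzc).collect().toSet

// Swapping the operands keeps kzc's elements and kzc's partition count.
val reversed = kzc.subtract(bd)
val reversedPartitions = reversed.getNumPartitions
val reversedCount = reversed.count()

println(s"subtract = $byElement, subtractByKey = $byKey, reversed partitions = $reversedPartitions")
```

    For pair RDDs where only the key should decide what is removed, subtractByKey is usually the closer fit than subtract.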
  • Original article: https://www.cnblogs.com/students/p/14237108.html