  • An illustrated look at the reduceByKey operator on PairRDD

    reduceByKey

    Function signatures:

    def reduceByKey(func: (V, V) => V): RDD[(K, V)]

    def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]

    def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]

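    Purpose:

    For each key, reduceByKey merges all values that share that key using func, which combines two values of type V into a single value of the same type V. The result contains exactly one (K, V) pair per distinct key. Because values are merged in no fixed order, and partially within each partition before the shuffle, func should be associative and commutative.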
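
    The merging rule described above can be sketched with ordinary Scala collections, with no Spark involved. This is only an illustration of the semantics, not how Spark implements the operator; the object ReduceByKeySketch and the helper reduceByKeyLocal are hypothetical names introduced here.

    ```scala
    // Plain-Scala sketch of reduceByKey semantics (no Spark required).
    // reduceByKeyLocal is a hypothetical helper, not part of the Spark API.
    object ReduceByKeySketch {
      def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(func: (V, V) => V): Map[K, V] =
        pairs
          .groupBy(_._1) // gather all pairs that share a key
          .map { case (k, kvs) => k -> kvs.map(_._2).reduce(func) } // merge the values pairwise

      // Same data as the article's example RDD.
      val input: Seq[(String, Int)] =
        Seq(("A", 0), ("A", 2), ("B", 1), ("B", 2), ("C", 1))

      // Sum per key, analogous to reduceByKey((x, y) => x + y).
      val sums: Map[String, Int] = reduceByKeyLocal(input)(_ + _)

      // Any associative, commutative function works, for example max.
      val maxes: Map[String, Int] = reduceByKeyLocal(input)((x, y) => math.max(x, y))
    }
    ```

    Unlike this sketch, Spark does not gather all values for a key in one place first; it applies func within each partition and then again after the shuffle, which is why func has to give the same result regardless of grouping and order.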

    Example:

    scala> var rdd1 = sc.makeRDD(Array(("A",0),("A",2),("B",1),("B",2),("C",1)))
    rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:27

    scala> rdd1.partitions.size
    res0: Int = 48

    scala> var rdd2 = rdd1.reduceByKey((x,y) => x + y)
    rdd2: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[1] at reduceByKey at <console>:29

    scala> rdd2.collect
    res1: Array[(String, Int)] = Array((A,2), (B,3), (C,1))

    scala> rdd2.partitions.size
    res2: Int = 48

    scala> var rdd2 = rdd1.reduceByKey(new org.apache.spark.HashPartitioner(2),(x,y) => x + y)
    rdd2: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[2] at reduceByKey at <console>:29

    scala> rdd2.collect
    res3: Array[(String, Int)] = Array((B,3), (A,2), (C,1))

    scala> rdd2.partitions.size
    res4: Int = 2
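
    With HashPartitioner(2) the collected order became (B,3), (A,2), (C,1). A likely explanation is that HashPartitioner places each key in partition hashCode mod numPartitions (made non-negative), and collect returns partitions in index order. The sketch below reproduces that rule in plain Scala; HashPartitionSketch and partitionFor are hypothetical names, and the code only mirrors the partitioner's documented behaviour rather than calling Spark.

    ```scala
    // Plain-Scala sketch of how a hash partitioner assigns keys to partitions.
    // partitionFor is a hypothetical helper, not part of the Spark API.
    object HashPartitionSketch {
      def partitionFor(key: Any, numPartitions: Int): Int =
        if (key == null) 0 // null keys go to partition 0
        else {
          val raw = key.hashCode % numPartitions
          if (raw < 0) raw + numPartitions else raw // keep the index non-negative
        }

      // Keys from the article's example, assigned to 2 partitions.
      val keys: Seq[String] = Seq("A", "B", "C")
      val assignment: Seq[(String, Int)] = keys.map(k => k -> partitionFor(k, 2))
    }
    ```

    Here "A", "B" and "C" hash to 65, 66 and 67, so B lands in partition 0 while A and C land in partition 1, which is consistent with B appearing first in the collected array.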

  • Original article: https://www.cnblogs.com/seaspring/p/5722036.html