  • The PairRDD operator reduceByKey, explained

    reduceByKey

    Signatures:

    def reduceByKey(func: (V, V) => V): RDD[(K, V)]

    def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]

    def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]

    What it does:

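    Merges all values that share a key using func, which combines two values of type V into one value of the same type V. Spark applies func first within each partition (like a MapReduce combiner) and then again to the partial results after the shuffle, in no fixed order, so func should be associative and commutative. The first overload uses the default partitioner, the second hash-partitions the output into numPartitions partitions, and the third uses the Partitioner you pass in.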
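
    To make the two merge phases concrete, here is a minimal Spark-free sketch in plain Scala (paste it into the Scala REPL or a scala-cli script). ReduceByKeyModel, combineLocally, reduceByKey and merge are hypothetical names for illustration, not Spark API. Partitions are modelled as nested Seqs, and output placement reuses the hashCode-modulo rule of Spark's HashPartitioner; real Spark adds spilling, serialization and a network shuffle that this model ignores.

```scala
// Hypothetical, Spark-free model of reduceByKey's two phases.
// Nothing here is Spark API; partitions are just Seqs of key/value pairs.
object ReduceByKeyModel {

  // Merge one (k, v) into an accumulator map, applying func on key collisions.
  private def merge[K, V](acc: Map[K, V], k: K, v: V, func: (V, V) => V): Map[K, V] =
    acc.updated(k, acc.get(k).map(old => func(old, v)).getOrElse(v))

  // Phase 1 ("map-side combine"): reduce values with the same key inside one partition.
  def combineLocally[K, V](partition: Seq[(K, V)])(func: (V, V) => V): Map[K, V] =
    partition.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) => merge(acc, k, v, func) }

  // Same rule HashPartitioner uses: hashCode mod n, shifted to be non-negative.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    raw + (if (raw < 0) mod else 0)
  }

  // Phase 2 ("shuffle + reduce"): send each partial result to an output
  // partition chosen by key hash, then merge partial results sharing a key.
  def reduceByKey[K, V](partitions: Seq[Seq[(K, V)]], numPartitions: Int)(
      func: (V, V) => V): Vector[Map[K, V]] = {
    val partials = partitions.map(p => combineLocally(p)(func))
    var out = Vector.fill(numPartitions)(Map.empty[K, V])
    for (partial <- partials; (k, v) <- partial) {
      val i = nonNegativeMod(k.hashCode, numPartitions)
      out = out.updated(i, merge(out(i), k, v, func))
    }
    out
  }
}
```

    For example, splitting the article's data into two input partitions, ReduceByKeyModel.reduceByKey(Seq(Seq(("A", 0), ("A", 2), ("B", 1)), Seq(("B", 2), ("C", 1))), 2)(_ + _) returns one Map per output partition, equal to Vector(Map("B" -> 3), Map("A" -> 2, "C" -> 1)). Because each phase may merge values in any grouping, a non-associative func such as subtraction would give results that depend on how the input happened to be partitioned.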

    Example:

    scala> var rdd1 = sc.makeRDD(Array(("A",0),("A",2),("B",1),("B",2),("C",1)))
    rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:27

    scala> rdd1.partitions.size
    res0: Int = 48

    scala> var rdd2 = rdd1.reduceByKey((x,y) => x + y)
    rdd2: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[1] at reduceByKey at <console>:29

    scala> rdd2.collect
    res1: Array[(String, Int)] = Array((A,2), (B,3), (C,1))

    scala> rdd2.partitions.size
    res2: Int = 48

    scala> var rdd2 = rdd1.reduceByKey(new org.apache.spark.HashPartitioner(2),(x,y) => x + y)
    rdd2: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[2] at reduceByKey at <console>:29

    scala> rdd2.collect
    res3: Array[(String, Int)] = Array((B,3), (A,2), (C,1))

    scala> rdd2.partitions.size
    res4: Int = 2
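
    The order of the collected results changes with the partitioner because collect concatenates partitions in index order. The sketch below, in plain Scala so it runs without Spark, re-implements HashPartitioner's placement rule; HashPlacement, partitionOf and collectOrder are hypothetical names for illustration. It assumes the single-argument call ended up with a HashPartitioner over 48 partitions, which the res2 partition count suggests but the transcript does not show directly.

```scala
// Re-implements the placement rule of org.apache.spark.HashPartitioner so it
// runs without Spark: a null key goes to partition 0, any other key goes to
// key.hashCode modulo numPartitions, shifted to be non-negative.
object HashPlacement {
  def partitionOf(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case _ =>
      val raw = key.hashCode % numPartitions
      raw + (if (raw < 0) numPartitions else 0)
  }

  // collect() returns partition 0's records, then partition 1's, and so on.
  // sortBy is stable, so keys sharing a partition keep their input order here;
  // Spark itself makes no such promise within a partition.
  def collectOrder(keys: Seq[String], numPartitions: Int): Seq[String] =
    keys.sortBy(k => partitionOf(k, numPartitions))
}
```

    With two partitions, "B" (hashCode 66) lands in partition 0 while "A" (65) and "C" (67) land in partition 1, which is consistent with the (B,3), (A,2), (C,1) order in res3. With 48 partitions the keys fall into partitions 17, 18 and 19, consistent with the A, B, C order in res1.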

  • Original article: https://www.cnblogs.com/seaspring/p/5722036.html