  • Operators that cause a shuffle in Spark

    Purpose                  Operator            Replaceable? (by which operator)
    Deduplication            distinct()          No
    Aggregation              reduceByKey()       groupByKey
                             groupBy()
                             groupByKey()        reduceByKey
                             aggregateByKey()
                             combineByKey()
    Sorting                  sortByKey()
                             sortBy()
    Repartitioning           coalesce()
                             repartition()
    Set or table operations  intersection()
                             subtract()
                             subtractByKey()
                             join()
                             leftOuterJoin()

    https://www.cnblogs.com/Alex-zqzy/p/9949117.html

    Deduplication

    def distinct(): RDD[T]
    
    def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
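
    The two distinct signatures above can be tried with a small sketch like the one below. It assumes Spark is on the classpath and runs with a local master; the object name, helper names, and sample data are illustrative and not from the original post. Printing the lineage is one way to look for the shuffle: the output is expected to include a ShuffledRDD entry.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Illustrative sketch; names and data are assumptions, not part of the original post.
object DistinctShuffleDemo {
  // Deduplicate with the default number of output partitions.
  def dedupe(nums: RDD[Int]): RDD[Int] = nums.distinct()

  // Deduplicate and set the number of output partitions explicitly.
  def dedupeInto(nums: RDD[Int], parts: Int): RDD[Int] =
    nums.distinct(numPartitions = parts)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("distinct-shuffle-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val nums = sc.parallelize(Seq(1, 2, 2, 3, 3, 3), numSlices = 3)
      // Print the lineage so the shuffle boundary can be inspected.
      println(dedupe(nums).toDebugString)
      println(dedupe(nums).collect().sorted.mkString(","))  // 1,2,3
      println(dedupeInto(nums, 2).getNumPartitions)          // 2
    } finally sc.stop()
  }
}
```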

    Aggregation

    def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]
    
    def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]
    
    def groupBy[K](f: T => K, p: Partitioner)(implicit kt: ClassTag[K], ord: Ordering[K] = null): RDD[(K, Iterable[T])]
    
    def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]
    
    def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
    
    def aggregateByKey[U: ClassTag](zeroValue: U, numPartitions: Int)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
    
    def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C): RDD[(K, C)]
    
    def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, numPartitions: Int): RDD[(K, C)]
    
    def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, partitioner: Partitioner, mapSideCombine: Boolean = true, serializer: Serializer = null): RDD[(K, C)]
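
    As a rough illustration of how the aggregation operators relate, the sketch below computes a per-key sum with both reduceByKey and groupByKey, and a per-key average with aggregateByKey. The object name, helper names, and sample data are assumptions made for this example. Both sum variants shuffle, but reduceByKey combines values inside each partition first, which is why it is usually suggested in place of groupByKey for simple reductions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Illustrative sketch; names and data are assumptions, not part of the original post.
object AggregationShuffleDemo {
  // Per-key sum: values are merged within each partition before the shuffle.
  def sumWithReduceByKey(pairs: RDD[(String, Int)]): Map[String, Int] =
    pairs.reduceByKey(_ + _).collect().toMap

  // Same result: every (key, value) pair is shuffled, then summed per key.
  def sumWithGroupByKey(pairs: RDD[(String, Int)]): Map[String, Int] =
    pairs.groupByKey().mapValues(_.sum).collect().toMap

  // Per-key average; the accumulator is a (sum, count) pair.
  def averageByKey(pairs: RDD[(String, Int)]): Map[String, Double] =
    pairs
      .aggregateByKey((0, 0))(
        (acc, v) => (acc._1 + v, acc._2 + 1),  // fold one value into the accumulator
        (a, b) => (a._1 + b._1, a._2 + b._2)   // merge accumulators across partitions
      )
      .mapValues { case (sum, count) => sum.toDouble / count }
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("aggregation-shuffle-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(
        Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)), numSlices = 2)
      println(sumWithReduceByKey(pairs))  // a -> 9, b -> 6
      println(sumWithGroupByKey(pairs))   // a -> 9, b -> 6
      println(averageByKey(pairs))        // a -> 3.0, b -> 3.0
    } finally sc.stop()
  }
}
```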

    Sorting

    def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length): RDD[(K, V)]
    
    def sortBy[K](f: (T) => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]
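
    A minimal sketch of the two sorting operators follows, assuming a local Spark context; the object name, helper names, and sample data are made up for illustration. sortByKey orders a pair RDD by its keys, while sortBy derives the sort key from each element with a function.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Illustrative sketch; names and data are assumptions, not part of the original post.
object SortShuffleDemo {
  // Sort a pair RDD by key in descending order.
  def byKeyDescending(pairs: RDD[(Int, String)]): Array[(Int, String)] =
    pairs.sortByKey(ascending = false).collect()

  // Sort plain elements by a derived key, here the word length.
  def byLength(words: RDD[String]): Array[String] =
    words.sortBy(w => w.length).collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sort-shuffle-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq((2, "b"), (3, "c"), (1, "a")), numSlices = 2)
      // Word lengths are all different, so the order is unambiguous.
      val words = sc.parallelize(Seq("spark", "rdd", "shuffle", "by"), numSlices = 2)
      println(byKeyDescending(pairs).mkString(","))  // (3,c),(2,b),(1,a)
      println(byLength(words).mkString(","))         // by,rdd,spark,shuffle
    } finally sc.stop()
  }
}
```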

    Repartitioning

    def coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer] = Option.empty)(implicit ord: Ordering[T] = null): RDD[T]
    
    def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
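
    Note that the coalesce signature defaults to shuffle = false, so it only shuffles when asked to, whereas repartition always shuffles. The sketch below checks this by walking an RDD's dependencies and looking for a ShuffleDependency. The hasShuffle helper, the object name, and the sample data are hypothetical names introduced for this example.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Illustrative sketch; names and data are assumptions, not part of the original post.
object RepartitionShuffleDemo {
  // Hypothetical helper: true if any ancestor in the lineage is reached through a shuffle.
  def hasShuffle(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists { d =>
      d.isInstanceOf[ShuffleDependency[_, _, _]] || hasShuffle(d.rdd)
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("repartition-shuffle-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(1 to 100, numSlices = 4)
      val narrowed = data.coalesce(2)                     // shuffle = false by default
      val shuffled = data.coalesce(2, shuffle = true)     // explicit shuffle
      val repartitioned = data.repartition(8)             // always shuffles
      println(hasShuffle(narrowed))            // false
      println(hasShuffle(shuffled))            // true
      println(hasShuffle(repartitioned))       // true
      println(repartitioned.getNumPartitions)  // 8
    } finally sc.stop()
  }
}
```

    When the goal is only to reduce the partition count, coalesce without a shuffle avoids moving data across the network; repartition is the option when the count has to grow or the data needs rebalancing.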

    Set or table operations

    def intersection(other: RDD[T]): RDD[T]
    
    def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]
    
    def intersection(other: RDD[T], numPartitions: Int): RDD[T]
    
    def subtract(other: RDD[T], numPartitions: Int): RDD[T]
    
    def subtract(other: RDD[T], p: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]
    
    def subtractByKey[W: ClassTag](other: RDD[(K, W)]): RDD[(K, V)]
    
    def subtractByKey[W: ClassTag](other: RDD[(K, W)], numPartitions: Int): RDD[(K, V)]
    
    def subtractByKey[W: ClassTag](other: RDD[(K, W)], p: Partitioner): RDD[(K, V)]
    
    def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]
    
    def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
    
    def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]
    
    def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]
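
    The set and table operators listed above can be exercised with a small sketch like this one, assuming a local Spark context; the object name, helper names, and sample data are invented for illustration. Results are sorted after collect() because the output order of these operators is not something to rely on.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Illustrative sketch; names and data are assumptions, not part of the original post.
object SetOpsShuffleDemo {
  // Elements present in both RDDs.
  def common(a: RDD[Int], b: RDD[Int]): Array[Int] =
    a.intersection(b).collect().sorted

  // Elements of a that do not appear in b.
  def onlyInFirst(a: RDD[Int], b: RDD[Int]): Array[Int] =
    a.subtract(b).collect().sorted

  // Pairs of left whose key has no match in right.
  def unmatchedKeys(left: RDD[(String, Int)], right: RDD[(String, String)]): Array[(String, Int)] =
    left.subtractByKey(right).collect().sortBy(_._1)

  // Inner join: only keys present on both sides.
  def inner(left: RDD[(String, Int)], right: RDD[(String, String)]): Array[(String, (Int, String))] =
    left.join(right).collect().sortBy(_._1)

  // Left outer join: every left key, with None where right has no match.
  def leftOuter(left: RDD[(String, Int)],
                right: RDD[(String, String)]): Array[(String, (Int, Option[String]))] =
    left.leftOuterJoin(right).collect().sortBy(_._1)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("set-ops-shuffle-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val a = sc.parallelize(Seq(1, 2, 3, 4))
      val b = sc.parallelize(Seq(3, 4, 5))
      val left = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
      val right = sc.parallelize(Seq(("a", "x"), ("b", "y"), ("d", "z")))
      println(common(a, b).mkString(","))               // 3,4
      println(onlyInFirst(a, b).mkString(","))          // 1,2
      println(unmatchedKeys(left, right).mkString(",")) // (c,3)
      println(inner(left, right).mkString(","))         // (a,(1,x)),(b,(2,y))
      println(leftOuter(left, right).mkString(","))     // (a,(1,Some(x))),(b,(2,Some(y))),(c,(3,None))
    } finally sc.stop()
  }
}
```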

     

  • Original article: https://www.cnblogs.com/moonlightml/p/10006573.html