  • Spark coalesce vs. repartition: differences and use cases

    Differences:

    repartition calls coalesce under the hood with shuffle = true, so it always shuffles.

    def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
    coalesce(numPartitions, shuffle = true)
    }

    The shuffle parameter of coalesce defaults to false, so by default coalesce does not shuffle.

    def coalesce(numPartitions: Int, shuffle: Boolean = false)(implicit ord: Ordering[T] = null)
        : RDD[T] = withScope {
      if (shuffle) {
        /** Distributes elements evenly across output partitions, starting from a random partition. */
        val distributePartition = (index: Int, items: Iterator[T]) => {
          var position = (new Random(index)).nextInt(numPartitions)
          items.map { t =>
            // Note that the hash code of the key will just be the key itself. The HashPartitioner
            // will mod it with the number of total partitions.
            position = position + 1
            (position, t)
          }
        } : Iterator[(Int, T)]
     
        // include a shuffle step so that our upstream tasks are still distributed
        new CoalescedRDD(
          new ShuffledRDD[Int, T, T](mapPartitionsWithIndex(distributePartition),
          new HashPartitioner(numPartitions)),
          numPartitions).values
      } else {
        new CoalescedRDD(this, numPartitions)
      }
    }
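
The keying step in the shuffle branch above can be tried outside Spark. The following is a minimal sketch in plain Scala, not Spark code: `DistributeSketch`, `targetPartition`, and `countsPerTarget` are hypothetical names introduced for illustration, and `targetPartition` only stands in for what `HashPartitioner` does with an Int key (a non-negative modulo). It shows why consecutive keys spread the elements of one input partition nearly evenly over the output partitions.

```scala
import scala.util.Random

object DistributeSketch {
  // Mirrors the keying step quoted above: each element of an input partition
  // gets a consecutive Int key, starting from a random offset seeded by the
  // partition index.
  def distributePartition[T](index: Int, items: Iterator[T], numPartitions: Int): Iterator[(Int, T)] = {
    var position = new Random(index).nextInt(numPartitions)
    items.map { t =>
      position = position + 1
      (position, t)
    }
  }

  // Hypothetical stand-in for HashPartitioner: an Int key hashes to itself,
  // so the target partition is the non-negative remainder.
  def targetPartition(key: Int, numPartitions: Int): Int = {
    val m = key % numPartitions
    if (m < 0) m + numPartitions else m
  }

  // How many elements of one input partition land in each output partition.
  def countsPerTarget(index: Int, size: Int, numPartitions: Int): Map[Int, Int] =
    distributePartition(index, Iterator.range(0, size), numPartitions)
      .toList
      .groupBy { case (key, _) => targetPartition(key, numPartitions) }
      .map { case (p, elems) => p -> elems.size }

  def main(args: Array[String]): Unit = {
    // 12 elements from input partition 0, spread over 4 output partitions.
    println(countsPerTarget(index = 0, size = 12, numPartitions = 4))
  }
}
```

Because the keys are consecutive integers, the per-partition counts differ by at most one regardless of where the random offset starts; the offset only changes which output partition receives the first element.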

     

    Use cases:

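    When you reduce the number of partitions, consider coalesce, because it avoids a shuffle. Note, however, that each remaining task then processes the data of several original partitions, so if memory is insufficient this can cause an out-of-memory error.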
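
As a rough illustration of the two calls, here is a minimal sketch that assumes Spark is on the classpath and runs in local mode. The object name `PartitionCountSketch`, the helper `counts`, the app name, and the sample data are all made up for this example. It compares the resulting partition counts; without a shuffle, coalesce can lower the partition count but cannot raise it.

```scala
import org.apache.spark.sql.SparkSession

object PartitionCountSketch {
  // Returns the partition counts after coalesce(2), coalesce(16) and
  // repartition(16), starting from an RDD with 8 partitions.
  def counts(): (Int, Int, Int) = {
    val spark = SparkSession.builder()
      .master("local[4]")                  // assumption: local test run
      .appName("coalesce-vs-repartition")  // hypothetical app name
      .getOrCreate()
    try {
      val rdd = spark.sparkContext.parallelize(1 to 1000, numSlices = 8)

      val narrowed   = rdd.coalesce(2)      // fewer partitions, no shuffle
      val widened    = rdd.coalesce(16)     // no shuffle: count stays at 8
      val reshuffled = rdd.repartition(16)  // same as coalesce(16, shuffle = true)

      (narrowed.getNumPartitions, widened.getNumPartitions, reshuffled.getNumPartitions)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    println(counts()) // prints (2,8,16)
  }
}
```

If the merged partitions turn out to be too large for the available memory, passing shuffle = true to coalesce (or calling repartition) trades the cost of a shuffle for evenly sized partitions and upstream tasks that still run in parallel.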

  • Original article: https://www.cnblogs.com/Alcesttt/p/11386049.html