  • Spark Shared Variables: Broadcast Variables and Accumulators

    Shared Variables

    Spark provides two limited types of shared variables for two common usage patterns: broadcast variables and accumulators.

     Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. 

    Broadcast variables are created from a variable v by calling SparkContext.broadcast(v). The broadcast variable is a wrapper around v, and its value can be accessed by calling the value method.    

    val broadcastVar = sc.broadcast(Array(123))
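A minimal sketch of how a broadcast variable might be used inside a job, assuming a local Spark installation. The lookup table `countryNames`, the helper `resolve`, and the object name are hypothetical and only serve as illustration; `SparkContext.broadcast`, `Broadcast.value`, and `Broadcast.unpersist` are the real Spark calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastLookupSketch {
  // Hypothetical lookup table; names and values are illustrative only.
  val countryNames: Map[String, String] =
    Map("CN" -> "China", "US" -> "United States")

  // Resolves country codes to names using a broadcast copy of the table.
  def resolve(sc: SparkContext, codes: Seq[String]): Array[String] = {
    // Cache the table on each executor once instead of shipping it with every task.
    val lookup = sc.broadcast(countryNames)
    val resolved = sc.parallelize(codes)
      .map(code => lookup.value.getOrElse(code, "unknown")) // read-only access via .value
      .collect()
    lookup.unpersist() // release the executor-side copies once the job is done
    resolved
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("broadcast-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(resolve(sc, Seq("CN", "US", "FR")).mkString(", "))
    finally sc.stop()
  }
}
```

Tasks read the table through `lookup.value` and should treat it as read-only; changes made on an executor are not propagated back to the driver.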

    Accumulators are variables that are only “added” to through an associative and commutative operation and can therefore be efficiently supported in parallel. They can be used to implement counters (as in MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers can add support for new types.

    scala> val accnum=sc.longAccumulator("ggg")
    accnum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 264, name: Some(ggg), value: 0)

    scala> sc.parallelize(Array(1,2,3,4,5)).foreach(x=>accnum.add(x))

    scala> accnum
    res14: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 264, name: Some(ggg), value: 15)
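Support for a new accumulator type can be sketched by subclassing `AccumulatorV2` and registering the instance with `SparkContext.register`. The class `StringSetAccumulator`, the helper `distinctInvalid`, and the "non-numeric record" rule below are hypothetical and chosen only for illustration; set union is used because it is associative and commutative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.AccumulatorV2
import scala.collection.mutable

// Hypothetical accumulator that collects the distinct strings added to it.
class StringSetAccumulator extends AccumulatorV2[String, Set[String]] {
  private val items = mutable.Set.empty[String]

  override def isZero: Boolean = items.isEmpty
  override def copy(): StringSetAccumulator = {
    val acc = new StringSetAccumulator
    acc.items ++= items
    acc
  }
  override def reset(): Unit = items.clear()
  override def add(v: String): Unit = items += v
  // Merging is a set union, so the order in which partial results arrive does not matter.
  override def merge(other: AccumulatorV2[String, Set[String]]): Unit =
    items ++= other.value
  override def value: Set[String] = items.toSet
}

object CustomAccumulatorSketch {
  // Returns the distinct records that are not purely numeric (illustrative rule).
  def distinctInvalid(sc: SparkContext, records: Seq[String]): Set[String] = {
    val bad = new StringSetAccumulator
    sc.register(bad, "invalidRecords") // must be registered before use in tasks
    sc.parallelize(records).foreach { r =>
      if (!r.forall(_.isDigit)) bad.add(r)
    }
    bad.value // read on the driver after the action has finished
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("accumulator-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(distinctInvalid(sc, Seq("1", "x", "22", "x", "y")))
    finally sc.stop()
  }
}
```

The update happens inside `foreach`, an action. Spark applies each task's accumulator updates exactly once for actions, whereas updates made inside transformations may be applied more than once if a task is re-executed.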

    In short: accumulators are used to aggregate information, while broadcast variables are used to distribute large objects efficiently.

  • Original article: https://www.cnblogs.com/playforever/p/9408109.html