  • Spark Shared Variables: Broadcast Variables and Accumulators

    Shared Variables

    Spark provides two limited types of shared variables for two common usage patterns: broadcast variables and accumulators.

     Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. 

    Broadcast variables are created from a variable v by calling SparkContext.broadcast(v). The broadcast variable is a wrapper around v, and its value can be accessed by calling the value method.    

    val broadcastVar = sc.broadcast(Array(123))
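
As a minimal sketch of how a broadcast variable is typically read inside tasks, the snippet below looks up codes in a small table through the value method. The lookup table, the names `countryNames`, `namesBc`, and `resolved`, and the local master setting are all assumptions made for illustration; they are not from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the shell's context if one exists; otherwise start a local one (assumed setup).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("broadcast-sketch").setMaster("local[*]"))

// Hypothetical lookup table, small enough to fit in memory on every executor.
val countryNames = Map("cn" -> "China", "us" -> "United States", "de" -> "Germany")
val namesBc = sc.broadcast(countryNames)

// Tasks read the cached copy through .value rather than capturing the map in each closure.
val resolved = sc.parallelize(Seq("cn", "de", "fr"))
  .map(code => namesBc.value.getOrElse(code, "unknown"))
  .collect()

println(resolved.mkString(", "))

// Drop the cached copies on the executors once the variable is no longer needed.
namesBc.unpersist()
```

Referring to `namesBc.value` inside the closure, and not to `countryNames` directly, is what lets Spark send the table to each executor once instead of with every task.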

    Accumulators are variables that are only “added” to through an associative and commutative operation and can therefore be efficiently supported in parallel. They can be used to implement counters (as in MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers can add support for new types.

    scala> val accnum=sc.longAccumulator("ggg")
    accnum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 264, name: Some(ggg), value: 0)

    scala> sc.parallelize(Array(1,2,3,4,5)).foreach(x=>accnum.add(x))

    scala> accnum
    res14: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 264, name: Some(ggg), value: 15)
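
The text above notes that programmers can add support for new accumulator types. One way to do this, sketched here under assumptions, is to subclass `AccumulatorV2` and register the instance with the SparkContext. The class `DistinctStringAccumulator` and the sample data are hypothetical and only illustrate the shape of the API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.AccumulatorV2
import scala.collection.mutable

// Hypothetical accumulator that collects the distinct strings seen by tasks.
// Set union is associative and commutative, so partial results can merge in any order.
class DistinctStringAccumulator extends AccumulatorV2[String, Set[String]] {
  private val seen = mutable.Set.empty[String]

  override def isZero: Boolean = seen.isEmpty
  override def copy(): DistinctStringAccumulator = {
    val acc = new DistinctStringAccumulator
    acc.seen ++= seen
    acc
  }
  override def reset(): Unit = seen.clear()
  override def add(v: String): Unit = seen += v
  override def merge(other: AccumulatorV2[String, Set[String]]): Unit = seen ++= other.value
  override def value: Set[String] = seen.toSet
}

// Reuse the shell's context if one exists; otherwise start a local one (assumed setup).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("accumulator-sketch").setMaster("local[*]"))

val distinctWords = new DistinctStringAccumulator
sc.register(distinctWords, "distinctWords")

// Each task adds to its own copy; Spark merges the copies back on the driver.
sc.parallelize(Seq("a", "b", "a", "c")).foreach(w => distinctWords.add(w))

println(distinctWords.value)
```

As with the built-in `LongAccumulator`, the merged result is meant to be read on the driver through `value` after the action has finished.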

    In short: accumulators are used to aggregate information, while broadcast variables are used to distribute large objects efficiently.
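
The two kinds of shared variable are often used in the same job. The following sketch, with a made-up stop-word list and hypothetical names (`stopWordsBc`, `skipped`, `kept`), broadcasts a set to the executors and uses two long accumulators to count how many words fall inside and outside that set.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the shell's context if one exists; otherwise start a local one (assumed setup).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("shared-vars-sketch").setMaster("local[*]"))

// Hypothetical stop-word list, shipped once per executor as a broadcast variable.
val stopWordsBc = sc.broadcast(Set("the", "a", "of"))

// Counters aggregated back to the driver.
val skipped = sc.longAccumulator("skippedStopWords")
val kept = sc.longAccumulator("keptWords")

// foreach is an action, so each task's accumulator updates are applied once.
sc.parallelize(Seq("the", "art", "of", "a", "deal")).foreach { w =>
  if (stopWordsBc.value.contains(w)) skipped.add(1) else kept.add(1)
}

println(s"skipped=${skipped.sum}, kept=${kept.sum}")
```

The updates are placed inside an action rather than a transformation such as `map`, because updates made inside transformations may be applied more than once if a task or stage is re-executed.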

  • Original article: https://www.cnblogs.com/playforever/p/9408109.html