  • Spark Shared Variables: Broadcast Variables and Accumulators

    Shared Variables

    Spark provides two limited types of shared variables for two common usage patterns: broadcast variables and accumulators.

     Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks. 

    Broadcast variables are created from a variable v by calling SparkContext.broadcast(v). The broadcast variable is a wrapper around v, and its value can be accessed by calling the value method.    

    val broadcastVar = sc.broadcast(Array(123))
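
As a rough illustration of how a broadcast value is typically read inside a task, the sketch below broadcasts a small lookup table and resolves codes against it. The table contents and the names countryNames, namesBc and codes are made up for this example, and the local[*] master is only an assumption so the snippet can run on one machine; in spark-shell, getOrCreate() should simply return the existing session.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation; spark-shell already provides `sc`.
val spark = SparkSession.builder().master("local[*]").appName("broadcast-sketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical lookup table, shipped to each executor once instead of with every task.
val countryNames = Map("cn" -> "China", "us" -> "United States")
val namesBc = sc.broadcast(countryNames)

val codes = sc.parallelize(Seq("cn", "us", "cn", "jp"))

// Tasks read the cached copy through the value method.
val resolved = codes.map(code => namesBc.value.getOrElse(code, "unknown")).collect()

resolved.foreach(println)
```

The closure refers to namesBc rather than countryNames, so the map itself is not serialized with each task. Once the broadcast is no longer needed, namesBc.unpersist() or namesBc.destroy() can release the cached copies.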

    Accumulators are variables that are only “added” to through an associative and commutative operation and can therefore be efficiently supported in parallel. They can be used to implement counters (as in MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers can add support for new types.

    scala> val accnum=sc.longAccumulator("ggg")
    accnum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 264, name: Some(ggg), value: 0)

    scala> sc.parallelize(Array(1,2,3,4,5)).foreach(x=>accnum.add(x))

    scala> accnum
    res14: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 264, name: Some(ggg), value: 15)
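
The text above notes that programmers can add support for new accumulator types. One way to do that is to subclass org.apache.spark.util.AccumulatorV2 and register the instance with the SparkContext. The sketch below collects distinct strings; the class name StringSetAccumulator and the sample data are invented for illustration, and the local[*] master is an assumption for running it on a single machine.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.AccumulatorV2

// Hypothetical accumulator: input type String, result type Set[String].
// Set union is associative and commutative, as accumulators require.
class StringSetAccumulator extends AccumulatorV2[String, Set[String]] {
  private var items: Set[String] = Set.empty

  override def isZero: Boolean = items.isEmpty

  override def copy(): StringSetAccumulator = {
    val acc = new StringSetAccumulator
    acc.items = items
    acc
  }

  override def reset(): Unit = { items = Set.empty }

  override def add(v: String): Unit = { items += v }

  // Combines the partial result from another task into this one.
  override def merge(other: AccumulatorV2[String, Set[String]]): Unit = {
    items ++= other.value
  }

  override def value: Set[String] = items
}

// Assumption: a local session; in spark-shell the existing one is reused.
val spark = SparkSession.builder().master("local[*]").appName("accumulator-sketch").getOrCreate()
val sc = spark.sparkContext

val seen = new StringSetAccumulator
sc.register(seen, "seenWords")

sc.parallelize(Seq("a", "b", "a", "c")).foreach(w => seen.add(w))

println(seen.value)
```

As with the built-in numeric accumulators, tasks only add to the accumulator; the merged value is meant to be read on the driver.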

    Accumulators and broadcast variables serve different purposes: accumulators aggregate information, while broadcast variables distribute large objects efficiently.
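
To make that division of labor concrete, here is a small sketch that uses both in one job: a broadcast set of stop words is read by every task, and a long accumulator counts how many words were dropped. The stop-word list, the variable names, and the local[*] master are assumptions chosen for the example.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell the existing one is reused.
val spark = SparkSession.builder().master("local[*]").appName("shared-vars-sketch").getOrCreate()
val sc = spark.sparkContext

// Broadcast: read-only data distributed to the executors (hypothetical stop words).
val stopWords = sc.broadcast(Set("the", "a", "of"))

// Accumulator: tasks add to it, the driver reads the total.
val dropped = sc.longAccumulator("droppedWords")

val words = sc.parallelize(Seq("the", "state", "of", "a", "cluster"))

val kept = words.filter { w =>
  val keep = !stopWords.value.contains(w)
  if (!keep) dropped.add(1)
  keep
}.collect()

println(kept.mkString(", "))
println(dropped.sum)
```

Because the accumulator is updated inside a transformation (filter), its value is only populated after an action such as collect() runs, and the update may be applied more than once if a task or stage is re-executed. Counts that must be exact are safer when updated inside an action such as foreach.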

  • Original article: https://www.cnblogs.com/playforever/p/9408109.html