zoukankan      html  css  js  c++  java
  • Spark IMF传奇行动第18课:RDD持久化、广播、累加器总结

    昨晚听了王家林老师的Spark IMF传奇行动第18课:RDD持久化、广播、累加器,作业是unpersist试验,阅读累加器源码看内部工作机制:

    scala> val rdd = sc.parallelize(1 to 1000)
    rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:21
    
    scala> rdd.persist
    res0: rdd.type = ParallelCollectionRDD[0] at parallelize at <console>:21
    
    scala> rdd.count
    16/01/24 11:42:56 INFO DAGScheduler: Job 0 finished: count at <console>:24, took 1.451543 s
    res1: Long = 1000
    
    16/01/24 11:43:14 INFO DAGScheduler: Job 2 finished: count at <console>:24, took 0.094119 s
    res3: Long = 1000
    
    scala> rdd.unpersist()
    16/01/24 11:43:43 INFO ParallelCollectionRDD: Removing RDD 0 from persistence list
    16/01/24 11:43:43 INFO BlockManager: Removing RDD 0
    res5: rdd.type = ParallelCollectionRDD[0] at parallelize at <console>:21
    
    scala> rdd.count
    16/01/24 11:44:56 INFO DAGScheduler: Job 0 finished: count at <console>:24, took 1.475321 s
    res1: Long = 1000

    persisit后,count执行快了许多,但unpersist后,执行又变慢了。

     累加器Accumulator:全局唯一,对于Executor只能修改但不可读,只对Driver可读,只增不减

     val sum = sc.accumulator(0)
     val d1 = sc.parallelize(1 to 5)
     val result1 = d1.foreach(item => sum+= item)
     println(sum)

    结果是15.

    后续课程可以参照新浪微博 王家林_DT大数据梦工厂:http://weibo.com/ilovepains

    王家林  中国Spark第一人,微信公共号DT_Spark

    转发请写明出处。

  • 相关阅读:
    ios中的XMPP简介
    iOS项目开发中的目录结构
    ios中怎么样点击背景退出键盘
    ios中怎么处理键盘挡住输入框
    ios中怎么样调节占位文字与字体大小在同一高度
    ios中怎么样设置drawRect方法中绘图的位置
    ios中用drawRect方法绘图的时候设置颜色
    字符串常见操作
    字典、列表、元组
    字符串查看及应用
  • 原文地址:https://www.cnblogs.com/haitianS/p/5154888.html
Copyright © 2011-2022 走看看