  • The difference between cache and persist on an RDD

    The difference between cache and persist is easy to see in the RDD.scala source code:

    def persist(newLevel: StorageLevel): this.type = {
      if (storageLevel != StorageLevel.NONE && newLevel != storageLevel) {
        throw new UnsupportedOperationException(
          "Cannot change storage level of an RDD after it was already assigned a level")
      }
      sc.persistRDD(this)

      sc.cleaner.foreach(_.registerRDDForCleanup(this))
      storageLevel = newLevel
      this
    }

    /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
    def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)

    /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
    def cache(): this.type = persist()

    From this it follows that:

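    1) An RDD's cache() method simply calls persist() with no arguments, so both use the MEMORY_ONLY storage level;

    2) persist(newLevel) lets you set the StorageLevel by hand when a project needs a different storage level;

    3) Neither cache nor persist is an action: they only mark the RDD for storage, and the data is actually stored the first time an action computes it.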
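
    The three points above can be checked with a small local program. This is only a sketch under stated assumptions: it assumes Spark 2.x or later on the classpath (for `longAccumulator`), and the object name, the `local[2]` master, the app name, and the sample data are all made up for illustration. The accumulator is there only to count how often the `map` function runs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; not part of Spark or of the original article.
object CacheVsPersistDemo {
  def main(args: Array[String]): Unit = {
    // Local master and app name are illustrative assumptions.
    val conf = new SparkConf().setMaster("local[2]").setAppName("cache-vs-persist")
    val sc = new SparkContext(conf)
    try {
      // Counts how many times the map function below is actually invoked.
      val evaluations = sc.longAccumulator("evaluations")
      val doubled = sc.parallelize(1 to 10).map { x => evaluations.add(1L); x * 2 }

      // 1) cache() is persist() with the default level, MEMORY_ONLY.
      doubled.cache()
      assert(doubled.getStorageLevel == StorageLevel.MEMORY_ONLY)

      // 3) cache() is not an action: the map function has not run yet.
      assert(evaluations.sum == 0L)

      doubled.count() // first action computes the partitions and stores them
      doubled.count() // second action is expected to read them from the cache
      println(s"map invocations after two actions: ${evaluations.sum}")

      // 2) persist(level) picks a storage level explicitly.
      val spillable = sc.parallelize(1 to 10).persist(StorageLevel.MEMORY_AND_DISK)
      assert(spillable.getStorageLevel == StorageLevel.MEMORY_AND_DISK)

      // Assigning a different level to an already persisted RDD hits the
      // guard at the top of persist(newLevel) and throws.
      val rejected =
        try { doubled.persist(StorageLevel.DISK_ONLY); false }
        catch { case _: UnsupportedOperationException => true }
      assert(rejected)

      // unpersist() resets the level to NONE, so a new level can be assigned.
      doubled.unpersist()
      doubled.persist(StorageLevel.DISK_ONLY)
      assert(doubled.getStorageLevel == StorageLevel.DISK_ONLY)
    } finally {
      sc.stop()
    }
  }
}
```

    If the cache behaves as expected in local mode, the printed invocation count should equal the number of elements rather than twice that number, because the second count() reuses the stored partitions. Accumulator updates made inside transformations can be repeated when tasks are retried, so treat the count as an indication, not a guarantee.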

  • Original article: https://www.cnblogs.com/luogankun/p/3801062.html