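  • Spark ML: Fixing ALS Out-of-Memory Errors

    Original post: https://blog.csdn.net/Damonhaus/article/details/76572971

    Problem: collaborative filtering with the ALS algorithm. An out-of-memory error occurred during testing.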

    Fix 1: reduce the number of iterations, from 20 to 10

      val model = new ALS().setRank(10).setIterations(20).setLambda(0.01).setImplicitPrefs(false).run(alldata)

    Change the call above to .setIterations(10)
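    The snippet above leaves `alldata` undefined. A minimal sketch of a complete run might look like the following, assuming the `org.apache.spark.mllib` ALS API is on the classpath and Spark runs in local mode. The object name, the `train` helper, and the toy ratings are hypothetical and only stand in for the post's real data.

```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object AlsIterationsSketch {
  // Trains explicit-feedback ALS with the post's settings;
  // only the iteration count is left as a parameter.
  def train(ratings: RDD[Rating], iterations: Int): MatrixFactorizationModel =
    new ALS()
      .setRank(10)
      .setIterations(iterations)
      .setLambda(0.01)
      .setImplicitPrefs(false)
      .run(ratings)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("als-iterations-sketch")
      .getOrCreate()

    // Hypothetical toy ratings (user, product, rating) standing in for `alldata`.
    val alldata = spark.sparkContext.parallelize(Seq(
      Rating(1, 10, 5.0), Rating(1, 11, 1.0),
      Rating(2, 10, 4.0), Rating(2, 12, 2.0),
      Rating(3, 11, 3.0), Rating(3, 12, 5.0)))

    // Fix 1: 10 iterations instead of 20.
    val model = train(alldata, iterations = 10)
    println(s"rank = ${model.rank}")

    spark.stop()
  }
}
```

    Keeping the iteration count as a parameter makes it easy to try 20 and 10 against the same data and compare memory behaviour.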

    Fix 2: use the checkpoint mechanism

      import java.net.URI
      import org.apache.hadoop.fs.Path

      /**
       * Delete the intermediate data left behind by earlier checkpoints
       */
        val path = new Path(HDFSConnection.paramMap("hadoop_url")+"/checkpoint") // HDFS path to operate on (delete)
        val hadoopConf = spark.sparkContext.hadoopConfiguration
        val hdfs = org.apache.hadoop.fs.FileSystem.get(new URI(HDFSConnection.paramMap("hadoop_url")+"/checkpoint"),hadoopConf)
        if(hdfs.exists(path)) {
          // Pass true to delete recursively, false otherwise
          hdfs.delete(path, true) // This is only intermediate data, so a recursive delete is safe
        }
    
      /**
       * Set the checkpoint directory
       */
        spark.sparkContext.setCheckpointDir(HDFSConnection.paramMap("hadoop_url")+"/checkpoint")
      /**
       * Set period (in iterations) between checkpoints (default = 10). Checkpointing helps with
       * recovery (when nodes fail) and StackOverflow exceptions caused by long lineage. It also helps
       * with eliminating temporary shuffle files on disk, which can be important when there are many
       * ALS iterations. If the checkpoint directory is not set in [[org.apache.spark.SparkContext]],
       * this setting is ignored.
       */
    
    val model = new ALS().setCheckpointInterval(2).setRank(10).setIterations(20).setLambda(0.01).setImplicitPrefs(false)
          .run(alldata)
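
    To see what a checkpoint interval of 2 changes over 20 iterations compared with the default of 10, here is a small plain-Scala sketch. It is a simplified model and not Spark's internal code: it assumes iterations are numbered from 1 and that every `interval`-th iteration is checkpointed, and all names in it are hypothetical.

```scala
object CheckpointScheduleSketch {
  // Simplified model (an assumption, not Spark's implementation):
  // with 1-based iteration numbers, every `interval`-th iteration is checkpointed.
  def checkpointedIterations(iterations: Int, interval: Int): Seq[Int] =
    (1 to iterations).filter(_ % interval == 0)

  // Longest stretch of iterations between two checkpoints (or between a
  // checkpoint and the start/end of training), a rough proxy for how much
  // lineage can pile up before it is truncated.
  def longestUncheckpointedRun(iterations: Int, interval: Int): Int = {
    val marks = (0 +: checkpointedIterations(iterations, interval)) :+ iterations
    marks.zip(marks.tail).map { case (a, b) => b - a }.max
  }

  def main(args: Array[String]): Unit = {
    // Interval 2, as in the post.
    println(checkpointedIterations(20, 2))
    println(longestUncheckpointedRun(20, 2))
    // Interval 10, the documented default.
    println(checkpointedIterations(20, 10))
    println(longestUncheckpointedRun(20, 10))
  }
}
```

    Under this model a smaller interval keeps each stretch of lineage short, at the cost of writing checkpoint data more often. As the quoted Scaladoc notes, the interval has no effect unless the checkpoint directory is set.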
  • Source: https://www.cnblogs.com/sabertobih/p/13863214.html