  • Debugging Spark on Windows

    1. Background

    (1) The usual Spark development and run cycle is: write the Spark code locally in IDEA or Eclipse, package it, deploy it to the driver node, and run spark-submit. But when a runtime exception shows up (a null pointer, a failed database connection, and so on), the code has to be modified and optimized, then packaged and deployed all over again.... Could a single deployment be enough?

    (2) When a new Spark version is released and you want to try the new features right away, but there is no Spark cluster at hand, or the existing cluster runs an older version, how can you try them?

    2. Solutions

    (1) Skip the repeated package-and-test round trips: test and debug locally until the code passes, then package and deploy just once.

    Spark supports a standalone local mode: when initializing SparkConf, simply set the master to "local[*]" or "local[1]".
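
    A minimal sketch of that setup (the app name and variable names here are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // "local[*]" uses all available cores; "local[1]" runs single-threaded,
    // which keeps execution deterministic when stepping through in a debugger.
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("local-debug")
    val sc = new SparkContext(conf)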

    (2) Thanks to local mode, you can debug a new Spark version even without an existing Spark cluster.

    Just add the new version as a dependency in your sbt or Maven build file.
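
    In sbt, for example (2.4.1 matches the version printed in the run output below; substitute whichever release you want to try):

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.4.1",
      "org.apache.spark" %% "spark-sql"  % "2.4.1"  // only if Spark SQL is needed
    )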

    (3) Set Spark's log level

    Spark logs at INFO by default. If, say, I only want to see the handful of rows printed by a take, the result drowns in Spark's own log output and has to be fished out of a pile of log lines. So change Spark's default log level. The log4j.properties configuration:

    # Set everything to be logged to the console
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    
    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.spark_project.jetty=ERROR
    log4j.logger.org.spark_project=ERROR
    log4j.logger.org.apache.spark=ERROR
    log4j.logger.org.apache.parquet=ERROR
    log4j.logger.parquet=ERROR
    log4j.logger.io.netty=ERROR
    log4j.logger.org.apache.hadoop=FATAL
    
    
    # SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
    log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
    
    # Console output
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %5p %c{1}:%L - %m%n
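
    The level can also be changed at runtime on an existing SparkContext via Spark's setLogLevel, which overrides the configured root level for the current application:

    // Valid values: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN.
    sc.setLogLevel("ERROR")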

    (4) Test code

    import org.apache.spark.{SparkConf, SparkContext}
    
    object Test {
    
      def main(args: Array[String]): Unit = {
        // Single-threaded local mode; no cluster needed.
        val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("test"))
        println(sc.version)                               // which Spark version is on the classpath
        sc.parallelize(List(1, 2, 3, 4)).foreach(println) // trivial job to verify the runtime
        sc.stop()
      }
    
    }

      Run output:

    log4j: Trying to find [log4j.xml] using context classloader sun.misc.Launcher$AppClassLoader@18b4aac2.
    log4j: Trying to find [log4j.xml] using sun.misc.Launcher$AppClassLoader@18b4aac2 class loader.
    log4j: Trying to find [log4j.xml] using ClassLoader.getSystemResource().
    log4j: Trying to find [log4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@18b4aac2.
    log4j: Using URL [file:/E:/IntelliJWorkSpace/AIMind-backend/aimind_backend/pipeline-tools/target/classes/log4j.properties] for automatic log4j configuration.
    log4j: Reading configuration from URL file:/E:/IntelliJWorkSpace/AIMind-backend/aimind_backend/pipeline-tools/target/classes/log4j.properties
    log4j: Parsing for [root] with value=[INFO, console].
    log4j: Level token is [INFO].
    log4j: Category root set to INFO
    log4j: Parsing appender named "console".
    log4j: Parsing layout options for "console".
    log4j: Setting property [conversionPattern] to [%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n].
    log4j: End of parsing for "console".
    log4j: Setting property [target] to [System.err].
    log4j: Parsed "console" options.
    log4j: Parsing for [org.spark_project.jetty] with value=[ERROR].
    log4j: Level token is [ERROR].
    log4j: Category org.spark_project.jetty set to ERROR
    log4j: Handling log4j.additivity.org.spark_project.jetty=[null]
    log4j: Parsing for [org.spark_project] with value=[ERROR].
    log4j: Level token is [ERROR].
    log4j: Category org.spark_project set to ERROR
    log4j: Handling log4j.additivity.org.spark_project=[null]
    log4j: Parsing for [org.apache.spark] with value=[ERROR].
    log4j: Level token is [ERROR].
    log4j: Category org.apache.spark set to ERROR
    log4j: Handling log4j.additivity.org.apache.spark=[null]
    log4j: Parsing for [org.apache.hadoop.hive.metastore.RetryingHMSHandler] with value=[FATAL].
    log4j: Level token is [FATAL].
    log4j: Category org.apache.hadoop.hive.metastore.RetryingHMSHandler set to FATAL
    log4j: Handling log4j.additivity.org.apache.hadoop.hive.metastore.RetryingHMSHandler=[null]
    log4j: Parsing for [parquet] with value=[ERROR].
    log4j: Level token is [ERROR].
    log4j: Category parquet set to ERROR
    log4j: Handling log4j.additivity.parquet=[null]
    log4j: Parsing for [io.netty] with value=[ERROR].
    log4j: Level token is [ERROR].
    log4j: Category io.netty set to ERROR
    log4j: Handling log4j.additivity.io.netty=[null]
    log4j: Parsing for [org.apache.hadoop] with value=[FATAL].
    log4j: Level token is [FATAL].
    log4j: Category org.apache.hadoop set to FATAL
    log4j: Handling log4j.additivity.org.apache.hadoop=[null]
    log4j: Parsing for [org.apache.parquet] with value=[ERROR].
    log4j: Level token is [ERROR].
    log4j: Category org.apache.parquet set to ERROR
    log4j: Handling log4j.additivity.org.apache.parquet=[null]
    log4j: Finished configuring.
    2.4.1
    1
    2
    3
    4

    3. References

    (1) https://www.jianshu.com/p/c4b6ed734e72

    (2) https://blog.csdn.net/weixin_41122339/article/details/81141913

    Following the method in the two links above, set Spark up for debugging on Windows: download winutils.exe -> configure the environment variables, restart Windows, add the Spark dependencies....
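
    If restarting Windows after editing the environment variables is inconvenient, the winutils location can also be supplied programmatically before the SparkContext is created. The path below is an assumption; point it at the directory whose bin\ holds winutils.exe:

    // Assumed layout: C:\hadoop\bin\winutils.exe -- adjust to your machine.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")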

    4. Resolving exceptions

     (1) Even after configuring Spark's log output level as in the first link above, Spark's INFO and DEBUG messages kept appearing. Stepping through in the debugger turned up a "Class path contains multiple SLF4J bindings." warning; the fix is to locate the local package repository and delete the jars that do not correspond to the intended SLF4J binding.
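
    Alternatively, the duplicate binding can be excluded at build time instead of deleting jars by hand. A sketch in sbt, where "some.group" % "offending-artifact" is a placeholder for whichever dependency the SLF4J warning names:

    // Exclude the redundant SLF4J binding pulled in transitively.
    libraryDependencies += ("some.group" % "offending-artifact" % "1.0")
      .exclude("org.slf4j", "slf4j-log4j12")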

     
