  • Submitting work to Spark from an interactive PySpark session: fixing java.sql.SQLException: No suitable driver

    Running a local interactive Spark session from Jupyter and using Impala to fetch data for Spark failed:

    from pyspark.sql import SparkSession
    
    jdbc_url = "jdbc:impala://data1.hundun-new.sa:21050/rawdata;UseNativeQuery=1"
    
    # Build a local SparkSession for interactive use in the notebook
    spark = SparkSession.builder \
        .appName("sa-test") \
        .master("local") \
        .getOrCreate()
    
    # properties = {
    #     "driver": "com.cloudera.ImpalaJDBC41",
    #     "AuthMech": "1",
    # #     "KrbRealm": "EXAMPLE.COM",
    # #     "KrbHostFQDN": "impala.example.com",
    #     "KrbServiceName": "impala"
    # }
    
    
    # df = spark.read.jdbc(url=jdbc_url, table="(/*SA(default)*/ SELECT date, event, count(*) AS c FROM events WHERE date=CURRENT_DATE() GROUP BY 1,2) a")
    df = spark.read.jdbc(url=jdbc_url, table="(/*SA(production)*/ SELECT date, event, count(*) AS c FROM events WHERE date=CURRENT_DATE())")
    df.select(df['date'], df['event'], df['c'] * 10000).show()
    
    
    Py4JJavaError: An error occurred while calling o32.jdbc.
    : java.sql.SQLException: No suitable driver
        at java.sql.DriverManager.getDriver(DriverManager.java:315)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:104)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:35)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.s

    The error is clear from the trace: DriverManager.getDriver(url) throws No suitable driver because no registered JDBC driver accepts the jdbc:impala: URL. We need to put the Impala JDBC driver on Spark's classpath before this can run.

    First, download an Impala JDBC driver:

    http://repo.odysseusinc.com/artifactory/community-libs-release-local/com/cloudera/ImpalaJDBC41/2.6.3/ImpalaJDBC41-2.6.3.jar
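    For reference, fetching the jar from the notebook itself might look like the sketch below; the target path /usr/share/java/ is simply the location the config below points at (an assumption, not a requirement — any path readable by the driver JVM works).
    
    import urllib.request
    
    # Download the Impala JDBC jar to a local path for the Spark driver JVM
    # (the directory is an assumption; writing to it may require privileges)
    jar_url = ("http://repo.odysseusinc.com/artifactory/community-libs-release-local/"
               "com/cloudera/ImpalaJDBC41/2.6.3/ImpalaJDBC41-2.6.3.jar")
    urllib.request.urlretrieve(jar_url, "/usr/share/java/ImpalaJDBC41-2.6.3.jar")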

    Then, when creating the SparkSession, point Spark at the driver jar through the builder config:

    from pyspark.sql import SparkSession
    
    jdbc_url = "jdbc:impala://data1.hundun-new.sa:21050/rawdata;UseNativeQuery=1"
    
    # extraClassPath must be set before the driver JVM starts, so it goes
    # in the builder config rather than being set after getOrCreate()
    spark = SparkSession.builder \
        .appName("sa-test") \
        .master("local") \
        .config('spark.driver.extraClassPath', '/usr/share/java/ImpalaJDBC41-2.6.3.jar') \
        .getOrCreate()
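    With the jar on the classpath, the original read works unchanged, because the driver registers itself with DriverManager when the jar is loaded. If you would rather pin the driver class explicitly, it can go in the read properties; a minimal sketch, assuming com.cloudera.impala.jdbc41.Driver is the class name shipped in this jar (worth verifying against the jar's contents):
    
    # Optional: name the driver class explicitly instead of relying on
    # the jar's auto-registration; the class name here is an assumption.
    df = spark.read.jdbc(
        url=jdbc_url,
        table="(SELECT date, event, count(*) AS c FROM events WHERE date=CURRENT_DATE() GROUP BY 1, 2) a",
        properties={"driver": "com.cloudera.impala.jdbc41.Driver"},
    )
    df.show()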

    Here I also found another approach on Stack Overflow: setting PYSPARK_SUBMIT_ARGS from inside the notebook. Quoting the question's edit:

    The answers from How to load jar dependenices in IPython Notebook are already listed in the link I shared myself, and do not work for me. I already tried to configure the environment variable from the notebook:

    import os
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-class-path /path/to/postgresql.jar --jars /path/to/postgresql.jar'

    There's nothing wrong with the file path or the file itself since it works fine when I specify it and run the pyspark-shell.
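    The catch with this approach is that PYSPARK_SUBMIT_ARGS is only read when the JVM is first launched, so it must be set before any SparkSession exists in the process, and when launched from a notebook it needs to end with pyspark-shell. A sketch adapted to the Impala jar used above:
    
    import os
    
    # Must run before the first SparkSession is created in this kernel;
    # the trailing 'pyspark-shell' tells spark-submit what to launch.
    os.environ['PYSPARK_SUBMIT_ARGS'] = (
        '--driver-class-path /usr/share/java/ImpalaJDBC41-2.6.3.jar '
        '--jars /usr/share/java/ImpalaJDBC41-2.6.3.jar '
        'pyspark-shell'
    )
    
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("sa-test").master("local").getOrCreate()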

    References:
    
    Spark Configuration: https://spark.apache.org/docs/latest/configuration.html
    
    How to specify driver class path when using pyspark within a jupyter notebook? https://stackoverflow.com/questions/51772350/how-to-specify-driver-class-path-when-using-pyspark-within-a-jupyter-notebook

  • Original post: https://www.cnblogs.com/piperck/p/12056226.html