  • Spark SQL cannot find the MySQL driver when reading Hive data

    Exception:

    Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BoneCP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

    Solution:

    1. Add the hive.metastore.uris setting to $HIVE_HOME/conf/hive-site.xml, so that clients reach the metastore over Thrift instead of opening a direct JDBC connection to MySQL (which is what requires the driver on the classpath):
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://namenode1:9083</value>
      <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
    </property>

    2. Run $HIVE_HOME/bin/hive --service metastore to start the metastore service;
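
    To keep the metastore service alive after the terminal session closes, it can be run in the background; a minimal sketch (the log file name here is just an example):

    nohup $HIVE_HOME/bin/hive --service metastore > metastore.log 2>&1 &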

    3. Copy $HIVE_HOME/conf/hive-site.xml into the $SPARK_HOME/conf/ directory;

    4. Start spark-shell to verify: $SPARK_HOME/bin/spark-shell --master spark://namenode1:7077 (or use $SPARK_HOME/bin/spark-sql), then run show databases.
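
    For example, the check inside spark-shell might look like this (a minimal sketch against the Spark 1.x API used in this post; sc is the SparkContext that spark-shell provides):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    // If hive-site.xml was picked up from $SPARK_HOME/conf, this lists the Hive databases
    hiveContext.sql("show databases").collect().foreach(println)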


    Note:
    1. When a Spark SQL program is written in IntelliJ IDEA (val hiveContext = new HiveContext(sc); import hiveContext.sql; sql("show databases")), packaged into a .jar, and submitted to the Spark cluster with a spark-submit script (a sketch follows this note), Spark uses Derby for the metastore, i.e., for metadata storage, by default. Run the same job again from a different directory and the databases and tables created earlier can no longer be found; the metadata is effectively created and discarded per run. So, to reach the metadata managed by Hive, first copy hive-site.xml from the Hive configuration directory into the Spark configuration directory, so that programs running on the Spark cluster can locate the Hive metastore, and then start the service described above (hive --service metastore); the Hive data can then be accessed.
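
    The submission script itself is not shown above; a typical spark-submit invocation for Spark 1.x might look like the sketch below (the jar name and output path are hypothetical examples; the class name matches the test program at the end of this post):

    $SPARK_HOME/bin/spark-submit \
      --master spark://namenode1:7077 \
      --class com.husor.Hive.Recommendation \
      recommendation.jar \
      /beibei/output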

    2.
    /**
    * An instance of the Spark SQL execution engine that integrates with data stored in Hive.
    * Configuration for Hive is read from hive-site.xml on the classpath.
    */
    class HiveContext(sc: SparkContext) extends SQLContext(sc) {
    
     ....................................
    
    }
    
    

    3. 

    Use HiveContext instead. It will still create a local metastore if one is not specified. However, note that the default directory is ./metastore_db, not ./metastore.
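
    In other words, without a hive-site.xml on the classpath, HiveContext brings up its own Derby-backed metastore under the driver's working directory. A minimal sketch that shows this behavior (Spark 1.x API; the object and table names are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object LocalMetastoreDemo {
      def main(args: Array[String]) {
        val sc = new SparkContext(new SparkConf().setAppName("LocalMetastoreDemo"))
        val hiveContext = new HiveContext(sc)
        // With no hive-site.xml on the classpath, this call creates ./metastore_db
        // (a local Derby database) in the driver's current working directory
        hiveContext.sql("create table if not exists demo_t (id INT)")
        sc.stop()
      }
    }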

    The test program is as follows:

    package com.husor.Hive
    
    import org.apache.spark.{SparkContext, SparkConf}
    import org.apache.spark.sql.hive.HiveContext
    
    /* With the default local metastore, objects created via Spark SQL are transient: created on use and gone on the next run */
    
    /**
     * Created by kelvin on 2015/1/27.
     */
    object Recommendation {
      def main(args: Array[String]) {
    
        println("Test is starting......")
    
        if (args.length < 1) {
          System.err.println("Usage:HDFS_OutputDir <Directory>")
          System.exit(1)
        }
    
        //System.setProperty("hadoop.home.dir", "d:\\winutil\\")
    
        val conf = new SparkConf().setAppName("Recommendation")
        val spark = new SparkContext(conf)
    
        val hiveContext = new HiveContext(spark)
    
        import hiveContext.sql
    
        /*sql("create database if not exists baby")
        val databases = sql("show databases")
        databases.collect.foreach(println)*/
    
        sql("use baby")
        /*sql("CREATE EXTERNAL TABLE if not exists origin_orders (oid string, uid INT, gmt_create INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '	' LINES TERMINATED BY '
    ' LOCATION '/beibei/order'")
        sql("CREATE EXTERNAL TABLE if not exists items (iid INT, pid INT, title string, cid INT, brand INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '	' LINES TERMINATED BY '
    ' LOCATION '/beibei/item'")
        sql("CREATE EXTERNAL TABLE if not exists order_item (oid string, iid INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '	' LINES TERMINATED BY '
    ' LOCATION '/beibei/order_item'")
        sql("create table if not exists test_orders(oid string, uid INT, gmt_create INT)")
        sql("create table if not exists verify_orders(oid string, uid INT, gmt_create INT)")
        sql("insert OVERWRITE table test_orders select * from origin_orders where gmt_create <= 1415635200")
        sql("insert OVERWRITE table verify_orders select * from origin_orders where gmt_create > 1415635200")
    
        val tables = sql("show tables")
        tables.collect.foreach(println)*/
    
        sql("SET spark.sql.shuffle.partitions = 5")
    
        val olderTime = System.currentTimeMillis()
    
        val userOrderData = sql("select i.pid, o.uid, o.gmt_create from items i " +
                                             "join order_item oi " +
                                             "on i.iid = oi.iid     " +
                                             "join test_orders o " +
                                             "on oi.oid = o.oid")
    
        userOrderData.take(10).foreach(println)
    
        val newTime = System.currentTimeMillis()
    
        println("Consume Time: " + (newTime - olderTime))
    
        userOrderData.saveAsTextFile(args(0))
        spark.stop()
    
        println("Test is Succeed!!!")
    
      }
    
    }
     