  • Integrating Spark SQL with Hive

    Hive configuration

    Edit $HIVE_HOME/conf/hive-site.xml and add the following:

    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://master:9083</value>
      <description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
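
Before starting any services, it can help to confirm that the property was actually picked up from the file. The following is a minimal sketch, not part of the original setup: the function name metastore_uri is hypothetical, and it assumes the <name> line is followed by the <value> line as in the snippet above.

```shell
# Print the value configured for hive.metastore.uris in a hive-site.xml file.
# Assumption: <value> appears on a line after the matching <name> line.
metastore_uri() {
  awk '
    /<name>hive.metastore.uris<\/name>/ { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")      # drop everything up to the opening tag
      sub(/<\/value>.*/, "")    # drop the closing tag and anything after it
      print
      exit
    }
  ' "$1"
}
```

Usage: metastore_uri "$HIVE_HOME/conf/hive-site.xml" prints the configured URI, or nothing if the property is missing.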

    Starting the Hive metastore

    Start the metastore:
    $ hive --service metastore &
    Check that it is running:
    $ jobs
    [1]+  Running                 hive --service metastore &
    
    Stop the metastore:
    
    $ kill %1
    
    kill takes %jobid; here 1 is the job ID reported by jobs.
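
Job IDs such as %1 are only valid in the shell session that started the job. As a hedged alternative, the sketch below records the process ID in a file so the metastore can be stopped from another session. The function names and the /tmp paths are assumptions for illustration, not something the original setup prescribes.

```shell
# Hypothetical locations; adjust for your environment.
PID_FILE="${PID_FILE:-/tmp/hive-metastore.pid}"
LOG_FILE="${LOG_FILE:-/tmp/hive-metastore.log}"

start_metastore() {
  # nohup keeps the service running after the terminal is closed.
  nohup hive --service metastore > "$LOG_FILE" 2>&1 &
  # $! is the PID of the background process just started.
  echo $! > "$PID_FILE"
}

stop_metastore() {
  # Send SIGTERM to the recorded PID, then remove the PID file.
  kill "$(cat "$PID_FILE")" && rm -f "$PID_FILE"
}
```

Usage: call start_metastore once, then stop_metastore later from any shell that sees the same PID file.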

    Spark configuration

    Copy or symlink $HIVE_HOME/conf/hive-site.xml into $SPARK_HOME/conf/.
    Copy or symlink $HIVE_HOME/lib/mysql-connector-java-5.1.12.jar into $SPARK_HOME/lib/.
    Placing the jar in $SPARK_HOME/lib/ makes it easier to use with Spark standalone mode.
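
The two link steps can be sketched as a small helper. This is an illustration under assumptions: the function name link_hive_into_spark is hypothetical, the jar version is the one named in the text, and symlinks are used rather than copies so that later edits to hive-site.xml are seen by Spark as well.

```shell
# Symlink the Hive config and the MySQL JDBC driver into a Spark install.
# $1 = Hive home directory, $2 = Spark home directory.
link_hive_into_spark() {
  hive_home="$1"
  spark_home="$2"
  jar="mysql-connector-java-5.1.12.jar"   # version taken from the text

  # Make sure the target directories exist before linking.
  mkdir -p "$spark_home/conf" "$spark_home/lib"

  # -s creates a symbolic link, -f replaces an existing link or file.
  ln -sf "$hive_home/conf/hive-site.xml" "$spark_home/conf/hive-site.xml"
  ln -sf "$hive_home/lib/$jar" "$spark_home/lib/$jar"
}
```

Usage: link_hive_into_spark "$HIVE_HOME" "$SPARK_HOME"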

    Starting spark-sql

    1. Standalone mode
    ./bin/spark-sql --master spark://master:7077 --jars /home/stark_summer/spark/spark-1.4/spark-1.4.1/lib/mysql-connector-java-5.1.12.jar
    2. yarn-client mode
    ./bin/spark-sql --master yarn-client --jars /home/stark_summer/spark/spark-1.4/spark-1.4.1/lib/mysql-connector-java-5.1.12.jar
    
    Run a query:
    select count(*) from o2o_app;
    
    Result:
    302
    Time taken: 0.828 seconds, Fetched 1 row(s)
    2015-09-14 18:27:43,158 INFO  [main] CliDriver (SessionState.java:printInfo(536)) - Time taken: 0.828 seconds, Fetched 1 row(s)
    spark-sql> 2015-09-14 18:27:43,160 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) - Finished stage: org.apache.spark.scheduler.StageInfo@5939ed30
    2015-09-14 18:27:43,161 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) - task runtime:(count: 1, mean: 242.000000, stdev: 0.000000, max: 242.000000, min: 242.000000)
    2015-09-14 18:27:43,161 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    0%      5%      10%     25%     50%     75%     90%     95%     100%
    2015-09-14 18:27:43,161 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    242.0 ms        242.0 ms        242.0 ms        242.0 ms        242.0 ms        242.0 ms    242.0 ms 242.0 ms        242.0 ms
    2015-09-14 18:27:43,162 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) - fetch wait time:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
    2015-09-14 18:27:43,162 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    0%      5%      10%     25%     50%     75%     90%     95%     100%
    2015-09-14 18:27:43,162 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms
    2015-09-14 18:27:43,163 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) - remote bytes read:(count: 1, mean: 31.000000, stdev: 0.000000, max: 31.000000, min: 31.000000)
    2015-09-14 18:27:43,163 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    0%      5%      10%     25%     50%     75%     90%     95%     100%
    2015-09-14 18:27:43,163 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    31.0 B  31.0 B  31.0 B  31.0 B  31.0 B  31.0 B  31.0 B  31.0 B  31.0 B
    2015-09-14 18:27:43,163 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) - task result size:(count: 1, mean: 1228.000000, stdev: 0.000000, max: 1228.000000, min: 1228.000000)
    2015-09-14 18:27:43,163 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    0%      5%      10%     25%     50%     75%     90%     95%     100%
    2015-09-14 18:27:43,163 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    1228.0 B        1228.0 B        1228.0 B        1228.0 B        1228.0 B        1228.0 B    1228.0 B 1228.0 B        1228.0 B
    2015-09-14 18:27:43,164 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) - executor (non-fetch) time pct: (count: 1, mean: 69.834711, stdev: 0.000000, max: 69.834711, min: 69.834711)
    2015-09-14 18:27:43,164 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    0%      5%      10%     25%     50%     75%     90%     95%     100%
    2015-09-14 18:27:43,164 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    70 %    70 %    70 %    70 %    70 %    70 %    70 %    70 %    70 %
    2015-09-14 18:27:43,165 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) - fetch wait time pct: (count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
    2015-09-14 18:27:43,165 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    0%      5%      10%     25%     50%     75%     90%     95%     100%
    2015-09-14 18:27:43,165 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -     0 %     0 %     0 %     0 %     0 %     0 %     0 %     0 %     0 %
    2015-09-14 18:27:43,166 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) - other time pct: (count: 1, mean: 30.165289, stdev: 0.000000, max: 30.165289, min: 30.165289)
    2015-09-14 18:27:43,166 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    0%      5%      10%     25%     50%     75%     90%     95%     100%
    2015-09-14 18:27:43,166 INFO  [SparkListenerBus] scheduler.StatsReportListener (Logging.scala:logInfo(59)) -    30 %    30 %    30 %    30 %    30 %    30 %    30 %    30 %    30 %
    3. yarn-cluster mode
    ./bin/spark-sql --master yarn-cluster  --jars /home/dp/spark/spark-1.4/spark-1.4.1/lib/mysql-connector-java-5.1.12.jar
    Error: Cluster deploy mode is not applicable to Spark SQL shell.
    Run with --help for usage help or --verbose for debug output
    2015-09-14 18:28:28,291 INFO  [Thread-0] util.Utils (Logging.scala:logInfo(59)) - Shutdown hook called
    
    Cluster deploy mode is not supported by the spark-sql shell.
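
Because the shell only runs in client mode, a scripted query also has to go through a client-mode master. A minimal sketch, assuming the -e option of the spark-sql CLI (which runs one quoted query and exits); the wrapper name run_query and its argument order are hypothetical.

```shell
# Run a single query non-interactively through spark-sql.
# $1 = master URL or mode, $2 = path to the JDBC driver jar, $3 = SQL text.
run_query() {
  master="$1"
  jar="$2"
  query="$3"
  # Falls back to the current directory when SPARK_HOME is not set,
  # matching the ./bin/spark-sql invocations used in this post.
  "${SPARK_HOME:-.}/bin/spark-sql" --master "$master" --jars "$jar" -e "$query"
}
```

Usage: run_query yarn-client /path/to/mysql-connector-java-5.1.12.jar "select count(*) from o2o_app;"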

    Starting spark-shell

    1. Standalone mode
    ./bin/spark-shell --master spark://master:7077 --jars /home/stark_summer/spark/spark-1.4/spark-1.4.1/lib/mysql-connector-java-5.1.12.jar
    2. yarn-client mode
    ./bin/spark-shell --master yarn-client   --jars /home/dp/spark/spark-1.4/spark-1.4.1/lib/mysql-connector-java-5.1.12.jar
    
    sqlContext.sql("from o2o_app SELECT count(appkey,name1,name2)").collect().foreach(println)
    

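    Please respect the original work and do not repost: http://blog.csdn.net/stark_summer/article/details/48443147

    Copyright notice: this is an original article by the author and may not be reposted without the author's permission.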

  • Original article: https://www.cnblogs.com/stark-summer/p/4829740.html