  • Using Spark SQL: the Thrift JDBC Server

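    Thrift JDBC Server overview

    The Thrift JDBC Server is based on the HiveServer2 implementation from Hive 0.12. You can interact with it using the beeline script shipped with either Spark or Hive 0.12. By default, the Thrift JDBC Server listens on port 10000.

    Before using the Thrift JDBC Server, note the following:

    1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory.

    2. Add the JDBC driver jar to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh: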

    export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/mysql-connector-java-5.1.27-bin.jar
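
The two preparation steps above can be scripted. The following is a minimal sketch, not part of the original setup: the function name `prepare_thrift_conf` and its argument order are assumptions made for illustration, and the caller supplies all paths. It copies hive-site.xml into the Spark conf directory and appends the SPARK_CLASSPATH export line only if it is not already there, so it can be run more than once.

```shell
#!/usr/bin/env bash
# Sketch: prepare $SPARK_HOME/conf for the Thrift JDBC Server.
# prepare_thrift_conf is a hypothetical helper, not a Spark command.
#   $1 = path to an existing hive-site.xml
#   $2 = Spark installation directory (SPARK_HOME)
#   $3 = path to the JDBC driver jar
prepare_thrift_conf() {
    local hive_site="$1" spark_home="$2" driver_jar="$3"
    local conf_dir="$spark_home/conf"
    local env_file="$conf_dir/spark-env.sh"
    # The \$ keeps $SPARK_CLASSPATH literal so it is expanded when spark-env.sh is sourced.
    local export_line="export SPARK_CLASSPATH=\$SPARK_CLASSPATH:$driver_jar"

    mkdir -p "$conf_dir" || return 1
    # Step 1: copy hive-site.xml into the Spark conf directory.
    cp "$hive_site" "$conf_dir/hive-site.xml" || return 1
    # Step 2: append the export line only if the exact line is not present yet.
    touch "$env_file"
    if ! grep -qxF "$export_line" "$env_file"; then
        printf '%s\n' "$export_line" >> "$env_file"
    fi
}

# Example call (the hive-site.xml location is a placeholder):
# prepare_thrift_conf /path/to/hive-site.xml "$SPARK_HOME" \
#     /home/hadoop/software/mysql-connector-java-5.1.27-bin.jar
```

Checking for the exact line before appending avoids stacking duplicate classpath entries when the script is rerun.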

    Thrift JDBC Server command help:

    cd $SPARK_HOME/sbin
    ./start-thriftserver.sh --help
    Usage: ./sbin/start-thriftserver [options] [thrift server options]
    Spark assembly has been built with Hive, including Datanucleus jars on classpath
    Options:
      --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
      --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                                  on one of the worker machines inside the cluster ("cluster")
                                  (Default: client).
      --class CLASS_NAME          Your application's main class (for Java / Scala apps).
      --name NAME                 A name of your application.
      --jars JARS                 Comma-separated list of local jars to include on the driver
                                  and executor classpaths.
      --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                                  on the PYTHONPATH for Python apps.
      --files FILES               Comma-separated list of files to be placed in the working
                                  directory of each executor.
    
      --conf PROP=VALUE           Arbitrary Spark configuration property.
      --properties-file FILE      Path to a file from which to load extra properties. If not
                                  specified, this will look for conf/spark-defaults.conf.
    
      --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
      --driver-java-options       Extra Java options to pass to the driver.
      --driver-library-path       Extra library path entries to pass to the driver.
      --driver-class-path         Extra class path entries to pass to the driver. Note that
                                  jars added with --jars are automatically included in the
                                  classpath.
    
      --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
    
      --help, -h                  Show this help message and exit
      --verbose, -v               Print additional debug output
    
     Spark standalone with cluster deploy mode only:
      --driver-cores NUM          Cores for driver (Default: 1).
      --supervise                 If given, restarts the driver on failure.
    
     Spark standalone and Mesos only:
      --total-executor-cores NUM  Total cores for all executors.
    
     YARN-only:
      --executor-cores NUM        Number of cores per executor (Default: 1).
      --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
      --num-executors NUM         Number of executors to launch (Default: 2).
      --archives ARCHIVES         Comma separated list of archives to be extracted into the
                                  working directory of each executor.
    
    Thrift server options:
        --hiveconf <property=value>   Use value for given property

    The --master option has the same meaning as in the Spark SQL CLI.

    beeline command help:

    cd $SPARK_HOME/bin
    ./beeline --help
    Usage: java org.apache.hive.cli.beeline.BeeLine 
       -u <database url>               the JDBC URL to connect to
       -n <username>                   the username to connect as
       -p <password>                   the password to connect as
       -d <driver class>               the driver class to use
       -e <query>                      query that should be executed
       -f <file>                       script file that should be executed
       --color=[true/false]            control whether color is used for display
       --showHeader=[true/false]       show column names in query results
       --headerInterval=ROWS;          the interval between which heades are displayed
       --fastConnect=[true/false]      skip building table/column list for tab-completion
       --autoCommit=[true/false]       enable/disable automatic transaction commit
       --verbose=[true/false]          show verbose error messages and debug info
       --showWarnings=[true/false]     display connection warnings
       --showNestedErrs=[true/false]   display nested errors
       --numberFormat=[pattern]        format numbers using DecimalFormat pattern
       --force=[true/false]            continue running script even after errors
       --maxWidth=MAXWIDTH             the maximum width of the terminal
       --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
       --silent=[true/false]           be more silent
       --autosave=[true/false]         automatically save preferences
       --outputformat=[table/vertical/csv/tsv]   format mode for result display
       --isolation=LEVEL               set the transaction isolation level
       --help                          display this message

    Starting the Thrift JDBC Server and beeline

    Start the Thrift JDBC Server (the default port is 10000):

    cd $SPARK_HOME/sbin
    ./start-thriftserver.sh

    To change the default listening port of the Thrift JDBC Server, pass the port with --hiveconf:

    ./start-thriftserver.sh --hiveconf hive.server2.thrift.port=14000
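
As a rough illustration of how the port and master options combine, the sketch below only prints a start command; it does not launch anything. The helper name `thrift_start_cmd` is hypothetical, and the printed path assumes the command is run from $SPARK_HOME. Following the usage line, Spark options such as --master come before the Thrift server options such as --hiveconf.

```shell
#!/usr/bin/env bash
# Sketch: print a start-thriftserver.sh command line for a port and master.
# thrift_start_cmd is a hypothetical helper; run the printed line from $SPARK_HOME.
thrift_start_cmd() {
    local port="${1:-10000}"   # 10000 is the default listening port
    local master="${2:-}"      # optional: spark://host:port, mesos://host:port, yarn, or local
    local cmd="./sbin/start-thriftserver.sh"

    # Spark options come first ...
    if [ -n "$master" ]; then
        cmd="$cmd --master $master"
    fi
    # ... followed by Thrift server options; skip --hiveconf for the default port.
    if [ "$port" != "10000" ]; then
        cmd="$cmd --hiveconf hive.server2.thrift.port=$port"
    fi
    printf '%s\n' "$cmd"
}

thrift_start_cmd 14000
thrift_start_cmd 14000 local
```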

    For details on HiveServer2 clients, see: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

    Start beeline:

    cd $SPARK_HOME/bin
    ./beeline -u jdbc:hive2://hadoop000:10000/default -n hadoop
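
The connection URL has the form jdbc:hive2://host:port/database, and beeline can also run a single statement without an interactive session by using the -e option from the help output. A small sketch follows; `jdbc_url` and `run_query` are hypothetical helper names, and the `SHOW TABLES;` statement is only an example query.

```shell
#!/usr/bin/env bash
# Sketch: build a HiveServer2 JDBC URL and run one query non-interactively.
# jdbc_url and run_query are hypothetical helpers, not part of Spark or Hive.
jdbc_url() {
    # $1 = host, $2 = port (default 10000), $3 = database (default "default")
    printf 'jdbc:hive2://%s:%s/%s\n' "$1" "${2:-10000}" "${3:-default}"
}

run_query() {
    # $1 = JDBC URL, $2 = user name, $3 = SQL text
    # -u, -n and -e are options listed in the beeline help output.
    "$SPARK_HOME/bin/beeline" -u "$1" -n "$2" -e "$3"
}

url=$(jdbc_url hadoop000 10000 default)
echo "$url"

# Only attempt the connection when beeline is actually installed.
if [ -x "${SPARK_HOME:-}/bin/beeline" ]; then
    run_query "$url" hadoop "SHOW TABLES;" || echo "query failed" >&2
fi
```

If the server was started on a different port, such as 14000, pass that port as the second argument to `jdbc_url`.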

    SQL script test

    SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 LIMIT 10;
    SELECT session_id, count(*) c FROM page_views GROUP BY session_id ORDER BY c DESC LIMIT 10;
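
These test queries can also be run as a script through the -f option of beeline instead of typing them at the prompt. The sketch below assumes the same server and user as the beeline example above; the file name page_views_test.sql is made up for illustration, and the beeline call is skipped when no beeline binary is found under $SPARK_HOME.

```shell
#!/usr/bin/env bash
# Sketch: save the two test queries to a script file and run it with beeline -f.
sql_file="${TMPDIR:-/tmp}/page_views_test.sql"   # hypothetical file name

# The quoted 'EOF' keeps the shell from expanding anything inside the SQL text.
cat > "$sql_file" <<'EOF'
SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 LIMIT 10;
SELECT session_id, count(*) c FROM page_views GROUP BY session_id ORDER BY c DESC LIMIT 10;
EOF

if [ -x "${SPARK_HOME:-}/bin/beeline" ]; then
    "$SPARK_HOME/bin/beeline" -u jdbc:hive2://hadoop000:10000/default -n hadoop -f "$sql_file" \
        || echo "beeline run failed" >&2
else
    echo "beeline not found; wrote queries to $sql_file"
fi
```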
  • Original article: https://www.cnblogs.com/luogankun/p/3970006.html