  • The Ex CS Grad Student: Running HQL from Python without using the Hive Standalone Server

    Running HQL from Python without using the Hive Standalone Server

    To use a language other than Java (say, Python) with Hive, you must use the Hive Standalone Server. Its main disadvantage is that it is currently single-threaded [HIVE-80]. There is also the inconvenience of running an additional server.

     

    We can solve this problem by using Jython (and possibly JRuby). Jython runs on the JVM, so it lets us call Hive's Java client library directly to execute the HQL query and retrieve the results. We can then process the results in pure Python.
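    To illustrate the "process the results in pure Python" part, here is a minimal sketch of turning a JDBC-style result set into ordinary Python tuples. The names iter_rows and FakeResultSet are hypothetical and not part of Hive or Jython; FakeResultSet only imitates the next() and getString() methods of java.sql.ResultSet so that the sketch can run without a Hive installation.

```python
def iter_rows(result_set, column_count):
    """Yield each row of a JDBC-style result set as a tuple of strings."""
    while result_set.next():
        # JDBC column indexes start at 1, not 0
        yield tuple(result_set.getString(i) for i in range(1, column_count + 1))


class FakeResultSet(object):
    """Hypothetical stand-in for java.sql.ResultSet (assumption, for illustration only)."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = -1

    def next(self):
        # Advance the cursor; report whether a row is available
        self._pos += 1
        return self._pos < len(self._rows)

    def getString(self, index):
        return self._rows[self._pos][index - 1]


rows = list(iter_rows(FakeResultSet([("1", "one"), ("2", "two")]), 2))
# From here on the data is plain Python and can be processed as usual
lookup = dict((int(key), value) for key, value in rows)
```

    Under Jython, the same iter_rows function should accept a real result set returned by a JDBC statement, since it relies only on next() and getString().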



    Let us try it out:



    STEP 1:

    Download and install Jython.



    STEP 2:

    Make sure you have the following jars and directories in your CLASSPATH:
    • hive-service-0.6.0.jar
    • libfb303.jar
    • log4j-1.2.15.jar
    • antlr-runtime-3.0.1.jar
    • derby.jar
    • jdo2-api-2.3-SNAPSHOT.jar
    • commons-logging-1.0.4.jar
    • datanucleus-core-1.1.2.jar
    • datanucleus-enhancer-1.1.2.jar
    • datanucleus-rdbms-1.1.2.jar
    • hive-exec-0.6.0.jar
    • hive-jdbc-0.6.0.jar
    • hive-metastore-0.6.0.jar
    • commons-lang-2.4.jar
    • hadoopcore/hadoop-0.20.0/hadoop-0.20.0-core.jar
    • /usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar
    • conf (this is your Hive installation's build/dist/conf directory)

    Jar locations and versions may be different in your Hive installation.
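    Because the jar names vary between installations, it can help to assemble the CLASSPATH programmatically and check it before starting Jython. The following is a sketch in plain Python; build_classpath and missing_jars are hypothetical helper names, and the directories you pass in are whatever your own installation uses.

```python
import glob
import os


def build_classpath(jar_dirs, conf_dir):
    """Join every .jar found in jar_dirs, plus conf_dir, into a CLASSPATH string."""
    entries = []
    for directory in jar_dirs:
        # Sort so the result is stable from run to run
        entries.extend(sorted(glob.glob(os.path.join(directory, "*.jar"))))
    entries.append(conf_dir)
    return os.pathsep.join(entries)


def missing_jars(classpath, required_prefixes):
    """Return the required jar-name prefixes that no classpath entry matches."""
    names = [os.path.basename(entry) for entry in classpath.split(os.pathsep)]
    return [prefix for prefix in required_prefixes
            if not any(name.startswith(prefix) for name in names)]
```

    Matching on prefixes such as "hive-jdbc" rather than full file names sidesteps the version differences mentioned above. The resulting string can be exported as the CLASSPATH environment variable before launching Jython.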



    STEP 3:

    Create a test data file /tmp/test.dat with the following lines (each line is a key and a value separated by a colon):

    1:one
    2:two
    3:three
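    If you would rather generate the file than type it, a short sketch in plain Python follows. write_test_data is a hypothetical helper name. The path is built from the system temporary directory, which is usually /tmp on Linux; if yours differs, write to /tmp/test.dat explicitly.

```python
import os
import tempfile

ROWS = [(1, "one"), (2, "two"), (3, "three")]


def write_test_data(path, rows, delimiter=":"):
    """Write one 'key<delimiter>value' line per row."""
    with open(path, "w") as handle:
        for key, value in rows:
            handle.write("%d%s%s\n" % (key, delimiter, value))


# Assumption: the temporary directory is /tmp, giving /tmp/test.dat
data_path = os.path.join(tempfile.gettempdir(), "test.dat")
write_test_data(data_path, ROWS)
```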



    STEP 4:

    Run the following Jython script:

    from java.lang import Class, ClassNotFoundException, System
    from java.sql import DriverManager

    driverName = "org.apache.hadoop.hive.jdbc.HiveDriver"

    # Load the Hive JDBC driver; this fails if the jars from STEP 2 are not on the CLASSPATH
    try:
        Class.forName(driverName)
    except ClassNotFoundException:
        print "Unable to load %s" % driverName
        System.exit(1)

    # A URL without a host runs Hive embedded in this JVM, so no standalone server is needed
    conn = DriverManager.getConnection("jdbc:hive://")
    stmt = conn.createStatement()

    # Drop the table (uncomment when re-running the script)
    #stmt.executeQuery("DROP TABLE testjython")

    # Create a table whose fields are separated by ':'
    res = stmt.executeQuery("CREATE TABLE testjython (key int, value string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ':'")

    # Show tables
    res = stmt.executeQuery("SHOW TABLES")
    print "List of tables:"
    while res.next():
        print res.getString(1)

    # Load the test data created in STEP 3
    res = stmt.executeQuery("LOAD DATA LOCAL INPATH '/tmp/test.dat' INTO TABLE testjython")

    # SELECT the data (JDBC column indexes start at 1)
    res = stmt.executeQuery("SELECT * FROM testjython")
    print "Listing contents of table:"
    while res.next():
        print res.getInt(1), res.getString(2)
    



    You should see the following output, amidst a whole lot of debug statements:

    1 one
    2 two
    3 three

  • Original post: https://www.cnblogs.com/lexus/p/2701337.html