zoukankan      html  css  js  c++  java
  • Drill —— Querying Hive Using Apache Drill

    https://acadgild.com/blog/querying-hive-using-apache-drill

    Apache Drill is an open source software framework which has been derived from Google’s Dremel System available as an infrastructure service called Google BigQuery.

     

    The specilaity of the Drill is to scale up to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds.

     

    Drill has been popularly used as a SQL query engine for Big Data exploration.

    It is designed to support high-performance analysis on the semi-structured and rapidly evolving data coming from modern Big Data applications and the same time it provides the familiarity and ecosystem of ANSI SQL, the industry-standard query language.

     

    Drill provides plug-and-play integration with existing Apache Hive and Apache HBase deployments.

    In this post, we will be discussing how to install apache Drill in embedded mode and how to configure Hive storage plug-in for Drill to query on Hive tables.

     

    Installing Apache Drill

    The first step to do is to download Apache Drill. You can do this using the below link:

    www.apache.org/dyn/closer.lua?filename=drill/drill-1.6.0/apache-drill-1.6.0.tar.gz&action=download

    After downloading, move to the file in the downloaded folder, and type the command

     

    tar -xvzf apache-drill-1.6.0.tar.gz

    Installing apache drill

    After the extraction of the tar file, you will be able to see apache-drill-1.6.0.

    That’s it! Drill has been installed in your single node system.

     

    Next, let’s configure Drill to query on Hive tables.

    Note: Hive should be pre-installed in your cluster

    Now, type the command hive –service metastore in one terminal and open another terminal and start Drill. To do this, move into the Drill installed folder and go into the bin folder and then type the command ./drill-embedded

    starting drill

    Setting Hive storage plugin for Drill:

    Next, open your browser and type localhost:8047 to enter into Drill’s Web UI.

    Now, click on ‘Storage’ and enable Hive.


    storage plugins in drill

    Now, click on ‘Update’ and do the changes as specified. By default, the configuration properties will look like as shown below:

     {
           type:"hive",
           enabled: true,
           configProps : {
             "hive.metastore.uris" : "thrift://<metastore_host>:<port>",
             "fs.default.name" : "hdfs://<host>:<port>/",
             "hive.metastore.sasl.enabled" : "false",
             "hive.server2.enable.doAs" : "true",
             "hive.metastore.execute.setugi" : "true"
           }
          }

    Next, in the Hive.metastore.uris, enter the thrift host name and port number. In our case, host name is localhost and the port number is default for all it is 9083.

    In fs.default.name, give the HDFS complete path; in our case, it is hdfs://localhost:9000

     

    Finally, the configuration properties will look as shown below:

    {
    "type": "hive",
    "enabled": true,
    "configProps": {
    "hive.metastore.uris": "thrift://localhost:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true",
    "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
    "fs.default.name": "hdfs://localhost:9000",
    "hive.metastore.sasl.enabled": "false"
    }
    }

    For performing simple operations on the tables in Hive, we need to create one table in Hive. Let’s create one table in Hive, now.

    create table olympic(athelete STRING,age INT,country STRING,year STRING,closing STRING,sport STRING,gold INT,silver INT,bronze INT,total INT) row format delimited fields terminated by '	' stored as textfile;

    Here, we are creating a table with name “olympic” and the schema of the table is as specified above. The data inside the above input file is delimited by tab space. As explained earlier, the file format is specified as TEXTFILE at the end. The schema of the table created above can be checked using describe olympic;

    We can load data into the created table as:

    load data local inpath ‘path of your file’ into table olympic;

     

    The same is shown in the figure below:

    creating olympic table

    We have successfully loaded our input file data into our table in TEXTFILE format.

     

    Now, let’s perform one basic SELECT operation on the data as shown below:

    Select athlete from olympic;
     

    The data retrieved is as shown in the below image:

     

    Now, go to the terminal where Apache Drill is running; here type the command use hive to make Drill use Hive schema.

    using hive storage in drill
    Now, you will be able to perform all the Hive queries using Drill. First, let’s check the table which we have created in Hive through Drill.

    describing hive table in drill
    You can see the schema of the table in Hive using Drill.

     

    Now, let us perform some queries on the data. Let’s find out the average age of the athletes who has participated in Olympics.

    Command: select AVG(age) from olympic

    hive query on drill

    You can see that the result has been displayed on the screen. The average age of the athletes is 26.405433646812956

     

    Hope this post has been helpful in understanding the concept of querying Hive table using Apache Drill. In case of any questions, feel free to comment below and we will get back to you at the earliest.

    Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.

     

  • 相关阅读:
    Trie
    [转]Tarjan应用:求割点/桥/缩点/强连通分量/双连通分量/LCA(最近公共祖先)
    [Bzoj2242]常见数值算法
    JavaScript百炼成仙(记录笔记)
    tcpreplay重放报文,tcpdump能抓到包,应用程序收不到包
    用jquery tools来实现选项卡
    ExtNet第二篇
    线性表最简单的表示
    c语言中线性表基本操作方法
    ExtNet第一篇
  • 原文地址:https://www.cnblogs.com/panpanwelcome/p/13435067.html
Copyright © 2011-2022 走看看