  • Chapter 2: Basic Usage of Impala


    1. Using Impala

    1.1 impala-shell syntax

    1.1.1 impala-shell external (command-line) options

    These options are used without entering the interactive impala-shell prompt.

    impala-shell accepts many options on the command line:

    -h: show the help text

    impala-shell -h
    
    [root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -h
    Usage: impala_shell.py [options]
    
    Options:
      -h, --help            show this help message and exit
      -i IMPALAD, --impalad=IMPALAD
                            <host:port> of impalad to connect to
                            [default: node03.hadoop.com:21000]
      -q QUERY, --query=QUERY
                            Execute a query without the shell [default: none]
      -f QUERY_FILE, --query_file=QUERY_FILE
                            Execute the queries in the query file, delimited by ;.
                            If the argument to -f is "-", then queries are read
                            from stdin and terminated with ctrl-d. [default: none]
      -k, --kerberos        Connect to a kerberized impalad [default: False]
      -o OUTPUT_FILE, --output_file=OUTPUT_FILE
                            If set, query results are written to the g
    
    

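    -r (--refresh_after_connect): refresh all metadata after connecting (it runs invalidate metadata). With a large amount of metadata this is expensive for the server, and as the output below shows, the option is deprecated.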

    impala-shell -r
    
    # Result
    [root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -r
    Starting Impala Shell without Kerberos authentication
    Connected to node03.hadoop.com:21000
    Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
    Invalidating Metadata
    ***********************************************************************************
    Welcome to the Impala shell.
    (Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)
    
    The HISTORY command lists all shell commands in chronological order.
    ***********************************************************************************
    +==========================================================================+
    | DEPRECATION WARNING:                                                     |
    | -r/--refresh_after_connect is deprecated and will be removed in a future |
    | version of Impala shell.                                                 |
    +==========================================================================+
    Query: invalidate metadata
    Query submitted at: 2019-08-22 14:45:28 (Coordinator: http://node03.hadoop.com:25000)
    Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=ce4db858e1dfd774:814fabac00000000
    Fetched 0 row(s) in 5.04s
    
    

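    -B (--delimited): print results as plain delimited text instead of a formatted table; faster when fetching large result sets
    --print_header: print the column names when -B is used
    --output_delimiter: set the field delimiter for -B output (tab by default)
    -v (--version): print the shell version; -V (--verbose) turns on verbose output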

    impala-shell -v -V
    
    # Result
    [root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -v -V
    Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018
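    With -B (optionally combined with --print_header and --output_delimiter), query results are easy to consume from a script. The Python sketch below is not part of impala-shell: the function name parse_delimited is hypothetical, and it assumes the first line is the header printed by --print_header and that no value contains the delimiter, a quote or a newline.

```python
from typing import Dict, List


def parse_delimited(text: str, delimiter: str = "\t") -> List[Dict[str, str]]:
    """Turn `impala-shell -B --print_header` output into a list of row dicts.

    Assumes line 1 is the header row and that no field contains the
    delimiter, a quote or a newline (no CSV unquoting is attempted).
    """
    lines = [line for line in text.splitlines() if line]  # skip empty lines
    if not lines:
        return []
    header = lines[0].split(delimiter)
    return [dict(zip(header, line.split(delimiter))) for line in lines[1:]]


if __name__ == "__main__":
    # Tab is the default delimiter; pass delimiter="," for --output_delimiter=,
    sample = "id\tname\tage\n1\tkasha\t15\n2\tfizz\t20\n"
    for row in parse_delimited(sample):
        print(row)
```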
    

    -f / --query_file: execute the statements in a query file (statements are delimited by ;)

    cd /export/servers
    vim impala-shell.sql
    
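    # Write the following two statements into the file
    use weblog;
    select * from ods_click_pageviews limit 10;
    
    # Optional: impala-shell only needs read permission on the file
    chmod 755 impala-shell.sql
    
    # Run the query file with the -f option
    impala-shell -f impala-shell.sql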
    
    # Result (this sample run used a file named imapala-shell.sql whose first statement was use hivesql)
    [root@node03 hivedatas]# impala-shell -f imapala-shell.sql 
    Starting Impala Shell without Kerberos authentication
    Connected to node03.hadoop.com:21000
    Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
    Query: use hivesql
    Query: select * from ods_click_pageviews limit 10
    Query submitted at: 2019-08-22 15:29:54 (Coordinator: http://node03.hadoop.com:25000)
    Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=6a4d51930cf99b9d:21f02c4e00000000
    +--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
    | session                              | remote_addr     | remote_user | time_local          | request                    | visit_step | page_staylong | http_referer                                                                                                                                                                                                                                                                                                                    | http_user_agent                                                                                                                                                                                   | body_bytes_sent | status | datestr  |
    +--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
    | d1328698-d475-4973-86ee-15ad9da8c860 | 1.80.249.223    | -           | 2013-09-18 07:57:33 | /hadoop-hive-intro/        | 1          | 60            | "http://www.google.com.hk/url?sa=t&rct=j&q=hive%E7%9A%84%E5%AE%89%E8%A3%85&source=web&cd=2&ved=0CC4QFjAB&url=%68%74%74%70%3a%2f%2f%62%6c%6f%67%2e%66%65%6e%73%2e%6d%65%2f%68%61%64%6f%6f%70%2d%68%69%76%65%2d%69%6e%74%72%6f%2f&ei=5lw5Uo-2NpGZiQfCwoG4BA&usg=AFQjCNF8EFxPuCMrm7CvqVgzcBUzrJZStQ&bvm=bv.52164340,d.aGc&cad=rjt" | "Mozilla/5.0(WindowsNT5.2;rv:23.0)Gecko/20100101Firefox/23.0"                                                                                                                                     | 14764           | 200    | 20130918 |
    | 0370aa09-ebd6-4d31-b6a5-469050a7fe61 | 101.226.167.201 | -           | 2013-09-18 09:30:36 | /hadoop-mahout-roadmap/    | 1          | 60            | "http://blog.fens.me/hadoop-mahout-roadmap/"                    
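    The help text says that a query file passed to -f is "delimited by ;". The Python sketch below illustrates that idea only; the helper split_statements is hypothetical and much simpler than impala-shell's own parser. It splits a script on semicolons that are not inside quotes and does not handle comments or escaped quotes.

```python
from typing import List, Optional


def split_statements(script: str) -> List[str]:
    """Split a SQL script on ';' that appear outside single or double quotes.

    Simplified: SQL comments and backslash-escaped quotes are not handled.
    """
    statements: List[str] = []
    current: List[str] = []
    quote: Optional[str] = None  # quote character we are currently inside, if any
    for ch in script:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None  # closing quote
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()  # last statement without a trailing ';'
    if tail:
        statements.append(tail)
    return statements


if __name__ == "__main__":
    script = "use weblog;\nselect * from ods_click_pageviews limit 10;\n"
    for stmt in split_statements(script):
        print(stmt)
```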
    

    -i / --impalad: connect to the given impalad (host:port), which then runs the queries

    -o / --output_file: write the query results to the given file

    impala-shell -f impala-shell.sql -o fizz.txt
    
    # Result
    [root@node03 hivedatas]# impala-shell -f imapala-shell.sql -o fizz.txt
    Starting Impala Shell without Kerberos authentication
    Connected to node03.hadoop.com:21000
    Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
    Query: use hivesql
    Query: select * from ods_click_pageviews limit 10
    Query submitted at: 2019-08-22 15:31:45 (Coordinator: http://node03.hadoop.com:25000)
    Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=7c421ab5d208f3b1:dec5a09300000000
    Fetched 10 row(s) in 0.13s
    
    # A new file, fizz.txt, now appears in the current directory
    [root@node03 hivedatas]# ll
    total 2592
    -rw-r--r-- 1 root root     511 Aug 21  2017 dim_time_dat.txt
    -rw-r--r-- 1 root root    9926 Aug 22 15:31 fizz.txt
    -rwxr-xr-x 1 root root      57 Aug 22 15:29 imapala-shell.sql
    -rwxrwxrwx 1 root root     133 Aug 20 00:36 movie.txt
    -rw-r--r-- 1 root root   18372 Jun 17 18:33 pageview2
    -rwxr-xr-x 1 root root     154 Aug 20 00:32 test.txt
    -rw-r--r-- 1 root root     327 Aug 20 02:37 user_table
    -rw-r--r-- 1 root root   10361 Jun 18 09:00 visit2
    -rw-r--r-- 1 root root 2587511 Jun 17 18:05 weblog2
    
    

    -p / --show_profiles: print the query profile after each query; the profile includes the execution plan

    impala-shell -f impala-shell.sql -p
    

    -q / --query: execute SQL statements given on the command line, without entering the interactive shell

    impala-shell -q "use hivesql;select * from ods_click_pageviews limit 10;"
    
    [root@node03 hivedatas]# impala-shell -q "use hivesql;select * from ods_click_pageviews limit 10;"
    Starting Impala Shell without Kerberos authentication
    Connected to node03.hadoop.com:21000
    Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
    Query: use hivesql
    Query: select * from ods_click_pageviews limit 10
    Query submitted at: 2019-08-22 15:36:58 (Coordinator: http://node03.hadoop.com:25000)
    Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=b443d56565419f60:a149235700000000
    +--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
    | session                              | remote_addr     | remote_user | time_local          | request                    | visit_step | page_staylong | http_referer                                                                                                                                                                                                                                                                                                                    | http_user_agent                                                                                                                                                                                   | body_bytes_sent | status | datestr  |
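    The external options above can also be assembled from a script. In the Python sketch below, the helper build_impala_shell_cmd is hypothetical and uses only options described in this section (-i, -q, -f, -o, -B, --print_header, --output_delimiter). It only builds the argument list; actually running it, for example with subprocess.run(argv), assumes impala-shell is installed on the host.

```python
import shlex
from typing import List, Optional


def build_impala_shell_cmd(
    impalad: Optional[str] = None,           # -i host:port
    query: Optional[str] = None,             # -q "SQL"
    query_file: Optional[str] = None,        # -f file.sql
    output_file: Optional[str] = None,       # -o results.txt
    delimited: bool = False,                 # -B
    print_header: bool = False,              # --print_header (only added with -B)
    output_delimiter: Optional[str] = None,  # --output_delimiter (only added with -B)
) -> List[str]:
    """Return an argv list for impala-shell built from the options above."""
    if (query is None) == (query_file is None):
        raise ValueError("pass exactly one of query or query_file")
    cmd = ["impala-shell"]
    if impalad:
        cmd += ["-i", impalad]
    if query is not None:
        cmd += ["-q", query]
    else:
        cmd += ["-f", query_file]
    if delimited:
        cmd.append("-B")
        if print_header:
            cmd.append("--print_header")
        if output_delimiter is not None:
            cmd.append(f"--output_delimiter={output_delimiter}")
    if output_file:
        cmd += ["-o", output_file]
    return cmd


if __name__ == "__main__":
    argv = build_impala_shell_cmd(
        impalad="node03.hadoop.com:21000",
        query="select * from ods_click_pageviews limit 10",
        output_file="fizz.txt",
        delimited=True,
        print_header=True,
        output_delimiter=",",
    )
    print(shlex.join(argv))  # shell-quoted form, for reading or copy-paste
```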
    
    

    1.1.2 impala-shell internal commands

    Commands that can be run after entering the impala-shell prompt.

    Start impala-shell:

    impala-shell  # works from any directory
    
    # Result
    [root@node03 hivedatas]# impala-shell
    Starting Impala Shell without Kerberos authentication
    Connected to node03.hadoop.com:21000
    Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
    ***********************************************************************************
    Welcome to the Impala shell.
    (Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)
    
    To see more tips, run the TIP command.
    ***********************************************************************************
    [node03.hadoop.com:21000] > 
    
    

    help command

    Shows the help text.

    [node03.hadoop.com:21000] > help;
    
    Documented commands (type help <topic>):
    ========================================
    compute  describe  explain  profile  rerun   set    show  unset  values   with
    connect  exit      history  quit     select  shell  tip   use    version
    
    Undocumented commands:
    ======================
    alter   delete  drop  insert  source  summary  upsert
    create  desc    help  load    src     update 
    
    

    connect command

    connect hostname connects to the impalad on another host; subsequent statements run there.

    connect node02;
    
    # Result
    [node03.hadoop.com:21000] > connect node02;
    Connected to node02:21000
    Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
    [node02:21000] > 
    

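    refresh command

    refresh dbname.tablename performs an incremental refresh: it reloads the metadata of a single table. Use it when the data of an existing table has changed through Hive (for example, new data files were loaded).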

    refresh movie_info;
    
    # Result
    [node03:21000] > refresh movie_info;
    Query: refresh movie_info
    Query submitted at: 2019-08-22 15:49:24 (Coordinator: http://node03.hadoop.com:25000)
    Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=f74330d533ff2402:27364f7600000000
    Fetched 0 row(s) in 0.27s
    

    invalidate metadata command:

    invalidate metadata performs a full refresh and is expensive; use it mainly after a new database or table has been created in Hive.

    invalidate metadata;
    
    # Result
    [node03:21000] > invalidate metadata;
    Query: invalidate metadata
    Query submitted at: 2019-08-22 15:48:04 (Coordinator: http://node03.hadoop.com:25000)
    Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=6a431748d41bc369:7eeb053400000000
    Fetched 0 row(s) in 2.87s
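    The rule of thumb behind these two commands can be written down as a small decision helper. This Python sketch is only an illustration: the function metadata_statement and its "data"/"schema" labels are hypothetical, and it just returns the statement text you would type in impala-shell.

```python
from typing import Optional


def metadata_statement(change: str, db: Optional[str] = None,
                       table: Optional[str] = None) -> str:
    """Return the impala-shell statement to run after a change made in Hive.

    change == "data":   data files of an existing table changed -> REFRESH
    change == "schema": a new database or table was created      -> INVALIDATE METADATA
    """
    if change == "data":
        if not table:
            raise ValueError("REFRESH needs a table name")
        return f"REFRESH {db}.{table}" if db else f"REFRESH {table}"
    if change == "schema":
        return "INVALIDATE METADATA"  # full reload: expensive, use sparingly
    raise ValueError(f"unknown change kind: {change!r}")


if __name__ == "__main__":
    print(metadata_statement("data", table="movie_info"))
    print(metadata_statement("schema"))
```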
    

    explain command:

    Shows the execution plan of a SQL statement.

    explain select * from user_table;
    
    # Result
    [node03:21000] > explain select * from user_table;
    Query: explain select * from user_table
    +------------------------------------------------------------------------------------+
    | Explain String                                                                     |
    +------------------------------------------------------------------------------------+
    | Max Per-Host Resource Reservation: Memory=0B                                       |
    | Per-Host Resource Estimates: Memory=32.00MB                                        |
    | WARNING: The following tables are missing relevant table and/or column statistics. |
    | hivesql.user_table                                                                 |
    |                                                                                    |
    | PLAN-ROOT SINK                                                                     |
    | |                                                                                  |
    | 01:EXCHANGE [UNPARTITIONED]                                                        |
    | |                                                                                  |
    | 00:SCAN HDFS [hivesql.user_table]                                                  |
    |    partitions=1/1 files=1 size=327B                                                |
    +------------------------------------------------------------------------------------+
    Fetched 11 row(s) in 3.99s
    

    explain_level can be set to 0, 1, 2 or 3; 3 is the highest level and prints the most detailed plan.

    set explain_level=3;
    
    # Result
    [node03:21000] > set explain_level=3;
    EXPLAIN_LEVEL set to 3
    [node03:21000] > 
    

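    profile command:

    Run it after a query to print a detailed breakdown of each execution step.

    It is mainly used to inspect how a query ran and to tune the cluster.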

    select * from user_table;
    profile;
    
    # Partial output
    [node03:21000] > profile;
    Query Runtime Profile:
    Query (id=ff4799938b710fbb:7997836800000000):
      Summary:
        Session ID: a14d3b3894050309:7f300ddf8dcd8584
        Session Type: BEESWAX
        Start Time: 2019-08-22 15:58:22.786612000
        End Time: 2019-08-22 15:58:24.558806000
        Query Type: QUERY
        Query State: FINISHED
        Query Status: OK
        Impala Version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
        User: root
        Connected User: root
        Delegated User: 
        Network Address: ::ffff:192.168.52.120:48318
        Default Db: hivesql
        Sql Statement: select * from user_table
        Coordinator: node03.hadoop.com:22000
        Query Options (set by configuration): EXPLAIN_LEVEL=3
        Query Options (set by configuration and planner): EXPLAIN_LEVEL=3,MT_DOP=0
        Plan: 
    
    

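    Note: data inserted, or databases and tables created, from the Hive shell cannot be queried in Impala right away; the metadata must be refreshed first (refresh or invalidate metadata). Data inserted through impala-shell, by contrast, is visible in Impala immediately without a refresh. This is handled by the catalog service, a component added in Impala 1.2 whose main job is to synchronize metadata across the Impala daemons.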

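    1.2 Creating databases

    1.2.1 Enter the Impala interactive shell

    impala-shell  # start the Impala interactive shell
    

    1.2.2 List all databases

    show databases;
    

    1.2.3 Create and drop databases

    Create the database mydb1, and drop the database mydb if it exists: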

    CREATE DATABASE IF NOT EXISTS mydb1;
    drop database  if exists  mydb;
    

    1.3 Creating tables

    Create the student table:

    CREATE TABLE IF NOT EXISTS mydb1.student (name STRING, age INT, contact INT );
    

    Create the employee table:

    create table employee (Id INT, name STRING, age INT,address STRING, salary BIGINT);
    

    1.3.1 Inserting data into a table

    insert into employee (ID, NAME, AGE, ADDRESS, SALARY) VALUES (1, 'Ramesh', 32, 'Ahmedabad', 20000);
    insert into employee values (2, 'Khilan', 25, 'Delhi', 15000 );
    Insert into employee values (3, 'kaushik', 23, 'Kota', 30000 );
    Insert into employee values (4, 'Chaitali', 25, 'Mumbai', 35000 );
    Insert into employee values (5, 'Hardik', 27, 'Bhopal', 40000 );
    Insert into employee values (6, 'Komal', 22, 'MP', 32000 );
    

    Overwriting data (INSERT OVERWRITE)

    Insert overwrite employee values (1, 'Ram', 26, 'Vishakhapatnam', 37000 );
    

    After the overwrite, only this single row remains in the table.
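    The difference between INSERT INTO and INSERT OVERWRITE can be modelled with an in-memory list. The Python sketch below is a toy: the ToyTable class is hypothetical and ignores files, partitions and types. It only shows that INSERT INTO appends while INSERT OVERWRITE replaces the existing rows of an unpartitioned table such as employee.

```python
from typing import List, Tuple

Row = Tuple[int, str, int, str, int]  # (id, name, age, address, salary)


class ToyTable:
    """In-memory stand-in for a table; models only INSERT INTO vs INSERT OVERWRITE."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert_into(self, *rows: Row) -> None:
        self.rows.extend(rows)  # INSERT INTO appends to the existing data

    def insert_overwrite(self, *rows: Row) -> None:
        self.rows = list(rows)  # INSERT OVERWRITE replaces all existing data


employee = ToyTable()
employee.insert_into((1, "Ramesh", 32, "Ahmedabad", 20000))
employee.insert_into((2, "Khilan", 25, "Delhi", 15000))
employee.insert_overwrite((1, "Ram", 26, "Vishakhapatnam", 37000))
print(employee.rows)  # [(1, 'Ram', 26, 'Vishakhapatnam', 37000)]
```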

    Another way to create a table is CREATE TABLE ... AS SELECT, which copies both the schema and the data:

    create table customer as select * from employee;
    

    1.3.2 Querying data

    select * from employee;
    select name,age from employee; 
    

    1.3.3 Dropping a table

    DROP table  mydb1.employee;
    

    1.3.4 Truncating a table (delete all rows but keep the table)

    truncate  employee;
    

    1.3.5 Creating a view

    CREATE VIEW IF NOT EXISTS employee_view AS select name, age from employee;
    

    1.3.6 Querying a view

    select * from employee_view;
    

    1.4 ORDER BY

    Basic syntax:

    select * from table_name ORDER BY col_name [ASC|DESC] [NULLS FIRST|NULLS LAST]
    Select * from employee ORDER BY id asc;
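    To see what ASC/DESC and NULLS FIRST/NULLS LAST do, the Python sketch below sorts in-memory rows the same way. It assumes Impala's default of treating NULL as larger than any other value (NULLs last for ASC, first for DESC); the helper order_by is hypothetical.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def order_by(rows: List[Row], col: str, desc: bool = False,
             nulls_first: Optional[bool] = None) -> List[Row]:
    """Sort rows like ORDER BY col [ASC|DESC] [NULLS FIRST|NULLS LAST].

    nulls_first=None uses the assumed default: NULL sorts as the largest
    value, i.e. last for ASC and first for DESC.
    """
    if nulls_first is None:
        nulls_first = desc
    non_null = sorted((r for r in rows if r[col] is not None),
                      key=lambda r: r[col], reverse=desc)
    nulls = [r for r in rows if r[col] is None]
    return nulls + non_null if nulls_first else non_null + nulls


if __name__ == "__main__":
    rows = [{"id": 3}, {"id": None}, {"id": 1}, {"id": 2}]
    print([r["id"] for r in order_by(rows, "id")])             # [1, 2, 3, None]
    print([r["id"] for r in order_by(rows, "id", desc=True)])  # [None, 3, 2, 1]
```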
    

    1.5 GROUP BY

    Select name, sum(salary) from employee Group BY name; 
    

    1.6 HAVING

    Basic syntax:

    select col_name, aggregate_function(other_col) from table_name GROUP BY col_name HAVING condition
    

    Group the table by age, take the maximum salary of each group, and keep only the groups whose maximum salary is greater than 20000:

    select max(salary) from employee group by age having max(salary) > 20000;
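    The GROUP BY and HAVING queries can be traced by hand on the six rows inserted in section 1.3.1 (that is, before the INSERT OVERWRITE). In this Python sketch the helper group_by is hypothetical. HAVING filters whole groups after aggregation, so age 32, whose maximum salary is exactly 20000, is dropped.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

# The six rows inserted in section 1.3.1: (id, name, age, address, salary)
EMPLOYEES: List[Tuple[int, str, int, str, int]] = [
    (1, "Ramesh", 32, "Ahmedabad", 20000),
    (2, "Khilan", 25, "Delhi", 15000),
    (3, "kaushik", 23, "Kota", 30000),
    (4, "Chaitali", 25, "Mumbai", 35000),
    (5, "Hardik", 27, "Bhopal", 40000),
    (6, "Komal", 22, "MP", 32000),
]


def group_by(rows: List[Tuple], key: Callable[[Tuple], Any],
             value: Callable[[Tuple], Any],
             agg: Callable[[List[Any]], Any]) -> Dict[Any, Any]:
    """Collect value(row) per key(row), then apply agg to each group."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for row in rows:
        groups[key(row)].append(value(row))
    return {k: agg(vs) for k, vs in groups.items()}


# select name, sum(salary) from employee group by name;
sum_by_name = group_by(EMPLOYEES, key=lambda r: r[1], value=lambda r: r[4], agg=sum)

# select max(salary) from employee group by age having max(salary) > 20000;
max_by_age = group_by(EMPLOYEES, key=lambda r: r[2], value=lambda r: r[4], agg=max)
having = {age: m for age, m in max_by_age.items() if m > 20000}  # HAVING filters groups

print(sum_by_name)  # each name occurs once, so every sum equals that person's salary
print(having)       # {25: 35000, 23: 30000, 27: 40000, 22: 32000}
```

    The dictionary order here just follows first appearance; without ORDER BY, Impala does not guarantee the order of the result groups.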
    

    1.7 LIMIT

    select * from employee order by id limit 4;
    

    2. Ways to load data into Impala tables

    Method 1: load data from HDFS into Impala. First create the table:

    create table user(id int, name string, age int) row format delimited fields terminated by '\t';
    

    Prepare a data file, user.txt, and upload it to the HDFS directory /user/impala:

    hdfs dfs -put user.txt /user/impala/
    

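    Check that the upload succeeded:

    hdfs dfs -ls /user/impala

    The file itself (view it with hdfs dfs -cat /user/impala/user.txt) contains four tab-separated rows:

    1	kasha	15
    2	fizz	20
    3	pheonux	30
    4	manzi	50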
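    Declaring the table with a tab as the field terminator means each line of user.txt is split on tabs and the pieces are assigned, in order, to id, name and age. A minimal Python sketch of that mapping follows; the function parse_user_file is hypothetical, and it simply raises an error on malformed lines rather than mimicking how Impala handles bad values.

```python
from typing import List, Tuple


def parse_user_file(text: str) -> List[Tuple[int, str, int]]:
    """Split each non-empty line on tabs and cast to (id int, name string, age int)."""
    rows: List[Tuple[int, str, int]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        id_str, name, age_str = line.split("\t")  # raises if the column count is wrong
        rows.append((int(id_str), name, int(age_str)))
    return rows


if __name__ == "__main__":
    sample = "1\tkasha\t15\n2\tfizz\t20\n3\tpheonux\t30\n4\tmanzi\t50\n"
    print(parse_user_file(sample))
```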
    

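    Load the data (LOAD DATA moves the files from /user/impala/ into the table's own directory rather than copying them):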

    load data inpath '/user/impala/' into table user;
    

    Query the loaded data:

    select  *  from  user;
    

    If the query returns no data, refresh the table:

    refresh  user;
    

    Method 2: CREATE TABLE ... AS SELECT

    create  table  user2   as   select * from  user;
    

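    Method 3: INSERT INTO ... VALUES

    insert into user values (...);  -- not recommended: each statement writes new data files, so many small inserts produce many small files

    Never use Impala as if it were a transactional database that takes many small row-by-row writes.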

    Method 4: INSERT INTO ... SELECT (the most commonly used)

    insert into user2 select * from user;
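    Why method 3 is discouraged can be shown with a toy model. The Python sketch below assumes, as a simplification, that every INSERT statement writes at least one new data file and that a large statement splits its rows into files of at most rows_per_file rows. rows_per_file is a hypothetical parameter; in reality the number of files also depends on how many nodes and partitions take part.

```python
import math
from typing import List


def files_written(rows_per_statement: List[int], rows_per_file: int = 1_000_000) -> int:
    """Toy model: each INSERT statement writes at least one new file,
    and each file holds at most rows_per_file rows."""
    return sum(max(1, math.ceil(n / rows_per_file)) for n in rows_per_statement)


if __name__ == "__main__":
    # Method 3: 10,000 single-row INSERT ... VALUES statements
    print(files_written([1] * 10_000))  # 10000 tiny files
    # Method 4: one INSERT ... SELECT carrying the same 10,000 rows
    print(files_written([10_000]))      # 1 file
```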
    
  • Original article: https://www.cnblogs.com/-xiaoyu-/p/11186672.html