zoukankan      html  css  js  c++  java
  • 【Hive学习之六】Hive Lateral View &视图&索引

    环境
      虚拟机:VMware 10
      Linux版本:CentOS-6.5-x86_64
      客户端:Xshell4
      FTP:Xftp4
      jdk8
      hadoop-3.1.1
      apache-hive-3.1.1

    一、Hive Lateral View
    Lateral View用于和UDTF函数(explode、split)结合来使用。
    首先通过UDTF函数拆分成多行,再将多行结果组合成一个支持别名的虚拟表。
    主要解决在select使用UDTF做查询过程中,查询只能包含单个UDTF,不能包含其他字段、以及多个UDTF的问题

    语法:
    LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)

    举例:统计人员表中共有多少种爱好、多少个城市?

    hive> select * from psn2;
    OK
    psn2.id    psn2.name    psn2.likes    psn2.address    psn2.age
    1    小明1    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
    2    小明2    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
    3    小明3    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
    4    小明4    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
    5    小明5    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
    6    小明6    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
    1    小明1    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
    2    小明2    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
    3    小明3    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
    4    小明4    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
    5    小明5    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
    6    小明6    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20
    Time taken: 0.138 seconds, Fetched: 12 row(s)
    hive> select explode(likes) from psn2;
    OK
    col
    lol
    book
    movie
    lol
    book
    movie
    lol
    book
    movie
    lol
    book
    movie
    lol
    book
    movie
    lol
    book
    movie
    lol
    book
    movie
    lol
    book
    movie
    lol
    book
    movie
    lol
    book
    movie
    lol
    book
    movie
    lol
    book
    movie
    Time taken: 0.294 seconds, Fetched: 36 row(s)
    hive> select count(distinct(myCol1)), count(distinct(myCol2)) from psn2 
    > LATERAL VIEW explode(likes) myTable1 AS myCol1 
    > LATERAL VIEW explode(address) myTable2 AS myCol2, myCol3;
    Query ID = root_20190216171853_af297af9-dcc6-4e1e-8674-fa0969727b23
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
    set mapreduce.job.reduces=<number>
    Starting Job = job_1548397153910_0012, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0012/
    Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job -kill job_1548397153910_0012
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2019-02-16 17:19:00,480 Stage-1 map = 0%, reduce = 0%
    2019-02-16 17:19:04,582 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.08 sec
    2019-02-16 17:19:09,693 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.24 sec
    MapReduce Total cumulative CPU time: 7 seconds 240 msec
    Ended Job = job_1548397153910_0012
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.24 sec HDFS Read: 15860 HDFS Write: 103 SUCCESS
    Total MapReduce CPU Time Spent: 7 seconds 240 msec
    OK
    _c0    _c1
    3    2
    Time taken: 16.894 seconds, Fetched: 1 row(s)
    hive>

    二、hive View视图
    和关系型数据库中的普通视图一样,hive也支持视图
    特点:
      不支持物化视图(oracle支持)
      只能查询,不能做加载数据操作
      视图的创建,只是保存一份元数据,查询视图时才执行对应的子查询
      view定义中若包含了ORDER BY/LIMIT语句,当查询视图时也进行ORDER BY/LIMIT语句操作,view当中定义的优先级更高
      view支持迭代视图

    View语法
    创建视图:

    CREATE VIEW [IF NOT EXISTS] [db_name.]view_name 
    [(column_name [COMMENT column_comment], ...) ]
    [COMMENT view_comment]
    [TBLPROPERTIES (property_name = property_value, ...)]
    AS SELECT ... ;

    举例:注意 视图在HDFS下不存在文件  

    hive> create view v_psn2 as select id,name from psn2;
    OK
    id    name
    Time taken: 0.127 seconds
    hive> show tables;
    OK
    tab_name
    cell_drop_monitor
    cell_monitor
    docs
    logtbl
    person
    person3
    psn2
    psn21
    psn22
    psn3
    psn31
    psn4
    psnbucket
    student
    test01
    v_psn2
    wc
    Time taken: 0.02 seconds, Fetched: 17 row(s)
    hive> select * from v_psn2;
    OK
    v_psn2.id    v_psn2.name
    1    小明1
    2    小明2
    3    小明3
    4    小明4
    5    小明5
    6    小明6
    1    小明1
    2    小明2
    3    小明3
    4    小明4
    5    小明5
    6    小明6
    Time taken: 0.11 seconds, Fetched: 12 row(s)
    hive> drop view v_psn2;
    OK
    Time taken: 0.08 seconds
    hive> select * from v_psn2;
    FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'v_psn2'
    hive> 

    三、Hive 索引

     目的:优化查询以及检索性能

    给表psn2创建索引:
    create index t1_index on table psn2(name)
    as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild
    in table t1_index_table;

    as:指定索引器;
    in table:指定索引表,若不指定默认生成在default__psn2_t1_index__表中

    create index t1_index on table psn2(name)
    as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild;

    查询索引
    show index on psn2;

    重建索引(建立索引之后必须重建索引才能生效)
    ALTER INDEX t1_index ON psn2 REBUILD;

    删除索引
    DROP INDEX IF EXISTS t1_index ON psn2;

  • 相关阅读:
    windows10安装vmware14教程
    MySQL变量的使用
    软考和软件设计师
    Java web加密之将应用从http换成https的方法
    cmd命令net和sc
    cmd命令 taskkill
    CSS系列:CSS的继承与层叠特性
    CSS系列:CSS选择器
    CSS系列:在HTML中引入CSS的方法
    Sql Server系列:索引维护
  • 原文地址:https://www.cnblogs.com/cac2020/p/10388485.html
Copyright © 2011-2022 走看看