zoukankan      html  css  js  c++  java
  • Impala & Hive 使用复杂数据类型

    1. 环境

    CDH 5.16.1

    2. Hive 使用复杂数据类型

    2.1 数据格式

    1       zhangsan:man    football,basketball
    2       lisi:female     sing,dance
    

    2.2 Hive 建表

    create table studentInfo(
        id int,
        info map<string,string>  comment 'map<姓名,性别>',
        favorite array<string> comment 'array[football,basketball]'
    )
    row format delimited fields terminated by '	'    --列分隔符
    collection items terminated by ','   --array中各个item之间的分隔符
    map keys terminated 
    by ':'        --map中key和value之间的分隔符
    lines terminated by '
    ';       --行分隔符
    

    2.3 导入数据

    load data local inpath '/opt/module/jobs/student.txt' into table studentInfo;
    

    2.3 执行查询

    select *  from studentInfo;
    
    +-----------------+---------------------+----------------------------+--+
    | studentinfo.id  |  studentinfo.info   |    studentinfo.favorite    |
    +-----------------+---------------------+----------------------------+--+
    | 1               | {"zhangsan":"man"}  | ["football","basketball"]  |
    | 2               | {"lisi":"female"}   | ["sing","dance"]           |
    +-----------------+---------------------+----------------------------+--+
    
    
    
    
    -- 对于map查询,map[key]
    --对于array查询,array[index]
    select id, info['zhangsan'],favorite[1] from studentInfo;
    
    +-----+-------+-------------+--+
    | id  |  sex  |  favorite   |
    +-----+-------+-------------+--+
    | 1   | man   | basketball  |
    | 2   | NULL  | dance       |
    +-----+-------+-------------+--+
    

    3. Impala 使用复杂类型

    注意:Impala 只用parquet格式存储时,才能使用复杂数据类型

    3.1 Hive中建表(parquet格式,导入数据

    create table student_parquet(
        id int,
        info map<string,string>  comment 'map<姓名,性别>',
        favorite array<string> comment 'array[football,basketball]'
    )
    stored as parquet
    
    insert overwrite table student_parquet select id,info,favorite from studentInfo;
    

    3.2 刷新impala元数据

    refresh default.student_parquet;
    

    3.3 执行查询

    select 
        id ,favorite_array.item,info_map.key,info_map.value
    from student_parquet,
        student_parquet.info as info_map,
        student_parquet.favorite as favorite_array;
    
    +----+------------+----------+--------+
    | id | item       | key      | value  |
    +----+------------+----------+--------+
    | 1  | football   | zhangsan | man    |
    | 1  | basketball | zhangsan | man    |
    | 2  | sing       | lisi     | female |
    | 2  | dance      | lisi     | female |
    +----+------------+----------+--------+
    
    
    
    
    select 
        id ,favorite_array.item
    from student_parquet,
        student_parquet.info as info_map,
        student_parquet.favorite as favorite_array
    where favorite_array.POS = 0;
    
    +----+----------+
    | id | item     |
    +----+----------+
    | 1  | football |
    | 2  | sing     |
    +----+----------+
    
    
    
    
    select 
        id ,favorite_array.item,info_map.value
    from student_parquet,
        student_parquet.info as info_map,
        student_parquet.favorite as favorite_array
    where favorite_array.item = 'sing'
    and info_map.key = 'lisi';
    
    +----+------+--------+
    | id | item | value  |
    +----+------+--------+
    | 2  | sing | female |
    +----+------+--------+
    

    总结:

    1. array 类型视为 一张表, 其列名为 item

    2. map类型有两个列, 一个是key, 一个是value

    参考:

    1. https://blog.csdn.net/rav009/article/details/86750850
    2. https://docs.cloudera.com/documentation/enterprise/5-5-x/topics/impala_complex_types.html
  • 相关阅读:
    调试跳转动态打印
    PHP对redis操作详解
    SSL证书没有绿锁您与此网站建立的连接并非完全安全解决办法
    63. Unique Paths II
    62. Unique Paths
    40. Combination Sum II
    60. Permutation Sequence
    59. Spiral Matrix II
    批量修改名字的脚本
    57. Insert Interval
  • 原文地址:https://www.cnblogs.com/wuning/p/12293579.html
Copyright © 2011-2022 走看看