  • How to use the LATERAL VIEW explode() function in Hive

    In Hive, we can create tables that contain a MAP column, for example:

    create table test (
        item MAP<STRING, STRING>
    );

    Sometimes we want to iterate over all the entries in the MAP as key-value pairs.
    Hive offers a function for this called explode():

    explode() takes in an array (or a map) as an input and outputs the elements of the array (or map) as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.

    An example use of explode() in the SELECT expression list is as follows. Consider a table named myTable that has a single column (myCol) and two rows:

    Array<int> myCol
    [1,2,3]
    [4,5,6]

    Then running the query:
    SELECT explode(myCol) AS myNewCol FROM myTable;
    Will produce:

    (int) myNewCol
    1
    2
    3
    4
    5
    6
    The above is extracted from the official guide.
    Here I will walk through how to explode a MAP column, and also how to explode multiple MAP columns in one query.
     
    Taking the `test` table as an example:
    hive>select * from test;
    {"123":"abc"}
    {"234":"bcd"}
     
    Now if we run:
    hive>select key, value from test 
        lateral view explode(item) dummy_table as key, value;
    123    abc
    234    bcd
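
    To make the row expansion concrete, here is a minimal Python sketch of what the query above does. It is only a toy model of the semantics, not Hive code: the names explode_map and lateral_view_explode are made up for illustration, and each dict stands in for one row's MAP column.

```python
# Toy model of:
#   SELECT key, value FROM test LATERAL VIEW explode(item) dummy_table AS key, value;
# Each dict stands in for one row of the `test` table from the post.
test = [
    {"item": {"123": "abc"}},
    {"item": {"234": "bcd"}},
]


def explode_map(m):
    """Yield one (key, value) pair per map entry (hypothetical helper)."""
    for key, value in m.items():
        yield key, value


def lateral_view_explode(rows, column):
    """Pair every source row with each entry generated from its map column."""
    for row in rows:
        for key, value in explode_map(row[column]):
            yield key, value


result = list(lateral_view_explode(test, "item"))
for key, value in result:
    print(key, value)
```

    A row whose map holds n entries contributes n output rows, which is why the two single-entry maps above give exactly two rows.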
     
    As we can see, explode() expands the MAP into multiple rows, one per entry. We can then use key and value in any clause, such as GROUP BY or SORT BY.
     
    What if we have multiple columns that are all MAPs, like this:
     
    create table test2 (
        item1 MAP<STRING, STRING>,
        item2 MAP<STRING, STRING>
    );
     
    hive>select * from test2;
    {"123":"abc","234":"bcd"}  {"123":"aaa","234":"bbb"}
     
    Now if we run the same kind of query again:
    hive>select key1, value1, key2, value2 from test2 
                      lateral view explode(item1) dummy1 as key1, value1
                      lateral view explode(item2) dummy2 as key2, value2;
    123 abc 123 aaa
    123 abc 234 bbb
    234 bcd 123 aaa
    234 bcd 234 bbb
     
    We see that Hive does not return just two rows; instead, it produces every combination of the entries in item1 and item2.
    This causes a problem: what if I want to sum up item1["123"]?
    If the values were numbers rather than letters, I should be able to sum the values per key, right?
     
    But as shown above, Hive generates every combination, so each value would be counted more than once!
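
    The duplication can be seen in a small Python model of the two chained lateral views. This is a sketch under stated assumptions: the numeric values are hypothetical (the test2 table in the post holds letters), and two_lateral_views is an illustrative name, not a Hive function.

```python
from itertools import product

# Hypothetical numeric version of test2; the post's real values are letters.
test2 = [
    {"item1": {"123": 10, "234": 20}, "item2": {"123": 1, "234": 2}},
]


def two_lateral_views(rows):
    """Model two chained LATERAL VIEW explode() clauses: every entry of item1
    is paired with every entry of item2 from the same source row."""
    for row in rows:
        pairs = product(row["item1"].items(), row["item2"].items())
        for (k1, v1), (k2, v2) in pairs:
            yield k1, v1, k2, v2


combos = list(two_lateral_views(test2))

# Summing value1 for key "123" over the combined rows counts it once per
# entry of item2, so the stored value 10 is added twice.
naive_sum_123 = sum(v1 for k1, v1, k2, v2 in combos if k1 == "123")
print(len(combos), naive_sum_123)  # 4 20
```

    One source row with two entries in each map yields 2 x 2 = 4 rows, and the naive sum is double the stored value.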
     
    Here is my solution, and it is simple: keep only the rows where the two keys match.
    hive>select key1, value1, key2, value2 from test2 
                      lateral view explode(item1) dummy1 as key1, value1
                      lateral view explode(item2) dummy2 as key2, value2
              where key1 = key2;
    123 abc 123 aaa
    234 bcd 234 bbb
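
    The key-matching filter behaves like an inner join on the key, which the following Python sketch tries to illustrate. The helper names and the "lopsided" row are assumptions added for illustration; only the first row comes from the post.

```python
def exploded_pairs(row):
    """All combinations of item1 and item2 entries for one row (toy model)."""
    for k1, v1 in row["item1"].items():
        for k2, v2 in row["item2"].items():
            yield k1, v1, k2, v2


def matched_pairs(row):
    """Model of `WHERE key1 = key2`: keep only combinations with equal keys."""
    return [r for r in exploded_pairs(row) if r[0] == r[2]]


# The row shown in the post.
row = {
    "item1": {"123": "abc", "234": "bcd"},
    "item2": {"123": "aaa", "234": "bbb"},
}

# Hypothetical row where item1 has a key ("999") that item2 lacks.
lopsided = {
    "item1": {"123": "abc", "999": "zzz"},
    "item2": {"123": "aaa"},
}

print(matched_pairs(row))
print(matched_pairs(lopsided))
```

    Note the second case: a key that appears in only one of the two maps is dropped entirely, so this approach fits best when both maps are expected to share the same keys.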
     
    So now if we sum like this:
    hive>select key1, SUM(value1), SUM(value2) from test2
                      lateral view explode(item1) dummy1 as key1, value1
                      lateral view explode(item2) dummy2 as key2, value2
              where key1 = key2
              group by key1;
    we will get the correct sum for every key (assuming the values are numeric).
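
    As a rough check of the per-key sums, here is a Python sketch of the same aggregation. It assumes the rows are grouped by key1 and that the map values are numeric; the two data rows and the name sum_by_key are hypothetical.

```python
from collections import defaultdict

# Hypothetical numeric rows; the post's test2 table holds letters instead.
test2 = [
    {"item1": {"123": 10, "234": 20}, "item2": {"123": 1, "234": 2}},
    {"item1": {"123": 30, "234": 40}, "item2": {"123": 3, "234": 4}},
]


def sum_by_key(rows):
    """Explode both maps, keep key1 == key2, then sum both values per key."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        for k1, v1 in row["item1"].items():
            for k2, v2 in row["item2"].items():
                if k1 == k2:  # the WHERE key1 = key2 filter
                    totals[k1][0] += v1
                    totals[k1][1] += v2
    return {key: tuple(sums) for key, sums in totals.items()}


totals = sum_by_key(test2)
print(totals)  # {'123': (40, 4), '234': (60, 6)}
```

    Because each key survives the filter exactly once per source row, every stored value is added once, and the totals match what summing the maps directly would give.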
  • Original post: https://www.cnblogs.com/linehrr-freehacker/p/3309088.html