zoukankan      html  css  js  c++  java
  • hive优化1-列转行

    数据需求说明:

    table1的数据是 客户id,基金差额,保险差额,理财差额,。。。

    create table youhua1(
        -> custid int
        -> ,jijin_change float
        -> ,baoxian_change float
        -> ,cunkuan_change float)
        -> ;
    
    insert into youhua1 values(1,1.1,1.2,1.3),(2,2.67,2.34,2.1),(3,3.1,3.4,3.78);
    
    select * from youhua1;
    +--------+--------------+----------------+----------------+
    | custid | jijin_change | baoxian_change | cunkuan_change |
    +--------+--------------+----------------+----------------+
    |      1 |          1.1 |            1.2 |            1.3 |
    |      2 |         2.67 |           2.34 |            2.1 |
    |      3 |          3.1 |            3.4 |           3.78 |
    +--------+--------------+----------------+----------------+

    原查询思路:

    select 
         custid 
        ,"基金" as prod_name
        ,jijin_change as change
    from youhua1
    
    union all
    select 
         custid 
        ,"保险" as prod_name
        ,baoxian_change change
    from youhua1
    
    union all
    select 
         custid 
        ,"存款" as prod_name
        ,cunkuan_change as change
    from youhua1;

    目标数据:客户1,基金,差额(基金)
                      客户1,保险,差额(保险)

                      ...

    +--------+-----------+------+
    | custid | prod_name |change|
    +--------+-----------+------+
    |      1 | 基金      |  1.1 |
    |      2 | 基金      | 2.67 |
    |      3 | 基金      |  3.1 |
    |      1 | 保险      |  1.2 |
    |      2 | 保险      | 2.34 |
    |      3 | 保险      |  3.4 |
    |      1 | 存款      |  1.3 |
    |      2 | 存款      |  2.1 |
    |      3 | 存款      | 3.78 |
    +--------+-----------+------+

    优化方案:

    当数据量很大时,这种方式会造成多次读表。

    可以采用from前置的方式;

    也可以先通过concat函数拼接jijin_change、baoxian_change、cunkuan_change三个字段成一列,然后再用列转行函数lateral view explode函数处理。

    行转列:一个字段的多行数据合进一个列,通常可用collect_set+concat_ws;列转行:一个字段的一列数据拆到多个行,参考:https://blog.csdn.net/weixin_42003671/article/details/88132666

    select custid,concat_ws(',',jijin_change,baoxian_change,cunkuan_change) from youhua1;  #这是mysql处理,hive中concat_ws里类型要转string
    +--------+-----------------------------------------------------------+
    | custid | concat_ws(',',jijin_change,baoxian_change,cunkuan_change) |
    +--------+-----------------------------------------------------------+
    |      1 | 1.1,1.2,1.3                                               |
    |      2 | 2.67,2.34,2.1                                             |
    |      3 | 3.1,3.4,3.78                                              |
    +--------+-----------------------------------------------------------+

     以下为hive处理:

    select 
         a.custid
        ,change 
    from (
         select
               custid
              ,concat_ws(',',cast(jijin_change as string),cast(baoxian_change as string),cast(cunkuan_change as string)) as all_change
         from youhua1
        ) as a
    lateral view explode(split(a.all_change,',')) table_tmp as change;
    1    1.1
    1    1.2
    1    1.3
    2    2.67
    2    2.34
    2    2.1
    3    3.1
    3    3.4
    3    3.78

    发现没有加上基金、保险、存款的行标识:

    可以在concat_ws之前先将基金、保险、存款的行标识和值打在一起,concat_ws("+","基金"+"jijin_change"),...

    在explode时指定split(a.all_change, ','),最后select custid,split(change,"+")[0] as prod_name,split(change,"+")[1] as change

    报bug,用“+”没有办法拆开,会报错,把+替换成'+'、'/+'都不行,改换为‘-’

    select 
         a.custid
        ,split(change,'-')[0] as prod_name
        ,split(change,'-')[1] as change    
    from (
         select
               custid
              ,concat_ws(',',concat_ws('-','基金',cast(jijin_change as string)),concat_ws('-','保险',cast(baoxian_change as string)),concat_ws('-','存款',cast(cunkuan_change as string))) as all_change
         from youhua1
        ) as a
    lateral view explode(split(a.all_change,',')) table_tmp as change;
    1    基金    1.1
    1    保险    1.2
    1    存款    1.3
    2    基金    2.67
    2    保险    2.34
    2    存款    2.1
    3    基金    3.1
    3    保险    3.4
    3    存款    3.78

     “+”号的方式:

    concat_ws用“+”,split用\+“”

    select 
         a.custid
        ,split(change,'\+')[0] as prod_name
        ,split(change,'\+')[1] as change    
    from (
         select
               custid
              ,concat_ws(',',concat_ws('+','基金',cast(jijin_change as string)),concat_ws('+','保险',cast(baoxian_change as string)),concat_ws('+','存款',cast(cunkuan_change as string))) as all_change
         from youhua1
        ) as a
    lateral view explode(split(a.all_change,',')) table_tmp as change;
  • 相关阅读:
    求素数(定义法,埃式法,欧拉法)
    打表法
    python学习日记(匿名函数)
    python学习日记(编码再回顾)
    python学习日记(文件操作练习题)
    python学习日记(迭代器、生成器)-乱七八糟
    python学习日记(生成器函数进阶)
    python学习日记(装饰器的补充)
    python学习日记(函数--装饰器)
    python学习日记(函数进阶)
  • 原文地址:https://www.cnblogs.com/foolangirl/p/14145147.html
Copyright © 2011-2022 走看看