zoukankan      html  css  js  c++  java
  • Hive之列转行,行转列

    在数据处理中,经常会遇到需要行列转换的需求,转载一篇博客

     

    列转行

    测试数据

    复制代码
    hive> select * from col_lie limit 10;
    OK
    col_lie.user_id    col_lie.order_id
    104399    1715131
    104399    2105395
    104399    1758844
    104399    981085
    104399    2444143
    104399    1458638
    104399    968412
    104400    1609001
    104400    2986088
    104400    1795054
    复制代码

    把相同user_id的order_id按照逗号转为一行

    复制代码
    select user_id,
    concat_ws(',',collect_list(order_id)) as order_value 
    from col_lie
    group by user_id
    limit 10;
    
    //结果(简写)
    user_id    order_value
    104399    1715131,2105395,1758844,981085,2444143
    复制代码

    总结

    使用函数:concat_ws(',',collect_set(column))  

    说明:collect_list 不去重,collect_set 去重。 column的数据类型要求是string

    行转列

    测试数据

    复制代码
    hive> select * from lie_col;
    OK
    lie_col.user_id    lie_col.order_value
    104408    2909888,2662805,2922438,674972,2877863,190237
    104407    2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128
    104406    1463273,2351480,1958037,2606570,3226561,3239512,990271,1436056,2262338,2858678
    104405    153023,2076625,1734614,2796812,1633995,2298856,2833641,3286778,2402946,2944051,181577,464232
    104404    1815641,108556,3110738,2536910,1977293,424564
    104403    253936,2917434,2345879,235401,2268252,2149562,2910478,375109,932923,1989353
    104402    3373196,1908678,291757,1603657,1807247,573497,1050134,3402420
    104401    814760,213922,2008045,3305934,2130994,1602245,419609,2502539,3040058,2828163,3063469
    104400    1609001,2986088,1795054,429550,1812893
    104399    1715131,2105395,1758844,981085,2444143,1458638,968412
    Time taken: 0.065 seconds, Fetched: 10 row(s)
    复制代码

    将order_value的每条记录切割为单元素

    复制代码
    select user_id,order_value,order_id
    from lie_col
    lateral view explode(split(order_value,',')) num as order_id
    limit 10;
    
    //结果
    user_id    order_value    order_id
    104408    2909888,2662805,2922438,674972,2877863,190237    2909888
    104408    2909888,2662805,2922438,674972,2877863,190237    2662805
    104408    2909888,2662805,2922438,674972,2877863,190237    2922438
    104408    2909888,2662805,2922438,674972,2877863,190237    674972
    104408    2909888,2662805,2922438,674972,2877863,190237    2877863
    104408    2909888,2662805,2922438,674972,2877863,190237    190237
    104407    2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128    2982655
    104407    2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128    814964
    104407    2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128    1484250
    104407    2982655,814964,1484250,2323912,2689723,2034331,1692373,677498,156562,2862492,338128    2323912
    Time taken: 0.096 seconds, Fetched: 10 row

    转载 https://www.cnblogs.com/blogyuhan/p/9274784.html
  • 相关阅读:
    向存储过程中传入列表参数
    TRUNCATE TABLE (Transact-SQL)
    Program.cs
    联想G40-30 安装win10系统
    完美解决github访问速度慢[转]
    关于Visual studio 2017安装方法的若干问题
    [转载] 中国象棋软件-引擎实现(七)测试程序
    [转载] 中国象棋软件-引擎实现(六)局面评估
    [转载] 中国象棋软件-引擎实现(五)历史启发及着法排序
    [转载] 中国象棋软件-引擎实现(四)搜索算法
  • 原文地址:https://www.cnblogs.com/huyang1988/p/14869595.html
Copyright © 2011-2022 走看看