zoukankan      html  css  js  c++  java
  • hive行转列

    一、问题

    hive如何将

    a       1,2,3
    b       4,7
    c       5

    转化成为:

    a       1
    a       2
    a       3
    b       4
    b       7
    c       5

    二、原始数据

    cat row_column.txt
    a       1,2,3
    b       4,7
    c       5

    三、解决方案

    3.1 遍历每一列

    3.1.1 创建表

    -- 创建表
    create table tmp.row_column
    (
    col1 string,
    col3 string
    )
    row format delimited fields terminated by '	'
    stored as textfile;
    -- 载入数据
    load data local inpath '/tmp/row_column.txt' into table row_column;

    3.1.2 查看数据:

    hive> select * from row_column;                                          
    OK
    a       1,2,3
    b       4,7
    c       5

    3.1.3 遍历每一列

    select col1,name 
    from tmp.row_column
    lateral view explode(split(col3,',')) col3 as name;
    ---------------------------------------------------------------
    Total MapReduce CPU Time Spent: 2 seconds 20 msec
    OK
    a       1
    a       2
    a       3
    b       4
    b       7
    c       5

    3.2 数组遍历

    3.2.1 创建表

    create table tmp.row_column_array
    (
      col1 string,
      col3 array<int>
    )
    row format delimited 
    fields terminated by '	'
    collection items terminated by ','
    stored as textfile;

    3.2.2 加载数据

    load data local inpath '/tmp/row_column.txt' into table tmp.row_column_array;

    3.2.3 查看数据

    hive> select * from tmp.row_column_array;
    OK
    a       [1,2,3]
    b       [4,7]
    c       [5]

    3.2.4 查看每一列

    select col1,name
    from tmp.row_column_array
    lateral view explode(col3) col3 as name;

    3.2.5 结果

    a       1
    a       2
    a       3
    b       4
    b       7
    c       5

    四、补充

    查看使用逗号分割的列

    select t.list[0],t.list[1],t.list[2] from (
    select (split(col3,',')) list from tmp.row_column)t;
    Total MapReduce CPU Time Spent: 1 seconds 740 msec
    OK
    1       2       3
    4       7       NULL
    5       NULL    NULL
    Time taken: 15.264 seconds, Fetched: 3 row(s)

    查看长度

    select col1, size(split(col3,',')) list from tmp.row_column;
    Total MapReduce CPU Time Spent: 1 seconds 690 msec
    OK
    a       3
    b       2
    c       1
  • 相关阅读:
    工程的创建
    scrapy框架简介和基础应用
    移动端数据爬取
    Python网络爬虫之图片懒加载技术、selenium和PhantomJS
    验证码处理
    Python网络爬虫之requests模块
    Python网络爬虫之三种数据解析方式
    Python网络爬虫之requests模块
    scrapy
    基于XML的AOP配置
  • 原文地址:https://www.cnblogs.com/liqiu/p/4374294.html
Copyright © 2011-2022 走看看