  • Advanced Hive tips

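    1. Date format conversion (yyyyMMdd to yyyy-MM-dd)

    -- In Hive's date patterns MM is the month and mm is the minute, so the month must be written as MM
    select from_unixtime(unix_timestamp('20180905','yyyyMMdd'),'yyyy-MM-dd')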
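
    If the input is always an 8-digit string, the same result can also be built by slicing it, which involves no format pattern or time zone at all. A minimal sketch on the same literal:

    ```sql
    -- Slice the 8-digit string into year, month, day and join them with '-'
    select concat_ws('-',
                     substr('20180905', 1, 4),   -- year
                     substr('20180905', 5, 2),   -- month
                     substr('20180905', 7, 2));  -- day
    -- returns 2018-09-05
    ```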
    

    2. Remove every character other than letters and digits from a field

    select regexp_replace(a, '[^0-9a-zA-Z]', '') from tbl_name
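
    A quick way to check the pattern is to run it on a literal (the sample string is made up for illustration):

    ```sql
    -- Everything that is not 0-9, a-z or A-Z is replaced with the empty string
    select regexp_replace('ab-12_c #3', '[^0-9a-zA-Z]', '');
    -- returns ab12c3
    ```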

    3. Parse a JSON field
    Suppose the content column stores JSON such as {"score":"100","name":"zhou","class":"math"}. It can be parsed in the following ways.

    -- Extract one field per call
    select get_json_object(content,'$.score') ,
               get_json_object(content,'$.name'),
               get_json_object(content,'$.class')
     from tbl_name
    -- Extract several fields in one pass with json_tuple
    select a.*
          ,b.score
          ,b.name
          ,b.class
     from tbl a 
    lateral view outer json_tuple(a.content,'score', 'name', 'class') b as score,name,class
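
    Both functions can be tried on a literal before pointing them at a real table. A small sketch using the sample JSON with the keys score, name and class (the aliases s, n and c are arbitrary):

    ```sql
    -- get_json_object returns the value as a string, or NULL if the path is missing
    select get_json_object('{"score":"100","name":"zhou","class":"math"}', '$.score');
    -- returns 100

    -- json_tuple used directly in the select list, without a lateral view
    select json_tuple('{"score":"100","name":"zhou","class":"math"}',
                      'score', 'name', 'class') as (s, n, c);
    -- returns 100, zhou, math
    ```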

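    4. Load data into a table
    Add the local keyword when loading from the local file system; omit it when loading directly from an HDFS path. With overwrite, the table's existing data is replaced; without it, the file is added to what is already there.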

    load data [local] inpath '/data/monthcard.csv' overwrite into table tbl_name;
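
    In general load data only copies or moves the file into the table's directory and does not parse it, so the table's storage format has to match the file. A sketch of a matching table for a comma-separated file; the column names are hypothetical and not taken from the original:

    ```sql
    -- Hypothetical columns; adjust to the actual layout of monthcard.csv
    create table if not exists tbl_name (
        card_id   string,
        user_id   string,
        create_tm string
    )
    row format delimited
    fields terminated by ','
    stored as textfile;
    ```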
    

    5. Avoid scientific notation

    select printf("%.2f",3.428777027500007E7)
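
    Casting to a decimal is another way to get a plain fixed-point value, and it keeps the result numeric instead of turning it into a string. A sketch, where the precision 20 and scale 2 are arbitrary choices:

    ```sql
    -- printf returns a string; the cast returns a decimal(20,2)
    select printf("%.2f", 3.428777027500007E7),
           cast(3.428777027500007E7 as decimal(20,2));
    -- both columns display 34287770.28
    ```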

    6. Using collect_set and lateral view explode
    Source data

    id1    id2    name
    1       1       A
    1       1       B
    1       1       C
    1       2       X
    1       2       Y
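
    To reproduce the queries in this section, the sample rows can be loaded into a small table. A sketch, assuming Hive 0.14 or later (for insert ... values) and integer id columns; the table name matches the one used in the queries:

    ```sql
    create table if not exists default.zql_test (
        id1  int,
        id2  int,
        name string
    );

    insert into table default.zql_test values
        (1, 1, 'A'), (1, 1, 'B'), (1, 1, 'C'),
        (1, 2, 'X'), (1, 2, 'Y');
    ```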
    

    (1)collect_set

    select id1,id2,
    collect_set(name) as new_name1,
    collect_set(case when id2>1 then name end) as new_name2,
    count(name) as cnt
    from default.zql_test
    group by id1,id2;
    -- Output
    OK
    id1     id2     new_name1       new_name2       cnt
    1       1       ["C","A","B"]   []      3
    1       2       ["X","Y"]       ["X","Y"]       2
    

    (2)lateral view explode

    select * 
    from 
    (
    select id1,id2,
    collect_set(name) as new_name1,
    collect_set(case when id2>1 then name end) as new_name2,
    count(name) as cnt
    from default.zql_test
    group by id1,id2
    )t
    lateral view explode(new_name1) t as new_type1 
    lateral view explode(new_name2) t as new_type2
    -- Output
    OK
    t.id1   t.id2   t.new_name1     t.new_name2     t.cnt   t.new_type1     t.new_type2
    1       2       ["Y","X"]       ["Y","X"]       2       Y       Y
    1       2       ["Y","X"]       ["Y","X"]       2       Y       X
    1       2       ["Y","X"]       ["Y","X"]       2       X       Y
    1       2       ["Y","X"]       ["Y","X"]       2       X       X
    

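    (3) lateral view outer explode: with outer, rows whose array is empty are kept and the exploded column is NULL. Without it those rows are dropped, which is why the id2=1 group is missing from the output in (2). The difference between the two is covered in more detail in an earlier post.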

    select * 
    from 
    (
    select id1,id2,
    collect_set(name) as new_name1,
    collect_set(case when id2>1 then name end) as new_name2,
    count(name) as cnt
    from default.zql_test
    group by id1,id2
    )t
    lateral view outer explode(new_name1) t as new_type1 
    lateral view outer explode(new_name2) t as new_type2
    ;
    
    -- Output
    OK
    t.id1   t.id2   t.new_name1     t.new_name2     t.cnt   t.new_type1     t.new_type2
    1       1       ["B","A","C"]   []      3       B       NULL
    1       1       ["B","A","C"]   []      3       A       NULL
    1       1       ["B","A","C"]   []      3       C       NULL
    1       2       ["X","Y"]       ["X","Y"]       2       X       X
    1       2       ["X","Y"]       ["X","Y"]       2       X       Y
    1       2       ["X","Y"]       ["X","Y"]       2       Y       X
    1       2       ["X","Y"]       ["X","Y"]       2       Y       Y
    
    

    7. Take the top N percent of rows

    -- Split the rows of each group into two buckets
    ntile(2)over(partition by id order by create_tm)
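
    The expression on its own only labels each row with a bucket number; to actually keep the top share, filter on that number in an outer query. A sketch for the first half of each group, where tbl_name stands for any table that has id and create_tm columns:

    ```sql
    select *
    from (
        select a.*,
               ntile(2) over (partition by id order by create_tm) as bucket
        from tbl_name a
    ) x
    where bucket = 1;   -- bucket 1 holds the earliest half of each id's rows
    ```

    For the top 10 percent, use ntile(10) and keep bucket 1. When the row count does not divide evenly, the lower-numbered buckets receive the extra rows.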

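    8. Get the day of the week

    -- 2012-01-01 was a Sunday, so the day difference modulo 7 gives the weekday
    select pmod(datediff(from_unixtime(unix_timestamp()),'2012-01-01'),7) from default.dual;

    -- Returns a value from 0 to 6
    -- where 0 means Sunday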
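
    The formula can be checked against fixed dates. pmod, unlike the % operator, never returns a negative value, so dates before the reference Sunday also work:

    ```sql
    select pmod(datediff('2018-09-05', '2012-01-01'), 7);  -- returns 3 (Wednesday)
    select pmod(datediff('2011-12-31', '2012-01-01'), 7);  -- returns 6 (Saturday)
    ```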

    9. Generate a UUID

    select regexp_replace(reflect("java.util.UUID", "randomUUID"), "-", "");
    

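    10. Match Chinese characters

    -- a and tbl_name are placeholders; the expression is true when the column contains at least one character in the range \u4e00-\u9fa5
    select a regexp '[\u4e00-\u9fa5]' from tbl_name;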
    

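    11. Using regexp_extract
    regexp_extract(string subject, string regex_pattern, int index)
    Returns the part of subject that matches group number index of the regular expression regex_pattern.

    First argument: the field to process
    Second argument: the regular expression to match
    Third argument:
    0 returns the entire matched string
    1 returns the contents of the first parenthesized group
    2 returns the contents of the second parenthesized group, and so on

    Example:
    -- Extract a run of exactly 17 digits that has a non-digit on each side
    -- s1 (index 0) still includes the two bounding characters, s2 trims them off with substr, s3 (index 1) returns only the 17 digits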
    
    select regexp_extract('1、非订单号(20位):00123456789876543210;
                          2、订单号(17位):12345678987654321;
                          3、其它文字','[^\d](\d{17})[^\d]',0) as s1
    , substr(regexp_extract('1、非订单号(20位):01234567898765432100;
                          2、订单号(17位):12345678987654321;
                          3、其它文字','[^\d](\d{17})[^\d]',0),2,17) as s2
    ,regexp_extract('1、非订单号(20位):00123456789876543210;
                          2、订单号(17位):12345678987654321;
                          3、其它文字','[^\d](\d{17})[^\d]',1) as s3;
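
    A shorter illustration of the index argument, using a made-up string with two capture groups:

    ```sql
    select regexp_extract('foothebar', 'foo(.*?)(bar)', 0);  -- returns foothebar
    select regexp_extract('foothebar', 'foo(.*?)(bar)', 1);  -- returns the
    select regexp_extract('foothebar', 'foo(.*?)(bar)', 2);  -- returns bar
    ```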
    



    Source: https://www.jianshu.com/p/fe1cdd06f5f8

     

  • Original post: https://www.cnblogs.com/Allen-rg/p/10986311.html