zoukankan      html  css  js  c++  java
  • hive学习笔记之七:内置函数

    欢迎访问我的GitHub

    https://github.com/zq2599/blog_demos

    内容:所有原创文章分类汇总及配套源码,涉及Java、Docker、Kubernetes、DevOPS等;

    《hive学习笔记》系列导航

    1. 基本数据类型
    2. 复杂数据类型
    3. 内部表和外部表
    4. 分区表
    5. 分桶
    6. HiveQL基础
    7. 内置函数
    8. Sqoop
    9. 基础UDF
    10. 用户自定义聚合函数(UDAF)
    11. UDTF

    本篇概览

    • 本文是《hive学习笔记》系列的第七篇,前文熟悉了HiveQL的常用语句,接下来把常用的内置函数简单过一遍,分为以下几部分:
    1. 数学
    2. 字符
    3. json处理
    4. 转换
    5. 日期
    6. 条件
    7. 聚合

    准备数据

    1. 本次实战要准备两个表:学生表和住址表,字段都很简单,如下图所示,学生表有个住址ID字段,是住址表里的记录的唯一ID:

    在这里插入图片描述
    2. 先创建住址表:

    create table address (addressid int, province string, city string) 
    row format delimited 
    fields terminated by ',';
    
    1. 创建address.txt文件,内容如下:
    1,guangdong,guangzhou
    2,guangdong,shenzhen
    3,shanxi,xian
    4,shanxi,hanzhong
    6,jiangshu,nanjing
    
    1. 加载数据到address表:
    load data 
    local inpath '/home/hadoop/temp/202010/25/address.txt' 
    into table address;
    
    1. 创建学生表,其addressid字段关联了address表的addressid字段:
    create table student (name string, age int, addressid int) 
    row format delimited 
    fields terminated by ',';
    
    1. 创建student.txt文件,内容如下:
    tom,11,1
    jerry,12,2
    mike,13,3
    john,14,4
    mary,15,5
    
    1. 加载数据到student表:
    load data 
    local inpath '/home/hadoop/temp/202010/25/student.txt' 
    into table student;
    
    1. 至此,本次操作所需数据已准备完毕,如下所示:
    hive> select * from address;
    OK
    1	guangdong	guangzhou
    2	guangdong	shenzhen
    3	shanxi	xian
    4	shanxi	hanzhong
    6	jiangshu	nanjing
    Time taken: 0.043 seconds, Fetched: 5 row(s)
    hive> select * from student;
    OK
    tom	11	1
    jerry	12	2
    mike	13	3
    john	14	4
    mary	15	5
    Time taken: 0.068 seconds, Fetched: 5 row(s)
    
    • 开始体验内置函数;

    总览

    1. 进入hive控制台;
    2. 执行命令show functions;显示内置函数列表:
    hive> show functions;
    OK
    !
    !=
    %
    &
    *
    +
    -
    /
    <
    <=
    <=>
    <>
    =
    ==
    >
    >=
    ^
    abs
    acos
    add_months
    and
    array
    array_contains
    ascii
    asin
    assert_true
    atan
    avg
    base64
    between
    bin
    case
    cbrt
    ceil
    ceiling
    coalesce
    collect_list
    collect_set
    compute_stats
    concat
    concat_ws
    context_ngrams
    conv
    corr
    cos
    count
    covar_pop
    covar_samp
    create_union
    cume_dist
    current_database
    current_date
    current_timestamp
    current_user
    date_add
    date_format
    date_sub
    datediff
    day
    dayofmonth
    decode
    degrees
    dense_rank
    div
    e
    elt
    encode
    ewah_bitmap
    ewah_bitmap_and
    ewah_bitmap_empty
    ewah_bitmap_or
    exp
    explode
    factorial
    field
    find_in_set
    first_value
    floor
    format_number
    from_unixtime
    from_utc_timestamp
    get_json_object
    greatest
    hash
    hex
    histogram_numeric
    hour
    if
    in
    in_file
    index
    initcap
    inline
    instr
    isnotnull
    isnull
    java_method
    json_tuple
    lag
    last_day
    last_value
    lcase
    lead
    least
    length
    levenshtein
    like
    ln
    locate
    log
    log10
    log2
    lower
    lpad
    ltrim
    map
    map_keys
    map_values
    matchpath
    max
    min
    minute
    month
    months_between
    named_struct
    negative
    next_day
    ngrams
    noop
    noopstreaming
    noopwithmap
    noopwithmapstreaming
    not
    ntile
    nvl
    or
    parse_url
    parse_url_tuple
    percent_rank
    percentile
    percentile_approx
    pi
    pmod
    posexplode
    positive
    pow
    power
    printf
    radians
    rand
    rank
    reflect
    reflect2
    regexp
    regexp_extract
    regexp_replace
    repeat
    reverse
    rlike
    round
    row_number
    rpad
    rtrim
    second
    sentences
    shiftleft
    shiftright
    shiftrightunsigned
    sign
    sin
    size
    sort_array
    soundex
    space
    split
    sqrt
    stack
    std
    stddev
    stddev_pop
    stddev_samp
    str_to_map
    struct
    substr
    substring
    sum
    tan
    to_date
    to_unix_timestamp
    to_utc_timestamp
    translate
    trim
    trunc
    ucase
    unbase64
    unhex
    unix_timestamp
    upper
    var_pop
    var_samp
    variance
    weekofyear
    when
    windowingtablefunction
    xpath
    xpath_boolean
    xpath_double
    xpath_float
    xpath_int
    xpath_long
    xpath_number
    xpath_short
    xpath_string
    year
    |
    ~
    Time taken: 0.003 seconds, Fetched: 216 row(s)
    
    1. lower函数为例,执行命令describe function lower;即可查看lower函数的说明:
    hive> describe function lower;
    OK
    lower(str) - Returns str with all characters changed to lowercase
    Time taken: 0.005 seconds, Fetched: 1 row(s)
    
    • 接下来从计算函数开始,体验常用函数;
    • 先执行以下命令,使查询结果中带有字段名:
    set hive.cli.print.header=true;
    

    计算函数

    1. 加法+
    hive> select name, age, age+1 as add_value from student;
    OK
    name	age	add_value
    tom	11	12
    jerry	12	13
    mike	13	14
    john	14	15
    mary	15	16
    Time taken: 0.098 seconds, Fetched: 5 row(s)
    
    1. 减法(-)、乘法(*)、除法(/)的使用与加法类似,不再赘述了;
    2. 四舍五入round
    hive> select round(1.1), round(1.6);
    OK
    _c0	_c1
    1.0	2.0
    Time taken: 0.028 seconds, Fetched: 1 row(s)
    
    1. 向上取整ceil
    hive> select ceil(1.1);
    OK
    _c0
    2
    Time taken: 0.024 seconds, Fetched: 1 row(s)
    
    1. 向下取整floor
    hive> select floor(1.1);
    OK
    _c0
    1
    Time taken: 0.024 seconds, Fetched: 1 row(s)
    
    1. 平方pow,例如pow(2,3)表示2的三次方,等于8:
    hive> select pow(2,3);
    OK
    _c0
    8.0
    Time taken: 0.027 seconds, Fetched: 1 row(s)
    
    1. 取模pmod
    hive> select pmod(10,3);
    OK
    _c0
    1
    Time taken: 0.059 seconds, Fetched: 1 row(s)
    

    字符函数

    1. 转小写lower,转大写upper
    hive> select lower(name), upper(name) from student;
    OK
    _c0	_c1
    tom	TOM
    jerry	JERRY
    mike	MIKE
    john	JOHN
    mary	MARY
    Time taken: 0.051 seconds, Fetched: 5 row(s)
    
    1. 字符串长度length
    hive> select name, length(name) from student;
    OK
    tom	3
    jerry	5
    mike	4
    john	4
    mary	4
    Time taken: 0.322 seconds, Fetched: 5 row(s)
    
    1. 字符串拼接concat
    hive> select concat("prefix_", name) from student;
    OK
    prefix_tom
    prefix_jerry
    prefix_mike
    prefix_john
    prefix_mary
    Time taken: 0.106 seconds, Fetched: 5 row(s)
    
    1. 子串substr,substr(xxx,2)表示从第二位开始到右边所有,substr(xxx,2,3)表示从第二位开始取三个字符:
    hive> select substr("0123456",2);
    OK
    123456
    Time taken: 0.067 seconds, Fetched: 1 row(s)
    hive> select substr("0123456",2,3);
    OK
    123
    Time taken: 0.08 seconds, Fetched: 1 row(s)
    
    1. 去掉前后空格trim
    hive> select trim("   123   ");
    OK
    123
    Time taken: 0.065 seconds, Fetched: 1 row(s)
    

    json处理(get_json_object)

    为了使用json处理的函数,先准备一些数据:

    1. 先创建表t15,只有一个字段用于保存字符串:
    create table t15(json_raw string) 
    row format delimited;
    
    1. 创建t15.txt文件,内容如下:
    {"name":"tom","age":"10"}
    {"name":"jerry","age":"11"}
    
    1. 加载数据到t15表:
    load data 
    local inpath '/home/hadoop/temp/202010/25/015.txt' 
    into table t15;
    
    1. 使用get_json_object函数,解析json_raw字段,分别取出指定nameage属性:
    select 
    get_json_object(json_raw, "$.name"), 
    get_json_object(json_raw, "$.age") 
    from t15;
    

    得到结果:

    hive> select 
        > get_json_object(json_raw, "$.name"), 
        > get_json_object(json_raw, "$.age") 
        > from t15;
    OK
    tom	10
    jerry	11
    Time taken: 0.081 seconds, Fetched: 2 row(s)
    

    日期

    1. 获取当前日期current_date
    hive> select current_date();
    OK
    2020-11-02
    Time taken: 0.052 seconds, Fetched: 1 row(s)
    
    1. 获取当前时间戳current_timestamp
    hive> select current_timestamp();
    OK
    2020-11-02 10:07:58.967
    Time taken: 0.049 seconds, Fetched: 1 row(s)
    
    1. 获取年份year、月份month、日期day
    hive> select year(current_date()), month(current_date()), day(current_date());
    OK
    2020	11	2
    Time taken: 0.054 seconds, Fetched: 1 row(s)
    
    1. 另外,yearcurrent_timestamp也能搭配使用:
    hive> select year(current_timestamp()), month(current_timestamp()), day(current_timestamp());
    OK
    2020	11	2
    Time taken: 0.042 seconds, Fetched: 1 row(s)
    
    1. 返回日期部分to_date
    hive> select to_date(current_timestamp());
    OK
    2020-11-02
    Time taken: 0.051 seconds, Fetched: 1 row(s)
    

    条件函数

    • 条件函数的作用和java中的switch类似,语法是case X when XX then XXX else XXXX end
    • 示例如下,作用是判断name字段,如果等于tom就返回tom_case,如果等于jerry就返回jerry_case,其他情况都返回other_case
    select name,
    case name when 'tom' then 'tom_case'
              when 'jerry' then 'jerry_case'
              else 'other_case'
    end
    from student;
    

    结果如下:

    hive> select name,
        > case name when 'tom' then 'tom_case'
        >           when 'jerry' then 'jerry_case'
        >           else 'other_case'
        > end
        > from student;
    OK
    tom	tom_case
    jerry	jerry_case
    mike	other_case
    john	other_case
    mary	other_case
    Time taken: 0.08 seconds, Fetched: 5 row(s)
    

    聚合函数

    1. 返回行数count
    select count(*) from student;
    

    触发MR,结果如下:

    Total MapReduce CPU Time Spent: 2 seconds 170 msec
    OK
    5
    Time taken: 20.823 seconds, Fetched: 1 row(s)
    
    1. 分组后组内求和sum
    select province, sum(1) from address group by province;
    

    触发MR,结果如下:

    Total MapReduce CPU Time Spent: 1 seconds 870 msec
    OK
    guangdong	2
    jiangshu	1
    shanxi	2
    Time taken: 19.524 seconds, Fetched: 3 row(s)
    
    1. 分组后,组内最小值min,最大值max,平均值avg
    select province, min(addressid), max(addressid), avg(addressid) from address group by province;
    

    触发MR,结果如下:

    Total MapReduce CPU Time Spent: 1 seconds 650 msec
    OK
    guangdong	1	2	1.5
    jiangshu	6	6	6.0
    shanxi	3	4	3.5
    Time taken: 20.106 seconds, Fetched: 3 row(s)
    
    • 至此,hive常用到内置函数咱们都体验过一遍了,希望能给您提供一些参考,接下来的文章会体验一个常用工具:Sqoop

    你不孤单,欣宸原创一路相伴

    1. Java系列
    2. Spring系列
    3. Docker系列
    4. kubernetes系列
    5. 数据库+中间件系列
    6. DevOps系列

    欢迎关注公众号:程序员欣宸

    微信搜索「程序员欣宸」,我是欣宸,期待与您一同畅游Java世界...
    https://github.com/zq2599/blog_demos

  • 相关阅读:
    题解 CF432D 【Prefixes and Suffixes】
    题解 P3199 【[HNOI2009]最小圈】
    题解 AT1221 【水筒】
    题解 P4381 【[IOI2008]Island】
    题解 CF1242B 【0-1 MST】
    《数据库系统概论》 -- 3.1 SQL概论和DDL
    《数据库系统概论》 -- 2. 关系操作
    《数据库系统概论》 -- 1. 绪论
    配置vim成为IDE
    ubuntu--软件管理
  • 原文地址:https://www.cnblogs.com/bolingcavalry/p/14975212.html
Copyright © 2011-2022 走看看