zoukankan      html  css  js  c++  java
  • 2.13 Hive中自带Function使用及自定义UDF编程

    UDF:User Definition Function

    一、function

    #查看自带的函数
    hive (db_hive)> show functions;
    
    #查看一个函数的详细用法
    hive (db_hive)> desc function extended split;
    OK
    tab_name
    split(str, regex) - Splits str around occurances that match regex
    Example:
      > SELECT split('oneAtwoBthreeC', '[ABC]') FROM src LIMIT 1;
      ["one", "two", "three"]
    Time taken: 0.005 seconds, Fetched: 4 row(s)


    二、UDF

    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

    https://cwiki.apache.org/confluence/display/Hive/HivePlugins        #自定义UDF

    Hive自带了一些函数,比如:max/min等,但是数量有限,自己可以通过自定义UDF来方便的扩展。
    
    UDF:用户自定义函数,允许用户扩展HiveQL功能;
    
    
    
    ##
    UDF(User-Defined-Function)
    一进一出
    
    UDAF(User-Defined Aggregation Funcation)
    聚集函数,多进一出;类似于:count/max/min 
    
    UDTF(User-Defined Table-Generating Functions)
    一进多出;如lateral view explore()
    
    
    
    编程步骤:
    1、继承org.apache.hadoop.hive.ql.UDF
    2、需要实现evaluate函数;evaluate函数支持重载;
    
    注意事项:
    1、UDF必须要有返回类型,可以返回null,但是返回类型不能为void;
    2、UDF中常用Text/LongWritable等类型,不推荐使用java类型;


    创建一个UDF-方式一:

    1、Creating Custom UDFs

    ### LowerUDF.java###
    package com.beifeng.senior.hive.udf;
    
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;
    
    /**
     * 1. Implement one or more methods named
     * "evaluate" which will be called by Hive.
     * 
     * 2."evaluate" should never be a void method. However it can return "null" if
     * needed.
     * @author root
     *
     */
    
    public class LowerUDF extends UDF{
        
        public Text evaluate(Text str) {
            //validate
            if(null == str.toString()) {
                return null;
            }
            //lower
            return new Text (str.toString().toLowerCase());
        }
    
        public static void main(String[] args) {
            System.out.println(new LowerUDF().evaluate(new Text("HIVE")));
        }
    }
    
    
    
    #然后打成jar包
    [root@hadoop-senior datas]# pwd
    /opt/datas
    [root@hadoop-senior datas]# ls hiveudf.jar 
    hiveudf.jar


    2、usage

    #添加
    hive (db_hive)> add jar /opt/datas/hiveudf.jar;
    Added /opt/datas/hiveudf.jar to class path
    Added resource: /opt/datas/hiveudf.jar
    
    
    #注册,my_lower是要注册的函数名,com.beifeng.senior.hive.udf.LowerUDF是类名
    hive (db_hive)> create temporary function my_lower as "com.beifeng.senior.hive.udf.LowerUDF";
    OK
    Time taken: 0.012 seconds
    
    
    #查看
    hive (db_hive)> show functions;
    ...
    my_lower
    ...
    
    
    #测试使用
    hive (db_hive)> select ename, my_lower(ename) lowername from emp limit 5;
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1554717689707_0031, Tracking URL = http://hadoop-senior.ibeifeng.com:8088/proxy/application_1554717689707_0031/
    Kill Command = /opt/modules/hadoop-2.5.0/bin/hadoop job  -kill job_1554717689707_0031
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2019-04-24 15:32:42,268 Stage-1 map = 0%,  reduce = 0%
    2019-04-24 15:32:47,387 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.28 sec
    MapReduce Total cumulative CPU time: 1 seconds 280 msec
    Ended Job = job_1554717689707_0031
    MapReduce Jobs Launched: 
    Job 0: Map: 1   Cumulative CPU: 1.28 sec   HDFS Read: 894 HDFS Write: 60 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 280 msec
    OK
    ename    lowername
    SMITH    smith
    ALLEN    allen
    WARD    ward
    JONES    jones
    MARTIN    martin
    Time taken: 10.548 seconds, Fetched: 5 row(s)


    创建一个UDF-方式二:

    此方法jar包要位于hdfs上;

    CREATE FUNCTION myfunc AS 'myclass' USING JAR 'hdfs:///path/to/jar';

    1、

    ##上传jar包到hdfs
    hive (db_hive)> dfs -mkdir -p /user/root/hive/jars/;
    hive (db_hive)> dfs -put /opt/datas/hiveudf.jar /user/root/hive/jars/;
    hive (db_hive)> dfs -ls -R /user/root/hive/jars;
    -rw-r--r--   1 root supergroup        910 2019-04-24 15:40 /user/root/hive/jars/hiveudf.jar
    
    
    #创建function
    hive (db_hive)> create function self_lower as 'com.beifeng.senior.hive.udf.LowerUDF' using jar 'hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hive/jars/hiveudf.jar';
    converting to local hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hive/jars/hiveudf.jar
    Added /tmp/5356b66f-bf56-4de6-abf8-30be8029fa8b_resources/hiveudf.jar to class path
    Added resource: /tmp/5356b66f-bf56-4de6-abf8-30be8029fa8b_resources/hiveudf.jar
    OK
    Time taken: 0.025 seconds
    
    
    #使用
    hive (db_hive)> select ename, self_lower(ename) lowername from emp limit 5;                                                                                             
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1554717689707_0032, Tracking URL = http://hadoop-senior.ibeifeng.com:8088/proxy/application_1554717689707_0032/
    Kill Command = /opt/modules/hadoop-2.5.0/bin/hadoop job  -kill job_1554717689707_0032
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2019-04-24 15:53:28,378 Stage-1 map = 0%,  reduce = 0%
    2019-04-24 15:53:33,504 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.35 sec
    MapReduce Total cumulative CPU time: 1 seconds 350 msec
    Ended Job = job_1554717689707_0032
    MapReduce Jobs Launched: 
    Job 0: Map: 1   Cumulative CPU: 1.35 sec   HDFS Read: 894 HDFS Write: 60 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 350 msec
    OK
    ename    lowername
    SMITH    smith
    ALLEN    allen
    WARD    ward
    JONES    jones
    MARTIN    martin
    Time taken: 10.549 seconds, Fetched: 5 row(s)
  • 相关阅读:
    Java59道常见面试题,内附答案
    2020年一个月时间面试字节跳动,面试经历分享(已拿offer)
    阿里总结的《Java成神之路》 PDF 火了,完整版开放下载!
    Github14k堪称神级的Spring Boot手册,从基础入门到实战进阶
    三面阿里(支付宝)Java高开岗,复习两月有幸拿到offer
    Java 最常见的 200+ 面试题:面试必备
    【面试总结】终于拿到了美团offer了,没有辜负了这三个月的努力啊
    阿里P9都赞不绝口的面试宝典!半月看完25大专题,居然斩获阿里P7offer
    Integer是int
    抽象类和接口
  • 原文地址:https://www.cnblogs.com/weiyiming007/p/10762936.html
Copyright © 2011-2022 走看看