  • Hive Learning Series, Part 4 (User-Defined Functions)

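    If the arguments are simple data types, extend UDF directly and implement one or more evaluate methods.

    The steps are as follows:

    1. Implement a UDF that converts uppercase characters to lowercase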

    package com.example.hive.udf;
    
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;
    
    public class Lower extends UDF {
        public Text evaluate(final Text s) {
            if (s == null) {
                return null;
            }
            return new Text(s.toString().toLowerCase());
        }
    }
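
    Hive locates the matching evaluate method by reflection, based on the argument types at the call site, which is why a UDF class may declare several overloads. The following is only a rough sketch of that idea in plain Java with no Hive dependency, so it can run on its own. LowerSketch, EvaluateDispatchSketch and the two-argument overload are hypothetical names made up for illustration, and Hive's real resolver also applies implicit type conversions that this sketch ignores.

```java
import java.lang.reflect.Method;

// Plain-Java stand-in for a UDF class (hypothetical, not from the original post).
// In a real UDF this class would extend org.apache.hadoop.hive.ql.exec.UDF.
final class LowerSketch {

    // One-argument form: lower-case the whole string.
    public String evaluate(String s) {
        return s == null ? null : s.toLowerCase();
    }

    // Two-argument form (hypothetical): lower-case only the first n characters.
    public String evaluate(String s, Integer n) {
        if (s == null || n == null) {
            return null;
        }
        int cut = Math.max(0, Math.min(n, s.length()));
        return s.substring(0, cut).toLowerCase() + s.substring(cut);
    }
}

final class EvaluateDispatchSketch {

    // Picks the evaluate overload whose parameter types match the runtime
    // classes of the arguments exactly. Arguments must be non-null here.
    static Object call(Object udf, Object... args) throws Exception {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        Method m = udf.getClass().getMethod("evaluate", types);
        return m.invoke(udf, args);
    }

    public static void main(String[] args) throws Exception {
        LowerSketch udf = new LowerSketch();
        System.out.println(call(udf, "AbcDEfg"));     // abcdefg
        System.out.println(call(udf, "AbcDEfg", 3));  // abcDEfg
    }
}
```

    With exact-type matching, a call with one String reaches the first overload and a call with a String and an Integer reaches the second.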
    

    2. Package it as a jar.

    Create a Maven project and build the jar with Maven.
    The jar built here is hiveudf-1.0.0.jar.

    3. Upload it to a path on HDFS.

    [root@master /opt]# hadoop fs -mkdir -p /user/hive/udf
    18/06/07 09:41:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [root@master /opt]# hadoop fs -put hiveudf-1.0.0.jar  /user/hive/udf
    18/06/07 09:41:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [root@master /opt]# hadoop fs -ls /user/hive/udf 
    18/06/07 09:41:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 1 items
    -rw-r--r--   3 root supergroup       8020 2018-06-07 09:41 /user/hive/udf/hiveudf-1.0.0.jar
    [root@master /opt]#
    

    4. Create the function in the Hive command line.

    add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar;
    create temporary function lower as 'com.example.hive.udf.Lower';
    
    hive> delete jar  hiveudf-1.0.0.jar;
    hive> list jars
        > ;
    hive> add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar
        > ;
    Added [/tmp/416cfcca-9ea0-4eaf-9e54-8154b440f3a9_resources/hiveudf-1.0.0.jar] to class path
    Added resources: [hdfs:///user/hive/udf/hiveudf-1.0.0.jar]
    hive> list jars;
    /tmp/416cfcca-9ea0-4eaf-9e54-8154b440f3a9_resources/hiveudf-1.0.0.jar
    hive> create temporary function lower as 'com.example.hive.udf.Lower';
    OK
    Time taken: 0.594 seconds
    hive> 
    

    5. The registered function can now be used.

    hive> select lower('AbcDEfg')
        > ;
    OK
    abcdefg
    Time taken: 1.718 seconds, Fetched: 1 row(s)
    hive> 
    
    

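    If the arguments are complex data types such as Array, extend GenericUDF instead.

    1. As before, first write a class, this time extending GenericUDF.

    This user-defined function converts a point, given by its latitude and longitude, into a geohash string.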

    package com.zbra.udf;
    
    
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.DoubleObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    
    /**
     * For complex data types
     */
    public class GeoUdf extends GenericUDF {
    
        private DoubleObjectInspector doubleObjectInspector01;
        private DoubleObjectInspector doubleObjectInspector02;
    
        public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
            if (objectInspectors.length != 2) {
                throw new UDFArgumentLengthException("geohash takes exactly 2 arguments: double, double");
            }
            // 1. Check that the arguments have the expected types
            ObjectInspector a = objectInspectors[0];
            ObjectInspector b = objectInspectors[1];
            if (!(a instanceof DoubleObjectInspector) || !(b instanceof DoubleObjectInspector)) {
                throw new UDFArgumentException("first argument must be a double, second argument must be a double");
            }
    
            this.doubleObjectInspector01 = (DoubleObjectInspector) a;
            this.doubleObjectInspector02 = (DoubleObjectInspector) b;
    
            return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
        }
    
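        public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
            // Check the raw values first: a SQL NULL arrives as a Java null, and
            // DoubleObjectInspector.get() returns a primitive double, so a null
            // check after unboxing could never be true.
            Object latObj = deferredObjects[0].get();
            Object lngObj = deferredObjects[1].get();

            if (latObj == null || lngObj == null) {
                return "";
            }

            double lat = this.doubleObjectInspector01.get(latObj);
            double lng = this.doubleObjectInspector02.get(lngObj);

            // GeoHash is the author's helper class; its source is not shown here.
            return new GeoHash(lat, lng).getGeoHashBase32();
        }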
    
        public String getDisplayString(String[] strings) {
            if (strings.length == 2) {
                return "geo_hash(" + strings[0] + ", " + strings[1] + ")";
            } else {
                return "invalid arguments...";
            }
        }
    }
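
    The GeoHash helper used in evaluate is not shown in the post. As a rough illustration of what such a class might compute, here is a minimal sketch of the standard public geohash encoding: bits alternate between longitude and latitude, starting with longitude, and every five bits become one base-32 character. GeoHashSketch and encode are hypothetical names, the 8-character length is an assumption taken from the length of the output in the session in this post, and the author's own implementation may differ.

```java
// Hypothetical stand-in for the GeoHash helper, which the original post does not show.
final class GeoHashSketch {

    // Standard geohash base-32 alphabet (digits and letters, without a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a latitude/longitude pair as a geohash of the given length. */
    static String encode(double lat, double lng, int length) {
        double latMin = -90.0, latMax = 90.0;
        double lngMin = -180.0, lngMax = 180.0;
        StringBuilder hash = new StringBuilder(length);
        boolean lngTurn = true; // bits alternate, starting with longitude
        int bits = 0;           // bits collected for the current character
        int value = 0;          // value of the current 5-bit group

        while (hash.length() < length) {
            if (lngTurn) {
                // Halve the longitude interval and record which half holds the point.
                double mid = (lngMin + lngMax) / 2;
                if (lng >= mid) {
                    value = (value << 1) | 1;
                    lngMin = mid;
                } else {
                    value = value << 1;
                    lngMax = mid;
                }
            } else {
                // Same step for the latitude interval.
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) {
                    value = (value << 1) | 1;
                    latMin = mid;
                } else {
                    value = value << 1;
                    latMax = mid;
                }
            }
            lngTurn = !lngTurn;
            if (++bits == 5) {
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Same point as in the Hive session: latitude 12.0, longitude 123.0.
        System.out.println(encode(12.0, 123.0, 8)); // wdpkqbtc
    }
}
```

    For latitude 12.0 and longitude 123.0 this sketch yields wdpkqbtc, the same value the geohash function returns in the Hive session.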
    

    2. Package it as a jar.

    In this article the jar is hiveudf-1.0.0.jar.

    3. Likewise, upload it to a path on HDFS.

    [root@master /opt]# hadoop fs -mkdir -p /user/hive/udf
    18/06/07 09:41:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [root@master /opt]# hadoop fs -put hiveudf-1.0.0.jar  /user/hive/udf
    18/06/07 09:41:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [root@master /opt]# hadoop fs -ls /user/hive/udf 
    18/06/07 09:41:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 1 items
    -rw-r--r--   3 root supergroup       8020 2018-06-07 09:41 /user/hive/udf/hiveudf-1.0.0.jar
    [root@master /opt]#
    

    4. Create the user-defined function.

    hive> list jars;
    /tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar
    hive> delete jar /tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar
        > ;
    Deleted [/tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar] from class path
    hive> add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar;
    Added [/tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar] to class path
    Added resources: [hdfs:///user/hive/udf/hiveudf-1.0.0.jar]
    hive> create temporary function geohash as 'com.zbra.udf.GeoUdf';
    OK
    Time taken: 0.145 seconds
    

    5. Use it as follows:

    hive> select geohash(12.0d, 123.0d);
    OK
    wdpkqbtc
    Time taken: 0.8 seconds, Fetched: 1 row(s)
    hive> select geohash(cast('12' as Double), cast('123' as Double));
    OK
    wdpkqbtc
    Time taken: 0.733 seconds, Fetched: 1 row(s)
    hive> 
    
  • Original article: https://www.cnblogs.com/unnunique/p/9362103.html