zoukankan      html  css  js  c++  java
  • Hadoop3集群搭建之——hive添加自定义函数UDTF (一行输入,多行输出)

    上篇:

    Hadoop3集群搭建之——虚拟机安装

    Hadoop3集群搭建之——安装hadoop,配置环境

    Hadoop3集群搭建之——配置ntp服务

    Hadoop3集群搭建之——hive安装

    Hadoop3集群搭建之——hbase安装及简单操作

    Hadoop3集群搭建之——hive添加自定义函数UDF

    Hadoop3集群搭建之——hive添加自定义函数UDTF

    上篇中,udtf函数,只有为一行输入,一行输出。udtf是可以一行输入,多行输出的。

    简述下需求:  

    输入开始时间,结束时间,返回每个小时的时长

    直接上代码:

    package com.venn.udtf;
    
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    
    import java.util.ArrayList;
    
    /**
     * Created by venn on 5/20/2018.
     * SplitHour : split hour
     */
    public class SplitHour extends GenericUDTF {
    
        /**
         * add the column name
         * @param args
         * @return
         * @throws UDFArgumentException
         */
        @Override
        public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1) {
                throw new UDFArgumentLengthException("ExplodeMap takes only one argument");
            }
            if (args[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
                throw new UDFArgumentException("ExplodeMap takes string as a parameter");
            }
    
            ArrayList<String> fieldNames = new ArrayList<String>();
            ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
            fieldNames.add("begintime");
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            fieldNames.add("endtime");
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            fieldNames.add("hour");
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            fieldNames.add("seconds");
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    
    
            return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
        }
    
        /**
         * process the column
         * @param objects
         * @throws HiveException
         */
        public void process(Object[] objects) throws HiveException {
    
    
            String [] input = objects[0].toString().split(",");
            // 2018-06-06 10:25:35
            String beginTime = input[0];
            String endTime = input[1];
    
            String[] result = new String[4];
            result[0] = beginTime;
            result[1] = endTime;
    
            // begintime
            int bhour = Integer.parseInt(beginTime.substring(11, 13));
            int bmin = Integer.parseInt(beginTime.substring(14, 16));
            int bsecond = Integer.parseInt(beginTime.substring(17, 19));
            // endtime
            int ehour = Integer.parseInt(endTime.substring(11, 13));
            int emin = Integer.parseInt(endTime.substring(14, 16));
            int esecond = Integer.parseInt(endTime.substring(17, 19));
    
            // 1.if begin hour equal end hour, second is : (emin - bmin) * 60 + (esecond - bsecond)
            if (bhour == ehour) {
                result[2] = String.valueOf(bhour);
                result[3] = String.valueOf((emin - bmin) * 60 + (esecond - bsecond));
                forward(result);
                return;
            }
    
            boolean flag = true;
            //TODO 待优化,先输出第一个循环的时长,再循环后面的就不用判断
            while (bhour != ehour) {
                result[2] = String.valueOf(bhour);
    
                if(flag){
                    flag = false;
                // 2. if begintime hour != endtime, the first hour, second is : 3600 - bmin * 60 - bsecond
                    result[3] = String.valueOf(3600 - bmin * 60 - bsecond);
                }else {
                    // 3. next hour is 3600
                    result[3] = String.valueOf(3600);
                }
                bhour += 1;
                // 输出到hive
                forward(result);
            }
    
            result[2] = String.valueOf(bhour);
            // 4. the end hour is : emin  * 60 + esecond
            result[3] = String.valueOf( emin  * 60 + esecond);
            forward(result);
    
        }
    
        public void close() throws HiveException {
    
        }
    
    }

    udtf 函数介绍参加上篇

    使用方式见上篇

    Hadoop3集群搭建之——hive添加自定义函数UDTF

    样例:

    hive> select split_hour( concat(begintime,',',endtime)) from viewlog where log_date=20180401 limit 10;
    OK
    begintime    endtime    hour    seconds
    2018-04-01 10:26:14    2018-04-01 10:26:21    10    7
    2018-04-01 07:21:47    2018-04-01 07:22:23    7    36
    2018-04-01 15:18:08    2018-04-01 15:18:11    15    3
    2018-04-01 18:05:13    2018-04-01 18:05:28    18    15
    2018-04-01 07:18:34    2018-04-01 07:18:52    7    18
    2018-04-01 23:28:32    2018-04-01 23:29:44    23    72
    2018-04-01 06:34:11    2018-04-01 06:34:17    6    6
    2018-04-01 14:02:40    2018-04-01 14:03:33    14    53
    2018-04-01 17:30:23    2018-04-01 17:30:26    17    3
    2018-04-01 12:15:07    2018-04-01 12:15:11    12    4
    2018-04-01 06:53:40    2018-04-01 07:02:09    6    380
    2018-04-01 06:53:40    2018-04-01 07:02:09    7    129
    Time taken: 2.238 seconds, Fetched: 12 row(s)

    搞定

  • 相关阅读:
    js字符串截取函数slice()、substring()、substr()
    js获取字符串最后一位方法
    支持xhr浏览器:超时设定、加载事件、进度事件
    深入理解ajax系列第一篇——XHR对象
    MySQL命令行操作
    nodejs中mysql用法
    大衍数列
    牌型种数
    加法变乘法
    三羊献瑞
  • 原文地址:https://www.cnblogs.com/Springmoon-venn/p/9286670.html
Copyright © 2011-2022 走看看