  • Hive: UDF and UDAF

    1. UDF
    package com.example.hive.udf;
    
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;
    
    public final class Lower extends UDF {
      public Text evaluate(final Text s) {
        if (s == null) { return null; }
        return new Text(s.toString().toLowerCase());
      }
    }

    add jar my_jar.jar; 

    create temporary function my_lower as 'com.example.hive.udf.Lower';  

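    The steps above show how to implement a UDF: first write the UDF class, then compile it into a jar and add it to Hive's classpath with add jar, and finally register a temporary function name (my_lower here) so that Hive queries can call it like a built-in function.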

    2. UDAF
    package org.apache.hadoop.hive.contrib.udaf.example;
    
    import org.apache.hadoop.hive.ql.exec.UDAF;
    import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
    
    /**
     * This is a simple UDAF that calculates average.
     * 
     * It should be very easy to follow and can be used as an example for writing
     * new UDAFs.
     * 
     * Note that Hive internally uses a different mechanism (called GenericUDAF) to
     * implement built-in aggregation functions, which are harder to program but
     * more efficient.
     * 
     */
    public final class UDAFExampleAvg extends UDAF {
    
      /**
       * The internal state of an aggregation for average.
       * 
       * Note that this is only needed if the internal state cannot be represented
       * by a primitive.
       * 
       * The internal state can also contain fields with types like
       * ArrayList<String> and HashMap<String,Double> if needed.
       */
      public static class UDAFAvgState {
        private long mCount;
        private double mSum;
      }
    
      /**
       * The actual class for doing the aggregation. Hive will automatically look
       * for all internal classes of the UDAF that implement UDAFEvaluator.
       */
      public static class UDAFExampleAvgEvaluator implements UDAFEvaluator {
    
        UDAFAvgState state;
    
        public UDAFExampleAvgEvaluator() {
          super();
          state = new UDAFAvgState();
          init();
        }
    
        /**
         * Reset the state of the aggregation.
         */
        public void init() {
          state.mSum = 0;
          state.mCount = 0;
        }
    
        /**
         * Iterate through one row of original data.
         * 
         * The number and type of arguments need to be the same as when this UDAF
         * is called from the Hive command line.
         * 
         * This function should always return true.
         */
        public boolean iterate(Double o) {
          if (o != null) {
            state.mSum += o;
            state.mCount++;
          }
          return true;
        }
    
        /**
         * Terminate a partial aggregation and return the state. If the state is a
         * primitive, just return primitive Java classes like Integer or String.
         */
        public UDAFAvgState terminatePartial() {
          // This is SQL standard - average of zero items should be null.
          return state.mCount == 0 ? null : state;
        }
    
        /**
         * Merge with a partial aggregation.
         * 
         * This function should always have a single argument which has the same
         * type as the return value of terminatePartial().
         */
        public boolean merge(UDAFAvgState o) {
          if (o != null) {
            state.mSum += o.mSum;
            state.mCount += o.mCount;
          }
          return true;
        }
    
        /**
         * Terminate the aggregation and return the final result.
         */
        public Double terminate() {
          // This is SQL standard - average of zero items should be null.
          return state.mCount == 0 ? null : Double.valueOf(state.mSum
              / state.mCount);
        }
      }
    
      private UDAFExampleAvg() {
        // prevent instantiation
      }
    
    }

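    Notes on developing a UDAF:

    1. Import org.apache.hadoop.hive.ql.exec.UDAF and org.apache.hadoop.hive.ql.exec.UDAFEvaluator; both are required.

    2. The function class must extend UDAF, and its inner Evaluator class must implement the UDAFEvaluator interface.

    3. The Evaluator must provide the methods init, iterate, terminatePartial, merge, and terminate:

        1) init plays a role similar to a constructor: it initializes (resets) the aggregation state.

        2) iterate receives the arguments passed in for each row and updates the internal state. Its return type is boolean.

        3) terminatePartial takes no arguments. It is called once iterate has finished processing its rows and returns the partial aggregation state. Together, iterate and terminatePartial play a role similar to a Hadoop Combiner.

        4) merge receives the value returned by terminatePartial and merges it into the current state. Its return type is boolean.

        5) terminate returns the final result of the aggregate function.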
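
The order in which these five methods are called can be easier to see in a small simulation. The following is a minimal plain-Java sketch, not Hive code: it uses no Hive classes, and the names AvgLifecycleSketch, State, Evaluator, and average are hypothetical, introduced only for illustration. It assumes each input split is processed by its own evaluator (init, iterate, terminatePartial) and that one further evaluator combines the partial states (merge, terminate). How Hive actually schedules these calls is decided by the engine.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the evaluator lifecycle. No Hive classes are used;
// every name in this file is hypothetical.
final class AvgLifecycleSketch {

  /** Partial aggregation state, mirroring UDAFAvgState in the listing above. */
  static final class State {
    long count;
    double sum;
  }

  /** Mirrors the five evaluator methods without implementing UDAFEvaluator. */
  static final class Evaluator {
    private final State state = new State();

    Evaluator() {
      init();
    }

    /** Reset the aggregation state. */
    void init() {
      state.sum = 0;
      state.count = 0;
    }

    /** Consume one input row; null values are skipped. */
    boolean iterate(Double value) {
      if (value != null) {
        state.sum += value;
        state.count++;
      }
      return true;
    }

    /** Hand back the partial state, or null if no rows were counted. */
    State terminatePartial() {
      return state.count == 0 ? null : state;
    }

    /** Fold another evaluator's partial state into this one. */
    boolean merge(State other) {
      if (other != null) {
        state.sum += other.sum;
        state.count += other.count;
      }
      return true;
    }

    /** Produce the final average, or null for zero rows. */
    Double terminate() {
      return state.count == 0 ? null : Double.valueOf(state.sum / state.count);
    }
  }

  /** Hypothetical driver: one evaluator per split, then one merging evaluator. */
  static Double average(List<List<Double>> splits) {
    Evaluator finalEvaluator = new Evaluator();
    for (List<Double> split : splits) {
      Evaluator partial = new Evaluator();
      for (Double row : split) {
        partial.iterate(row);
      }
      finalEvaluator.merge(partial.terminatePartial());
    }
    return finalEvaluator.terminate();
  }

  public static void main(String[] args) {
    List<List<Double>> splits = Arrays.asList(
        Arrays.asList(1.0, 2.0, null),
        Arrays.asList(3.0, 6.0));
    System.out.println(average(splits)); // prints 3.0
  }
}
```

In this sketch the partial state carries both the sum and the count rather than a per-split average, which is why merging partial results still yields the correct overall average even when the splits have different sizes.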

  • Original article: https://www.cnblogs.com/liutoutou/p/3546693.html