Flink 函数:
org.apache.flink.api.common.functions
org.apache.flink.api.java.functions
org.apache.flink.streaming.api.functions
group
reduce
combine
public interface FilterFunction<T> extends Function, Serializable {
boolean filter(T var1) throws Exception;
}
public interface MapFunction<T, O> extends Function, Serializable {
O map(T var1) throws Exception;
}
public interface FlatMapFunction<T, O> extends Function, Serializable {
void flatMap(T var1, Collector<O> var2) throws Exception;
}
2.聚合
Flink在reduce之前大量使用了Combine操作。
Combine可以理解为是在map端的reduce的操作,对单个map任务的输出结果数据进行合并的操作
combine是对一个map的
map函数操作所产生的键值对会作为combine函数的输入,
经combine函数处理后再送到reduce函数进行处理,
减少了写入磁盘的数据量,同时也减少了网络中键值对的传输量
---------------------------------------------------------------
public interface CombineFunction<IN, OUT> extends Function, Serializable {
OUT combine(Iterable<IN> var1) throws Exception;
}
public interface ReduceFunction<T> extends Function, Serializable {
T reduce(T var1, T var2) throws Exception;
}
public interface AggregateFunction<IN, ACC, OUT> extends Function, Serializable {
ACC createAccumulator();
ACC add(IN var1, ACC var2);
ACC merge(ACC var1, ACC var2);
OUT getResult(ACC var1);
}
----------------------------------------------------------------------
3.关联
--Join操作DataStream时只能用在window中,和cogroup操作一样
------关联---------这种 JoinFunction 和 FlatJoinFunction 与 MapFunction 和 FlatMapFunction 的关系类似------------
public interface JoinFunction<IN1, IN2, OUT> extends Function, Serializable {
OUT join(IN1 var1, IN2 var2) throws Exception;
}
public interface FlatJoinFunction<IN1, IN2, OUT> extends Function, Serializable {
void join(IN1 var1, IN2 var2, Collector<OUT> var3) throws Exception;
}
public interface CrossFunction<IN1, IN2, OUT> extends Function, Serializable {
OUT cross(IN1 var1, IN2 var2) throws Exception;
}
-- CoGroup 两个数据流/集合按照key进行group,然后将相同key的数据进行处理,
--- 但是它和join操作稍有区别,它在一个流/数据集中没有找到与另一个匹配的数据还是会输出。
-- 如果是 DataStream中 则需要在window中使用
--- 如果在Dataser中
public interface CoGroupFunction<IN1, IN2, O> extends Function, Serializable {
void coGroup(Iterable<IN1> var1, Iterable<IN2> var2, Collector<O> var3) throws Exception;
}
4.分组
-- 分组reduce,即 GroupReduce GroupReduceFunction
-- 这与Reduce的区别在于用户定义的函数会立即获得整个组。在组的所有元素上使用Iterable调用该函数,
并且可以返回任意数量的结果元素
public interface GroupReduceFunction<T, O> extends Function, Serializable {
void reduce(Iterable<T> var1, Collector<O> var2) throws Exception;
}
-- 使group-reduce函数可组合,它必须实现GroupCombineFunction接口
-- GroupCombine转换是可组合GroupReduceFunction中组合步骤的通用形式。
它在某种意义上被概括为允许将输入类型I组合到任意输出类型O.
-- 相反,GroupReduce中的组合步骤仅允许从输入类型I到输出类型I的组合。
这是因为reduce步骤中, GroupReduceFunction 期望输入类型为I.
public interface GroupCombineFunction<IN, OUT> extends Function, Serializable {
void combine(Iterable<IN> var1, Collector<OUT> var2) throws Exception;
}
5.分区
-----------------------------------------------
public interface Partitioner<K> extends Serializable, Function {
int partition(K var1, int var2);
}
public interface MapPartitionFunction<T, O> extends Function, Serializable {
void mapPartition(Iterable<T> var1, Collector<O> var2) throws Exception;
}
6.其他
其他
org.apache.flink.streaming.api.functions.co;
public interface CoMapFunction<IN1, IN2, OUT> extends Function, Serializable {
OUT map1(IN1 var1) throws Exception;
OUT map2(IN2 var1) throws Exception;
}
public interface CoFlatMapFunction<IN1, IN2, OUT> extends Function, Serializable {
void flatMap1(IN1 var1, Collector<OUT> var2) throws Exception;
void flatMap2(IN2 var1, Collector<OUT> var2) throws Exception;
}
sink:org.apache.flink.streaming.api.functions.sink;
public interface SinkFunction<IN> extends Function, Serializable {
source:
public interface SourceFunction<T> extends Function, Serializable {
public interface ParallelSourceFunction<OUT> extends SourceFunction<OUT> {}
org.apache.flink.streaming.api.functions.windowing
public interface AllWindowFunction<IN, OUT, W extends Window> extends Function, Serializable {
void apply(W var1, Iterable<IN> var2, Collector<OUT> var3) throws Exception;
}
org.apache.flink.streaming.api.functions;
public interface TimestampAssigner<T> extends Function {
org.apache.flink.api.java.functions
public interface KeySelector<IN, KEY> extends Function, Serializable
Flink 中的function 实现以及 rich functions继承
function
1.Implementing an interface
2.Anonymous classes
3.Java 8 Lambdas
rich functions
RichFunction是接口,然后AbstractRichFunction实现了 RichFunction的抽象类,其余对应的函数继承抽象类
public interface RichFunction extends Function
public abstract class AbstractRichFunction implements RichFunction, Serializable
All transformations that require a user-defined function can instead take as argument a rich function
@Public
public abstract class RichMapFunction<IN, OUT> extends AbstractRichFunction implements MapFunction<IN, OUT> {
private static final long serialVersionUID = 1L;
public RichMapFunction() {}
public abstract OUT map(IN var1) throws Exception;
}
1.class MyMapFunction extends RichMapFunction and pass the function as usual to a map transformation
2.Rich functions can also be defined as an anonymous class:
特点:有四个方法
five methods: open, close, setRuntimeContext, and getRuntimeContext getIterationRuntimeContext
These are useful for
parameterizing the function (see Passing Parameters to Functions),
creating and finalizing local state,
accessing broadcast variables (see Broadcast Variables),
and for accessing runtime information such as accumulators and counters (see Accumulators and Counters),
and information on iterations (see Iterations).
Spark SQL UDAF与Flink的 aggreagte
Spark SQL的UDAF UserDefinedAggregateFunction
inputSchema bufferSchema dataType
initialize update merge evaluate
Flink:
AggregateFunction
<IN, ACC, OUT>
createAccumulator, add,merge;getResult;
}
Spark functions
org.apache.spark.api.java.function
public interface Function<T1, R> extends Serializable { R call(T1 var1) throws Exception;}
public interface Function2<T1, T2, R> extends Serializable { R call(T1 v1, T2 v2) throws Exception;}
public interface PairFunction<T, K, V> extends Serializable { Tuple2<K, V> call(T t) throws Exception;}
public interface FilterFunction<T> extends Serializable {boolean call(T value) throws Exception;}
public interface ReduceFunction<T> extends Serializable { T call(T v1, T v2) throws Exception;}
public interface FlatMapFunction<T, R> extends Serializable {Iterator<R> call(T t) throws Exception;}