解决Flume向Kafka多分区写数据

zoukankan html css js c++ java

解决Flume向Kafka多分区写数据

1 问题背景

Flume向kafka发布数据时，发现kafka接收到的数据总是在一个partition中，而我们希望发布来的数据在所有的partition平均分布

2 解决办法

Flume的官方文档是这么说的：

Kafka Sink uses the topic and key properties from the FlumeEvent headers to send events to Kafka. If topic exists in the headers, the event will be sent to that specific topic, overriding the topic configured for the Sink. If key exists in the headers, the key will used by Kafka to partition the data between the topic partitions. Events with same key will be sent to the same partition. If the key is null, events will be sent to random partitions.

其实以上文档中说的很清楚了，kafka-sink是从header里的key参数来确定将数据发到kafka的哪个分区中。如果为null，那么就会随机发布至分区中。但我测试的结果是flume发布的数据会发布到一个分区中的。

所以，我们需要向header中写上随机的key，然后数据才会真正的向kafka分区进行随机发布。

我们的办法是，向flume添加拦截器，官方文档说有一个UUID Interceptor，会为每个event的head添加一个随机唯一的key。其实我们直接用这个即可。

在flume添加的配置文件如下：

hiveview.sources.tailSource.interceptors = i2

hiveview.sources.tailSource.interceptors.i2.type=org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder

hiveview.sources.tailSource.interceptors.i2.headerName=key

hiveview.sources.tailSource.interceptors.i2.preserveExisting=false

查看全文

相关阅读:
js 刷新页面
 移动端web资源
 [LeetCode][JavaScript]Serialize and Deserialize Binary Tree
[LeetCode][JavaScript]Bulls and Cows
[LeetCode][JavaScript]Remove Duplicates from Sorted List II
[LeetCode][JavaScript]Remove Duplicates from Sorted List
[LeetCode][JavaScript]Remove Duplicates from Sorted Array
[LeetCode][JavaScript]Remove Duplicates from Sorted Array II
[LeetCode][JavaScript]Nim Game
[LeetCode][JavaScript]Path Sum II

原文地址：https://www.cnblogs.com/hark0623/p/4710804.html

解决Flume向Kafka多分区写数据

1 问题背景

2 解决办法