Combining Flume sources, channels, and sinks

Flume as Lego bricks: Flume has three main components -- source, channel, and sink. They can be combined with one another freely, and the coupling between components is low, which makes Flume flexible and convenient to use.

1. Multiple sinks on a single channel

Each event in the channel is delivered only once: if sink1 takes an event, sink2 does not get it, and if sink2 takes it, sink1 does not. In the end, sink1 + sink2 together account for all the data in the channel.

Configuration file:

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.shell = /bin/bash -c
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /opt/apps/logs/tail4.log

# channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# sink1
a1.sinks.k1.channel = c1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = mytopic
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy

# sink2
a1.sinks.k2.type = file_roll
a1.sinks.k2.channel = c1
#a1.sinks.k2.sink.rollInterval=0
a1.sinks.k2.sink.directory = /opt/apps/tmp

2. Multiple channels and multiple sinks, each sink outputs the same content

(The memory channel feeds the Kafka sink, where low latency matters; the file channel feeds the file sink, where data safety matters more.)
(No example is given for multiple channels with a single sink; personally I don't think that pattern is widely useful.)

Configuration file:

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.shell = /bin/bash -c
a1.sources.r1.channels = c1 c2
a1.sources.r1.command = tail -F /opt/apps/logs/tail4.log
# every channel receives an identical copy of each event
a1.sources.r1.selector.type = replicating

# channel1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# channel2
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /opt/apps/flume-1.7.0/checkpoint
a1.channels.c2.dataDirs = /opt/apps/flume-1.7.0/data

# sink1
a1.sinks.k1.channel = c1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = mytopic
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy

# sink2
a1.sinks.k2.type = file_roll
a1.sinks.k2.channel = c2
#a1.sinks.k2.sink.rollInterval=0
a1.sinks.k2.sink.directory = /opt/apps/tmp
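To sanity-check the replicating setup above, you can start the agent and confirm that the same event shows up both in the Kafka topic and in the file_roll directory. A minimal sketch, assuming Flume lives under /opt/apps/flume-1.7.0, Kafka under /opt/apps/kafka, and that the config above was saved as conf/multi-channel.conf (the config file name is just an example):

# start the agent; the --name value must match the "a1" prefix used in the config
cd /opt/apps/flume-1.7.0
bin/flume-ng agent --conf conf --conf-file conf/multi-channel.conf --name a1 -Dflume.root.logger=INFO,console

# in another terminal, append a test line to the tailed log
echo "hello flume" >> /opt/apps/logs/tail4.log

# the event should appear in the Kafka topic fed through the memory channel ...
cd /opt/apps/kafka
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytopic --from-beginning

# ... and the same event should also land in the file_roll output directory fed through the file channel
ls -l /opt/apps/tmp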
3. Multiple sources, single channel, single sink

Multiple sources can read different kinds of data into one channel, which then delivers everything to the same destination.

Configuration file:

a1.sources = r1 r2
a1.sinks = k1
a1.channels = c1

# source1
a1.sources.r1.type = exec
a1.sources.r1.shell = /bin/bash -c
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /opt/apps/logs/tail4.log

# source2
a1.sources.r2.type = exec
a1.sources.r2.shell = /bin/bash -c
a1.sources.r2.channels = c1
a1.sources.r2.command = tail -F /opt/apps/logs/tail2.log

# channel1 in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# sink1
a1.sinks.k1.channel = c1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = mytopic
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy
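With this agent running (started the same way as in the earlier sketch, only pointing at this config file), both tailed files fan in to the single Kafka topic. A quick check, assuming Kafka's command-line tools are on the PATH:

# append a distinct line to each of the two tailed log files
echo "event from tail4" >> /opt/apps/logs/tail4.log
echo "event from tail2" >> /opt/apps/logs/tail2.log

# both events should show up on the one topic written by sink k1
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytopic --from-beginning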