Flume configuration file
# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 100000
agent1.channels.ch1.keep-alive = 30
# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
#agent1.sources.avro-source1.channels = ch1
#agent1.sources.avro-source1.type = avro
#agent1.sources.avro-source1.bind = 0.0.0.0
#agent1.sources.avro-source1.port = 41414
#agent1.sources.avro-source1.threads = 5
# Define an exec source that tails a file instead (replaces the Avro source above)
agent1.sources.avro-source1.type = exec
agent1.sources.avro-source1.shell = /bin/bash -c
# tail reads a local path; a remote host prefix such as "node3:" is not valid here
agent1.sources.avro-source1.command = tail -n +0 -F /home/d2
agent1.sources.avro-source1.channels = ch1
# threads is an Avro-source setting carried over from above; the exec source ignores it
agent1.sources.avro-source1.threads = 5
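# Note: the exec source makes no delivery guarantees; if the agent stops, data
# tail has read but not yet committed to the channel is lost (the Flume user
# guide carries an explicit warning about this source).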
# Define an HDFS sink that writes all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = hdfs
agent1.sinks.log-sink1.hdfs.path = hdfs://node2:8020/flume
agent1.sinks.log-sink1.hdfs.writeFormat = Text
agent1.sinks.log-sink1.hdfs.fileType = DataStream
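# rollInterval = 0 and rollCount = 0 disable time- and event-count-based
# rolling, so the settings below roll files purely on size (~1 MB).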
agent1.sinks.log-sink1.hdfs.rollInterval = 0
agent1.sinks.log-sink1.hdfs.rollSize = 1000000
agent1.sinks.log-sink1.hdfs.rollCount = 0
agent1.sinks.log-sink1.hdfs.batchSize = 1000
agent1.sinks.log-sink1.hdfs.txnEventMax = 1000
agent1.sinks.log-sink1.hdfs.callTimeout = 60000
agent1.sinks.log-sink1.hdfs.appendTimeout = 60000
# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
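With the components defined and activated, the agent can be started from the command line. A sketch, assuming the file above was saved as flume.conf (the file name and conf directory are assumptions); note that --name must match the agent name used inside the file:
flume-ng agent --conf conf --conf-file flume.conf --name agent1 -Dflume.root.logger=INFO,console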
Log error output:
2016-01-08 16:27:12,370 INFO org.apache.flume.conf.FlumeConfiguration: Processing:log-sink1
2016-01-08 16:27:12,370 INFO org.apache.flume.conf.FlumeConfiguration: Processing:log-sink1
2016-01-08 16:27:12,370 INFO org.apache.flume.conf.FlumeConfiguration: Processing:log-sink1
2016-01-08 16:27:12,370 INFO org.apache.flume.conf.FlumeConfiguration: Processing:log-sink1
2016-01-08 16:27:12,371 INFO org.apache.flume.conf.FlumeConfiguration: Processing:log-sink1
2016-01-08 16:27:12,402 INFO org.apache.flume.conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent1]
2016-01-08 16:27:12,402 WARN org.apache.flume.node.AbstractConfigurationProvider: No configuration found for this host:tier1
2016-01-08 16:27:12,412 INFO org.apache.flume.node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{} channels:{} }
2016-01-08 16:27:12,460 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-01-08 16:27:12,514 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2016-01-08 16:27:12,538 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:41414
2016-01-08 16:27:12,568 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[AGENT], CATEGORY=[LOG_MESSAGE], ROLE=[flume-AGENT-7602b25bae87ced3f6367e647e645100], SEVERITY=[IMPORTANT], SERVICE=[flume], HOST_IDS=[cd0fc0b9-f0c1-4419-a33b-9020aab77a11], SERVICE_TYPE=[FLUME], LOG_LEVEL=[WARN], HOSTS=[node4], EVENTCODE=[EV_LOG_EVENT]}, content=No configuration found for this host:tier1, timestamp=1452241632402}
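The telling lines are "Post-validation flume configuration contains configuration for agents: [agent1]" and "No configuration found for this host:tier1": the file defines agent1, but the process was started under the name tier1 (Cloudera Manager's default agent name), so no components load, which is why the started configuration is empty ({ sourceRunners:{} sinkRunners:{} channels:{} }). Make the two names agree: either rename agent1 to tier1 throughout the file, or start the agent with --name agent1 as in the command above (under Cloudera Manager, change the Agent Name setting on the Flume role).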
Flume configuration steps:
Step 1: define the agent
source configuration:
channel configuration:
sink configuration:
(a minimal end-to-end sketch of this structure follows below)
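For reference, the canonical minimal example of this structure from the Flume user guide (a netcat source feeding a logger sink through a memory channel; the names a1, r1, c1, k1 are arbitrary and unrelated to the setup above):
# name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# netcat source: listens on a TCP port, one event per line received
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1
# in-memory channel buffering events between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# logger sink: prints events to the agent's log
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1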
Configuration approach 2:
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1
# The spooling directory source watches the given directory for new files; as soon
# as one appears, it parses the file's contents and writes them to the channel,
# then marks the file as completed (or deletes it).
# source1 configuration
agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/home/flumeData
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader = false
agent1.sources.source1.interceptors = i1
# the timestamp interceptor supplies the header needed by the %Y-%m-%d escape below
agent1.sources.source1.interceptors.i1.type = timestamp
# channel1 configuration
agent1.channels.channel1.type=file
agent1.channels.channel1.checkpointDir=/home/flumeData/hdfss
agent1.channels.channel1.dataDirs=/home/flumeData/flumetemp
# sink1 configuration
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://node8:8020/flume
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.writeFormat=TEXT
agent1.sinks.sink1.hdfs.rollInterval=1
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d
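The spooling source expects files to be immutable once they are in the directory, so write the file elsewhere and move it in; after ingesting a file, the source renames it with a .COMPLETED suffix by default. A quick way to exercise it (paths taken from the config above):
echo "hello flume" > /tmp/test.log
mv /tmp/test.log /home/flumeData/
# after ingestion: /home/flumeData/test.log.COMPLETED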
Error message:
... 12 more
2016-01-19 15:43:35,975 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:Spool Directory source source1: { spoolDir: /home/flumeData/hdfss } } - Exception follows.
org.apache.flume.FlumeException: Unable to read and modify files in the spooling directory: /home/flumeData/hdfss
    at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.<init>(ReliableSpoolingFileEventReader.java:162)
    at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.<init>(ReliableSpoolingFileEventReader.java:77)
    at org.apache.flume.client.avro.ReliableSpoolingFileEventReader$Builder.build(ReliableSpoolingFileEventReader.java:671)
    at org.apache.flume.source.SpoolDirectorySource.start(SpoolDirectorySource.java:85)
    at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Permission denied
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createTempFile(File.java:2001)
    at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.<init>(ReliableSpoolingFileEventReader.java:152)
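The "Caused by: java.io.IOException: Permission denied" out of File.createTempFile shows this failure is on the local filesystem, not HDFS: at startup the spooling source creates and deletes a probe file in the spool directory to verify it can read and modify files there, and the user running the agent cannot write to that directory. Note the log shows spoolDir: /home/flumeData/hdfss, which is the file channel's checkpointDir in the config above, so check whichever directory spoolDir actually points at. A sketch of the fix, assuming the agent runs as the flume user (the CDH default):
chown -R flume:flume /home/flumeData
chmod -R u+rwx /home/flumeData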
The reported permission problem (this one is on HDFS):
org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/flume":hdfs.flume:supergroup:drwxr-xr-x
Solution: give the flume user write access to /flume on HDFS.
Reference usage:
hdfs dfs -chgrp -R [GROUP] <path>
Note that chown on HDFS separates owner and group with a colon; with a dot, "hdfs.flume" is read as a single literal user name, which is exactly the owner shown in the AccessControlException above. The corrected commands:
hdfs dfs -chown -R hdfs:flume /flume
hdfs dfs -chgrp -R flume /flume
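Changing the group alone is not enough while the mode stays drwxr-xr-x, since the group has no write bit; a follow-up sketch, then a quick check of the result:
hdfs dfs -chmod -R g+w /flume
hdfs dfs -ls /    # the /flume entry should now show group flume and mode drwxrwxr-x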