  • Data Collection with Apache Flume (Part 2)

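          Today we continue with a few more agent configurations.

          The first agent captures the output of a specific command run in the terminal and writes it to files in a specific directory. Here is the configuration: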

    # Source that captures the output of a command
    agent2.sources = execsource
    # Sink that writes events to local files
    agent2.sinks = filesink
    # Channel that buffers events on disk
    agent2.channels = filechannel
    
    agent2.sources.execsource.type = exec
    # Command whose output is captured
    agent2.sources.execsource.command = cat /home/leung/message
    
    agent2.sinks.filesink.type = FILE_ROLL
    # Output directory
    agent2.sinks.filesink.sink.directory = /home/leung/flume/files
    # 0 disables time-based rolling, so all events go to a single file
    agent2.sinks.filesink.sink.rollInterval = 0
    
    agent2.channels.filechannel.type = file
    # Checkpoint directory
    agent2.channels.filechannel.checkpointDir = /home/leung/flume/fc/checkpoint
    # Directory for the channel's data files
    agent2.channels.filechannel.dataDirs = /home/leung/flume/fc/data
    
    agent2.sources.execsource.channels = filechannel
    agent2.sinks.filesink.channel = filechannel
    

       OK, start agent2 and then check the result.
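The start command itself is not shown in the text, so the following is only a sketch of how an agent is commonly launched with the `flume-ng` launcher. The helper name `start_agent`, the file name `agent2.conf`, and the `conf` directory are assumptions made for illustration.

```shell
# Hypothetical helper: start_agent NAME CONF_FILE
# Assumes flume-ng is on PATH and ./conf holds Flume's environment and logging settings.
start_agent() {
  # --name must match the agent prefix used in the properties file (agent2 here)
  flume-ng agent \
    --conf conf \
    --conf-file "$2" \
    --name "$1" \
    -Dflume.root.logger=INFO,console
}
```

With the configuration saved as `agent2.conf`, the call would be `start_agent agent2 agent2.conf`. The agent runs in the foreground and logs to the console until it is interrupted.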

          

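          The result is shown in the figure below. As you can see, the output of cat /home/leung/message matches the content of the file in the files directory, which shows that the events were written to the file successfully.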
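The same comparison can be scripted instead of read off the screen. This is a minimal sketch; the helper name `compare_output` is hypothetical, and it assumes the sink directory contains only files written by this agent.

```shell
# Hypothetical helper: compare_output SOURCE_FILE SINK_DIR
# Succeeds (exit status 0) when SOURCE_FILE is byte-identical to the
# concatenation of all files in SINK_DIR.
compare_output() {
  cat "$2"/* | cmp -s "$1" -
}
```

For this agent the call would be `compare_output /home/leung/message /home/leung/flume/files && echo identical`. Because `sink.rollInterval` is 0, the sink is expected to write a single file, so the concatenation is just that file.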

          

          The next agent reads data from a network port and writes it to HDFS on a Hadoop cluster. Here is the configuration:

    agent4.sources = netsource
    # HDFS sink
    agent4.sinks = hdfssink
    agent4.channels = memorychannel
    
    agent4.sources.netsource.type = netcat
    agent4.sources.netsource.bind = localhost
    agent4.sources.netsource.port = 3000
    
    
    agent4.sinks.hdfssink.type = hdfs
    # Target directory on HDFS; it does not need to exist beforehand
    agent4.sinks.hdfssink.hdfs.path = /flume
    # Prefix for the names of the files written
    agent4.sinks.hdfssink.hdfs.filePrefix = log
    # 0 disables time-based rolling
    agent4.sinks.hdfssink.hdfs.rollInterval = 0
    # Roll to a new file after every 3 events
    agent4.sinks.hdfssink.hdfs.rollCount = 3
    # Write raw event bodies rather than a SequenceFile
    agent4.sinks.hdfssink.hdfs.fileType = DataStream
    
    agent4.channels.memorychannel.type = memory
    agent4.channels.memorychannel.capacity = 1000
    agent4.channels.memorychannel.transactionCapacity = 100
    
    agent4.sources.netsource.channels = memorychannel
    agent4.sinks.hdfssink.channel = memorychannel
    

        Now start agent4 and send it some data.
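Once the agent is listening, something has to write lines to port 3000. The text does not say which client was used, so the sketch below assumes the `nc` utility is installed; the helper name `send_lines` is hypothetical.

```shell
# Hypothetical helper: send each argument as one line to the netcat source.
# Host and port match the bind and port settings of the source above.
send_lines() {
  printf '%s\n' "$@" | nc localhost 3000
}
```

A call such as `send_lines one two three` produces three events, which is exactly the `rollCount` configured for the sink. By default the netcat source answers each line with `OK`. Some `nc` variants keep the connection open after the input ends, in which case Ctrl+C closes it.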

         

           Looking at the result, a flume directory has been created in HDFS and the data we sent has been written to it.
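The check can be done from the command line with the Hadoop file system shell. This is a sketch only; the helper name `show_hdfs_dir` is hypothetical, and it assumes the `hadoop` command is on PATH and points at the same cluster the sink writes to.

```shell
# Hypothetical helper: list an HDFS directory, then print the files in it.
# The glob is quoted so that Hadoop expands it, not the local shell.
show_hdfs_dir() {
  hadoop fs -ls "$1" && hadoop fs -cat "$1/*"
}
```

For this agent the call would be `show_hdfs_dir /flume`. A file that the sink still has open carries a `.tmp` suffix until it is rolled.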

          

          Next, we add a date stamp to the directory name. The configuration is as follows.

    agent5.sources = netsource
    agent5.sinks = hdfssink
    agent5.channels = memorychannel
    
    agent5.sources.netsource.type = netcat
    agent5.sources.netsource.bind = localhost
    agent5.sources.netsource.port = 3000
    agent5.sources.netsource.interceptors = ts
    # Adds a timestamp header to each event; the time escapes in hdfs.path rely on it
    agent5.sources.netsource.interceptors.ts.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
    
    
    agent5.sinks.hdfssink.type = hdfs
    # Directory name pattern: year, month, day
    agent5.sinks.hdfssink.hdfs.path = /flume-%Y-%m-%d
    agent5.sinks.hdfssink.hdfs.filePrefix = log-
    agent5.sinks.hdfssink.hdfs.rollInterval = 0
    agent5.sinks.hdfssink.hdfs.rollCount = 3
    agent5.sinks.hdfssink.hdfs.fileType = DataStream
    
    agent5.channels.memorychannel.type = memory
    agent5.channels.memorychannel.capacity = 1000
    agent5.channels.memorychannel.transactionCapacity = 100
    
    agent5.sources.netsource.channels = memorychannel
    agent5.sinks.hdfssink.channel = memorychannel
    

      OK, now start agent5.

         

         Now check the result. As expected, the date has been added to the directory name.
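To know which directory to look for, the expected name can be computed locally. This sketch assumes the agent host and the local machine share the same date and time zone, since the sink formats the event timestamp in local time unless configured otherwise.

```shell
# Build today's expected directory name using the same pattern as hdfs.path.
dir="/flume-$(date +%Y-%m-%d)"
echo "$dir"
```

The printed path can then be passed to `hadoop fs -ls` to confirm that the directory exists.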

         

         

         OK, that is all for now. Two slightly more complex agents will be covered next time. My knowledge is limited, so corrections are welcome. Thank you!

  • Original article: https://www.cnblogs.com/UUhome/p/4306609.html