Collecting IIS Logs into HDFS with Flume

    1. Download Flume 1.7

    Download the Flume 1.7 release from the official Apache Flume website.

    2. Write the Flume configuration files

    The initial plan was to go straight from IIS to HDFS: IIS --> Flume --> HDFS

    But collection kept failing: the Windows agent could not connect directly to the remote HDFS:

    22 Feb 2017 14:59:04,566 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:443)  - HDFS IO error
    java.io.IOException: Callable timed out after 10000 ms on file: hdfs://192.168.1.75:9008/iis/2017-02-22/u_ex151127.log.1487746609021.tmp
        at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:682)
        at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:232)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:504)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:406)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.util.concurrent.TimeoutException
        at java.util.concurrent.FutureTask.get(FutureTask.java:205)
        at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:675)
        ... 6 more
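
    The "Callable timed out after 10000 ms" in the trace matches the HDFS sink's hdfs.callTimeout property, whose Flume default is 10000 ms. If the problem were merely slow HDFS calls, raising it might have been enough; the value below is only an illustrative guess:

    # hypothetical tuning for the original single-hop attempt (agent a1, HDFS sink k1)
    # hdfs.callTimeout caps how long each HDFS open/write/flush/close may take
    a1.sinks.k1.hdfs.callTimeout = 60000

    A longer timeout cannot fix a genuine connectivity problem between the Windows host and the HDFS cluster, though.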

    So a compromise was adopted: a Flume agent on Windows collects the logs and forwards them to a Flume agent on Linux, which then writes to HDFS:

    IIS --> (Windows) Flume --> (Linux) Flume --> HDFS

    The configuration of the collecting Windows Flume agent is as follows:

    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
     
    # Describe/configure the source
    a1.sources.r1.type = spooldir
    a1.sources.r1.channels = c1
    a1.sources.r1.spoolDir = C:\inetpub\logs\LogFiles\W3SVC4
    a1.sources.r1.fileHeader = true
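    # attach each file's base name to its events under the 'fileName' header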
    a1.sources.r1.basenameHeader = true
    a1.sources.r1.basenameHeaderKey = fileName
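    # skip any files ending in .tmp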
    a1.sources.r1.ignorePattern = ^(.)*\.tmp$
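    # stamp every event with a timestamp header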
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = timestamp
    
    a1.sinks.k1.type = avro
    a1.sinks.k1.hostname = 192.168.1.75
    a1.sinks.k1.port = 44444
     
    # Use a channel which buffers events in memory
    a1.channels.c1.type=memory  
    a1.channels.c1.capacity=10000  
    a1.channels.c1.transactionCapacity=1000  
    a1.channels.c1.keep-alive=30  
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

    The key points: the sink targets the Avro source of the Linux Flume agent, and the spooling directory is the log directory of one IIS site, C:\inetpub\logs\LogFiles\W3SVC4. One caveat worth knowing: the spooling directory source expects files to be immutable once it sees them, and the log file IIS is still appending to keeps growing, which can make the source error out; in practice only rotated (previous days') logs are safe to ingest this way.
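
    Because the source puts each file's base name in the fileName header, the receiving HDFS sink could also reuse it through Flume's %{header} escape to keep the original IIS log names. This is an optional variant, not what the final config below does:

    # optional: name HDFS files after the original IIS log (Linux agent)
    tier1.sinks.sink1.hdfs.filePrefix = %{fileName}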

    The configuration of the receiving Linux Flume agent is as follows:

    tier1.sources=source1
    tier1.channels=channel1  
    tier1.sinks=sink1  
          
    tier1.sources.source1.type=avro  
    tier1.sources.source1.bind=192.168.1.75  
    tier1.sources.source1.port=44444  
    tier1.sources.source1.channels=channel1  
          
    tier1.channels.channel1.type=memory  
    tier1.channels.channel1.capacity=10000  
    tier1.channels.channel1.transactionCapacity=1000  
    tier1.channels.channel1.keep-alive=30  
          
    tier1.sinks.sink1.channel=channel1  
    
    tier1.sinks.sink1.type = hdfs
    tier1.sinks.sink1.hdfs.path = hdfs://127.0.0.1:9008/iis
    tier1.sinks.sink1.hdfs.writeFormat = Text
    tier1.sinks.sink1.hdfs.fileType = DataStream
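    # all three roll triggers disabled; files are closed by idleTimeout instead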
    tier1.sinks.sink1.hdfs.rollInterval = 0
    tier1.sinks.sink1.hdfs.rollSize = 0
    tier1.sinks.sink1.hdfs.rollCount = 0
    tier1.sinks.sink1.hdfs.filePrefix = localhost-%Y-%m-%d
    tier1.sinks.sink1.hdfs.useLocalTimeStamp = true
    tier1.sinks.sink1.hdfs.idleTimeout = 60
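
    With rollInterval, rollSize and rollCount all set to 0, rolling is disabled entirely, and idleTimeout = 60 closes each HDFS file once no events have arrived for 60 seconds, so each spooled log tends to map to one output file. The error from the first attempt also shows the files once went into per-day folders; if you want that layout here, the time escapes (resolved thanks to useLocalTimeStamp = true) would do it:

    # optional: partition output by day, as in the original single-hop attempt
    tier1.sinks.sink1.hdfs.path = hdfs://127.0.0.1:9008/iis/%Y-%m-%d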
     

    3. Start the Flume agent on Linux

    ./flume-ng agent -c ../conf -f ../conf/avro_hdfs.conf -n tier1 -Dflume.root.logger=DEBUG,console
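
    Once both agents are up, you can confirm that files are landing (assuming the Hadoop CLI is available on the Linux box):

    hdfs dfs -ls /iis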

    4. Start the Flume agent on Windows

    It has to be started from Flume's bin directory:

    flume-ng.cmd agent --conf ..\conf --conf-file ..\conf\avro.conf --name a1
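
    If the Windows agent is working, the spooling directory source marks each fully ingested file by renaming it with Flume's default completed-file suffix, so C:\inetpub\logs\LogFiles\W3SVC4 should gradually fill with u_ex*.log.COMPLETED files:

    # the default marker suffix, shown for reference only
    a1.sources.r1.fileSuffix = .COMPLETED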