  《OD大数据实战》: Flume Getting-Started Examples

    I. netcat source + memory channel + logger sink

    1. Modify the configuration

    1) Edit the flume-env.sh file under $FLUME_HOME/conf, setting the following:

    export JAVA_HOME=/opt/modules/jdk1.7.0_67
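
    To confirm the environment file is picked up, you can print the Flume version (a quick sanity check; the exact build string depends on your install):

    bin/flume-ng version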

    2) Under $FLUME_HOME/conf, create an agent subdirectory and in it a new file netcat-memory-logger.conf with the following contents:

    # netcat-memory-logger
    
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = beifeng-hadoop-02
    a1.sources.r1.port = 44444
    
    # Describe the sink
    a1.sinks.k1.type = logger
    
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

    2. Start Flume and test

    1) Start the agent (-n names the agent, -c sets the configuration directory, -f picks the agent config file, and -Dflume.root.logger=INFO,console sends Flume's log output to the console so the logger sink is visible):

    bin/flume-ng agent -n a1 -c conf/ -f conf/agent/netcat-memory-logger.conf -Dflume.root.logger=INFO,console

    2) Test

    nc beifeng-hadoop-02 44444

    Type any string; each line should show up as an event in the agent's console log.

    This test uses the Linux nc command; if it is not installed, install it first.

    Install netcat: sudo yum -y install nc
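
    A sample session (the netcat source at its default settings acknowledges each accepted line with "OK"; the strings typed here are arbitrary):

    $ nc beifeng-hadoop-02 44444
    hello flume
    OK
    order-10001
    OK

    Each typed line should then appear as an event in the agent's console log.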

    II. agent: avro source + file channel + hdfs sink

    1. Add the configuration

    In the agent subdirectory under $FLUME_HOME/conf (created above), create avro-file-hdfs.conf with the following contents:

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    # Describe/configure the source
    a1.sources.r1.type = avro
    a1.sources.r1.bind = beifeng-hadoop-02
    a1.sources.r1.port = 4141
    
    # Describe the sink
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://beifeng-hadoop-02:9000/flume/events/%Y-%m-%d
    # default prefix: FlumeData
    a1.sinks.k1.hdfs.filePrefix = FlumeData
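    # resolve the %Y-%m-%d escape in hdfs.path from the agent's local clock
    # (without this, each event would need a timestamp header)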
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    a1.sinks.k1.hdfs.rollInterval = 0
    a1.sinks.k1.hdfs.rollCount = 0
    # rollSize is in bytes; in production it is usually set just under the
    # HDFS block size (e.g. 120-125 MB for a 128 MB block)
    a1.sinks.k1.hdfs.rollSize = 10240
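    # DataStream writes event bodies as plain text (default is SequenceFile)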
    a1.sinks.k1.hdfs.fileType = DataStream
    #a1.sinks.k1.hdfs.round = true
    #a1.sinks.k1.hdfs.roundValue = 10
    #a1.sinks.k1.hdfs.roundUnit = minute
    
    # Use a file channel, which persists events to local disk
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /opt/modules/cdh/apache-flume-1.5.0-cdh5.3.6-bin/checkpoint
    a1.channels.c1.dataDirs = /opt/modules/cdh/apache-flume-1.5.0-cdh5.3.6-bin/data
    
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
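
    The file channel needs writable local directories for its checkpoint and data files; pre-creating them is a harmless precaution (paths copied from the config above):

    mkdir -p /opt/modules/cdh/apache-flume-1.5.0-cdh5.3.6-bin/checkpoint
    mkdir -p /opt/modules/cdh/apache-flume-1.5.0-cdh5.3.6-bin/data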

    2. Start and test

    1) Start the Flume agent

    bin/flume-ng agent -n a1 -c conf/ -f conf/agent/avro-file-hdfs.conf -Dflume.root.logger=INFO,console

    2) Test with Flume's bundled avro-client

    bin/flume-ng avro-client --host beifeng-hadoop-02 --port 4141 --filename /home/beifeng/order_info.txt
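
    Any local text file can stand in for order_info.txt. After the client exits, the events should appear under the date-partitioned path from the sink config. A sketch for checking, assuming the hdfs client is on the PATH:

    hdfs dfs -ls /flume/events/$(date +%Y-%m-%d)
    hdfs dfs -cat /flume/events/$(date +%Y-%m-%d)/FlumeData.*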
    Original post: https://www.cnblogs.com/yeahwell/p/5746057.html