  • [imooc Hands-On] Spark Streaming Real-Time Stream Processing Project, Notes Part 3 (Inscription Upgrade Edition)

    Inscription level 1:

    Flume overview
    Flume is a distributed, reliable,
    and available service for efficiently collecting,
    aggregating, and moving large amounts of log data.


    webserver (source) ===> flume ===> hdfs (destination)


    Design goals:
    Reliability
    Scalability
    Manageability


    Comparison with similar products in the industry
    (***)Flume: Cloudera/Apache Java
    Scribe: Facebook C/C++ (no longer maintained)
    Chukwa: Yahoo/Apache Java (no longer maintained)
    Kafka:
    Fluentd: Ruby
    (***)Logstash: ELK(ElasticSearch,Kibana)


    Flume history
    Cloudera 0.9.2 Flume-OG
    flume-728 Flume-NG ==> Apache
    2012.7 1.0
    2015.5 1.6 (*** + )
    ~ 1.7


    Flume architecture and core components
    1) Source: collects the data

    2) Channel: gathers and buffers the data

    3) Sink: outputs the data
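
To make the division of labour between the three components concrete, here is a minimal toy sketch in Python. It is not Flume code: `run_pipeline` is a hypothetical name, the "source" is just a list of lines, and the "sink" is a plain list. A real channel also has transactions and a capacity limit, which this sketch ignores.

```python
import queue


def run_pipeline(lines):
    """Toy model of one agent: source -> channel -> sink (not Flume's API)."""
    # Channel: buffers events between the source and the sink.
    channel = queue.Queue()

    # Source: collects each incoming line and puts it on the channel as bytes.
    for line in lines:
        channel.put(line.encode("utf-8"))

    # Sink: drains the channel and delivers the events to the destination
    # (here simply a list, standing in for the console or HDFS).
    delivered = []
    while not channel.empty():
        delivered.append(channel.get())
    return delivered


if __name__ == "__main__":
    print(run_pipeline(["hello", "world"]))
```

The point of the channel is decoupling: the source can keep accepting data while the sink drains the buffer at its own pace.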


    Flume installation prerequisites
    Java Runtime Environment - Java 1.7 or later
    Memory - Sufficient memory for configurations used by sources, channels or sinks
    Disk Space - Sufficient disk space for configurations used by channels or sinks
    Directory Permissions - Read/Write permissions for directories used by agent


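    Installing the JDK
    Download
    Extract to ~/app
    Add Java to the system environment variables in ~/.bash_profile:
    export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
    export PATH=$JAVA_HOME/bin:$PATH
    Run source ~/.bash_profile so the settings take effect
    Verify: java -version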


    Installing Flume
    Download
    Extract to ~/app
    Add Flume to the system environment variables in ~/.bash_profile:
    export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin
    export PATH=$FLUME_HOME/bin:$PATH
    Run source ~/.bash_profile so the settings take effect
    In flume-env.sh, set: export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
    Verify: flume-ng version


    example.conf: A single-node Flume configuration

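    The key to using Flume is writing the configuration file.

    A) Configure the Source
    B) Configure the Channel
    C) Configure the Sink
    D) Wire the three components together

    a1: name of the agent
    r1: name of the source
    k1: name of the sink
    c1: name of the channel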

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = hadoop000
    a1.sources.r1.port = 44444

    # Describe the sink
    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
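
The wiring step (D) is where typos most often creep in, so a quick sanity check can help. The sketch below is only an illustration: `parse_properties` and `check_wiring` are hypothetical helpers, not part of Flume, and the parser handles only simple `key = value` lines and `#` comments, which is less than the real properties format allows.

```python
EXAMPLE_CONF = """
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop000
a1.sources.r1.port = 44444

a1.sinks.k1.type = logger
a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of wiring problems for the given agent name."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    # A source may feed several channels, so its key is plural: 'channels'.
    for src in props.get(f"{agent}.sources", "").split():
        wired = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not wired:
            problems.append(f"source {src}: no 'channels' key")
        for ch in wired:
            if ch not in channels:
                problems.append(f"source {src}: unknown channel {ch}")
    # A sink reads from exactly one channel, so its key is singular: 'channel'.
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch is None:
            problems.append(f"sink {sink}: no 'channel' key")
        elif ch not in channels:
            problems.append(f"sink {sink}: unknown channel {ch}")
    return problems


if __name__ == "__main__":
    print(check_wiring(parse_properties(EXAMPLE_CONF), "a1"))
```

For the configuration above the check reports no problems. Writing `a1.sinks.k1.channels` instead of `a1.sinks.k1.channel` would be reported as a sink with no `channel` key.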


    Start the agent
    flume-ng agent \
    --name a1 \
    --conf $FLUME_HOME/conf \
    --conf-file $FLUME_HOME/conf/example.conf \
    -Dflume.root.logger=INFO,console

    Test with telnet: telnet hadoop000 44444


    Event: { headers:{} body: 68 65 6C 6C 6F 0D hello. }
    An Event is the basic unit of data transfer in Flume.
    Event = optional headers + byte array
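
To illustrate the "headers + byte array" shape, here is a small Python sketch that models an event and renders it in roughly the style of the logger output shown above. The `Event` class and `render` function are hypothetical stand-ins, not Flume's Java API, and non-empty headers would print as a Python dict rather than in Flume's own format.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy model of an event: a byte-array body plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


def render(event):
    """Render the body as hex bytes followed by a printable-text preview."""
    hex_part = " ".join(f"{b:02X}" for b in event.body)
    # Non-printable bytes (such as a carriage return) are shown as '.'.
    text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in event.body)
    return f"Event: {{ headers:{event.headers} body: {hex_part} {text_part} }}"


if __name__ == "__main__":
    # telnet typically sends CR LF; the trailing 0D is most likely the CR
    # left over after the line is split on LF.
    print(render(Event(b"hello\r")))
```

This explains the `0D` and the trailing dot in the sample line: the body is the five bytes of `hello` plus a carriage return, which has no printable form.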

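    Inscription level 2:

    Flume design goals: reliability, scalability, manageability

    Official site: flume.apache.org -> Documentation (left sidebar) -> Flume User Guide

    The left sidebar is the table of contents; the more commonly used entries are:

    Flume Sources: avro, exec, kafka, netcat

    Flume Channels: memory, file, kafka

    Flume Sinks: HDFS, Hive, logger, avro, ElasticSearch, HBase, kafka

    Note: sources, channels, and sinks each also have a custom (user-defined) type.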

    Setting multi-agent flow

    Consolidation

    Multiplexing the flow

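    Hands-on preparation =>

    1. The prerequisites are the four points listed under Inscription level 1 above. Flume can be downloaded from the cdh5 archive with wget:

    wget http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.5.0.tar.gz

    2. Install the JDK. Command: tar -zxvf * -C ~/app/ . Finally, do not forget: source ~/.bash_profile

    Configure: cp flume-env.sh.template flume-env.sh, then set export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144 in it

    3. Check that the installation succeeded: flume-ng version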

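    Hands-on steps =>

    Requirement: collect data from a specified network port and output it to the console

    Configuration file (create example.conf in the conf directory; mainly follow the official site!):

    1. Directly after a1., the keys sources, channels, and sinks all take an "s".

    2. When wiring them together, the channels key under a source takes an "s", but the channel key under a sink does not.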

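    Start the agent =>
    flume-ng agent \
    --name a1 \
    --conf $FLUME_HOME/conf \
    --conf-file $FLUME_HOME/conf/example.conf \
    -Dflume.root.logger=INFO,console

    Open another terminal over ssh and connect with telnet: telnet hadoop000 44444

    Text typed into the telnet terminal shows up as Events on the console of the original terminal, where the agent is running.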
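
If telnet is not installed, the same test can be sketched with a few lines of Python. This is a hedged sketch, not part of the course material: `send_line` is a hypothetical helper, and it assumes the netcat source acknowledges each line with a short reply (its default behaviour, as far as I know). If acknowledgements are turned off, the `recv` call would time out instead.

```python
import socket


def send_line(host, port, text, timeout=5.0):
    """Send one newline-terminated line over TCP and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # The netcat source treats each newline-terminated line as one event.
        conn.sendall(text.encode("utf-8") + b"\n")
        # Read the acknowledgement (assumed; see the note above).
        return conn.recv(1024).decode("utf-8").strip()
```

With the agent from the configuration above running, calling `send_line("hadoop000", 44444, "hello")` should make a new Event line appear on the agent's console.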

  • Original article: https://www.cnblogs.com/kkxwz/p/8350753.html