  • Flume topology design: number of tiers (tier num)

    32:+:1

    x:1

    x<=8

    https://flume.apache.org/FlumeUserGuide.html#flume-topology-design

    Flume topology design

    [Topology: reasons for tiering: 0 - buffering, 1 - routing]

    The first step in designing a Flume topology is to enumerate all sources and destinations (terminal sinks) for your data. These will define the edge points of your topology. The next consideration is whether to introduce intermediate aggregation tiers or event routing. If you are collecting data from a large number of sources, it can be helpful to aggregate the data in order to simplify ingestion at the terminal sink. An aggregation tier can also smooth out burstiness from sources or unavailability at sinks, by acting as a buffer. If you are routing data between different locations, you may also want to split flows at various points: this creates sub-topologies which may themselves include aggregation points.
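
One way to reason about how many aggregation tiers a topology needs is to cap the fan-in per aggregation node and keep adding tiers until a single node remains. The sketch below is only an illustration under assumptions that are not in the Flume guide: the function name `aggregation_tiers`, the `max_fan_in` cap, and the idea that the flow should converge to one final aggregator are all hypothetical.

```python
def aggregation_tiers(num_sources: int, max_fan_in: int) -> list[int]:
    """Return the node count of each aggregation tier, first tier first.

    Assumption (hypothetical): each aggregation node accepts at most
    `max_fan_in` upstream connections, and tiers are added until a
    single node remains in front of the terminal sink.
    """
    if num_sources < 1:
        raise ValueError("num_sources must be at least 1")
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")

    tiers = []
    upstream = num_sources
    while upstream > 1:
        # Integer ceiling division: nodes needed to cover all upstream nodes.
        upstream = (upstream + max_fan_in - 1) // max_fan_in
        tiers.append(upstream)
    return tiers


if __name__ == "__main__":
    # 32 sources with at most 8 upstream connections per aggregator.
    print(aggregation_tiers(32, 8))  # [4, 1]
```

With 32 sources and a fan-in cap of 8, this gives two tiers: four first-tier aggregators feeding one second-tier aggregator. A real deployment would also weigh throughput and redundancy, not only connection counts.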

    Sizing a Flume deployment

    [Bursts; events per second; bytes per second; maximum throughput per tier]

    Once you have an idea of what your topology will look like, the next question is how much hardware and networking capacity is needed. This starts by quantifying how much data you generate. That is not always a simple task! Most data streams are bursty (for instance, due to diurnal patterns) and potentially unpredictable. A good starting point is to think about the maximum throughput you’ll have in each tier of the topology, both in terms of events per second and bytes per second. Once you know the required throughput of a given tier, you can calculate a lower bound on how many nodes you require for that tier. To determine attainable throughput, it’s best to experiment with Flume on your hardware, using synthetic or sampled event data. In general, disk-based channels should get tens of MB/s and memory-based channels should get hundreds of MB/s or more. Performance will vary widely, however, depending on hardware and operating environment.

    Sizing aggregate throughput gives you a lower bound on the number of nodes you will need for each tier. There are several reasons to have additional nodes, such as increased redundancy and better ability to absorb bursts in load.
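
The lower bound described above can be worked out with simple arithmetic once a tier's peak load and a single node's measured throughput are known. The following is a minimal sketch; the function name `min_nodes_for_tier` and all example figures are hypothetical, and the per-node numbers are assumed to come from your own benchmarks rather than from the Flume guide.

```python
import math


def min_nodes_for_tier(
    peak_events_per_s: float,
    peak_bytes_per_s: float,
    node_events_per_s: float,
    node_bytes_per_s: float,
) -> int:
    """Lower bound on nodes for one tier, from peak load and per-node capacity.

    The tier must satisfy both the events/s and the bytes/s requirement,
    so the larger of the two node counts is the binding constraint.
    """
    if node_events_per_s <= 0 or node_bytes_per_s <= 0:
        raise ValueError("per-node throughput must be positive")

    by_events = math.ceil(peak_events_per_s / node_events_per_s)
    by_bytes = math.ceil(peak_bytes_per_s / node_bytes_per_s)
    return max(by_events, by_bytes, 1)


if __name__ == "__main__":
    # Hypothetical tier: 200,000 events/s and 120 MB/s at peak.
    # Hypothetical benchmark: one node sustains 50,000 events/s and 40 MB/s.
    print(min_nodes_for_tier(200_000, 120_000_000, 50_000, 40_000_000))  # 4
```

Here the event rate is the binding constraint (4 nodes, versus 3 for bytes). Since this is only a lower bound, extra nodes for redundancy and burst absorption would be added on top.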

  • Original article: https://www.cnblogs.com/rsapaper/p/7745950.html