zoukankan      html  css  js  c++  java
  • flume topology design . tier num 分层数目

    32:+:1

    x:1

    x<=8

    https://flume.apache.org/FlumeUserGuide.html#flume-topology-design

    Flume topology design

    【拓扑 分层原因 0-缓冲1-路由】

    The first step in designing a Flume topology is to enumerate all sources and destinations (terminal sinks) for your data. These will define the edge points of your topology. The next consideration is whether to introduce intermediate aggregation tiers or event routing. If you are collecting data form a large number of sources, it can be helpful to aggregate the data in order to simplify ingestion at the terminal sink. An aggregation tier can also smooth out burstiness from sources or unavailability at sinks, by acting as a buffer. If you are routing data between different locations, you may also want to split flows at various points: this creates sub-topologies which may themselves include aggregation points.

    Sizing a Flume deployment

    【猝发 事件数 字节数 每层最大吞吐量】

    Once you have an idea of what your topology will look like, the next question is how much hardware and networking capacity is needed. This starts by quantifying how much data you generate. That is not always a simple task! Most data streams are bursty (for instance, due to diurnal patterns) and potentially unpredictable. A good starting point is to think about the maximum throughput you’ll have in each tier of the topology, both in terms of events per second and bytes per second. Once you know the required throughput of a given tier, you can calulate a lower bound on how many nodes you require for that tier. To determine attainable throughput, it’s best to experiment with Flume on your hardware, using synthetic or sampled event data. In general, disk-based channels should get 10’s of MB/s and memory based channels should get 100’s of MB/s or more. Performance will vary widely, however depending on hardware and operating environment.

    Sizing aggregate throughput gives you a lower bound on the number of nodes you will need to each tier. There are several reasons to have additional nodes, such as increased redundancy and better ability to absorb bursts in load.

  • 相关阅读:
    常用的CSS命名规则 (web标准化设计)
    有哪些概率论和数理统计的深入教材可以推荐?
    CV2X国内现状分析
    隐私计算,新能源汽车“安全上路”的“救命稻草”?
    2022年中国车联网行业全景图谱
    2022年十大AI预测:气候独角兽涌现、中美竞争加剧
    OSEK/VDX介绍
    Adaptive Autosar
    基于我国商密算法的车联网5GV2X通信安全可信体系
    行研篇 | 汽车域控制器研究
  • 原文地址:https://www.cnblogs.com/rsapaper/p/7745950.html
Copyright © 2011-2022 走看看