  • Kafka Architecture and Its Fundamental Concepts (repost)

    Original article:

    https://data-flair.training/blogs/kafka-architecture/

    Kafka Architecture

    In our last Kafka Tutorial, we discussed Kafka Use Cases and Applications.

    Today, in this Kafka Tutorial, we will discuss Kafka Architecture.

    In this Kafka Architecture article, we will see the APIs in Kafka.

    Moreover, we will learn about Kafka Broker, Kafka Consumer, Zookeeper, and Kafka Producer.

    Also, we will see some fundamental concepts of Kafka.

    So, let’s start with Apache Kafka Architecture.

    Kafka Architecture – Apache Kafka APIs

    Apache Kafka Architecture has four core APIs: the Producer API, Consumer API, Streams API, and Connector API. Let’s discuss them one by one:

    a. Producer API

    The Producer API allows an application to publish a stream of records to one or more Kafka topics.

    b. Consumer API

    This API permits an application to subscribe to one or more topics and also to process the stream of records produced to them.

    c. Streams API

    The Streams API permits an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
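
    To make the producer, consumer, and stream-processor roles concrete, here is a minimal in-memory sketch. It is a toy model, not the Kafka client API: the `topics` dictionary and the functions `produce`, `consume`, and `stream_process` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

# Toy stand-in for a Kafka cluster: topic name -> list of records.
# Assumption: everything lives in one process; real Kafka is a remote service.
topics = defaultdict(list)


def produce(topic, record):
    """Producer role: append a record to a topic."""
    topics[topic].append(record)


def consume(topic, offset=0):
    """Consumer role: read every record in a topic from `offset` onward."""
    return topics[topic][offset:]


def stream_process(input_topic, output_topic, transform):
    """Stream-processor role: consume, transform, and re-publish."""
    for record in consume(input_topic):
        produce(output_topic, transform(record))


produce("page-views", "home")
produce("page-views", "pricing")
stream_process("page-views", "page-views-upper", str.upper)
print(consume("page-views-upper"))  # ['HOME', 'PRICING']
```

    The input topic is left untouched; the processor only adds records to the output topic.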

    d. Connector API

    When it comes to building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems, we use the Connector API.

    For example, a connector to a relational database might capture every change to a table.

    Apache Kafka Architecture – Cluster

    The diagram below shows the Apache Kafka cluster:

    Let’s describe each component of Kafka Architecture shown in the diagram above:

    a. Kafka Broker

    A Kafka cluster typically consists of multiple brokers in order to balance load.

    Brokers are stateless, so they rely on ZooKeeper to maintain the cluster state.

    A single Kafka broker instance can handle hundreds of thousands of reads and writes per second, and each broker can handle terabytes of messages without a performance impact.

    In addition, Kafka broker leader election is performed through ZooKeeper.

    b. Kafka – ZooKeeper

    Kafka brokers use ZooKeeper for management and coordination.

    ZooKeeper is also used to notify producers and consumers when a new broker joins the Kafka system or an existing broker fails.

    As soon as ZooKeeper sends a notification about the presence or failure of a broker, producers and consumers act on it and start coordinating their work with another broker.
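
    The notification flow described above resembles an observer pattern. The following is a rough sketch under that assumption; `ClusterRegistry` and `Client` are hypothetical classes and do not correspond to ZooKeeper's or Kafka's real APIs.

```python
class ClusterRegistry:
    """Hypothetical stand-in for the coordination service."""

    def __init__(self):
        self.live_brokers = set()
        self.watchers = []

    def watch(self, callback):
        """Register a callback to be told about broker changes."""
        self.watchers.append(callback)

    def _notify(self, event, broker_id):
        for callback in self.watchers:
            callback(event, broker_id, sorted(self.live_brokers))

    def broker_joined(self, broker_id):
        self.live_brokers.add(broker_id)
        self._notify("joined", broker_id)

    def broker_failed(self, broker_id):
        self.live_brokers.discard(broker_id)
        self._notify("failed", broker_id)


class Client:
    """A producer or consumer that switches brokers when its own disappears."""

    def __init__(self, name):
        self.name = name
        self.current_broker = None

    def on_cluster_event(self, event, broker_id, live_brokers):
        # Only re-target if the broker we were using is no longer alive.
        if self.current_broker not in live_brokers:
            self.current_broker = live_brokers[0] if live_brokers else None


registry = ClusterRegistry()
producer = Client("producer-1")
registry.watch(producer.on_cluster_event)

registry.broker_joined(1)  # producer starts using broker 1
registry.broker_joined(2)  # no change: broker 1 is still alive
registry.broker_failed(1)  # producer moves to broker 2
print(producer.current_broker)  # 2
```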

    d. Kafka Consumers

    Because Kafka brokers are stateless, the Kafka consumer itself keeps track of how many messages it has consumed, using the partition offset.

    Moreover, once the consumer acknowledges a particular message offset, you can be sure that it has consumed all prior messages.

    Also, in order to have a buffer of bytes ready to consume, the consumer issues an asynchronous pull request to the broker.

    Then simply by supplying an offset value, consumers can rewind or skip to any point in a partition.

    In addition, ZooKeeper notifies the consumer of the offset value.
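
    The offset mechanics above (tracking position, acknowledging, rewinding, and skipping) can be sketched with a plain list standing in for a partition. `PartitionLog` and `ToyConsumer` are hypothetical names for this illustration only; a real consumer client works over the network and has a richer interface.

```python
class PartitionLog:
    """Append-only list of messages; the list index plays the role of the offset."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new message

    def read(self, offset, max_messages):
        return self.messages[offset:offset + max_messages]


class ToyConsumer:
    """Tracks its own position, since the log keeps no per-consumer state."""

    def __init__(self, log):
        self.log = log
        self.position = 0   # next offset to read
        self.committed = 0  # every offset below this one is acknowledged

    def poll(self, max_messages=2):
        batch = self.log.read(self.position, max_messages)
        self.position += len(batch)
        return batch

    def commit(self):
        self.committed = self.position

    def seek(self, offset):
        self.position = offset


log = PartitionLog()
for i in range(5):
    log.append(f"m{i}")

consumer = ToyConsumer(log)
first = consumer.poll()     # ['m0', 'm1']
consumer.commit()           # offsets 0 and 1 are now acknowledged
consumer.seek(0)            # rewind to the beginning
replayed = consumer.poll()  # ['m0', 'm1'] again
consumer.seek(4)            # skip ahead
tail = consumer.poll()      # ['m4']
```

    Rewinding does not modify the log; it only moves the consumer's own position.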

    Kafka Architecture – Fundamental Concepts

    Here, we are listing some of the fundamental concepts of Kafka Architecture that you must know:

    a. Kafka Topics

    A topic is a logical channel to which producers publish messages and from which consumers receive them.

    1. In Kafka, a topic defines a stream of a particular type or classification of data.
    2. Messages are organized by topic: a particular type of message is published on a particular topic.
    3. A producer first writes its messages to a topic; consumers then read those messages from the topic.
    4. In a Kafka cluster, a topic is identified by its name, which must be unique.
    5. There is no limit on the number of topics.
    6. Once data has been published, it cannot be changed or updated.

    The image below shows the relationship between Kafka Topics and Partitions:

    Kafka Architecture – Relation between Kafka Topics and Partitions

    b. Partitions in Kafka

    In a Kafka cluster, topics are split into partitions, which are also replicated across brokers.

    1. By default, there is no guarantee about which partition a published message will be written to.
    2. However, we can add a key to a message. If a producer publishes messages with a key, all messages with the same key are guaranteed to end up in the same partition. This is how Kafka offers a message ordering guarantee for a given key. Without a key, the producer spreads data across partitions, with no guarantee about where a given message lands.
    3. Within one partition, messages are stored in sequence.
    4. In a partition, each message is assigned an incremental ID, also called the offset.
    5. Offsets are meaningful only within their partition; they have no meaning across the partitions of a topic.
    6. There is no limit on the number of partitions.
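
    The key-to-partition rule and per-partition offsets can be illustrated with a small sketch. It assumes a "stable hash modulo partition count" scheme with CRC32 as the hash; Kafka's own default partitioner uses a murmur2 hash, so the partition numbers here will not match a real cluster. The names `partition_for` and `publish` are hypothetical.

```python
import zlib

NUM_PARTITIONS = 3
# One list per partition; a message's index in its list is its offset.
partitions = [[] for _ in range(NUM_PARTITIONS)]


def partition_for(key):
    """Map a key to a partition with a stable hash (CRC32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def publish(key, value):
    """Append to the key's partition and return (partition, offset)."""
    p = partition_for(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1


# Three messages with the same key land in the same partition, in order.
placements = [publish("user-42", f"event-{i}") for i in range(3)]
for p, offset in placements:
    print(f"partition={p} offset={offset}")
```

    Python's built-in `hash()` is deliberately avoided because it is randomized per process for strings, which would make the mapping unstable between runs.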

    c. Topic Replication Factor in Kafka

    While designing a Kafka system, it’s always a wise decision to factor in topic replication.

    If a broker goes down, replicas of its topics on other brokers can take over. For example, suppose we have 3 brokers and 3 topics. Broker1 holds Partition 0 of Topic 1, and its replica is on Broker2, and so on. The topic has a replication factor of 2, which means there is one additional copy besides the primary one. Below is the image of Topic Replication Factor:

    Kafka Architecture – Topic Replication Factor

     

    Some key points –

    1. Replication takes place at the partition level only.
    2. For a given partition, only one broker can be the leader at a time. Meanwhile, the other brokers hold in-sync replicas, known as ISR.
    3. The replication factor cannot be greater than the number of available brokers.
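
    These points can be sketched with a toy replica placement. The round-robin placement and the "first live replica becomes leader" rule below are simplifying assumptions for illustration; Kafka's actual assignment and election logic is more involved. `assign_replicas` and `elect_leader` are hypothetical helper names.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (round-robin)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [
            brokers[(p + i) % len(brokers)] for i in range(replication_factor)
        ]
    return assignment


def elect_leader(replicas, live_brokers):
    """Pick the first replica whose broker is alive; None if all are down."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


brokers = [1, 2, 3]
assignment = assign_replicas(num_partitions=3, brokers=brokers, replication_factor=2)
print(assignment)  # {0: [1, 2], 1: [2, 3], 2: [3, 1]}

leader_before = elect_leader(assignment[0], live_brokers={1, 2, 3})  # 1
leader_after = elect_leader(assignment[0], live_brokers={2, 3})      # 2
```

    With a replication factor of 2, partition 0 survives the loss of Broker1 because Broker2 holds the second copy.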

    d. Consumer Group

    1. A consumer group can have multiple consumer processes or instances running.
    2. Each consumer group has one unique group ID.
    3. Within one consumer group, exactly one consumer instance reads from a given partition at any time.
    4. If there is more than one consumer group, one instance from each group can read from the same partition.
    5. However, if the number of consumers exceeds the number of partitions, some consumers will be inactive. For example, if there are 8 consumers and 6 partitions in a single consumer group, 2 consumers will be inactive.
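
    The 8-consumers, 6-partitions example can be checked with a short sketch. It assumes a simple round-robin assignment within one group, which is only one possible strategy; `assign_partitions` is a hypothetical helper, not a Kafka API.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


consumers = [f"consumer-{i}" for i in range(8)]
assignment = assign_partitions(list(range(6)), consumers)

idle = [c for c, parts in assignment.items() if not parts]
print(idle)  # ['consumer-6', 'consumer-7']
```

    With fewer consumers than partitions the same function gives some consumers several partitions, for example 3 each when 2 consumers share 6 partitions.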

    So, this was all about Apache Kafka Architecture. We hope you liked our explanation.

  • Original address: https://www.cnblogs.com/panpanwelcome/p/13533944.html