zoukankan      html  css  js  c++  java
  • Kafka源码分析及图解原理之Producer端

    一.前言

      任何消息队列都是万变不离其宗都是3部分,消息生产者(Producer)、消息消费者(Consumer)和服务载体(在Kafka中用Broker指代)。那么本篇主要讲解Producer端,会有适当的图解帮助理解底层原理。

     

    一.开发应用

      首先介绍一下开发应用,如何构建一个KafkaProducer及使用,还有一些重要参数的简介。

    1.1 一个栗子

     1 /**
     2  * Kafka Producer Demo实例类。
     3  *
     4  * @author GrimMjx
     5  */
     6 public class ProducerDemo {
     7     public static void main(String[] args) throws ExecutionException, InterruptedException {
     8         Properties prop = new Properties();
     9         prop.put("client.id", "DemoProducer");
    10 
    11         // 以下三个参数必须指定
    12         // 用于创建与Kafka broker服务器的连接,集群的话则用逗号分隔
    13         prop.put("bootstrap.servers", "localhost:9092");
    14         // 消息的key序列化方式
    15         prop.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    16         // 消息的value序列化方式
    17         prop.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    18 
    19         // 以下参数为可配置选项
    20         prop.put("acks", "-1");
    21         prop.put("retries", "3");
    22         prop.put("batch.size", "323840");
    23         prop.put("linger.ms", "10");
    24         prop.put("buffer.memory", "33554432");
    25         prop.put("max.block.ms", "3000");
    26 
    27         KafkaProducer<String, String> producer = new KafkaProducer<String, String>(prop);
    28         try {
    29             // 异步发送,继续发送消息不用等待。当有结果返回时,callback会被通知执行
    30             producer.send(new ProducerRecord<String, String>("test", "key1", "value1"),
    31                     new Callback() {
    32                         // 返回结果RecordMetadata记录元数据包括了which partition的which offset
    33                         public void onCompletion(RecordMetadata metadata, Exception e) {
    34                             // 发送成功
    35                             if (e == null) {
    36                                 System.out.println("The offset of the record we just sent is: " + metadata.offset());
    37 
    38                                 // 发送失败
    39                             } else {
    40                                 if (e instanceof RetriableException) {
    41                                     // 处理可重试的异常,比如分区leader副本不可用
    42                                     // 一般用retries参数来设置重置,毕竟这里也没有什么其他能做的,也是同样的重试发送消息
    43                                 } else {
    44                                     // 处理不可重试异常
    45                                 }
    46                             }
    47                         }
    48                     }
    49             );
    50 
    51             // 同步发送,send方法返回Future,然后get。在没有返回结果一直阻塞
    52             producer.send(new ProducerRecord<String, String>("test", "key1", "value1")).get();
    53 
    54         } finally {
    55             // producer运行的时候占用系统额外资源,最后一定要关闭
    56             producer.close();
    57         }
    58     }
    59 }

      注释已经写得十分详细了,参数的下面会说,这里就只说一下异步发送和同步发送。我们先看下KafkaProducer.send方法,可以看到返回的是一个Future,那么如何实现同步阻塞和异步非阻塞呢?

    • 同步阻塞:send方法返回Future,然后get。在没有返回结果一直阻塞,无限等待
    • 异步非阻塞:send方法提供callback,调用send方法后可以继续发送消息不用等待。当有结果返回时,callback会被通知执行

    1.2 重要参数

      这里分析一下broker端的重要参数,前3个是必要参数。Kafka的文档真的很吊,可以看这个类,每个参数和注释都解释的十分详细:org.apache.kafka.clients.producer.ProducerConfig

    • bootstrap.server(必要):broker服务器列表,如果集群的机器很多,不用全配,producer可以发现集群中所有broker
    • key.serializer/value.serializer(必要):key和value的序列化方式。这两个参数都必须是全限定类名,可以自定义拓展。
    • acks:有3个值,0、1和all(-1)
    • buffer.memory:producer启动会创建一个内存缓冲区保存待发送的消息,这部分的内存大小就是这个参数来控制的
    • commpression.type:压缩算法的选择,目前有GZIP、Snappy和LZ4。目前结合LZ4性能最好
    • retries:重试次数,0.11.0.0版本之前可能导致消息重发
    • batch.size:相同分区多条消息集合叫batch,当batch满了则发送给broker
    • linger.ms:难道batch没满就不发了么?当然不是,不满则等linger.ms时间再发。延时权衡行为
    • max.request.size:控制发送请求的大小
    • request.timeout.ms:超过时间则会在回调函数抛出TimeoutException异常
    • partitioner.class:分区机制,可自定义,默认分区器的处理是:有key则用murmur2算法计算key的哈希值,对总分区取模算出分区号,无key则轮询
    • enable.idempotence:Apache Kafka 0.11.0.0版本用于实现EOS的利器

    二.源码分析及图解原理

    2.1 RecordAccumulator

      上面介绍的参数中buffer.memory是缓冲区的大小,RecordAccmulator就是承担了缓冲区的角色。默认是32MB。

      还有上面介绍的参数中batch.size提到了batch的概念,在kafka producer中,消息不是一条一条发给broker的,而是多条消息组成一个ProducerBatch,然后由Sender一次性发出去,这里的batch.size并不是消息的条数(凑满多少条即发送),而是一个大小。默认是16KB,可以根据具体情况来进行优化。

      在RecordAccumulator中,最核心的参数就是:

    private final ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches;

      它是一个ConcurrentMap,key是TopicPartition类,代表一个topic的一个partition。value是一个包含ProducerBatch的双端队列。等待Sender线程发送给broker。画张图来看下:

      再从源码角度来看如何添加到缓冲区队列里的,主要看这个方法:org.apache.kafka.clients.producer.internals.RecordAccumulator#append:

      注释写的十分详细了,这里需要思考一点,为什么分配内存的代码没有放在synchronized同步块里?看起来这里很多余,导致下面的synchronized同步块中还要tryAppend一下,因为这时候可能其他线程已经创建好RecordBatch了。造成多余的内存申请。但是仔细想想,如果把分配内存放在synchronized同步块会有什么问题?

      内存申请不到线程会一直等待,如果放在同步块中会造成一直不释放Deque队列的锁,那其他线程将无法对Deque队列进行线程安全的同步操作。那不是走远了?

     1 /**
     2  * Add a record to the accumulator, return the append result
     3  * <p>
     4  * The append result will contain the future metadata, and flag for whether the appended batch is full or a new batch is created
     5  * <p>
     6  *
     7  * @param tp The topic/partition to which this record is being sent
     8  * @param timestamp The timestamp of the record
     9  * @param key The key for the record
    10  * @param value The value for the record
    11  * @param headers the Headers for the record
    12  * @param callback The user-supplied callback to execute when the request is complete
    13  * @param maxTimeToBlock The maximum time in milliseconds to block for buffer memory to be available
    14  */
    15 public RecordAppendResult append(TopicPartition tp,
    16                                  long timestamp,
    17                                  byte[] key,
    18                                  byte[] value,
    19                                  Header[] headers,
    20                                  Callback callback,
    21                                  long maxTimeToBlock) throws InterruptedException {
    22     // We keep track of the number of appending thread to make sure we do not miss batches in
    23     // abortIncompleteBatches().
    24     appendsInProgress.incrementAndGet();
    25     ByteBuffer buffer = null;
    26     if (headers == null) headers = Record.EMPTY_HEADERS;
    27     try {
    28         // check if we have an in-progress batch
    29         // 其实就是一个putIfAbsent操作的方法,不展开分析
    30         Deque<ProducerBatch> dq = getOrCreateDeque(tp);
    31         // batches是线程安全的,但是Deque不是线程安全的
    32         // 已有在处理中的batch
    33         synchronized (dq) {
    34             if (closed)
    35                 throw new IllegalStateException("Cannot send after the producer is closed.");
    36             RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
    37             if (appendResult != null)
    38                 return appendResult;
    39         }
    40 
    41         // we don't have an in-progress record batch try to allocate a new batch
    42         // 创建一个新的ProducerBatch
    43         byte maxUsableMagic = apiVersions.maxUsableProduceMagic();
    44         // 分配一个内存
    45         int size = Math.max(this.batchSize, AbstractRecords.estimateSizeInBytesUpperBound(maxUsableMagic, compression, key, value, headers));
    46         log.trace("Allocating a new {} byte message buffer for topic {} partition {}", size, tp.topic(), tp.partition());
    47         // 申请不到内存
    48         buffer = free.allocate(size, maxTimeToBlock);
    49         synchronized (dq) {
    50             // Need to check if producer is closed again after grabbing the dequeue lock.
    51             if (closed)
    52                 throw new IllegalStateException("Cannot send after the producer is closed.");
    53 
    54             // 再次尝试添加,因为分配内存的那段代码并不在synchronized块中
    55             // 有可能这时候其他线程已经创建好RecordBatch了,finally会把分配好的内存还回去
    56             RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
    57             if (appendResult != null) {
    58                 // 作者自己都说了,希望不要总是发生,多个线程都去申请内存,到时候还不是要还回去?
    59                 // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
    60                 return appendResult;
    61             }
    62 
    63             // 创建ProducerBatch
    64             MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
    65             ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
    66             FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));
    67 
    68             dq.addLast(batch);
    69             // incomplete是一个Set集合,存放不完整的batch
    70             incomplete.add(batch);
    71 
    72             // Don't deallocate this buffer in the finally block as it's being used in the record batch
    73             buffer = null;
    74 
    75             // 返回记录添加结果类
    76             return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);
    77         }
    78     } finally {
    79         // 释放要还的内存
    80         if (buffer != null)
    81             free.deallocate(buffer);
    82         appendsInProgress.decrementAndGet();
    83     }
    84 }

      附加tryAppend()方法,不多说,都在代码注释里:

     1 /**
     2  *  Try to append to a ProducerBatch.
     3  *
     4  *  If it is full, we return null and a new batch is created. We also close the batch for record appends to free up
     5  *  resources like compression buffers. The batch will be fully closed (ie. the record batch headers will be written
     6  *  and memory records built) in one of the following cases (whichever comes first): right before send,
     7  *  if it is expired, or when the producer is closed.
     8  */
     9 private RecordAppendResult tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers, Callback callback, Deque<ProducerBatch> deque) {
    10     // 获取最新加入的ProducerBatch
    11     ProducerBatch last = deque.peekLast();
    12     if (last != null) {
    13         FutureRecordMetadata future = last.tryAppend(timestamp, key, value, headers, callback, time.milliseconds());
    14         if (future == null)
    15             last.closeForRecordAppends();
    16         else
    17             // 记录添加结果类包含future、batch是否已满的标记、是否是新batch创建的标记
    18             return new RecordAppendResult(future, deque.size() > 1 || last.isFull(), false);
    19     }
    20     // 如果这个Deque没有ProducerBatch元素,或者已经满了不足以加入本条消息则返回null
    21     return null;
    22 }

       以上代码见图解:

     

    2.2 Sender

      Sender里最重要的方法莫过于run()方法,其中比较核心的方法是org.apache.kafka.clients.producer.internals.Sender#sendProducerData

      其中pollTimeout需要认真读注释,意思是最长阻塞到至少有一个通道在你注册的事件就绪了。返回0则表示走起发车了

     1 private long sendProducerData(long now) {
     2     // 获取当前集群的所有信息
     3     Cluster cluster = metadata.fetch();
     4     // get the list of partitions with data ready to send
     5     // @return ReadyCheckResult类的三个变量解释
     6     // 1.Set<Node> readyNodes 准备好发送的节点
     7     // 2.long nextReadyCheckDelayMs 下次检查节点的延迟时间
     8     // 3.Set<String> unknownLeaderTopics 哪些topic找不到leader节点
     9     RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);
    10     // if there are any partitions whose leaders are not known yet, force metadata update
    11     // 如果有些topic不知道leader信息,更新metadata
    12     if (!result.unknownLeaderTopics.isEmpty()) {
    13         // The set of topics with unknown leader contains topics with leader election pending as well as
    14         // topics which may have expired. Add the topic again to metadata to ensure it is included
    15         // and request metadata update, since there are messages to send to the topic.
    16         for (String topic : result.unknownLeaderTopics)
    17             this.metadata.add(topic);
    18         this.metadata.requestUpdate();
    19     }
    20 
    21     // 去除不能发送信息的节点
    22     // remove any nodes we aren't ready to send to
    23     Iterator<Node> iter = result.readyNodes.iterator();
    24     long notReadyTimeout = Long.MAX_VALUE;
    25     while (iter.hasNext()) {
    26         Node node = iter.next();
    27         if (!this.client.ready(node, now)) {
    28             iter.remove();
    29             notReadyTimeout = Math.min(notReadyTimeout, this.client.connectionDelay(node, now));
    30         }
    31     }
    32 
    33     // 获取将要发送的消息
    34     // create produce requests
    35     Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes,
    36             this.maxRequestSize, now);
    37 
    38     // 保证发送消息的顺序
    39     if (guaranteeMessageOrder) {
    40         // Mute all the partitions drained
    41         for (List<ProducerBatch> batchList : batches.values()) {
    42             for (ProducerBatch batch : batchList)
    43                 this.accumulator.mutePartition(batch.topicPartition);
    44         }
    45     }
    46 
    47     // 过期的batch
    48     List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(this.requestTimeout, now);
    49     boolean needsTransactionStateReset = false;
    50     // Reset the producer id if an expired batch has previously been sent to the broker. Also update the metrics
    51     // for expired batches. see the documentation of @TransactionState.resetProducerId to understand why
    52     // we need to reset the producer id here.
    53     if (!expiredBatches.isEmpty())
    54         log.trace("Expired {} batches in accumulator", expiredBatches.size());
    55     for (ProducerBatch expiredBatch : expiredBatches) {
    56         failBatch(expiredBatch, -1, NO_TIMESTAMP, expiredBatch.timeoutException());
    57         if (transactionManager != null && expiredBatch.inRetry()) {
    58             needsTransactionStateReset = true;
    59         }
    60         this.sensors.recordErrors(expiredBatch.topicPartition.topic(), expiredBatch.recordCount);
    61     }
    62     if (needsTransactionStateReset) {
    63         transactionManager.resetProducerId();
    64         return 0;
    65     }
    66     sensors.updateProduceRequestMetrics(batches);
    67     // If we have any nodes that are ready to send + have sendable data, poll with 0 timeout so this can immediately
    68     // loop and try sending more data. Otherwise, the timeout is determined by nodes that have partitions with data
    69     // that isn't yet sendable (e.g. lingering, backing off). Note that this specifically does not include nodes
    70     // with sendable data that aren't ready to send since they would cause busy looping.
    71     // 到底返回的这个pollTimeout是啥,我觉得用英文的注释解释比较清楚
    72     // 1.The amount of time to block if there is nothing to do
    73     // 2.waiting for a channel to become ready; if zero, block indefinitely;
    74     long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
    75     if (!result.readyNodes.isEmpty()) {
    76         log.trace("Nodes with data ready to send: {}", result.readyNodes);
    77         // if some partitions are already ready to be sent, the select time would be 0;
    78         // otherwise if some partition already has some data accumulated but not ready yet,
    79         // the select time will be the time difference between now and its linger expiry time;
    80         // otherwise the select time will be the time difference between now and the metadata expiry time;
    81         pollTimeout = 0;
    82     }
    83 
    84     // 发送消息
    85     // 最后调用client.send()
    86     sendProduceRequests(batches, now);
    87     return pollTimeout;
    88 }

      其中也需要了解这个方法:org.apache.kafka.clients.producer.internals.RecordAccumulator#ready。返回的类中3个关键参数的解释都在注释里。烦请看注释,我解释不好的地方可以看英文,原汁原味最好:

     1 /**
     2  * Get a list of nodes whose partitions are ready to be sent, and the earliest time at which any non-sendable
     3  * partition will be ready; Also return the flag for whether there are any unknown leaders for the accumulated
     4  * partition batches.
     5  * <p>
     6  * A destination node is ready to send data if:
     7  * <ol>
     8  * <li>There is at least one partition that is not backing off its send
     9  * <li><b>and</b> those partitions are not muted (to prevent reordering if
    10  *   {@value org.apache.kafka.clients.producer.ProducerConfig#MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION}
    11  *   is set to one)</li>
    12  * <li><b>and <i>any</i></b> of the following are true</li>
    13  * <ul>
    14  *     <li>The record set is full</li>
    15  *     <li>The record set has sat in the accumulator for at least lingerMs milliseconds</li>
    16  *     <li>The accumulator is out of memory and threads are blocking waiting for data (in this case all partitions
    17  *     are immediately considered ready).</li>
    18  *     <li>The accumulator has been closed</li>
    19  * </ul>
    20  * </ol>
    21  */
    22 /**
    23  * @return ReadyCheckResult类的三个变量解释
    24  * 1.Set<Node> readyNodes 准备好发送的节点
    25  * 2.long nextReadyCheckDelayMs 下次检查节点的延迟时间
    26  * 3.Set<String> unknownLeaderTopics 哪些topic找不到leader节点
    27  *
    28  * 一个节点满足以下任一条件则表示可以发送数据 
    29  * 1.batch满了
    30  * 2.batch没满,但是等了lingerMs的时间
    31  * 3.accumulator满了
    32  * 4.accumulator关了
    33  */
    34 public ReadyCheckResult ready(Cluster cluster, long nowMs) {
    35     Set<Node> readyNodes = new HashSet<>();
    36     long nextReadyCheckDelayMs = Long.MAX_VALUE;
    37     Set<String> unknownLeaderTopics = new HashSet<>();
    38     boolean exhausted = this.free.queued() > 0;
    39     for (Map.Entry<TopicPartition, Deque<ProducerBatch>> entry : this.batches.entrySet()) {
    40         TopicPartition part = entry.getKey();
    41         Deque<ProducerBatch> deque = entry.getValue();
    42         Node leader = cluster.leaderFor(part);
    43         synchronized (deque) {
    44             // leader没有且队列非空则添加unknownLeaderTopics
    45             if (leader == null && !deque.isEmpty()) {
    46                 // This is a partition for which leader is not known, but messages are available to send.
    47                 // Note that entries are currently not removed from batches when deque is empty.
    48                 unknownLeaderTopics.add(part.topic());
    49 
    50                 // 如果readyNodes不包含leader且muted不包含part
    51                 // mute这个变量跟producer端的一个配置有关系:max.in.flight.requests.per.connection=1
    52                 // 主要防止topic同分区下的消息乱序问题,限制了producer在单个broker连接上能够发送的未响应请求的数量
    53                 // 如果设置为1,则producer在收到响应之前无法再给该broker发送该topic的PRODUCE请求
    54             } else if (!readyNodes.contains(leader) && !muted.contains(part)) {
    55                 ProducerBatch batch = deque.peekFirst();
    56                 if (batch != null) {
    57                     long waitedTimeMs = batch.waitedTimeMs(nowMs);
    58                     boolean backingOff = batch.attempts() > 0 && waitedTimeMs < retryBackoffMs;
    59                     // 等待时间
    60                     long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs;
    61                     // batch满了
    62                     boolean full = deque.size() > 1 || batch.isFull();
    63                     // batch过期
    64                     boolean expired = waitedTimeMs >= timeToWaitMs;
    65                     boolean sendable = full || expired || exhausted || closed || flushInProgress();
    66                     if (sendable && !backingOff) {
    67                         readyNodes.add(leader);
    68                     } else {
    69                         long timeLeftMs = Math.max(timeToWaitMs - waitedTimeMs, 0);
    70                         // Note that this results in a conservative estimate since an un-sendable partition may have
    71                         // a leader that will later be found to have sendable data. However, this is good enough
    72                         // since we'll just wake up and then sleep again for the remaining time.
    73                         // 目前还没有leader,下次重试
    74                         nextReadyCheckDelayMs = Math.min(timeLeftMs, nextReadyCheckDelayMs);
    75                     }
    76                 }
    77             }
    78         }
    79     }
    80     return new ReadyCheckResult(readyNodes, nextReadyCheckDelayMs, unknownLeaderTopics);
    81 }

      还有一个方法就是org.apache.kafka.clients.producer.internals.RecordAccumulator#drain,从accumulator缓冲区获取要发送的数据,最大一次性发max.request.size大小的数据(最上面的配置参数里有):

      1 /**
      2  * Drain all the data for the given nodes and collate them into a list of batches that will fit within the specified
      3  * size on a per-node basis. This method attempts to avoid choosing the same topic-node over and over.
      4  *
      5  * @param cluster The current cluster metadata
      6  * @param nodes The list of node to drain
      7  * @param maxSize The maximum number of bytes to drain
      8  * maxSize也就是producer端配置参数max.request.size来控制的,一次最多发多少
      9  * @param now The current unix time in milliseconds
     10  * @return A list of {@link ProducerBatch} for each node specified with total size less than the requested maxSize.
     11  */
     12 public Map<Integer, List<ProducerBatch>> drain(Cluster cluster, Set<Node> nodes, int maxSize, long now) {
     13     if (nodes.isEmpty())
     14         return Collections.emptyMap();
     15     Map<Integer, List<ProducerBatch>> batches = new HashMap<>();
     16     for (Node node : nodes) {
     17         // for循环获取要发的batch
     18         List<ProducerBatch> ready = drainBatchesForOneNode(cluster, node, maxSize, now);
     19         batches.put(node.id(), ready);
     20     }
     21     return batches;
     22 }
     23 
     24 private List<ProducerBatch> drainBatchesForOneNode(Cluster cluster, Node node, int maxSize, long now) {
     25     int size = 0;
     26     // 获取node的partition
     27     List<PartitionInfo> parts = cluster.partitionsForNode(node.id());
     28     List<ProducerBatch> ready = new ArrayList<>();
     29     /* to make starvation less likely this loop doesn't start at 0 */
     30     // 避免每次都从一个partition取,要雨露均沾
     31     int start = drainIndex = drainIndex % parts.size();
     32     do {
     33         PartitionInfo part = parts.get(drainIndex);
     34         TopicPartition tp = new TopicPartition(part.topic(), part.partition());
     35         this.drainIndex = (this.drainIndex + 1) % parts.size();
     36 
     37         // Only proceed if the partition has no in-flight batches.
     38         if (isMuted(tp, now))
     39             continue;
     40 
     41         Deque<ProducerBatch> deque = getDeque(tp);
     42         if (deque == null)
     43             continue;
     44 
     45         // 加锁,不用说了吧
     46         synchronized (deque) {
     47             // invariant: !isMuted(tp,now) && deque != null
     48             ProducerBatch first = deque.peekFirst();
     49             if (first == null)
     50                 continue;
     51 
     52             // first != null
     53             // 查看是否在backoff期间
     54             boolean backoff = first.attempts() > 0 && first.waitedTimeMs(now) < retryBackoffMs;
     55             // Only drain the batch if it is not during backoff period.
     56             if (backoff)
     57                 continue;
     58 
     59             // 超过maxSize且ready里有东西
     60             if (size + first.estimatedSizeInBytes() > maxSize && !ready.isEmpty()) {
     61                 // there is a rare case that a single batch size is larger than the request size due to
     62                 // compression; in this case we will still eventually send this batch in a single request
     63                 // 有一种特殊的情况,batch的大小超过了maxSize,且batch是空的。也就是一个batch大小直接大于一次发送的maxSize
     64                 // 这种情况下最终还是会发送这个batch在一次请求
     65                 break;
     66             } else {
     67                 if (shouldStopDrainBatchesForPartition(first, tp))
     68                     break;
     69 
     70                 // 这块配置下面会讲
     71                 boolean isTransactional = transactionManager != null ? transactionManager.isTransactional() : false;
     72                 ProducerIdAndEpoch producerIdAndEpoch =
     73                     transactionManager != null ? transactionManager.producerIdAndEpoch() : null;
     74                 ProducerBatch batch = deque.pollFirst();
     75                 if (producerIdAndEpoch != null && !batch.hasSequence()) {
     76                     // If the batch already has an assigned sequence, then we should not change the producer id and
     77                     // sequence number, since this may introduce duplicates. In particular, the previous attempt
     78                     // may actually have been accepted, and if we change the producer id and sequence here, this
     79                     // attempt will also be accepted, causing a duplicate.
     80                     //
     81                     // Additionally, we update the next sequence number bound for the partition, and also have
     82                     // the transaction manager track the batch so as to ensure that sequence ordering is maintained
     83                     // even if we receive out of order responses.
     84                     batch.setProducerState(producerIdAndEpoch, transactionManager.sequenceNumber(batch.topicPartition), isTransactional);
     85                     transactionManager.incrementSequenceNumber(batch.topicPartition, batch.recordCount);
     86                     log.debug("Assigned producerId {} and producerEpoch {} to batch with base sequence " +
     87                             "{} being sent to partition {}", producerIdAndEpoch.producerId,
     88                         producerIdAndEpoch.epoch, batch.baseSequence(), tp);
     89 
     90                     transactionManager.addInFlightBatch(batch);
     91                 }
     92                 // 添加batch,并且close
     93                 batch.close();
     94                 size += batch.records().sizeInBytes();
     95                 ready.add(batch);
     96 
     97                 batch.drained(now);
     98             }
     99         }
    100     } while (start != drainIndex);
    101     return ready;
    102 }

    三.幂等性producer

      上面说到一个参数,enable.idempotence。0.11.0.0版本引入的幂等性producer表示它的发送操作是幂等的,也就是说,不会存在各种错误导致的重复消息。(比如说瞬时的发送错误可能导致producer端出现重试,同一个消息可能发送多次)

      producer发送到broker端的每批消息都会有一个序列号(用于去重),Kakfa会把这个序列号存在底层日志,保存序列号只需要几个字节,开销很小。producer端会分配一个PID,对于PID、分区和序列号的关系,可以想象成一个哈希表,key就是(PID,分区),value就是序列号。比如第一次给broker发送((PID=1,分区=1),序列号=2),第二次发送的value比2小或者等于2,则broker会拒绝PRODUCE请求,实现去重。

      这个只能保证单个producer实例的EOS语义

  • 相关阅读:
    scala与java的区别
    寒假第四天
    冲刺(第六天)
    冲刺(第五天)
    冲刺(第四天)
    冲刺(第三天)
    冲刺(第二天)
    第十周总结
    冲刺(第一天)
    文本中单词统计
  • 原文地址:https://www.cnblogs.com/GrimMjx/p/11354987.html
Copyright © 2011-2022 走看看