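Kafka Producer Configuration
acks String, default acks=1 (acks=all since Kafka 3.0)
acks=0 If set to 0, the producer does not wait for any acknowledgment. The message is added to the socket buffer immediately and considered sent. In this case there is no guarantee that the server received the message, and the retry mechanism has no effect (because the client does not know whether a failure occurred). The offset returned for each message is always set to -1.
acks=all The leader waits for all in-sync replicas to acknowledge the message before responding. This setting guarantees that the message is not lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee. Equivalent to acks=-1.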
retries int, default retries=0 in clients before Kafka 2.1, retries=2147483647 from 2.1 onward
max.in.flight.requests.per.connection int, default=5
max.request.size int, default=1048576 bytes (1 MiB)
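As a rough illustration, the settings above can be collected into a plain dictionary and rendered as producer properties. This is a minimal sketch that uses only the Python standard library and no Kafka client; the helper names normalize_acks and to_properties are hypothetical, and the values are illustrative choices rather than recommended settings.

```python
# Producer settings keyed by their Kafka property names.
# The values are illustrative, not recommendations.
producer_config = {
    "acks": "all",
    "retries": 3,
    "max.in.flight.requests.per.connection": 5,
    "max.request.size": 1048576,  # bytes (1 MiB)
}


def normalize_acks(value):
    """Map an acks setting to its numeric form; "all" is the same as -1."""
    text = str(value).strip().lower()
    if text == "all":
        return -1
    number = int(text)
    if number not in (-1, 0, 1):
        raise ValueError(f"acks must be 0, 1, all or -1, got {value!r}")
    return number


def to_properties(config):
    """Render a config dict as the lines of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


if __name__ == "__main__":
    print(to_properties(producer_config))
```

A real producer would receive the same keys through whichever client library is in use; only the property names come from Kafka itself.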
Kafka Record
•Every message published to Kafka is called a “Record”
•A Record contains two parts: a key and a value
•Key (optional):
•Used by compaction or for message grouping
•If a key is sent, the producer guarantees that all messages for that key always go to the same partition
•This makes it possible to guarantee ordering for a specific key
•Value:
•The content of the data goes here
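The key-to-partition guarantee can be sketched as a deterministic hash of the key taken modulo the partition count. The function partition_for is a hypothetical helper, and zlib.crc32 is only a stand-in hash for this example; Kafka's Java client uses murmur2 in its default partitioner, so the partition numbers produced here will not match a real cluster.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition from the key alone, so equal keys always land together.

    zlib.crc32 is a stand-in hash for this sketch, not what Kafka uses.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    # The first and third lines printed show the same partition.
    for key in (b"user-1", b"user-2", b"user-1"):
        print(key, "->", partition_for(key, 6))
```

Because the result depends on the partition count, the same key can map to a different partition once partitions are added to a topic.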
Kafka Consumer
•Consumers read data from a topic
•They only have to specify the topic name and one broker to connect to, and Kafka automatically takes care of pulling the data from the right brokers
•Data is read in order within each partition
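The per-partition ordering can be pictured with a small in-memory model: each partition is an append-only list, and a consumer keeps one read position per partition. InMemoryTopic and InMemoryConsumer are hypothetical classes written for this sketch; they are not part of any Kafka client.

```python
from collections import defaultdict


class InMemoryTopic:
    """Stand-in for a topic: one append-only list per partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, value):
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1  # offset of the record just written


class InMemoryConsumer:
    """Reads each partition from its own position, oldest record first."""

    def __init__(self, topic):
        self.topic = topic
        self.positions = defaultdict(int)  # partition -> next offset to read

    def poll(self, partition, max_records=10):
        log = self.topic.partitions[partition]
        start = self.positions[partition]
        batch = log[start:start + max_records]
        self.positions[partition] = start + len(batch)
        # Pair every value with its offset inside the partition.
        return [(start + i, value) for i, value in enumerate(batch)]
```

Order is defined only inside a partition here, which mirrors the bullet above: nothing in the model relates the records of partition 0 to those of partition 1.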
Partitions Count
•Roughly, each partition can sustain a throughput of about 10 MB/s
•More partitions imply:
•Better parallelism, better throughput
•BUT more files opened on your system
•BUT if a broker fails (unclean shutdown), lots of concurrent leader elections
•BUT added latency to replicate (in the order of milliseconds)
•Partitions per topic = (1 to 2) x (# of brokers), with a maximum of 10 partitions
•Example: in a 3-broker setup, 3 or 6 partitions is a good number to start with
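The sizing rule above is simple arithmetic and can be written down directly. Both function names are hypothetical, and the numbers (a factor between 1 and 2, the cap of 10, roughly 10 MB/s per partition) are the rules of thumb quoted in this section, not measured limits.

```python
import math


def suggest_partitions(num_brokers, factor=2, cap=10):
    """Rule of thumb: (1 to 2) x number of brokers, at most `cap`."""
    if num_brokers < 1:
        raise ValueError("num_brokers must be at least 1")
    if not 1 <= factor <= 2:
        raise ValueError("factor is expected to be between 1 and 2")
    return min(math.ceil(factor * num_brokers), cap)


def partitions_for_throughput(target_mb_per_sec, per_partition_mb_per_sec=10):
    """Partitions needed if each one sustains roughly 10 MB/s."""
    return max(1, math.ceil(target_mb_per_sec / per_partition_mb_per_sec))


if __name__ == "__main__":
    print(suggest_partitions(3, factor=1))  # 3 brokers, lower bound
    print(suggest_partitions(3, factor=2))  # 3 brokers, upper bound
    print(partitions_for_throughput(45))    # 45 MB/s target
```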
Replication Factor
•Should be at least 2, maximum of 3
•The higher the replication factor:
•Better resilience of your system (with a replication factor of N, N-1 brokers can fail)
•BUT longer replication (higher latency if acks=all)
•BUT more disk space on your system (50% more if RF is 3 instead of 2)
•Set it to 2 (if you have 3 brokers)
•Set it to 3 (if you have more than 5 brokers)
•If replication performance is an issue, get a better broker instead of lowering the replication factor
Partitions and Segments
•Topics are made of partitions (we already know that)
•Partitions are made of … segments (files)!
•Only one segment is ACTIVE (the one data is being written to)
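The relationship between a partition and its segments can be sketched as a list of segments in which only the last one accepts writes. Segment and PartitionLog are hypothetical classes, and rolling after a fixed number of records is an assumption made to keep the example small; a real broker decides when to roll from size and time settings.

```python
class Segment:
    """One file of a partition, identified by the offset of its first record."""

    def __init__(self, base_offset):
        self.base_offset = base_offset
        self.records = []


class PartitionLog:
    """A partition as an ordered list of segments; only the last is active."""

    def __init__(self, max_records_per_segment=2):
        self.max_records_per_segment = max_records_per_segment
        self.segments = [Segment(base_offset=0)]
        self.next_offset = 0

    @property
    def active_segment(self):
        return self.segments[-1]

    def append(self, value):
        # Roll to a new segment once the active one is full.
        if len(self.active_segment.records) >= self.max_records_per_segment:
            self.segments.append(Segment(base_offset=self.next_offset))
        self.active_segment.records.append(value)
        offset = self.next_offset
        self.next_offset += 1
        return offset
```

Older segments are never written to again in this model, which is what makes them safe to delete or compact as whole files.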
Segments and Indexes
•Segments come with two indexes (files):
•An offset-to-position index: tells Kafka where to read in the segment to find a message
•A timestamp-to-offset index: allows Kafka to find messages by timestamp
•Therefore, Kafka knows where to find data in constant time!
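An offset-to-position lookup can be sketched as a sorted, sparse list of (offset, byte position) entries searched with bisect. OffsetIndex is a hypothetical class, and the sparse layout with binary search is an assumption of this sketch: the lookup returns the nearest indexed entry at or before the requested offset, and a reader would scan forward from that position. The cost grows with the logarithm of the number of index entries, which stays small in practice.

```python
import bisect


class OffsetIndex:
    """Sparse offset -> byte position index for one segment (a sketch)."""

    def __init__(self):
        self.offsets = []    # indexed offsets, ascending
        self.positions = []  # byte position of each indexed offset

    def add(self, offset, position):
        if self.offsets and offset <= self.offsets[-1]:
            raise ValueError("offsets must be added in increasing order")
        self.offsets.append(offset)
        self.positions.append(position)

    def lookup(self, target_offset):
        """Return the (offset, position) of the last entry <= target_offset."""
        i = bisect.bisect_right(self.offsets, target_offset) - 1
        if i < 0:
            raise KeyError(f"offset {target_offset} is before this segment")
        return self.offsets[i], self.positions[i]


if __name__ == "__main__":
    index = OffsetIndex()
    index.add(0, 0)
    index.add(50, 4096)
    index.add(100, 8192)
    # Start reading at the indexed entry, then scan forward to offset 75.
    print(index.lookup(75))
```

A timestamp-to-offset index can follow the same shape, with timestamps as the sorted keys and offsets as the values.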