According to producer configs, there are: retries and max.in.flight.requests.per.connection. Suppose that retries > 0 and max.in.flight.requests.per.connection > 1.
Can messages arrive out of order within ONE partition of a topic (e.g. if the first message needs retries but the second message reaches the broker on its first attempt)?
Or does out-of-order delivery only happen across several partitions of a topic, while order is preserved within a partition?
If you set retries to more than 0 and max.in.flight.requests.per.connection to more than 1, then yes, messages can arrive out of order on the broker even if they are for the same partition.
You can also have duplicates if for example a message is correctly added to the Kafka logs and an error happens when sending the response back to the client.
Since Kafka 0.11, you can use the idempotent producer to solve these 2 issues. See http://kafka.apache.org/documentation/#semantics
As per the latest documentation, you can have at most 5 max.in.flight.requests.per.connection and Kafka will still maintain ordering when the idempotent producer is enabled.
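As a rough sketch (the topic name and bootstrap servers below are placeholders, not from the question), enabling the idempotent producer looks something like this:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// Idempotence implies acks=all and retries > 0; duplicates caused by retries are filtered out by the broker.
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
// Ordering within a partition is preserved as long as this stays at 5 or below.
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("my-topic", "key", "value"));
producer.close();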
We have a Flink job which has the following topology:
source -> filter -> map -> sink
We set a live (ready) status in the open() override of the sink operator. After we get that status, we send events. Sometimes the job fails to consume the events that were sent early on.
We want to know the exact time/step from which we can send data without it being missed.
It looks like you want to ensure that no message is missed for processing. Kafka will retain your messages, so there is no need to send messages only when the Flink consumer is ready. You can simplify your design by avoiding the status message.
Any Kafka Consumer (not just Flink Connector) will have an offset associated with it in Kafka Server to track the id of the last message that was consumed.
From kafka docs:
Kafka maintains a numerical offset for each record in a partition. This
offset acts as a unique identifier of a record within that partition,
and also denotes the position of the consumer in the partition. For
example, a consumer which is at position 5 has consumed records with
offsets 0 through 4 and will next receive the record with offset 5
In your Flink Kafka Connector, specify the offset as the committed offset.
OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST)
This will ensure that if your Flink Connector is restarted, it will consume from the last position that it left off, before the restart.
If for some reason, the offset is lost, this will read from the beginning (earliest message) in your Kafka topic. Note that this approach will cause you to reprocess the messages.
There are many more offset strategies you can explore to choose the right one for you.
Refer - https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#starting-offset
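As a sketch (the topic, group id and bootstrap servers are placeholders), the KafkaSource builder would be configured like this:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("localhost:9092")
        .setTopics("my-topic")
        .setGroupId("my-flink-group")
        // resume from the committed offset; fall back to the earliest message if no offset exists
        .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();

DataStream<String> stream = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source");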
I have a kafka cluster with 3 brokers and a topic with 8 partitions.
The producer is written in Java using Spring Boot, with no custom load-balancing rule, so it should do round robin.
The issue is that some partitions are not receiving any messages. I figured this out by checking what the 4 consumers are receiving: even though they are processing all messages, one consumer is idle all the time because it has received just one message.
What could be the issue?
Kafka version I'm using is 0.10.1.1
Additional note: in this case I'm not using replicas for the partitions.
It means it should do round robin.
It will only do round robin if you have no keys in your Kafka messages. Otherwise, the messages are partitioned based on a hash of the key:
hash(key) % number_of_partitions
It is not unusual that this causes some partitions to receive no messages at all. Imagine a case where you are using a key that can only take two different values. In that case, all your data will flow into only two partitions, regardless of the number of partitions in your topic.
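To illustrate (the producer, topic and keys here are hypothetical, not from the question):

// With a key, the partition is hash(key) % number_of_partitions, so records with
// the same key always land in the same partition.
producer.send(new ProducerRecord<>("my-topic", "customer-42", "payload"));
// With a null key, the default partitioner spreads records across partitions
// (round robin in older clients such as 0.10.x).
producer.send(new ProducerRecord<>("my-topic", null, "payload"));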
I have a kafka listener configured in our Spring Boot application as follows:
@KafkaListener(topicPartitions = @TopicPartition(topic = "data.all", partitions = { "0", "1", "2" }), groupId = "kms")
public void listen(ObjectNode message) throws JsonProcessingException {
    // Code to convert to json string and write to ElasticSearch
}
This application gets deployed to and run on 3 servers and, despite all having the group id of kms, they all get a copy of the message which means I get 3 identical records in Elastic. When I'm running an instance locally, 4 copies get written.
I've confirmed that the producer is only writing 1 message to the topic by checking the count of all messages on the topic before and after the write occurs; it only increases by 1. How can I prevent this?
When you manually assign partitions like that, you are responsible for distributing the partitions across the instances.
The group is ignored for the purpose of partition distribution, but is still used to track offsets, if needed.
You must either use group management and let Kafka do the partition assignment for you, or assign a distinct set of partitions manually to each instance.
Instead of topicPartitions, use topics = "data.all"
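So the listener from the question would become something along these lines (sketch only):

@KafkaListener(topics = "data.all", groupId = "kms")
public void listen(ObjectNode message) throws JsonProcessingException {
    // With group management, each instance in the "kms" group is assigned a subset of the
    // topic's partitions, so every record is processed by exactly one instance.
}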
What happens when you don't assign partitions manually
Producer Side
When the producer sends a message without any partitioning strategy and without specifying which partition the message should go to, Kafka uses a round-robin technique and spreads the messages across all available partitions.
The messages in the 2 partitions are distinct, because at most 1 consumer (per consumer group) should be listening to a particular partition of a topic.
Consumer Side
For example, say there are 2 partitions of a topic.
Then a consumer (let's say A) joins with a consumer group (let's say consumer).
A partition reassignment happens whenever a new consumer joins, and both partitions get assigned to A, as we have only one consumer in the group consumer.
Now, if consumer B joins the same consumer group consumer, another partition reassignment happens and A and B each get one partition to listen to.
As we have only 2 partitions, even if we add more consumers to the same consumer group, only 2 consumers will ever receive the messages sent to the topic, because at any time only 2 consumers can each hold one partition. This keeps consumption of each message exclusive to a single consumer in the group.
What is happening in your case is that more than 1 consumer is listening to the same partitions, so all the consumers listening to those partitions, even within the same consumer group, receive the messages from them. The mutual exclusivity between consumers in a consumer group is lost because more than 1 consumer is assigned the same partitions.
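As a hedged sketch of plain-client group management (topic and servers are placeholders), the key point is subscribing with a group id instead of assigning partitions yourself:

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "consumer");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
// subscribe() (not assign()) lets the group coordinator split the 2 partitions
// between however many consumers join the "consumer" group.
consumer.subscribe(Collections.singletonList("my-topic"));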
I have been doing some performance tests with a Kafka cluster for my project. I have a question regarding the send call and the 'acks' property of the producer. I observed the numbers below with the following invocation of the send call. This is a simple fire and forget call.
producer.send(record); // fire and forget call
The topic has 5 partitions and I see the results below for different acks values and replication factors. The Kafka cluster has 5 nodes running with default values and using local disk.
acks       Replication factor = 1    Replication factor = 3
0          1330k msgs/sec            1260k msgs/sec
1          1220k msgs/sec            1200k msgs/sec
-1 (all)   1220k msgs/sec             325k msgs/sec
As you can see, as the acks value changes from 0 to all, the producer throughput decreases. What I am not able to understand is that, if the producer send call is fire and forget in nature (see above) and the producer is not waiting for any acknowledgements, then why does the producer throughput drop as we move to stronger acks guarantees?
Any insights into how acks and the producer send call work internally in Kafka would be greatly appreciated.
P.S. I had asked this on the Kafka users mailing list but didn't get a reply, so I'm asking it on SO.
The fact that you don't pass a callback to the send method doesn't mean that it's fire and forget at the underlying level.
You have configured the producer with 3 different levels of acks, and they determine whether it really is "fire and forget" or not.
With acks = 0, the producer sends the message but doesn't wait for any ack from the broker; it's the real "fire and forget". As you can see, it provides the highest throughput.
With acks = 1, the producer waits for the ack. This ack is sent by the broker the producer is connected to (the one hosting the leader replica). It's not "fire and forget", of course.
With acks = -1, the producer waits for the ack. This ack is sent by the broker as above, but only after the messages have been replicated to the follower replicas on the other brokers. Of course, in this case the throughput decreases as you increase the replication factor, because the message needs to be copied by more brokers (min.insync.replicas) before the "leader" broker returns the ack to the producer.
Notice that with replication factor = 1, acks = 1 and acks = -1 have the same throughput because there is just one replica (the leader), so there is nothing to copy to followers.
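A sketch of what that means in code (servers and topic are placeholders): the send call itself never changes, only the acks setting decides how long the internal request waits for the broker.

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.ACKS_CONFIG, "all"); // compare "0", "1" and "all" (-1)
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// Still "fire and forget" from the caller's point of view, but internally the batch
// is not completed until the configured number of acks has arrived.
producer.send(new ProducerRecord<>("perf-topic", "value"));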
This is about how Kafka handles the produce request.
First, KafkaProducer.send is async by default. The KafkaProducer does the heavy lifting of batching your produce requests and sending them to the broker. The broker then answers with a produce response, which with acks=all is only sent after min.insync.replicas of the followers have replicated the data. That's the reason.
I think the accepted answer is incorrect, because the question is about throughput and NOT latency. According to the Confluent book Kafka: The Definitive Guide:
If our client code waits for a reply from the server (by calling the
get() method of the Future object returned when sending a message) it
will obviously increase latency significantly (at least by a network
roundtrip). If the client uses callbacks, latency will be hidden, but
throughput will be limited by the number of in-flight messages (i.e.,
how many messages the producer will send before receiving replies from
the server).
So for an asynchronous producer with acks=1 or acks=all, the throughput depends on max.in.flight.requests.per.connection: the maximum number of unacknowledged requests the client will send on a single connection before blocking.
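In code, the distinction the book makes looks roughly like this (the record variable is hypothetical):

// Synchronous: block on every record; throughput is limited by the round-trip latency.
producer.send(record).get();

// Asynchronous: latency is hidden by the callback; throughput is bounded by
// max.in.flight.requests.per.connection unacknowledged requests per connection.
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace(); // handle the failed send
    }
});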
I am a bit confused by Kafka's partition-revoke mechanism (maybe I have implemented my Java code a bit differently).
As far as I understand:
Under the 1st topic, if there are 'N' partitions, 'N' consumers can consume messages on that topic, and whenever a new consumer subscribes to that topic a revoke will occur and the partitions will be re-assigned between those 'N' consumers.
Now, if a consumer subscribes to a 2nd topic with multiple partitions, my understanding is that a revoke of partitions on the 1st topic should not happen (OR will it?)
A revoke usually happens when a repartitioning (rebalance) takes place. Consider the scenario where you have 5 partitions for a topic and 3 consumers in a consumer group listening to them.
Since the number of partitions > the number of consumers, some consumers own more than one partition, and the assignment can change whenever the group rebalances (for example when a consumer joins, leaves, or is considered dead because it did not poll in time). This process is called rebalancing.
Consider that consumer 1 is assigned, say, partition 1, fetches a batch of messages, processes them, and then commits the offsets of those messages back to the broker. But if the consumer takes too long to commit the offsets for the batch and a rebalance happens in the meantime (i.e. consumer 1 loses partition 1 and gets some other partition), then this partitions-revoked event will be thrown.
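For reference, in the plain Java client the revoke surfaces through a ConsumerRebalanceListener; a minimal sketch (the topic name is a placeholder):

consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Called before a rebalance takes these partitions away from this consumer;
        // commit what has been processed so far to avoid reprocessing after reassignment.
        consumer.commitSync();
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Called after the rebalance with this consumer's new assignment.
    }
});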