Which partition the consumer is connected to? - java

How can I get the partition the consumer is connected to?
If it is subscribed to multiple topics, How to get metrics (incoming-byte-rate) for different topics from consumer.metrics()?

If you are part of a consumer group and are leveraging on the auto partitions assignment, every time partitions are assigned (due to a rebalance), the onPartitionsAssigned() (of the ConsumerRebalanceListener) is called with the collection of the assigned partitions.
Btw in any moment you can call the assignment() method to get the assigned partitions.
not quite sure but yes, the method metrics() should do the work.

Related

Mixing manual and automatic partition assignments in different Kafka consumer groups

According to the Kafka consumer documentation there are two ways for a Kafka consumer to register itself with Kafka: Either it subscribes to a topic or it assigns itself to partitions. In the first case, Kafka will balance the partitions of this topic between multiple instances of consumer with the same group.id, in the second case the consumers themselves are responsible for this.
Obviously it makes little sense to mix these two approaches within a consumer group. And the Kafka documentation explicitly states that this isn't not possible:
Note that it isn't possible to mix manual partition assignment (i.e.
using assign) with dynamic partition assignment through topic
subscription (i.e. using subscribe).
However it does not clearly state the scope within which that is not possible. Therefore my question:
Is it possible to have on the same topic a consumer with manual partition assignment and other consumers with a different group.id with dynamic partition assignment
through topic subscription?
As long as there is a different group ID, then yes, there is no limitation to using assign or subscribe on the same topic

Subscribe to every topic partition

What is the canonical way to subscribe multiple times to a given Kafka topic and receive every message from every partition for each KafkaConsumer.
What I am doing as the moment is generating a random Uuid group.id so that each subscription is a new group, but given these subscriptions are short-lived (and there are many of them), the overhead of Kafka storing metadata about them might be detrimental.
What is the correct way to acheive this?
I believe the answer to this question is to use the assign() method rather than subscribe().
Manual topic assignment through this method does not use the
consumer's group management functionality.
Reference: https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
Well, having unique consumer groups is the way to ensure that your consumer(s) running inside a group subscribes to all partitions and receives all the messages. That is the purpose of multiple consumer group subscribing to the same topic.
I agree that it requires you to create multiple consumer groups and that gives the overhead of metadata. But it all depends on your usecase requirement whether you want single/multiple consumer groups.

Force kafka consumer to poll partition with highest lag

I have a setup where several KafkaConsumers each handle a number of partitions on a single topic. They are statically assigned the partitions, in a way that ensures that each consumer has an equal number of partitions to handle. The record key is also chosen so that we have equal distribution of messages over all partitions.
At times of heavy load, we often see a small number of partitions build up a considerable lag (thousands of messages/several minutes worth), while other partitions that are getting the same load and are consumed by the same consumer manage to keep the lag down to a few hundred messages / couple of seconds.
It looks like the consumer is fetching records as fast as it can, going around most of the partitions, but now and then there is one partition that gets left out for a long time. Ideally, I'd like to see the lag spread out more evenly across the partitions.
I've been reading about KafkaConsumer poll behaviour and configuration for a while now, and so far I think there's 2 options to work around this:
Build something custom that can monitor the lag per partition, and use KafkaConsumer.pause() and .resume() to essentially force the KafkaConsumer to read from the partitions with the most lag
Restrict our KafkaConsumer to only ever subscribe to one TopicPartition, and work with multiple instances of KafkaConsumer.
Neither of these options seem like the proper way to handle this. Configuration also doesn't seem to have the answer:
max.partition.fetch.bytes only specifies the max fetch size for a single partition, it doesn't guarantee that the next fetch will be from another partition.
max.poll.interval.ms only works for consumer groups and not on a per-partition basis.
Am I missing a way to encourage the KafkaConsumer to switch partition more often? Or a way to implement a preference for the partitions with the highest lag?
Not sure wether the answer is still relevant to you or if my answer exactly replies to your needs, However, you could try a lag aware assignor. This assignor which assign partitions to consumers ensures that consumers are assigned partitions so that the lag among consumers is assigned uniformly/equally. Here is a well written code that I used it that implements a lag based assignor.
https://github.com/grantneale/kafka-lag-based-assignor
All what you need is to configure you consumer to use this assignor. The below statament.
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, LagBasedPartitionAssignor.class.getName());

Apache Kafka Message broadcasting

I am studying Apache-kafka and have some confusion. Please help me to understand the following scenario.
I have a topic with 5 partitions and 5 brokers in a Kafka cluster. I am maintaining my message order in Partition 1(say P1).I want to broadcast the messages of P1 to 10 consumers.
So my question is; how do these 10 consumers interact with topic partition p1.
This is probably not how you want to use Kafka.
Unless you're being explicit with how you set your keys, you can't really control which partition your messages end up in when producing to a topic. Partitions in Kafka are designed to be more like low-level plumbing, something that exists, but you don't usually have to interact with. On the consumer side, you will be assigned partitions based on how many consumers are active for a particular consumer group at any one time.
One way to get around this is to define a topic to have only a single partition, in which case, of course, all messages will go to that partition. This is not ideal, since Kafka won't be able to parallelize data ingestion or serving, but it is possible.
So, having said that, let's assume that you did manage to put all your messages in partition 1 of a specific topic. When you fire up a consumer of that topic with consumer group id of consumer1, it will be assigned all the partitions for that topic, since that consumer is the only active one for that particular group id. If there is only one partition for that topic, like explained above, then that consumer will get all the data. If you then fire up a second consumer with the same group id, Kafka will notice there's a second consumer for that specific group id, but since there's only one partition, it can't assign any partitions to it, so that consumer will never get any data.
On the other hand, if you fire up a third consumer with a different consumer group id, say consumer2, that consumer will now get all the data, and it won't interfere at all with consumer1 message consumption, since Kafka keeps track of their consuming offsets separately. Kafka keeps track of which offset each particular ConsumerGroupId is at on each partition, so it won't get confused if one of them starts consuming slowly or stops for a while and restarts consuming later that day.
Much more detailed information here on how Kafka works here: https://kafka.apache.org/documentation/#gettingStarted
And more information on how to use the Kafka consumer at this link:
https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
#mjuarez's answer is absolutely correct - just for brevity I would reduce it to the following;
Don't try and read only from a single partition because it's a low level construct and it somewhat undermines the parallelism of Kafka. You're much better off just creating more topics if you need finer separation of data.
I would also add that most of the time a consumer needn't know which partition a message came from, in the same way that I don't eat a sandwich differently depending on which store it came from.
#mjuarez is actually not correct and I am not sure why his comment is being falsely confirmed by the OP. You can absolutely explicitly tell Kafka which partition a producer record pertains to using the following:
ProducerRecord(
java.lang.String topic,
java.lang.Integer partition, // <--------- !!!
java.lang.Long timestamp,
K key,
V value)
https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html#ProducerRecord-java.lang.String-java.lang.Integer-java.lang.Long-K-V-
So most of what was said after that becomes irrelevant.
Now to address the OP question directly: you want to accomplish a broadcast. To have a message sent once and read more than once you would have to have a different consumer group for each reader.
And that use case is an absolutely valid Kafka usage paradigm.
You can accomplish that using RabbitMQ too:
https://www.rabbitmq.com/tutorials/tutorial-three-java.html
... but the way it is done is not ideal because multiple out-of-process queues are involved.

Order of messages from multiple topics Kafka

I'm developing a software that uses Apache Kafka. I've got one consumer that subscribed to multiple topics, I'd like to know if there is an order for receiving messages from those topics. I tried some combination on my computer but I need to be sure about this.
Example
Consumer sub to topic1 and topic2
Producer1 write something on topic1
Producer2 write something on topic2
Producer1 write something on topic1
When the consumer polls, it receives a list of records containing first the messages from the first topic that he subscribed and then the messages from the other topic.
I'd like to know if it is always like this, i.e. the messages are in order like the topics that I subscribed.
Thanks
[EDIT] I'd like to specify that I have the two topics with one partition each, and only one producer and one consumer. I need to read first all the messages from the first topic and then the messages from the other topic
Kafka gives you only the guarantee of messages ordering inside a partition. It means that even with only one topic but more than one partitions you have no guarantee that messages are received in the same order they are sent.
Regarding your use case with two topics there is no relation between subscription order to the topics and messages ordering even because if the cluster has more than one node, the topic partition leader will be on different brokers and the client receives messages over different connections. Btw even with only one broker with all topics/partitions on that you can't have the guarantee you are describing.
No. Message ordering is only preserved within partitions (not even within topics).
If you need stronger ordering guarantees, you have to re-arrange messages in your application, for example using a timestamp (and a sufficiently large window buffer to catch all the ones that arrive out-of-order). Support for this has improved a bit with the recent addition of timestamps for all messages by Kafka itself, but the principle remains the same.
Why not first subscribe to the first topic and do a poll, and then subscribe to the other topic and do another poll? Without this, I don't think there is any guarantee in which order you receive messages from the two topics.

Categories