According to the Kafka consumer documentation, there are two ways for a Kafka consumer to register itself with Kafka: either it subscribes to a topic, or it assigns itself to partitions. In the first case, Kafka balances the partitions of the topic among all consumer instances that share the same group.id; in the second case, the consumers themselves are responsible for this.
Obviously it makes little sense to mix these two approaches within a consumer group, and the Kafka documentation explicitly states that this isn't possible:
Note that it isn't possible to mix manual partition assignment (i.e. using assign) with dynamic partition assignment through topic subscription (i.e. using subscribe).
However, it does not clearly state the scope of this restriction. Hence my question:
Is it possible to have, on the same topic, one consumer that uses manual partition assignment and other consumers, with a different group.id, that use dynamic partition assignment through topic subscription?
As long as the group IDs differ, yes: there is no limitation on using assign() or subscribe() on the same topic.
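To make the scope concrete, here is a toy model in plain Java (no kafka-clients dependency; ToyConsumer and its methods are hypothetical). As far as I know, the real KafkaConsumer enforces the restriction per consumer instance: calling subscribe() on an instance that already used assign(), or vice versa, without unsubscribe() in between throws an IllegalStateException. Two separate instances, here with different group IDs, never see each other's choice.

```java
import java.util.Collection;
import java.util.List;

// Toy model, NOT the real KafkaConsumer: it only tracks which subscription style
// a single consumer instance has chosen, to show where the restriction applies.
public class ToyConsumer {
    public enum Mode { NONE, SUBSCRIBED, ASSIGNED }

    private final String groupId;
    private Mode mode = Mode.NONE;

    public ToyConsumer(String groupId) {
        this.groupId = groupId;
    }

    // Dynamic assignment: a group coordinator would hand out the partitions.
    public void subscribe(Collection<String> topics) {
        switchTo(Mode.SUBSCRIBED);
    }

    // Manual assignment: the caller picks the partitions itself.
    public void assign(Collection<String> topicPartitions) {
        switchTo(Mode.ASSIGNED);
    }

    // Clears the current style so this instance may switch to the other one.
    public void unsubscribe() {
        mode = Mode.NONE;
    }

    public String groupId() { return groupId; }

    public Mode mode() { return mode; }

    // The restriction is per instance: once a style is chosen, the other is rejected.
    private void switchTo(Mode wanted) {
        if (mode != Mode.NONE && mode != wanted) {
            throw new IllegalStateException("subscribe() and assign() are mutually exclusive");
        }
        mode = wanted;
    }

    // Demonstrates the restriction on a single instance.
    public static boolean mixingOnOneInstanceFails() {
        ToyConsumer c = new ToyConsumer("g1");
        c.assign(List.of("orders-0"));
        try {
            c.subscribe(List.of("orders"));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

A consumer in group "manual-readers" calling assign() and another in group "dynamic-readers" calling subscribe() on the same topic coexist without any error, because each instance only checks its own state.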
Related
What is the canonical way to subscribe to a given Kafka topic multiple times, so that each KafkaConsumer receives every message from every partition?
What I am doing at the moment is generating a random UUID group.id so that each subscription is a new group, but given that these subscriptions are short-lived (and there are many of them), the overhead of Kafka storing metadata about them might be detrimental.
What is the correct way to achieve this?
I believe the answer to this question is to use the assign() method rather than subscribe().
Manual topic assignment through this method does not use the consumer's group management functionality.
Reference: https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
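A toy sketch of why assign() avoids the group overhead (plain Java, hypothetical names; an in-memory list stands in for the topic): each reader keeps its positions in its own memory, so nothing is stored on the broker side for it, and every reader sees every record. With the real client this roughly corresponds to building TopicPartition objects from partitionsFor(topic) and then calling assign() followed by seekToBeginning(), assuming you never commit offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a topic is a list of partitions, each an append-only list of records.
// The reader's positions live only in this object; no group is registered anywhere.
public class ManualReader {
    private final List<List<String>> topic;
    private final int[] positions;

    public ManualReader(List<List<String>> topic) {
        this.topic = topic;
        // Roughly assign(allPartitions) + seekToBeginning(allPartitions): start at offset 0 everywhere.
        this.positions = new int[topic.size()];
    }

    // Returns every record not read yet, partition by partition, and advances the positions.
    public List<String> poll() {
        List<String> out = new ArrayList<>();
        for (int p = 0; p < topic.size(); p++) {
            List<String> log = topic.get(p);
            out.addAll(log.subList(positions[p], log.size()));
            positions[p] = log.size();
        }
        return out;
    }
}
```

When such a reader goes away, its positions simply disappear with it, which is the property the question is after.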
Well, having unique consumer groups is the way to ensure that your consumer(s) running inside a group subscribe to all partitions and receive all the messages. That is the purpose of having multiple consumer groups subscribe to the same topic.
I agree that it requires you to create multiple consumer groups, and that this adds metadata overhead. But whether you want a single consumer group or several depends on your use case.
I am studying Apache Kafka and have some confusion. Please help me understand the following scenario.
I have a topic with 5 partitions and 5 brokers in a Kafka cluster. I am maintaining my message order in partition 1 (say P1). I want to broadcast the messages of P1 to 10 consumers.
So my question is: how do these 10 consumers interact with topic partition P1?
This is probably not how you want to use Kafka.
Unless you're being explicit with how you set your keys, you can't really control which partition your messages end up in when producing to a topic. Partitions in Kafka are designed to be more like low-level plumbing: something that exists, but that you don't usually have to interact with. On the consumer side, you will be assigned partitions based on how many consumers are active in a particular consumer group at any one time.
One way to get around this is to define a topic to have only a single partition, in which case, of course, all messages will go to that partition. This is not ideal, since Kafka won't be able to parallelize data ingestion or serving, but it is possible.
So, having said that, let's assume that you did manage to put all your messages in partition 1 of a specific topic. When you fire up a consumer of that topic with a consumer group id of consumer1, it will be assigned all the partitions of that topic, since that consumer is the only active one for that particular group id. If there is only one partition for that topic, as explained above, then that consumer will get all the data. If you then fire up a second consumer with the same group id, Kafka will notice there's a second consumer for that group id, but since there's only one partition, it can't assign any partitions to it, so that consumer will get no data (unless the first consumer leaves the group and the partition is reassigned).
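The "second consumer gets nothing" situation can be seen in a toy version of range-style assignment (plain Java; ToyRangeAssignor is hypothetical and ignores details such as the real RangeAssignor sorting members by ID). Partitions are split into contiguous ranges, and once there are more consumers than partitions, the extra consumers receive an empty list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of range-style assignment within ONE consumer group; not Kafka's code.
public class ToyRangeAssignor {
    // Splits partitions 0..numPartitions-1 into contiguous ranges, one per consumer,
    // giving the first (numPartitions % consumers) members one extra partition.
    public static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int n = consumers.size();
        if (n == 0) {
            return result;
        }
        int base = numPartitions / n;
        int extra = numPartitions % n;
        int next = 0;
        for (int i = 0; i < n; i++) {
            int count = base + (i < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                parts.add(next++);
            }
            // A consumer whose count is 0 stays idle: it is in the group but owns nothing.
            result.put(consumers.get(i), parts);
        }
        return result;
    }
}
```

With one partition and consumers c1 and c2, c1 gets [0] and c2 gets []; with five partitions and the same two consumers, c1 gets [0, 1, 2] and c2 gets [3, 4].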
On the other hand, if you fire up a third consumer with a different consumer group id, say consumer2, that consumer will now get all the data, and it won't interfere at all with consumer1's consumption, since Kafka keeps track of their consumed offsets separately. Kafka tracks which offset each consumer group has reached on each partition, so it won't get confused if one of them starts consuming slowly, or stops for a while and resumes consuming later that day.
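A toy model of this per-group bookkeeping (plain Java; GroupOffsets is hypothetical and keeps "committed offsets" in a map instead of Kafka's internal __consumer_offsets topic). Each group ID has its own offset into the same partition, so a slow or restarted group never affects another.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: one partition (an append-only list) plus a committed offset per group ID.
public class GroupOffsets {
    private final List<String> partition;
    private final Map<String, Integer> committed = new HashMap<>();

    public GroupOffsets(List<String> partition) {
        this.partition = partition;
    }

    // Reads up to maxRecords from the group's committed offset, then commits the new offset.
    // Other groups' offsets are untouched, so each group keeps its own progress.
    public List<String> poll(String groupId, int maxRecords) {
        int from = committed.getOrDefault(groupId, 0);
        int to = Math.min(partition.size(), from + maxRecords);
        committed.put(groupId, to);
        return new ArrayList<>(partition.subList(from, to));
    }

    public int committedOffset(String groupId) {
        return committed.getOrDefault(groupId, 0);
    }
}
```

For a partition holding m0, m1, m2: consumer1 polls two records and gets m0, m1; consumer2 then polls and still gets all of m0, m1, m2; consumer1's next poll resumes at m2.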
Much more detailed information on how Kafka works is available here: https://kafka.apache.org/documentation/#gettingStarted
And more information on how to use the Kafka consumer at this link:
https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
@mjuarez's answer is absolutely correct; for brevity, I would reduce it to the following:
Don't try to read from only a single partition, because it's a low-level construct and it somewhat undermines Kafka's parallelism. You're much better off creating more topics if you need finer separation of data.
I would also add that most of the time a consumer needn't know which partition a message came from, in the same way that I don't eat a sandwich differently depending on which store it came from.
@mjuarez is actually not correct, and I am not sure why the OP confirmed his answer. You can absolutely tell Kafka explicitly which partition a producer record belongs to, using the following constructor:
ProducerRecord(
java.lang.String topic,
java.lang.Integer partition, // <--------- !!!
java.lang.Long timestamp,
K key,
V value)
https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html#ProducerRecord-java.lang.String-java.lang.Integer-java.lang.Long-K-V-
So most of what was said after that becomes irrelevant.
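The precedence between an explicit partition and a key can be sketched as follows (plain Java; PartitionChooser is hypothetical). It assumes the documented behaviour that an explicit partition on the record wins and that, otherwise, the partition is derived from a hash of the key. Kafka's default partitioner hashes the serialized key with murmur2, so String.hashCode() here is only a stand-in, and records without a key (which the real producer spreads across partitions) are simply rejected in this toy. With the real client, the explicit case would be something like new ProducerRecord<>("my-topic", 1, key, value).

```java
// Toy model of how a producer picks the target partition; not Kafka's implementation.
public class PartitionChooser {
    public static int choose(Integer explicitPartition, String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // 1. An explicit partition on the record always wins.
        if (explicitPartition != null) {
            if (explicitPartition < 0 || explicitPartition >= numPartitions) {
                throw new IllegalArgumentException("no such partition: " + explicitPartition);
            }
            return explicitPartition;
        }
        // 2. Otherwise hash the key, so equal keys always land in the same partition.
        //    (Stand-in hash: the real default partitioner uses murmur2 on the key bytes.)
        if (key == null) {
            throw new IllegalArgumentException("toy model needs a key or an explicit partition");
        }
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```

Note that with a single-partition topic, every key maps to partition 0, which is the workaround described earlier in the thread.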
Now to address the OP question directly: you want to accomplish a broadcast. To have a message sent once and read more than once you would have to have a different consumer group for each reader.
And that use case is an absolutely valid Kafka usage paradigm.
You can accomplish that using RabbitMQ too:
https://www.rabbitmq.com/tutorials/tutorial-three-java.html
... but the way it is done is not ideal because multiple out-of-process queues are involved.
How can I get the partitions the consumer is assigned to?
If it is subscribed to multiple topics, how can I get metrics (incoming-byte-rate) for the different topics from consumer.metrics()?
If you are part of a consumer group and rely on automatic partition assignment, then every time partitions are assigned (due to a rebalance), the onPartitionsAssigned() method of your ConsumerRebalanceListener is called with the collection of assigned partitions.
Also, at any moment you can call the assignment() method to get the currently assigned partitions.
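A minimal sketch of keeping track of the assignment through those callbacks, in plain Java so it runs without kafka-clients. PartitionTracker is hypothetical, and partitions are plain strings such as "orders-0" rather than TopicPartition objects. In the real API you would implement ConsumerRebalanceListener (onPartitionsRevoked and onPartitionsAssigned) and pass it to subscribe(topics, listener).

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

public class PartitionTracker {
    // Same two callbacks as ConsumerRebalanceListener, simplified to strings.
    public interface RebalanceListener {
        void onPartitionsRevoked(Collection<String> partitions);
        void onPartitionsAssigned(Collection<String> partitions);
    }

    // Keeps its own copy of the current assignment, updated on every rebalance.
    public static class Tracker implements RebalanceListener {
        private final Set<String> current = new TreeSet<>();

        @Override
        public void onPartitionsRevoked(Collection<String> partitions) {
            current.removeAll(partitions);
        }

        @Override
        public void onPartitionsAssigned(Collection<String> partitions) {
            current.addAll(partitions);
        }

        // Plays the role of consumer.assignment(): a read-only view of what is owned now.
        public Set<String> assignment() {
            return Collections.unmodifiableSet(current);
        }
    }
}
```

Removing revoked partitions and adding assigned ones works whether a rebalance revokes everything first or only the partitions that actually move.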
Not quite sure, but yes, the metrics() method should do the job.
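A sketch of picking per-topic values out of a metrics map, in plain Java. TopicMetrics and its Key class are hypothetical stand-ins for org.apache.kafka.common.MetricName, which likewise exposes a name, a group and a tag map. As far as I know, incoming-byte-rate is reported per client and per broker connection rather than per topic; the per-topic fetch metrics, such as bytes-consumed-rate and records-consumed-rate in the consumer-fetch-manager-metrics group, carry a topic tag, and that tag is what this filter relies on.

```java
import java.util.Map;
import java.util.TreeMap;

public class TopicMetrics {
    // Stand-in for MetricName: a name, a group and tags (e.g. client-id, topic).
    public static final class Key {
        final String name;
        final String group;
        final Map<String, String> tags;

        public Key(String name, String group, Map<String, String> tags) {
            this.name = name;
            this.group = group;
            this.tags = tags;
        }
    }

    // Returns the value of metricName in the given group for each topic, keyed by the
    // "topic" tag. Entries without a topic tag (client-wide aggregates) are skipped.
    public static Map<String, Double> perTopic(Map<Key, Double> metrics, String group, String metricName) {
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<Key, Double> e : metrics.entrySet()) {
            Key k = e.getKey();
            String topic = k.tags.get("topic");
            if (topic != null && k.group.equals(group) && k.name.equals(metricName)) {
                out.put(topic, e.getValue());
            }
        }
        return out;
    }
}
```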
I'm developing software that uses Apache Kafka. I have one consumer subscribed to multiple topics, and I'd like to know whether there is a guaranteed order for receiving messages from those topics. I tried some combinations on my computer, but I need to be sure about this.
Example
Consumer subscribes to topic1 and topic2
Producer1 writes something to topic1
Producer2 writes something to topic2
Producer1 writes something to topic1
When the consumer polls, it receives a list of records containing first the messages from the first topic it subscribed to, and then the messages from the other topic.
I'd like to know if it is always like this, i.e. whether the messages always follow the order in which I subscribed to the topics.
Thanks
[EDIT] To clarify: I have two topics with one partition each, and only one producer and one consumer. I need to read all the messages from the first topic first, and then the messages from the other topic.
Kafka only guarantees message ordering within a partition. This means that even with a single topic, if it has more than one partition, there is no guarantee that messages are received in the order they were sent.
Regarding your use case with two topics: there is no relation between the order in which you subscribe to the topics and the order of the messages. For one thing, if the cluster has more than one node, the partition leaders may be on different brokers, and the client receives messages over different connections. But even with a single broker hosting all the topics/partitions, you don't get the guarantee you are describing.
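A toy simulation of that guarantee (plain Java; InterleavingDemo is hypothetical, and a seeded Random stands in for network and fetch timing): records from two partitions are merged in an arbitrary order, yet each partition's own order always survives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class InterleavingDemo {
    // Merges two partitions' records in an order chosen by rng, the way fetches from
    // different partitions (possibly on different brokers) can arrive in any order.
    // Order WITHIN each partition is always kept; order ACROSS partitions is not.
    public static List<String> interleave(List<String> p0, List<String> p1, Random rng) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < p0.size() || j < p1.size()) {
            boolean takeP0 = j >= p1.size() || (i < p0.size() && rng.nextBoolean());
            out.add(takeP0 ? p0.get(i++) : p1.get(j++));
        }
        return out;
    }

    // True if every element of xs appears in merged, in the same relative order.
    public static boolean preservesOrder(List<String> merged, List<String> xs) {
        int last = -1;
        for (String x : xs) {
            int idx = merged.indexOf(x);
            if (idx <= last) {
                return false;
            }
            last = idx;
        }
        return true;
    }
}
```

Running interleave with different seeds yields different merged sequences, but preservesOrder holds for each partition every time.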
No. Message ordering is only preserved within partitions (not even within topics).
If you need stronger ordering guarantees, you have to re-order messages in your application, for example using a timestamp (and a sufficiently large window buffer to catch all the ones that arrive out of order). Support for this has improved a bit with Kafka's recent addition of timestamps to all messages, but the principle remains the same.
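A minimal sketch of such a window buffer, in plain Java (ReorderBuffer is hypothetical; in a real consumer the timestamps would come from ConsumerRecord.timestamp()). An event is released once the newest timestamp seen is at least windowMs ahead of it; anything arriving later than the window is still released out of order, which is why the window has to be large enough.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ReorderBuffer {
    public static final class Event {
        public final long timestamp;
        public final String payload;

        public Event(long timestamp, String payload) {
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    private final long windowMs;
    // Min-heap on timestamp: the oldest buffered event is always at the head.
    private final PriorityQueue<Event> heap =
            new PriorityQueue<>(Comparator.comparingLong((Event e) -> e.timestamp));
    private long maxSeen = Long.MIN_VALUE;

    public ReorderBuffer(long windowMs) {
        this.windowMs = windowMs;
    }

    // Adds one event and returns, in timestamp order, every buffered event whose
    // timestamp is at most (newest timestamp seen - windowMs).
    public List<Event> add(Event e) {
        heap.add(e);
        maxSeen = Math.max(maxSeen, e.timestamp);
        List<Event> ready = new ArrayList<>();
        while (!heap.isEmpty() && heap.peek().timestamp <= maxSeen - windowMs) {
            ready.add(heap.poll());
        }
        return ready;
    }

    // Releases everything still buffered, e.g. at shutdown.
    public List<Event> flush() {
        List<Event> rest = new ArrayList<>();
        while (!heap.isEmpty()) {
            rest.add(heap.poll());
        }
        return rest;
    }
}
```

For example, with windowMs = 10 and events arriving with timestamps 5, 20, 12, 35, the buffer emits them as 5, 12, 20, 35 (the last one on flush()).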
Why not first subscribe to the first topic and do a poll, and then subscribe to the other topic and do another poll? Without this, I don't think there is any guarantee in which order you receive messages from the two topics.
New to Kafka.
I'm really confused by Kafka's API:
Version 0.9 is completely different from 0.8.
Then there are the SimpleConsumer, the high-level consumer, and consumer groups.
When I instantiate a SimpleConsumer, is it associated with a consumer group? Or is the consumer group an abstraction used only by the high-level consumer?
If I don't care about message ordering or duplicates, can I instantiate two SimpleConsumers that read from the same partition?
Is there a way to use a SimpleConsumer to read from a topic without specifying partitions?
With Kafka 0.9 there is a new consumer API, as you noted; the two older consumer APIs still exist but will likely be decommissioned in a future release in favour of the new API.
The consumer group concept relates only to the high-level consumer. It is a helper that coordinates consumer instances reading from the same set of topics, to avoid duplicated messages and to allow parallelism with automatic fail-over if a consumer instance crashes, etc. When using the simple consumer API, you have to take care of this coordination yourself, so you also need to specify which partitions to read from, and nothing prevents you from having multiple consumers reading from the same partition.
I don't know of a good use case where you would need multiple consumers reading from the same partition, though. If you want to consume it for different purposes, you can just use the high-level API with multiple consumer group IDs, and they will work independently of each other.