I'm trying to create a priority queue using the Kafka Consumer. I've read about the Bucket pattern and how to use it within one topic, distributing partitions between the buckets.
But what I need to do here is use different topics (high, medium, low) and ensure that I only consume from high as long as there are messages in that topic to be consumed.
The Kafka Consumer provides a way to subscribe to a list of topics, but it seems to do round-robin balancing between the topics.
Here is a code example I wrote to prove it:
https://github.com/politrons/reactive/blob/master/kafka/src/test/java/com/politrons/kafka/KafkaOrdering.java
What I would like to know is whether there is any mechanism in the Kafka Consumer to tell it, when it subscribes to several topics, to focus on one and not read anything from another topic while the first one is still receiving events.
Regards.
Related
I have a requirement to implement a health check, and as part of it I have to find out whether the producer will be able to publish messages and the consumer will be able to consume messages. For this I have to check that the connection to the cluster is working, which can be checked using the "connection_count" metric, but that doesn't give the true picture, especially for the consumer, which is tied to the specific brokers that host the partitions assigned to it.
The situation with the producer is even trickier, as the producer might publish a message to any broker that leads a partition of the topic it is publishing to.
In a nutshell, how do I find out the health of the relevant brokers on the producer/consumer side?
Ultimately, I'd divide the question into a few checks:
1. Can you reach the broker? AdminClient.describeCluster works for this.
2. Can you describe the topic(s) you are using? AdminClient.describeTopics can do that.
3. Is the ISR of each partition of those topics at least min.insync.replicas? Extrapolate the ISR from the data in (2).
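Check (3) is just a comparison per partition once you have the numbers. A minimal, stdlib-only sketch (class and method names are hypothetical), assuming the ISR size of each partition has already been read from the `TopicDescription` returned by `describeTopics` and that `min.insync.replicas` has been looked up separately, for example with `AdminClient.describeConfigs`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for check (3). In a real health check the ISR sizes would come from
// AdminClient.describeTopics (TopicPartitionInfo.isr().size()) and the threshold from the
// topic's min.insync.replicas config (AdminClient.describeConfigs); here both are plain values.
class IsrHealthCheck {

    /** Partitions whose ISR is smaller than min.insync.replicas, in ascending order. */
    static List<Integer> belowMinIsr(Map<Integer, Integer> isrSizeByPartition, int minInsyncReplicas) {
        List<Integer> unhealthy = new ArrayList<>();
        // Copy into a TreeMap so the result is sorted by partition number.
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(isrSizeByPartition).entrySet()) {
            // With acks=all, writes are rejected only when the ISR drops below the minimum,
            // so an ISR exactly equal to min.insync.replicas still counts as healthy.
            if (e.getValue() < minInsyncReplicas) {
                unhealthy.add(e.getKey());
            }
        }
        return unhealthy;
    }

    static boolean isHealthy(Map<Integer, Integer> isrSizeByPartition, int minInsyncReplicas) {
        return belowMinIsr(isrSizeByPartition, minInsyncReplicas).isEmpty();
    }
}
```

Reporting the offending partitions rather than a single boolean makes the health endpoint's output easier to act on.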
On the producer side, set at least acks=1: if sends stop being acknowledged (the send callback never fires, or fires with an exception), the producer is not healthy. Alternatively, you could expose the producer's JMX data around buffer usage; if the buffer isn't periodically flushed, the producer is not healthy.
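A minimal sketch of the "no ack callback" idea, with hypothetical names and a threshold you would have to choose yourself: count sends still waiting for their callback, and report the producer as unhealthy if none of them has been acknowledged within `maxSilenceMs`, or if the latest completion failed. Timestamps are passed in explicitly so the logic can be tested without a clock. With a real producer, `recordSend` would run just before `producer.send(record, (metadata, exception) -> ...)`, and the callback would call `recordAck` when `exception` is `null` and `recordFailure` otherwise.

```java
// Hypothetical watchdog: the producer counts as unhealthy when sends are waiting for
// their callback and none has been acknowledged within maxSilenceMs, or when the
// most recent completion was a failure. Timestamps are passed in to keep it testable.
class ProducerAckWatchdog {
    private final long maxSilenceMs;
    private int inFlight = 0;                 // sends whose callback has not fired yet
    private long lastProgressMs = 0;          // last ack, or when work started after an idle period
    private boolean lastCompletionFailed = false;

    ProducerAckWatchdog(long maxSilenceMs) {
        this.maxSilenceMs = maxSilenceMs;
    }

    /** Call just before producer.send(record, callback). */
    synchronized void recordSend(long nowMs) {
        if (inFlight == 0) {
            lastProgressMs = nowMs;           // start the silence clock when work begins
        }
        inFlight++;
    }

    /** Call from the send callback when its exception argument is null. */
    synchronized void recordAck(long nowMs) {
        if (inFlight > 0) {
            inFlight--;
        }
        lastProgressMs = nowMs;
        lastCompletionFailed = false;
    }

    /** Call from the send callback when its exception argument is non-null. */
    synchronized void recordFailure() {
        if (inFlight > 0) {
            inFlight--;
        }
        lastCompletionFailed = true;          // a failure is not progress
    }

    synchronized boolean isHealthy(long nowMs) {
        if (lastCompletionFailed) {
            return false;
        }
        // An idle producer (nothing in flight) is not penalised for the lack of acks.
        return inFlight == 0 || nowMs - lastProgressMs <= maxSilenceMs;
    }
}
```

The methods are `synchronized` because producer callbacks run on the producer's I/O thread, while a health endpoint would typically call `isHealthy` from another thread.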
For the consumer, look at the conditions under which a rebalance will happen (such as long processing times between polls); from those you can quickly identify what it means for a consumer to be "unhealthy". Attaching partition-assignment and rebalance listeners can help here.
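The consumer-side signals can be collected in one place. The sketch below is hypothetical: its `onPartitionsAssigned`/`onPartitionsRevoked` methods mirror Kafka's `ConsumerRebalanceListener` (which you would pass to `subscribe(topics, listener)`), but take plain `"topic-partition"` strings so it runs without the Kafka client. It flags the consumer once the gap since the last `poll()` exceeds `max.poll.interval.ms` (300000 ms by default), which is the point at which the consumer is considered failed and a rebalance is triggered.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker. onPartitionsAssigned/onPartitionsRevoked mirror the callbacks of
// Kafka's ConsumerRebalanceListener, but take "topic-partition" strings so this runs without Kafka.
class ConsumerHealthTracker {
    private final long maxPollIntervalMs;     // should match the consumer's max.poll.interval.ms
    private final Set<String> assigned = new HashSet<>();
    private long lastPollMs;
    private int rebalances;

    ConsumerHealthTracker(long maxPollIntervalMs, long startMs) {
        this.maxPollIntervalMs = maxPollIntervalMs;
        this.lastPollMs = startMs;
    }

    synchronized void onPartitionsAssigned(Collection<String> partitions) {
        assigned.addAll(partitions);
        rebalances++;                         // each assignment callback follows a rebalance
    }

    synchronized void onPartitionsRevoked(Collection<String> partitions) {
        assigned.removeAll(partitions);
    }

    /** Call right after each poll() returns. */
    synchronized void recordPoll(long nowMs) {
        lastPollMs = nowMs;
    }

    /** Unhealthy once the gap since the last poll exceeds max.poll.interval.ms. */
    synchronized boolean isHealthy(long nowMs) {
        return nowMs - lastPollMs <= maxPollIntervalMs;
    }

    synchronized Set<String> assignedPartitions() {
        return Collections.unmodifiableSet(new HashSet<>(assigned));
    }

    synchronized int rebalanceCount() {
        return rebalances;
    }
}
```

A steadily climbing `rebalanceCount()` is often a better "unhealthy" signal than any single reading, since it points at consumers repeatedly dropping out of the group.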
I've implemented some of these concepts in:
dropwizard-kafka (also has Producer and Consumer checks)
remora
I would like to think Spring has something similar
I am studying Apache Kafka and have some confusion. Please help me understand the following scenario.
I have a topic with 5 partitions and 5 brokers in a Kafka cluster. I am maintaining my message order in partition 1 (say P1). I want to broadcast the messages of P1 to 10 consumers.
So my question is: how do these 10 consumers interact with topic partition P1?
This is probably not how you want to use Kafka.
Unless you're being explicit with how you set your keys, you can't really control which partition your messages end up in when producing to a topic. Partitions in Kafka are designed to be more like low-level plumbing, something that exists, but you don't usually have to interact with. On the consumer side, you will be assigned partitions based on how many consumers are active for a particular consumer group at any one time.
One way to get around this is to define a topic to have only a single partition, in which case, of course, all messages will go to that partition. This is not ideal, since Kafka won't be able to parallelize data ingestion or serving, but it is possible.
So, having said that, let's assume that you did manage to put all your messages in partition 1 of a specific topic. When you fire up a consumer of that topic with consumer group id consumer1, it will be assigned all the partitions of that topic, since that consumer is the only active one for that group id. If there is only one partition for that topic, as explained above, then that consumer will get all the data. If you then fire up a second consumer with the same group id, Kafka will notice there's a second consumer for that group id, but since there's only one partition, it can't assign any partitions to it, so that consumer won't get any data (unless the first one leaves the group).
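To see why the second consumer in the same group gets nothing, here is a deliberately simplified model (hypothetical names, not Kafka's assignor code) that deals one topic's partitions round-robin to the members of a group sorted by id. Kafka's real strategy is configurable via `partition.assignment.strategy`, but the built-in ones share the property shown here: each partition goes to exactly one member of a group, so members beyond the partition count stay idle.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Simplified model (not Kafka's assignor code): partitions 0..n-1 of one topic are dealt
// round-robin to the members of a single consumer group.
class GroupAssignmentModel {

    static Map<String, List<Integer>> assign(int partitionCount, List<String> consumerIds) {
        List<String> members = new ArrayList<>(new TreeSet<>(consumerIds)); // sorted, de-duplicated
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return assignment;
        }
        for (int p = 0; p < partitionCount; p++) {
            // Each partition goes to exactly one member of the group.
            assignment.get(members.get(p % members.size())).add(p);
        }
        return assignment;
    }
}
```

With one partition and two members, `assign(1, ...)` gives the first member `[0]` and the second an empty list, which is the idle consumer described above.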
On the other hand, if you fire up a third consumer with a different consumer group id, say consumer2, that consumer will also get all the data, and it won't interfere at all with consumer1's message consumption. Kafka tracks separately which offset each consumer group id is at on each partition, so it won't get confused if one group starts consuming slowly, or stops for a while and resumes consuming later that day.
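The per-group offset tracking can be modelled in memory: the partition is an append-only list, and each consumer group id owns its own position in it. This is an illustration with hypothetical names; the real broker stores committed offsets in the internal `__consumer_offsets` topic, and a brand-new group starts wherever `auto.offset.reset` says (the model always starts at offset 0, like `earliest`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of one partition read by several consumer groups.
class GroupOffsetsModel {
    private final List<String> log = new ArrayList<>();             // append-only, index = offset
    private final Map<String, Integer> committed = new HashMap<>(); // group id -> next offset to read

    void append(String value) {
        log.add(value);
    }

    /** Returns up to maxRecords records from the group's position and advances only that group. */
    List<String> poll(String groupId, int maxRecords) {
        if (maxRecords < 1) {
            throw new IllegalArgumentException("maxRecords must be positive");
        }
        int from = committed.getOrDefault(groupId, 0); // new groups start at 0 (like "earliest")
        int to = Math.min(log.size(), from + maxRecords);
        committed.put(groupId, to);
        return new ArrayList<>(log.subList(from, to));
    }

    int committedOffset(String groupId) {
        return committed.getOrDefault(groupId, 0);
    }
}
```

In this model, broadcasting P1 to 10 consumers means giving each of them its own group id: every `poll` advances only its caller's offset, so each group sees every record.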
Much more detailed information on how Kafka works is here: https://kafka.apache.org/documentation/#gettingStarted
And more information on how to use the Kafka consumer at this link:
https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
@mjuarez's answer is absolutely correct; just for brevity I would reduce it to the following:
Don't try and read only from a single partition because it's a low level construct and it somewhat undermines the parallelism of Kafka. You're much better off just creating more topics if you need finer separation of data.
I would also add that most of the time a consumer needn't know which partition a message came from, in the same way that I don't eat a sandwich differently depending on which store it came from.
@mjuarez is actually not correct, and I am not sure why his answer is being falsely confirmed by the OP. You can absolutely tell Kafka explicitly which partition a producer record belongs to, using the following constructor:
ProducerRecord(
java.lang.String topic,
java.lang.Integer partition, // <--------- !!!
java.lang.Long timestamp,
K key,
V value)
https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html#ProducerRecord-java.lang.String-java.lang.Integer-java.lang.Long-K-V-
So most of what was said after that becomes irrelevant.
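Taken together with the earlier point about keys, the partition decision can be summarised as a precedence rule: an explicit partition in the `ProducerRecord` is used as-is, otherwise a non-null key is hashed modulo the partition count. The sketch below models only that rule and uses `Arrays.hashCode` as a stand-in hash; Kafka's default partitioner hashes the serialized key with murmur2, so real partition numbers will differ. Keyless records are left out because their handling depends on the client version (round-robin in older clients, sticky batching since 2.4).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Model of the producer's partition precedence. Arrays.hashCode is a stand-in:
// Kafka's default partitioner hashes the serialized key with murmur2 instead.
class PartitionChoiceModel {

    static int choosePartition(Integer explicitPartition, String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (explicitPartition != null) {
            // e.g. new ProducerRecord<>(topic, 1, key, value): the given partition is used as-is,
            // as long as it exists.
            if (explicitPartition < 0 || explicitPartition >= numPartitions) {
                throw new IllegalArgumentException("partition out of range: " + explicitPartition);
            }
            return explicitPartition;
        }
        if (key == null) {
            throw new IllegalArgumentException("keyless records are not modelled here");
        }
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Clear the sign bit (as Kafka's Utils.toPositive does) so the result is never negative.
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }
}
```

The practical consequence is the one both answers circle around: the same key always lands in the same partition, but only an explicit partition lets you pick which one.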
Now to address the OP question directly: you want to accomplish a broadcast. To have a message sent once and read more than once you would have to have a different consumer group for each reader.
And that use case is an absolutely valid Kafka usage paradigm.
You can accomplish that using RabbitMQ too:
https://www.rabbitmq.com/tutorials/tutorial-three-java.html
... but the way it is done there is not ideal, because multiple out-of-process queues are involved (each consumer gets its own queue bound to a fanout exchange).
I am trying to build an application that reads through a Kafka topic, but I need it to have a "see previous" button. I know how to seek within a particular partition, but is it possible to go back through all the messages in a topic in the order they were read in? I am using the Java KafkaConsumer.
Offsets are per partition, and there is no ordering between messages across different partitions. You can't go back through all the messages in a topic in the order they were read in, because Kafka doesn't know the order in which the consumer read messages from different partitions (different partitions may be read from different brokers across the cluster). What you could do instead is buffer messages in your app, in order, as you read them; that would allow you to go back as far as the buffer capacity.
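The ordered buffering suggested above can be as simple as a bounded list appended to in the order `poll()` hands records to the application, plus a cursor for "see previous" and "next". A minimal sketch with hypothetical names; it stores each record's topic, partition and offset alongside the value, so instead of keeping values in memory you could also re-read an entry with `consumer.seek(new TopicPartition(topic, partition), offset)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical bounded history of records in the order the application received them.
class ReadHistory {
    static final class Entry {
        final String topic;
        final int partition;
        final long offset;
        final String value;

        Entry(String topic, int partition, long offset, String value) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.value = value;
        }
    }

    private final int capacity;
    private final List<Entry> entries = new ArrayList<>(); // oldest first
    private int cursor = -1;                                // index of the entry currently shown

    ReadHistory(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Call for each record, in the order poll() handed them to the application. */
    void record(String topic, int partition, long offset, String value) {
        if (entries.size() == capacity) {
            entries.remove(0);                              // evict the oldest entry
        }
        entries.add(new Entry(topic, partition, offset, value));
        cursor = entries.size() - 1;                        // jump to the newest record
    }

    /** "See previous": one step back, or empty at the start of the buffer. */
    Optional<Entry> previous() {
        if (cursor <= 0) {
            return Optional.empty();
        }
        cursor--;
        return Optional.of(entries.get(cursor));
    }

    /** One step forward, or empty when already at the newest record. */
    Optional<Entry> next() {
        if (cursor < 0 || cursor >= entries.size() - 1) {
            return Optional.empty();
        }
        cursor++;
        return Optional.of(entries.get(cursor));
    }
}
```

Storing only the coordinates keeps memory small for large messages, at the cost of a `seek` plus `poll` round trip each time the user steps back.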
I'm developing software that uses Apache Kafka. I've got one consumer subscribed to multiple topics, and I'd like to know whether there is an order in which messages from those topics are received. I tried some combinations on my computer, but I need to be sure about this.
Example
Consumer subscribes to topic1 and topic2
Producer1 writes something to topic1
Producer2 writes something to topic2
Producer1 writes something to topic1
When the consumer polls, it receives a list of records containing first the messages from the first topic it subscribed to and then the messages from the other topic.
I'd like to know if it is always like this, i.e. whether the messages always come in the order of the topics I subscribed to.
Thanks
[EDIT] To clarify: I have two topics with one partition each, and only one producer and one consumer. I need to read all the messages from the first topic first, and then the messages from the other topic.
Kafka only guarantees message ordering within a partition. This means that even with only one topic but more than one partition, you have no guarantee that messages are received in the same order they were sent.
Regarding your use case with two topics, there is no relation between the order in which you subscribe to the topics and the order of messages, not least because if the cluster has more than one node, the partition leaders may be on different brokers and the client receives messages over different connections. By the way, even with only one broker hosting all topics/partitions, you can't have the guarantee you are describing.
No. Message ordering is only preserved within partitions (not even within topics).
If you need stronger ordering guarantees, you have to re-arrange messages in your application, for example using a timestamp (and a sufficiently large window buffer to catch all the ones that arrive out-of-order). Support for this has improved a bit with the recent addition of timestamps for all messages by Kafka itself, but the principle remains the same.
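A minimal sketch of that re-arranging step, assuming each record carries a timestamp (for example `ConsumerRecord.timestamp()`) and that no record arrives more than `windowMs` behind a newer one already seen: records are held in a min-heap and released in timestamp order once they are at least `windowMs` older than the newest timestamp so far. Class and method names are hypothetical; a record that arrives later than the window allows is still released, just out of order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical reorder buffer: holds records in a min-heap keyed by timestamp and releases
// them once they are at least windowMs older than the newest timestamp seen so far.
class TimestampReorderBuffer {
    static final class Rec {
        final long timestamp;
        final long seq;        // arrival counter, breaks ties between equal timestamps
        final String value;

        Rec(long timestamp, long seq, String value) {
            this.timestamp = timestamp;
            this.seq = seq;
            this.value = value;
        }
    }

    private final long windowMs;
    private final PriorityQueue<Rec> heap = new PriorityQueue<>(
            Comparator.comparingLong((Rec r) -> r.timestamp).thenComparingLong(r -> r.seq));
    private long newestSeen = Long.MIN_VALUE;
    private long nextSeq = 0;

    TimestampReorderBuffer(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Buffers one record and returns every record that is now safe to emit, oldest first. */
    List<Rec> add(long timestamp, String value) {
        heap.add(new Rec(timestamp, nextSeq++, value));
        newestSeen = Math.max(newestSeen, timestamp);
        List<Rec> ready = new ArrayList<>();
        // Anything at least windowMs older than the newest timestamp is assumed complete.
        while (!heap.isEmpty() && heap.peek().timestamp <= newestSeen - windowMs) {
            ready.add(heap.poll());
        }
        return ready;
    }

    /** Emits everything still buffered, oldest first (e.g. on shutdown). */
    List<Rec> flush() {
        List<Rec> rest = new ArrayList<>();
        while (!heap.isEmpty()) {
            rest.add(heap.poll());
        }
        return rest;
    }
}
```

The window size is the trade-off: a larger window tolerates more skew between partitions but delays every record by at least that much.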
Why not first subscribe to the first topic and do a poll, and then subscribe to the other topic and do another poll? Without this, I don't think there is any guarantee in which order you receive messages from the two topics.
Suppose you have multiple producers and one consumer that wants to receive persistent messages from all available publishers.
Producers work at different speeds. Let's say that system A produces 10 requests/sec and system B 1 request/sec. So if you use a single queue, you will process 10 messages from A, then 1 message from B.
But what if you want to balance the load and process one message from A, then one message from B, and so on? Consuming from multiple queues is not a good option because we can't use wildcard binding in this case.
Update:
A queue per producer seems to be the best approach. Producers don't know their speed, which changes constantly. With one queue per consumer, I can subscribe to one topic and receive messages from all available publishers. But with a queue per producer, I need to code the logic myself:
1. Get all available queues through the management plugin (AMQP doesn't provide a way to list queues).
2. Filter by queue name.
3. Implement a round-robin strategy.
4. Implement a notification mechanism to subscribe to new publishers, which can appear at any moment.
5. Remove a queue once its publisher has disappeared and the client has read all its messages.
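The round-robin step in that list is the part that decides fairness. Below is an in-memory sketch with hypothetical names: each producer's queue is a `Deque`, and every call to `next()` takes at most one message from a queue before moving on, so a producer sending 10 messages/s cannot starve one sending 1 message/s. With RabbitMQ itself, each deque would correspond to a per-producer queue read with `Channel.basicGet(queue, autoAck)`; that mapping is an assumption, not something the question prescribes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for one queue per producer; next() takes at most one message
// from a queue before moving on, so a fast producer cannot starve a slow one.
class RoundRobinDrainer {
    private final Map<String, Deque<String>> queues = new HashMap<>();
    private final Deque<String> rotation = new ArrayDeque<>();  // queue names in visiting order

    void addQueue(String name) {
        if (queues.putIfAbsent(name, new ArrayDeque<>()) == null) {
            rotation.addLast(name);                              // new publishers join the rotation
        }
    }

    void publish(String queue, String message) {
        addQueue(queue);
        queues.get(queue).addLast(message);
    }

    /** Next message in round-robin order, or null when every queue is empty. */
    String next() {
        for (int i = 0; i < rotation.size(); i++) {
            String name = rotation.pollFirst();
            rotation.addLast(name);                              // rotate even if this queue is empty
            String message = queues.get(name).pollFirst();
            if (message != null) {
                return message;
            }
        }
        return null;
    }

    /** Drops a queue whose publisher is gone, but only once it has been drained. */
    boolean removeIfDrained(String name) {
        Deque<String> q = queues.get(name);
        if (q == null || !q.isEmpty()) {
            return false;
        }
        queues.remove(name);
        rotation.remove(name);
        return true;
    }
}
```

With A holding `a1, a2, a3` and B holding `b1`, successive `next()` calls yield `a1, b1, a2, a3`: the slow producer's message is served second instead of after the whole burst from A.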
Well, it seems pretty easy, but I thought the broker could provide all of this functionality without any coding. With a single queue, I just create one persistent queue, bind it to a topic exchange, and then start any number of publishers that send messages to the topic. This option works almost out of the box.
I know I'm late for the party, but still.
In Azure Service Bus terms it's called "partitioning" and it's based on the partition key. The best part is in Azure SB the receiving client is not aware of the partitioning, it simply subscribes to the single queue.
In RabbitMQ there is a consistent hashing exchange plugin ("rabbitmq_consistent_hash_exchange", exchange type x-consistent-hash), but unfortunately it's not that convenient. Consumers must be explicitly configured to consume from specific queues: if you have ten queues, you need to set up your consumers so that all ten are covered.
Another two options:
Random Exchange Type
Sharding Plugin
Bear in mind that with the Sharding Plugin even though it creates "one logical queue to consume" you'll have to have as many subscribers as there are virtual queues, otherwise some of the queues will be left unconsumed.
You can use the Priority Queue Support and assign a priority according to the producer's speed. The caveat is that priorities must be set with caution (for example, if the consumer is slower than system B, it will only ever consume messages from B) and producers must be aware of their producing speed.
Another option to consider is creating three queues according to producing speed: HIGH, MEDIUM, LOW. The three queues are bound to the exchange with the binding key set according to the producing speed. It could be done using.
The consumer consumes messages from these three queues using a round-robin strategy, with the caveat that producers must be aware of their producing speed.
But the best option may be a queue per producer, especially if the producers' speed is not stable and cannot be categorized. That way, producers do not need to know their producing speed.