I have a simple Kafka consumer microservice that consumes messages from a topic, and the same application is running in two different pools.
So when a message is produced by the producer and my app tries to consume it from the topic, it is consumed in only one of the pools.
How can I stop this competing consumption between the Kafka consumers? I want the same message to be consumed in both pools.
What could be a possible solution for this scenario?
If your topic only has one partition, then only one consumer with a given group.id can read that partition. If you must have a single partition for ordering purposes, then you need unique group ids in order to read this data with multiple consumers.
Otherwise, if the topic has multiple partitions, then 2 consumers should actively be reading them.
You'd "stop" reading by not calling poll().
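To get the broadcast behaviour you describe, here is a minimal sketch, assuming each pool is deployed with its own group.id value (the broker address, topic name, and group names are placeholders):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    // use "pool-a" in one pool and "pool-b" in the other; with different group ids,
    // both pools receive every message instead of competing for it
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "pool-a");

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("mytopic"));
    while (true) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
            System.out.println("received: " + record.value());
        }
    }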
Let's say there is a topic in Apache Kafka with 3 partitions. I need to run 3 consumers inside a consumer group and, according to the documentation, that means each consumer will read data from 1 partition.
Consumers are implemented using Spring Kafka. As we all know, by default all messages are received in a single thread, but using ConcurrentMessageListenerContainer should allow us to set up concurrency.
What do I want? I want to use server CPU resources efficiently and make each consumer receive and process messages in separate threads (3 threads in our case, which is equal to the number of partitions).
As a result - 3 consumers (3 servers) in the consumer group and each consumer receives messages from all 3 partitions.
Is it possible? If yes, will it be enough if I just use ConcurrentMessageListenerContainer and specify 3 listeners, one for each partition?
I was a little confused by your statement. Just to clarify: in Kafka, only one consumer can read from a given partition within a consumer group. It is not possible for two consumers in the same consumer group to read from the same partition.
Within a consumer group,
- if the number of consumers is greater than the number of partitions, the extra consumer threads will be idle;
- if the number of consumers is less than the number of partitions, the same consumer thread will read from multiple partitions.
This annotation will read from the topic named "mytopic" and use 3 threads to read from the 3 partitions: @KafkaListener(topics = "mytopic", concurrency = "3", groupId = "myconsumergroup")
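For context, a minimal sketch of a Spring Kafka listener using that annotation (the class name and listener method are illustrative):

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    @Component
    public class MyTopicListener {

        // concurrency = "3" creates 3 listener threads; with 3 partitions,
        // each thread is assigned exactly one partition
        @KafkaListener(topics = "mytopic", concurrency = "3", groupId = "myconsumergroup")
        public void listen(String message) {
            // runs on one of the 3 listener threads
            System.out.println(Thread.currentThread().getName() + " received: " + message);
        }
    }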
In the API documentation for the Apache Kafka producer I found this description of the send() method:
“The send is asynchronous and this method will return immediately once the record has been stored in the buffer of records waiting to be sent. This allows sending many records in parallel without blocking to wait for the response after each one.”
I'm just wondering how the records are sent in parallel. If I have 3 brokers, each with 3 partitions of the same topic, will the Kafka producer send records to the 9 partitions in parallel, or will it just send records to the 3 brokers in parallel? How does the producer work in a parallel way?
The Kafka client uses an org.apache.kafka.common.requests.ProduceRequest, which can carry payloads for multiple partitions at once (see http://kafka.apache.org/protocol.html#The_Messages_Produce).
So it sends three requests in parallel (using org.apache.kafka.clients.NetworkClient), one to each of the (three) brokers, i.e.:
- sends records for topic-partition0, topic-partition1, topic-partition2 to broker 1
- sends records for topic-partition3, topic-partition4, topic-partition5 to broker 2
- sends records for topic-partition6, topic-partition7, topic-partition8 to broker 3
You can control how much batching is done with producer configuration.
(Note that I answered assuming 9 unique partitions; if you meant replicated partitions, the producer sends only to the leader, and replication then handles the propagation.)
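As a concrete illustration of that batching configuration, here's a minimal producer sketch; the broker addresses, topic name, and the specific values are placeholders, not recommendations:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    // batch.size: maximum bytes batched per partition before a request is sent
    props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);
    // linger.ms: how long to wait for more records to fill a batch
    props.put(ProducerConfig.LINGER_MS_CONFIG, 10);

    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    // send() returns immediately; batches destined for different brokers go out in parallel
    producer.send(new ProducerRecord<>("mytopic", "key", "value"));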
Yes, the producer will batch up the messages destined for each partition leader and send the batches in parallel. From the API docs:
The send() method is asynchronous. When called it adds the record to a
buffer of pending record sends and immediately returns. This allows
the producer to batch together individual records for efficiency.
and
The producer maintains buffers of unsent records for each partition.
These buffers are of a size specified by the batch.size config. Making
this larger can result in more batching, but requires more memory
(since we will generally have one of these buffers for each active
partition).
I need to call a Kafka consumer in publish/subscribe mode 1000 times. As far as I know, for Kafka to work in pub/sub mode I need to give a new groupId to each consumer (props.put("group.id", String.valueOf(Instant.now().toEpochMilli()));). But when I do this, if two consumer threads create a consumer in the same millisecond, there will be problems. How should this problem be solved?
If you want to spread the messages across the consumers, you need to use the same group.id. If you have 1000 messages and 1000 consumers, then each consumer will normally consume one message.
On the other hand, if you want each consumer to consume all the messages from the topic, you need to use a different group.id per consumer so that the messages in the topic are consumed by all consumers. If you have a huge number of consumers, you can use UUID.randomUUID().toString() to produce a distinct group.id for each one.
According to the docs:
Consumers label themselves with a consumer group name, and each record
published to a topic is delivered to one consumer instance within each
subscribing consumer group. Consumer instances can be in separate
processes or on separate machines.
If all the consumer instances have the same consumer group, then the
records will effectively be load balanced over the consumer instances.
If all the consumer instances have different consumer groups, then
each record will be broadcast to all the consumer processes.
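Putting that suggestion into code, a minimal sketch of a consumer whose group.id is a random UUID (the broker address and topic name are placeholders):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import java.util.UUID;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    // a distinct group.id per consumer, so every consumer receives every message
    props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
    // start from the beginning of the topic for a brand-new group
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("mytopic"));
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println(record.value());
    }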
Consider a case where senders are sending messages to a queue; for example, message1 is sent by sender1 to a queue. Now a consumer named consumer1 connects to the queue and reads message1.
There is another consumer named consumer2, but message1 has already been consumed by consumer1, so it will not be available to consumer2.
When the next message arrives in the queue, consumer2 might receive it if it reads the queue before consumer1.
Does this mean it all comes down to whether one consumer reads the queue before the other in order to get the first message available from the queue?
This is the nature of a Queue in JMS: messages are sent to one consumer, and once they are ack'd they are gone; the next consumer can get the next message, and so on. This is often referred to as competing consumers or load balancing. The consumers can share the work as jobs or work items are enqueued, which allows for higher throughput when the work associated with the items in the Queue can take significant time.
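For illustration, a minimal competing-consumer sketch using the plain JMS API against ActiveMQ (the connection URL and queue name are placeholders); if you run this in two processes, each message is delivered to only one of them:

    import javax.jms.Connection;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import org.apache.activemq.ActiveMQConnectionFactory;

    ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
    Connection connection = factory.createConnection();
    connection.start();
    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    Queue queue = session.createQueue("myqueue");
    MessageConsumer consumer = session.createConsumer(queue);

    // each message is handed to exactly one of the competing consumers on this queue
    Message message = consumer.receive(5000);
    if (message instanceof TextMessage) {
        System.out.println("received: " + ((TextMessage) message).getText());
    }

    connection.close();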
There are options depending on the messaging broker to make a consumer exclusive such that only that consumer can read messages from the queue while the other consumers sit and wait for the exclusive consumer to leave which makes them backups of a sort.
Other options are to use something like Apache Camel to route a given message to more than one queue, or to use ActiveMQ Virtual Topics to send messages to a Topic and have each message then enqueued onto consumer-specific Queues.
The solution depends on the broker you are using and the problem you are trying to solve, none of which you've really made clear in the question.
I'm developing a piece of software that uses Apache Kafka. I've got one consumer that is subscribed to multiple topics, and I'd like to know whether there is an order for receiving messages from those topics. I tried some combinations on my computer, but I need to be sure about this.
Example
Consumer sub to topic1 and topic2
Producer1 write something on topic1
Producer2 write something on topic2
Producer1 write something on topic1
When the consumer polls, it receives a list of records containing first the messages from the first topic it subscribed to and then the messages from the other topic.
I'd like to know if it is always like this, i.e. whether the messages arrive in the same order as the topics I subscribed to.
Thanks
[EDIT] I'd like to specify that I have two topics with one partition each, and only one producer and one consumer. I need to read all the messages from the first topic first, and then the messages from the other topic.
Kafka only guarantees message ordering within a partition. This means that even with a single topic, if it has more than one partition, you have no guarantee that messages are received in the same order they were sent.
Regarding your use case with two topics, there is no relation between the order in which you subscribe to the topics and message ordering, not least because, if the cluster has more than one node, the topic partition leaders will be on different brokers and the client receives messages over different connections. By the way, even with only one broker hosting all the topics/partitions, you can't have the guarantee you are describing.
No. Message ordering is only preserved within partitions (not even within topics).
If you need stronger ordering guarantees, you have to re-arrange messages in your application, for example using a timestamp (and a sufficiently large window buffer to catch all the ones that arrive out-of-order). Support for this has improved a bit with the recent addition of timestamps for all messages by Kafka itself, but the principle remains the same.
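A minimal sketch of that re-arranging idea, assuming you buffer the polled records and sort them by their record timestamp before processing (window handling and buffer limits are left out, and process() is a hypothetical application method):

    import java.time.Duration;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;

    // 'consumer' is an already-subscribed KafkaConsumer<String, String>
    List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    // re-arrange across topics/partitions by message timestamp
    buffer.sort(Comparator.comparingLong(ConsumerRecord::timestamp));
    for (ConsumerRecord<String, String> record : buffer) {
        process(record); // application-specific processing (hypothetical method)
    }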
Why not first subscribe to the first topic and do a poll, and then subscribe to the other topic and do another poll? Without this, I don't think there is any guarantee in which order you receive messages from the two topics.
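A rough sketch of that approach (topic names are placeholders, and in practice you would keep polling until the first topic is drained):

    import java.time.Duration;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecord;

    // 'consumer' is an already-configured KafkaConsumer<String, String>
    consumer.subscribe(Collections.singletonList("topic1"));
    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
        System.out.println("topic1: " + record.value());
    }

    // re-subscribing replaces the previous subscription
    consumer.subscribe(Collections.singletonList("topic2"));
    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
        System.out.println("topic2: " + record.value());
    }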