I need to run a Kafka consumer in publish/subscribe mode 1000 times. As far as I know, for Kafka to work in pub/sub mode I need to give each consumer a new groupId (props.put("group.id", String.valueOf(Instant.now().toEpochMilli()));). But with this approach, if two consumer threads are created in the same millisecond they end up with the same group.id, which causes problems. How should this be solved?
If you want to spread the messages across the consumers, you need to use the same group.id. If you have 1000 messages and 1000 consumers, then each consumer will normally consume one message.
On the other hand, if you want each consumer to consume all the messages from the topic, you need to use a different group.id for each one so that the messages in the topic are delivered to every consumer. If you have a huge number of consumers, you can use UUID.randomUUID().toString() to produce a distinct group.id for each one.
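As a rough sketch using the plain kafka-clients API (the bootstrap address and the topic name "my-topic" are placeholders, not from the question):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import java.util.UUID;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class BroadcastConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // A random UUID is distinct even when many consumers start in the same millisecond,
            // so every consumer gets its own group and therefore receives every record (pub/sub).
            props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("got %s%n", record.value());
                    }
                }
            }
        }
    }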
According to the docs:
Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances. If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
Related
I have a simple Kafka consumer micro-service application which consumes messages from a topic, and the same application is running in two different pools.
So when a message is produced and my app tries to consume it from the topic, it is consumed in only one of the pools.
How can I avoid this competing consumption between the Kafka consumers? I want the same message to be consumed in both pools.
What can be a possible solution for this scenario?
If your topic only has one partition, then only one consumer with a given group.id can read from that partition. If you must have one partition for ordering purposes, then you need unique group ids to read this data with multiple consumers.
Otherwise, if the topic has multiple partitions, then the 2 consumers should both actively be reading from them.
You'd "stop" reading by not calling poll().
I'm new to Kafka and would like to know the best approach for configuring the topics, partitions, consumer groups and consumer app replicas.
Working on an existing setup, the configuration handed down is as follows:
10 topics
Each topic has its own group i.e. topic1-group1, topic2-group2 and so on.
Each topic has 5 partitions
The Java consumer app has 5 replicas (k8s pods, intentionally the same as the number of partitions, I'm told) which use Spring Kafka's @KafkaListener, roughly as sketched below.
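A minimal sketch of one such per-topic listener, assuming the names from the setup above (the method body is made up):

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    @Component
    public class Topic1Listener {

        // One listener per topic, each topic with its own dedicated group;
        // the 5 pod replicas of this app match the 5 partitions of topic1.
        @KafkaListener(topics = "topic1", groupId = "topic1-group1")
        public void onMessage(String message) {
            // forward to the single downstream system (details omitted)
        }
    }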
Q1. I'd like to know if this is the configuration that will offer the best performance (high throughput and low latency)?
The consumer app sends the messages to only ONE downstream system which makes me think that all the topics (and all their partitions + consumer app replicas) can share a single consumer group (let's call it main-group).
Q2. Would this offer better or worse performance than having dedicated group for each topic?
Sub question:
Q3. Can ONE @KafkaListener work with 10 topics, each with a dedicated consumer group, given it has only 1 containerGroup and 1 containerFactory parameter?
Thanks
Let's say there is a topic in Apache Kafka with 3 partitions. I need to run 3 consumers inside a consumer group and, according to the documentation, this means each consumer will read data from 1 partition.
Consumers are implemented using Spring Kafka. As we all know, by default all messages are received in a single thread, but using ConcurrentMessageListenerContainer should allow us to set up concurrency.
What do I want? I want to use the server's CPU resources efficiently and make each consumer receive and process messages in separate threads (3 threads in our case, which is equal to the number of partitions).
As a result - 3 consumers (3 servers) in the consumer group and each consumer receives messages from all 3 partitions.
Is it possible? If yes, will it be enough if I just use ConcurrentMessageListenerContainer and specify 3 listeners, one for each partition?
I was a little confused by your statement. Just to clarify: in Kafka, only one consumer can read from a given partition within a consumer group. It is not possible for two consumers in the same consumer group to read from the same partition.
Within a consumer group,
if the number of consumers is greater than the number of partitions, the extra consumer threads will be idle.
if the number of consumers is less than the number of partitions, the same consumer thread will read from multiple partitions.
The following snippet reads from the topic named "mytopic" and uses 3 threads to read from the 3 partitions: @KafkaListener(topics = "mytopic", concurrency = "3", groupId = "myconsumergroup")
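Expanded into a minimal sketch (class and method names are made up):

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    @Component
    public class MyTopicListener {

        // concurrency = "3" creates 3 listener threads; with 3 partitions each thread
        // is assigned exactly one partition within the group "myconsumergroup".
        @KafkaListener(topics = "mytopic", concurrency = "3", groupId = "myconsumergroup")
        public void consume(String message) {
            System.out.println("received: " + message);
        }
    }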
I've seen a lot of questions about this subject but I'm not very convinced. Is there a way to have more consumers, each with a different group.id value, than the number of partitions?
Is there a good workaround to achieve this in the Java code?
Consumer groups in Kafka are one way of parallelising data consumption. Multiple consumers can join a consumer group so that each individual consumer consumes data from different partitions of the Kafka topic.
In addition to the above, Kafka tracks the active consumers of a particular group using the group.id.
Therefore, having more consumers than partitions is ineffective, as each partition is consumed by only one consumer in a consumer group in order to maintain the ordering of consumed messages. Kafka only provides a total order over messages within a partition, not across the partitions of a topic.
But you can still have multiple consumer groups consuming the same topic, which is more of a publish/subscribe model than point-to-point.
If a consumer has a different group.id, then it belongs to a different consumer group, and partitions are assigned independently within each group.
Basically, if you have N partitions and M distinct group ids, you can have at most N * M consumer threads actively polling from that topic; for example, 6 partitions and 2 groups allow at most 12 active consumers, 6 per group. Any more and you've oversubscribed for a particular group.
Does Kafka have a limitation on the number of simultaneous connections (created with Consumer.createJavaConsumerConnector) for the same topic within the same group?
My scenario is that I need to consume a topic from different processes (not threads), so I need to create lots of high-level consumers.
The number of active consumers within the same consumer group is limited by the number of partitions of the topic. Extra consumers will act as backups and will only start consuming when one of the active consumers goes down and the group is re-balanced.
If you need to consume the same copy of the data within multiple processes, your consumers should be in different consumer groups. There is no limitation on the number of consumer groups you can have.
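As a sketch with the current kafka-clients consumer (rather than the old createJavaConsumerConnector API), each process could be started with its own group id; the argument handling, topic name and group names here are made up:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class PerProcessConsumer {
        public static void main(String[] args) {
            // Hypothetical: each process is started with a distinct group name,
            // e.g. "process-1", "process-2", so every process gets a full copy of the data.
            String groupId = args.length > 0 ? args[0] : "process-default";

            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.println(groupId + " got: " + record.value());
                    }
                }
            }
        }
    }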
The main limitation is the number of partitions that the topic has - you can create more consumers than partitions, but the extra ones won't consume anything.