Kafka consumer does not read messages from the queue - java

There are 2 services with reading from the the same topic, the configurations are the same except for groupId, 1 partition, in the logs I see the same consumer configuration and successful connection. One service reads messages from the queue, the other does not. Of the differences found, there are logs in the working service:
Setting offset for partition topic-0 to the committed offset FetchPosition{offset=420, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch
And in the second service there are no such. Tried different auto.offset.reset, doesn't help. The implementation of working with kafka in services is identical. What could be the problem?

Do both services read from the same topic?
If the topic has only 1 partition, then only 1 consumer will receive messages.
If you want to distribute messages across multiple consumers, then you'd have to specify more than 1 partition (more than number of consumers).
However, if you want all consumers to receive all messages, then you'd have to specify different consumer group id for each consumer (which in your case is: each service).

The problem was in the #Builder annotation for the domain class of the message, after replacing it with #SuperBuilder, everything began to work as it should. It's weird, I didn't have inheritance from this class.

Related

Apache Camel - Kafka component - single producer multiple consumer

I am creating two apache camel (blueprint XML) kafka projects, one is kafka-producer which accepts requests and stores it in kafka server, and other is kafka-consumer which picks ups messages from kafka server and processes them.
This setup is working fine for single topic and single consumer. However how do I create separate consumer groups within same kafka topic? How to route multiple consumer specific messages within same topic inside different consumer groups? Any help is appreciated. Thank you.
Your question is quite general as it's not very clear what's the problem you are trying to solve, therefore it's hard to understand if there's a better way to implement the solution.
Anyway let's start by saying that, as far as I can understand, you are looking for a Selective Consumer (EIP) which is something that's not supported out-of-the-box by Kafka and Consumer API. Selective Consumer can choose what message to pick from the queue or topic based on specific selectors' values that are put in advance by a producer. This feature must be implemented in the message broker as well, but kafka has not such a capability.
Kafka does implement a hybrid solution between pure pub/sub and queue. That being said, what you can do is subscribing to the topic with one or more consumer groups (more on that later) and filter out all messages you're not interested in, by inspecting messages themselves. In the messaging and EIP world, this pattern is known as Array of Filters. As you can imagine this happen after the message has been broadcasted to all subscribers; therefore if that solution does not fit your requirements or context, then you can think of implementing a Content Based Router which is intended to dispatch the message to a subset of consumers only under your centralized control (this would imply intermediate consumer-specific channels that could be other Kafka topics or seda/VM queues, of course).
Moving to the second question, here is the official Kafka Component website: https://camel.apache.org/components/latest/kafka-component.html.
In order to create different consumer groups, you just have to define multiple routes each of them having a dedicated groupId. By adding the groupdId property, you will inform the Consumer Group coordinators (that reside in Kafka brokers) about the existence of multiple separated groups of consumers and brokers will use those in order to discriminate and treat them separately (by sending them a copy of each log message stored in the topic)...
Here is an example:
public void configure() throws Exception {
from("kafka:myTopic?brokers={{kafkaBootstrapServers}}" +
"&groupId=myFirstConsumerGroup"
.log("Message received by myFirstConsumerGroup : ${body}");
from("kafka:myTopic?brokers={{kafkaBootstrapServers}}" +
"&groupId=mySecondConsumerGroup"
.log("Message received by mySecondConsumerGroup : ${body}");
}
As you can see, I created two routes in the same RouteBuilder, not to say in the same Java process. That's a very bad design decision in most of the use cases I can think of, because there's no single responsibility, segregated concerns and they will not scale. But again, it depends on your requirements/context.
Out of completeness, please consider taking a look at all other Kafka Component properties, as there may be many other configurations of your interest such as the number of consumer threads per group.
I tried to stay high level, in order to initiate the discussion... I'll edit my answer in case of new updates from you. Hope I helped!

Apache Kafka Message broadcasting

I am studying Apache-kafka and have some confusion. Please help me to understand the following scenario.
I have a topic with 5 partitions and 5 brokers in a Kafka cluster. I am maintaining my message order in Partition 1(say P1).I want to broadcast the messages of P1 to 10 consumers.
So my question is; how do these 10 consumers interact with topic partition p1.
This is probably not how you want to use Kafka.
Unless you're being explicit with how you set your keys, you can't really control which partition your messages end up in when producing to a topic. Partitions in Kafka are designed to be more like low-level plumbing, something that exists, but you don't usually have to interact with. On the consumer side, you will be assigned partitions based on how many consumers are active for a particular consumer group at any one time.
One way to get around this is to define a topic to have only a single partition, in which case, of course, all messages will go to that partition. This is not ideal, since Kafka won't be able to parallelize data ingestion or serving, but it is possible.
So, having said that, let's assume that you did manage to put all your messages in partition 1 of a specific topic. When you fire up a consumer of that topic with consumer group id of consumer1, it will be assigned all the partitions for that topic, since that consumer is the only active one for that particular group id. If there is only one partition for that topic, like explained above, then that consumer will get all the data. If you then fire up a second consumer with the same group id, Kafka will notice there's a second consumer for that specific group id, but since there's only one partition, it can't assign any partitions to it, so that consumer will never get any data.
On the other hand, if you fire up a third consumer with a different consumer group id, say consumer2, that consumer will now get all the data, and it won't interfere at all with consumer1 message consumption, since Kafka keeps track of their consuming offsets separately. Kafka keeps track of which offset each particular ConsumerGroupId is at on each partition, so it won't get confused if one of them starts consuming slowly or stops for a while and restarts consuming later that day.
Much more detailed information here on how Kafka works here: https://kafka.apache.org/documentation/#gettingStarted
And more information on how to use the Kafka consumer at this link:
https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
#mjuarez's answer is absolutely correct - just for brevity I would reduce it to the following;
Don't try and read only from a single partition because it's a low level construct and it somewhat undermines the parallelism of Kafka. You're much better off just creating more topics if you need finer separation of data.
I would also add that most of the time a consumer needn't know which partition a message came from, in the same way that I don't eat a sandwich differently depending on which store it came from.
#mjuarez is actually not correct and I am not sure why his comment is being falsely confirmed by the OP. You can absolutely explicitly tell Kafka which partition a producer record pertains to using the following:
ProducerRecord(
java.lang.String topic,
java.lang.Integer partition, // <--------- !!!
java.lang.Long timestamp,
K key,
V value)
https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html#ProducerRecord-java.lang.String-java.lang.Integer-java.lang.Long-K-V-
So most of what was said after that becomes irrelevant.
Now to address the OP question directly: you want to accomplish a broadcast. To have a message sent once and read more than once you would have to have a different consumer group for each reader.
And that use case is an absolutely valid Kafka usage paradigm.
You can accomplish that using RabbitMQ too:
https://www.rabbitmq.com/tutorials/tutorial-three-java.html
... but the way it is done is not ideal because multiple out-of-process queues are involved.

Order of messages from multiple topics Kafka

I'm developing a software that uses Apache Kafka. I've got one consumer that subscribed to multiple topics, I'd like to know if there is an order for receiving messages from those topics. I tried some combination on my computer but I need to be sure about this.
Example
Consumer sub to topic1 and topic2
Producer1 write something on topic1
Producer2 write something on topic2
Producer1 write something on topic1
When the consumer polls, it receives a list of records containing first the messages from the first topic that he subscribed and then the messages from the other topic.
I'd like to know if it is always like this, i.e. the messages are in order like the topics that I subscribed.
Thanks
[EDIT] I'd like to specify that I have the two topics with one partition each, and only one producer and one consumer. I need to read first all the messages from the first topic and then the messages from the other topic
Kafka gives you only the guarantee of messages ordering inside a partition. It means that even with only one topic but more than one partitions you have no guarantee that messages are received in the same order they are sent.
Regarding your use case with two topics there is no relation between subscription order to the topics and messages ordering even because if the cluster has more than one node, the topic partition leader will be on different brokers and the client receives messages over different connections. Btw even with only one broker with all topics/partitions on that you can't have the guarantee you are describing.
No. Message ordering is only preserved within partitions (not even within topics).
If you need stronger ordering guarantees, you have to re-arrange messages in your application, for example using a timestamp (and a sufficiently large window buffer to catch all the ones that arrive out-of-order). Support for this has improved a bit with the recent addition of timestamps for all messages by Kafka itself, but the principle remains the same.
Why not first subscribe to the first topic and do a poll, and then subscribe to the other topic and do another poll? Without this, I don't think there is any guarantee in which order you receive messages from the two topics.

How to monitor application processing of Kafka messages for load testing

There is an application (not mine) that reads messages from Kafka, does some processing on them, and stores records in a database. I've put together a program in Java that writes messages into the queue at a given rate. Right now, it does a simple measure of performance by querying the database at the end of the test run to ensure that records in = records out. However, I'd like to expand it to periodically check the queue to see how many messages are pending that the application hasn't yet processed to see if it's getting backed up.
I figure that I can check offset of the application's group ID in Zookeeper. I looked at the Kafka documentation, but it only gives basic consumer examples and the API documentation is sparse at best, so I'm not sure how to go about finding this information.
What APIs to I need to call in order to find out where in the queue the application is currently at, and how many messages are in the queue behind that position?
I'm using Kafka 2.10-0.8.2.1 with a single Zookeeper instance and three Kafka instances, and the load tester is using the 0.8.2.1 Java API. The topic in question has three partitions (one on each Kafka server), however for the purpose of the test there is only a single consumer.
I would suggest looking at the already provided tools in Kafka (code is available in src if you need to call the API directly). In particular,
$ bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group consumer-group1 --zkconnect zkhost:zkport --topic topic1
Will show you the offset and lag:
consumer-group1,topic1,0-0 (Group,Topic,BrokerId-PartitionId)
Owner = consumer-group1-consumer1
Consumer offset = 70121994703
= 70,121,994,703 (65.31G)
Log size = 70122018287
= 70,122,018,287 (65.31G)
Consumer lag = 23584
= 23,584 (0.00G)
References:
Kafka FAQ
Kafka Tools
There are several offsets Kafka exposes (via JMX), that you can use to figure out how much lag consumers have (per topic, per partition). These are Latest Offset (basically where Kafka Broker is with its writes of new data) and Consumer Offset (where Consumers are with their reads). The delta between these two is known as Consumer Lag, and this tells you how far behind real-time Consumers are. Based on this info one can also derive Broker Write Rate and Consumer Read Rate, which are handy to know and see in your Kafka monitoring tool, too. See Kafka Consumer Lag Monitoring for more details. If you just want a tool that can chart a bunch of Kafka metrics, see SPM for Kafka monitoring. HTH.

Reading messages offset in Apache Kafka

I am very much new to Kafka and we are using Kafka 0.8.1.
What I need to do is to consume a message from topic. For that, I will have to write one consumer in Java which will consume a message from topic and then save that message to database. After a message is saved, some acknowledgement will be sent to Java consumer. If acknowledgement is true, then next message should be consumed from the topic. If acknowldgement is false(which means due to some error message,read from the topic, couldn't be saved into the database), then again that message should be read.
I think I need to use Simple Consumer,to have control over message offset and have gone through the Simple Consumer example as given in this link https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example.
In this example, offset is evaluated in run method as 'readOffset'. Do I need to play with that? For e.g. I can use LatestTime() instead of EarliestTime() and in case of false, I will reset the offset to the one before using offset - 1.
Is this how I should proceed?
I think you can get along with using the high level consumer (http://kafka.apache.org/documentation.html#highlevelconsumerapi), that should be easier to use than the SimpleConsumer. I don't think the consumer needs to reread messages from Kafka on database failure, as the consumer already has those messages and can resend them to the DB or do anything else it sees fit.
High-level consumers store the last offset read from a specific partition in Zookeeper (based on the consumer group name), so that when a consumer process dies and is later restarted (potentially on an other host), it can continue processing messages where it left off. It's possible to autosave this offset to Zookeeper periodically (see the consumer properties auto.commit.enable and auto.commit.interval.ms), or have it saved by application logic by calling ConsumerConnector.commitOffsets . See also https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example .
I suggest you turn auto-commit off and commit your offsets yourselves once you received DB acknowledgement. Thus, you can make sure unprocessed messages are reread from Kafka in case of consumer failure and all messages commited to Kafka will eventually reach the DB at least once (but not 'exactly once').

Categories