I have a producer which I call and it posts a record to Kafka. Then I call a consumer which returns the record, but when I call the consumer again it doesn't return any records. (I need to read the record I posted to Kafka again.) How can I do this? (Any code would be appreciated.)
Kafka doesn't delete a message after it has been consumed, but it does track a read offset for each consumer group. So after you read a message, the offset moves forward, and the second read returns nothing because the offset now points past your only message and there is nothing after it. You should reset the offset before you read again (a sketch follows below). See this post:
Reset consumer offset to the beginning from Kafka Streams
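For example, a minimal sketch with the Java consumer, assuming a single-partition topic and placeholder names ("my-topic", "localhost:9092"):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReadAgain {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        TopicPartition tp = new TopicPartition("my-topic", 0);
        consumer.assign(Collections.singletonList(tp));

        consumer.poll(Duration.ofSeconds(5));                     // first read: the position moves past the record
        consumer.seekToBeginning(Collections.singletonList(tp));  // rewind before reading again
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
            System.out.println("read again: " + record.value()); // the same record comes back
        }
        consumer.close();
    }
}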
But if you don't want to reset the offset locally or globally, you can create another consumer group. Since every consumer group has its own offset, a second read with the new group can achieve what you want. See this link:
kafka-tutorial-kafka-consumer
Hope this would be helpful.
You can manually reset the offset to the desired value, or, if you need to consume from the start (whatever is still available in Kafka), set the consumer property "auto.offset.reset=earliest".
You can also provide a new group.id value in the consumer properties every time; just generate a random string. The property auto.offset.reset must be set to earliest, for example:
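A minimal sketch of such properties, assuming placeholder bootstrap servers and String keys/values:

import java.util.Properties;
import java.util.UUID;

public class FreshGroupProperties {
    public static Properties freshGroupProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", "reader-" + UUID.randomUUID()); // a new group has no committed offset
        props.put("auto.offset.reset", "earliest");           // so the new group starts from the first record
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}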
Is it possible to set up reading from Kafka so that the consumer receives an uncommitted message again and again, and does not move on to the next offset?
enable.auto.commit is set to false. And I use Java and Kafka without Spring.
You can set max.poll.records=1 and simply cache and return the single consumed record "again and again", rather than polling in a loop.
Otherwise, continuing to poll from the same client will move the position forward.
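A rough sketch of that idea (the class, group and server names are made up for illustration):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SingleRecordReader {
    private final KafkaConsumer<String, String> consumer;
    private ConsumerRecord<String, String> cached; // the one record we keep handing back

    public SingleRecordReader(String bootstrapServers, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", "single-record-reader");
        props.put("enable.auto.commit", "false");  // never commit, as in the question
        props.put("auto.offset.reset", "earliest");
        props.put("max.poll.records", "1");        // fetch at most one record per poll
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList(topic));
    }

    // Returns the same record on every call; the consumer position never moves past it.
    public ConsumerRecord<String, String> read() {
        if (cached == null) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            if (!records.isEmpty()) {
                cached = records.iterator().next();
            }
        }
        return cached;
    }
}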
There is a situation where Consumer1 reads messages from a Kafka topic. When a second Consumer2 connects with the same groupId, the partitions are rebalanced. Is it possible to somehow reset the offset, so that after the rebalance both consumers read the topic from the beginning?
As per your requirement, I think you can set auto.offset.reset to earliest, so that the topic is consumed from the beginning.
For more details on what values you can set the offset to, see this article: https://dzone.com/articles/apache-kafka-consumer-group-offset-retention
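Note that auto.offset.reset only takes effect when the group has no committed offset yet. If both consumers should be rewound on every rebalance regardless of committed offsets, one way (not mentioned in the answer above, just a sketch with placeholder names) is to seek to the beginning whenever partitions are assigned:

import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class FromBeginningConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder connection settings
        props.put("group.id", "my-group");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // after every rebalance, rewind whatever partitions this consumer now owns
                consumer.seekToBeginning(partitions);
            }
        });
        // normal poll loop follows here
    }
}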
I want to get the offset of the last record in a topic partition. There is an endOffsets method on the consumer, and usually endOffsets - 1 works fine. But with a transactional producer the topic may contain offsets that hold no record (transaction markers), and endOffsets - 1 can then point to an offset without a record. So how should I compute the last record's offset in this case?
More interestingly, what if I have both a plain and a transactional producer writing to my topic? Is there any reliable way to get the last record's offset that ignores all this complexity?
I ended up realizing that there is no reliable and simple way to do this in the current version of the Java consumer. I created a feature request for it in Kafka's issue tracker: https://issues.apache.org/jira/browse/KAFKA-10009
Using the new Kafka Java consumer api, I run a single consumer to consume messages. When all available messages are consumed, I kill it with kill -15.
Now I would like to reset the offsets to the start. I would like to avoid just using a different consumer group. What I tried is the following sequence of calls, using the same group as the consumer that had just finished reading the data:
assign(topicPartition);
OffsetAndMetadata om = new OffsetAndMetadata(0);
commitSync(Collections.singletonMap(topicPartition, om));
I thought I had got this working in a test, but now I always just get:
ERROR internals.ConsumerCoordinator: Error UNKNOWN_MEMBER_ID occurred while committing offsets for group queue
Exception in thread "main" org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:552)
Is it in principle wrong to combine assign with commitSync, possibly because only subscribe and commitSync go together? The docs only say that assign does not go along with subscribe, but I thought this applied only within one consumer process. (In fact I was even hoping to run the offset-reset consumer while the other consumer is up, hoping that the other one might notice the offset change and start over again. But shutting it down first is fine too.)
Any ideas?
Found the problem. The approach described in my question works, provided the following conditions are respected:
There must be no other consumer running with the targeted group.id. Even if that consumer is subscribed only to other topics, it prevents committing topic offsets after calling assign() instead of subscribe().
After the last other consumer has stopped, it takes 30 seconds (I think it is group.max.session.timeout.ms) before the operation can succeed. The indicative log message from kafka is
Group X generation Y is dead and removed
Once this appears in the log, the sequence
assign(topicPartition);
OffsetAndMetadata om = new OffsetAndMetadata(0);
commitSync(Collections.singletonMap(topicPartition, om));
can succeed.
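A fuller sketch of that reset step, with placeholder topic and connection settings and the group name "queue" from the error above:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class OffsetResetTool {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "queue");            // the group whose offsets we want to reset
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition topicPartition = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(topicPartition));
            OffsetAndMetadata om = new OffsetAndMetadata(0);
            // only succeeds once no other consumer of this group is alive (see the conditions above)
            consumer.commitSync(Collections.singletonMap(topicPartition, om));
        }
    }
}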
Why even commit offsets in the first place?
Set enable.auto.commit to false in your Properties and don't commit at all if you simply want to re-read all messages on restart.
To reset the offset you can use, for example, these methods:
public void seek(TopicPartition partition, long offset)
public void seekToBeginning(TopicPartition... partitions)
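A short usage sketch of that no-commit approach (recent client versions take a Collection of partitions rather than varargs; topic, group and server names are placeholders):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReReadAll {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "queue");
        props.put("enable.auto.commit", "false");   // never commit; we rewind ourselves every run
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        TopicPartition tp = new TopicPartition("my-topic", 0);
        consumer.assign(Collections.singletonList(tp));
        consumer.seek(tp, 0L); // or consumer.seekToBeginning(Collections.singletonList(tp)) for the earliest available offset
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
            System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
        }
        consumer.close();
    }
}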
I am very new to Kafka and we are using Kafka 0.8.1.
What I need to do is consume a message from a topic. For that, I will have to write a consumer in Java which consumes a message from the topic and then saves it to a database. After the message is saved, an acknowledgement is sent back to the Java consumer. If the acknowledgement is true, the next message should be consumed from the topic. If the acknowledgement is false (meaning that, due to some error, the message read from the topic couldn't be saved to the database), that same message should be read again.
I think I need to use the SimpleConsumer, to have control over the message offset, and I have gone through the SimpleConsumer example given at https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example.
In this example, the offset is computed in the run method as 'readOffset'. Do I need to play with that? For example, I could use LatestTime() instead of EarliestTime(), and in the failure case reset the offset to the previous one with offset - 1.
Is this how I should proceed?
I think you can get along with the high-level consumer (http://kafka.apache.org/documentation.html#highlevelconsumerapi), which should be easier to use than the SimpleConsumer. I don't think the consumer needs to re-read messages from Kafka on database failure, as the consumer already has those messages and can resend them to the DB or do anything else it sees fit.
High-level consumers store the last offset read from a specific partition in ZooKeeper (keyed by the consumer group name), so that when a consumer process dies and is later restarted (potentially on another host), it can continue processing messages where it left off. This offset can be saved to ZooKeeper automatically at intervals (see the consumer properties auto.commit.enable and auto.commit.interval.ms), or saved by application logic by calling ConsumerConnector.commitOffsets. See also https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example .
I suggest you turn auto-commit off and commit your offsets yourself once you have received the DB acknowledgement. That way you can make sure unprocessed messages are re-read from Kafka in case of consumer failure, and every message committed to Kafka eventually reaches the DB at least once (but not 'exactly once').
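A rough sketch of that flow with the old (0.8) high-level consumer; saveToDatabase() stands in for the application's DB write and acknowledgement, and the topic, group and ZooKeeper address are placeholders:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class DbConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "db-writer");
        props.put("auto.commit.enable", "false");   // we commit ourselves after the DB ack

        ConsumerConnector connector = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("my-topic", 1));
        ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();

        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> record = it.next();
            boolean saved = false;
            while (!saved) {
                saved = saveToDatabase(record.message()); // retry the same message until the DB acknowledges
            }
            connector.commitOffsets();                    // commit to ZooKeeper only after a successful save
        }
    }

    private static boolean saveToDatabase(byte[] message) {
        // placeholder for the application's database write + acknowledgement
        return true;
    }
}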