Consumer1 is reading messages from a Kafka topic. When a second consumer, Consumer2, connects with the same groupId, the partitions are rebalanced. Is it possible to reset the offset somehow, so that after the rebalance both consumers read the topic from the beginning?
For your requirement, you can set the consumer property auto.offset.reset to earliest, so that the topic is consumed from the beginning when the group has no committed offset.
For more details on offsets, see this article, which explains the values you can set: https://dzone.com/articles/apache-kafka-consumer-group-offset-retention
Related
We have a Flink job which has the following topology:
source -> filter -> map -> sink
We set a live (ready) status in the sink operator's overridden open() function. Once we see that status, we start sending events. Sometimes the job does not consume the events that were sent early.
We want to know the exact time/step after which we can send data without any of it being missed.
It looks like you want to ensure that no message is missed for processing. Kafka will retain your messages, so there is no need to send messages only when the Flink consumer is ready. You can simplify your design by avoiding the status message.
Any Kafka Consumer (not just Flink Connector) will have an offset associated with it in Kafka Server to track the id of the last message that was consumed.
From kafka docs:
Kafka maintains a numerical offset for each record in a partition. This
offset acts as a unique identifier of a record within that partition,
and also denotes the position of the consumer in the partition. For
example, a consumer which is at position 5 has consumed records with
offsets 0 through 4 and will next receive the record with offset 5
In your Flink Kafka Connector, specify the offset as the committed offset.
OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST)
This will ensure that if your Flink Connector is restarted, it will consume from the last position that it left off, before the restart.
If for some reason, the offset is lost, this will read from the beginning (earliest message) in your Kafka topic. Note that this approach will cause you to reprocess the messages.
There are many more offset strategies you can explore to choose the right one for you.
Refer - https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#starting-offset
I want to get the offset of the last record in a topic partition. The consumer has an endOffsets method, and usually endOffsets - 1 works fine. But with a transactional producer, the topic may contain offsets that hold no record, and endOffsets - 1 may point to such an offset. How should I compute the last record offset in this case?
More interestingly, what if I have both a simple and a transactional producer writing to my topic? Is there any reliable way to get the last record offset while ignoring all this complexity?
I ended up realizing that there is no reliable and simple way to do that in the current version of the java consumer. I created a feature request for that in Kafka's issue tracker: https://issues.apache.org/jira/browse/KAFKA-10009
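To make the problem concrete, here is a toy model in plain Java (it does not use the Kafka client). The boolean array is a made-up stand-in for a partition log: true means the offset holds a user record, false means it holds a transaction marker. The class and method names are hypothetical. It only illustrates why endOffsets - 1 can miss; a real consumer cannot inspect the log like this.

```java
// Toy model only: not the Kafka client API.
final class LastRecordOffset {

    // The end offset is the offset that the next written entry would get.
    static long endOffset(boolean[] isRecord) {
        return isRecord.length;
    }

    // Scan backwards to the newest offset that holds a user record.
    // Returns -1 if the log holds no user record at all.
    static long lastRecordOffset(boolean[] isRecord) {
        for (int offset = isRecord.length - 1; offset >= 0; offset--) {
            if (isRecord[offset]) {
                return offset;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Three user records followed by one commit marker.
        boolean[] log = {true, true, true, false};
        System.out.println("endOffset - 1    = " + (endOffset(log) - 1)); // 3, a marker
        System.out.println("last user record = " + lastRecordOffset(log)); // 2
    }
}
```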
I want to read particular messages from a topic. For example, there are 12000 messages in the topic and I want to read only messages 2000 to 5000. Is there any provision for this in Kafka, or can I use Java consumer code to read particular messages from a topic?
The Java consumer API provides you "seek" methods and more specifically the following one
seek(TopicPartition partition, long offset)
You can start reading from the provided offset, but you cannot provide an ending offset, so you have to stop polling yourself once you reach it. Also note that an offset is only meaningful within a partition, which is why you have to provide the TopicPartition as the first parameter.
Keep in mind that if the topic partition is compacted and/or some messages have been deleted, the offsets are no longer sequential, so there can be holes. So be clear about whether you want to read from the message with offset 2000 to the one with offset 5000, or from the 2000th message to the 5000th message (in the latter case the ordinal position may not equal the offset, e.g. the 2000th message is at offset 2100 because 100 messages before it were deleted).
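The difference between an offset and an ordinal position can be sketched with a toy model in plain Java. This is not the Kafka client: the TreeMap stands in for one partition, and the class and method names are made up for illustration. Positions are counted from 0 here, so that position and offset agree in a partition without gaps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: one partition as a sorted map of offset -> value.
final class OffsetGaps {

    // Offsets [0, count) minus the deleted range [gapFrom, gapTo).
    static NavigableMap<Long, String> partitionWithGap(long count, long gapFrom, long gapTo) {
        NavigableMap<Long, String> log = new TreeMap<>();
        for (long offset = 0; offset < count; offset++) {
            if (offset < gapFrom || offset >= gapTo) {
                log.put(offset, "record-" + offset);
            }
        }
        return log;
    }

    // Surviving offsets inside [startOffset, endOffset], both inclusive.
    static List<Long> byOffsetRange(NavigableMap<Long, String> log, long startOffset, long endOffset) {
        return new ArrayList<>(log.subMap(startOffset, true, endOffset, true).keySet());
    }

    // Offset of the record at the given 0-based position, or -1 if there is none.
    static long offsetAtPosition(NavigableMap<Long, String> log, long position) {
        long index = 0;
        for (long offset : log.keySet()) {
            if (index++ == position) {
                return offset;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // 12000 offsets, with the 100 records at offsets 1000..1099 deleted.
        NavigableMap<Long, String> log = partitionWithGap(12000, 1000, 1100);
        System.out.println(offsetAtPosition(log, 2000));          // 2100
        System.out.println(byOffsetRange(log, 2000, 5000).size()); // 3001
    }
}
```

With a real consumer, the offset-based read would be a seek to the start offset followed by polling until a record's offset exceeds the end offset.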
I have a producer that posts a record to Kafka. Then I call a consumer, which returns the record, but when I call the consumer again it doesn't return any records. (I need to get the record that I posted to Kafka again.) How can I do this? (Any code would be appreciated.)
Kafka doesn't delete a message after it has been consumed, but it keeps track of the read offset for each consumer group. After you read a message, the offset moves forward. The second read returns nothing because the offset now points past your only message and there is nothing after it. You should try resetting the offset before you read again. See this post:
Reset consumer offset to the beginning from Kafka Streams
But if you don't want to reset the offset locally or globally, you can create another consumer group. Since every consumer group has its own offsets, a second read by consumers in the new group can achieve what you want. See this link:
kafka-tutorial-kafka-consumer
Hope this would be helpful.
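The behaviour described above can be sketched as a toy model in plain Java (not the Kafka client; the class and method names are hypothetical). Records stay in the log after being read, each group id has its own position, and a group that has never read starts at 0, which mimics auto.offset.reset=earliest.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: one partition plus one stored position per consumer group.
final class GroupOffsets {
    private final List<String> log = new ArrayList<>();          // reading never removes records
    private final Map<String, Integer> positions = new HashMap<>(); // group id -> next offset to read

    void produce(String value) {
        log.add(value);
    }

    // Return everything from the group's position to the end, then advance the position.
    List<String> poll(String groupId) {
        int position = positions.getOrDefault(groupId, 0); // unknown group starts at the beginning
        List<String> batch = new ArrayList<>(log.subList(position, log.size()));
        positions.put(groupId, log.size());
        return batch;
    }

    // Move a group's position back, mimicking an offset reset.
    void seek(String groupId, int offset) {
        positions.put(groupId, offset);
    }

    public static void main(String[] args) {
        GroupOffsets topic = new GroupOffsets();
        topic.produce("hello");
        System.out.println(topic.poll("group-a")); // [hello]
        System.out.println(topic.poll("group-a")); // []
        System.out.println(topic.poll("group-b")); // [hello]
        topic.seek("group-a", 0);
        System.out.println(topic.poll("group-a")); // [hello]
    }
}
```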
You can manually reset the offset to the desired value, or, if you need to consume from the start (the earliest offset still available in Kafka), you can set the consumer property "auto.offset.reset=earliest".
You can also give the consumer a new group.id value every time, for example a randomly generated string. The property auto.offset.reset must then be set to earliest.
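A minimal sketch of such a configuration, using only java.util classes. The class name, the "replay-" prefix and the choice of String deserializers are assumptions made for this example; the property keys are the standard consumer settings.

```java
import java.util.Properties;
import java.util.UUID;

// Builds consumer properties with a fresh group id on every call.
final class FreshGroupConfig {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // A new random group has no committed offsets yet...
        props.setProperty("group.id", "replay-" + UUID.randomUUID());
        // ...so this setting decides where it starts: at the earliest available offset.
        props.setProperty("auto.offset.reset", "earliest");
        // Assumption: keys and values are plain strings.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("127.0.0.1:9092");
        System.out.println(props.getProperty("group.id"));
    }
}
```

The resulting Properties object can be passed to the KafkaConsumer constructor. Keep in mind that every run leaves behind another consumer group on the broker.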
While polling Kafka, I have subscribed to multiple topics using the subscribe() function. Now I want to set the offset from which to read in each topic, without resubscribing after every seek() and poll() on a topic. Will calling seek() for each of the topics in turn, before polling for data, achieve this?
How are the offsets exactly stored in Kafka?
I have one partition per topic and just one consumer to read from all topics.
How does Kafka store offsets for each topic?
Kafka has moved offset storage from ZooKeeper to the Kafka brokers. The reason is given below:
Zookeeper is not a good way to service a high-write load such as offset updates because zookeeper routes each write through every node and hence has no ability to partition or otherwise scale writes. We have always known this, but chose this implementation as a kind of "marriage of convenience" since we already depended on zk.
Kafka stores offset commits in a topic. When a consumer commits an offset, Kafka publishes an offset commit message to an internal "commit-log" topic (named __consumer_offsets in current versions) and keeps an in-memory structure that maps group/topic/partition to the latest offset for fast retrieval. More design information can be found on this page about offset management.
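The idea of an append-only commit log plus an in-memory lookup table can be sketched in plain Java. This is a toy model under stated assumptions, not the broker's implementation: the class name, the "group/topic/partition" string key and the method names are all made up for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: an append-only list of commits and a cache of the latest offset per key.
final class OffsetStore {
    private final List<Map.Entry<String, Long>> commitLog = new ArrayList<>();
    private final Map<String, Long> latest = new HashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    // Append the commit to the log and update the cache.
    void commit(String group, String topic, int partition, long offset) {
        String k = key(group, topic, partition);
        commitLog.add(new SimpleImmutableEntry<>(k, offset));
        latest.put(k, offset);
    }

    // Fast lookup from the cache; null if the group never committed for this partition.
    Long fetch(String group, String topic, int partition) {
        return latest.get(key(group, topic, partition));
    }

    // Replaying the log from the start yields the same table (later commits win).
    Map<String, Long> rebuildFromLog() {
        Map<String, Long> rebuilt = new HashMap<>();
        for (Map.Entry<String, Long> entry : commitLog) {
            rebuilt.put(entry.getKey(), entry.getValue());
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        OffsetStore store = new OffsetStore();
        store.commit("my-group", "topic-a", 0, 5L);
        store.commit("my-group", "topic-a", 0, 9L);
        store.commit("my-group", "topic-b", 0, 1L);
        System.out.println(store.fetch("my-group", "topic-a", 0)); // 9
        System.out.println(store.rebuildFromLog());
    }
}
```

Because the key includes the topic and partition, a single consumer reading several topics has one independent offset per topic partition.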
Now, I want to set the offset from which I want to read from each topic, without resubscribing after every seek() and poll() from a topic.
The Kafka admin tools include a command for resetting offsets:
kafka-consumer-groups.sh --bootstrap-server 127.0.0.1:9092 --group your-consumer-group --reset-offsets --to-offset 1 --all-topics --execute
There are more options you can use.