While polling Kafka, I have subscribed to multiple topics using the subscribe() function. Now I want to set the offset from which to read each topic, without resubscribing after every seek() and poll() on a topic. Will calling seek() iteratively over each of the topics, before polling for data, achieve the result?
How are the offsets exactly stored in Kafka?
I have one partition per topic and just one consumer to read from all topics.
How does Kafka store offsets for each topic?
Kafka has moved offset storage from ZooKeeper to the Kafka brokers. The reasoning, from the original design discussion:
Zookeeper is not a good way to service a high-write load such as offset updates because zookeeper routes each write through every node and hence has no ability to partition or otherwise scale writes. We have always known this, but chose this implementation as a kind of "marriage of convenience" since we already depended on zk.
Kafka stores offset commits in a topic: when a consumer commits an offset, Kafka publishes a commit message to an internal "commit log" topic and keeps an in-memory structure that maps group/topic/partition to the latest offset for fast retrieval. More design information can be found in the page about offset management.
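A toy model of that in-memory structure (plain Python, illustrative only, not the broker's actual implementation): the latest committed offset is kept per (group, topic, partition) key for fast lookup, while the commit messages themselves live in the internal commit-log topic.

```python
class OffsetCache:
    """Illustrative model of the broker's in-memory offset map."""

    def __init__(self):
        self._latest = {}  # (group, topic, partition) -> latest committed offset

    def apply_commit(self, group, topic, partition, offset):
        # A newer commit for the same key simply overwrites the old one.
        self._latest[(group, topic, partition)] = offset

    def fetch(self, group, topic, partition):
        # Returns None when the group has never committed for this partition.
        return self._latest.get((group, topic, partition))

cache = OffsetCache()
cache.apply_commit("group-a", "orders", 0, 41)
cache.apply_commit("group-a", "orders", 0, 42)   # overwrites 41
print(cache.fetch("group-a", "orders", 0))       # 42
print(cache.fetch("group-b", "orders", 0))       # None
```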
Now, I want to set the offset from which I want to read from each topic, without resubscribing after every seek() and poll() from a topic.
The Kafka admin tools now support resetting offsets:

kafka-consumer-groups.sh --bootstrap-server 127.0.0.1:9092 --group your-consumer-group --reset-offsets --to-offset 1 --all-topics --execute
There are more options you can use.
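To do the same programmatically (as in the original question), one seek() per topic partition before the poll loop does work once partitions are assigned. A rough sketch of the bookkeeping, using a plain-Python stand-in for the consumer (FakeConsumer is illustrative, not the real client, whose API is similar in spirit):

```python
class FakeConsumer:
    """Toy stand-in for a Kafka consumer: seek once per partition, then poll."""

    def __init__(self, logs):
        self.logs = logs                         # (topic, partition) -> list of records
        self.position = {tp: 0 for tp in logs}   # next offset to read, per partition

    def seek(self, tp, offset):
        self.position[tp] = offset               # the next poll starts here

    def poll(self, max_records=2):
        out = {}
        for tp, log in self.logs.items():
            start = self.position[tp]
            batch = log[start:start + max_records]
            if batch:
                out[tp] = batch
                self.position[tp] += len(batch)
        return out

logs = {("topic-a", 0): ["a0", "a1", "a2"], ("topic-b", 0): ["b0", "b1"]}
consumer = FakeConsumer(logs)
# One seek per topic-partition up front, then poll -- no resubscribing needed.
consumer.seek(("topic-a", 0), 2)
consumer.seek(("topic-b", 0), 1)
print(consumer.poll())   # {('topic-a', 0): ['a2'], ('topic-b', 0): ['b1']}
```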
We have a Flink job which has the following topology:
source -> filter -> map -> sink
We set a live (ready) status in the sink operator's open() override. After we get that status we send events, but sometimes the job misses the events sent early.
We want to know the exact time/step that we can send data which will not be missing.
It looks like you want to ensure that no message is missed for processing. Kafka will retain your messages, so there is no need to send messages only when the Flink consumer is ready. You can simplify your design by avoiding the status message.
Any Kafka consumer (not just the Flink connector) will have an offset associated with it on the Kafka server, tracking the position of the last message that was consumed.
From kafka docs:
Kafka maintains a numerical offset for each record in a partition. This
offset acts as a unique identifier of a record within that partition,
and also denotes the position of the consumer in the partition. For
example, a consumer which is at position 5 has consumed records with
offsets 0 through 4 and will next receive the record with offset 5.
In your Flink Kafka Connector, specify the offset as the committed offset.
OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST)
This will ensure that if your Flink Connector is restarted, it will consume from the last position that it left off, before the restart.
If for some reason, the offset is lost, this will read from the beginning (earliest message) in your Kafka topic. Note that this approach will cause you to reprocess the messages.
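The fallback behavior amounts to: use the committed offset when one exists, otherwise fall back to the reset strategy. A plain-Python sketch of that decision (starting_offset is an illustrative name, not a Flink API):

```python
def starting_offset(committed, earliest, latest, reset="earliest"):
    """Pick the offset to start consuming from, mimicking
    committedOffsets(OffsetResetStrategy.EARLIEST) semantics."""
    # committed is None when the group has no stored offset for the partition.
    if committed is not None:
        return committed                 # resume where the last run left off
    return earliest if reset == "earliest" else latest

print(starting_offset(committed=500, earliest=0, latest=900))   # 500
print(starting_offset(committed=None, earliest=0, latest=900))  # 0 (reprocess from start)
```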
There are many more offset strategies you can explore to choose the right one for you.
Refer - https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#starting-offset
I am studying Apache Kafka and have some confusion. Please help me understand the following scenario.
I have a topic with 5 partitions and 5 brokers in a Kafka cluster. I am maintaining my message order in partition 1 (say P1). I want to broadcast the messages of P1 to 10 consumers.
So my question is: how do these 10 consumers interact with topic partition P1?
This is probably not how you want to use Kafka.
Unless you're being explicit with how you set your keys, you can't really control which partition your messages end up in when producing to a topic. Partitions in Kafka are designed to be more like low-level plumbing, something that exists, but you don't usually have to interact with. On the consumer side, you will be assigned partitions based on how many consumers are active for a particular consumer group at any one time.
One way to get around this is to define a topic to have only a single partition, in which case, of course, all messages will go to that partition. This is not ideal, since Kafka won't be able to parallelize data ingestion or serving, but it is possible.
So, having said that, let's assume that you did manage to put all your messages in partition 1 of a specific topic. When you fire up a consumer of that topic with consumer group id of consumer1, it will be assigned all the partitions for that topic, since that consumer is the only active one for that particular group id. If there is only one partition for that topic, like explained above, then that consumer will get all the data. If you then fire up a second consumer with the same group id, Kafka will notice there's a second consumer for that specific group id, but since there's only one partition, it can't assign any partitions to it, so that consumer will never get any data.
On the other hand, if you fire up a third consumer with a different consumer group id, say consumer2, that consumer will now get all the data, and it won't interfere at all with consumer1 message consumption, since Kafka keeps track of their consuming offsets separately. Kafka keeps track of which offset each particular ConsumerGroupId is at on each partition, so it won't get confused if one of them starts consuming slowly or stops for a while and restarts consuming later that day.
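The assignment rules described above can be sketched like this (a simplification of Kafka's real partition assignors, illustrative only): partitions are spread over a group's members, and with fewer partitions than consumers the extras get nothing.

```python
def assign(partitions, consumers_in_group):
    """Toy round-robin assignor: one group's partitions over its members."""
    assignment = {c: [] for c in consumers_in_group}
    for i, p in enumerate(partitions):
        assignment[consumers_in_group[i % len(consumers_in_group)]].append(p)
    return assignment

# One partition, two consumers in the same group: the second gets nothing.
print(assign([0], ["c1", "c2"]))        # {'c1': [0], 'c2': []}
# A consumer in a *different* group is assigned independently and gets everything.
print(assign([0], ["other-group-c1"]))  # {'other-group-c1': [0]}
```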
Much more detailed information on how Kafka works here: https://kafka.apache.org/documentation/#gettingStarted
And more information on how to use the Kafka consumer at this link:
https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
@mjuarez's answer is absolutely correct; just for brevity I would reduce it to the following:
Don't try and read only from a single partition because it's a low level construct and it somewhat undermines the parallelism of Kafka. You're much better off just creating more topics if you need finer separation of data.
I would also add that most of the time a consumer needn't know which partition a message came from, in the same way that I don't eat a sandwich differently depending on which store it came from.
@mjuarez is actually not correct, and I am not sure why his comment is being falsely confirmed by the OP. You can absolutely explicitly tell Kafka which partition a producer record pertains to, using the following:
ProducerRecord(
java.lang.String topic,
java.lang.Integer partition, // <--------- !!!
java.lang.Long timestamp,
K key,
V value)
https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html#ProducerRecord-java.lang.String-java.lang.Integer-java.lang.Long-K-V-
So most of what was said after that becomes irrelevant.
Now to address the OP question directly: you want to accomplish a broadcast. To have a message sent once and read more than once you would have to have a different consumer group for each reader.
And that use case is an absolutely valid Kafka usage paradigm.
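A toy model of why separate group IDs give broadcast semantics: each group keeps its own offset cursor, so every group reads the full log independently (plain Python, illustrative names):

```python
log = ["m0", "m1", "m2"]
offsets = {"group-a": 0, "group-b": 0}   # one independent cursor per consumer group

def consume(group, n=1):
    """Read up to n records for a group and advance only that group's cursor."""
    start = offsets[group]
    batch = log[start:start + n]
    offsets[group] += len(batch)
    return batch

print(consume("group-a", 3))  # ['m0', 'm1', 'm2']
print(consume("group-b", 3))  # ['m0', 'm1', 'm2'] -- unaffected by group-a's reads
```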
You can accomplish that using RabbitMQ too:
https://www.rabbitmq.com/tutorials/tutorial-three-java.html
... but the way it is done is not ideal because multiple out-of-process queues are involved.
I am trying to build an application that can read through a Kafka topic, but I need it to have a "see previous" button. I know how to seek within a particular partition, but is it possible to go back through all the messages in a topic in the order that they were read? I am using the Java KafkaConsumer.
Offsets are per partition, and there is no ordering between messages across different partitions. You can't go back through all the messages in a topic in the order they were read, because Kafka doesn't know the order in which the consumer read messages from different partitions (different partitions may be served by different brokers across the cluster). However, what you can do is maintain an ordered buffer in your app as you read messages in; that would allow you to go back as far as the buffer's capacity.
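A sketch of that buffering approach (plain Python; ReplayBuffer is an illustrative name): keep the last N records in arrival order and page backwards through them for the "see previous" button.

```python
from collections import deque

class ReplayBuffer:
    """Keep the last N consumed records in arrival order so the app can
    page backwards without re-seeking every partition."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)
        self.cursor = 0                      # how far back we've paged

    def append(self, record):
        self.buf.append(record)              # oldest record falls out at capacity
        self.cursor = 0                      # new data resets paging

    def previous(self):
        # Step one record further back, stopping at the buffer's start.
        if self.cursor < len(self.buf):
            self.cursor += 1
        return self.buf[-self.cursor] if self.buf else None

buf = ReplayBuffer(capacity=3)
for r in ["r1", "r2", "r3", "r4"]:
    buf.append(r)                 # r1 falls out once capacity is exceeded
print(buf.previous())             # r4
print(buf.previous())             # r3
```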
We're using Storm with Kafka and ZooKeeper. We had a situation where we had to delete some topics and recreate them with different names. Our Kafka spouts stayed the same, aside from now reading from the new topic names. However now the spouts are using the offsets from the old topic partitions when trying to read from the new topics. So the tail position of my-topic-name partition 0 will be 500 but the offset will be something like 10000.
Is there a way to reset the offset position so it matches the tail of the topic?
There are multiple options (as Storm's KafkaSpout does not provide any API to define the starting offset).
If you want to consume from the tail of the log, you should delete the old offsets. Depending on your Kafka version:
(pre 0.9) you can manipulate ZK (which is a little tricky)
(0.9+) you can try to delete the offsets from the topic __consumer_offsets (which is also tricky and might delete other offsets you want to preserve, too)
Once no offsets are stored, you can restart your spout with auto offset reset policy "latest" or "largest" (depending on your Kafka version).
As an alternative (which I would recommend), you can write a small client application that uses seek() to manipulate the offsets the way you need and then commit() them. This client must use the same group ID as your KafkaSpout and must subscribe to the same topic(s). Furthermore, you need to make sure that this client application runs as the single member of the consumer group so it gets all partitions assigned.
For this, you can either seek to the end of the log and commit,
or you commit an invalid offset (like -1) and rely on the auto offset reset configuration "latest" or "largest" (depending on your Kafka version).
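The "seek to the end and commit" client can be sketched as follows; this is a plain-Python model of the bookkeeping only (reset_to_tail and the dict-based offset maps are illustrative names, not Kafka APIs), where the real client would use the consumer's seek-to-end and commit calls with the spout's group ID:

```python
def reset_to_tail(end_offsets, committed):
    """For every assigned partition, overwrite the stale committed offset
    with the log-end offset, so the spout resumes from the tail."""
    for tp, end in end_offsets.items():
        committed[tp] = end
    return committed

stale = {("my-topic-name", 0): 10000}   # offset left over from the old topic
tails = {("my-topic-name", 0): 500}     # actual log-end offset of the new topic
print(reset_to_tail(tails, stale))      # {('my-topic-name', 0): 500}
```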
For Kafka Streams, there is an "Application Reset Tool" that does something similar to manipulate committed offsets. If you want more details, you can read this blog post: http://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/
(disclaimer: I am the author of the post and it is about Kafka Streams -- nevertheless, the underlying offset manipulation ideas are the same)
I would like to use the Kafka high-level consumer API and, at the same time, disable auto commit of offsets. I tried to achieve this through the following steps:
1) auto.commit.enable = false
2) offsets.storage = kafka
3) dual.commit.enabled = false
I created an offset manager, which periodically creates an OffsetCommitRequest to Kafka and commits the offsets.
Still I have the following questions
1) Does the high-level consumer API automatically fetch the offset from Kafka storage and initialize itself with that offset? Or should I use the simple consumer API to achieve this?
2) Is the Kafka-based storage for offsets replicated across all brokers? Or is it maintained on only one broker?
I created an offset manager, which periodically creates an OffsetCommitRequest to Kafka and commits the offsets.
You need not do that if you are using the high-level consumer, which provides methods to commit offsets manually; the javadoc (under "Manual Offset Control") provides examples of how to do that.
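For illustration, here is a minimal plain-Python model of that manual-commit loop (run_once and the list-based log are stand-ins, not the consumer API): process the batch fully first, then commit the new position, which gives at-least-once semantics because a crash before the commit only ever causes reprocessing, never loss.

```python
def run_once(log, committed, process, batch_size=2):
    """One poll-process-commit cycle: do the work, then advance the offset."""
    start = committed["offset"]
    batch = log[start:start + batch_size]
    for record in batch:
        process(record)                          # do the work first...
    committed["offset"] = start + len(batch)     # ...then commit the new position
    return committed["offset"]

seen = []
committed = {"offset": 0}
log = ["a", "b", "c"]
run_once(log, committed, seen.append)  # processes a, b; commits offset 2
run_once(log, committed, seen.append)  # processes c; commits offset 3
print(seen, committed["offset"])       # ['a', 'b', 'c'] 3
```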
1) Does the high-level consumer API automatically fetch the offset from Kafka storage and initialize itself with that offset? Or should I use the simple consumer API to achieve this?
The high-level consumer will take care of fetching the last committed offset when you restart it, so you can resume consuming from where you left off.
2) Is the Kafka-based storage for offsets replicated across all brokers? Or is it maintained on only one broker?
Kafka stores consumer offsets in an internal topic named __consumer_offsets; by default its replication factor is 3 and it has 50 partitions, so it is replicated across 3 brokers. You can find more info on its configuration in the broker configs; the relevant settings start with offset or offsets.