I have recently started using Kafka.
I have a topic, and I am using the following code to consume from it:
@KafkaListener(topics = "topic_name", groupId = "_id", id = "pro", containerFactory = "kafkaListenerContainerFactory")
public void consume(ConsumerRecord<String, String> record, Acknowledgment ack) {
    kafkaService.proccessorConsumer(record);
    ack.acknowledge();
}
Everything works fine, but I need to handle the situation where the service stops for any reason and then restarts: I want to continue consuming from the last message that was processed. I understand that the acknowledgment helps with this, but for the sake of certainty I also saved the last consumed offset somewhere.
My question is: how can I use that offset to start consuming the topic from that point?
As @OneCricketeer indicates, what you are trying to achieve is the default behaviour of the Kafka consumer, as long as you haven't disabled automatic offset commit.
You can check this by describing your consumer group with its group id, as follows; just verify that the committed offset of your consumer matches the one you have stored elsewhere.
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group-id
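If you still want to force the consumer to start from the offset you saved yourself, one way to sketch it in Spring Kafka is to have the listener class implement ConsumerSeekAware and seek when partitions are assigned. This is only an illustration; loadStoredOffset is a hypothetical method that reads the offset you persisted:

@Component
public class TopicConsumer implements ConsumerSeekAware {

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        // For every partition we were just assigned, jump to the offset we stored ourselves
        assignments.keySet().forEach(tp ->
                callback.seek(tp.topic(), tp.partition(), loadStoredOffset(tp))); // hypothetical lookup
    }

    @KafkaListener(topics = "topic_name", groupId = "_id", id = "pro", containerFactory = "kafkaListenerContainerFactory")
    public void consume(ConsumerRecord<String, String> record, Acknowledgment ack) {
        kafkaService.proccessorConsumer(record);
        ack.acknowledge();
    }
}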
I am facing an issue with my Kafka Streams application where messages are being processed multiple times and the result topic constantly receives messages. The issue only appears in production, not in my local environment. Can you help me determine the root cause of this problem, based on the transformer code below?
@Override
public KeyValue<String, UserClicks> transform(final String user, final Long clicks) {
    UserClicks userClicks = tempStore.get(user);
    if (userClicks != null) {
        // The user already has an in-flight aggregate; add the new clicks to it
        userClicks.clicks += clicks;
    } else {
        // First time we see this user: look up the region and start a new aggregate
        final String region = regionStore.get(user).value();
        userClicks = new UserClicks(user, region, clicks);
    }
    if (userClicks.clicks < CLICKS_THRESHOLD) {
        // Keep accumulating state until the threshold is reached
        tempStore.put(user, userClicks);
    } else {
        // Threshold reached: drop the temporary state for this user
        tempStore.delete(user);
    }
    return KeyValue.pair(user, userClicks);
}
When I remove the KStore from the transformer, everything seems to work fine.
Usually this problem occurs because Kafka can't save its state and keeps re-reading the same batch of messages. The KStore keeps its state in a changelog topic, and it stores that state by producing messages. If the producer can't produce for some reason, a new offset can never be committed.
To resolve the issue, change the minimum number of in-sync replicas to 1 or set the replication factor to 2. By default, Kafka Streams creates its internal topics with a replication factor of 1.
An easy way to configure this is through Conduktor: just go to the topic config and change the min.insync.replicas property.
It can also be done through the Kafka CLI by running this command:
kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name configured-topic --add-config min.insync.replicas=1
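If you prefer to fix it on the Streams application side instead, a minimal sketch of raising the replication factor for the internal (changelog) topics via StreamsConfig could look like this; the application id and broker address are placeholders:

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "clicks-app");        // placeholder application id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
// Internal topics (changelogs, repartition topics) default to replication factor 1;
// raising it lets the state store's changelog survive a single broker failure.
props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 2);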
I need to consume messages from a Kafka topic within a specific time range. It is easy enough to determine the starting offset with the help of partition.seekToTimestamp(startTimeInMillis), as in the code below:
ReceiverOptions<ByteBuffer, ByteBuffer> options =
receiverOptions
.subscription(topicConfig.getTopics())
.pollTimeout(Duration.ofMillis(topicConfig.getPollWaitTimeoutMs()))
.addAssignListener(
partitions -> {
partitions.forEach(partition -> partition.seekToTimestamp(startTimeInMillis));
});
But how do I stop consuming once the messages start exceeding the end timestamp?
I can use the below code:
KafkaReceiver.create(options)
.receive()
.takeWhile(record -> record.timestamp() < endTimeInMillis)
.map(this::handleConsumerRecord);
But the problem is that when the takeWhile condition fails for a message, that only means this particular partition has gone past the end timestamp; other partitions in the topic may still have messages with timestamps below the end timestamp.
How do I ensure that I have consumed all the messages across partitions within the end timestamp?
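One possible approach, sketched under assumptions rather than tested: track per partition when the end timestamp has been passed, and only complete the flux once every assigned partition has passed it. Here finishedPartitions and assignedPartitionCount are assumed helper variables (the count could be captured in the assign listener), and a partition that never receives a record past the end timestamp would keep the flux open:

// Assumed helper: a concurrent set of partitions that have gone past endTimeInMillis
Set<TopicPartition> finishedPartitions = ConcurrentHashMap.newKeySet();

KafkaReceiver.create(options)
    .receive()
    // stop only after every assigned partition has produced a record past the end timestamp
    .takeUntil(record -> {
        if (record.timestamp() >= endTimeInMillis) {
            finishedPartitions.add(new TopicPartition(record.topic(), record.partition()));
        }
        return finishedPartitions.size() == assignedPartitionCount; // assumed count of assigned partitions
    })
    // drop the out-of-range records themselves
    .filter(record -> record.timestamp() < endTimeInMillis)
    .map(this::handleConsumerRecord);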
I am using RabbitMQ to replicate what Jenkins does.
The only issue I am facing is this: let's say there are 10 messages in the queue, and some of them are duplicates sitting in an unacknowledged state.
I need to delete those duplicate messages from the queue; how do I achieve this?
My RabbitMQ configuration is as follows, where each queue has only one consumer, so if I have 10 messages, all of them are processed by the same consumer thread.
// Declare a durable queue named after the repo id and branch, and bind it to a direct exchange
Queue queue = new Queue(sfdcConnectionDetails.getGitRepoId() + "_" + sfdcConnectionDetails.getBranchConnectedTo(), true);
rabbitMqSenderConfig.amqpAdmin().declareQueue(queue);
rabbitMqSenderConfig.amqpAdmin().declareBinding(
        BindingBuilder.bind(queue)
                .to(new DirectExchange(byRepositoryRepositoryId.getRepository().getRepositoryId()))
                .withQueueName());

// Single-threaded listener container: one consumer per queue, so messages are processed sequentially
RabbitMqConsumer container = new RabbitMqConsumer();
container.setConnectionFactory(rabbitMqSenderConfig.connectionFactory());
container.setQueueNames(queue.getName());
container.setConcurrentConsumers(1);
container.setMessageListener(new MessageListenerAdapter(new ConsumerHandler(****, ***), new Jackson2JsonMessageConverter()));
container.startConsumers();
You can use a plugin (e.g. this) for deduplicating messages on the RabbitMQ side.
Alternatively, use a cache on your consumer to detect whether the same message was processed recently.
As already suggested by @ekiryuhin, one approach you could take is to assign a request_id, tag the payload with it before producing the message to RabbitMQ, and cache the request_id on your consumer's end. If the request_id is already present in the cache, ignore the payload and delete it.
This request_id can work as a deduplication id for your payloads.
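A minimal sketch of that consumer-side check, assuming the payload carries a requestId field (MyPayload is a hypothetical type) and using an in-memory set as the cache; a real setup would more likely use a TTL cache or an external store such as Redis:

public class DeduplicatingHandler {

    // Remembers request ids that have already been processed by this consumer
    private final java.util.Set<String> seenRequestIds = java.util.concurrent.ConcurrentHashMap.newKeySet();

    public void handleMessage(MyPayload payload) {
        // add() returns false if the id was already in the set, i.e. the message is a duplicate
        if (!seenRequestIds.add(payload.getRequestId())) {
            return; // drop the duplicate; acknowledging it removes it from the queue
        }
        // ... process the payload normally ...
    }
}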
Given the following scenario:
I bring up ZooKeeper and a single Kafka broker on my local machine and create a "test" topic as described in the Kafka quickstart: https://kafka.apache.org/quickstart
Then I run a simple Java program that produces a message to the "test" topic every second. After some time I bring down my local Kafka broker and see that the producer continues producing messages without throwing any exception. Finally, I bring the Kafka broker up again; the producer reconnects and continues producing messages, but all the messages that were produced during the broker downtime are lost. The producer does not replay them when it detects that the broker is healthy again.
How can I prevent this? I want the Kafka producer to replay those messages when it detects that the Kafka broker is back online. Here is my producer config:
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("linger.ms", 0);
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
The Kafka producer library has a retry mechanism built in, but it is turned off by default. Change the retries producer config to a value bigger than 0 (the default) to turn it on. You should also experiment with retry.backoff.ms and request.timeout.ms in order to tune the producer retries.
Example Kafka Producer config with enabled retries:
retries=2147483647 //Integer.MAX_VALUE
retry.backoff.ms=1000
request.timeout.ms=305000 //5 minutes
max.block.ms=2147483647 //Integer.MAX_VALUE
You can find more information about those properties in Apache Kafka documentation.
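Applied to the producer configuration from the question, that would look roughly like the following sketch; the exact values are only an illustration:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
// Enable retries so the producer re-sends batches after transient broker failures
props.put("retries", Integer.MAX_VALUE);
props.put("retry.backoff.ms", 1000);
props.put("request.timeout.ms", 305000);
props.put("max.block.ms", Integer.MAX_VALUE);
KafkaProducer<String, String> producer = new KafkaProducer<>(props);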
Since you're running just one broker, I'm afraid you won't be able to store messages when your broker is down.
However, it is strange that you don't get any exceptions, warnings, or errors when you bring your broker down.
I would expect a "Failed to update metadata" or "expiring messages" error, because when the producer sends messages to the broker(s) listed in the bootstrap.servers property, it first fetches metadata about the partition leaders. So in your case, since you're running Kafka in stand-alone mode and the broker is down, the producer should not receive the leader information and should error out.
Could you please check what the following properties are set to:
request.timeout.ms
max.block.ms
and play around with these values (reducing them, maybe) and check the results?
One more option you might want to try is sending messages to Kafka in a synchronous fashion (blocking on the send() call until the messages are acknowledged). Here is a code snippet that might help (taken from this documentation reference):
If you want to simulate a simple blocking call you can call the get() method immediately:
byte[] key = "key".getBytes();
byte[] value = "value".getBytes();
ProducerRecord<byte[],byte[]> record = new ProducerRecord<byte[],byte[]>("my-topic", key, value);
producer.send(record).get();
In this case, Kafka should throw an exception if the message is not sent successfully for any reason.
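A small sketch of what handling that looks like, using the same producer and record as above; the blocking get() surfaces send failures as an ExecutionException:

try {
    // Block until the broker acknowledges the record (or the send fails)
    RecordMetadata metadata = producer.send(record).get();
    System.out.printf("Stored at partition %d, offset %d%n", metadata.partition(), metadata.offset());
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
} catch (ExecutionException e) {
    // The send failed (e.g. broker down, timeout); the cause tells you why
    System.err.println("Send failed: " + e.getCause());
}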
I hope this helps.
I have a unique problem which is happening 50-100 times a day, with a message volume of ~2 million per day on the topic. I am using the Kafka producer API 0.8.2.1 and I have 12 brokers (v 0.8.2.2) running in prod with a replication factor of 4. I have a topic with 60 partitions, and I am calculating the partition for all my messages myself and providing the value in the ProducerRecord. Now, the issue:
The application creates the ProducerRecord using
new ProducerRecord<String, String>(topic, 30, null, message1);
providing the topic, the value message1 and partition 30. Then the application calls the send method and a future is returned:
// null is for callback
Future<RecordMetadata> future = producer.send(producerRecord, null);
Now the app prints the offset and partition value by calling get() on the Future and then reading the values from the RecordMetadata. This is what I get:
Kafka Response : partition 30, offset 3416092
Now the app produces the next message, message2, to the same partition:
new ProducerRecord<String, String>(topic, 30, null, message2);
and the Kafka response is:
Kafka Response : partition 30, offset 3416092
I receive the same offset again, and if I pull the message at that offset from partition 30 using a simple consumer, it turns out to be message2, which essentially means I lost message1.
Based on the KafkaProducer documentation, I am using a single producer instance (a shared static instance) among 10 threads:
The producer is thread safe and should generally be shared among all threads for best performance.
I am using all the default properties for the producer (except max.request.size: 10000000); the message (String payload) size can range from a few KB to 500 KB. I am using an acks value of 1.
What am I doing wrong here? Is there something I can look into, or any producer or server property I can tweak, to make sure I don't lose any messages? I need some help here soon, as I am losing critical messages in production, which is not good at all; because there is no exception, it is hard to even find out that a message was lost unless a downstream process reports it.
EDIT:
The servers and client are now updated to Kafka version 0.8.2.2. Also, the 10 app threads now each use their own Kafka producer instance. We are seeing better performance, but there is still message loss.
Producer Properties:
value.serializer: org.apache.kafka.common.serialization.StringSerializer
key.serializer: org.apache.kafka.common.serialization.StringSerializer
bootstrap.servers: {SERVER VIP ENDPOINT}
acks: 1
batch.size: 204800
linger.ms: 10
send.buffer.bytes: 1048576
max.request.size: 10000000