How to handle session timeout while processing Kafka messages? - java

I am processing messages from Kafka in a standard processing loop:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        processMessage(record);
    }
}
What should I do if my Kafka consumer runs into a timeout while processing the records? I mean the timeout controlled by the property session.timeout.ms.
When this happens, my consumer should stop processing the records, because it would lose its partitions and the records that it processes could be already processed by another consumer. If the original consumer writes some processing results into a database, it could overwrite the records produced by the "new" consumer that got the partitions after my original consumer timed out.
I know about the ConsumerRebalanceListener, but from my understanding its method onPartitionsLost would only be called after I call the poll method from the consumer. Therefore this doesn't help me to stop the processing loop of the batch of records that I received from the previous poll.
I would expect that the heartbeat thread could notify me that it was not able to contact the broker and that we have a session timeout in the consumer, but there doesn't seem to be anything like that...
Am I missing something?

Adding this as an answer as it would be too long in a comment.
Kafka has a few ways that can be used to process messages:
At most once;
At least once; and
Exactly once.
You are describing that you would like Kafka to give you exactly-once semantics (which, by the way, is the least common way of using Kafka). The producers also need to play nicely, as by default Kafka can produce the same message more than once.
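On the producer side, a minimal sketch of making the producer "play nicely" by enabling the idempotent producer, so retries do not introduce duplicates (standard Java client; the broker address is just a placeholder):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// Idempotent producer: the broker de-duplicates retried batches from this producer.
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
// acks=all is required when idempotence is enabled.
props.put(ProducerConfig.ACKS_CONFIG, "all");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);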
It's a lot more common to build services that use the at-least-once mechanism: you can receive (or process) the same message more than once, but you need a way to deduplicate them (it's the same idea behind idempotency in HTTP APIs). You'll need something in the message that is unique, and a register of the ids that have already been processed. If the payload has nothing you can use to deduplicate, you can add a header to the message and use that.
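As a rough sketch of that idea, continuing the poll loop from the question (assuming the record key carries the unique id; alreadyProcessed and markProcessed are placeholders for whatever store you use, e.g. a table with a unique constraint):

for (ConsumerRecord<String, String> record : records) {
    String messageId = record.key(); // or read a custom header carrying the unique id
    if (alreadyProcessed(messageId)) {
        continue; // duplicate delivery: skip it
    }
    processMessage(record);
    markProcessed(messageId); // ideally in the same DB transaction as the processing result
}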
This is also useful in the scenario that you have to reset the offset, so the service can go through old messages without breaking.
I would suggest googling a bit for details on how to implement the above.
Here's a blog post from Confluent about developing exactly-once semantics, Improved Robustness and Usability of Exactly-Once Semantics in Apache Kafka, and the Kafka docs explain the different semantics.
About the point of the ConsumerRebalanceListener, you don't need to do anything if you follow the solution of using idempotency in the consumer. Rebalances also happen when an app crashes, and in that scenario the service might have processed some records, but not committed them yet to Kafka.
A mini tip I give to everyone who is starting with Kafka: Kafka looks simple from the outside, but it's a complex technology. Don't use it in production until you know the nitty-gritty details of how it works and have done a good amount of negative testing (unless you are OK with losing data).

Related

Should I ensure that Kafka messages are sent successfully before deleting the data?

I need to read data from the database, send them to Kafka, and then delete those data (which were successfully sent) from the database. I would think to do it straightforwardly:
public void syncData() {
    List<T> data = repository.findAll();
    data.forEach(value -> kafkaTemplate.send(topicName, value));
    repository.deleteAll(data);
}
But I have never worked with Kafka before and I am confused by the kafkaTemplate.send operation. Since the method returns a ListenableFuture, the data.forEach iteration might finish before all the messages are actually sent to the broker. Thus, I might delete the data before they are really sent. What if, for some reason, some messages are not sent? Say I have 10 records, and starting from the 7th the broker goes down.
Will Kafka throw an exception if a message is not sent?
Should I introduce an extra logic to ensure that all messages are sent before going to the next stage of deleting the data?
P.S. I use Kafka with Spring-boot
You should implement a callback that will trigger when the producer either succeeds or fails to write the data to Kafka before deleting it from the DB.
https://docs.spring.io/spring-kafka/docs/1.0.0.M2/reference/html/_reference.html
On top of this, you can set required acks to ALL so that every broker acknowledges the messages before it's considered sent.
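A rough sketch of that callback idea with Spring Kafka 2.x, where send returns a ListenableFuture (the repository.delete call and the logger are just placeholders for your own code):

ListenableFuture<SendResult<String, T>> future = kafkaTemplate.send(topicName, value);
future.addCallback(
        result -> repository.delete(value),                  // broker acknowledged the record, safe to delete
        ex -> log.error("Send failed, keeping the row", ex)  // keep the row and retry later
);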
Also, a little tidbit worth knowing in this context: acks=all does not mean all assigned replicas, it means all in-sync replicas for the partition need to acknowledge the write. So it's important to have a sensible min.insync.replicas setting for this too. If min.insync.replicas = 1, then in a very strict sense acks=all still only guarantees that one broker saw the write. If you then lose that one broker, you lose the write. That's obviously not going to be a common situation, but it's one you should be aware of.
Another option is the outbox pattern (the safe way of doing this).
Some other directions that might be helpful: investigate how the replication factor of a topic relates to the number of brokers, get familiar with the min.insync.replicas broker setting, and then read up on the acks setting of the client (producer) and what it means in terms of communication with the broker. For restarting at the correct data position when something bad happens to your application or database connection, you can get some inspiration from the Kafka Connect library (and maybe use it as a separately deployed DB-polling service).
One strategy would be to keep the Future objects that are returned and monitor them (possibly in a separate thread). Once all of the tasks complete, you can either delete the records that were successfully sent, or write the IDs that need to be deleted to the DB and then have a scheduled task (once per hour or day or whatever period fits you) that deletes them.
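A simpler, synchronous variant of that idea as a sketch, reusing the syncData method from the question (the timeout value is arbitrary; get will throw if the send ultimately fails after retries):

public void syncData() {
    List<T> data = repository.findAll();
    List<T> sent = new ArrayList<>();
    for (T value : data) {
        try {
            // Wait for the broker acknowledgement before considering the row sent.
            kafkaTemplate.send(topicName, value).get(30, TimeUnit.SECONDS);
            sent.add(value);
        } catch (Exception e) {
            // Send failed or timed out: keep the row so it is retried on the next run.
        }
    }
    repository.deleteAll(sent);
}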

Transactions with multiple resources (database and JMS broker)

I have an application where we insert into a database and publish an event to ActiveMQ.
I am facing problems with the transaction. I will explain the issue with the code below:
@Transactional(rollbackFor = Exception.class)
public class ProcessInvoice {

    public boolean insertInvoice(Object obj) {
        /* Some processing logic here */

        /* DB insert */
        insert(obj);

        /* Some processing logic here again */

        /* Send event to Queue 1 */
        sendEvent(obj);
        /* Send event to Queue 2 */
        sendEvent(obj);

        return true;
    }
}
The class is annotated with @Transactional; in the insertInvoice method I do some processing, insert into the DB, and send events to two queues.
With the above code I am facing two problems:
If the queue is slow, I face a performance issue because the process spends time in the sendEvent method.
If for some reason ActiveMQ is down or the consumer is not able to process the message, how do I roll back the transaction?
How do I deal with these issues?
If you need to send your message transactionally (i.e. you need to be sure the broker actually got your message when you send it) and the broker is performing slowly, which is impacting your application, then you only have two choices:
Accept the performance loss in your application.
Improve the broker's performance so that your application performance improves as well. Improving broker performance is a whole other subject.
In JMS (and most other messaging architectures) producers and consumers are unaware of each other by design. Therefore, you will not know if the consumer of the message you send is unable to process the message for any reason, at least not through any automatic JMS mechanism.
When the broker is down the sendEvent method should fail outright. However, I'm not terribly familiar with how Spring handles transactions so I can't say what should happen in that regard.
I have some questions regarding your issue:
If the sendEvent(Object o) method is that expensive (according to what you say) in terms of performance, why do you consider calling it twice (apparently for processing the same object)?
Apparently the result of those 2 calls would be the same, with the difference that they would be sent to 2 different queues. I believe that you could send it to both queues in just one call, in order not to execute the same code twice.
When thinking about transactions, the first thing that comes to mind is synchronous operations. Do you want to perform those operations synchronously or asynchronously? For example, do you want to wait until the invoice is inserted into the DB before sending the message to Queue1 and Queue2?
Maybe you should do it asynchronously. If you don't want to, or cannot, maybe you could opt for an "optimistic" strategy, where you first send the message to Queue1 and Queue2, and afterwards, while those messages are being processed on the broker side, you perform the insertion of the invoice into the DB. If the database has high availability, in most cases the insertion will succeed, so you will not have to wait until it is persisted before sending the messages to Queue1 and 2. In case the insertion does not succeed (which would be very unlikely), you could send a second message to undo those changes on the broker side. If, due to your business logic, this "undo" process is not trivial, this alternative might not suit you.
You mention how to roll back if ActiveMQ is down. Well, in that case maybe you need some monitoring of the queues to find out whether the message reached its destination or not. I would advise you to take a look at advisory messages; they may help you keep track of that and act accordingly.
But maybe what you need could also be re-thought and solved with durable subscribers: that way, once the subscribers are available again, they receive the messages that were enqueued. But this performs slightly worse, since the broker needs to persist the messages to disk so it can recover them afterwards if it goes down.
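For completeness, a sketch of what a durable subscriber looks like in plain JMS 1.1 (the client id, topic name and subscription name are made up for illustration):

connection.setClientID("invoice-service");        // must be set before using durable subscriptions
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
Topic topic = session.createTopic("invoice.events");
TopicSubscriber subscriber = session.createDurableSubscriber(topic, "invoice-subscription");
Message message = subscriber.receive();           // messages published while this subscriber was offline are kept by the broker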
Hope these suggestions help you, but in my opinion you should describe in more detail the result (the flow) you want, since it does not seem very clear (at least to me).

Cancel ActiveMQ message

I'm not sure if ActiveMQ is a right tool here...
I have a task queue and multiple consumers, so my idea was to use ActiveMQ to post tasks, which are then consumed by consumers.
But I need to be able to cancel the task, if it was not processed yet...
Is there an API for removing Message from Queue in ActiveMQ?
Destination destination = session.createQueue(TOPIC_NAME);
MessageProducer producer = session.createProducer(destination);
ObjectMessage message = session.createObjectMessage(jobData);
producer.send(message);
...
producer.cancel(message); (?)
The use-case is that, for any reason, performing the task is no longer needed, and the task is resource-consuming.
What about setting an expiry time on the message?
http://activemq.apache.org/how-do-i-set-the-message-expiration.html
If you want a message to be deleted if it has not been processed / consumed in a particular time frame, then message expiry seems the answer to me.
ActiveMQ exposes a JMX interface that allows for operations of this kind. The MBean that models a Queue (e.g., org.apache.activemq:type=broker,brokerName=amq,destinationType=Queue,destinationName=my_queue) exposes a method removeMessage(String id). There are also methods that remove messages that match a particular pattern.
So far as I know, this functionality is not exposed outside JMX.
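A sketch of what that looks like from Java, assuming the broker's JMX connector is enabled (the JMX URL, broker name, queue name and message id are all placeholders):

import javax.management.MBeanServerInvocationHandler;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import org.apache.activemq.broker.jmx.QueueViewMBean;

// Connect to the broker's JMX endpoint and get a proxy for the queue MBean.
JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
JMXConnector connector = JMXConnectorFactory.connect(url);
ObjectName queueName = new ObjectName(
        "org.apache.activemq:type=Broker,brokerName=localhost,destinationType=Queue,destinationName=my_queue");
QueueViewMBean queueView = MBeanServerInvocationHandler.newProxyInstance(
        connector.getMBeanServerConnection(), queueName, QueueViewMBean.class, true);
boolean removed = queueView.removeMessage("ID:some-message-id"); // JMS message id of the task to cancel
connector.close();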
But...
I have a nasty feeling that JMX operations that work on specific messages only work on messages that are paged into memory. By default that would usually be the 400 messages nearest the head of the queue. I know this is true for selector operations, although I'm not sure about JMX.
Some ActiveMQ message stores (e.g., the JDBC store) might also provide a way to get to the underlying message data and manipulate it. On a relational database this is usually safe to do, because messages that are 'in flight' in a JMS operation will be locked at the database level. However, this is a lot of hassle for what ought to be a simple job.
I wonder if JMS is really the right technology for this job? It isn't really intended for random access. Perhaps some sort of distributed data cache would work better (jgroups, Hazelcast,...)?
For those who are looking for a direct answer, there's a JMS API to control this behaviour:
Per JMS API docs:
setTimeToLive(long timeToLive)
Specifies the time to live of messages that are sent using this JMSProducer.
So you can set this value on the producer before sending:
...
producer.setTimeToLive(30000L);
producer.send(message);
With this particular setting, messages will be retained for 30 seconds before being deleted by the message broker.

When a consumer gets a message from a channel in RabbitMQ, where do the prefetched messages reside?

I have the below configuration for RabbitMQ:
prefetchCount: 1
ack-mode: auto
I have one exchange, one queue attached to that exchange, and one consumer attached to that queue. As per my understanding, the below steps happen if the queue has multiple messages.
The queue writes data on a channel.
As ack-mode is auto, as soon as the queue writes a message on the channel, the message is removed from the queue.
The message comes to the consumer, and the consumer starts working on that data.
As the queue has got the acknowledgement for the previous message, the queue writes the next data on the channel.
Now, my doubt is: suppose the consumer is not finished with the previous data yet. What will happen to the next data the queue has written on the channel?
Also, suppose prefetchCount is 10 and I have just one consumer attached to the queue; where will these 10 messages reside?
The scenario you have described is one that is mentioned in the documentation for RabbitMQ, and elaborated in this blog post. Specifically, if you set a sufficiently large prefetch count, and have a relatively small publish rate, your RabbitMQ server turns into a fancy network switch. When acknowledgement mode is set to automatic, prefetch limiting is effectively disabled, as there are never unacknowledged messages. With automatic acknowledgement, the message is acknowledged as soon as it is delivered. This is the same as having an arbitrarily large prefetch count.
With prefetch >1, the messages are stored within a buffer in the client library. The exact data structure will depend upon the client library used, but to my knowledge, all implementations store the messages in RAM. Further, with automatic acknowledgements, you have no way of knowing when a specific consumer actually read and processed a message.
So, there are a few takeaways here:
Prefetch limit is irrelevant with automatic acknowledgements, as there are never any unacknowledged messages, thus
Automatic acknowledgements don't make much sense when using a consumer
Sufficiently-large prefetch when auto-ack is off, or any use of autoack = on will result in the message broker not doing any queuing, and instead doing routing only.
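For contrast, a small sketch of what prefetch-limited consumption with manual acks looks like with the Java client (the queue name and the process method are placeholders):

Channel channel = connection.createChannel();
channel.basicQos(10);   // at most 10 unacknowledged messages in flight to this consumer
channel.basicConsume("task-queue", false, new DefaultConsumer(channel) {
    @Override
    public void handleDelivery(String consumerTag, Envelope envelope,
                               AMQP.BasicProperties properties, byte[] body) throws IOException {
        process(body);                                      // your processing logic
        channel.basicAck(envelope.getDeliveryTag(), false); // the ack frees one prefetch slot
    }
});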
Now, here's a little bit of expert opinion. I find the whole notion of a message broker that "pushes" messages out to be a little backwards, and for this very reason- it's difficult to configure properly, and it is unclear what the benefit is. A queue system is a natural fit for a pull-based system. The processor can ask the broker for the next message when it is done processing the current message. This approach will ensure that load is balanced naturally and the messages don't get lost when processors disconnect or get knocked out.
Therefore, my recommendation is to drop the use of consumers altogether and switch over to using basic.get.
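A sketch of what that pull-based approach looks like with the Java client (again, the queue name and the process method are placeholders):

GetResponse response = channel.basicGet("task-queue", false);   // false = no auto-ack
if (response != null) {
    process(response.getBody());                                 // your processing logic
    channel.basicAck(response.getEnvelope().getDeliveryTag(), false);
}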

Best Practice for resilience of messages across RabbitMQ queues

I am trying to understand the best use of RabbitMQ to satisfy the following problem.
As context, I'm not concerned with performance in this use case (my peak TPS for this flow is 2 TPS), but I am concerned about resilience.
I have RabbitMQ installed in a cluster and, ignoring dead letter queues, the basic flow is: a service receives a request and creates a persistent message, which it queues, in a transaction, to a durable queue (at this point I'm happy the request is secured to disk). I then have another process listening for the message, which it reads (not using auto-ack), does a bunch of stuff with, and writes a new message to a different exchange's queue in a transaction (again, I'm now happy this message is secured to disk). Assuming the transaction completes successfully, it manually acks the message back to the original queue.
At this point my only failure scenario is if I have a failure between the commit of the transaction that writes to my second queue and the return of the ack. This will lead to a message potentially being processed twice. Is there anything else I can do to plug this gap, or do I have to figure out a way of handling duplicate messages?
As a final bit of context, the services are written in Java, so I am using the Java client libs.
Paul Fitz.
First of all, I suggest you look at this guide here, which has a lot of valid information on your topic.
From the RabbitMQ guide:
At the Producer
When using confirms, producers recovering from a channel or connection
failure should retransmit any messages for which an acknowledgement
has not been received from the broker. There is a possibility of
message duplication here, because the broker might have sent a
confirmation that never reached the producer (due to network failures,
etc). Therefore consumer applications will need to perform
deduplication or handle incoming messages in an idempotent manner.
At the Consumer
In the event of network failure (or a node crashing), messages can be
duplicated, and consumers must be prepared to handle them. If
possible, the simplest way to handle this is to ensure that your
consumers handle messages in an idempotent way rather than explicitly
deal with deduplication.
So, the point is that it is not possible in any way at all to guarantee that this "failure" scenario of yours will not happen. You will always have to deal with network failures, disk failures, insert-something-here failures, etc.
What you have to do here is lean on the messaging architecture and, if possible, implement "idempotency" for your messages (which means that even if you process a message twice, nothing wrong will happen; check this).
If you can't, then you should keep some kind of "processed messages" list (for example, you can put a guid inside every message) and check this list every time you receive a message; if the message is already there, you can simply discard the duplicate.
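A rough sketch of that check with the Java client (assuming the producer puts the guid into the messageId property, and processedIds stands in for whatever durable store you keep; this would live inside a DefaultConsumer subclass):

@Override
public void handleDelivery(String consumerTag, Envelope envelope,
                           AMQP.BasicProperties properties, byte[] body) throws IOException {
    String messageId = properties.getMessageId();  // guid set by the producer
    if (!processedIds.contains(messageId)) {
        process(body);                             // your business logic
        processedIds.add(messageId);               // ideally persisted atomically with the processing result
    }
    getChannel().basicAck(envelope.getDeliveryTag(), false); // ack either way; duplicates are just discarded
}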
To be more "theoretical", this post from Brave New Geek is very interesting:
Within the context of a distributed system, you cannot have
exactly-once message delivery.
Hope it helps :)
