Kafka transactions with exception - java

I have a situation in which Producer A writes to topics A, B and C, but the listener for topic C throws an exception. All writes are part of a transaction. I want to know if there is a way that all writes can be rolled back automatically, as if there were no commits in the first place?

I don't think this can be achieved in Kafka out of the box, and I would suggest re-thinking the design, since Kafka (or any messaging system) is not the best match for this requirement. Kafka consumers are meant to be independent pieces of business logic, like micro-services: even if one fails, it should not affect the others. If the coupling really is that critical, you may consider a single topic (or web service) carrying all the required info in one message (or request), and make that one client transactional. If it is non-critical (the failure of one topic's client does not affect the functionality of another), then introduce some audit/alerting mechanism on top of the clients to make sure they come back online.
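For context, what Kafka transactions do give you is atomicity on the producer side: either all writes in the transaction become visible to read_committed consumers, or none do. A minimal sketch of that (the broker address, topic names and transactional.id are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("transactional.id", "producer-a-tx"); // enables transactions (and idempotence)

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("topicA", "key", "payload"));
    producer.send(new ProducerRecord<>("topicB", "key", "payload"));
    producer.send(new ProducerRecord<>("topicC", "key", "payload"));
    producer.commitTransaction(); // all three writes become visible together
} catch (Exception e) {
    producer.abortTransaction(); // none of the writes become visible
    throw e;
}

The transaction ends at commitTransaction, though: an exception thrown later by the listener on topic C cannot undo writes that were already committed, which is why the consumer-side design matters.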

Related

How to handle session timeout while processing Kafka messages?

I am processing messages from Kafka in a standard processing loop:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        processMessage(record);
    }
}
What should I do if my Kafka consumer hits a timeout while processing the records? I mean the timeout controlled by the property session.timeout.ms.
When this happens, my consumer should stop processing the records, because it would lose its partitions and the records that it processes could be already processed by another consumer. If the original consumer writes some processing results into a database, it could overwrite the records produced by the "new" consumer that got the partitions after my original consumer timed out.
I know about the ConsumerRebalanceListener, but from my understanding its method onPartitionsLost would only be called after I call the poll method from the consumer. Therefore this doesn't help me to stop the processing loop of the batch of records that I received from the previous poll.
I would expect that the heartbeat thread could notify me that it was not able to contact the broker and that we have a session timeout in the consumer, but there doesn't seem to be anything like that...
Am I missing something?
Adding this as an answer as it would be too long in a comment.
Kafka offers a few semantics that can be used to process messages:
At most once;
At least once; and
Exactly once.
You are describing that you would like exactly-once semantics from Kafka (which, by the way, is the least common way of using Kafka). Producers also need to play nicely here, as by default a Kafka producer can write the same message more than once (for example, on retries).
It's a lot more common to build services using the at-least-once mechanism: you may receive (or process) the same message more than once, but you need a way to deduplicate (it's the same idea behind idempotency in HTTP APIs). You'll need something unique in the message, and a register of the IDs that have already been processed. If there is nothing in the payload you can deduplicate on, you can add a header to the message and use that.
This is also useful in the scenario where you have to reset the offset, so the service can go through old messages without breaking.
I would suggest googling a bit for details on how to implement the above.
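As a rough illustration of the deduplication idea: the "message-id" header name below is made up, and the in-memory set stands in for what would normally be a durable store (a database table, a cache) shared by your consumers.

Set<String> processedIds = new HashSet<>(); // stand-in for a durable store of processed IDs
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        // "message-id" is a hypothetical header the producer would have to set
        Header idHeader = record.headers().lastHeader("message-id");
        String id = new String(idHeader.value(), StandardCharsets.UTF_8);
        if (processedIds.add(id)) { // add() returns false if the ID was already seen
            processMessage(record);
        }
    }
}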
Here's a blog post from Confluent about developing exactly-once semantics, Improved Robustness and Usability of Exactly-Once Semantics in Apache Kafka, and the Kafka docs explain the different semantics.
About the point of the ConsumerRebalanceListener: you don't need to do anything there if you follow the idempotent-consumer approach. Rebalances also happen when an app crashes, and in that scenario the service might have processed some records but not yet committed their offsets to Kafka.
A mini tip I give to everyone who is starting with Kafka: Kafka looks simple from the outside, but it's a complex technology. Don't use it in production until you know the nitty-gritty details of how it works, including having done a good amount of negative testing (unless you are OK with losing data).

Transactions with multiple resources (database and JMS broker)

I have an application where we insert into a database and publish an event to ActiveMQ.
I am facing problems with the transaction. I will explain the issue with the code below:
@Transactional(rollbackFor = Exception.class)
public class ProcessInvoice {

    public boolean insertInvoice(Object obj) {
        /* Some processing logic here */

        /* DB insert */
        insert(obj);

        /* Some processing logic here again */

        /* Send event to Queue 1 */
        sendEvent(obj);
        /* Send event to Queue 2 */
        sendEvent(obj);

        return true;
    }
}
The class is annotated with @Transactional; in the insertInvoice method I am doing some processing, inserting into the DB, and sending events to two queues.
With the above code I am facing two problems:
If the queue is slow, I am facing a performance problem, as the process spends time in the sendEvent method.
If for some reason ActiveMQ is down or a consumer is not able to process the message, how do I roll back the transaction?
How do I deal with these issues?
If you need to send your message transactionally (i.e. you need to be sure the broker actually got your message when you send it) and the broker is performing slowly which is impacting your application then you only have two choices:
Accept the performance loss in your application.
Improve the broker's performance so that your application performance improves as well. Improving broker performance is a whole other subject.
In JMS (and most other messaging architectures) producers and consumers are unaware of each other by design. Therefore, you will not know if the consumer of the message you send is unable to process the message for any reason, at least not through any automatic JMS mechanism.
When the broker is down the sendEvent method should fail outright. However, I'm not terribly familiar with how Spring handles transactions so I can't say what should happen in that regard.
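On the Spring side, one common (non-XA, "best effort") arrangement is a transacted JmsTemplate, so the JMS session commit is synchronized with the surrounding @Transactional database transaction; a send against a broker that is down then throws, and the DB insert rolls back with it. A minimal sketch, assuming a standard ActiveMQ ConnectionFactory bean:

import javax.jms.ConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.core.JmsTemplate;

@Configuration
public class MessagingConfig {

    @Bean
    public JmsTemplate jmsTemplate(ConnectionFactory connectionFactory) {
        JmsTemplate template = new JmsTemplate(connectionFactory);
        // Buffer sends in a transacted JMS session that commits or rolls back
        // together with the surrounding Spring-managed transaction.
        template.setSessionTransacted(true);
        return template;
    }
}

Note that this still leaves a small window in which the DB commit succeeds but the JMS commit fails; if that is not acceptable, you need a JTA/XA transaction manager spanning both resources.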
I have some questions regarding your issue:
If the sendEvent(Object o) method is as expensive as you say in terms of performance, why call it twice, apparently for the same object?
Apparently the result of those two calls would be the same, the only difference being that they are sent to two different queues. I believe you could send to both queues in just one call, so as not to execute the same code twice.
When thinking about transactions, the first things that come to mind are synchronous operations. Do you want to perform those operations synchronously or asynchronously? For example, do you want to wait until the invoice is inserted in the DB before sending the message to Queue1 and Queue2?
Maybe you should do it asynchronously. If you don't, or cannot, you could opt for an "optimistic" strategy: send the message to Queue1 and Queue2 first, and while those messages are being processed on the broker side, perform the insertion of the invoice into the DB. If the database has high availability, the insertion will succeed in most cases, so you will not have to wait until it is persisted before sending the messages to Queue1 and Queue2. In the very unlikely case that the insertion does not succeed, you could send a second message to undo those changes on the broker side. If, due to your business logic, this "undo" process is not trivial, this alternative might not suit you.
You mention rolling back if ActiveMQ is down. In that case you may need some monitoring of the queues to find out whether the message reached its destination. I would advise you to take a look at advisory messages; they may help you track that and act accordingly.
Maybe what you need could also be re-thought and solved with durable subscribers: once the subscribers are available again, they receive the messages that were enqueued in the meantime. But this performs slightly worse, since the broker needs to persist the messages to files in order to recover them if it goes down.
Hope these suggestions help, but in my opinion you should describe in more detail the result you want (the flow), since it does not seem very clear (at least to me).

Verify existence of message in a Kafka topic

I wish to avoid sending duplicate messages to a Kafka topic.
What is the ideal way to achieve it?
Using the Java client for Apache Kafka, is there any way to verify whether a message exists before invoking KafkaProducer.send?
I am referring to this doc
Currently (Kafka 0.10.1), there is no way to get exactly-once delivery on write with Kafka. No matter what workaround you try, there will always be a gap, and you can end up with either lost messages or duplicates.
However, Kafka will add an idempotent producer (planned for 0.10.2) that will allow you to avoid duplicate writes. The target date for the 0.10.2 release is early 2017.
It is impractical to check whether the same message has been delivered every time you send a new one. Think of it another way: you can invoke the KafkaProducer.send method with a callback that notifies you of success or failure.
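For example (the topic name and console logging here are just placeholders):

ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key", "value");
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace(); // the send failed; decide whether to retry
    } else {
        System.out.printf("stored at %s-%d@%d%n",
                metadata.topic(), metadata.partition(), metadata.offset());
    }
});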
That's pretty much out of scope for Kafka. You need to do that using a different storage that provides proper indexing for random access.
Depending on your needs, that can be (distributed) cache, a key-value store or whatever.
You'll probably want to do that on the consumer side rather than the producer side, as different consumers may use different strategies for de-duplication (and some consumers may simply tolerate duplicates).

One upstream event with multiple items, how to set up spring integration pipeline transaction-wise

Let's imagine a situation where you have an incoming upstream message which contains multiple items. Each item contains information which participates in the business logic implemented as part of the pipeline.
Difficulties I can see:
The message has to be split and converted into multiple internal events; those are processed further, and if one of them fails, then all internal events should be rolled back
If we had one upstream message = 1 item, it would be much easier
How should one cater for such a situation from an architecture point of view?
What is the best pattern to employ here?
How should one set up transactions?
Thanks!
It looks like your question isn't clear, and the word transaction is being used for different subjects...
Anyway let me guess what you want.
If you are going (and able) to roll back parts of the business request, you should just ensure a global XA transaction for all of them and execute all the split sub-tasks in the same thread, because only that lets you keep track of the transaction and roll it back afterwards if needed.
If you can't deal with XA and a single thread, then you should take a look at solutions like compensating transactions or acknowledgement with claim checks.
But that is already outside the scope of Spring Integration.
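As a plain-Java sketch of the first suggestion (all split sub-tasks executed in the calling thread, under one transaction), using Spring's TransactionTemplate; UpstreamMessage, Item and processItem are placeholders for your own types:

import org.springframework.transaction.support.TransactionTemplate;

public void handle(UpstreamMessage message, TransactionTemplate transactionTemplate) {
    transactionTemplate.execute(status -> {
        for (Item item : message.getItems()) {
            // An exception from any item rolls back the work done for ALL items,
            // because everything runs in the same thread and the same transaction.
            processItem(item);
        }
        return null;
    });
}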

Demultiplexing messages from a queue to process in parallel streams using amqp?

I am trying to figure out if I can switch from a blocking scenario to a more reactive pattern.
I have incoming update commands arriving in a queue, and I need to handle them in order, but only those regarding the same entity. In essence, I can create as many parallel streams of update events as I wish, as long as no two streams contain events regarding the same entity.
I was thinking that the consumer of the primary queue could possibly leverage AMQP's routing mechanisms and temporary queues, by creating a temporary queue for each entity id and hooking a consumer to it. Once the subscriber is finished and no other events regarding the entity in question are currently in the queue, the queue could be disposed of.
Is this scenario something that is used regularly? Is there a better way to achieve this? In our current system we use a named lock based on the id to prevent concurrent updates.
There are at least two options:
A single queue for each entity, with n consumers on each entity queue.
One queue with messages for all entities, where each message carries data identifying which entity it is for. You could then split this up into several queues (one AMQP queue per entity type) or use a BlockingQueue implementation.
Benefits of splitting the entities up into AMQP queues:
You could create an HA setup with RabbitMQ
You could route messages
You could have more than one consumer of an entity queue if it becomes necessary someday (scalability)
Messages can be persistent and therefore recoverable after an application crash
Benefits of using an internal BlockingQueue implementation:
It is faster (no network I/O, obviously)
Everything has to happen in one JVM
Anyway, it depends on what you want, since both approaches have their benefits.
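If you go the in-JVM route, a simple way to get "parallel across entities, ordered within an entity" without creating a temporary queue per entity id is to hash the entity id onto a fixed pool of single-threaded executors. A minimal sketch:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EntityDispatcher {

    private final ExecutorService[] workers;

    public EntityDispatcher(int shards) {
        workers = new ExecutorService[shards];
        for (int i = 0; i < shards; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Updates for the same entity id always land on the same single-threaded
    // executor, so they run in arrival order; different entities run in parallel.
    public void dispatch(String entityId, Runnable update) {
        int shard = Math.floorMod(entityId.hashCode(), workers.length);
        workers[shard].execute(update);
    }
}

The trade-off is the one described above: the ordering state lives in one JVM, so a crash loses in-flight work unless the messages are also persisted on the broker.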
UPDATE:
I am not sure if I understood you correctly, but let me give you some resources to try things out.
There are special RabbitMQ extensions; maybe some of them can give you an idea. Take a look at alternate exchanges and exchange-to-exchange bindings.
Also, for basic testing: I am not sure whether it covers all RabbitMQ features, or all AMQP features at all, but this can sometimes be useful. Keep in mind that the routing key in this visualization is the producer name; you can also find some examples there, and you can import and export your configuration.
