I have a code in my business layer that updates data on database and also in a rest service.
The question is that if it doesn't fail data must be save in both places and, in other hand, if it fails it must to rollback in database and send another requisition to rest api.
So, what I'm looking for is a way to use transaction management of EJB to also orchestrait calls to api. When in commit time, send a set requisition to api and, when in rollback time, send delete requisition to api.
In fact I need to maintain consistency and make both places syncronous.
I have read about UserTransactions and managedbeans but I don't have a clue about what is the best way to do that.
You can use regular distributed transactions, depending on your infrastructure and participants. This might be possible e.g. if all participants are EJBs and the data stores are capable to handle distributed transactions.
This won't work with loosely coupled componentes, and your setup looks like this.
I do not recommend to create your own distributed transaction protocol. Regarding the edge and corner cases, you will probably not end up with consistent data in the end.
I would suggest to think about using event sourcing and eventually consistency for things like that. For example, you could emit an event (command) for writing data. If your "rollback" is needed, you can emit an event (command) to delete the date written before. After all events are processed, the data is consistent.
Some interesting links might be:
Martin Fowler - Event Sourcing
Martin Fowler - CQRS
Apache Kafka
Related
We are currently using axon framework with hikaricp as data source pooling system. We are facing pool exhausting issues from time to time and we have a theory:
To update our read models we use the command bus to send UpdateEntityViewCommand.
As the command bus starts a transaction using the primary transaction manager (the write one) it acquires a connection from the write pool
On the handler, we open an inner transaction using a connection from the read pool, thus blocking the outer one.
This seems to exhaust the pools under some conditions. The question is: should we stop using the command bus to update our read models? Is appropiate to have two buses( one for write and one for read?
Thanks in advance
Axon's intent with the separation of Commands, Events, and Queries, is in general to support CQRS within your system. This means you would introduce dedicated Command Models and Query Models, which respectively only receive Command and Query messages.
The Query Models are typically also referred to as Projections or Read Models.
It is this combination of ideas Axon is normally used for which makes your question a bit vague. Or, perchance, you are using it differently than normally intended.
That Axon supports CQRS through this approach, does not necessitate that you do CQRS within your set up as well, by the way.
At any note, it would be good too (as also stated in the comments):
Include code snippets of the message handlers when these prove useful
Give some deeper insight in your configuration
Whether you aim to use the CQRS pattern in your application, yes or no
Completely separate from this, I can give some guidance on how Axon deals with transactions. Every message inside an Axon application will start a so-called "Unit of Work". It is the UnitOfWork (UoW for short) that will start a transaction. This means that no matter if you use commands, events or queries, Axon will have started that transaction for you already.
Taking this a step further, this also means that whatever you do inside a message handling function (thus the #CommandHandler, #EventHandler and #QueryHandler annotated methods) will always already have an active transaction running. That for example defines that you do not have to include your own transaction management inside a message handling function, with for example the #Transactional annotation.
Concluding, I am guessing you might be mixing some concepts. This can obviously happen, so no worries there; that's what SO is for. Next to this though, you thus do not have to start your own transaction inside any message handling function, as you already have an active transaction at that point in time.
We have a micro-services architecture, with Kafka used as the communication mechanism between the services. Some of the services have their own databases. Say the user makes a call to Service A, which should result in a record (or set of records) being created in that service’s database. Additionally, this event should be reported to other services, as an item on a Kafka topic. What is the best way of ensuring that the database record(s) are only written if the Kafka topic is successfully updated (essentially creating a distributed transaction around the database update and the Kafka update)?
We are thinking of using spring-kafka (in a Spring Boot WebFlux service), and I can see that it has a KafkaTransactionManager, but from what I understand this is more about Kafka transactions themselves (ensuring consistency across the Kafka producers and consumers), rather than synchronising transactions across two systems (see here: “Kafka doesn't support XA and you have to deal with the possibility that the DB tx might commit while the Kafka tx rolls back.”). Additionally, I think this class relies on Spring’s transaction framework which, at least as far as I currently understand, is thread-bound, and won’t work if using a reactive approach (e.g. WebFlux) where different parts of an operation may execute on different threads. (We are using reactive-pg-client, so are manually handling transactions, rather than using Spring’s framework.)
Some options I can think of:
Don’t write the data to the database: only write it to Kafka. Then use a consumer (in Service A) to update the database. This seems like it might not be the most efficient, and will have problems in that the service which the user called cannot immediately see the database changes it should have just created.
Don’t write directly to Kafka: write to the database only, and use something like Debezium to report the change to Kafka. The problem here is that the changes are based on individual database records, whereas the business significant event to store in Kafka might involve a combination of data from multiple tables.
Write to the database first (if that fails, do nothing and just throw the exception). Then, when writing to Kafka, assume that the write might fail. Use the built-in auto-retry functionality to get it to keep trying for a while. If that eventually completely fails, try to write to a dead letter queue and create some sort of manual mechanism for admins to sort it out. And if writing to the DLQ fails (i.e. Kafka is completely down), just log it some other way (e.g. to the database), and again create some sort of manual mechanism for admins to sort it out.
Anyone got any thoughts or advice on the above, or able to correct any mistakes in my assumptions above?
Thanks in advance!
I'd suggest to use a slightly altered variant of approach 2.
Write into your database only, but in addition to the actual table writes, also write "events" into a special table within that same database; these event records would contain the aggregations you need. In the easiest way, you'd simply insert another entity e.g. mapped by JPA, which contains a JSON property with the aggregate payload. Of course this could be automated by some means of transaction listener / framework component.
Then use Debezium to capture the changes just from that table and stream them into Kafka. That way you have both: eventually consistent state in Kafka (the events in Kafka may trail behind or you might see a few events a second time after a restart, but eventually they'll reflect the database state) without the need for distributed transactions, and the business level event semantics you're after.
(Disclaimer: I'm the lead of Debezium; funnily enough I'm just in the process of writing a blog post discussing this approach in more detail)
Here are the posts
https://debezium.io/blog/2018/09/20/materializing-aggregate-views-with-hibernate-and-debezium/
https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
first of all, I have to say that I’m no Kafka, nor a Spring expert but I think that it’s more a conceptual challenge when writing to independent resources and the solution should be adaptable to your technology stack. Furthermore, I should say that this solution tries to solve the problem without an external component like Debezium, because in my opinion each additional component brings challenges in testing, maintaining and running an application which is often underestimated when choosing such an option. Also not every database can be used as a Debezium-source.
To make sure that we are talking about the same goals, let’s clarify the situation in an simplified airline example, where customers can buy tickets. After a successful order the customer will receive a message (mail, push-notification, …) that is sent by an external messaging system (the system we have to talk with).
In a traditional JMS world with an XA transaction between our database (where we store orders) and the JMS provider it would look like the following: The client sets the order to our app where we start a transaction. The app stores the order in its database. Then the message is sent to JMS and you can commit the transaction. Both operations participate at the transaction even when they’re talking to their own resources. As the XA transaction guarantees ACID we’re fine.
Let’s bring Kafka (or any other resource that is not able to participate at the XA transaction) in the game. As there is no coordinator that syncs both transactions anymore the main idea of the following is to split processing in two parts with a persistent state.
When you store the order in your database you can also store the message (with aggregated data) in the same database (e.g. as JSON in a CLOB-column) that you want to send to Kafka afterwards. Same resource – ACID guaranteed, everything fine so far. Now you need a mechanism that polls your “KafkaTasks”-Table for new tasks that should be send to a Kafka-Topic (e.g. with a timer service, maybe #Scheduled annotation can be used in Spring). After the message has been successfully sent to Kafka you can delete the task entry. This ensures that the message to Kafka is only sent when the order is also successfully stored in application database. Did we achieve the same guarantees as we have when using a XA transaction? Unfortunately, no, as there is still the chance that writing to Kafka works but the deletion of the task fails. In this case the retry-mechanism (you would need one as mentioned in your question) would reprocess the task an sends the message twice. If your business case is happy with this “at-least-once”-guarantee you’re done here with a imho semi-complex solution that could be easily implemented as framework functionality so not everyone has to bother with the details.
If you need “exactly-once” then you cannot store your state in the application database (in this case “deletion of a task” is the “state”) but instead you must store it in Kafka (assuming that you have ACID guarantees between two Kafka topics). An example: Let’s say you have 100 tasks in the table (IDs 1 to 100) and the task job processes the first 10. You write your Kafka messages to their topic and another message with the ID 10 to “your topic”. All in the same Kafka-transaction. In the next cycle you consume your topic (value is 10) and take this value to get the next 10 tasks (and delete the already processed tasks).
If there are easier (in-application) solutions with the same guarantees I’m looking forward to hear from you!
Sorry for the long answer but I hope it helps.
All the approach described above are the best way to approach the problem and are well defined pattern. You can explore these in the links provided below.
Pattern: Transactional outbox
Publish an event or message as part of a database transaction by saving it in an OUTBOX in the database.
http://microservices.io/patterns/data/transactional-outbox.html
Pattern: Polling publisher
Publish messages by polling the outbox in the database.
http://microservices.io/patterns/data/polling-publisher.html
Pattern: Transaction log tailing
Publish changes made to the database by tailing the transaction log.
http://microservices.io/patterns/data/transaction-log-tailing.html
Debezium is a valid answer but (as I've experienced) it can require some extra overhead of running an extra pod and making sure that pod doesn't fall over. This could just be me griping about a few back to back instances where pods OOM errored and didn't come back up, networking rule rollouts dropped some messages, WAL access to an aws aurora db started behaving oddly... It seems that everything that could have gone wrong, did. Not saying Debezium is bad, it's fantastically stable, but often for devs running it becomes a networking skill rather than a coding skill.
As a KISS solution using normal coding solutions that will work 99.99% of the time (and inform you of the .01%) would be:
Start Transaction
Sync save to DB
-> If fail, then bail out.
Async send message to kafka.
Block until the topic reports that it has received the
message.
-> if it times out or fails Abort Transaction.
-> if it succeeds Commit Transaction.
I'd suggest to use a new approach 2-phase message. In this new approach, much less codes are needed, and you don't need Debeziums any more.
https://betterprogramming.pub/an-alternative-to-outbox-pattern-7564562843ae
For this new approach, what you need to do is:
When writing your database, write an event record to an auxiliary table.
Submit a 2-phase message to DTM
Write a service to query whether an event is saved in the auxiliary table.
With the help of DTM SDK, you can accomplish the above 3 steps with 8 lines in Go, much less codes than other solutions.
msg := dtmcli.NewMsg(DtmServer, gid).
Add(busi.Busi+"/TransIn", &TransReq{Amount: 30})
err := msg.DoAndSubmitDB(busi.Busi+"/QueryPrepared", db, func(tx *sql.Tx) error {
return AdjustBalance(tx, busi.TransOutUID, -req.Amount)
})
app.GET(BusiAPI+"/QueryPrepared", dtmutil.WrapHandler2(func(c *gin.Context) interface{} {
return MustBarrierFromGin(c).QueryPrepared(db)
}))
Each of your origin options has its disadvantage:
The user cannot immediately see the database changes it have just created.
Debezium will capture the log of the database, which may be much larger than the events you wanted. Also deployment and maintenance of Debezium is not an easy job.
"built-in auto-retry functionality" is not cheap, it may require much codes or maintenance efforts.
I'm currently in a project where JPA and Kafka are used. I'm trying to find a set of good practice for combining those operations.
In the existing code, the producer is used in the same transaction as jpa, however, from what I have read, it seems that they don't share a transaction.
#PostMapping
#Transactional
public XDto createX(#RequestBody XRequest request) {
Xdto dto = xService.create(request);
kafkaProducer.putToQueue(dto, Type.CREATE);
return dto;
}
where the kafka producer is defined as the following:
public class KafkaProducer {
#Autowired
private KafkaTemplate<String, Type> template;
public void putToQueue(Dto dto, Type eventType) {
template.send("event", new Event(dto, eventType));
}
}
Is this a valid use case for combining jpa and kafka, are the transaction boundaries defined correctly?
this would not work as intended when the transaction fails. kafka interaction is not part of transaction.
You may want to have a look at TransactionalEventListener You may want to write the message to kafka on the AFTER_COMMIT event. even then the kafka publish may fail.
Another option is to write to db using jpa as you are doing. Let debezium read the updated data from your database and push it to kafka. The event will be in a different format but far more richer.
By looking at your question, I'm assuming that you are trying to achieve CDC (Change Data Capture) of your OLTP System, i.e. logging every change that is going to the transactional database. There are two ways to approach this.
Application code does dual writes to transactional DB as well as Kafka. It is inconsistent and hampers the performance. Inconsistent, because when you make the dual write to two independent systems, the data gets screwed when either of the writes fails and pushing data to Kafka in transaction flow adds latency, which you don't want to compromise on.
Extract changes from DB commit (either database/application-level triggers or transaction log) and send it to Kafka. It is very consistent and doesn't affect your transaction at all. Consistent because the DB commit logs are the reflections of the DB transactions after successful commits. There are a lot of solutions available which leverage this approach like databus, maxwell, debezium etc.
If CDC is your use case, try using any of the already available solutions.
As others have said, you could use change data capture to safely propagate the changes applied to your database to Apache Kafka. You cannot update the database and Kafka in a single transaction as the latter doesn't support any kind of 2-phase-commit protocol.
You might either CDC the tables themselves, or, if you wish to have some more control about the structure sent towards Kafka, apply the "outbox" pattern. In that case, your application would write to its actual business tables as well as an "outbox" table which contains the messages to send to Kafka. You can find a detailed description of this approach in this blog post.
Disclaimer: I'm the author of this post and the lead of Debezium, one of the CDC solutions mentioned in some of the other answers.
You shouldn't put the sending message to kafka in transaction. If you need the logic when if it fails to send event to kafka, then revert transaction, it will be better to use spring-retry in this case. Just put the code related to the sending event to kafka in #Retryable annotated method, and also add the #Recover annotated method with the logic of reverting changes to DB made before.
Consider Spring MVC java web-application, which provides some REST API.
Let's say it has many methods, one of them is DELETE /api/foo/{id}, which obviously deletes foo entity from the DB with given id.
The problem is that due to big data in the DB, this operation is not immediate, so if client tries perform simultaneously multiply delete operations on same entity, say
DELETE /api/foo/123 x N times (by mistake in client software of course),
it causes some unpleasant side effects in the DB (you know, if you try delete same entity in several transactions, that's not generally nice).
My question is: what is the best practice in Spring MVC to prevent such situations?
I can certainly introduce synchronisation on Foo id in each such update method (PUT/DELETE). I will need to do it for all entities and all PUT/DELETE API methods though, which I really don't want to do. I suppose it should be some elegant and nice solution, how to perform such type of synchronisation on interceptor/servlet level, i.e. not on service of controller level.
I can also create specific interceptor and perform there waiting for duplicated requests (requests with same URL and parameters). But again, it doesn't sound as an elegant solution (until I will be ensured that it is not possible to configure in Spring MVC somehow in more beauty way).
That is a problem of concurrency that shall be handled by using the appropriate transaction and locking level. Unfortunately, there is no single size fits all way here and depending on your actual requirements, you could have to implement optimistic or pessimistic locking, as well as one of the possible transaction level (from no transaction at all to serializable transactions).
In general, handling such questions at the web level is a bad idea, because you will end in questions like what to do in on request wants to delete some data that another one is displaying at the same time? In SpringMVC, the common way is to use transactional methods in the service layer. Additionaly, you should declare an optimistic or pessimistic locking system in the persistence layer.
Optimistic layer normally give a higher throughput, at the cost of some transaction ending in exceptions. In that case, current best practices are now to report the problem to the user asking him/her to send his/her request again.
I wonder if it is possible to add a listener to Cassandra getting the table and the primary key for changed entries? It would be great to have such a mechanism.
Checking Cassandra documentation I only find adding StateListener(s) to the Cluster instance.
Does anyone know how to do this without hacking Cassandras data store or encapsulate the driver and do something on my own?
Check out this future jira --
https://issues.apache.org/jira/browse/CASSANDRA-8844
If you like it vote for it : )
CDC
"In databases, change data capture (CDC) is a set of software design
patterns used to determine (and track) the data that has changed so
that action can be taken using the changed data. Also, Change data
capture (CDC) is an approach to data integration that is based on the
identification, capture and delivery of the changes made to enterprise
data sources."
-Wikipedia
As Cassandra is increasingly being used as the Source of Record (SoR)
for mission critical data in large enterprises, it is increasingly
being called upon to act as the central hub of traffic and data flow
to other systems. In order to try to address the general need, we,
propose implementing a simple data logging mechanism to enable
per-table CDC patterns.
If clients need to know about changes, the world has mostly gone to the message broker model-- a middleman which connects producers and consumers of arbitrary data. You can read about Kafka, RabbitMQ, and NATS here. There is an older DZone article here. In your case, the client writing to the database would also send out a change message. What's nice about this model is you can then pull whatever you need from the database.
Kafka is interesting because it can also store data. In some cases, you might be able to dispose of the database altogether.
Are you looking for something like triggers?
https://github.com/apache/cassandra/tree/trunk/examples/triggers
A database trigger is procedural code that is automatically executed
in response to certain events on a particular table or view in a
database. The trigger is mostly used for maintaining the integrity of
the information on the database. For example, when a new record
(representing a new worker) is added to the employees table, new
records should also be created in the tables of the taxes, vacations
and salaries.