I have an SQL Server instance and I would like to get a callback each time new data is added or updated in some tables.
My initial thoughts are:
I would like to avoid solutions like log shipping, linked servers, transactional replication, change tracking, change data capture, and Integration Services, because I want to be as non-invasive to the server as possible.
Create a trigger that somehow sends callback requests to my Java service.
Can I achieve such a thing using a simple trigger like this:
CREATE TRIGGER my_trigger ON my_table
AFTER INSERT, UPDATE
AS
BEGIN
-- call something that notifies my Java service
END
Related
We have a micro-services architecture, with Kafka used as the communication mechanism between the services. Some of the services have their own databases. Say the user makes a call to Service A, which should result in a record (or set of records) being created in that service’s database. Additionally, this event should be reported to other services, as an item on a Kafka topic. What is the best way of ensuring that the database record(s) are only written if the Kafka topic is successfully updated (essentially creating a distributed transaction around the database update and the Kafka update)?
We are thinking of using spring-kafka (in a Spring Boot WebFlux service), and I can see that it has a KafkaTransactionManager, but from what I understand this is more about Kafka transactions themselves (ensuring consistency across the Kafka producers and consumers), rather than synchronising transactions across two systems (see here: “Kafka doesn't support XA and you have to deal with the possibility that the DB tx might commit while the Kafka tx rolls back.”). Additionally, I think this class relies on Spring’s transaction framework which, at least as far as I currently understand, is thread-bound, and won’t work if using a reactive approach (e.g. WebFlux) where different parts of an operation may execute on different threads. (We are using reactive-pg-client, so are manually handling transactions, rather than using Spring’s framework.)
Some options I can think of:
Don’t write the data to the database: only write it to Kafka. Then use a consumer (in Service A) to update the database. This seems like it might not be the most efficient, and will have problems in that the service which the user called cannot immediately see the database changes it should have just created.
Don’t write directly to Kafka: write to the database only, and use something like Debezium to report the change to Kafka. The problem here is that the changes are based on individual database records, whereas the business significant event to store in Kafka might involve a combination of data from multiple tables.
Write to the database first (if that fails, do nothing and just throw the exception). Then, when writing to Kafka, assume that the write might fail. Use the built-in auto-retry functionality to get it to keep trying for a while. If that eventually completely fails, try to write to a dead letter queue and create some sort of manual mechanism for admins to sort it out. And if writing to the DLQ fails (i.e. Kafka is completely down), just log it some other way (e.g. to the database), and again create some sort of manual mechanism for admins to sort it out.
Anyone got any thoughts or advice on the above, or able to correct any mistakes in my assumptions above?
Thanks in advance!
I'd suggest using a slightly altered variant of approach 2.
Write into your database only, but in addition to the actual table writes, also write "events" into a special table within that same database; these event records would contain the aggregations you need. In the simplest case, you'd simply insert another entity, e.g. mapped by JPA, which contains a JSON property with the aggregate payload. Of course this could be automated by some means of a transaction listener / framework component.
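For illustration, a minimal sketch of such an event entity (the class, table and column names are made up) that would be persisted in the same transaction as the actual business entities:

import java.time.LocalDateTime;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Lob;
import javax.persistence.Table;

// Outbox/"event" record written alongside the business tables in the same transaction.
@Entity
@Table(name = "outbox_events")
public class OutboxEvent {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String aggregateType;   // e.g. "Order"
    private String aggregateId;     // id of the business entity the event refers to
    private String eventType;       // e.g. "OrderCreated"

    @Lob
    private String payload;         // the aggregated data, serialized as JSON

    private LocalDateTime createdAt = LocalDateTime.now();

    // getters and setters omitted for brevity
}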
Then use Debezium to capture the changes just from that table and stream them into Kafka. That way you have both: eventually consistent state in Kafka (the events in Kafka may trail behind or you might see a few events a second time after a restart, but eventually they'll reflect the database state) without the need for distributed transactions, and the business level event semantics you're after.
(Disclaimer: I'm the lead of Debezium; funnily enough I'm just in the process of writing a blog post discussing this approach in more detail)
Here are the posts
https://debezium.io/blog/2018/09/20/materializing-aggregate-views-with-hibernate-and-debezium/
https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
First of all, I have to say that I'm neither a Kafka nor a Spring expert, but I think this is more of a conceptual challenge when writing to independent resources, and the solution should be adaptable to your technology stack. Furthermore, I should say that this solution tries to solve the problem without an external component like Debezium, because in my opinion each additional component brings challenges in testing, maintaining and running an application, which is often underestimated when choosing such an option. Also, not every database can be used as a Debezium source.
To make sure that we are talking about the same goals, let's clarify the situation with a simplified airline example, where customers can buy tickets. After a successful order the customer will receive a message (mail, push notification, …) that is sent by an external messaging system (the system we have to talk to).
In a traditional JMS world with an XA transaction between our database (where we store orders) and the JMS provider, it would look like the following: the client sends the order to our app, where we start a transaction. The app stores the order in its database. Then the message is sent to JMS and we can commit the transaction. Both operations participate in the transaction even though they talk to their own resources. As the XA transaction guarantees ACID, we're fine.
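For reference, a rough sketch of that traditional setup (assuming container-managed transactions in an EJB container with an XA-capable datasource and JMS connection factory; the queue name is a placeholder and the order entity is whatever JPA entity you use):

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.inject.Inject;
import javax.jms.JMSContext;
import javax.jms.Queue;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class OrderService {

    @PersistenceContext
    private EntityManager em;

    @Inject
    private JMSContext jms;

    @Resource(lookup = "jms/notificationQueue")
    private Queue notificationQueue;

    // Both operations enlist in the same container-managed JTA/XA transaction:
    // they either commit together or roll back together.
    public void placeOrder(Object orderEntity) {
        em.persist(orderEntity);
        jms.createProducer().send(notificationQueue, "order confirmed");
    }
}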
Let's bring Kafka (or any other resource that is not able to participate in the XA transaction) into the game. As there is no coordinator that syncs both transactions any more, the main idea of the following is to split the processing into two parts with a persistent state.
When you store the order in your database, you can also store the message (with the aggregated data) that you want to send to Kafka afterwards in the same database (e.g. as JSON in a CLOB column). Same resource – ACID guaranteed, everything fine so far. Now you need a mechanism that polls your "KafkaTasks" table for new tasks that should be sent to a Kafka topic (e.g. with a timer service; maybe the @Scheduled annotation can be used in Spring). After the message has been successfully sent to Kafka, you can delete the task entry. This ensures that the message to Kafka is only sent when the order is also successfully stored in the application database.

Did we achieve the same guarantees as we have when using an XA transaction? Unfortunately no, as there is still the chance that writing to Kafka works but the deletion of the task fails. In this case the retry mechanism (you would need one, as mentioned in your question) would reprocess the task and send the message twice. If your business case is happy with this "at-least-once" guarantee, you're done here with an IMHO semi-complex solution that could easily be implemented as framework functionality, so not everyone has to bother with the details.
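A minimal sketch of such a polling mechanism using Spring's @Scheduled together with JdbcTemplate and KafkaTemplate (the kafka_tasks table, the topic name and the delay are assumptions, and @EnableScheduling must be configured somewhere):

import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class KafkaTaskRelay {

    private final JdbcTemplate jdbc;
    private final KafkaTemplate<String, String> kafka;

    public KafkaTaskRelay(JdbcTemplate jdbc, KafkaTemplate<String, String> kafka) {
        this.jdbc = jdbc;
        this.kafka = kafka;
    }

    @Scheduled(fixedDelay = 1000)
    public void relayPendingTasks() {
        List<Map<String, Object>> tasks =
                jdbc.queryForList("SELECT id, payload FROM kafka_tasks ORDER BY id LIMIT 10");
        for (Map<String, Object> task : tasks) {
            try {
                // block until the broker has acknowledged the record
                kafka.send("order-events", (String) task.get("payload")).get();
                // delete only after a successful send; if this delete fails, the task
                // is picked up and sent again on the next run (at-least-once)
                jdbc.update("DELETE FROM kafka_tasks WHERE id = ?", task.get("id"));
            } catch (Exception e) {
                return; // stop this run; remaining tasks are retried on the next schedule
            }
        }
    }
}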
If you need "exactly-once", then you cannot store your state in the application database (in this case the "deletion of a task" is the state), but instead you must store it in Kafka (assuming that you have ACID guarantees between two Kafka topics). An example: let's say you have 100 tasks in the table (IDs 1 to 100) and the task job processes the first 10. You write your Kafka messages to their topic and another message with the ID 10 to "your topic", all in the same Kafka transaction. In the next cycle you consume your topic (value is 10) and take this value to get the next 10 tasks (and delete the already processed tasks).
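A rough sketch of the transactional write for that variant (producer configuration and topic names are assumptions; reading the last processed ID back from "your topic" and loading the next batch of tasks are omitted):

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ExactlyOnceTaskRelay {

    public static void publishBatch(List<String> taskPayloads, long lastProcessedId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("transactional.id", "task-relay-1"); // enables Kafka transactions

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                for (String payload : taskPayloads) {
                    producer.send(new ProducerRecord<>("order-events", payload));
                }
                // progress marker written atomically with the business messages
                producer.send(new ProducerRecord<>("task-progress", Long.toString(lastProcessedId)));
                producer.commitTransaction();
            } catch (RuntimeException e) {
                producer.abortTransaction();
                throw e;
            }
        }
    }
}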
If there are easier (in-application) solutions with the same guarantees, I'm looking forward to hearing about them!
Sorry for the long answer but I hope it helps.
All the approaches described above are sound ways to tackle the problem and correspond to well-defined patterns. You can explore these in the links provided below.
Pattern: Transactional outbox
Publish an event or message as part of a database transaction by saving it in an OUTBOX in the database.
http://microservices.io/patterns/data/transactional-outbox.html
Pattern: Polling publisher
Publish messages by polling the outbox in the database.
http://microservices.io/patterns/data/polling-publisher.html
Pattern: Transaction log tailing
Publish changes made to the database by tailing the transaction log.
http://microservices.io/patterns/data/transaction-log-tailing.html
Debezium is a valid answer but (as I've experienced) it can require the extra overhead of running an extra pod and making sure that pod doesn't fall over. This could just be me griping about a few back-to-back instances where pods OOM-errored and didn't come back up, networking rule rollouts dropped some messages, WAL access to an AWS Aurora DB started behaving oddly... It seems that everything that could have gone wrong, did. Not saying Debezium is bad; it's fantastically stable, but often for devs running it becomes a networking skill rather than a coding skill.
A KISS solution using normal coding approaches that will work 99.99% of the time (and inform you of the 0.01%) would be:
Start transaction.
Sync save to DB.
  -> If it fails, bail out.
Async send message to Kafka.
Block until the topic reports that it has received the message.
  -> If it times out or fails, abort the transaction.
  -> If it succeeds, commit the transaction.
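A rough Spring-based sketch of those steps (the table, topic and timeout are assumptions; note that there is still a tiny window where Kafka has accepted the message but the final database commit fails, which is the 0.01% mentioned above):

import java.util.concurrent.TimeUnit;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderWriter {

    private final JdbcTemplate jdbc;
    private final KafkaTemplate<String, String> kafka;

    public OrderWriter(JdbcTemplate jdbc, KafkaTemplate<String, String> kafka) {
        this.jdbc = jdbc;
        this.kafka = kafka;
    }

    @Transactional(rollbackFor = Exception.class)
    public void createOrder(String orderId, String orderJson) throws Exception {
        // 1. sync save to DB (rolled back automatically if anything below throws)
        jdbc.update("INSERT INTO orders (id, payload) VALUES (?, ?)", orderId, orderJson);
        // 2. send to Kafka and block until the broker acknowledges, or fail/time out
        kafka.send("orders", orderId, orderJson).get(10, TimeUnit.SECONDS);
        // 3. returning normally commits the database transaction
    }
}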
I'd suggest using a new approach: 2-phase messages. With this approach much less code is needed, and you don't need Debezium any more.
https://betterprogramming.pub/an-alternative-to-outbox-pattern-7564562843ae
For this new approach, what you need to do is:
When writing your database, write an event record to an auxiliary table.
Submit a 2-phase message to DTM
Write a service to query whether an event is saved in the auxiliary table.
With the help of the DTM SDK, you can accomplish the above 3 steps with 8 lines of Go, much less code than in other solutions.
// Build a 2-phase message that delivers the TransIn request once the local transaction commits.
msg := dtmcli.NewMsg(DtmServer, gid).
    Add(busi.Busi+"/TransIn", &TransReq{Amount: 30})

// Run the local database transaction and submit the message to DTM together.
err := msg.DoAndSubmitDB(busi.Busi+"/QueryPrepared", db, func(tx *sql.Tx) error {
    return AdjustBalance(tx, busi.TransOutUID, -req.Amount)
})

// Callback endpoint DTM uses to check whether the local transaction was committed.
app.GET(BusiAPI+"/QueryPrepared", dtmutil.WrapHandler2(func(c *gin.Context) interface{} {
    return MustBarrierFromGin(c).QueryPrepared(db)
}))
Each of your original options has its disadvantages:
The user cannot immediately see the database changes they have just created.
Debezium will capture the log of the database, which may be much larger than the events you want. Also, deployment and maintenance of Debezium is not an easy job.
The "built-in auto-retry functionality" is not cheap; it may require a lot of code or maintenance effort.
I have code in my business layer that updates data in a database and also in a REST service.
The requirement is that if nothing fails, the data must be saved in both places, and, on the other hand, if something fails, it must roll back in the database and send another request to the REST API.
So, what I'm looking for is a way to use the transaction management of EJB to also orchestrate the calls to the API: at commit time, send a set request to the API, and at rollback time, send a delete request to the API.
In fact, I need to maintain consistency and keep both places in sync.
I have read about UserTransactions and managed beans, but I don't have a clue about the best way to do that.
You can use regular distributed transactions, depending on your infrastructure and participants. This might be possible, e.g., if all participants are EJBs and the data stores are capable of handling distributed transactions.
This won't work with loosely coupled components, and your setup looks like that.
I do not recommend creating your own distributed transaction protocol. Given the edge and corner cases, you will probably not end up with consistent data.
I would suggest thinking about using event sourcing and eventual consistency for things like that. For example, you could emit an event (command) for writing data. If your "rollback" is needed, you can emit an event (command) to delete the data written before. After all events are processed, the data is consistent.
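As a small illustration of that idea (the topic name and payload format are made up), the "rollback" simply becomes another emitted command:

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class CustomerDataPublisher {

    private final KafkaTemplate<String, String> kafka;

    public CustomerDataPublisher(KafkaTemplate<String, String> kafka) {
        this.kafka = kafka;
    }

    // command to write the data
    public void writeData(String id, String dataJson) {
        kafka.send("customer-commands", id, "{\"type\":\"WriteData\",\"data\":" + dataJson + "}");
    }

    // compensating command instead of a rollback; once both commands have been
    // processed, the data is consistent again
    public void deleteData(String id) {
        kafka.send("customer-commands", id, "{\"type\":\"DeleteData\",\"id\":\"" + id + "\"}");
    }
}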
Some interesting links might be:
Martin Fowler - Event Sourcing
Martin Fowler - CQRS
Apache Kafka
I'm playing around with setting up a microservices / cqrs architecture for a personal project, and there's one point I don't understand in the "standard" setup.
By standard setup, I mean
https://www.ibm.com/developerworks/cloud/library/cl-build-app-using-microservices-and-cqrs-trs/index.html
Say I have an orders service and a pickup points service, and I have a command like "send order summary email".
How should the orders service get the data about the pickup point (e.g. opening hours) that it needs to send the email? I see 4 possibilities, but there are surely others.
The command goes directly to the orders service, and then the orders service queries the pickup points service to get the data.
The command goes to the pickup points service, and then pickup points service publishes a new event for orders service with the needed information attached.
The command goes directly to the orders service, and the orders service then queries the read-only client-facing database.
Merge the 2 services... given that they have no other shared context, this would be a pity...
Thanks !
how to get data from another service
There are two use cases for this. In your specific case, what you are describing is somewhat akin to UI Composition; you are creating a view that pulls data from two different sources.
Key point #1: the data you are composing is stale -- by the time the email reaches its destination, the truth understood by the services may have changed anyway. Therefore, there is inherent in the requirements some flexibility about time.
Key point #2: In sending the email, you aren't changing the state of either service at all. You are just making a copy of some part of it. Reads are a safe operation.
Key point #3: Actually sending the email changes the "real world", not the services; it's an activity that can be performed concurrently with the service work.
So what this would normally look like is that one of your read models (probably that of the order service) will support a query that lists orders for which emails will be sent. Some process, running outside of the service, will periodically query that service for pending emails, query the required read models to compose the message, send it, and finally post a message to the input queue of the order service to share the information that the message was successfully sent. The order service would see that, and the read model gets updated to indicate that the message has already been sent.
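A rough sketch of such an out-of-band process (the read-model table, topic and polling interval are assumptions, and composing the message from several read models is collapsed into a single query here):

import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.mail.SimpleMailMessage;
import org.springframework.mail.javamail.JavaMailSender;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class OrderSummaryEmailProcess {

    private final JdbcTemplate readModel;   // read model exposed by the order service
    private final JavaMailSender mailSender;
    private final KafkaTemplate<String, String> kafka;

    public OrderSummaryEmailProcess(JdbcTemplate readModel, JavaMailSender mailSender,
                                    KafkaTemplate<String, String> kafka) {
        this.readModel = readModel;
        this.mailSender = mailSender;
        this.kafka = kafka;
    }

    @Scheduled(fixedDelay = 60000)
    public void sendPendingSummaries() {
        List<Map<String, Object>> pending = readModel.queryForList(
                "SELECT order_id, customer_email, summary_text FROM orders_awaiting_email");
        for (Map<String, Object> row : pending) {
            SimpleMailMessage mail = new SimpleMailMessage();
            mail.setTo((String) row.get("customer_email"));
            mail.setSubject("Your order summary");
            mail.setText((String) row.get("summary_text"));
            mailSender.send(mail);
            // tell the order service the mail went out so its read model is updated
            kafka.send("order-service-input", (String) row.get("order_id"), "OrderSummaryEmailSent");
        }
    }
}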
You are describing a process of sending an order summary email to the customer after the order is completed.
In CQRS this is implemented with a Saga/Process manager.
The idea is that an OrderSummaryEmailSaga subscribes to the OrderWasCompleted event; when such an event is fired, the saga queries the Pickup service for the information it needs (most probably from a read model) and then:
it builds and sends a complete SendOrderSummaryEmail command to the relevant aggregate from the orders service, or
it calls an infrastructure service that, having all the data, builds an email and sends it to the customer,
or a combination of the previous points, depending on how you want to manage this process.
The details are specific to your case, like what domain services (building and formatting the email) or infrastructure services (the actual sending of the email using sendmail or Postfix or whatever) you need to build.
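A minimal sketch of such a saga using spring-kafka (the topic names, event/command payload layout and the pickup-points URL are all assumptions):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class OrderSummaryEmailSaga {

    private final ObjectMapper json = new ObjectMapper();
    private final RestTemplate rest = new RestTemplate();
    private final KafkaTemplate<String, String> commands;

    public OrderSummaryEmailSaga(KafkaTemplate<String, String> commands) {
        this.commands = commands;
    }

    @KafkaListener(topics = "order-events")
    public void onEvent(String eventJson) throws Exception {
        JsonNode event = json.readTree(eventJson);
        if (!"OrderWasCompleted".equals(event.get("type").asText())) {
            return; // this saga only reacts to OrderWasCompleted
        }
        // query the pickup points read model for the data needed in the email
        String pickupPoint = rest.getForObject(
                "http://pickup-points-service/points/{id}",
                String.class, event.get("pickupPointId").asText());
        // build and dispatch the SendOrderSummaryEmail command
        String command = json.createObjectNode()
                .put("type", "SendOrderSummaryEmail")
                .put("orderId", event.get("orderId").asText())
                .set("pickupPoint", json.readTree(pickupPoint))
                .toString();
        commands.send("email-commands", command);
    }
}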
A REST client goes through a sequence of steps with a server in a flow. The client would like to cancel the flow and undo all the changes done to the data in that flow.
For example, we have the method below. It has three different steps in it. The first two are REST calls, while the third one is a data insertion. Now, if restCall1() and restCall2() succeed but the third step fails, everything done in the first two steps should be reverted.
void method() {
    restCall1();  // REST call to the server, performs DB operations
    restCall2();  // REST call to the server, performs file operations
    insertData(); // performs DB operations
}
What is the best practice to deal with this transaction problem? One way is to build a custom transaction framework and roll back the steps manually. Is there any framework or tool that provides a solution to this problem?
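For illustration, one hand-rolled variant of that "custom rollback" idea (the URLs and the existence of matching undo endpoints on the server are assumptions, not something a framework gives you for free):

import org.springframework.web.client.RestTemplate;

public class FlowWithCompensation {

    private final RestTemplate rest = new RestTemplate();

    public void method(String flowId, String payload) {
        boolean step1Done = false;
        boolean step2Done = false;
        try {
            rest.postForObject("http://server/api/step1/{id}", payload, Void.class, flowId);
            step1Done = true;
            rest.postForObject("http://server/api/step2/{id}", payload, Void.class, flowId);
            step2Done = true;
            insertData(flowId); // local DB insert, done last
        } catch (RuntimeException e) {
            // compensate in reverse order; these calls can fail too, so in practice
            // they need to be logged and retried by some background mechanism
            if (step2Done) rest.delete("http://server/api/step2/{id}", flowId);
            if (step1Done) rest.delete("http://server/api/step1/{id}", flowId);
            throw e;
        }
    }

    private void insertData(String flowId) {
        // local database insert omitted in this sketch
    }
}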
A REST client can only maintain its own state per REST call, and there is no synchronization between the state of the REST client and the resources managed by the server. You would have to maintain the synchronization among the various REST calls to the web service yourself, which violates stateless communication; if you intend to do that, you have to keep the states of the various REST calls in sync and coordinate them accordingly.
More here
However, there are workarounds. Retro is a model which supports transactions for REST in an error-prone and scalable way.
See this as well. Hope it helps.
I'm developing a Java desktop application with Swing as the GUI. The app contains services that query the database every second to keep the interface in sync with the database. We all know that with this approach, performance is the enemy.
What I want to achieve is that on every change made to the database, even one made through psql (the Postgres command line) for example, my app should be notified so it can update the UI. This way, performance may be optimized.
Thanks!
As @a_horse_with_no_name points out, PostgreSQL supports asynchronous notification channels for just this purpose.
You should create a trigger, probably in PL/pgSQL, on the table(s) you wish to monitor. This trigger fires a NOTIFY when the table changes, optionally including the changed data itself.
The application LISTENs on the notification channel(s) and processes any asynchronous notifications it receives.
Note that it's valid to send an empty query, and you should do that instead of SELECT 1, e.g.:
stmt.execute("");
IIRC even that's optional if you're not using SSL; there's a way to poll purely client-side. I don't remember what it is, though, so that's not really helpful.
You can determine exactly what changed by using a trigger-maintained change list table, or by using payloads on your notifications. Or you can simply re-read the whole table if it's small.
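For illustration, a minimal JDBC sketch of the LISTEN side (this assumes a trigger that issues NOTIFY table_changed already exists; the channel name, connection details and the one-second poll interval are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class TableChangeListener {

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "app_user", "secret");
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("LISTEN table_changed");
        }
        PGConnection pgConn = conn.unwrap(PGConnection.class);
        while (true) {
            // the empty query makes the driver pick up any pending notifications
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("");
            }
            PGNotification[] notifications = pgConn.getNotifications();
            if (notifications != null) {
                for (PGNotification n : notifications) {
                    System.out.println("channel " + n.getName() + ", payload: " + n.getParameter());
                    // update the Swing UI here (via SwingUtilities.invokeLater)
                }
            }
            Thread.sleep(1000);
        }
    }
}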