I have a scenario where the unit of work is defined as:
Update table T1 in database server S1
Update table T2 in database server S2
And I want the above unit of work to happen either completely or not at all (as is the case with any database transaction). How can I do this? I searched extensively and found this post, which is close to what I am expecting, but it seems to be very specific to Hibernate.
I am using Spring, iBatis and Tomcat (6.x) as the container.
It really depends on how robust a solution you need. The minimal level of reliability for such a thing is XA transactions. To use them, you need a database and JDBC driver that support XA for starters; then you can configure Spring to use them (here is an outline).
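To give a feel for the Spring side, here is a minimal sketch (my own illustration, not the linked outline): it assumes a JTA provider is available (from the container or an embedded transaction manager) and that both data sources are XA-capable; the DAO interfaces are hypothetical stand-ins for your iBatis-backed DAOs.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.annotation.EnableTransactionManagement;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.jta.JtaTransactionManager;

@Configuration
@EnableTransactionManagement
class TxConfig {
    @Bean
    public JtaTransactionManager transactionManager() {
        // Picks up the JTA UserTransaction/TransactionManager exposed by the
        // container or by an embedded JTA implementation
        return new JtaTransactionManager();
    }
}

interface T1Dao { void updateT1(long id, String value); } // stand-in for server S1
interface T2Dao { void updateT2(long id, String value); } // stand-in for server S2

class UnitOfWork { // register as a Spring bean so the @Transactional proxy applies
    private final T1Dao t1Dao;
    private final T2Dao t2Dao;

    UnitOfWork(T1Dao t1Dao, T2Dao t2Dao) {
        this.t1Dao = t1Dao;
        this.t2Dao = t2Dao;
    }

    @Transactional // both updates commit together or roll back together under JTA
    public void updateBoth(long id, String value) {
        t1Dao.updateT1(id, value); // update table T1 on server S1
        t2Dao.updateT2(id, value); // update table T2 on server S2
    }
}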
If XA isn't robust enough for you (XA has failure scenarios, e.g. a hardware failure during the second phase of the commit), then what you really need to do is put all the data in one database and have a separate process propagate it to the others. The data may be temporarily inconsistent, but it is recoverable.
Edit: What I mean is: put the whole of the data into one database, either the first database or a different database dedicated to this purpose. This database essentially becomes a queue from which the final data view is fed. The write to that database (assuming a decent database product) will either complete or fail completely. Then a separate thread polls that database and distributes any missing data to the other databases. If that process fails, the thread simply continues the distribution when it starts up again. The data may not exist in every place you want it right away, but nothing gets lost.
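To make the idea concrete, here is a rough sketch of such a distribution thread in plain JDBC. The OUTBOX/TARGET_T names, the payload column, and the in-memory high-water mark are all my own illustrative assumptions; a real version would persist the high-water mark in the target database and use an idempotent MERGE/upsert.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class OutboxPropagator implements Runnable {
    private final DataSource queueDb;   // the single "queue" database
    private final DataSource targetDb;  // one downstream database
    private long lastCopiedId;          // persist this in the target in real code

    public OutboxPropagator(DataSource queueDb, DataSource targetDb) {
        this.queueDb = queueDb;
        this.targetDb = targetDb;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                copyMissingRows();
                Thread.sleep(1000L); // poll interval
            } catch (SQLException e) {
                // log and retry on the next poll; the rows are still in the queue DB
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void copyMissingRows() throws SQLException {
        try (Connection src = queueDb.getConnection();
             Connection dst = targetDb.getConnection();
             PreparedStatement read = src.prepareStatement(
                     "SELECT id, payload FROM OUTBOX WHERE id > ? ORDER BY id");
             PreparedStatement write = dst.prepareStatement(
                     "INSERT INTO TARGET_T (id, payload) VALUES (?, ?)")) {
            read.setLong(1, lastCopiedId);
            try (ResultSet rs = read.executeQuery()) {
                while (rs.next()) {
                    write.setLong(1, rs.getLong("id"));
                    write.setString(2, rs.getString("payload"));
                    write.executeUpdate(); // use MERGE/upsert to stay idempotent
                    lastCopiedId = rs.getLong("id");
                }
            }
        }
    }
}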
You want a distributed transaction manager. I like using Atomikos, which can be run within a JVM.
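For example, bootstrapping Atomikos inside the JVM and handing it to Spring could look roughly like this (a sketch; it assumes the Atomikos transactions-jta jar is on the classpath, and your data sources would be XA-capable ones such as AtomikosDataSourceBean):

import com.atomikos.icatch.jta.UserTransactionImp;
import com.atomikos.icatch.jta.UserTransactionManager;
import org.springframework.transaction.jta.JtaTransactionManager;

public class AtomikosBootstrap {
    public static JtaTransactionManager createTransactionManager() throws Exception {
        UserTransactionManager atomikosTm = new UserTransactionManager();
        atomikosTm.setForceShutdown(false);
        atomikosTm.init(); // starts the embedded transaction service

        UserTransactionImp atomikosUt = new UserTransactionImp();
        atomikosUt.setTransactionTimeout(300);

        // Spring drives begin/commit/rollback through the Atomikos JTA objects
        return new JtaTransactionManager(atomikosUt, atomikosTm);
    }
}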
Related
We have an application running in 2 containers/pods. This application reads requests from ActiveMQ, and to process each request it needs to update 10 tables. Hundreds of these ASYNC requests sit in ActiveMQ and need to be processed at a rate of hundreds per second, and each request tries to update the 10 tables in one Oracle database. Because of these concurrent updates to the 10 tables within the same fraction of a second, some of the requests frequently fail with the error "Error updating database. Cause: java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource".
Is there a better way to handle this kind of scenario, such as a better architecture?
Is there a better way to handle this kind of scenario using the Spring framework?
Your premise is wrong; deadlocks are not caused simply by a large amount of activity on many tables. A deadlock is a special kind of lock conflict that occurs when different sessions try to lock the same rows in a different order. When that happens, the only solution is to kill one of the sessions to release the lock.
The first step in investigating deadlocks is to look in the alert log and find the trace file generated by a deadlock. It will show the statements and objects involved. If you're lucky, the mistake is caused by a missing foreign key index or a bitmap index on a transactional table, which can be resolved with a simple DDL change.
If you're unlucky, the application is changing tables in different orders. In that case, you'll need to change the application to always process changes in the same order. I can't offer specific advice for that; it depends entirely on your application.
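Just to illustrate the "same order" idea (a hypothetical sketch; the table and column names are made up): sort the work by a stable key so that every session requests its row locks in the same order.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class OrderedUpdater {
    private final JdbcTemplate jdbcTemplate;

    public OrderedUpdater(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Run inside one transaction; every session that sorts its IDs the same
    // way acquires the same row locks in the same order, so no cycle can form.
    public void applyDeltas(List<Long> ids, long delta) {
        List<Long> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);
        for (Long id : sorted) {
            jdbcTemplate.update(
                    "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                    delta, id);
        }
    }
}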
I would like some information about the data flow in Spring Batch processing but have failed to find what I am looking for on the Internet (despite some useful questions on this site).
I am trying to establish standards for using Spring Batch in our company, and we are wondering how Spring Batch behaves when several processors in a step update data on different data sources.
This question focuses on a chunked process but feel free to provide information on other modes.
From what I have seen (please correct me if I am wrong), when a line is read, it goes through the whole flow (reader, processors, writer) before the next one is read (as opposed to silo processing, where the reader would process all lines, send them to the processor, and so on).
In my case, several processors read data (from different databases) and update it during processing, and finally the writer inserts data into yet another DB. For now the JobRepository is not linked to a database, but it would eventually be an independent one, which makes things still a bit more complex.
This model cannot be changed since the data belongs to several business areas.
How are transactions managed in this case? Is the data committed only once the full chunk is processed? And if so, is there two-phase commit management? How is it ensured? What development or configuration is needed to ensure data consistency?
More generally, what would your recommendations be in a similar case?
Spring Batch uses the Spring core transaction management, with most of the transaction semantics arranged around a chunk of items, as described in section 5.1 of the Spring Batch docs.
The transaction behaviour of the readers and writers depends on exactly what they are (e.g. file system, database, JMS queue, etc.), but if the resource is configured to support transactions, then it will be enlisted by Spring automatically. The same goes for XA: if you make the resource endpoint XA compliant, then it will use two-phase commit for it.
Getting back to the chunk transaction: it sets up a transaction on a per-chunk basis, so if you set the commit interval to 5 on a given tasklet, it will open and close a new transaction (covering all resources managed by the transaction manager) for that number of reads (defined as the commit-interval).
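For illustration, a commit interval of 5 expressed with the Java configuration builders might look like this (a sketch; the reader/processor/writer beans and the String item types are placeholders):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class ChunkStepConfig {
    @Bean
    public Step chunkStep(StepBuilderFactory steps,
                          ItemReader<String> reader,
                          ItemProcessor<String, String> processor,
                          ItemWriter<String> writer) {
        return steps.get("chunkStep")
                .<String, String>chunk(5) // one transaction per 5 items read
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}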
But all of this is set up around reading from a single data source. Does that meet your requirement? I'm not sure Spring Batch can manage a transaction where it reads data from multiple sources and writes the processor results into another database, all within a single transaction. (In fact, I can't think of anything that could do that...)
Recently I was asked a question which left me thinking; I want to get the community's views on the same question.
I have a CustomerEJB which has, say, a createCustomer method. My EJB is exposed as a web service, and hence createCustomer is one of its operations.
When a request hits createCustomer, two operations need to be performed:
An INSERT SQL query into the database, which adds data from the input request
Creation of a text file, say a .txt, in the file system
Now the question is that I want to couple these two tasks into one transaction: if either task fails, I roll back the other task as well.
Without mentioning any hot technologies like Spring/Hibernate, what approach can I follow for transaction management?
My thoughts:
1. I can use JTA, demarcate the transaction boundaries, and perform commit and rollback accordingly. JDBC can be used for the SQL task.
2. I can use DAOs
Inviting your kind suggestions/comments
You would need to wrap the file creation in an XA-capable JCA connector (not sure whether there's a ready-made one out there; a quick Google only found this fsconnector, which doesn't support transactions yet), use an XA driver for your DB transaction (most DBs will be able to handle this), and then wrap your EJB in an XA transaction (should be straightforward).
As long as both resources can handle XA transactions, you'll get the benefit of two-phase commits, which is what you're after.
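A rough sketch of what the EJB side might look like under container-managed transactions (the JNDI name is a placeholder, and the file write is only indicated, since it would go through whatever XA-capable connector you manage to find):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.sql.DataSource;

@Stateless
public class CustomerBean {
    @Resource(lookup = "jdbc/customerXADataSource") // must be an XA DataSource
    private DataSource dataSource;

    @TransactionAttribute(TransactionAttributeType.REQUIRED)
    public void createCustomer(long id, String name) throws SQLException {
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO CUSTOMER (id, name) VALUES (?, ?)")) {
            ps.setLong(1, id);
            ps.setString(2, name);
            ps.executeUpdate();
        }
        // The .txt creation would happen here through the XA-capable JCA
        // connector, so that the container's two-phase commit applies or
        // rolls back the insert and the file write together.
    }
}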
I have a service method on an api that can be called to check the health of my database connection.
The method pulls the query string from a properties file (it depends on the DB vendor; I am using Sybase and HSQL for now, with more to come) and executes it. The method then lets the caller know whether it succeeded or failed.
In addition to this, I was using the Query.setHint("javax.persistence.query.timeout") to set a timeout on the query:
javax.persistence.EntityManager entityManager;
...
Query heartbeatQuery = entityManager.createNativeQuery(heartbeatQueryString);
// hint value is in milliseconds; support depends on the JPA provider and JDBC driver
heartbeatQuery.setHint("javax.persistence.query.timeout", heartbeatTimeout);
heartbeatQuery.getResultList();
My problem is that the timeout property works against my Sybase DB but not against my HSQL DB. It seems to depend on the vendor, so I don't know for sure when it will work.
Is there a better way to generically test the DB connection & include some kind of timeout parameter?
Well, sadly, no. JPA's query hints are not mandatory; it's up to the implementer (EclipseLink, Hibernate, etc.) to enforce them or not. Moreover, even if the implementer does choose to recognize a certain query hint, the hint won't work if the database doesn't support its functionality (some implementers are nice and tell you when a certain hint won't work against the current DB, while others fail silently). In the case of HSQLDB, there's no way to set a query timeout. You can only set a timeout for the login (i.e. how long it should wait for a successful login before failing), not for the duration of queries.
Things are not so grim, however. Even if you solved this, you'd still stumble over other issues with HSQLDB, as it does not support a lot of the other nice functionality that most DBs have. You should only use HSQLDB for basic integration/unit testing. For more involved testing, you can use the embedded MySQL Java library (Connector/MXJ). You can find it here:
http://dev.mysql.com/doc/refman/5.0/en/connector-mxj.html
This is simply a packaged, fully working MySQL server with a Java API for starting and stopping it, and it works on most major OSes (Windows, Linux, OS X, etc.). This way your integration tests can start a real MySQL server and exercise your code there, where things like a query timeout hint will work fine.
I have an application - more like a utility - that sits in a corner and updates two different databases periodically.
It is a little standalone app that has been built with a Spring Application Context. The context has two Hibernate Session Factories configured in it, in turn using Commons DBCP data sources configured in Spring.
Currently there is no transaction management, but I would like to add some. The update to one database depends on a successful update to the other.
The app does not sit in a Java EE container - it is bootstrapped by a static launcher class called from a shell script. The launcher class instantiates the Application Context and then invokes a method on one of its beans.
What is the 'best' way to put transactionality around the database updates?
I will leave the definition of 'best' to you, but I think it should be some function of 'easy to set up', 'easy to configure', 'inexpensive', and 'easy to package and redistribute'. Naturally FOSS would be good.
The best way to distribute transactions over more than one database is: Don't.
Some people will point you to XA, but XA (or two-phase commit) is a lie (or marketese).
Imagine: after the first phase has told the XA manager that it can send the final commit, the network connection to one of the databases fails. Now what? Time out? That would leave the other database corrupt. Roll back? Two problems: you can't roll back a commit, and how do you know what happened to the second database? Maybe the network connection failed after it successfully committed the data, and only the "success" message was lost?
The best way is to copy the data into a single place. Use a scheme which allows you to abort the copy and continue it at any time (for example, ignore data which you already have, or order the select by ID and request only records > MAX(ID) of your copy). Protect this with a transaction. This is not a problem since you're only reading from the source, so when the transaction fails for any reason, you can safely ignore the source database. Therefore, this is a plain old single-source transaction.
After you have copied the data, process it locally.
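A minimal sketch of such a resumable copy in plain JDBC, assuming both tables share a monotonically increasing id column (all table and column names are placeholders):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public class ResumableCopy {
    public static void copyNewRows(DataSource sourceDb, DataSource localDb)
            throws SQLException {
        try (Connection local = localDb.getConnection()) {
            local.setAutoCommit(false); // single local transaction around the copy
            long maxId;
            try (Statement st = local.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT COALESCE(MAX(id), 0) FROM local_copy")) {
                rs.next();
                maxId = rs.getLong(1);
            }
            try (Connection src = sourceDb.getConnection();
                 PreparedStatement read = src.prepareStatement(
                         "SELECT id, data FROM source_table WHERE id > ? ORDER BY id");
                 PreparedStatement write = local.prepareStatement(
                         "INSERT INTO local_copy (id, data) VALUES (?, ?)")) {
                read.setLong(1, maxId);
                try (ResultSet rs = read.executeQuery()) {
                    while (rs.next()) {
                        write.setLong(1, rs.getLong("id"));
                        write.setString(2, rs.getString("data"));
                        write.executeUpdate();
                    }
                }
                local.commit(); // on any failure, just rerun: MAX(id) makes it resumable
            } catch (SQLException e) {
                local.rollback();
                throw e;
            }
        }
    }
}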
Set up a transaction manager in your context. The Spring docs have examples, and it is very simple (note that to span two databases atomically, that manager must be a JTA/XA-capable one). Then when you want to execute a transaction:
import org.springframework.transaction.TransactionException;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.TransactionCallbackWithoutResult;
import org.springframework.transaction.support.TransactionTemplate;

try {
    TransactionTemplate tt = new TransactionTemplate(txManager);
    tt.execute(new TransactionCallbackWithoutResult() {
        protected void doInTransactionWithoutResult(TransactionStatus status) {
            // both updates run inside the one transaction opened by the template
            updateDb1();
            updateDb2();
        }
    });
} catch (TransactionException ex) {
    // handle
}
For more examples and information, perhaps look at this:
XA transactions using Spring
When you say "two different databases", do you mean different database servers, or two different schemas within the same DB server?
If the former, then to get full transactionality you need the XA transaction API, which provides full two-phase commit. But more importantly, you also need a transaction coordinator/monitor, which manages transaction propagation between the different database systems. This is part of the Java EE spec, and a pretty rarefied part of it at that. The TX coordinator itself is a complex piece of software. Your application software (via Spring, if you so wish) talks to the coordinator.
If, however, you just mean two databases within the same DB server, then vanilla JDBC transactions should work just fine: perform your operations against both databases within a single transaction.
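For that case, a quick sketch in plain JDBC (the schema and table names are made up):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public class TwoSchemaUpdate {
    public static void updateBoth(DataSource dataSource) throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false);
            try (Statement st = con.createStatement()) {
                st.executeUpdate("UPDATE schema_one.t1 SET col = 'x' WHERE id = 1");
                st.executeUpdate("UPDATE schema_two.t2 SET col = 'y' WHERE id = 1");
                con.commit(); // both updates become visible atomically
            } catch (SQLException e) {
                con.rollback(); // neither update is applied
                throw e;
            }
        }
    }
}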
In this case you would need a transaction monitor (a server supporting the XA protocol) and you would have to make sure your databases support XA as well. Most (all?) J2EE servers come with a transaction monitor built in. If your code is not running in a J2EE server, then there are a bunch of standalone alternatives: Atomikos, Bitronix, etc.
You could try Spring's ChainedTransactionManager - http://docs.spring.io/spring-data/commons/docs/1.6.2.RELEASE/api/org/springframework/data/transaction/ChainedTransactionManager.html - which chains several transaction managers and commits or rolls them back in sequence. Note that this is best-effort rather than a true distributed (two-phase commit) transaction, but when you can tolerate that small failure window it can be a simpler alternative to XA.
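A sketch of the wiring, assuming you already have one transaction manager bean per data source:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.transaction.ChainedTransactionManager;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class ChainedTxConfig {
    @Bean
    public PlatformTransactionManager transactionManager(
            PlatformTransactionManager db1TxManager,
            PlatformTransactionManager db2TxManager) {
        // Transactions start in the given order and commit in reverse order.
        // This is best-effort: a crash between the two commits can still
        // leave the databases inconsistent (no two-phase commit).
        return new ChainedTransactionManager(db1TxManager, db2TxManager);
    }
}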