Java concurrent update for multiple tables with transaction support

I am looking to develop a transaction framework which needs to update database tables concurrently.
In simple words, a single transaction should concurrently update around 8 independent tables, and the whole transaction should fail if any update throws an error.
Is there any way I can handle this concurrently?
I.e., 10 threads update 10 tables, and if any update fails, all the updates should roll back.
Is there any framework which allows me to handle this scenario?
If you use JTA or a Spring transaction, the work will share the same connection, which defeats the purpose of concurrent updates.
Or is there any way I can write a custom thread-based solution?

Why would using JTA or Spring Transaction mean you'll use the same connection? If you configure a connection pool and connect to it correctly, surely you'll get a different connection for each thread that you use?
This just seems like an unusually configured distributed transaction to me, and my first attempt at this would be to use Spring and/or Hibernate. I think you'd just have to ensure that you were treating the transactions as distributed transactions.

You can use standard JDBC. JDBC allows you to share a single Connection among multiple threads. To make several threads work in one transaction you should:
create a java.sql.Connection, or take one from a pool
turn autocommit off
run the concurrent tasks with the same connection
wait for the tasks to finish
commit if all tasks finished successfully; roll back otherwise
close the connection
It is also possible to use Spring JDBC if you use Spring's SingleConnectionDataSource.
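A minimal sketch of those steps, assuming a JDBC driver whose Connection can safely be shared across threads (the URL, credentials, and table names are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedConnectionUpdate {

    public static void main(String[] args) throws Exception {
        // Placeholder URL/credentials; alternatively take the connection from a pool.
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password")) {

            con.setAutoCommit(false);                       // one transaction for all tasks

            ExecutorService pool = Executors.newFixedThreadPool(8);
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 1; i <= 8; i++) {
                final String table = "TABLE_" + i;          // placeholder table names
                tasks.add(() -> {
                    // All tasks deliberately share the same connection.
                    try (PreparedStatement ps = con.prepareStatement(
                            "UPDATE " + table + " SET STATUS = ?")) {
                        ps.setString(1, "DONE");
                        ps.executeUpdate();
                    }
                    return null;
                });
            }

            boolean allOk = true;
            for (Future<Void> f : pool.invokeAll(tasks)) {  // wait for every task
                try {
                    f.get();                                // rethrows any task failure
                } catch (Exception e) {
                    allOk = false;
                }
            }
            pool.shutdown();

            if (allOk) {
                con.commit();                               // everything or nothing
            } else {
                con.rollback();
            }
        }
    }
}
```

Note that whether concurrent statements on a single connection actually execute in parallel (or are simply serialized) depends on the driver and database.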

The framework is JTA.
It depends on your database whether you can use one connection for all threads. Details can be found here. So in the general case you need a connection for each thread.
If you use an XA data source, you could try to run the concurrent threads under control of a JTA transaction.
This is a lot of complexity, and it takes time to prepare the threads, so it's probably only useful if the updates take a long time, the affected tables are independent, and you have enough CPUs in your database server.
Update
Regarding transaction propagation, here you can find some thoughts on it.

Related

Why do we need a connection pool for a standalone application?

I need to know why we need a connection pool for a standalone application. According to my knowledge, a standalone application needs only one database connection instance. That's why we use the singleton pattern while creating the connection object using JDBC. So what's the use of having a connection pool for a standalone application? If I am using a connection pool, do I need to specify the max size as 1? Here I am trying to use the c3p0 connection pool with native Hibernate.
A major reason for using a connection pool is that it makes it easier for your application to recover in case the connection goes bad. The only time I would not use a connection pool would be if it were acceptable for the program to fail when the connection stopped working. An example could be a very simple batch job that executed one transaction, where the job framework running it would retry it if it failed.
I agree you have a stand-alone application, but that does not mean you always need to use the Singleton design pattern. How about a single application spinning up multiple threads, with each thread connecting to the database? In that case a Singleton won't be of any help, and you should implement a connection pool so that the DB operations are handled gracefully.
Connection pools and applications (stand-alone or distributed) are related to some extent, but it largely depends on the use case. Suppose you are working on a stand-alone desktop application that is a simple CRUD one; in that case I agree you need not implement a connection pool. But if we are talking about multiple users, and parallel ones at that, I think we should always leverage a connection pool.
Not sure what your use case involves, but the generalization "a stand-alone application does not need connection pooling" does not always hold true.
The cost of using a connection pool is usually insignificant.
Your data access layer does not need to know whether it's being called from a standalone application or, say, a multithreaded web application. So there's a good case for always using connection pooling, which doesn't hurt in the first case and is probably necessary in the second.
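For the c3p0-with-native-Hibernate case mentioned in the question, a rough sketch of enabling the pool programmatically (driver, URL, and pool sizes are placeholder values; the hibernate.c3p0.* property names come from Hibernate's c3p0 integration module, which must be on the classpath):

```java
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateC3p0Setup {

    public static SessionFactory buildSessionFactory() {
        Configuration cfg = new Configuration()
                // Placeholder connection settings
                .setProperty("hibernate.connection.driver_class", "org.postgresql.Driver")
                .setProperty("hibernate.connection.url", "jdbc:postgresql://localhost/mydb")
                .setProperty("hibernate.connection.username", "user")
                .setProperty("hibernate.connection.password", "password")
                // Presence of hibernate.c3p0.* properties switches Hibernate to the c3p0 provider
                .setProperty("hibernate.c3p0.min_size", "1")
                .setProperty("hibernate.c3p0.max_size", "5")   // >1 helps multithreaded standalone apps
                .setProperty("hibernate.c3p0.timeout", "300")
                .setProperty("hibernate.c3p0.max_statements", "50");
        return cfg.buildSessionFactory();
    }
}
```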

Request guarantees with hibernate or connection pooling

Does Hibernate, when it utilises connection pools, require retries to take care of intermittent failures (e.g. network issues)? My colleague is of the opinion that it's not necessary because of the use of connection pools, and that if there were anything wrong with the connection then the connection pool manager would take care of it. I'm not convinced, as the connection could be open and valid, but when the request is made it could still succumb to network issues.
As what is being done is related to payments, we need strong guarantees that the update takes place. I tried googling how Hibernate/connection pools might deal with intermittent issues during a single request but couldn't find much information.
The entity is being saved by a call to getSession().update(object), where getSession() returns the current Hibernate session. We use Hibernate v4.3, and looking at the Hibernate documentation it only mentions that an exception is thrown if a persistent instance with the same identifier already exists.
I would appreciate some links to references/documentation that might clear up my confusion.
You should rely on transactions to give you strong guarantees that a change is made atomically, so in the case of a (network) failure your transaction would be rolled back.
Connection pools provide no such functionality, they facilitate the reuse of connections. See this question about connection pooling: What is database pooling?
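A minimal sketch of that pattern with the Hibernate 4.3 Session API (the payment entity and the injected SessionFactory are placeholders); the update only becomes durable at commit, so a failure before that point leaves the database unchanged:

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class PaymentUpdater {

    private final SessionFactory sessionFactory;   // assumed to be configured elsewhere

    public PaymentUpdater(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void updatePayment(Object payment) {    // placeholder entity type
        Session session = sessionFactory.openSession();
        Transaction tx = null;
        try {
            tx = session.beginTransaction();
            session.update(payment);               // queued; SQL is flushed at commit
            tx.commit();                           // the change is durable only after this returns
        } catch (RuntimeException e) {
            if (tx != null) {
                tx.rollback();                     // nothing persists if anything failed
            }
            throw e;                               // let the caller decide whether to retry
        } finally {
            session.close();
        }
    }
}
```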

How to use a JTA transaction with two databases?

App1 interacts with App2 (an EJB application) using a client API exposed by App2. App2 uses a CMT-managed JTA transaction in JBoss. We are getting the UserTransaction from App2 (JBoss) using a JNDI lookup.
App1 makes a call to App2 to insert data into DS2 using the UserTransaction's begin() and commit().
App1 makes a call to DS1 using Hibernate JPA to insert data into DS1 using a JPA transaction manager.
Is it possible to wrap both of the above DB operations in a single (distributed) transaction?
To do this it's necessary to implement your own transactional resource, capable of joining an ongoing JTA transaction. See this answer as well for some guidelines; one way to see how this is done is to look at the XA driver code for a database or JMS resource and base your implementation on that.
This is not trivial to do and is a very rare use case, usually solved in practice by adopting an alternative design. One way would be to extract the necessary code from App2 into a jar library and use it in Tomcat with a JTA transaction manager like Atomikos connected to two XA JTA datasources.
Another way is to flush the SQL statements to the database in Tomcat and see if that works, before sending a synchronous call to JBoss that returns whether the transaction in JBoss went through.
Depending on that result, commit or roll back in Tomcat. This does not guarantee that it will work 100% of the time (network failures etc.) but might be acceptable depending on what the system does and the business consequences of a failed transaction.
Yet another way is to make the operation revertible on the JBoss side and expose a compensation service that Tomcat uses in case errors are detected. For that, if you make both servers JBoss, you could take advantage of the JBoss Narayana engine; see also this answer.
Which way is better depends on the use case, but implementing your own XA transactional services is a big undertaking; it would be simpler to change the design. The reason that very few projects do it is that it's complex and there are simpler alternatives.
Tomcat is a web server, so it does not support global transactions.
JBoss is an application server, so it supports global transactions.
If you have to combine both, you have to use JOTM or Atomikos, which act as transaction managers and handle the commit or rollback.
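A rough sketch of the standalone-Atomikos approach mentioned above, enlisting two XA datasources in one JTA transaction (the class names are from Atomikos' JTA/JDBC modules; the XADataSource implementation, connection properties, and SQL are placeholder assumptions):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.Properties;

import com.atomikos.icatch.jta.UserTransactionManager;
import com.atomikos.jdbc.AtomikosDataSourceBean;

public class TwoDatabaseTransaction {

    private static AtomikosDataSourceBean xaDataSource(String name, String dbName) {
        AtomikosDataSourceBean ds = new AtomikosDataSourceBean();
        ds.setUniqueResourceName(name);
        // Placeholder XADataSource implementation; use your database vendor's class.
        ds.setXaDataSourceClassName("org.postgresql.xa.PGXADataSource");
        Properties props = new Properties();
        props.setProperty("serverName", "localhost");
        props.setProperty("databaseName", dbName);
        props.setProperty("user", "user");
        props.setProperty("password", "password");
        ds.setXaProperties(props);
        ds.setPoolSize(5);
        return ds;
    }

    public static void main(String[] args) throws Exception {
        AtomikosDataSourceBean ds1 = xaDataSource("DS1", "db1");
        AtomikosDataSourceBean ds2 = xaDataSource("DS2", "db2");

        UserTransactionManager tm = new UserTransactionManager();
        tm.init();                                    // starts the transaction service

        tm.begin();                                   // one global transaction
        try (Connection c1 = ds1.getConnection();     // both connections enlist as XA resources
             Connection c2 = ds2.getConnection();
             PreparedStatement ps1 = c1.prepareStatement("INSERT INTO T1 VALUES (?)");
             PreparedStatement ps2 = c2.prepareStatement("INSERT INTO T2 VALUES (?)")) {
            ps1.setInt(1, 1);
            ps1.executeUpdate();
            ps2.setInt(1, 1);
            ps2.executeUpdate();
            tm.commit();                              // two-phase commit across both databases
        } catch (Exception e) {
            tm.rollback();                            // both inserts are undone
            throw e;
        } finally {
            tm.close();
        }
    }
}
```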

Modify JDBC code to target multiple database servers for Insert/Update/Delete

Is there a way that I can use JDBC to target multiple databases when I execute statements (basic inserts, updates, deletes)?
For example, assume both servers [200.200.200.1] and [200.200.200.2] have a database named MyDatabase, and the databases are exactly the same. I'd like to run "INSERT INTO TestTable VALUES(1, 2)" on both databases at the same time.
Note regarding JTA/XA:
We're developing a JTA/XA architecture to target multiple databases in the same transaction, but it won't be ready for some time. I'd like to use standard JDBC batch commands and have them hit multiple servers for now if it's possible. I realize that it won't be transaction-safe; I just want the commands to hit both servers for basic testing at the moment.
You need one connection per database. Once you have those, the standard auto commit/rollback calls will work.
You could try Spring; it already has transaction managers set up.
Even if you don't use Spring, all you have to do is get XA versions of the JDBC driver JARs in your CLASSPATH. Two phase commit will not work if you don't have them.
I'd wonder if replication using the database would not be a better idea. Why should the middle tier care about database clustering?
The best quick and dirty way for development is to use multiple database connections. They won't be in the same transaction since they are on different connections. I don't think this would be much of an issue if this is just for testing.
When your JTA/XA architecture is ready, just plug it into the already working code.
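For that quick-and-dirty testing approach, a small sketch that runs the same insert against both servers over separate connections (the JDBC URLs, credentials, and driver are placeholder assumptions; there is no shared transaction, so one server can succeed while the other fails):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DualServerInsert {

    public static void main(String[] args) throws SQLException {
        // Placeholder SQL Server style URLs; adjust to your driver's format.
        String[] urls = {
            "jdbc:sqlserver://200.200.200.1;databaseName=MyDatabase",
            "jdbc:sqlserver://200.200.200.2;databaseName=MyDatabase"
        };

        for (String url : urls) {
            // Each server gets its own connection and its own local transaction.
            try (Connection con = DriverManager.getConnection(url, "user", "password");
                 PreparedStatement ps = con.prepareStatement(
                         "INSERT INTO TestTable VALUES (?, ?)")) {
                ps.setInt(1, 1);
                ps.setInt(2, 2);
                ps.executeUpdate();   // auto-commit: applied to this server immediately
            }
        }
    }
}
```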

Hibernate L2 cache. Read-write or transactional cache concurrency strategy on cluster?

I’m trying to figure out which cache concurrency strategy I should use for my application (for entity updates, in particular). The application is a web service developed using Hibernate, is deployed on an Amazon EC2 cluster, and runs on Tomcat, so there is no application server.
I know that there are nonstrict-read-write / read-write and transactional cache concurrency strategies for data that can be updated, and there are mature, popular, production-ready L2 cache providers for Hibernate: Infinispan, Ehcache, Hazelcast.
But I don't completely understand the difference between the transactional and read-write caches from the Hibernate documentation. I thought that the transactional cache is the only choice for a cluster application, but now (after reading some topics), I'm not so sure about that.
So my question is about the read-write cache. Is it cluster-safe? Does it guarantee data synchronization between the database and the cache as well as synchronization between all the connected servers? Or is it only suitable for single-server applications, and should I always prefer the transactional cache?
For example, if a database transaction that is updating an entity field (first name, etc.) fails and is rolled back, will the read-write cache discard the changes or will it just propagate the bad data (the updated first name) to all the other nodes?
Does it require a JTA transaction for this?
The Concurrency strategy configuration for JBoss TreeCache as 2nd level Hibernate cache topic says:
`READ_WRITE` is an interesting combination. In this mode Hibernate itself works as a lightweight XA-coordinator, so it doesn't require a full-blown external XA. Short description of how it works:
In this mode Hibernate manages the transactions itself. All DB actions must be inside a transaction; autocommit mode won't work.
During the flush() (which might happen multiple times during the transaction lifetime, but usually happens just before the commit) Hibernate goes through the session and searches for updated/inserted/deleted objects. These objects are then first saved to the database, and then locked and updated in the cache so concurrent transactions can neither update nor read them.
If the transaction is then rolled back (explicitly or because of some error) the locked objects are simply released and evicted from the cache, so other transactions can read/update them.
If the transaction is committed successfully, then the locked objects are simply released and other threads can read/write them.
Is there some documentation how this works in a cluster environment?
It seems that the transactional cache works correctly for this, but requires JTA environment with a standalone transaction manager (such as JBossTM, Atomikos, Bitronix), XA datasource and a lot of configuration changes and testing. I managed to deploy this, but still have some issues with my frameworks. For instance, Google Guice IoC does not support JTA transactions and I have to replace it with Spring or move the service to some application server and use EJB.
So which way is better?
Thanks in advance!
Summary of differences
Nonstrict R/W and R/W are both asynchronous strategies, meaning they are updated after the transaction is completed. Transactional is obviously synchronous and is updated within the transaction.
Nonstrict R/W never locks an entity, so there's always the chance of a dirty read. Read-Write always soft locks an entity, so any simultaneous access is sent to the database. However, there is a remote chance that R/W might not produce Repeatable Read isolation.
The best way to understand the differences between these strategies is to see how they behave during the course of insert, update, or delete operations.
You can check out my post here, which describes the differences in further detail. Feel free to comment.
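For context, this is roughly how a concurrency strategy is selected per entity in annotation-based Hibernate mappings (the entity itself is a made-up example; whether TRANSACTIONAL is usable depends on the provider and on having a JTA environment):

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // or TRANSACTIONAL in a JTA setup
public class Customer {                             // hypothetical entity

    @Id
    private Long id;

    private String firstName;                       // the kind of field discussed above

    // getters and setters omitted for brevity
}
```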
So far I've only seen clustered 2LC working with transactional cache modes. That's precisely what Infinispan does, and in fact, Infinispan has so far stayed away from implementing the other cache concurrency modes. To lighten the transactional burden, Infinispan integrates via transaction synchronizations with Hibernate as opposed to XA.
