Single transaction across multiple threads solution - java

As I understand it, all transactions are Thread-bound (i.e. with the context stored in ThreadLocal). For example if:
I start a transaction in a transactional parent method
Make database insert #1 in an asynchronous call
Make database insert #2 in another asynchronous call
Then that will yield two different transactions (one for each insert) even though they shared the same "transactional" parent.
For example, let's say I perform two inserts (using a very simple sample for brevity, i.e. not using an executor or CompletableFuture, etc.):
@Transactional
public void addInTransactionWithAnnotation() {
    addNewRow();
    addNewRow();
}
This will perform both inserts, as desired, as part of the same transaction.
However, if I wanted to parallelize those inserts for performance:
@Transactional
public void addInTransactionWithAnnotation() {
    new Thread(this::addNewRow).start();
    new Thread(this::addNewRow).start();
}
Then each one of those spawned threads will not participate in the transaction at all because transactions are Thread-bound.
Key Question: Is there a way to safely propagate the transaction to the child threads?
The only solutions I've thought of to solve this problem:
Use JTA or some XA manager, which by definition should be able to do this. However, I ideally don't want to use XA for my solution because of its overhead.
Pipe all of the transactional work I want performed (in the above example, the addNewRow() function) to a single thread, and do all of the prior work in a multithreaded fashion.
Figuring out some way to leverage InheritableThreadLocal on the Transaction status and propagate it to the child threads. I'm not sure how to do this.
Are there any other possible solutions, even if they taste a little bit like a workaround (like my solutions above)?

The JTA API has several methods that operate implicitly on the current Thread's Transaction, but it doesn't prevent you moving or copying a Transaction between Threads, or performing certain operations on a Transaction that's not bound to the current (or any other) Thread. This causes no end of headaches, but it's not the worst part...
For raw JDBC, you don't have a JTA Transaction at all. You have a JDBC Connection, which has its own ideas about transaction context. In which case, the transaction is Connection bound, not thread bound. Pass the Connection around and the tx goes with it. But Connections aren't necessarily threadsafe and are probably a performance bottleneck anyhow, so sharing one between multiple concurrent threads doesn't really help you. You likely need multiple Connections that think they are in the same Transaction, which means you need XA, since that's how the db identifies such cases. At which point you're back to JTA, but now with a JCA in the picture to handle the Connection management properly. In short, you've reinvented the JavaEE application server.
For frameworks that layer on JDBC e.g. ORMs like Hibernate, you have an additional complication: their abstractions are not necessarily threadsafe. So you can't have a Session that is bound to multiple Threads concurrently. But you can have multiple concurrent Sessions that each participate in the same XA transaction.
As usual it boils down to Amdahl's law. If the speedup you get from using multiple Connections per tx to allow for multiple concurrent Threads to share the db I/O work is large relative to what you get from batching, then the overhead of XA is worthwhile. If the speedup is in local computation and the db I/O is a minor concern, then a single Thread that handles the JDBC Connection and offloads non-IO computation work to a Thread pool is the way to go.
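That last arrangement might look roughly like the following sketch, assuming a plain DataSource; the Row type, computeRow() helper and table are hypothetical stand-ins for the non-I/O work:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.sql.DataSource;

public class SingleConnectionWorker {
    static final class Row {                                   // hypothetical result type
        final String a, b;
        Row(String a, String b) { this.a = a; this.b = b; }
    }

    static Row computeRow(String input) {                      // stands in for the expensive
        return new Row(input, input.toUpperCase());            // non-I/O computation
    }

    static void insertAll(DataSource ds, List<String> inputs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try (Connection con = ds.getConnection()) {
            con.setAutoCommit(false);                          // one tx, one thread, one Connection
            List<Future<Row>> futures = new ArrayList<>();
            for (String in : inputs) {
                futures.add(pool.submit(() -> computeRow(in))); // parallel, non-JDBC work
            }
            try (PreparedStatement ps = con.prepareStatement(
                    "insert into t(a, b) values (?, ?)")) {     // hypothetical table
                for (Future<Row> f : futures) {
                    Row r = f.get();                            // gather on the tx thread
                    ps.setString(1, r.a);
                    ps.setString(2, r.b);
                    ps.executeUpdate();                         // all db I/O stays here
                }
            }
            con.commit();
        } finally {
            pool.shutdown();
        }
    }
}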

First, a clarification: if you want to speed up several inserts of the same kind, as your example suggests, you will probably get the best performance by issuing the inserts in the same thread and using some type of batch inserting. Depending on your DBMS there are several techniques available; look at the links below (a minimal JDBC sketch follows them):
Efficient way to do batch INSERTS with JDBC
What's the fastest way to do a bulk insert into Postgres?
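As a baseline, plain JDBC batching looks roughly like this (the table, column and flush interval are hypothetical):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;
import javax.sql.DataSource;

public class BatchInsert {
    static void insertBatch(DataSource ds, List<String> names) throws Exception {
        try (Connection con = ds.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "insert into person(name) values (?)")) {  // hypothetical table
            con.setAutoCommit(false);           // one commit for the whole batch
            int count = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();                  // queue rows, no round trip per row
                if (++count % 1000 == 0) {
                    ps.executeBatch();          // flush periodically to bound memory
                }
            }
            ps.executeBatch();                  // flush the remainder
            con.commit();
        }
    }
}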
As for your actual question, I would personally try to pipe all the work to a worker thread. It is the simplest option as you don't need to mess with either ThreadLocals or transaction enlistment/delistment. Furthermore, once you have your units of work in the same thread, if you are smart you might be able to apply the batching techniques above for better performance.
Lastly, piping work to worker threads does not mean that you must have a single worker thread, you could have a pool of workers and achieve some parallelism if it is really beneficial to your application. Think in terms of producers/consumers.
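A hedged sketch of that shape, assuming Spring's TransactionTemplate; the queue payloads, poison-pill marker and RowDao interface are hypothetical:

import java.util.concurrent.BlockingQueue;
import org.springframework.transaction.support.TransactionTemplate;

public class TransactionalConsumer {
    static final String POISON = "__done__";              // hypothetical end-of-stream marker

    interface RowDao { void insert(String row); }         // hypothetical DAO

    // Many producer threads put prepared rows on the queue; this single
    // consumer owns the transaction, so nothing thread-bound ever migrates.
    static void drain(TransactionTemplate tx, BlockingQueue<String> queue, RowDao dao) {
        tx.executeWithoutResult(status -> {               // whole drain = one transaction
            try {
                for (String row = queue.take(); !POISON.equals(row); row = queue.take()) {
                    dao.insert(row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                status.setRollbackOnly();                 // abandon the tx on interrupt
            }
        });
    }
}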

Related

Multi-threaded transactional inserts with Spring Data JDBC

I'm using NamedParameterJdbcTemplate. I need to insert data into 5 different tables within a transaction.
The sequential execution of the inserts takes a long time, and I need to optimize it.
One possible option is to make all inserts parallel using threads. As far as I understand, a transaction does not propagate to multiple threads.
How can I improve the time taken for this operation within a transaction boundary?
I don't think what you are trying to do can possibly work.
As far as I know a database transaction is always bound to a single connection.
And the JDBC connection API is blocking, i.e. you can only execute a single statement at a time. So even when you share the Spring transaction across multiple threads you'll still execute your SQL sequentially.
I therefore see the following options available to you, which might be combined:
Tune your database/SQL: batched inserts, disabled constraints, adding or removing indexes and so on might have an effect on the execution time.
Drop the transactional constraint.
If you can break your process into multiple processes you might be able to run them in parallel and actually gain performance.
Tune/parallelise the part happening in your Java application so you can do other stuff while your SQL statements are running.
To decide which approach is most promising we'd need to know more about your actual scenario.
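For the batching option above, a minimal sketch with NamedParameterJdbcTemplate (the table, column and service names are hypothetical):

import java.util.List;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.jdbc.core.namedparam.SqlParameterSource;
import org.springframework.transaction.annotation.Transactional;

public class InsertService {
    private final NamedParameterJdbcTemplate jdbc;

    public InsertService(NamedParameterJdbcTemplate jdbc) { this.jdbc = jdbc; }

    @Transactional  // still one transaction, but one round trip per chunk, not per row
    public void insertAll(List<String> names) {
        SqlParameterSource[] batch = names.stream()
                .map(n -> new MapSqlParameterSource("name", n))
                .toArray(SqlParameterSource[]::new);
        jdbc.batchUpdate("insert into person(name) values (:name)", batch); // hypothetical table
    }
}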

Singleton or connection pool for high performance?

Context
I have a RESTful API for a versus fighting game, using JAX-RS, tomcat8 and Neo4j embedded.
Today I figured out that a lot of queries will be issued in a limited time. I'm using embedded mode for faster queries, but I still want to go as fast as possible.
Problem
In fact, the problem is a bit different but not that much.
Actually, I'm using a singleton with a getDatabase() method returning the current GraphDatabaseService instance to begin a transaction; once it's done, the transaction is closed... and that's all.
I don't know whether the best solution for optimal performance is a singleton pattern or a pool (like creating XX instances of the database connection, and reusing them when the database operation is finished).
I can't test it myself actually, because I don't have enough connections to even know which one is the fastest (and the best overall).
Also, I wonder: if I create a pool of GraphDatabaseService instances, will they all be able to access the same data without getting blocked by the lock?
Create only one GraphDatabaseService instance and use it everywhere. There is no need to create an instance pool. GraphDatabaseService is completely thread-safe, so you need not worry about concurrency (note: transactions are thread-bound, so you can't run multiple transactions in the same thread).
All operations in Neo4j should be executed in a Transaction. On commit, the transaction is written to the transaction log and then persisted into the database. General rules are:
Always close transactions as early as possible (use try-with-resources)
Close all resources as early as possible (e.g. the ResourceIterator returned by findNodes() and execute())
Here you can find information about locking strategy.
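A minimal sketch of both rules together, assuming the Neo4j 3.x embedded API; the Player label and the counting logic are hypothetical:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.ResourceIterator;
import org.neo4j.graphdb.Transaction;

public class PlayerDao {
    private final GraphDatabaseService db;      // the single shared instance

    public PlayerDao(GraphDatabaseService db) { this.db = db; }

    public long countPlayers() {
        // try-with-resources closes both the transaction and the iterator early
        try (Transaction tx = db.beginTx();
             ResourceIterator<Node> it = db.findNodes(Label.label("Player"))) {
            long count = 0;
            while (it.hasNext()) {
                it.next();
                count++;
            }
            tx.success();                       // mark success before close
            return count;
        }
    }
}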
To be sure that you have the best performance, you should:
Check database settings (memory mapping)
Check OS settings (file system)
Check JVM settings (GC, heap size)
Data model
Here you can find some articles about Neo4j configuration & optimizations. All of them have useful information.
Use a pool - definitely.
Creating a database connection is generally very expensive. Using a pool will ensure that connections are kept for a reasonable amount of time and re-used whenever possible.
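For a client/server database that advice translates to a pooled DataSource; a sketch using HikariCP as an example pool (the URL, credentials and sizing are hypothetical). For embedded Neo4j, the single shared GraphDatabaseService above plays the analogous role.

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class Pool {
    static HikariDataSource create() {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:postgresql://localhost/game");  // hypothetical
        cfg.setUsername("app");
        cfg.setPassword("secret");
        cfg.setMaximumPoolSize(10);   // connections are created once and reused
        return new HikariDataSource(cfg);
    }
}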

Using Hibernate and JDBC together from different threads

I want to use Spring-Hibernate and JDBC together in my application.
Hibernate should do all the updating and writing from one thread and other threads should just be able to read from the database without too much synchronization effort.
Will those JDBC-using threads deliver correct results (if they read from the database a short time after calling persist() or merge()), or could it happen that Hibernate has not flushed any updates and therefore the other threads return wrong database entries?
"Wrong" depends on the isolation level you set for your connection pool.
I think it can work if Hibernate and Spring share the same connection pool and you set the isolation level to SERIALIZABLE for all connections.
Long-running transactions will be the problem. If all your write operations are fast you won't block. If you don't commit and flush updates quickly the read operations will either have to block and wait OR allow "dirty reads".
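A sketch of pinning the isolation level on a connection from the shared pool (most pools also expose this as configuration, e.g. HikariCP's transactionIsolation property); the table and query are hypothetical:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class SerializableReads {
    static boolean rowExists(DataSource ds, long id) throws SQLException {
        try (Connection con = ds.getConnection()) {
            con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            try (PreparedStatement ps = con.prepareStatement(
                    "select 1 from t where id = ?")) {         // hypothetical table
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next();                          // sees only committed state
                }
            }
        }
    }
}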
That depends. You're basically describing a race condition: if you want to make sure that your read thread only reads after the write thread has persisted, you will have to look into thread synchronization methodology.

Does making a method synchronized ensure that it is thread-safe?

I have a method in which some database insert operations happen using Hibernate, and I want them to be thread-safe. The method receives some data in its parameters, and it is possible that two calls are made with the same data at the same point in time.
I can't lock those tables because of performance degradation. Can anyone say whether making the method synchronized will solve the issue?
Synchronizing a method will ensure that it can only be accessed by one thread at a time. If this method is your only means of writing to the database, then yes, this will stop two threads from writing at the same time. However, you still have to deal with the fact that you have multiple insert operations with the same data.
You should let Hibernate handle the concurrency; that's what it is meant to do. Don't assume Hibernate will lock anything: it supports optimistic transactions for exactly this purpose. Quote from the above link:
The only approach that is consistent with high concurrency and high scalability, is optimistic concurrency control with versioning. Version checking uses version numbers, or timestamps, to detect conflicting updates and to prevent lost updates. Hibernate provides three possible approaches to writing application code that uses optimistic concurrency.
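Optimistic versioning is a one-annotation affair in JPA/Hibernate; a sketch with a hypothetical entity:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Account {           // hypothetical entity
    @Id
    private Long id;

    private long balance;

    @Version                     // checked and incremented on every update; a
    private long version;        // conflicting concurrent update then fails with an
                                 // OptimisticLockException instead of silently
                                 // losing data
}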
Database concurrency is handled by transactions. Transactions have the Atomic, Consistent, Isolated, Durable (ACID) properties, and they provide isolation between programs accessing a database concurrently. The Hibernate DAO template of the Spring framework offers single-line methods for CRUD operations on the database; used individually, these don't need to be synchronized. Spring provides declarative (XML), programmatic and annotation-driven transaction management if you need to declare "your method" as transactional with specific propagation, rollbackFor and isolation settings. So in "your method" you can do multiple saves, updates, deletes etc. and the ORM will ensure that it is executed with the transaction settings you have given in the metadata.
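A minimal sketch of the annotation-driven flavour; the service, repository and method names are hypothetical:

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class TransferService {
    private final AccountRepository accounts;   // hypothetical repository

    public TransferService(AccountRepository accounts) { this.accounts = accounts; }

    @Transactional(propagation = Propagation.REQUIRED,
                   isolation   = Isolation.READ_COMMITTED,
                   rollbackFor = Exception.class)
    public void transfer(long fromId, long toId, long amount) {
        // multiple saves/updates here commit or roll back as one unit
        accounts.debit(fromId, amount);
        accounts.credit(toId, amount);
    }

    public interface AccountRepository {        // hypothetical
        void debit(long id, long amount);
        void credit(long id, long amount);
    }
}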
Another issue is that the thread has to hold the locks on all the objects taking part in the transaction. Otherwise the transaction might fail, or the ORM will persist stale data. In another situation it can result in a deadlock because of lock ordering. I think this is what really answers your question.
Both objects a and b have an instance variable of type Lock. A boolean flag can be used to indicate the success of the transaction; the client code can retry the same transaction if it fails.
boolean success = false;            // lets the client retry if the tx didn't run
if (a.lock.tryLock()) {             // tryLock never blocks, so an inconsistent
    try {                           // lock order can't deadlock; it just fails fast
        if (b.lock.tryLock()) {
            try {
                // persist or update object a and b
                success = true;
            } finally {
                b.lock.unlock();
            }
        }
    } finally {
        a.lock.unlock();
    }
}
The problem with using synchronized methods is that each one locks the entire Service or DAO instance, making its other synchronized methods unavailable to other threads. By using individual locks on objects we can gain the advantage of fine-grained concurrency.
No. This method probably uses other methods and objects, which may not be thread-safe. synchronized only ensures that one thread at a time can hold the method's object monitor, so it makes the method thread-safe with respect to that object only.
If you are sure that all other threads use the shared state only through this method, then making it synchronized may be sufficient.
Choosing the best strategy depends on the architecture. Sometimes it seems easier to increase performance with a trick like method synchronization, but that is a bad approach.
There's no doubt you should use transactions, and if with that strategy you're facing performance issues, you should optimize your DB queries or DB structure.
Please remember to keep synchronized sections as small, and as close to atomic, as possible.

Using XA with databases that don't support it natively?

Is there an open-source Java library that adds XA support to databases that don't support it natively? That is, it wraps a non-XA JDBC datasource and takes care of the necessary commits/rollbacks behind the scenes for 2-phase commits?
No, because it's impossible.
Let's review what XA is designed to achieve. It's a consensus protocol for guaranteeing ACID properties on transactions that span multiple resource managers. To do that it utilises a two phase commit protocol: the transaction manager prepares each resource manager, then commits each of them.
For the protocol to function correctly, the resource manager e.g. database, must make certain guarantees at the prepare stage. These include a) not making any changes visible to other processes until the commit phase ('Isolation'), b) ensuring it can perform the update at commit time if required, even if it crashes between prepare and commit ('Durability') and c) ensuring that data manipulated in different transactions exhibits the promised consistency properties. Realistically the only way to implement that is exclusive locking. Even resource managers e.g. pgsql and oracle, that use MVCC or other techniques during most operations will take exclusive locks at prepare.
Without access to the db internals, you can't acquire locks and hold them across connections. Hence you can't write code that can meet the transactional requirements. So, no layering of XA on top of a database engine - it has to be baked in.
However...
You can fake some aspects of the XA behaviour. Depending on your exact application requirements this may allow a useful solution to be crafted.
First up, you can use Last Resource Optimization (aka Last Resource Commit Optimization, or the Last Resource Gambit) to enlist a single non-XA (i.e. one-phase) resource into an XA transaction with one or more real XA resources. By ordering the one-phase resource last in the processing order you can achieve something that behaves like XA for most scenarios. It breaks horribly if a crash occurs at certain points in the execution, so you have to custom-write data reconciliation code or rely on a human to handle that contingency. Depending on the semantics of your data, that may or may not be an attractive option.
Next up, you can implement a custom driver that operates much like semantic replication. It records the sequence of SQL operations to a log at prepare time, but does not actually apply them to the db until the commit phase. This works for transactional updates that are isolated at the application level, but won't work if you're relying on the db to do concurrency control for you. For example, you may find the commit fails because something else snuck in a conflicting update between the prepare and commit phases. You could use an external lock manager, but only if your custom driver is the only thing talking to the db. As soon as a client that is not aware of that lock manager comes along all bets are off.
Finally, you can invert that model and use compensation based transactions under XA. In this model you apply the updates at prepare time and apply additional operations to reverse their effect in the rollback phase if needed. This has two drawbacks: concurrent operations may read and operate on the prematurely committed values of a tx that later rolls back, as there is no isolation between the prepare and commit; also depending on the business logic it's not easy to generate suitable compensation statements. Even if you can, you need quite a lot of complex plumbing to ensure they are run properly even in crash scenarios.
Realistically you're probably limited to LRCO, which is supported out of the box by most transaction managers. The other options require substantial transactions expertise to get right and the dev/test overhead usually isn't justified. If LRCO won't work for you then frankly it's going to be easier to redesign your app to avoid the need for XA.
