I want to move records from one database to another, where the two databases are on different machines. The records should be removed from the first database and inserted into the second database atomically.
Can we use XA?
I believe XA uses a two-phase commit algorithm, which requires blocking locks on the resources.
The target database is an EIS database, so it should be locked for the minimum possible time.
XA is indeed a blocking two-phase commit protocol, but in my case there are only two entities involved, with the first entity being very fast, so 2PC will work efficiently for me.
For a more general scenario, three-phase commit can be used. It's a non-blocking protocol, though it doesn't seem to have any Java specification.
I also came across BTP and http://jotm.objectweb.org/jotm-btp.html, but I'm not sure how easily it can be fused with a JDBC adapter.
XA doesn't have any impact on the locking mechanism. It just makes sure that ACIDity is preserved even if you update two separate transactional resources. If I understand correctly, your use case only updates one, so XA is not necessary here.
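For reference, if you do end up updating two transactional resources (as the original question describes: delete from one database, insert into another), a sketch of the move under a container-provided JTA UserTransaction could look like this. The DataSource wiring and the records table are assumptions; both DataSources would need to be XA-capable and managed by the transaction manager.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;
import javax.transaction.Status;
import javax.transaction.UserTransaction;

public class RecordMover {
    public void move(UserTransaction utx, DataSource sourceDs, DataSource targetDs,
                     long id) throws Exception {
        utx.begin();
        boolean committed = false;
        try (Connection src = sourceDs.getConnection();
             Connection dst = targetDs.getConnection()) {
            String payload = null;
            try (PreparedStatement sel = src.prepareStatement(
                    "SELECT payload FROM records WHERE id = ?")) {
                sel.setLong(1, id);
                try (ResultSet rs = sel.executeQuery()) {
                    if (rs.next()) payload = rs.getString(1);
                }
            }
            if (payload != null) {
                // Remove from the source database...
                try (PreparedStatement del = src.prepareStatement(
                        "DELETE FROM records WHERE id = ?")) {
                    del.setLong(1, id);
                    del.executeUpdate();
                }
                // ...and insert into the target database, in the same transaction.
                try (PreparedStatement ins = dst.prepareStatement(
                        "INSERT INTO records (id, payload) VALUES (?, ?)")) {
                    ins.setLong(1, id);
                    ins.setString(2, payload);
                    ins.executeUpdate();
                }
            }
            utx.commit(); // the transaction manager drives 2PC across both resources
            committed = true;
        } finally {
            if (!committed && utx.getStatus() != Status.STATUS_NO_TRANSACTION) {
                utx.rollback();
            }
        }
    }
}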
Related
I'm using NamedParameterJdbcTemplate. I need to insert data into 5 different tables within a transaction.
The sequential execution of the inserts takes a long time, and I need to optimize it.
One possible option is to make all the inserts parallel using threads, but as far as I understand a transaction does not propagate across multiple threads.
How can I improve the time taken for this operation within a transaction boundary?
I don't think what you are trying to do can possibly work.
As far as I know a database transaction is always bound to a single connection.
And the JDBC connection API is blocking, i.e. you can only execute a single statement at a time. So even if you shared the Spring transaction across multiple threads, you'd still execute your SQL sequentially.
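To illustrate the point, in plain JDBC the transaction is just state on one Connection, and statements on it run one at a time. A minimal sketch, where dataSource and the table names are assumed:

try (Connection con = dataSource.getConnection()) {
    con.setAutoCommit(false);
    try {
        try (PreparedStatement ps = con.prepareStatement("INSERT INTO t1 (val) VALUES (?)")) {
            ps.setString(1, "a");
            ps.executeUpdate();
        }
        try (PreparedStatement ps = con.prepareStatement("INSERT INTO t2 (val) VALUES (?)")) {
            ps.setString(1, "b");
            ps.executeUpdate();
        }
        con.commit(); // both inserts commit together...
    } catch (SQLException e) {
        con.rollback(); // ...or roll back together
        throw e;
    }
}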
I therefore see the following options, which might be combined, available to you:
Tune your database/SQL: batched inserts, disabled constraints, adding or removing indexes and so on might have an effect on the execution time (see the sketch after this list).
Drop the transactional constraint.
If you can break your process into multiple processes you might be able to run them in parallel and actually gain performance.
Tune/parallelise the part happening in your Java application so you can do other stuff while your SQL statements are running.
To decide which approach is most promising we'd need to know more about your actual scenario.
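For the first of those options, here is a minimal sketch of batched inserts with NamedParameterJdbcTemplate. The person table and the Person class are made up for illustration:

import java.util.List;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.jdbc.core.namedparam.SqlParameterSource;
import org.springframework.transaction.annotation.Transactional;

@Transactional
public void insertPeople(NamedParameterJdbcTemplate jdbc, List<Person> people) {
    // One batched round trip per table instead of one round trip per row.
    SqlParameterSource[] batch = people.stream()
            .map(p -> new MapSqlParameterSource()
                    .addValue("name", p.getName())
                    .addValue("age", p.getAge()))
            .toArray(SqlParameterSource[]::new);
    jdbc.batchUpdate("INSERT INTO person (name, age) VALUES (:name, :age)", batch);
}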
Example Scenario:
Using a thread pool in Java where each thread gets a new connection from the connection pool, and then all threads proceed to do some DB transaction in parallel. For example, inserting 100 values into the same table.
Will this somehow mess with the table/database or is it entirely safe without any kind of synchronization required between the threads?
I find it hard to find reliable information on this subject. From what I gather, DB engines handle this on their own, if at all (PostgreSQL apparently since version 9.x). Are there any well-written articles explaining this further?
Bonus question: Is there even a point in using parallel transactions when the DB runs on a single HDD?
As long as the database itself is conforming to ACID you are fine (although every now and then someone finds a bug in some really strange situation).
To the bonus question: for PostgreSQL it totally does make sense, as long as you allow some time for collecting concurrent transactions (increase the value of commit_delay), which then helps combine disk I/Os into batches. There are also other parameters for transaction-throughput tuning, most of which can be pretty dangerous if durability is one of your major concerns.
Also, please keep in mind that the database client also needs to do some work between database calls, which, when executed sequentially, just adds idle time for the database. So even here parallelism helps, as long as you have the actual resources for it (CPU, ...).
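A sketch of the scenario described above, assuming a pooled dataSource and a made-up table. Each task borrows its own connection and therefore runs in its own transaction; the database's ACID guarantees are what keep the concurrent inserts safe, with no synchronization between the threads:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService pool = Executors.newFixedThreadPool(8);
for (int i = 0; i < 100; i++) {
    final int value = i;
    pool.submit(() -> {
        // Each thread gets its own connection, hence its own transaction.
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO measurements (v) VALUES (?)")) {
            ps.setInt(1, value);
            ps.executeUpdate(); // auto-commit: each insert is its own transaction
        } catch (Exception e) {
            e.printStackTrace();
        }
    });
}
pool.shutdown();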
I am working on building a microservice which uses a transaction manager implemented based on the Java Transaction API (JTA).
My question is: does the transaction manager have the ability to handle concurrency issues in distributed database scenarios?
Scenario:
Assume there are multiple instances of a service running and we get two requests to update the balance of an account by $10. Initially the account has $100; the first instance reads that and increments it by $10 but has not committed yet.
At the same time the second instance also retrieves the account, which is still $100, increments it by $10 and commits, updating the balance to $110; then the first instance commits its update, again setting the balance to $110.
By this time you must have figured out that the balance was supposed to be incremented by $20, not $10. Do I have to write some kind of optimistic-lock exception mechanism to prevent the above scenario, or will a transaction manager based on the JTA specification already ensure such a thing cannot happen?
does the transaction manager have the ability to handle concurrency issues in distributed database scenarios?
Transactions and concurrency are two independent concepts, and though transactions become most significant in contexts where we also see concurrency, transactions can be important without concurrency.
To answer your question: no, a transaction manager generally does not concern itself with handling issues that arise from concurrent updates. It takes a very naive, simple (and often most meaningful) approach: if, after the start of a transaction, it detects that the state has become inconsistent (because of concurrent updates), it simply raises an exception and rolls back the transaction. Only if it can establish that all the ACID properties of the transaction still hold will it commit the transaction.
For such requests you can use optimistic concurrency, where you have a column on the database (a timestamp) as a reference to the version number.
Each time a change is committed, it modifies the timestamp value.
If two requests try to commit a change at the same time, only one of them will succeed, as the version (timestamp) column will have changed by then, preventing the other request from committing its changes.
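A minimal sketch of that check in plain SQL (an integer version column behaves the same as a timestamp; the account table, jdbc template, and variable names are made up for illustration):

// Read earlier: SELECT balance, version FROM account WHERE id = ?
int updated = jdbc.update(
        "UPDATE account SET balance = ?, version = version + 1" +
        " WHERE id = ? AND version = ?",
        newBalance, accountId, versionReadEarlier);
if (updated == 0) {
    // Another request committed first and bumped the version: retry or fail.
    throw new IllegalStateException("account " + accountId + " was updated concurrently");
}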
The transaction manager (as an implementation of the JTA specification) makes work across multiple resources transparent. It ensures all the operations happen as a single unit of work. "Work across multiple resources" means that the application can insert data into a database and meanwhile send a message to a JMS broker; the transaction manager guarantees that the ACID properties hold for these two operations. In simple terms, when the transaction finishes successfully the application developer can be sure both operations were processed. When some trouble happens, it is up to the transaction manager to handle it, possibly by throwing an exception and rolling back the data changes, so that neither operation is processed.
This is transparent for the application developer, who does not need to take care to update the database first, then JMS, and then check whether all the data changes were really processed or a failure happened.
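A sketch of that unit of work in a Java EE container; the resource lookup names and the orders table are assumptions:

import javax.annotation.Resource;
import javax.inject.Inject;
import javax.jms.JMSContext;
import javax.jms.Queue;
import javax.sql.DataSource;
import javax.transaction.Transactional;

public class OrderService {
    @Resource(lookup = "java:/jdbc/ordersDs") private DataSource ds;
    @Resource(lookup = "java:/jms/orderQueue") private Queue queue;
    @Inject private JMSContext jms;

    @Transactional
    public void placeOrder(String orderId) throws Exception {
        try (var con = ds.getConnection();
             var ps = con.prepareStatement("INSERT INTO orders (id) VALUES (?)")) {
            ps.setString(1, orderId);
            ps.executeUpdate();
        }
        jms.createProducer().send(queue, orderId);
        // Normal return: the transaction manager commits both resources.
        // Exception: it rolls both back, so neither operation is processed.
    }
}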
In general the JTA specification was not written with microservice architecture in mind, so now it really depends on your system design(!). But if I assume you have two microservices, each with its own transaction manager attached, then the transaction manager can't help you sort out your concurrency issue. Transaction managers do not (usually) synchronize with each other. You are not working with multiple resources from one microservice (which is the use case for the transaction manager) but with one resource from multiple microservices.
As there is one resource, it is the synchronization point for all your updates, and it depends on that resource how concurrency is managed. Assuming it's a SQL database, it depends on the isolation level it uses (ACID: I = isolation, see https://en.wikipedia.org/wiki/ACID_(computer_science)). Your particular example describes the lost-update phenomenon (https://vladmihalcea.com/a-beginners-guide-to-database-locking-and-the-lost-update-phenomena/), as both microservices try to update one record. One solution for avoiding the issue is optimistic/pessimistic locking (you can implement it on your own, e.g. with timestamps as stated above); another is to use the serializable isolation level in your database; or you can design your application not to read and then update based on what was read first, but change the SQL query so the update is atomic (and there are possibly other strategies for working with your data model to achieve the desired outcome).
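The "change the SQL query so the update is atomic" option from above, as a sketch (table and variable names illustrative): instead of read-then-write, let the database do the arithmetic under its own row lock.

// No lost update: concurrent increments serialize on the row lock.
jdbc.update("UPDATE account SET balance = balance + ? WHERE id = ?", amount, accountId);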
In summary: it depends on how your transaction manager is implemented; it can help you in a way, but that's not its purpose. Your goal should be to check how the isolation level is set up on the shared storage and consider whether your application needs to handle the lost-update phenomenon at the application level or whether your storage can manage it for you.
As I understand it, all transactions are thread-bound (i.e. with the context stored in a ThreadLocal). For example if:
I start a transaction in a transactional parent method
Make database insert #1 in an asynchronous call
Make database insert #2 in another asynchronous call
Then that will yield two different transactions (one for each insert) even though they shared the same "transactional" parent.
For example, let's say I perform two inserts (and using a very simple sample, i.e. not using an executor or completable future for brevity, etc.):
@Transactional
public void addInTransactionWithAnnotation() {
    addNewRow();
    addNewRow();
}
Will perform both inserts, as desired, as part of the same transaction.
However, if I wanted to parallelize those inserts for performance:
@Transactional
public void addInTransactionWithAnnotation() {
    new Thread(this::addNewRow).start();
    new Thread(this::addNewRow).start();
}
Then each one of those spawned threads will not participate in the transaction at all because transactions are Thread-bound.
Key Question: Is there a way to safely propagate the transaction to the child threads?
The only solutions I've thought of to solve this problem:
Use JTA or some XA manager, which by definition should be able to do this. However, I ideally don't want to use XA for my solution because of its overhead.
Pipe all of the transactional work I want performed (in the above example, the addNewRow() function) to a single thread, and do all of the prior work in the multithreaded fashion.
Figure out some way to leverage InheritableThreadLocal on the transaction status and propagate it to the child threads. I'm not sure how to do this.
Are there any more solutions possible, even if they feel a little bit like a workaround (like my solutions above)?
The JTA API has several methods that operate implicitly on the current Thread's Transaction, but it doesn't prevent you moving or copying a Transaction between Threads, or performing certain operations on a Transaction that's not bound to the current (or any other) Thread. This causes no end of headaches, but it's not the worst part...
For raw JDBC, you don't have a JTA Transaction at all. You have a JDBC Connection, which has its own ideas about transaction context. In which case, the transaction is Connection bound, not thread bound. Pass the Connection around and the tx goes with it. But Connections aren't necessarily threadsafe and are probably a performance bottleneck anyhow, so sharing one between multiple concurrent threads doesn't really help you. You likely need multiple Connections that think they are in the same Transaction, which means you need XA, since that's how the db identifies such cases. At which point you're back to JTA, but now with a JCA in the picture to handle the Connection management properly. In short, you've reinvented the JavaEE application server.
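A sketch of what "Connection bound" means in practice; stepOne and stepTwo are hypothetical helpers that take the Connection as a parameter, so the transaction travels with it:

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

void doWork(DataSource ds) throws SQLException {
    try (Connection con = ds.getConnection()) {
        con.setAutoCommit(false);
        try {
            stepOne(con); // same Connection...
            stepTwo(con); // ...so same transaction
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }
}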
For frameworks that layer on JDBC e.g. ORMs like Hibernate, you have an additional complication: their abstractions are not necessarily threadsafe. So you can't have a Session that is bound to multiple Threads concurrently. But you can have multiple concurrent Sessions that each participate in the same XA transaction.
As usual it boils down to Amdahl's law. If the speedup you get from using multiple Connections per tx to allow for multiple concurrent Threads to share the db I/O work is large relative to what you get from batching, then the overhead of XA is worthwhile. If the speedup is in local computation and the db I/O is a minor concern, then a single Thread that handles the JDBC Connection and offloads non-IO computation work to a Thread pool is the way to go.
First, a clarification: if you want to speed up several inserts of the same kind, as your example suggests, you will probably get the best performance by issuing the inserts in the same thread and using some type of batch inserting. Depending on your DBMS there are several techniques available, look at:
Efficient way to do batch INSERTS with JDBC
What's the fastest way to do a bulk insert into Postgres?
As for your actual question, I would personally try to pipe all the work to a worker thread. It is the simplest option as you don't need to mess with either ThreadLocals or transaction enlistment/delistment. Furthermore, once you have your units of work in the same thread, if you are smart you might be able to apply the batching techniques above for better performance.
Lastly, piping work to worker threads does not mean that you must have a single worker thread, you could have a pool of workers and achieve some parallelism if it is really beneficial to your application. Think in terms of producers/consumers.
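A sketch of that producer/consumer arrangement with Spring's TransactionTemplate; the Row type, the partition helper, insertBatch and transactionManager are hypothetical. The point is that each unit of work is transacted entirely inside the one worker thread that executes it:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;

ExecutorService workers = Executors.newFixedThreadPool(4);
TransactionTemplate tx = new TransactionTemplate(transactionManager);

for (List<Row> unitOfWork : partition(rows, 1000)) {
    workers.submit(() -> tx.execute(status -> {
        insertBatch(unitOfWork); // all JDBC work for this unit happens here,
        return null;             // inside one thread and one transaction
    }));
}
workers.shutdown();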
Is there an open-source Java library that adds XA support to databases that don't support it natively? That is, it wraps a non-XA JDBC datasource and takes care of the necessary commits/rollbacks behind the scenes for 2-phase commits?
No, because it's impossible.
Let's review what XA is designed to achieve. It's a consensus protocol for guaranteeing ACID properties on transactions that span multiple resource managers. To do that it utilises a two-phase commit protocol: the transaction manager prepares each resource manager, then commits each of them.
For the protocol to function correctly, the resource manager, e.g. a database, must make certain guarantees at the prepare stage. These include a) not making any changes visible to other processes until the commit phase ('Isolation'), b) ensuring it can perform the update at commit time if required, even if it crashes between prepare and commit ('Durability') and c) ensuring that data manipulated in different transactions exhibits the promised consistency properties. Realistically the only way to implement that is exclusive locking. Even resource managers, e.g. pgsql and oracle, that use MVCC or other techniques during most operations will take exclusive locks at prepare.
Without access to the db internals, you can't acquire locks and hold them across connections. Hence you can't write code that can meet the transactional requirements. So, no layering of XA on top of a database engine - it has to be baked in.
However...
You can fake some aspects of the XA behaviour. Depending on your exact application requirements this may allow a useful solution to be crafted.
First up, you can use Last Resource Optimization (aka Last Resource Commit Optimization or Last Resource Gambit) to enlist a single non-XA, i.e. one-phase, resource into an XA transaction with one or more real XA resources. By ordering the one-phase resource last in the processing order you can achieve something that behaves like XA for most scenarios. It breaks horribly if a crash occurs at certain points in the execution, so you have to custom-write data reconciliation code or rely on a human to handle that contingency. Depending on the semantics of your data that may or may not be an attractive option.
Next up, you can implement a custom driver that operates much like semantic replication. It records the sequence of SQL operations to a log at prepare time, but does not actually apply them to the db until the commit phase. This works for transactional updates that are isolated at the application level, but won't work if you're relying on the db to do concurrency control for you. For example, you may find the commit fails because something else snuck in a conflicting update between the prepare and commit phases. You could use an external lock manager, but only if your custom driver is the only thing talking to the db. As soon as a client that is not aware of that lock manager comes along all bets are off.
Finally, you can invert that model and use compensation based transactions under XA. In this model you apply the updates at prepare time and apply additional operations to reverse their effect in the rollback phase if needed. This has two drawbacks: concurrent operations may read and operate on the prematurely committed values of a tx that later rolls back, as there is no isolation between the prepare and commit; also depending on the business logic it's not easy to generate suitable compensation statements. Even if you can, you need quite a lot of complex plumbing to ensure they are run properly even in crash scenarios.
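A very rough sketch of such a compensation pair; compensationLog is a hypothetical crash-safe store that the recovery plumbing would replay, and the account table is illustrative:

String forward      = "UPDATE account SET balance = balance - 10 WHERE id = ?";
String compensation = "UPDATE account SET balance = balance + 10 WHERE id = ?";

// Persist the undo operation first, so it survives a crash.
compensationLog.record(accountId, compensation);
// Apply the forward update eagerly at prepare time. Note the drawback from
// above: the change is immediately visible to others, with no isolation.
jdbc.update(forward, accountId);
// On rollback, the recovery logic replays the logged compensation statement.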
Realistically you're probably limited to LRCO, which is supported out of the box by most transaction managers. The other options require substantial transaction expertise to get right, and the dev/test overhead usually isn't justified. If LRCO won't work for you then frankly it's going to be easier to redesign your app to avoid the need for XA.