I am working on building a microservice which is using transaction manager implemented based on Java Transaction API(JTA).
My question is does Trasaction maanger have ability to handle concurrency issue in distributed database scenario's .
Scenario:
Assume there are multiple instance of a service running and we get two requests to update balance amount by 10 in an account. Initially an account can have $100 and the first instance gets that and increments it to $10 but has not been commited yet.
At the same time the second instance also retreive's account which is still 100 and increments it by $10 and then commits it updating balance to $110 and then service one updates account again to $110.
By this time you must have figured that balance was supposed to be incremented by $20 and not 10. Do I have to write some kind of Optimistic lock exception mechanism to prevent the above scenario or will Transaction Manager based on JTA specification already ensure such a thing will not happen ?
does Trasaction maanger have ability to handle concurrency issue in distributed database scenario's .
Transactions and concurrency are two independent concepts and though Transactions become most siginificant in context where we also see concurrency , transactions can be important without concurrency.
To answer your question : No , Transaction Manager generally does not concern itself with handling issues that arise with concurrent updates. It takes a very naive and simple ( and often most meaningful ) approach : if after the start of a transaction , it detects that the state has become inconsistent ( because of concurrent updates ) it would simply raise it as an exception and Rollback the transaction. If only it can establish that all the conditions of the ACID properties of the transaction are still valid will it commit the transaction.
For such type of requests, you can handle through Optimistic Concurrency where you would have a column on the database (Timestamp) as a reference to the version number.
Each time when a change is commited it would modify the timestamp value.
If two requests try to commit the change at the same time, only one of them will succeed as the version (Timestamp) column will change by then negating other request from comitting its changes.
The transaction manager (as implementation of the JTA specification) makes transparent a work above multiple resources. It ensures all the operations happens as a single unit of work. The "work above multiple resources" mean that that the application can insert data to database and meanwhile it sends a message to a JMS broker. Transaction manager guarantees ACID properties to be hold for this two operations. In simplistic form when the transaction finishes successfully the application developer can be sure both operation was processed. When some trouble happens is on the transaction manager to handle it - possibly throw an exception and rollback the data changes. Thus neither operation was processed.
It makes this transparent for the application developer who does not need to care to update first database and then JMS and checks if all data changes were really processed or a failure happens.
In general the JTA specification was not written with microservice architecture in mind. Now it really depends on your system design(!) But if I consider you have two microservices where each one has attached its own transaction manager then the transaction manager can't help you to sort out your concurrency issue. Transaction managers does not work (usually) in some synchronization. You don't work with multiple resources from one microservice (what is the usecase for the transaction manager) but with one resource from multiple microservices.
As there is the one resource it's the synchronization point for all you updates. It depends on it how it manages concurrency. Considering it's a SQL database then it depends on the level of the isolation it uses (ACID - I = isolation, see https://en.wikipedia.org/wiki/ACID_(computer_science)). Your particular example talks about lost update phenomena (https://vladmihalcea.com/a-beginners-guide-to-database-locking-and-the-lost-update-phenomena/). As both microservices tries to update one record. One solution for the avoiding the issue is using optimistic/pesimistic locking (you can implement it on your own by e.g. timestamps as stated above), the other is to use serializable isolation level in your database, or you can design your application for not reading and updating data based on what is read first time but change the sql query having the update atomic (or there are possibly other strategies how to work with your data model to achieve the desired outcome).
In summary - it depends on how your transaction manager is implemented, it can help you in a way but it's not its purpose. Your goal should be to check how the isolation level is set up at the shared storage and consider if your application needs to handle lost update phenomena at application level or your storage cang manage it for you.
Related
I was asked a use case in one of the interviews where the interviewer asked:
Suppose you have a bank account and you are doing online shopping
Your brother has your ATM card and he is about to make a transaction
Your father has gone to give a withdrawal cheque in the bank. All
these 3 transactions happen at the same time.
How will these transactions be managed such that the balance is not overdrawn?
Me: I said I would use synchronisation on the account balance object
Interviewer : Not satisfied. Next question.
Can someone please explain what could have been the answer? Would a database lock or transaction isolation be a better approach?
I am a beginner in JAVA so forgive my naivety.
Concurrency control is a database management systems (DBMS) concept
that is used to address conflicts with the simultaneous accessing or
altering of data that can occur with a multi-user system. concurrency
control, when applied to a DBMS, is meant to coordinate simultaneous
transactions while preserving data integrity. 1 The Concurrency is
about to control the multi-user access of Database
Basic timestamping is a concurrency control mechanism that eliminates deadlock. This method doesn’t use locks to control concurrency, so it is impossible for deadlock to occur. According to this method a unique timestamp is assigned to each transaction, usually showing when it was started. This effectively allows an age to be assigned to transactions and an order to be assigned. Data items have both a read-timestamp and a write-timestamp. These timestamps are updated each time the data item is read or updated respectively.
Problems arise in this system when a transaction tries to read a data item which has been written by a younger transaction. This is called a late read. This means that the data item has changed since the initial transaction start time and the solution is to roll back the timestamp and acquire a new one. Another problem occurs when a transaction tries to write a data item which has been read by a younger transaction. This is called a late write. This means that the data item has been read by another transaction since the start time of the transaction that is altering it. The solution for this problem is the same as for the late read problem. The timestamp must be rolled back and a new one acquired.
Adhering to the rules of the basic timestamping process allows the transactions to be serialized and a chronological schedule of transactions can then be created. Timestamping may not be practical in the case of larger databases with high levels of transactions. A large amount of storage space would have to be dedicated to storing the timestamps in these cases.
Source:-http://databasemanagement.wikia.com/wiki/Concurrency_Control
Normally, when you start a project there are some kind of behaviour that must be handled. Transaction isolation is one of these, so I would evaluate the most appropriate transaction level in this case. To know more, I would recommend you to read about view serialization. This is one of the way of doing this.
On a Java EE server using CMT, I am using ehcache to implement a caching layer between the business object layer (EJBs) and the Data Access layer (POJOs using JDBC). I seem to be experiencing a race condition between two threads accessing the same record while using a self-populating Ehcache. The cache is keyed on the primary key of the record.
The scenario is:
The first thread updates the record in the database and removes the record from cache (but the database commit doesn't necessarily happen immediately - there may be other queries to follow.)
The second thread reads the record, causing the cache to be re-populated.
The first thread commits transaction.
This is all happening in a fraction of a second. It results in the cache being out of sync with the database, and subsequent reads of the record returning the stale cached data until another update is performed, or the entry expires from the cache. I can handle stale data for short periods (the typical length of a transaction), but not minutes, which is how long I would like to cache objects.
Any suggestions for avoiding this race condition?
UPDATE:
Clearing the cache after the transaction has committed would certainly be ideal. The question is, in a J2EE environment using CMT, when the caching layer is sandwiched between the business layer (stateless session EJBs) and the data access layer, how to do this?
To be clear about the constraints this imposes, the method call in question may or may not be in the same transaction as additional method calls that happen before or after. I can't force a commit (or do this work in a separate transaction) since that would change the transaction boundaries from what the client code expects. Any subsequent exceptions would not roll back the entire transaction (unneseccarily clearing the cache in this case is an acceptable side-effect). I can't control the entry points into the transaction, as it is essentially an API that clients can use. It is not reasonable to push the resonsiblity of clearing the cache to the client application.
I would like to be able to defer any cache clearing operations until the entire transaction is committed by the EJB container, but I have found no way to hook into that logic and run my own code with a stateless session bean.
UPDATE #2:
The most promising solution so far, short of a major design change, is to use ehcache 2.0's JTA support: http://ehcache.org/documentation/apis/jta
This means upgrading to ehcache 2.x and enabling XA transactions for the database as well, which could potentially have negative side-effects. But it seems like the "right" way.
You are using transactions - it makes more sense to remove the cache after the commit, that is when the change really happens.
That way you see the old data only during the length of the transaction, and all reads afterwards have the latest view.
Update: Since this is CMT specific, you should look at the SessionSynchronization interface, and it's afterCompletion() method. This is showed in this tutorial.
I want to move records from one database to another which are on different machines. the records should be removed from first database and inserted to second database atomically.
can we use xa ?
i believe xa uses 2 phase commit algorithm which requires the blocking locks on the resources
the target database is a EIS database, so it should be locked for minimum time.
XA is indeed a 2 phase commit blocking protocol, but in my case there are only two entities involed with the first entity being very fast. so 2PC will work efficiently for me.
for a more general scenario 3 phase commit can be used. it's a non-blocking protocol. though dont' seems to have any java specifications.
also came across BTP and http://jotm.objectweb.org/jotm-btp.html
not sure how easily it can fused with JDBC adapter.
XA doesn't have any incidence on the locking mechanism. It just makes sure that ACIDity is preserved even if you update two separate transactional resources. Your usecase only updates one, if I understand correctly, so XA is not necessary here.
Is there an open-source Java library that adds XA support to databases that don't support it natively? That is, it wraps a non-XA JDBC datasource and takes care of the necessary commits/rollbacks behind the scenes for 2-phase commits?
No, because it's impossible.
Let's review what XA is designed to achieve. It's a consensus protocol for guaranteeing ACID properties on transactions that span multiple resource managers. To do that it utilises a two phase commit protocol: the transaction manager prepares each resource manager, then commits each of them.
For the protocol to function correctly, the resource manager e.g. database, must make certain guarantees at the prepare stage. These include a) not making any changes visible to other processes until the commit phase ('Isolation'), b) ensuring it can perform the update at commit time if required, even if it crashes between prepare and commit ('Durability') and c) ensuring that data manipulated in different transactions exhibits the promised consistency properties. Realistically the only way to implement that is exclusive locking. Even resource managers e.g. pgsql and oracle, that use MVCC or other techniques during most operations will take exclusive locks at prepare.
Without access to the db internals, you can't acquire locks and hold them across connections. Hence you can't write code that can meet the transactional requirements. So, no layering of XA on top of a database engine - it has to be baked in.
However...
You can fake some aspects of the XA behaviour. Depending on your exact application requirements this may allow a useful solution to be crafted.
First up, you can use Last Resource Optimization (aka Last Resource Commit Optimization or Last Resource Gambit) to enlist a single non-XA i.e. one phase resource into a XA transaction with one or more real XA resources. By ordering the one phase resource last in the processing order you can achieve something that behaves like XA for most scenarios. It breaks horribly if a crash occurs at certain points in the execution, so you have to custom write data reconciliation code or rely on a human to handle that contingency. Depending on the semantics of your data that may or may not be an attractive option.
Next up, you can implement a custom driver that operates much like semantic replication. It records the sequence of SQL operations to a log at prepare time, but does not actually apply them to the db until the commit phase. This works for transactional updates that are isolated at the application level, but won't work if you're relying on the db to do concurrency control for you. For example, you may find the commit fails because something else snuck in a conflicting update between the prepare and commit phases. You could use an external lock manager, but only if your custom driver is the only thing talking to the db. As soon as a client that is not aware of that lock manager comes along all bets are off.
Finally, you can invert that model and use compensation based transactions under XA. In this model you apply the updates at prepare time and apply additional operations to reverse their effect in the rollback phase if needed. This has two drawbacks: concurrent operations may read and operate on the prematurely committed values of a tx that later rolls back, as there is no isolation between the prepare and commit; also depending on the business logic it's not easy to generate suitable compensation statements. Even if you can, you need quite a lot of complex plumbing to ensure they are run properly even in crash scenarios.
Realistically you're probably limited to LRCO, which is supported out of the box by most transaction managers. The other options require substantial transactions expertise to get right and the dev/test overhead usually isn't justified. If LRCO won't work for you then frankly it's going to be easier to redesign your app to avoid the need for XA.
I was going through ACID properties regarding Transaction and encountered the statement below across the different sites
ACID is the acronym for the four properties guaranteed by transactions: atomicity, consistency, isolation, and durability.
**My question is specifically about the phrase.
guaranteed by transactions
**. As per my experience these properties are not taken care by
transaction automatically. But as a java developer we need to ensure that these properties criteria are met.
Let's go through for each property:-
Atomicity:- Assume when we create the customer the account should be created too as it is compulsory. So now during transaction
the customer gets created while during account creation some exception oocurs. So the developer can now go two ways: either he rolls back the
complete transaction (atomicity is met in this case) or he commits the transaction so customer will be created but not the
account (which violates the atomicity). So responsibility lies with developer?
Consistency:- Same reason holds valid for consistency too
Isolation :- as per definition isolation makes a transaction execute without interference from another process or transactions.
But this is achieved when we set the isolation level as Serializable. Otherwis in another case like read commited or read uncommited
changes are visible to other transactions. So responsibility lies with the developer to make it really isolated with Serializable?
Durability:- If we commit the transaction, then even if the application crashes, it should be committed on restart of application. Not sure if it needs to be taken care by developer or by database vendor/transaction?
So as per my understanding these ACID properties are not guaranteed automatically; rather we as a developer sjould achieve them. Please let me know
if above understanding regarding each point is correct? Would appreciate if you folks can reply for each point(yes/no will also do.
As per my understanding read committed should be most logical isolation level in most application, though it depends on requirement too.
The transactions guarantees ACID more or less:
1) Atomicity. Transaction guarantees all changes are made or none of them. But you need to manually set the start and end of a transaction and manually perform commit or rollback. Depending on the technology you use (EJB...), transactions are container-managed, setting the start and end to the whole "method" you are creating. You can control by configuration if a method invoked requires a new transaction or an existing one, no transaction...
2) Consistency. Guaranteed by atomicity.
3) Isolation. You must define the isolation level your application needs. Default value is defined depending upon the database, container... The commonest one is READ COMMITTED. Be careful with locks as can cause dead-lock depending on your logic and isolation level.
4) Durability. Managed entirely by the database. If your commit executes without error, nearly all database guarantees durability of changes, but some scenarios can cause to not guarantee that (writes to disk are cached in memory and flushed later...)
In general, you should be aware of transactions and configure it in the container of declare by code the star and end (commit, rollback).
Database transactions are atomic: They either happen in their entirety or not at all. By itself, this says nothing about the atomicity of business transactions. There are various strategies to map business transactions to database transactions. In the simplest case, a business transaction is implemented by one database transaction (where a business transaction is aborted by rolling back the database one). Then, atomicity of database transactions implies atomicity of business transactions. However, things get tricky once business transactions span several database transactions ...
See above.
Your statement is correct. Often, the weaker guarantees are sufficient to prove correctness.
Database transactions are durable (unless there is a hardware failure): if the transaction has committed, its effect will persist until other transactions change the data. However, calling code might not learn whether a transaction has comitted if the database or the network between database and calling code fails. Therefore
If we commit the transaction, then even if application crash, it should be committed on restart of application.
is wrong. If the transaction has committed, there is nothing left to do.
To summarize, the database does give strong guarantees - about the behaviour of the database. Obviously, it can not give guarantees about the behaviour of the entire application.