What exactly is meant by Spring transactions being atomic?

My understanding of an atomic operation is that it should not be possible for the steps of the operation to be interleaved with those of any other operation - that it should be executed as a single unit.
I have a method for creating a database record that first checks whether a record with the same value (and satisfying certain other parameters) already exists, and if so does not create the record.
In pseudocode:
public class FooDao implements IFooDao {

    @Transactional
    public void createFoo(String fooValue) {
        if (!fooExists(fooValue)) {
            // DB call to create foo
        }
    }

    @Transactional
    public boolean fooExists(String fooValue) {
        // DB call to check if foo exists
    }
}
However I have seen that it is possible for two records with the same value to be created, suggesting that these operations have interleaved in some way. I am aware that with Spring's transactional proxies, self-invocation of a method within an object will not use the transactional logic, but if createFoo() is called from outside the object then I would expect fooExists() to still be included in the same transaction.
Are my expectations as to what transactional atomicity should enforce wrong? Do I need to be using a synchronized block to enforce this?

What a transaction really means for the database depends on the isolation level. The Wikipedia article on Isolation (database systems) explains it well.
Normally one uses a not-so-strict isolation level, for example read committed. This means that data written by another transaction becomes visible to yours only once that other transaction has committed.
In your case this is not enough, because it is the opposite of what you want. So the obvious solution would be to use a more restrictive (and slower) isolation level: repeatable reads.
But to be honest, I would take a different route: make the relevant column unique (but do not remove your if (!fooExists(fooValue)) check). That way your check works in 99% of cases; in the remaining 1% you get an exception, because the insert violates the unique constraint.
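A minimal sketch of that pattern, assuming Spring's exception translation is active (it maps the driver's duplicate-key error to DataIntegrityViolationException); the FooService wrapper and its names are made up for illustration:

import org.springframework.dao.DataIntegrityViolationException;

public class FooService {

    private final IFooDao fooDao; // the DAO from the question

    public FooService(IFooDao fooDao) {
        this.fooDao = fooDao;
    }

    public void createFooSafely(String fooValue) {
        try {
            fooDao.createFoo(fooValue); // @Transactional; the column is UNIQUE
        } catch (DataIntegrityViolationException e) {
            // Rare path: another transaction inserted the same value between
            // the fooExists() check and the insert, and the unique constraint
            // rejected it. Caught outside the @Transactional method, because
            // the failed insert marks that transaction rollback-only.
        }
    }
}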

Transactional means all updates occur within the same transaction, i.e. all updates/inserts/deletes succeed or all are rolled back (for example if you update multiple tables).
It doesn't guarantee anything about the behaviour of queries within the transaction; that depends on the RDBMS and its configuration (the isolation level configured on the database).

@Transactional does not make the code synchronized by default. Two separate threads can enter the same block at the same time and both perform inserts. Synchronizing the method isn't really a good answer either, since that can drastically affect application performance. If your issue is that two identical records are being created by two different threads, you may want to add an index with a unique constraint on the database so that duplicate inserts fail.
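If the table is mapped with JPA/Hibernate, the constraint can also be declared in the mapping so that schema generation picks it up; a sketch with invented entity and column names:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Foo {

    @Id
    @GeneratedValue
    private Long id;

    // The database rejects a second row with the same value here, no matter
    // which thread or JVM attempts the insert.
    @Column(unique = true, nullable = false)
    private String fooValue;
}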

Related

How to handle a race condition at the database level with Spring and Hibernate?

I have a bank project in which customer balances are updated by parallel threads in parallel applications. I hold customer balances in an Oracle database. My Java applications will be implemented with Spring and Hibernate.
How can I handle the race condition between parallel applications? Should my solution be at the database level or at the application level?
I assume what you would like to know is how to handle concurrency, preventing race conditions which can occur where two parts of the application modify and accidentally overwrite the same data.
You have mostly two strategies for this: pessimistic locking and optimistic locking:
Pessimistic locking
Here you assume that the likelihood of two threads overwriting the same data is high, so you would like it handled in a transparent way. To handle this, increase the isolation level of your Spring transactions from its default value of READ_COMMITTED to, for example, REPEATABLE_READ, which should be sufficient in most cases:
@Transactional(isolation = Isolation.REPEATABLE_READ)
public void yourBusinessMethod() {
    ...
}
In this case, if you read some data at the beginning of the method, you are sure that no one can overwrite that data in the database while your method is ongoing. Note that it's still possible for another thread to insert extra records matching a query you made (a problem known as phantom reads), but not to change the records you already read.
If you want to protect against phantom reads, you need to upgrade the isolation level to SERIALIZABLE. The improved isolation comes at a performance cost: your program will run slower and will more frequently 'hang' waiting for other parts of the program to finish.
Optimistic Locking
Here you assume that data access collisions are rare, and that in the rare cases where they do occur they are easily recoverable by the application. In this mode, you keep all your business methods at their default READ_COMMITTED isolation level.
Then each Hibernate entity is marked with a version column:
@Entity
public class SomeEntity {
    ...
    @Version
    private Long version;
}
With this, each entity read from the database is versioned using the version column. When Hibernate writes changes to an entity back to the database, it checks whether the version was incremented since that transaction read the entity.
If so, it means someone else modified the data, and decisions were made using stale data. In that case a StaleObjectStateException is thrown, which needs to be caught by the application and handled, ideally at a central place.
In the case of a GUI, you usually catch the exception and show a message saying "user xyz changed this data while you were also editing it; your changes are lost. Press OK to reload the new data."
With optimistic locking your program will run faster, but the application needs to handle some concurrency aspects that would otherwise be transparent with pessimistic locking: versioning entities and catching exceptions.
The most frequently used method is optimistic locking, as it is acceptable for most applications. With pessimistic locking it's very easy to cause performance problems, especially when data access collisions are rare and can be resolved in a simple way.
Nothing prevents you from mixing the two concurrency-handling methods in the same application if needed.
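One way to handle the conflict "at a central place" is a small retry wrapper around the transactional method; a sketch assuming Spring wraps Hibernate's stale-state exception in OptimisticLockingFailureException, with all bean and method names invented:

import org.springframework.dao.OptimisticLockingFailureException;

public class BalanceRetryService {

    private static final int MAX_RETRIES = 3;

    // A separate bean holds the @Transactional method, so the call goes
    // through the Spring proxy (self-invocation would bypass @Transactional).
    private final BalanceService balanceService; // hypothetical bean

    public BalanceRetryService(BalanceService balanceService) {
        this.balanceService = balanceService;
    }

    public void updateBalance(long customerId, long amount) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                balanceService.updateBalance(customerId, amount);
                return;
            } catch (OptimisticLockingFailureException e) {
                // Another transaction bumped the @Version column since we
                // read the entity; loop to re-read fresh state and retry.
            }
        }
        throw new IllegalStateException(
                "Balance update failed after " + MAX_RETRIES + " attempts");
    }
}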

Will making a method synchronized ensure that it is thread safe?

I have a method in which some database insert operations happen using Hibernate, and I want them to be thread safe. The method receives some data in its parameters, and it is possible that two calls are made with the same data at the same point in time.
I can't lock those tables because of performance degradation. Can anyone say whether making the method synchronized will solve the issue?
Synchronizing a method will ensure that it can only be accessed by one thread at a time. If this method is your only means of writing to the database, then yes, this will stop two threads from writing at the same time. However, you still have to deal with the fact that you have multiple insert operations with the same data.
You should let Hibernate handle the concurrency, that's what it is meant to do. Don't assume Hibernate will lock anything: it supports optimistic transactions for exactly this purpose. Quote from the above link:
The only approach that is consistent with high concurrency and high scalability, is optimistic concurrency control with versioning. Version checking uses version numbers, or timestamps, to detect conflicting updates and to prevent lost updates. Hibernate provides three possible approaches to writing application code that uses optimistic concurrency.
Database concurrency is handled by transactions. Transactions have the Atomic, Consistent, Isolated, Durable (ACID) properties. They provide isolation between programs accessing a database concurrently. In the Hibernate DAO template of the Spring framework there are single-line methods for CRUD operations on the database; when used individually, these don't need to be synchronized. Spring provides declarative (XML), programmatic and annotation-driven transaction management if you need to declare "your method" as transactional with specific propagation, rollbackFor and isolation settings. So in "your method" you can do multiple saves, updates, deletes etc., and the ORM will ensure it is executed with the transaction settings you have given in the metadata.
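For illustration, the annotation-driven variant might look like this (the attribute values and method names are illustrative, not prescribed):

import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

public class AccountService {

    // All saves/updates/deletes inside this method run in one transaction;
    // with rollbackFor = Exception.class, even checked exceptions roll
    // everything back together.
    @Transactional(propagation = Propagation.REQUIRED,
                   isolation = Isolation.READ_COMMITTED,
                   rollbackFor = Exception.class)
    public void transfer(long fromId, long toId, long amount) {
        // debit one account, credit the other, write an audit record
    }
}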
Another issue is that the thread has to hold the locks on all the objects taking part in the transaction; otherwise the transaction might fail or the ORM will persist stale data. In other situations it can result in a deadlock because of lock ordering. I think this is what really answers your question.
Both objects a and b have an instance variable of type Lock. A boolean flag can be used to indicate the success of the transaction, and the client code can retry the same transaction if it fails.
if (a.lock.tryLock()) {
    try {
        if (b.lock.tryLock()) {
            try {
                // persist or update objects a and b
            } finally {
                b.lock.unlock();
            }
        }
    } finally {
        a.lock.unlock();
    }
}
The problem with synchronized methods is that they lock on the entire Service or DAO object, making its other synchronized methods unavailable to other threads. By using individual locks on the participating objects we gain the advantage of fine-grained concurrency.
No. This method probably uses other methods and objects, which may not be thread safe. synchronized only makes threads take that method's object monitor one at a time, so it makes the method thread-safe with respect to that object.
If you are sure that all other threads reach the shared state only through this method, then making it synchronized may be sufficient.
Choosing the best strategy depends on the architecture. Sometimes a trick like method synchronization seems the easier way to gain performance, but it is a bad approach.
There is no doubt you should use transactions, and if with that strategy you face performance issues, you should optimize your DB queries or DB structure.
Also remember that synchronized sections should be kept as small as possible.

Java selective synchronization

I'm maintaining a very old application, and recently I came across a multi-threading bug.
In one method, to insert a value into the DB, the record is first checked for existence; if it does not exist, it is inserted.
createSomething(params)
{
    ....
    ....
    if( !isPresentInDb(params) )
    {
        .....
        .....
        .....
        insertIntoDb(params)
    }
    ...
}
When multiple threads invoke this method, two or more threads with the same params may get past the isPresentInDb check; one thread then inserts successfully and the other threads fail.
To solve this problem I enclosed both DB interactions in a single synchronized(this) block. But is there a better way of doing this?
Edit: it is more like selective synchronization; only threads with the same params need to be synchronized. Is selective synchronization possible?
I'd say the better way is to let the database do it for you if at all possible. Assuming the row you want to either update or insert has a unique constraint on it, my usual approach is:
unconditionally insert the row;
if an SQLException occurs, check whether it is due to a duplicate-key error on insert; if it is, do the update, otherwise rethrow the SQLException (a plain-JDBC sketch of this follows below).
If you can wrap those statements in a database transaction, then you don't have to worry about two threads trampling on each other.
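Here is that insert-first pattern as a sketch; the table and column names are invented, and detecting the duplicate via SQLIntegrityConstraintViolationException assumes a JDBC 4 driver (note that some databases also abort the surrounding transaction on a constraint violation, so a savepoint may be needed there):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class UpsertDao {

    // Insert first; if the unique key already exists, fall back to an update.
    public void insertOrUpdate(Connection con, String key, String value)
            throws SQLException {
        try (PreparedStatement insert = con.prepareStatement(
                "INSERT INTO some_table (some_key, some_value) VALUES (?, ?)")) {
            insert.setString(1, key);
            insert.setString(2, value);
            insert.executeUpdate();
        } catch (SQLIntegrityConstraintViolationException e) {
            // Duplicate key: another thread or process got there first.
            try (PreparedStatement update = con.prepareStatement(
                    "UPDATE some_table SET some_value = ? WHERE some_key = ?")) {
                update.setString(1, value);
                update.setString(2, key);
                update.executeUpdate();
            }
        }
    }
}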
If the logic is really "create this if it doesn't already exist", it could be better still to push the logic down into the database. For example, MySQL has "INSERT IGNORE" syntax that will cause it to ignore the insert if it would violate a primary key constraint. It may not be possible for your code, but worth considering.
This approach only works if this object instance is the only one that inserts into the table. If it is not, two threads will synchronize on two different objects, and the synchronization won't work. To make it short: the object should be a singleton, and no other object should insert into this table.
Even if there is a single object instance inserting, if you have any other application, or any other JVM, inserting into this table, then the synchronization won't bring you any guarantee.
Doing this is better than nothing, but it doesn't guarantee that the insert will always succeed. If it doesn't succeed, the transaction will roll back due (hopefully) to a constraint violation. If you don't have a unique constraint to guarantee uniqueness in the database, and you have several applications inserting in parallel, then you can't do anything to avoid duplicates.
Since you only want to stop this method from running concurrently with the same params, you can use a ConcurrentMap and call putIfAbsent, checking its return value before proceeding. This allows you to run the method concurrently for different arguments.
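A sketch of that idea; the guard class and method bodies are invented, and note it only coordinates threads within a single JVM:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CreateGuard {

    // Tracks params currently being inserted; one JVM only.
    private final ConcurrentMap<String, Boolean> inFlight =
            new ConcurrentHashMap<>();

    public void createSomething(String params) {
        // putIfAbsent returns null only for the first caller with these params
        if (inFlight.putIfAbsent(params, Boolean.TRUE) != null) {
            return; // another thread is already creating this record
        }
        try {
            if (!isPresentInDb(params)) {
                insertIntoDb(params);
            }
        } finally {
            inFlight.remove(params); // let later calls for the same params in
        }
    }

    private boolean isPresentInDb(String params) { return false; /* DB check */ }

    private void insertIntoDb(String params) { /* DB insert */ }
}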
Looks fine to me. You can use some of the java.util.concurrent aids, like a ReentrantLock.
It would be better to use some sort of optimistic transaction: try to insert, and catch the exception. If the record has just been inserted by someone else, simply do nothing.
In one word: no, there is no better way than this. To make check-then-update operations atomic, you have to put the logic inside a synchronized block.
You could make the whole method synchronized. I tend to find that a good marker for "this method only gets run by one thread at a time". That's my personal preference though.
The downside of too coarse-grained locking is performance degradation: if the method is called often, it will become a bottleneck. Here are two other approaches:
Move the concurrent logic into your database statement, if possible.
Use a non-blocking data structure such as a ConcurrentMap and maintain a set of known entries (which must be warmed up on startup). This allows you to run the method with minimal locking and without synchronizing the code; an atomic putIfAbsent() can be used to check whether the entry must be added or not.
As others have stated, your current approach is fine, although depending on your requirements there are other things to consider:
Is this the only place in your application where these records are inserted into the DB? If not, the insert could still fail even with synchronisation.
How often does the operation fail? If failures are rare compared to the number of times the method runs, it may be better to detect the failure by catching an appropriate exception, given the overhead involved in synchronising threads.
What does your application need to do when it detects this kind of failure?
At first sight your solution seems OK, but if you want to change it, here are two options:
use db transactions
use locks from java.util.concurrent.locks
Lock lock = new ReentrantLock();
.....
createSomething(params)
{
    ....
    ....
    lock.lock();
    try {
        if( !isPresentInDb(params) )
        {
            .....
            .....
            .....
            insertIntoDb(params)
        }
    } finally {
        lock.unlock();
    }
}

Where does the responsibility lie to ensure the ACID properties of a transaction?

I was going through the ACID properties of transactions and encountered the statement below across different sites:
ACID is the acronym for the four properties guaranteed by transactions: atomicity, consistency, isolation, and durability.
My question is specifically about the phrase "guaranteed by transactions". In my experience, these properties are not taken care of by the transaction automatically; as Java developers we need to ensure that the criteria for these properties are met.
Let's go through each property:
Atomicity: assume that when we create a customer, an account must be created too, as it is compulsory. Now, during the transaction, the customer gets created, but during account creation some exception occurs. The developer can go two ways: either he rolls back the complete transaction (atomicity is met in this case) or he commits the transaction, so the customer is created but not the account (which violates atomicity). So does the responsibility lie with the developer?
Consistency: the same reasoning holds for consistency too.
Isolation: per the definition, isolation makes a transaction execute without interference from other processes or transactions. But this is only achieved when we set the isolation level to serializable; otherwise, at levels like read committed or read uncommitted, changes are visible to other transactions. So does the responsibility lie with the developer to make it truly isolated with serializable?
Durability: if we commit the transaction, then even if the application crashes, it should be committed on restart of the application. Not sure whether this needs to be taken care of by the developer or by the database vendor/transaction?
So as per my understanding, these ACID properties are not guaranteed automatically; rather, we as developers should achieve them. Please let me know whether my understanding of each point is correct (a yes/no per point will also do).
As per my understanding, read committed should be the most logical isolation level in most applications, though it depends on the requirements too.
Transactions guarantee ACID, more or less:
1) Atomicity. The transaction guarantees that all changes are made or none of them. But you need to mark the start and end of a transaction and perform the commit or rollback yourself. Depending on the technology you use (EJB...), transactions can be container-managed, with the start and end tied to the whole "method" you are writing. You can control by configuration whether an invoked method requires a new transaction, an existing one, no transaction...
2) Consistency. Guaranteed by atomicity.
3) Isolation. You must define the isolation level your application needs. The default value depends on the database, container... The commonest one is READ COMMITTED. Be careful with locks, as they can cause deadlocks depending on your logic and isolation level.
4) Durability. Managed entirely by the database. If your commit executes without error, nearly all databases guarantee the durability of the changes, but some setups can fail to guarantee it (writes to disk cached in memory and flushed later...).
In general, you should be aware of transactions and either configure them in the container or declare the start and end (commit, rollback) in code.
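As an example of the programmatic style in Spring, transaction boundaries can be drawn with a TransactionTemplate; a sketch assuming a configured PlatformTransactionManager, with the service and method names invented:

import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;

public class TransferService {

    private final TransactionTemplate txTemplate;

    public TransferService(PlatformTransactionManager txManager) {
        this.txTemplate = new TransactionTemplate(txManager);
    }

    public void transfer(long fromId, long toId, long amount) {
        // The callback body is the transaction: it commits when the callback
        // returns normally and rolls back if it throws.
        txTemplate.execute(status -> {
            // debit one account, credit the other
            return null; // TransactionCallback requires a return value
        });
    }
}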
Database transactions are atomic: They either happen in their entirety or not at all. By itself, this says nothing about the atomicity of business transactions. There are various strategies to map business transactions to database transactions. In the simplest case, a business transaction is implemented by one database transaction (where a business transaction is aborted by rolling back the database one). Then, atomicity of database transactions implies atomicity of business transactions. However, things get tricky once business transactions span several database transactions ...
See above.
Your statement is correct. Often, the weaker guarantees are sufficient to prove correctness.
Database transactions are durable (barring hardware failure): if the transaction has committed, its effect will persist until other transactions change the data. However, the calling code might not learn whether a transaction has committed if the database, or the network between the database and the calling code, fails. Therefore
"If we commit the transaction, then even if the application crashes, it should be committed on restart of the application."
is wrong: if the transaction has committed, there is nothing left to do on restart.
To summarize, the database does give strong guarantees - about the behaviour of the database. Obviously, it can not give guarantees about the behaviour of the entire application.

Hibernate: A long read-only transaction will now require a small DB update in the middle

I have written quite a complicated engine of sorts which navigates up and down a large series of objects read in from the database.
So I have code that looks something like this:
public void go(long id) {
    try {
        beginTransaction();
        Foo foo = someDao.find(id);
        anotherObject.doSomething(foo);
        commitTransaction();
    } catch (Exception e) {
        rollbackTransaction();
    }
}
The code in doSomething(...) will call methods to get child objects of Foo and pass those child objects off to other classes and so on.
Prior to my problem, this used to be just one long read-only transaction. Now, however, somewhere in the middle of all of this, there needs to be an update to the database, and it is important that this update is committed straight away. As Hibernate doesn't support nested transactions, how can I deal with this situation so that I can continue to pass my object around and call getter methods to access children, while having that database update committed?
I thought of removing the long-running transaction and having small transactions all over the place. Unfortunately, my code at the moment passes Foo and other child objects everywhere, assuming they are still bound to the session. If that is my only solution, would it mean ugly merge calls everywhere just to re-attach objects to the session so the getter methods work again? I'm sure there must be a more elegant solution.
Do the database update within your transaction, i.e. pass the required information to the thread performing your long transaction.
Alternatively, use entity listeners to signal what needs update, and then use the EntityManager.refresh method.
This will get a bit ugly with multi-threading and all, but note that you probably do not want the transaction to 'just update' at some random point in time, as in many cases that will yield unpredictable results, like breaking for-loops and such.
And if this is an n-level algorithm, is there any way of doing m levels at a time, saving the state (say, the IDs of the current scope), and running the next iteration in a new transaction? For this you can use one method without a transaction, which calls EJB methods that are confined within their own transactions and return state.
If you must stick with Hibernate (and cannot consider accessing the underlying JDBC driver, Spring Transactions or JTA), you can probably just spawn a thread to do the update and have the main thread wait until it's done (Thread.join()).
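If Spring transaction management is an option, the usual approach is to commit the small update in its own transaction with REQUIRES_NEW propagation, which suspends the long-running outer transaction; a sketch with invented names, keeping in mind that the call must go through the Spring proxy (i.e. live on a separate bean):

import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

public class ProgressUpdater {

    // Suspends the caller's long-running transaction, commits this small
    // update immediately, then resumes the outer transaction. Must be called
    // through the Spring proxy from another bean; self-invocation ignores it.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void recordProgress(long fooId, String status) {
        // the small DB update that must be visible straight away
    }
}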
I've bitten the bullet, and I believe splitting the big transaction into smaller transactions to get more atomicity is best. This required some manual eager loading in the code, but my nested-transaction issue is gone.
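For reference, "manual eager loading" here typically means initializing the needed parts of the object graph inside each short transaction, before the session closes, e.g. with a fetch join or Hibernate.initialize; a sketch assuming Hibernate 5+, with an invented 'children' association:

import org.hibernate.Hibernate;
import org.hibernate.Session;

public class FooLoader {

    // Loads Foo with its children fully initialized inside one short
    // transaction, so the object graph stays usable after the session closes.
    public Foo loadFully(Session session, long id) {
        Foo foo = session
                .createQuery("select f from Foo f join fetch f.children"
                        + " where f.id = :id", Foo.class)
                .setParameter("id", id)
                .uniqueResult();
        // For lazy members not covered by the fetch join:
        // Hibernate.initialize(foo.getBars()); // getBars() is hypothetical
        return foo;
    }
}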
