I have a Java-based web server with a singleton DAO object. One of its methods performs SQL operations whose logic must be synchronized in some way to guarantee data
integrity (the method can be called from several Java threads simultaneously).
Is wrapping the operations in a DB transaction (serializable isolation level) better than explicitly synchronizing the DAO method on the server side?
Yes, using transactions is better. With synchronizing in your code, locking on the class, the scope of that lock is your classloader, and standing up a second instance of your application will invalidate your locking, because the two instances are using different locks.
With database transactions you can have multiple instances of your application and the database treats all the transactions the same.
Also with databases you have options like dialing down the isolation level to no higher than what you need for that transaction, or using row-level locking. Those are harder to implement in code and you're still stuck with not being able to deploy a second instance.
It depends heavily on what you want to synchronize. Synchronization is about resources: if your code uses more than one database and the data integrity problem is distributed, you need a transaction context, and you need to know how to manage it properly, not just declare it.
Assuming you have a single database, and assuming your integrity problem is caused by a possible inconsistency between a SELECT and an UPDATE or INSERT that happens later in the method, the right solution is a DB transaction combined with SELECT FOR UPDATE.
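A minimal JDBC sketch of that SELECT FOR UPDATE pattern follows. The table `account(id, balance)`, the class `AccountDao`, and the method `withdraw` are assumptions made up for illustration, and the SQL assumes a database that supports `FOR UPDATE` row locks.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical DAO; the table "account(id, balance)" is an assumption, not from the question.
class AccountDao {
    static final String LOCK_SQL = "SELECT balance FROM account WHERE id = ? FOR UPDATE";
    static final String UPDATE_SQL = "UPDATE account SET balance = ? WHERE id = ?";

    // Reads the row under a row lock, checks it, updates it, and commits.
    // Another transaction that tries to lock the same row waits until commit or rollback.
    static void withdraw(Connection con, long id, long amount) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // start an explicit transaction
        try {
            long balance;
            try (PreparedStatement select = con.prepareStatement(LOCK_SQL)) {
                select.setLong(1, id);
                try (ResultSet rs = select.executeQuery()) {
                    if (!rs.next()) {
                        throw new SQLException("No account with id " + id);
                    }
                    balance = rs.getLong(1);
                }
            }
            if (balance < amount) {
                throw new SQLException("Insufficient balance");
            }
            try (PreparedStatement update = con.prepareStatement(UPDATE_SQL)) {
                update.setLong(1, balance - amount);
                update.setLong(2, id);
                update.executeUpdate();
            }
            con.commit(); // releases the row lock
        } catch (SQLException | RuntimeException e) {
            con.rollback(); // also releases the row lock
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Because the lock lives in the database, this keeps working when a second instance of the application is started, which a Java `synchronized` block cannot offer.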
If your problem is about UPDATE/INSERT on different tables in the same operation, you have two options. The first is to add CONSTRAINTS; this is the preferred method, but in some cases it is not possible.
If a CONSTRAINT is not possible, consider redesigning your data model, because handling this kind of problem by synchronizing application code is the worst solution, although it is still a solution.
I have a Spring Boot web service with some method that does a get-from-DB followed by insert-if-not-present (with some logic in between that I'd rather keep in Java for now).
The method is annotated with @Transactional, but of course with the default isolation level (read committed), it's possible to end up with the same row inserted twice if two calls run in parallel.
If I change isolation level to serializable, then I would get a performance hit.
Would it be better to use plain Java synchronized, and to synchronize on a global object that uniquely represents the item being queried/added? Basically I would intern the string param that gets passed to the method, which represents some item name, and synchronize against that.
Obviously I wouldn't be able to scale horizontally, but let's assume this instance of the web service is the only client of the DB.
Add a unique constraint for the inserted data. This way you will not be able to insert the same data twice.
If you can be sure that the web service is the only client and that only one instance is running, then synchronized will do the job, but it is not good practice. You are better off using strict DB isolation, or even implementing consistency checks and constraints at the DB level.
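For the single-instance case only, the per-item synchronization could be sketched roughly like this. `ItemService`, `getOrCreate`, and the in-memory `table` map are hypothetical stand-ins (a real service would hit the DB), and the sketch uses a lock map instead of `String.intern()` so the lock objects are not shared with unrelated code that happens to intern the same string.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical service; the "table" map stands in for the real database table.
class ItemService {
    private final Map<String, Object> locks = new ConcurrentHashMap<>();
    private final Map<String, String> table = new ConcurrentHashMap<>();
    final AtomicInteger inserts = new AtomicInteger();

    // Get-then-insert-if-absent, serialized per item name within this JVM only.
    String getOrCreate(String name) {
        Object lock = locks.computeIfAbsent(name, k -> new Object());
        synchronized (lock) {
            String existing = table.get(name);   // stands in for the SELECT
            if (existing != null) {
                return existing;
            }
            String created = "item:" + name;     // the logic kept in Java
            table.put(name, created);            // stands in for the INSERT
            inserts.incrementAndGet();
            return created;
        }
    }

    // Calls getOrCreate for the same name from several threads; returns the insert count.
    static int demo(int threads) {
        ItemService service = new ItemService();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all threads at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                service.getOrCreate("widget");
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return service.inserts.get();
    }
}
```

Note that the lock map in this sketch grows with every distinct name, and none of it protects against a second JVM, so a unique constraint in the database is still worth having as a backstop.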
Consider a Spring MVC Java web application that provides a REST API.
Let's say it has many methods, one of which is DELETE /api/foo/{id}, which obviously deletes the foo entity with the given id from the DB.
The problem is that, because of the large amount of data in the DB, this operation is not immediate, so if a client tries to perform multiple delete operations on the same entity simultaneously, say
DELETE /api/foo/123 x N times (by mistake in client software of course),
it causes some unpleasant side effects in the DB (you know, if you try to delete the same entity in several transactions, that's generally not nice).
My question is: what is the best practice in Spring MVC to prevent such situations?
I can certainly introduce synchronisation on the Foo id in each such update method (PUT/DELETE). I would need to do it for all entities and all PUT/DELETE API methods though, which I really don't want to do. I suppose there should be some elegant solution that performs this type of synchronisation at the interceptor/servlet level, i.e. not at the service or controller level.
I can also create a specific interceptor that waits for duplicated requests (requests with the same URL and parameters). But again, that doesn't sound like an elegant solution (unless I can be sure that Spring MVC cannot be configured to do this in a nicer way).
That is a concurrency problem, and it should be handled with the appropriate transaction and locking level. Unfortunately, there is no one-size-fits-all approach here: depending on your actual requirements, you may have to implement optimistic or pessimistic locking, and choose one of the possible transaction levels (from no transaction at all to serializable transactions).
In general, handling such questions at the web level is a bad idea, because you end up with questions like: what should happen if one request wants to delete some data that another one is displaying at the same time? In Spring MVC, the common way is to use transactional methods in the service layer. Additionally, you should declare an optimistic or pessimistic locking scheme in the persistence layer.
Optimistic locking normally gives higher throughput, at the cost of some transactions ending in exceptions. In that case, the current best practice is to report the problem to the user and ask them to send the request again.
I am extracting the following lines from the famous book - Mastering Enterprise JavaBeans™ 3.0.
Concurrent Access and Locking: Concurrent access to data in the database is always protected by transaction isolation, so you need not design additional concurrency controls to protect your
data in your applications if transactions are used appropriately. Unless you make specific provisions, your entities will be protected by container-managed transactions using the isolation levels that are configured for your persistence provider and/or EJB container’s transaction service. However, it is important to understand the concurrency control requirements and semantics of your applications.
Then it talks about Java Transaction API, Container Managed and Bean Managed Transaction, different TransactionAttributes, different Isolation Levels. It also states that -
The Java Persistence specification defines two important features that can be
tuned for entities that are accessed concurrently:
1. Optimistic locking using a version attribute
2. Explicit read and write locks
OK - I read everything and understood it well. But the question is: in which scenarios do I need to use all these techniques? If I use container-managed transactions and they do everything for me, why do I need to bother about all these details? I know the significance of TransactionAttributes (REQUIRED, REQUIRES_NEW) and know in which cases I need to use them, but what about the others? More specifically -
Why do I need Bean Managed transaction?
Why do we need Read and Write Lock on Entity classes?
Why do we need version attribute?
For Q2 and Q3 - I think entity classes are not thread-safe and hence we need locking there. But database access is managed at the EJB level by the JTA API (as stated in the first paragraph), so why do we need to manage the entity classes separately? I know how the lock and version work and why they are required. But why do they come into the picture when JTA is already present?
Can you please provide any answer to them? If you give me some URLs even that will be very highly appreciated.
Many thanks in advance.
The reason for locking is not that entity classes are not thread-safe. Entities must not be shared between threads, that's all.
Your database comes with ACID guarantees, but that is not always sufficient, and you sometimes need to explicitly lock rows to get what you need. Imagine the following scenario:
transaction A reads employee 1 from database
transaction B reads employee 1 from database
transaction A sets employee 1 salary to 3000
transaction B sets employee 1 salary to 4000
transaction A commits
transaction B commits
The end result is that the salary is 4000. The user that started transaction A is completely unaware that even though he set the salary to 3000, another user, concurrently, set it to 4000. Depending on which transaction writes last, the end result is different (and thus unpredictable). That's the kind of situation that can be avoided using optimistic locking.
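To make the salary scenario concrete, here is a small in-memory sketch of the version check that optimistic locking performs. `VersionedEmployee` and its methods are hypothetical; in JPA the version would be a field annotated with `@Version`, the check would happen in the generated UPDATE statement, and the losing transaction would typically see an `OptimisticLockException` rather than a `false` return value.

```java
// Hypothetical in-memory stand-in for the row of employee 1.
class VersionedEmployee {
    private long salary = 2000;
    private long version = 0;

    synchronized long salary() { return salary; }

    synchronized long version() { return version; }

    // Mimics "UPDATE employee SET salary = ?, version = version + 1
    //         WHERE id = 1 AND version = ?".
    // Returns false when the row was changed after the caller read it.
    synchronized boolean updateSalary(long expectedVersion, long newSalary) {
        if (version != expectedVersion) {
            return false; // stale read: someone else committed first
        }
        salary = newSalary;
        version++;
        return true;
    }

    // Replays the scenario: A and B both read, then A writes, then B writes.
    static String run() {
        VersionedEmployee row = new VersionedEmployee();
        long versionSeenByA = row.version();
        long versionSeenByB = row.version();
        boolean aCommitted = row.updateSalary(versionSeenByA, 3000);
        boolean bCommitted = row.updateSalary(versionSeenByB, 4000);
        return aCommitted + "," + bCommitted + "," + row.salary();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true,false,3000"
    }
}
```

Transaction B is rejected instead of silently overwriting A's change, so the user behind B can be told to reload the employee and try again.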
Next scenario: you want to generate purely sequential invoice numbers, without lost values and without duplicates. You could imagine reading and incrementing a value in the database to do that. But two transactions might both read the same value concurrently and then increment it. You would thus have a duplicate. Using a lock on the table row holding the next number avoids this situation.
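The invoice-number case can be simulated in plain Java as follows. This is only an in-memory illustration: the `ReentrantLock` stands in for the database row lock (which in SQL would be taken with something like SELECT ... FOR UPDATE on the counter row), and `InvoiceNumberDemo` and its fields are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class InvoiceNumberDemo {
    private final ReentrantLock rowLock = new ReentrantLock(); // stand-in for the DB row lock
    private long nextNumber = 1;                               // stand-in for the counter row

    long nextInvoiceNumber() {
        rowLock.lock();                  // like locking the counter row
        try {
            long current = nextNumber;   // read the value
            nextNumber = current + 1;    // increment and write it back
            return current;
        } finally {
            rowLock.unlock();            // like the commit releasing the row lock
        }
    }

    // Issues numbers from several threads and checks that the result is
    // exactly 1..total, i.e. no duplicates and no gaps.
    static boolean run(int threads, int perThread) {
        InvoiceNumberDemo counter = new InvoiceNumberDemo();
        Set<Long> issued = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    issued.add(counter.nextInvoiceNumber());
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        long total = (long) threads * perThread;
        if (issued.size() != total) {
            return false;
        }
        for (long n = 1; n <= total; n++) {
            if (!issued.contains(n)) {
                return false;
            }
        }
        return true;
    }
}
```

Without the lock, two threads could read the same `nextNumber` before either writes it back, which is exactly the duplicate described above.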
ThreadLocal<Session> tl = new ThreadLocal<Session>();
tl.set(session);
To get the session and load an entity:
Employee emp = (Employee) tl.get().get(Employee.class, 1);
If our application is web-based, the web container uses a separate thread for each request.
If all these requests use the same single Session object concurrently, we will get
unwanted results in our database operations.
To avoid that, it seems to be good practice to put the session in a ThreadLocal object,
which does not allow concurrent use of the session. But if that is correct, I think the application's performance would be very poor.
What is a good approach in the above scenario?
If I'm on the wrong track, in which situations do we need ThreadLocal?
I'm new to Hibernate, so please excuse me if this question is silly.
Thanks in advance.
Putting the Hibernate Session in ThreadLocal is unlikely to achieve the isolation between requests that you want. Surely you create a new Session for each request using a SessionFactory backed by a connection pooling implementation of DataSource, which means that the local reference to the Session is on the stack anyway. Changing that local reference to a member variable only complicates the code, imho.
Anyhow, ensuring isolation within a single container doesn't address the actual problem - how is data accessed efficiently while maintaining consistency within a multi-threaded environment.
There are two parts to the problem you mention - the first is that a database connection is an expensive resource, the second that you need to ensure some level of data consistency between threads/requests.
The general approach to the resource problem is to use a database connection pool (which I'd guess you're already doing). As each request is processed, connections are obtained from the pool and returned when finished but importantly the connections in the pool are maintained beyond the lifetime of a request thus avoiding the cost of creating a connection each time it is needed.
The consistency problem is a little trickier and there's no one size fits all model. What you need to be doing is thinking about what level of consistency you need - questions like does it matter if data is read at the same time it's being written, do updates absolutely have to be atomic, etc.
Once you know the answers to these questions, there are two places you need to look at consistency - in the database and in the code.
With the database, you need to look at database-level locks and create a scheme suitable for your application by applying the appropriate isolation levels.
With the code, things are a little more complicated. Data is often loaded and displayed for a period of time before updates are written back - no problem if there's a single user, but in a multi-user system it's possible that updates are made based on stale data or that multiple updates occur simultaneously. It may be acceptable to have a policy of last update wins, in which case it's simple, but if not you'll need to use version numbers or old/new comparisons to ensure integrity at the time the updates are applied.
I am not sure whether you are required to use ThreadLocal. Using ThreadLocal to store the session object is definitely not a good idea, especially when you are using Hibernate along with Spring.
A typical scheme for using Hibernate with Spring is:
Inject the sessionFactory in your DAO. I assume that you have sessionFactory already configured which is backed by a pooled datasource.
Now in your DAO class, a session can be accessed as follows.
Session session = sessionFactory.getCurrentSession();
Here is a link to related article.
Please note that this example is specific to the Hibernate 3.x APIs. It takes care of session creation, closure, and thread safety internally, and it's neat too.
Is using the 'synchronized' keyword on methods in a Java DAO going to cause issues when used by a web application?
I ask because I have a multi-threaded standalone application that needs the methods to be synchronized to avoid resource conflicts, as seen here.
java.util.concurrent.ExecutionException: javax.persistence.PersistenceException: org.hibernate.HibernateException: Found shared references to a collection: com.replaced.orm.jpa.Entity.stuffCollection
What I am concerned about is that when a significant number of people try to use the application, the synchronized methods will block and slow the entire application down.
I am using a Spring injected JPA entity manager factory, which provides an entity manager to the DAO. I could technically remove the DAO layer and have the classes call the entity manager factory directly, but I enjoy the separation the DAO provides.
I should also note that I am being very careful not to pass around connected entity ORM objects between threads. I speculate that the resource conflict error comes about when accessing the DAO. I think multiple threads are going at the same time and try to persist or read from the database in non-atomic ways.
In this case, is using a DAO going to do more harm than good?
A big piece of information I left out of the question is that the DAO is not a singleton. If I had been thinking lucidly enough to include that detail I probably wouldn't have asked the question in the first place.
If I understand correctly, Spring creates a new instance of the DAO class for each class that uses it. So the backing entity manager should be unique to each thread. Not sharing the entity manager is, as Rob H answered, the key thing here.
However, now I don't understand why I get errors when I remove synchronized.
According to this thread, the @PersistenceContext annotation creates a thread-safe SharedEntityManager. So you should be able to create a singleton DAO.
You say you are not sharing entity objects across threads. That's good. But you should also make sure you're not sharing EntityManager objects (or Session objects in Hibernate) across threads either. Frameworks like Spring manage this for you automatically by storing the session in a thread-local variable. If you're coding your own DAOs without the help of a framework, you need to take precautions yourself to avoid sharing them.
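If you do code this yourself, the thread-local idea can be sketched like this. `FakeSession` is a hypothetical stand-in for a Hibernate Session or JPA EntityManager, and `SessionHolder` is an illustrative name, not how Spring is actually implemented internally.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Hibernate Session or JPA EntityManager.
class FakeSession {
    private static final AtomicInteger COUNTER = new AtomicInteger();
    final int id = COUNTER.incrementAndGet();
}

class SessionHolder {
    // Each thread lazily gets its own FakeSession on first access.
    private static final ThreadLocal<FakeSession> CURRENT =
            ThreadLocal.withInitial(FakeSession::new);

    static FakeSession current() {
        return CURRENT.get();
    }

    // Call at the end of a request so pooled threads do not keep a stale session.
    static void clear() {
        CURRENT.remove();
    }

    // Returns true if each thread saw one stable session and the two threads
    // saw different sessions.
    static boolean demo() {
        FakeSession[] seen = new FakeSession[2];
        boolean[] stable = new boolean[2];
        Thread[] workers = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> {
                seen[slot] = current();
                stable[slot] = (seen[slot] == current()); // same object within a thread
                clear();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stable[0] && stable[1] && seen[0] != seen[1];
    }
}
```

Since no session is ever visible to two threads, a DAO that only goes through `SessionHolder.current()` has no shared conversational state to protect with `synchronized`.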
Once you do this, there should be no reason to synchronize DAO methods because none of the conversational state will be shared across threads. This is critical for a highly concurrent web application. The alternative is that only one thread will be able to access the DAO at one time, assuming they all share the same DAO instance. Not good at all for throughput.
If the methods need to be synchronized for thread safety, then leave the synchronization there; the blocking is required anyway in that case. If the blocking is not required for the web application case, you can either:
Leave it as is, since the performance hit when there is no contention on the lock is negligible, and insignificant compared with the expense of hitting the database.
Redesign it so that you add a synchronization layer for the standalone application case, which protects the underlying unsynchronized DAO.
Personally, I would leave it as is and profile it to see if you need to refactor. Until then you are simply doing premature optimization.