Is using the 'synchronized' keyword on methods in a Java DAO going to cause issues when used by a web application?
I ask because I have a multi-threaded standalone application that needs the methods to be synchronized to avoid a resource conflict, as seen here:
java.util.concurrent.ExecutionException: javax.persistence.PersistenceException: org.hibernate.HibernateException: Found shared references to a collection: com.replaced.orm.jpa.Entity.stuffCollection
What I am concerned about is that when a significant number of people try and use the application that the synchronized methods will block and slow the entire application down.
I am using a Spring injected JPA entity manager factory, which provides an entity manager to the DAO. I could technically remove the DAO layer and have the classes call the entity manager factory directly, but I enjoy the separation the DAO provides.
I should also note that I am being very careful not to pass around connected entity ORM objects between threads. I speculate that the resource conflict error comes about when accessing the DAO. I think multiple threads are going at the same time and try to persist or read from the database in non-atomic ways.
In this case, is using a DAO going to do more harm than help?
A big piece of information I left out of the question is that the DAO is not a singleton. If I had been thinking lucidly enough to include that detail I probably wouldn't have asked the question in the first place.
If I understand correctly, Spring creates a new instance of the DAO class for each class that uses it. So the backing entity manager should be unique to each thread. Not sharing the entity manager is, as Rob H answered, the key thing here.
However, now I don't understand why I get errors when I remove synchronized.
According to this thread, the @PersistenceContext annotation creates a thread-safe SharedEntityManager. So you should be able to create a singleton DAO.
You say you are not sharing entity objects across threads. That's good. But you should also make sure you're not sharing EntityManager objects (or Session objects in Hibernate) across threads either. Frameworks like Spring manage this for you automatically by storing the session in a thread-local variable. If you're coding your own DAOs without the help of a framework, you need to take precautions yourself to avoid sharing them.
Once you do this, there should be no reason to synchronize DAO methods because none of the conversational state will be shared across threads. This is critical for a highly concurrent web application. The alternative is that only one thread will be able to access the DAO at one time, assuming they all share the same DAO instance. Not good at all for throughput.
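For illustration, here is a minimal sketch of what such a singleton DAO can look like with Spring; the Stuff entity and the class and method names are my own, not from the question:

```java
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

@Repository
public class StuffDao {

    // Spring injects a thread-safe SharedEntityManager proxy here;
    // each calling thread is routed to its own transactional
    // EntityManager, so this singleton DAO needs no synchronized methods.
    @PersistenceContext
    private EntityManager em;

    @Transactional
    public void save(Stuff stuff) { // Stuff is a hypothetical entity
        em.persist(stuff);
    }

    @Transactional(readOnly = true)
    public Stuff findById(long id) {
        return em.find(Stuff.class, id);
    }
}
```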
If it needs to be synchronized for thread safety, then leave the synchronization in place; the blocking is required anyway in that case. If the blocking is not required for the web application case, you can either:

- leave it as is, since the performance hit when there is no contention on the lock is negligible, and insignificant compared to the expense of hitting the database; or
- redesign it so that you add a synchronization layer for the standalone-application case which protects the underlying unsynchronized DAO (see the sketch after this list).
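A sketch of that second option, assuming a hypothetical StuffDao interface: the web application injects the plain implementation, while the standalone application wraps it in a synchronizing decorator:

```java
// Hypothetical DAO interface shared by both applications.
public interface StuffDao {
    void save(Stuff stuff);
    Stuff findById(long id);
}

// Decorator used only by the standalone application; the web app
// injects the plain, unsynchronized implementation directly.
public class SynchronizedStuffDao implements StuffDao {

    private final StuffDao delegate;

    public SynchronizedStuffDao(StuffDao delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized void save(Stuff stuff) {
        delegate.save(stuff);
    }

    @Override
    public synchronized Stuff findById(long id) {
        return delegate.findById(id);
    }
}
```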
Personally, I would leave it as is and profile it to see if you need to refactor. Until then you are simply doing premature optimization.
Related
I have a Java-based web server, and I also have a DAO singleton object with a method whose SQL operations' logic must be synchronized in some way in order to guarantee data integrity (the method can be accessed from several Java threads simultaneously).
I was wondering whether DB transaction wrapping (serializable level) is better than explicit synchronization of the DAO's method on the server side?
Yes, using transactions is better. With synchronization in your code, locking on the class, the scope of that lock is your classloader; standing up a second instance of your application will invalidate your locking, because the two instances are using different locks.
With database transactions you can have multiple instances of your application and the database treats all the transactions the same.
Also with databases you have options like dialing down the isolation level to no higher than what you need for that transaction, or using row-level locking. Those are harder to implement in code and you're still stuck with not being able to deploy a second instance.
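As a hedged sketch of what this looks like with Spring's declarative transactions (the service and method names are invented):

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class AccountService {

    // The database serializes conflicting transactions for us, and this
    // keeps working across multiple instances of the application,
    // unlike a synchronized block scoped to a single JVM/classloader.
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void transfer(long fromId, long toId, long amount) {
        // SELECT and UPDATE statements issued here all run inside
        // one serializable database transaction.
    }
}
```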
It depends on exactly what you want to synchronize; synchronization is about resources. If you have more than one database in your code and the data integrity problem is distributed, you need a transaction context, not only declaring it but knowing how to manage it properly.
Assuming you have a single database, and assuming your problem is integrity compromised by a possible inconsistency between a SELECT and an UPDATE or INSERT happening later in the method, the right solution would be a DB transaction and the use of a SELECT FOR UPDATE clause.
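In JPA terms, SELECT FOR UPDATE is typically obtained with a pessimistic lock; a minimal sketch, assuming a hypothetical Account entity:

```java
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

public class AccountRepository {

    private final EntityManager em;

    public AccountRepository(EntityManager em) {
        this.em = em;
    }

    // Must run inside an active transaction. PESSIMISTIC_WRITE typically
    // issues SELECT ... FOR UPDATE, so the row stays locked until commit
    // and the later UPDATE cannot race with another transaction's read.
    public void credit(long accountId, long amount) {
        Account account = em.find(Account.class, accountId,
                LockModeType.PESSIMISTIC_WRITE);
        account.setBalance(account.getBalance() + amount);
        // the change is flushed as an UPDATE at commit
    }
}
```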
If your problem is about UPDATEs/INSERTs of different tables in the same operation, you have two options. One is adding CONSTRAINTS; this is the preferred method, but in some cases it is not possible.
In the case that a CONSTRAINT is not possible, consider a redesign of your data model, as managing this kind of problem by synchronizing app code is the worst solution; but even so, it is a solution.
Is there an example of a real-world application where you can pass the same object between the 3 tiers (UI, business, persistence layer)? For the sake of simplicity, let it be entity beans.
I mean, can I getReference() from my EntityManager, send the object to the user, and let the user edit it and create it, without copying it at any point in my code?
Are there any concurrency issues or other known issues if you choose this option?
What are the drawbacks of this option?
I know it would be nice if we wrapped the object; that way we can attach and package entities, and the object can travel inside a single wrapper, maybe a ResponseItem, to which we can attach other attributes like a dirty flag. But if I want to keep things simple, is that possible? Or will EntityManagers mess the whole thing up? (I feel they will, and can't handle e.g. detached objects properly, so it is better to separate the whole persistence and UI tiers and deep-copy the entities....)
Thanks for any answer.
I found this answer in the book Pro JPA 2: Mastering the Java™ Persistence API.
I think the main reason for deep-copying entities is concurrent thread access, per this passage:
Applications may not access an entity directly from multiple threads while it is managed by a persistence context. An application may choose, however, to allow entities to be accessed concurrently when they are detached. If it chooses to do so, the synchronization must be controlled through the methods coded on the entity. Concurrent entity state access is not recommended, however, because the entity model does not lend itself well to concurrent patterns. It would be preferable to simply copy the entity and pass the copied entity to other threads for access and then merge any changes back into a persistence context when they need to be persisted.
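A minimal sketch of the copy-and-merge approach the book describes, assuming a hypothetical Item entity with a deep-copy constructor:

```java
import javax.persistence.EntityManager;

public class ItemHandoff {

    // Thread A: load the entity, then hand an independent copy to
    // another thread instead of the managed instance itself.
    public Item copyForOtherThread(EntityManager em, long itemId) {
        Item managed = em.find(Item.class, itemId);
        return new Item(managed); // hypothetical deep-copy constructor
    }

    // Later, in a thread with its own persistence context: merge()
    // folds the copy's changes back into a managed instance, which is
    // written to the database when the transaction commits.
    public void persistChanges(EntityManager em, Item copy) {
        em.merge(copy);
    }
}
```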
I tend to use Hibernate in combination with the Spring framework and its declarative transaction demarcation capabilities (e.g., @Transactional).
As we all know, Hibernate tries to be as non-invasive and as transparent as possible; however, this proves a bit more challenging when employing lazy-loaded relationships.
I see a number of design alternatives with different levels of transparency.
1. Make relationships not lazy-loaded (e.g., fetchType=FetchType.EAGER)
   - This violates the entire idea of lazy loading.
2. Initialize collections using Hibernate.initialize(proxyObj) (see the sketch after this list)
   - This implies relatively high coupling to the DAO.
   - Although we can define an interface with initialize, other implementations are not guaranteed to provide any equivalent.
3. Add transaction behaviour to the persistent model objects themselves (using either a dynamic proxy or @Transactional)
   - I've not tried the dynamic proxy approach, and I never seemed to get @Transactional working on the persistent objects themselves, probably because Hibernate is operating on a proxy to begin with.
   - Loss of control over when transactions actually take place.
4. Provide both a lazy and a non-lazy API, e.g., loadData() and loadDataWithDeps()
   - Forces the application to know when to employ which routine; again, tight coupling.
   - Method overflow: loadDataWithA(), ...., loadDataWithX()
5. Force lookup of dependencies, e.g., by only providing byId() operations
   - Requires a lot of non-object-oriented routines, e.g., findZzzById(zid) and then getYyyIds(zid) instead of z.getY().
   - It can be useful to fetch each object in a collection one by one if there's a large processing overhead between the transactions.
6. Make part of the application @Transactional instead of only the DAO
   - Possible considerations of nested transactions.
   - Requires routines adapted for transaction management (e.g., sufficiently small).
   - Small programmatic impact, although it might result in large transactions.
7. Provide the DAO with dynamic fetch profiles, e.g., loadData(id, fetchProfile)
   - Applications must know which profile to use when.
8. AOP-style transactions, e.g., intercept operations and perform transactions when necessary
   - Requires byte-code manipulation or proxy usage.
   - Loss of control over when transactions are performed.
   - Black magic, as always :)
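To make options 2 and 7 concrete, here is a hedged sketch; the Order entity, its getLines() collection, the profile name, and the DAO shape are my inventions:

```java
import org.hibernate.Hibernate;
import org.hibernate.Session;

public class OrderDao {

    private final Session session;

    public OrderDao(Session session) {
        this.session = session;
    }

    // Option 2: force-load the lazy collection while the session is open,
    // so callers can traverse it after the session is gone.
    public Order loadOrderWithLines(long id) {
        Order order = session.get(Order.class, id);
        Hibernate.initialize(order.getLines());
        return order;
    }

    // Option 7: the caller picks a fetch profile declared on the entity,
    // e.g. @FetchProfile(name = "with-lines", ...) on Order (hypothetical).
    public Order loadOrder(long id, String fetchProfile) {
        session.enableFetchProfile(fetchProfile);
        try {
            return session.get(Order.class, id);
        } finally {
            session.disableFetchProfile(fetchProfile);
        }
    }
}
```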
Did I miss any option?
Which is your preferred approach when trying to minimize the impact of lazy-loaded relationships in your application design?
(Oh, and sorry for WoT)
As we all know, Hibernate tries to be as non-invasive and as transparent as possible
I would say the initial assumption is wrong. Transparent persistence is a myth, since the application always has to take care of the entity lifecycle and of the size of the object graph being loaded.
Note that Hibernate cannot read your mind; therefore, if you know that you need a particular set of dependencies for a particular operation, you need to express your intentions to Hibernate somehow.
From this point of view, solutions that express these intentions explicitly (namely, 2, 4 and 7) look reasonable and don't suffer from the lack of transparency.
I am not sure which problem (caused by laziness) you're hinting at, but for me the biggest pain is avoiding the loss of session context in my own application caches. A typical case:
- object foo is loaded and put into a map;
- another thread takes this object from the map and calls foo.getBar() (something that was never called before and is lazily evaluated);
- boom!
So, to address this we have a number of rules:
- wrap sessions as transparently as possible (e.g., OpenSessionInViewFilter for webapps);
- have a common API for threads/thread pools where db session bind/unbind is done somewhere high in the hierarchy (wrapped in try/finally) so subclasses don't have to think about it;
- when passing objects between threads, pass IDs instead of the objects themselves; the receiving thread can load the object if it needs to;
- when caching, never cache objects but their IDs; have an abstract method in your DAO or manager class to load the object from the 2nd-level Hibernate cache when you know the ID. The cost of retrieving objects from the 2nd-level Hibernate cache is still far cheaper than going to the DB.
This, as you can see, is indeed nowhere close to non-invasive and transparent. But the cost is still bearable compared to the price I'd have to pay for eager loading. The problem with the latter is that sometimes it leads to a butterfly effect when loading a single referenced object, let alone a collection of entities. Memory consumption, CPU usage and latency, to name just a few, are also far worse with eager loading, so I guess I can live with my approach.
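A sketch of the last two rules, with invented names (FooCache, Foo): the cache holds IDs only, and the accessing thread reloads through its own current session:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.hibernate.SessionFactory;

public class FooCache {

    private final Map<String, Long> idsByKey = new ConcurrentHashMap<>();
    private final SessionFactory sessionFactory;

    public FooCache(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void put(String key, Foo foo) {
        idsByKey.put(key, foo.getId()); // never store the entity itself
    }

    public Foo get(String key) {
        Long id = idsByKey.get(key);
        if (id == null) {
            return null;
        }
        // Loads through the calling thread's own session (assumes a
        // current-session context is configured); with the 2nd-level
        // cache enabled this is usually a cache hit, not a DB round trip.
        return sessionFactory.getCurrentSession().get(Foo.class, id);
    }
}
```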
A very common pattern is to use OpenEntityManagerInViewFilter if you're building a web application.
If you're building a service, I would open the TX on the public method of the service, rather than on the DAOs, as very often a method needs to get or update several entities.
This will solve any "lazy load" exception. If you need something more advanced for performance tuning, I think fetch profiles are the way to go.
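For reference, a minimal sketch of registering that filter in a Servlet 3.0 application via Spring's Java-based initializer (AppConfig is a hypothetical @Configuration class):

```java
import javax.servlet.Filter;
import org.springframework.orm.jpa.support.OpenEntityManagerInViewFilter;
import org.springframework.web.servlet.support.AbstractAnnotationConfigDispatcherServletInitializer;

public class WebAppInitializer
        extends AbstractAnnotationConfigDispatcherServletInitializer {

    @Override
    protected Class<?>[] getRootConfigClasses() {
        return new Class<?>[] { AppConfig.class }; // hypothetical config
    }

    @Override
    protected Class<?>[] getServletConfigClasses() {
        return null;
    }

    @Override
    protected String[] getServletMappings() {
        return new String[] { "/" };
    }

    @Override
    protected Filter[] getServletFilters() {
        // Keeps the EntityManager open for the whole request, so lazy
        // associations can still be traversed while rendering the view.
        return new Filter[] { new OpenEntityManagerInViewFilter() };
    }
}
```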
I've recently taken on the database/hibernate side of our project and am having terrible trouble understanding some fundamentals of our design regarding the use of managed sessions.
We have a util class containing a static session that is only initialised once. Retrieval of the session is used by every DAO in the system via a static method getBoundSession(). The application runs 24/7. Is this a common design?
One of the benefits, which is extremely useful, is that lazy attributes/collections on domain objects can be used throughout the business logic tier, since the session is always open. Another benefit is that retrieved objects stay cached within the session.
I feel we must be using Hibernate in the wrong way, it just doesn't seem right to have a single permanently open session. Also it causes problems when separate threads are using the util class, hence sharing the session. On the flip side I can't find a way to achieve the above benefits (particularly the first) with a different design. Can anyone shed any light on this?
Thanks
James
We have a util class containing a static session that is only initialised once. Retrieval of the session is used by every DAO in the system via a static method getBoundSession(). The application runs 24/7. Is this a common design?
No, it's not. The most common pattern in a multi-user client/server application is session-per-request, and a session-per-application approach in a multi-user application is not only an anti-pattern, it's totally wrong:
- A Session is not thread-safe.
- You should roll back the transaction and close the Session after a Hibernate exception if you want to keep object state and the database in sync.
- The Session will grow indefinitely if you keep it open too long.
You really need to read the whole Chapter 11. Transactions and Concurrency.
On the flip side I can't find a way to achieve the above benefits (particularly the first) with a different design.
Either use the OSIV (Open Session In View) pattern or explicitly load what you need per flow. And if you want to benefit from global caching, use the second-level cache.
Keeping a session open for an extended period of time is OK (although it should not be for eternity :-). A session should identify a unit of work: a coherent set of queries/updates which logically belong together. Can you identify such units in your app, e.g., client requests or conversations? If so, create a separate session for each of these.
You should also definitely use a separate session per thread (typically a unit of work is handled by a single thread anyway). A simple way to achieve this is to use thread-local storage.
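A minimal sketch of the thread-local approach; the HibernateUtil name and the getBoundSession()/closeBoundSession() methods mirror the question's utility class but are otherwise my own:

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateUtil {

    private static final SessionFactory sessionFactory =
            new Configuration().configure().buildSessionFactory();

    // One Session per thread: each thread lazily opens its own session
    // and reuses it for the duration of its unit of work.
    private static final ThreadLocal<Session> threadSession = new ThreadLocal<>();

    public static Session getBoundSession() {
        Session session = threadSession.get();
        if (session == null || !session.isOpen()) {
            session = sessionFactory.openSession();
            threadSession.set(session);
        }
        return session;
    }

    // Call at the end of the unit of work (e.g. in a finally block)
    // so the session does not live for the lifetime of the thread.
    public static void closeBoundSession() {
        Session session = threadSession.get();
        threadSession.remove();
        if (session != null && session.isOpen()) {
            session.close();
        }
    }
}
```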
It's an anti-pattern.
If you use one session for all requests, then consider 100 clients (100 requests/threads) running almost simultaneously. You detach something from the session, but then another user reloads the same thing. You will need synchronization, which will hurt performance. And you will get totally random behaviour that will be a nightmare to debug.
The SessionFactory is static/per-application, not the Session. The factory should build a session whenever one is required. Read the sessions and transactions docs in the Hibernate documentation.
I'm trying to understand the best practice of using ThreadLocal for the above questions. From my understanding, the reason for using it is to ensure only one session/PM is created for the entire application. My questions are:
1. Is there any impact of using ThreadLocal like this on a clustered application (for example, Google App Engine)?
2. If I use "transactional" begin/commit in my application, I do not need to use ThreadLocal, right? Since the "transaction" already ensures my session is opened and closed properly?
3. If I need to use a "transactional" tx, should it be in a ThreadLocal as well?
4. Why not just use "static" instead of "threadlocal"?
I'm interested to hear feedback from you all regarding the advantages/disadvantages of using this technique.
1. Probably not, unless your clustering software can migrate threads between nodes. In that case, you'd need to migrate the thread-local data as well.
2. No. The transaction is attached to the session, so you must keep both in sync. While you can begin a transaction in thread A and commit it in thread B, it's usually very hard to make sure that this works reliably. Therefore: don't.
3. Yes.
4. static is global for the whole application; a ThreadLocal is global per thread (see the small demo below).
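A tiny self-contained demo of that difference:

```java
public class ScopeDemo {

    // One value shared by every thread in the application.
    private static String staticValue = "initial";

    // Each thread sees its own independent copy.
    private static final ThreadLocal<String> threadValue =
            ThreadLocal.withInitial(() -> "initial");

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            staticValue = "changed by t";    // visible to every thread
            threadValue.set("changed by t"); // visible only inside t
        });
        t.start();
        t.join(); // join() makes t's writes visible to the main thread

        System.out.println(staticValue);       // prints "changed by t"
        System.out.println(threadValue.get()); // prints "initial"
    }
}
```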
Conclusion: If you're a beginner in this area, I suggest using Spring. The Spring framework solves many of these problems for you and helps you with useful error messages when something breaks.
Follow the documentation to the letter, especially when it doesn't make sense; chances are that you missed something important and the Spring guys are right.
ThreadLocal is not used to create one session for the whole application; it is used to create one session for every thread. Every user session will be one thread, so the ThreadLocal ensures that every user accessing your web page/database will get their own database connection. If you used a static singleton pattern, every user on the server would use the same database connection, and I don't know how that would work out.
The implementation of many transaction engines actually uses ThreadLocal to associate the session state you have with the database to a particular thread. This makes, for instance, running multiple threads inside one transaction very difficult.
ThreadLocal gives you thread safety while remaining queryable, in a semi-static way, by another piece of code later on; it's a thread-global variable. This makes it useful for temporary but session-aware information. Another use beyond transactions might be holding onto internal parameters for authorisation, which are then checked by a proxy.