I have an application that uses Hibernate. There are two main threads: the first collects and modifies data, and the second saves the data to the database. In certain cases the program may try to modify and save the same entity at the same time.
Do I have to make all entities thread-safe (use only synchronized collections, atomic objects instead of primitives, and so on), or does Hibernate take care of it automatically?
Hibernate instantiates objects per session, and a session (together with the entities attached to it) is meant to be used by a single thread, so classic synchronization is not needed (and would not be helpful).
The most common way to take care of concurrent data access and modifications is to use locks.
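A minimal sketch of the session-per-thread pattern, assuming a shared SessionFactory and a hypothetical Item entity (both names are illustrative):

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    // Each thread opens its own Session; entity instances are never
    // shared between the collecting thread and the saving thread.
    Runnable saveTask = () -> {
        try (Session session = sessionFactory.openSession()) {
            Transaction tx = session.beginTransaction();
            Item item = session.get(Item.class, 1L); // this thread's own instance
            item.setName("updated");
            tx.commit();
        }
    };
    new Thread(saveTask, "saver").start();

If both threads really can touch the same row, add an optimistic or pessimistic lock so conflicting commits are detected instead of silently overwriting each other.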
I have a Java-based web server, and I also have a DAO singleton with a method whose SQL operations must be synchronized in some way to guarantee data integrity (the method can be called from several Java threads simultaneously).
I was wondering whether wrapping the operations in a DB transaction (at the serializable isolation level) is better than explicitly synchronizing the DAO's method on the server side.
Yes, using transactions is better. When you synchronize in your code, locking on the class, the scope of that lock is a single JVM (classloader); standing up a second instance of your application defeats the locking, because the two instances are using different locks.
With database transactions you can have multiple instances of your application and the database treats all the transactions the same.
Also with databases you have options like dialing down the isolation level to no higher than what you need for that transaction, or using row-level locking. Those are harder to implement in code and you're still stuck with not being able to deploy a second instance.
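For illustration, a plain-JDBC sketch of the transactional version (the dataSource and the statements are assumptions; JTA or Spring transaction demarcation works the same way):

    import java.sql.Connection;
    import java.sql.SQLException;

    // Wrap the DAO method's statements in one SERIALIZABLE transaction
    // instead of synchronizing the method: the database does the locking,
    // and it keeps working across any number of application instances.
    try (Connection con = dataSource.getConnection()) {
        con.setAutoCommit(false);
        con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        try {
            // ... the SELECTs and UPDATEs that must be atomic ...
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }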
It depends on what exactly you want to synchronize; synchronization is about resources. If your code works against more than one database and the data-integrity problem is distributed, you need a transaction context, and not only declaring it but knowing how to manage it properly.
Assuming you have a single database, and assuming your problem is an integrity issue caused by a possible inconsistency between a SELECT and an UPDATE or INSERT happening later in the method, the right solution is a DB transaction combined with a SELECT ... FOR UPDATE clause.
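A minimal JPA sketch of that pattern, assuming an open EntityManager em and a hypothetical Account entity; LockModeType.PESSIMISTIC_WRITE is issued as SELECT ... FOR UPDATE on most databases:

    import javax.persistence.EntityManager;
    import javax.persistence.LockModeType;

    em.getTransaction().begin();
    // The row stays locked until commit; concurrent transactions that
    // request the same row block here instead of racing past each other.
    Account account = em.find(Account.class, accountId,
            LockModeType.PESSIMISTIC_WRITE);
    account.setBalance(account.getBalance() - amount);
    em.getTransaction().commit(); // releases the row lock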
If your problem is about UPDATEs/INSERTs to different tables in the same operation, you have two options. One is adding database CONSTRAINTS; this is the preferred method, but in some cases it is not possible.
If a CONSTRAINT is not possible, consider redesigning your data model, since handling this kind of problem by synchronizing application code is the worst solution, though it is still a solution.
The Ebean ORM is the go-to ORM for the Play! Java Framework.
As I am choosing between building my own object-relational mappers (and other object-relational behavioural patterns) and using an existing ORM, a criterion that strikes me as really important is whether or not Ebean supports concurrent access to mappers.
Indeed, although Play! uses asynchronous threading, a reader/writer problem can still arise from concurrent requests using the same objects.
Hence the question: does the Ebean ORM support multithreaded use (the reader/writer problem)?
Ebean supports concurrent access to mappers
Yes, EbeanServer is safe for concurrent use by multiple threads. The EbeanServer instance is built once and contains all the metadata about the bean properties etc. (that is, the mapping information).
EbeanServer internally holds some mutating data, such as the L2 cache, performance metrics for query execution, query execution plans etc., but these are written to be thread-safe.
In general, Query objects and query results (object graphs) are not thread-safe and are intended for single-threaded use. You can create read-only object graphs that can't be mutated, and are hence safe for multi-threaded use, via query.setReadOnly(true).
EbeanServer also has support for background fetching via findFutureRowCount(), findFutureList() etc., which internally make a copy of the query and take care of the details. findFutureRowCount() is used internally as part of PagedList to get the total row count.
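A hedged sketch of both features, assuming the io.ebean API of current Ebean versions, an EbeanServer named server, and a hypothetical Customer entity:

    import io.ebean.EbeanServer;
    import io.ebean.FutureList;
    import java.util.List;

    // Read-only object graph: it cannot be mutated, so it is safe
    // to share across threads.
    List<Customer> customers = server.find(Customer.class)
            .setReadOnly(true)
            .findList();

    // Background fetch: Ebean copies the query and executes it
    // on a background thread.
    FutureList<Customer> future = server.find(Customer.class).findFutureList();
    List<Customer> fetched = future.get(); // blocks until the fetch completes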
I am quoting the following lines from the famous book Mastering Enterprise JavaBeans™ 3.0.
Concurrent Access and Locking: Concurrent access to data in the database is always protected by transaction isolation, so you need not design additional concurrency controls to protect your data in your applications if transactions are used appropriately. Unless you make specific provisions, your entities will be protected by container-managed transactions using the isolation levels that are configured for your persistence provider and/or EJB container’s transaction service. However, it is important to understand the concurrency control requirements and semantics of your applications.
Then it talks about Java Transaction API, Container Managed and Bean Managed Transaction, different TransactionAttributes, different Isolation Levels. It also states that -
The Java Persistence specification defines two important features that can be tuned for entities that are accessed concurrently:
1. Optimistic locking using a version attribute
2. Explicit read and write locks
OK, I read everything and understood it well. But the question is: in which scenarios do I need to use all these techniques? If I use Container Managed transactions and they do everything for me, why do I need to bother about all these details? I know the significance of TransactionAttributes (REQUIRED, REQUIRES_NEW) and in which cases to use them, but what about the others? More specifically:
1. Why do I need Bean Managed transactions?
2. Why do we need read and write locks on entity classes?
3. Why do we need a version attribute?
For Q2 and Q3: I think entity classes are not thread-safe and hence we need locking there. But the database is already managed at the EJB level by the JTA API (as stated in the first paragraph), so why do we need to manage the entity classes separately? I know how locks and the version attribute work and why they are required. But why do they come into the picture when JTA is already present?
Can you please provide answers to these? Even some URLs would be highly appreciated.
Many thanks in advance.
You don't need locking because entity classes are not thread-safe. Entities must not be shared between threads, that's all.
Your database comes with ACID guarantees, but that is not always sufficient, and you sometimes need to explicitly lock rows to get what you need. Imagine the following scenarios:
transaction A reads employee 1 from database
transaction B reads employee 1 from database
transaction A sets employee 1 salary to 3000
transaction B sets employee 1 salary to 4000
transaction A commits
transaction B commits
The end result is that the salary is 4000. The user that started transaction A is completely unaware that even though he set the salary to 3000, another user, concurrently, set it to 4000. Depending on which transaction writes last, the end result is different (and thus unpredictable). That's the kind of situation that can be avoided using optimistic locking.
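A minimal sketch of optimistic locking with JPA, using a hypothetical Employee entity; with a @Version attribute, the second commit in the scenario above fails with an OptimisticLockException instead of silently overwriting the first:

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Version;

    @Entity
    public class Employee {

        @Id
        private Long id;

        private int salary;

        // Incremented automatically on every update; an UPDATE carrying
        // a stale version matches no row, and the persistence provider
        // throws an OptimisticLockException.
        @Version
        private long version;

        // getters and setters omitted
    }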
Next scenario: you want to generate purely sequential invoice numbers, without lost values and without duplicates. You could imagine reading and incrementing a value in the database to do that. But two transactions might both read the same value concurrently and then increment it, and you would thus get a duplicate. Using a lock on the table row holding the next number avoids this situation.
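A hedged sketch of that counter pattern with JPA, assuming a hypothetical InvoiceCounter entity whose single row holds the next number:

    import javax.persistence.EntityManager;
    import javax.persistence.LockModeType;

    em.getTransaction().begin();
    // The pessimistic row lock serializes all readers of the counter,
    // so no two transactions can read (and then reuse) the same value.
    InvoiceCounter counter = em.find(InvoiceCounter.class, COUNTER_ID,
            LockModeType.PESSIMISTIC_WRITE);
    long invoiceNumber = counter.getNextNumber();
    counter.setNextNumber(invoiceNumber + 1);
    em.getTransaction().commit(); // lock released; invoiceNumber is unique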
I need to update a whole collection in a background thread, but read operations might take place at the same time. It takes about 3 seconds to update the collection when I benchmark it. Is there any way to lock a collection while it is being updated? I tried creating a new collection, inserting all the documents into it, and renaming it to the original collection with "dropTarget=true", but I am not sure how safe and stable that is in terms of sharding. I read that renameCollection is incompatible with sharding.
It would be great if someone could suggest a good approach.
Thanks.
You presented two possible strategies for updating your collection: doing it in place with a lock on it, or using a temporary collection.
As the MongoDB documentation clearly states, renameCollection does not work for sharded collections (http://docs.mongodb.org/manual/reference/command/renameCollection/). From my understanding this means that the collection you want to rename isn't sharded; and since you need to delete the target collection before you do the actual renaming, you'll most likely lose any previously kept sharding information, so you would need to reactivate sharding. I highly discourage using the two-collection approach, especially if you're sharding your data.
You would need to get all the data from your sharded collection and store it centrally; once you're done updating, you would need to rename the collection and shard it again. This causes a lot of I/O for your whole system, especially for the client doing the update.
Depending on your system architecture (if it has a single point of entry), you could easily hold a global flag telling you whether the collection update is currently running, forbidding other write operations in the meantime.
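A minimal sketch of such a flag, assuming a single application instance (all names are illustrative):

    import java.util.concurrent.atomic.AtomicBoolean;

    public class CollectionUpdateGuard {

        private static final AtomicBoolean UPDATING = new AtomicBoolean(false);

        // Run the background update exclusively; fails fast if one is running.
        public static void runExclusiveUpdate(Runnable update) {
            if (!UPDATING.compareAndSet(false, true)) {
                throw new IllegalStateException("collection update already running");
            }
            try {
                update.run();
            } finally {
                UPDATING.set(false);
            }
        }

        // Writers check this flag before issuing their own writes.
        public static boolean isUpdating() {
            return UPDATING.get();
        }
    }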
For multiple entry points into your MongoDB you might try $isolated, but that doesn't work with sharded collections either. And I'm not sure whether it allows read operations; the documentation isn't very clear.
Is it strictly disallowed to write any data while the update is in progress? What type of updates do you perform? Can they influence each other, or would concurrent writes be possible?
We are building a product, so I need some help from a performance point of view.
We are using the complete Spring stack (MVC, JPA, Security, etc.).
We have a requirement where, say, for a particular flow there can be 100 business rules getting executed at the same time. There can be any number of such flows and business rules.
When executed, these rules fetch records from tables in the database; these will contain a few LAZILY INITIALIZED ENTITIES as well.
I used Futures/Callables for multithreading, but the problem is that the LAZY associations fail to load. It throws a Hibernate lazy-loading exception, probably because the TRANSACTION does not get distributed across the different threads.
Please let me know if there is another way to approach this.
If an entity or entity collection is lazily fetched and you access it from another thread, you will get a LazyInitializationException, because lazily loaded entities can be accessed only within a transaction, and a transaction won't span threads.
You can use the DTO pattern, or, if you are sharing an entity across threads, call the getters of its lazily initialized collections within the transaction so that they are fetched inside the transaction itself.
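A hedged sketch of the second option, assuming a Spring-style transactional service with an injected EntityManager em and a hypothetical Order entity with a lazy items collection:

    import org.hibernate.Hibernate;
    import org.springframework.transaction.annotation.Transactional;

    @Transactional
    public Order loadOrderForWorker(long id) {
        Order order = em.find(Order.class, id);
        // Touch the lazy collection while the transaction (and Session)
        // is still open; the loaded graph can then be handed to another
        // thread without triggering a LazyInitializationException.
        Hibernate.initialize(order.getItems());
        return order;
    }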
If you really need async processing, then I suggest you use the Java EE-specified way, i.e. JMS/MDBs.
Anyway, it seems like you want all the data loaded for your processing in any case. So eagerly fetch all the required data and then submit it for parallel processing, or let each of the tasks (Callables) fetch the data it requires within its own transaction. Essentially, I am asking you to change your design so that transactional boundaries don't cross multiple threads; localize each boundary within a single thread.
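A minimal sketch of the eager-fetch option, with illustrative names (Rule, conditions, RuleResult, and executeRule are assumptions):

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.stream.Collectors;

    // JOIN FETCH loads the lazy association inside this (single-threaded)
    // transaction, so the worker threads only ever touch fully loaded data.
    List<Rule> rules = em.createQuery(
            "select distinct r from Rule r join fetch r.conditions", Rule.class)
            .getResultList();

    ExecutorService pool = Executors.newFixedThreadPool(8);
    List<Future<RuleResult>> results = rules.stream()
            .map(rule -> pool.submit(() -> executeRule(rule))) // no lazy loading here
            .collect(Collectors.toList());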