I have a Spring Boot web service with some method that does a get-from-DB followed by insert-if-not-present (with some logic in between that I'd rather keep in Java for now).
The method is annotated with #Transactional, but of course with the default isolation level (read committed), it's possible to end up with the same row inserted twice if two run in parallel.
If I change isolation level to serializable, then I would get a performance hit.
Would it be better to use plain Java synchronized, and to synchronize on a global object that uniquely represents the item being queried/added? Basically I would intern the string param that gets passed to the method, which represents some item name, and synchronize against that.
Obviously I wouldn't be able to scale horizontally, but let's assume this instance of the web service is the only client of the DB.
Add unique constraint for inserted data. This way you will be not able to insert the same data twice.
If you can be sure that the Web Service is the only client and that only one instance is running than Synchronized will do the work perfectly but it's not a good practices and you better do Strict Db isolation even implement coherence checks and contraint at DB Level
Related
I have Java-based web server, and I also have DAO singleton object with method, whose SQL operations' logic must be synchronized in some way in order to guarantee data
integrity (method can be accessed from several Java threads simultaneously).
I was wondering to know whether DB transaction wrapping (serializable level) is better than DAO's method explicit synchronization in server side?
Yes, using transactions is better. With synchronizing in your code, locking on the class, the scope of that lock is your classloader, and standing up a second instance of your application will invalidate your locking, because the two instances are using different locks.
With database transactions you can have multiple instances of your application and the database treats all the transactions the same.
Also with databases you have options like dialing down the isolation level to no higher than what you need for that transaction, or using row-level locking. Those are harder to implement in code and you're still stuck with not being able to deploy a second instance.
Depends deeply in what is what you want to synchronize, synchronization is about resources, if you have more than one database in your code, and the data integrity problem is distributed, you need a transaction context, not only declaring it but knowing how to manage it properly.
Assuming you have a single database and assuming your problem is integrity caused by a possible inconsistency of a SELECT clause with a UPDATE or INSERT clause happening later in the method, The right solution would be a DB transaction and the use of a SELECT FOR UPDATE clause.
If your problem is about UPDATE/INSERT of different tables in the same operation you may have two resources, one is including CONSTRAINTS, this is the preferred method, but in some cases is not possible.
In the case that a CONTRAINT is not possible, consider a redesign of your DATAMODEL as managing this kind of problems synchronyzing app code is the worst solution, but even so is a solution.
Consider Spring MVC java web-application, which provides some REST API.
Let's say it has many methods, one of them is DELETE /api/foo/{id}, which obviously deletes foo entity from the DB with given id.
The problem is that due to big data in the DB, this operation is not immediate, so if client tries perform simultaneously multiply delete operations on same entity, say
DELETE /api/foo/123 x N times (by mistake in client software of course),
it causes some unpleasant side effects in the DB (you know, if you try delete same entity in several transactions, that's not generally nice).
My question is: what is the best practice in Spring MVC to prevent such situations?
I can certainly introduce synchronisation on Foo id in each such update method (PUT/DELETE). I will need to do it for all entities and all PUT/DELETE API methods though, which I really don't want to do. I suppose it should be some elegant and nice solution, how to perform such type of synchronisation on interceptor/servlet level, i.e. not on service of controller level.
I can also create specific interceptor and perform there waiting for duplicated requests (requests with same URL and parameters). But again, it doesn't sound as an elegant solution (until I will be ensured that it is not possible to configure in Spring MVC somehow in more beauty way).
That is a problem of concurrency that shall be handled by using the appropriate transaction and locking level. Unfortunately, there is no single size fits all way here and depending on your actual requirements, you could have to implement optimistic or pessimistic locking, as well as one of the possible transaction level (from no transaction at all to serializable transactions).
In general, handling such questions at the web level is a bad idea, because you will end in questions like what to do in on request wants to delete some data that another one is displaying at the same time? In SpringMVC, the common way is to use transactional methods in the service layer. Additionaly, you should declare an optimistic or pessimistic locking system in the persistence layer.
Optimistic layer normally give a higher throughput, at the cost of some transaction ending in exceptions. In that case, current best practices are now to report the problem to the user asking him/her to send his/her request again.
I am extracting the following lines from the famous book - Mastering Enterprise JavaBeans™ 3.0.
Concurrent Access and Locking:Concurrent access to data in the database is always protected by transaction isolation, so you need not design additional concurrency controls to protect your
data in your applications if transactions are used appropriately. Unless you make specific provisions, your entities will be protected by container-managed transactions using the isolation levels that are configured for your persistence provider and/or EJB container’s transaction service. However, it is important to understand the concurrency control requirements and semantics of your applications.
Then it talks about Java Transaction API, Container Managed and Bean Managed Transaction, different TransactionAttributes, different Isolation Levels. It also states that -
The Java Persistence specification defines two important features that can be
tuned for entities that are accessed concurrently:
1.Optimistic locking using a version attribute
2.Explicit read and write locks
Ok - I read everything and understood them well. But the question comes in which scenario I need the use all these techniques? If I use Container Managed transaction and it does everything for me why I need to bother about all these details? I know the significance of TransactionAttributes (REQUIRED, REQUIRES_NEW) and know in which cases I need to use them, but what about the others? More specifically -
Why do I need Bean Managed transaction?
Why do we need Read and Write Lock on Entity classes?
Why do we need version attribute?
For Q2 and Q3 - I think Entity classes are not thread safe and hence we need locking over there. But database is managed at the EJB class by the JTA API (as stated in the first para), and then why do we need to manage the Entity classes separately? I know how the Lock and Version works and why they are required. But why they are coming into the picture since JTA is already present?
Can you please provide any answer to them? If you give me some URLs even that will be very highly appreciated.
Many thanks in advance.
You don't need locking because entity classes are not thread-safe. Entities must not be shared between threads, that's all.
Your database comes with ACID guarantees, but that is not always sufficient, and you sometimes nees to explicitely lock rows to get what you need. Imagine the following scenarios:
transaction A reads employee 1 from database
transaction B reads employee 1 from database
transaction A sets employee 1 salary to 3000
transaction B sets employee 1 salary to 4000
transaction A commits
transaction B commits
The end result is that the salary is 4000. The user that started transaction A is completely unaware that even though he set the salary to 3000, another user, concurrently, set it to 4000. Depending on which transaction writes last, the end result is different (and thus unpredictable). That's the kind of situation that can be avoided using optimistic locking.
Next scenario: you want to generate purely sequential invoice numbers, without lost values and without duplicates. You could imagine reading and incrementing a value in the database to do that. But two transactions might both read the same value concurrently, and then incrementing it. You would thus have a duplicate. Using a lock in the table row holding the next number allows avoiding this situation.
My question is this: Is there ever a role for JPA merge in a stateless web application?
There is a lot of discussion on SO about the merge operation in JPA. There is also a great article on the subject which contrasts JPA merge via a more manual Do-It-Yourself process (where you find the entity via the entity manager and make your changes).
My application has a rich domain model (ala domain-driven design) that uses the #Version annotation in order to make use of optimistic locking. We have also created DTOs to send over the wire as part of our RESTful web services. The creation of this DTO layer also allows us to send to the client everything it needs and nothing it doesn't.
So far, I understand this is a fairly typical architecture. My question is about the service methods that need to UPDATE (i.e. HTTP PUT) existing objects. In this case we have these two approaches 1) JPA Merge, and 2) DIY.
What I don't understand is how JPA merge can even be considered an option for handling updates. Here's my thinking and I am wondering if there is something I don't understand:
1) In order to properly create a detached JPA entity from a wire DTO, the version number must be set correctly...else an OptimisticLockException is thrown. But the JPA spec says:
An entity may access the state of its version field or property or
export a method for use by the application to access the version, but
must not modify the version value[30]. Only the persistence provider
is permitted to set or update the value of the version attribute in
the object.
2) Merge doesn't handle bi-directional relationships ... the back-pointing fields always end up as null.
3) If any fields or data is missing from the DTO (due to a partial update), then the JPA merge will delete those relationships or null-out those fields. Hibernate can handle partial updates, but not JPA merge. DIY can handle partial updates.
4) The first thing the merge method will do is query the database for the entity ID, so there is no performance benefit over DIY to be had.
5) In a DYI update, we load the entity and make the changes according to the DTO -- there is no call to merge or to persist for that matter because the JPA context implements the unit-of-work pattern out of the box.
Do I have this straight?
Edit:
6) Merge behavior with regards to lazy loaded relationships can differ amongst providers.
Using Merge does require you to either send and receive a complete representation of the entity, or maintain server side state. For trivial CRUD-y type operations, it is easy and convenient. I have used it plenty in stateless web apps where there is no meaningful security hazard to letting the client see the entire entity.
However, if you've already reduced operations to only passing the immediately relevant information, then you need to also manually write the corresponding services.
Just remember that when doing your 'DIY' update you still need to pass a Version number around on the DTO and manually compare it to the one that comes out of the database. Otherwise you don't get the Optimistic Locking that spans 'user think-time' that you would have if you were using the simpler approach with merge.
You can't change the version on an entity created by the provider, but when you have made your own instance of the entity class with the new keyword it is fine and expected to set the version on it.
It will make the persistent representation match the in-memory representation you provide, this can include making things null. Remember when an object is merged that object is supposed to be discarded and replaced with the one returned by merge. You are not supposed to merge an object and then continue using it. Its state is not defined by the spec.
True.
Most likely, as long as your DIY solution is also using the entity ID and not an arbitrary query. (There are other benefits to using the 'find' method over a query.)
True.
I would add:
7) Merge translates to insert or to update depending on the existence of the record on DB, hence it does not deal correctly with update-vs-delete optimistic concurrency. That is, if another user concurrently deletes the record and you update it, it must (1) throw a concurrency exception... but it does not, it just inserts the record as new one.
(1) At least, in most cases, in my opinion, it should. I can imagine some cases where I would want this use case to trigger a new insert, but they are far from usual. At least, I would like the developer to think twice about it, not just accept that "merge() == updateWithConcurrencyControl()", because it is not.
Multiple clients are concurrently accessing a JAX-JWS webservice running on Glassfish or some other application server. Persistence is provided by something like Hibernate or OpenJPA. Database is Microsoft SQL Server 2005.
The service takes a few input parameters, some "magic" occurs, and then returns what is basically a transformed version of the next available value in a sequence, with the particular sequence and transformation being determined by the inputs. The "magic" that performs the transformation depends on the input parameters and various database tables (describing the relationship between the input parameters, the transformation, the sequence to get the next base value from, and the list of already served values for a particular sequence). Not sure if this could all be wrapped up in a stored procedure (probably), but also not sure if the client wants it there.
What is the best way to ensure consistency (i.e. each value is unique and values are consumed in order, with no opportunity for a value to reach a client without also being stored in the database) while maintaining performance?
It's hard to provide a complete answer without a full description (table schemas, etc.), but giving my best guess here as to how it works, I would say that you need a transaction around your "magic", which marks the next value in the sequence as in use before returning it. If you want to reuse sequence numbers then you can later unflag them (for example, if the user then cancels what they're doing) or you can just consider them lost.
One warning is that you want your transaction to be as short and as fast as possible, especially if this is a high-throughput system. Otherwise your sequence tables could quickly become a bottleneck. Analyze the process and see what the shortest transaction window is that will still allow you to ensure that a sequence isn't reused and use that.
It sounds like you have most of the elements you need here. One thing that might pose difficulty, depending on how you've implemented your service, is that you don't want to write any response to the browser until your database transaction has been safely committed without errors.
A lot of web frameworks keep the persistence session open (and uncommitted) until the response has been rendered to support lazy loading of persistent objects by the view. If that's true in your case, you'll need to make sure that none of that rendered view is delivered to the client until you're sure it's committed.
One approach is a Servlet Filter that buffers output from the servlet or web service framework that you're using until it's completed its work.