How does Spring JPA with Hibernate manage concurrent updates - Java

I am trying to write an app that can run active-active, with both instances using the same DB. I was worried that in updating an entity (calling repo.findById(), entity.setX(), repo.save(entity)) there was a potential race condition where updates could overwrite each other if there was a large enough delay between the find and the save.
To test it I made a method that loads the entity, waits 10 seconds, then adds something to a list attribute and saves it. Calling that twice, I expected the second save to overwrite the first, but surprisingly both updates were persisted.
Whilst this is what I wanted, I was wondering if anyone knows how/why Spring did this, because I want to make sure it will do this every time. I understand it uses optimistic locking by default, but that requires the @Version annotation (which I don't have). Will this also work if the updates come from separate apps, or did it only work because both of them came from the same application?
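For reference, this is my understanding of what the entity would look like with the annotation added (the entity and field names here are just illustrative; my real entity has no version field):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class MyEntity {

    @Id
    @GeneratedValue
    private Long id;

    // With this field Hibernate appends "WHERE version = ?" to its UPDATE
    // and throws an OptimisticLockException if the row changed in between,
    // instead of silently overwriting it.
    @Version
    private Long version;

    // ... the list attribute and other fields from my test ...
}
```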

Related

Is it possible to know the progress of a transaction.commit operation?

I'm using JPA in my application to bundle a series of insert and updates into one commit() operation.
While that commit is running, is it possible to learn the progress of that operation (0-100%) so I can display that in a progress bar to the user?
I could split my updates into many commits, but that would make the entire job take longer.
Using EclipseLink as my JPA provider.
I think the only way to build something like that would be to use Hibernate's org.hibernate.stat.internal.StatisticsImpl class. You can programmatically read various metrics from an instance of this class. Hibernate statistics generation must be enabled for this to work; you can enable it by setting the property hibernate.generate_statistics to true.
The statistics instance has a method called getQueryExecutionCount() that you might be able to use to build a progress bar. It gives the number of queries executed by the current JPA EntityManagerFactory, or by Hibernate directly. If you keep calling that method in a loop while the queries are still running, you might be able to show the percentage of completed queries by dividing the return value of getQueryExecutionCount() by the total number of queries that need to be processed. Here's a good tutorial that explains all the different metrics that are available.
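A rough sketch of that polling loop, assuming Hibernate is the provider and that you can estimate the total number of queries yourself (the CommitProgressMonitor class and the totalQueries estimate are made up for illustration):

```java
import javax.persistence.EntityManagerFactory;

import org.hibernate.SessionFactory;
import org.hibernate.stat.Statistics;

public class CommitProgressMonitor {

    // Polls the statistics while the commit runs on another thread.
    // totalQueries has to be estimated by the caller; Hibernate can't know it.
    public void monitor(EntityManagerFactory emf, long totalQueries) throws InterruptedException {
        Statistics stats = emf.unwrap(SessionFactory.class).getStatistics();

        long baseline = stats.getQueryExecutionCount();
        long done = 0;
        while (done < totalQueries) {
            done = stats.getQueryExecutionCount() - baseline;
            long percent = 100 * done / totalQueries;
            System.out.println("Progress: " + percent + "%"); // feed your progress bar here
            Thread.sleep(250);
        }
    }
}
```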
I must also point out that turning on Hibernate statistics could slow your application down. So if you want to use this feature in production, you should first test whether the slowdown is acceptable.
EDIT: You could also choose to only turn Hibernate statistics on right before the queries run and turn them off once they've completed.
The StatisticsImpl class has a method called setStatisticsEnabled(boolean b) that you can use to programmatically turn it on or off.
EDIT 2: I'm assuming here that you are using Hibernate as the JPA provider. If not, I'll remove this answer.

Spring Data save vs saveAll performance

I'm trying to understand why saveAll has better performance than save in the Spring Data repositories. I'm using CrudRepository, which can be seen here.
To test, I created 10k entities, each with just an id and a random string (for the benchmark I kept the string constant), and added them to a list. Iterating over my list and calling .save on each element took 40 seconds. Calling .saveAll on the same entire list completed in 2 seconds, and .saveAll with even 30k elements took 4 seconds. I made sure to truncate my table before each test. Even batching the .saveAll calls into sublists of 50 took 10 seconds with 30k.
The simple .saveAll with the entire list seems to be the fastest.
I tried to browse the Spring Data source code, but this is the only thing I found of value. There it seems .saveAll simply iterates over the entire Iterable and calls .save on each element, just like I was doing. So how is it that much faster? Is it doing some transactional batching internally?
Without having your code I have to guess, but I believe it has to do with the overhead of creating a new transaction for each object saved in the case of save, versus opening a single transaction in the case of saveAll.
Notice the definitions of save and saveAll: they are both annotated with @Transactional. If your project is configured properly, which seems to be the case since entities are being saved to the database, that means a transaction is created whenever one of these methods is called. If you call save in a loop, a new transaction is created on each call, but in the case of saveAll there is one call and therefore one transaction, regardless of the number of entities being saved.
I'm assuming the test is not itself being run within a transaction; if it were, all calls to save would run within that transaction, since the default transaction propagation is Propagation.REQUIRED, which means that if a transaction is already open, the calls run within it. If you're planning to use Spring Data, I strongly recommend reading about transaction management in Spring.
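To see the effect yourself, wrap the save loop in a single transaction and compare it with the saveAll timing. A minimal sketch (the MyEntity type and its repository are invented for the example):

```java
import java.util.List;

import org.springframework.data.repository.CrudRepository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Illustrative repository for an invented MyEntity type.
interface MyEntityRepository extends CrudRepository<MyEntity, Long> {}

@Service
public class BatchSaveService {

    private final MyEntityRepository repository;

    public BatchSaveService(MyEntityRepository repository) {
        this.repository = repository;
    }

    // No @Transactional here: every save() call opens and commits its own
    // transaction via the annotation on the repository method.
    public void saveOneByOne(List<MyEntity> entities) {
        entities.forEach(repository::save);
    }

    // With @Transactional here, the inner save() calls join this transaction
    // (Propagation.REQUIRED), so there is a single commit for the whole list,
    // much like saveAll.
    @Transactional
    public void saveInOneTransaction(List<MyEntity> entities) {
        entities.forEach(repository::save);
    }
}
```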

A good strategy/solution for concurrency with Hibernate's optimistic/pessimistic locking

Let's presume we have a "mail client" application and a front-end for it.
If a user is typing a message or editing the subject or whatever, a REST call is made to update whatever the user was changing (e.g. the receivers) to keep the message in DRAFT. So a lot of PUTs are happening to save the message. When the window is closed, an update of every editable field happens at the same time. Hibernate can't handle this concurrency: each of those calls retrieves the message, edits its own fields and tries to save the message again, while another call has already changed it.
I know I can add a REST call that saves all fields at the same time, but I was wondering if there is a cleaner solution, or a decent strategy for handling such cases (for example only updating one field, or some merge strategy if the object has already changed).
Thanks in advance!
The easiest solutions here would be to tweak the UI to either:
Submit a single REST call during email submission that does all the necessary work, or
Serialize the REST calls so they're chained rather than firing concurrently.
The concern I have here is that this will snowball at some point and become a bigger concurrency problem as more users interact with the application. Consider for a moment the number of concurrent REST calls your web infrastructure will have to support alone when you're faced with 100, 500, 1000, or even 10000 or more concurrent users.
Does it really make sense to beef up the volume of servers to handle that load when the load itself is a product of a design flaw in the first place?
Hibernate is designed to handle locking through two mechanisms, optimistic and pessimistic.
Optimistic Way
1. Read the entity from the data store.
2. Cache a copy of the fields you're going to modify in temporary variables.
3. Modify the field or fields based on your PUT operation.
4. Attempt to merge the changes.
5. If the save succeeds, you're done.
6. Should an OptimisticLockException occur, refresh the entity state from the data store.
7. Compare the cached values against the fields you must change.
8. If the values differ, you can assert or throw an exception.
9. If they don't differ, go back to step 4.
The beautiful part of the optimistic approach is that you avoid any form of deadlock, particularly if you're allowing multiple tables to be read and locked separately.
While you can use pessimistic lock options, optimistic locking is generally the accepted way to handle concurrent operations, as it has the least contention and performance impact.
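A bare-bones sketch of that loop in plain JPA (the Message entity, its setSubject method and the attempt limit are invented for the example, and it assumes Message carries a @Version attribute):

```java
import javax.persistence.EntityManager;
import javax.persistence.OptimisticLockException;

public class DraftUpdater {

    private static final int MAX_ATTEMPTS = 3;

    // Applies one PUT's change (here: the subject) following the optimistic
    // steps above, retrying from a fresh read whenever the version clashes.
    public void updateSubject(EntityManager em, long messageId, String newSubject) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            em.getTransaction().begin();
            try {
                Message draft = em.find(Message.class, messageId);
                draft.setSubject(newSubject); // modify only this call's field
                em.flush();                   // the version check happens here
                em.getTransaction().commit();
                return;                       // success
            } catch (OptimisticLockException e) {
                em.getTransaction().rollback();
                em.clear();                   // drop stale state before re-reading
            }
        }
        throw new IllegalStateException("Draft update lost the race " + MAX_ATTEMPTS + " times");
    }
}
```

The pessimistic equivalent would read the row with em.find(Message.class, messageId, LockModeType.PESSIMISTIC_WRITE) instead, trading the retry loop for blocking and the deadlock risk mentioned above.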

Parallel updates to different entity properties

I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel and I'm unsure how to go around solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
1. User creates the entity.
2. A task is started.
3. User notices an error in his initial entry and quickly updates the entity.
4. The task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3).
5. User is not happy.
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue in all cases due to eventual consistency (or, quite possibly, my understanding of Datastore transactions is just wrong).
Is using the low-level setProperty method the only way to update a single field of an entity and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this?
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
JDO comes with optimistic concurrency checking for transactions (if you enable it), which would prevent or reduce the chance of such things. Optimistic concurrency is equally applicable to relational datastores, so you likely know what it does.
Google's JDO plugin uses the low-level setProperty() method under the covers; the log even tells you which low-level calls are made (in terms of PUT and GET). Moving to some other API will not, on its own, solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits to transactions: no queries without ancestors, and only a limited number of entity groups can be accessed. You might find you need to do some extra work before the transaction starts (i.e., look up the keys of entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some are not, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting it inside the transaction. Check for the existence of that entity at the start of the transaction and simply return (completing the txn) if it has been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will still get at least one completed transaction per second (maybe a bit less if you have XG transactions) of throughput.
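A rough sketch of that retry loop using the low-level Datastore API (the CalcTask class and the "calculated" property name are invented; the idempotency marker is only hinted at in a comment):

```java
import java.util.ConcurrentModificationException;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Transaction;

public class CalcTask {

    // Retries the unit of work until it commits without a conflict.
    public void storeCalculated(Key entityKey, long calculated) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        while (true) {
            Transaction txn = ds.beginTransaction();
            try {
                Entity entity = ds.get(txn, entityKey);
                // The idempotency check would go here: look up the marker
                // entity and return immediately if it has been deleted.
                entity.setProperty("calculated", calculated); // touch only this property
                ds.put(txn, entity);
                txn.commit();
                return; // success
            } catch (ConcurrentModificationException e) {
                // Another writer committed first; loop and re-read.
            } catch (EntityNotFoundException e) {
                return; // the entity was deleted in the meantime; nothing to do
            } finally {
                if (txn.isActive()) {
                    txn.rollback();
                }
            }
        }
    }
}
```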
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact(). Just make sure your work is idempotent.
The way I see it, you can either prevent the first task from updating the object because certain values have changed since the task was first launched,
or you can embed the object's values within the task request so that the second calc task will restore the object's state with consistent value and calculated members.

Database changes done by one instance of an application are not being picked up by another instance using Hibernate

I have an application which can read/write changes to a database table. Another instance of the same application should be able to see the updated values in the database. I am using Hibernate for this purpose. If I have 2 instances of the application running, and I make changes to the DB from one instance for the first time, the updated values can be seen from the second. But any further changes from the first instance are not reflected in the second. Please throw some light.
This looks like a problem with your cache settings. By default, Hibernate assumes that it's the only thing changing the database. This allows it to cache objects efficiently. If several parties can change tables in the DB, then you must switch off caching for those tables/instances.
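For example, if the second-level cache is what's serving the stale reads, switching it off for Hibernate might look like this (a sketch; whether these properties apply depends on how caching is configured in your setup):

```properties
# Disable shared caching so every read goes to the database.
hibernate.cache.use_second_level_cache=false
hibernate.cache.use_query_cache=false
```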
You can use hibernate.connection.autocommit=true.
This will make Hibernate commit each SQL update to the database immediately, and you should be able to see the changes from the other application.
HOWEVER, I would strongly discourage you from doing so. Like Aaron pointed out, you should only use one Hibernate SessionFactory per database.
If you do need multiple applications to be in sync, think about using a shared cache, e.g. Gemstone.
