Spring data save vs saveAll performance - java

I'm trying to understand why saveAll has better performance than save in the Spring Data repositories. I'm using CrudRepository which can be seen here.
To test, I created 10k entities, each with just an id and a random string (kept constant for the benchmark), and added them to a list. Iterating over the list and calling .save on each element took 40 seconds. Calling .saveAll on the same entire list completed in 2 seconds, and calling .saveAll with even 30k elements took only 4 seconds. I made sure to truncate my table before each test. Even batching the .saveAll calls into sublists of 50 took 10 seconds with 30k elements.
The simple .saveAll with the entire list seems to be the fastest.
I tried to browse the Spring Data source code but this is the only thing I found of value. Here it seems .saveAll simply iterates over the entire Iterable and calls .save on each one like I was doing. So how is it that much faster? Is it doing some transactional batching internally?

Without having your code I have to guess, but I believe it comes down to the overhead of creating a new transaction for each object saved in the case of save, versus opening a single transaction in the case of saveAll.
Notice the definitions of save and saveAll: they are both annotated with @Transactional. If your project is configured properly, which seems to be the case since entities are actually saved to the database, a transaction will be created whenever one of these methods is called. If you call save in a loop, a new transaction is created on every call, but in the case of saveAll there is one call and therefore one transaction, regardless of the number of entities being saved.
I'm assuming that the test itself is not being run within a transaction. If it were, all calls to save would run within that transaction, since the default transaction propagation is Propagation.REQUIRED, meaning that if a transaction is already open the calls run within it. If you're planning to use Spring Data, I strongly recommend reading about transaction management in Spring.
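To make the difference concrete, here is a rough sketch of the three call patterns, assuming a made-up SampleEntity and a CrudRepository-based SampleRepository (neither is from the question):

import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// interface SampleRepository extends CrudRepository<SampleEntity, Long> {}

@Service
public class SampleService {

    private final SampleRepository repository;

    public SampleService(SampleRepository repository) {
        this.repository = repository;
    }

    // One transaction is opened and committed per call to save().
    public void saveOneByOne(List<SampleEntity> entities) {
        entities.forEach(repository::save);
    }

    // A single transaction wraps all the inserts.
    public void saveInBulk(List<SampleEntity> entities) {
        repository.saveAll(entities);
    }

    // Wrapping the loop in one transaction removes most of the difference,
    // since saveAll() itself just loops over save() inside one transaction.
    @Transactional
    public void saveOneByOneInOneTransaction(List<SampleEntity> entities) {
        entities.forEach(repository::save);
    }
}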

Related

How does Spring JPA with hibernate manage concurrent updates

I am trying to write an app that can run active-active, with both instances using the same DB. I was worried that in updating an entity (calling repo.findById(), entity.setX(), repo.save(entity)) there was a potential race condition where updates could overwrite each other if there was a large enough delay between find and save.
To test it I made a method that loads the entity, waits 10 seconds, then adds something to a list attribute and saves it. Calling that twice I expected the second save to overwrite the first one, but surprisingly both updates were persisted.
Whilst this is what I wanted, I was wondering if anyone knew how/why Spring did this, because I want to make sure it will do this every time. I understand it uses optimistic locking by default, but that requires the @Version annotation (which I don't have). Will this also work if the updates come from separate apps, or did it only work because both of them came from the same application?
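For reference, JPA/Hibernate optimistic locking requires a @Version attribute on the entity; without it there is no version check on save. A minimal sketch, with a made-up Account entity rather than anything from the question:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Account {

    @Id
    private Long id;

    // Hibernate increments this column on every update and throws an
    // OptimisticLockException if the row changed between findById() and save().
    @Version
    private Long version;

    private String x;

    // getters and setters omitted
}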

How to force Hibernate to read external database changes

I have a common database that is used by two different applications (different technologies, different deployment servers, they just use the same database).
Let's call them application #1 and application #2.
Suppose we have the following scenario:
the database contains a table called items (doesn't matter its content)
application #2 is developed in Spring Boot and it is mainly used just for reading data from the database
application #2 retrieves an item from the database
application #1 changes that item
application #2 retrieves the same item again, but the changes are not visible
What I understood by reading a lot of articles:
when application #2 retrieves the item, Hibernate stores it in the first level cache
the changes that are done to the item by application #1 are external changes and Hibernate is unaware of them, and thus, the cache is not updated (same happens when you do a manual change in the database)
you cannot disable Hibernate's first level cache.
So, my question is: can you force Hibernate to refresh the entities every time they are read (or make it go to the database) without explicitly calling em.refresh(entity)? The problem is that the business logic module is used as a dependency in application #2, so I can only call service methods (i.e. I don't have access to the EntityManager or Session references).
Hibernate's L1 cache is roughly equivalent to a DB transaction running at repeatable-read isolation. Basically, if you read/write some data, the next time you query in the context of the same session you will get the same data. Furthermore, within the same process, sessions run independently of each other, which means two sessions may be looking at different data in their L1 caches.
If you use repeatable read or a lower isolation level, then you shouldn't really be concerned about the L1 cache, as you might run into this scenario regardless of the ORM (or with no ORM at all).
I think you only need to think about the L2 cache here. The L2 cache stores data across sessions and assumes that only Hibernate is accessing the DB, which means that if some change happens in the DB from outside, Hibernate might not know about it. If you just disable the L2 cache, you are sorted.
Further reading - Short description of hibernate cache levels
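For what it's worth, in a Spring Boot application like application #2 the second-level cache is only active if a cache provider is configured; if it is, it can typically be switched off with properties, e.g. (assuming the standard spring.jpa.properties passthrough):

# application.properties
spring.jpa.properties.hibernate.cache.use_second_level_cache=false
spring.jpa.properties.hibernate.cache.use_query_cache=false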
Well, if you cannot access the Hibernate session you are left with nothing; any operation you want to do requires session access. For instance, you can remove an entity from the cache after reading it like this:
session.evict(entity);
or this
session.clear();
but first and foremost you need a session. Since you are only calling services, you either need to create service endpoints that clear the session cache after serving, or modify the existing endpoints to do that.
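A rough sketch of what such an endpoint could look like in a Spring Data JPA service, with made-up ItemService, ItemRepository and Item names (the injected EntityManager is backed by the Hibernate session):

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ItemService {

    @PersistenceContext
    private EntityManager entityManager;

    private final ItemRepository itemRepository;

    public ItemService(ItemRepository itemRepository) {
        this.itemRepository = itemRepository;
    }

    @Transactional
    public Item readItem(Long id) {
        Item item = itemRepository.findById(id).orElseThrow(IllegalArgumentException::new);
        entityManager.detach(item);   // JPA equivalent of session.evict(item)
        return item;
    }
}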
You can try to use StatelessSession, but you will lose cascading and other things.
https://docs.jboss.org/hibernate/orm/current/userguide/html_single/Hibernate_User_Guide.html#_statelesssession
https://stackoverflow.com/a/48978736/3405171
You can force a new transaction to start; that way Hibernate will not read from the cache and will re-read from the database.
You can annotate your method like this:
@Transactional(readOnly = true, propagation = Propagation.REQUIRES_NEW)
Requesting a new transaction means a new Hibernate session is generated, so the data will not come from the cache.
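Applied to a Spring service method, that could look roughly like this (ItemReadService and ItemRepository are made-up names):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ItemReadService {

    private final ItemRepository itemRepository;

    public ItemReadService(ItemRepository itemRepository) {
        this.itemRepository = itemRepository;
    }

    // Every call starts a new transaction and therefore a new persistence
    // context / Hibernate session, so nothing is served from a previously
    // populated first-level cache.
    @Transactional(readOnly = true, propagation = Propagation.REQUIRES_NEW)
    public Item loadItem(Long id) {
        return itemRepository.findById(id).orElseThrow(IllegalArgumentException::new);
    }
}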

JPA not taking update into account within single Transaction

Within a transactional service method, I loop, querying the database for the first 10 entities of type A that match a criterion.
I update each A entity from the list so that it no longer matches the criterion, and call flush() to make sure the changes are applied.
The second call to the query within the loop returns the exact same set of A entities.
Why isn't the flushed change on the entities taken into account?
I'm using JPA 2.0 with Hibernate 4.1.7.
The same process with plain Hibernate (no JPA layer) seems to work.
I've turned off the second level cache and the query cache, to no avail.
I'm using a rather plain configuration: JpaTransactionManager, Spring over JPA over Hibernate. The main method is annotated with @Transactional.
The code would be something like this:
do {
    modelList = exportContributionDao.getContributionListToExport(10);
    for (M m : modelList) {
        // export m to a file
        m.setToBeExported(false);   // clear the flag so it no longer matches the criterion
        super.flush();
    }
} while (modelList.size() == 10);
With each iteration of the loop, the DAO method always returns the same 10 results; JPA does not take the updated 'isToBeExported' attribute into account.
I'm not trying to solve a problem, rather I want to understand why JPA is not behaving as expected here.
I expect this to be a 'classic' problem.
No doubt it would be solved if the transaction were committed at each iteration.
AFAIK, the L1 cache, i.e. the Session with Hibernate as the underlying JPA provider, should be up to date, and the query in the second iteration should take the updated entities into account, even though the changes haven't been persisted yet.
So my question is: why isn't it the case? Misconfiguration or know behavior?
Flush does not necessarily commit the changes to the database. What do you want to achieve? From what I understand you do something like:
Loop over the entities
Within the loop, change the entity
Call 'flush' on the entity
Read the entity back again
You wonder why the data did not change on the database?
If this is correct, why do you re-read the entities instead of just working with the elements you already have? After leaving the transaction, the changes will be automagically made persistent.
This should definitely be working.
This was a configuration problem on our part.
Apologies for the question; it was pretty hard to spot the reason on our side, but I hope the answer will at least be useful to some:
JPA definitely takes into account changes made on entities within a single transaction.
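For illustration, a minimal sketch of that behaviour with a plain JPA EntityManager; the ContributionExporter class and the Contribution entity with its toBeExported flag are invented stand-ins for the code in the question:

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ContributionExporter {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public void exportAll() {
        List<Contribution> batch;
        do {
            batch = entityManager
                    .createQuery("select c from Contribution c where c.toBeExported = true",
                                 Contribution.class)
                    .setMaxResults(10)
                    .getResultList();
            for (Contribution c : batch) {
                // ... export c to a file ...
                c.setToBeExported(false);   // managed entity; the change is tracked
            }
            // With the default FlushMode.AUTO the pending updates are flushed before
            // the next query runs, so the next iteration returns different rows.
            entityManager.flush();
        } while (batch.size() == 10);
    }
}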

Parallel updates to different entity properties

I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel and I'm unsure how to go around solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
1. User creates entity
2. Task is started
3. User notices an error in his initial entry and quickly updates the entity
4. Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
5. User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue for all cases due to eventual consistency (or, quite possibly, my understanding of datastore transactions is just wrong)
Is using the low-level setProperty method the only way to update a single field of an entity and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this?
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking (should the user enable it) for transactions, which would prevent/reduce the chance of such things. Optimistic concurrency is equally applicable with relational datastores, so you likely know what it does.
Google's JDO plugin uses the low-level API setProperty() method obviously. The log even tells you what low level calls are made (in terms of PUT and GET). Moving to some other API will not on its own solve such problems.
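For reference, switching on optimistic versioning in JDO is done with a class-level @Version annotation; a small sketch using the standard javax.jdo annotations and a made-up Item class:

import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;
import javax.jdo.annotations.Version;
import javax.jdo.annotations.VersionStrategy;

// Committing a transaction that worked on a stale copy of the object then fails
// with a JDOOptimisticVerificationException, which the caller can catch and retry.
@PersistenceCapable
@Version(strategy = VersionStrategy.VERSION_NUMBER)
public class Item {

    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Long id;

    @Persistent
    private String value1;

    @Persistent
    private String calculated;
}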
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits to transactions; no queries without ancestors, only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (ie, lookup keys of entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will still get a throughput of at least one completed transaction per second (maybe a bit less if you use XG transactions).
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact(). Just make sure your work is idempotent.
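To make the retry loop above concrete, here is a rough sketch against the low-level Datastore API; the CalculationTask class, the "calculated" property name and the method signature are invented for illustration:

import java.util.ConcurrentModificationException;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Transaction;

public class CalculationTask {

    private final DatastoreService datastore =
            DatastoreServiceFactory.getDatastoreService();

    // Re-reads the entity inside the transaction, writes only the calculated
    // property, and retries whenever another writer caused a conflict.
    public void storeCalculatedValue(Key key, String calculated)
            throws EntityNotFoundException {
        while (true) {
            Transaction txn = datastore.beginTransaction();
            try {
                Entity entity = datastore.get(txn, key);   // fresh read
                entity.setProperty("calculated", calculated);
                datastore.put(txn, entity);
                txn.commit();
                return;                                    // success
            } catch (ConcurrentModificationException e) {
                // someone else touched the entity group; loop and try again
            } finally {
                if (txn.isActive()) {
                    txn.rollback();
                }
            }
        }
    }
}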
The way I see it, you can either prevent the first task from updating the object because certain values have changed since the task was first launched.
Or you can embed the object's values within the task request so that the 2nd calc task restores the object's state with consistent value and calculated members.

What happens if during an SQLite transaction in android, a second thread attempts a transaction on the same table?

I do not really understand the proper usage pattern of SQLite transactions in Android.
Let's say method A starts a transaction. During the execution of that transaction, another method B attempts to do a transaction on the same table (possibly on the same row).
What will happen?
Will the method B "wait" until method A has finished the transaction, and then execute the transaction?
Or will method B's transaction fail?
How do I properly handle that?
To be more precise:
Method A should query (inside a transaction) whether a certain row already exists; if so, it should return the row, and if that row does not exist yet, it should insert it and return the newly inserted row.
Method B should query (inside a transaction) whether a certain row already exists; if so, it should update that row, and if the row does not exist yet, method B should insert it.
The idea is to make sure that, no matter which method executes first, that particular row ends up either inserted or updated.
I am not even sure whether this can be achieved at the level of SQLite transactions, or whether I have to synchronize the methods in Java.
If so, can anybody post an example of how such synchronization should be coded in Java?
If 'A' modifies a table in a transaction, 'B' should be blocked (inside SQLite itself) until 'A' is committed. It's pretty similar to Java synchronization, except that it deals with persistent data. Some of the details depend on isolation levels:
http://en.wikipedia.org/wiki/Isolation_(database_systems)
Depends on what you mean by "transaction".
If you mean transactions established by SQLiteDatabase.beginTransaction or beginTransactionNonExclusive, then the rules for those methods apply. I would say that you should call beginTransaction for method A, run it to completion, and then end the transaction. If method B attempts to start a transaction in the meantime, it should encounter an Exception. Transactions allow you to do multiple operations and back them all out if they don't work.
I can't tell if separate threads will work or not, from the little you've posted. You might want to consider using an IntentService to handle your transactions; that way, one transaction can't start until the previous one ends.
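For illustration, a rough sketch of the usual beginTransaction pattern that combines the query-then-insert-or-update from the question into one unit; the items table and its columns are made up:

import android.content.ContentValues;
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;

public final class RowUpserter {

    // Inserts the row if it is missing, otherwise updates it. Everything runs
    // inside one (by default exclusive) transaction, so a concurrent writer is
    // blocked until the transaction ends.
    public static void insertOrUpdate(SQLiteDatabase db, long id, String value) {
        db.beginTransaction();
        try {
            ContentValues cv = new ContentValues();
            cv.put("value", value);
            Cursor c = db.query("items", new String[] {"_id"},
                    "_id = ?", new String[] {String.valueOf(id)},
                    null, null, null);
            try {
                if (c.moveToFirst()) {
                    db.update("items", cv, "_id = ?", new String[] {String.valueOf(id)});
                } else {
                    cv.put("_id", id);
                    db.insert("items", null, cv);
                }
            } finally {
                c.close();
            }
            db.setTransactionSuccessful();   // mark for commit
        } finally {
            db.endTransaction();             // commits, or rolls back if not marked successful
        }
    }
}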
The transactions are established by "beginTransaction", so from what I understand, the transactions are exclusive.
@sseand: Interestingly, the German version of the mentioned Wikipedia article is somewhat different from the English version. The German version states that an application with the isolation level 'serializable' must be able to handle serialization errors and re-run the respective transaction. So I would expect there to be an 'SQLiteSerializationException' for that, but I cannot see such an exception in the API. So, does the SQLite API for Android handle that automatically?
