We are using Spring/JPA (not Hibernate) to delete all data in a table. However, the operation hits a Java heap out-of-memory error. After some googling, it seems that on delete JPA first pulls the entities into memory before deleting them. My questions are: does JPA/Spring always pull entity data into its cache, and how do we avoid out-of-memory errors when the affected data set is large? My current fix is to execute the DELETE via a native query.
Thanks!
You shouldn't have to resort to native SQL.
I don't think that CrudRepository#deleteAll() or JpaRepository#deleteAllInBatch() pulls anything into the local cache; that would be seriously inefficient.
Some repository implementations might take a list of ids and load the corresponding entities, so that the remove operation can iterate over them and their relationships to make sure everything is cleaned up and that any events for affected entities are properly fired. Some objects also have validations or other logic that should prevent certain operations when they are in a particular state, and that state cannot be checked unless the entities are loaded.
Try a bulk JPQL DELETE if you don't want objects loaded. It is similar to SQL but written against the entities and their relationships. Note that it does not respect the cascade settings on the entity mappings: it cannot know what is referenced, so it does not try.
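For instance, a bulk delete in a Spring Data JPA repository could look like the sketch below (UserEvent and the repository are made-up names; with plain JPA the equivalent is em.createQuery("delete from UserEvent").executeUpdate()):

```java
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Modifying;
import org.springframework.data.jpa.repository.Query;
import org.springframework.transaction.annotation.Transactional;

public interface UserEventRepository extends JpaRepository<UserEvent, Long> {

    // Bulk JPQL delete: executes as a single DELETE statement in the database.
    // Nothing is loaded into the persistence context, so there is no heap pressure,
    // but cascades and lifecycle callbacks on the entities are skipped.
    @Modifying
    @Transactional
    @Query("delete from UserEvent e")
    void deleteAllEventsInBulk();
}
```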
Related
I have a set of very heavy queries whose result I want to cache into an external Cache implementation (cache the whole object list not just ids like in Hibernate's 2nd level cache).
The issue is that, due to the lazy loading of several collections in the root object, once the session that queried the results is closed the objects become detached, and the next request that tries to use them may throw a LazyInitializationException.
Environment: Spring 4, Hibernate 4.3, Ehcache.
Is there any way to re-attach the object to a new session without having it modify the underlying DB (as merge and update do)?
There is no way to reattach a detached entity to a session just to load a lazy-initialized collection.
In order to get an up-to-date copy of a persistent object without overwriting the session state by calling merge, it is necessary to either call EntityManager.find() or run a query.
This is because the main goal of the session is to keep the database and the objects in memory in sync. Due to this there is no API for attaching new state without persisting it, as this is not in line with the main functionality of the session.
The 2nd level cache, if configured together with the query cache can solve the problem of caching the entities, queries and their associations in a much better way than any custom solution.
Everything can get cached to the point that no query hits the database. The two caches really go together; check this blog post for further info.
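As a rough sketch under those assumptions (Hibernate 4.x with Ehcache; the Report entity and the exact property keys are illustrative):

```java
// hibernate properties (sketch):
//   hibernate.cache.use_second_level_cache=true
//   hibernate.cache.use_query_cache=true
//   hibernate.cache.region.factory_class=org.hibernate.cache.ehcache.EhCacheRegionFactory

import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Report {

    @Id
    private Long id;

    // ... fields; put @Cache on the lazy collections you also want in the L2 cache
}
```

And the query side, which must itself be marked cacheable or only the entities (not the result list) end up cached (assuming an open Hibernate Session named session):

```java
List<Report> reports = session.createQuery("from Report r")
        .setCacheable(true)   // result ids go to the query cache, entities to the L2 cache
        .list();
```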
This seems like it would come up often, but I've Googled to no avail.
Suppose you have a Hibernate entity User. You have one User in your DB with id 1.
You have two threads running, A and B. They do the following (a rough code sketch follows the list):
A gets user 1 and closes its Session
B gets user 1 and deletes it
A changes a field on user 1
A gets a new Session and merges user 1
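Roughly, in code (Hibernate Session API; the User entity, sessionFactory and setName are just placeholders for the scenario):

```java
// Thread A: load user 1, then detach it by closing the session
Session a = sessionFactory.openSession();
User user = (User) a.get(User.class, 1L);
a.close();

// Thread B: delete user 1
Session b = sessionFactory.openSession();
b.beginTransaction();
b.delete(b.get(User.class, 1L));
b.getTransaction().commit();
b.close();

// Thread A: modify the (now stale) detached instance and merge it in a new session
user.setName("changed");
Session a2 = sessionFactory.openSession();
a2.beginTransaction();
a2.merge(user);                 // observed: issues an INSERT, creating a new user with id 2
a2.getTransaction().commit();
a2.close();
```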
All my testing indicates that the merge attempts to find user 1 in the DB (it can't, obviously), so it inserts a new user with id 2.
My expectation, on the other hand, would be that Hibernate would see that the user being merged was not new (because it has an ID). It would try to find the user in the DB, which would fail, so it would not attempt an insert or an update. Ideally it would throw some kind of concurrency exception.
Note that I am using optimistic locking through @Version, and that does not help matters.
So, questions:
Is my observed Hibernate behaviour the intended behaviour?
If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
If the answer to 2. is yes, why is nobody complaining about it?
Please see the text from the Hibernate documentation below.
Copy the state of the given object onto the persistent object with the same identifier. If there is no persistent instance currently associated with the session, it will be loaded. Return the persistent instance. If the given instance is unsaved, save a copy of and return it as a newly persistent instance.
It clearly states that the state (data) of the given object is copied onto the persistent object in the database; if no such object is there, a copy of that data is saved. And when Hibernate saves a copy, it always creates a record with a new identifier.
Hibernate's merge function works roughly as follows:
It checks the status of the entity (attached or detached to the session) and finds that it is detached.
It then tries to load the entity by its identifier, but does not find it in the database.
Because the entity is not found, it is treated as transient.
A transient entity always results in a new database record with a new identifier.
Locking is always applied to attached entities. If the entity is detached, Hibernate will load it first, and the version value gets updated from the loaded row.
Locking is used to control concurrency problems; what you are seeing here is not a concurrency issue.
I've been looking at JSR-220, from which Session#merge claims to take its semantics. The JSR is, I have found, sadly ambiguous.
It does say:
Optimistic locking is a technique that is used to insure that updates to the database data corresponding to the state of an entity are made only when no intervening transaction has updated that data since the entity state was read.
If you take "updates" to include general mutation of the database data, including deletes, and not just a SQL UPDATE, which I do, I think you can make an argument that the observed behaviour is not compliant with optimistic locking.
Many people agree, given the comments on my question and the subsequent discovery of this bug.
From a purely practical point of view, the behaviour, compliant or not, could lead to quite a few bugs, because it is contrary to many developers' expectations. There does not seem to be an easy fix for it. In fact, Spring Data JPA seems to ignore this issue completely by blindly using EM#merge. Maybe other JPA providers handle this differently, but with Hibernate this could cause issues.
I'm actually working around this by using Session#update currently. It's really ugly, and requires code to handle the case where you try to update a detached entity while a managed copy of it already exists in the session. But it won't lead to spurious inserts either.
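Sketch of that workaround (Hibernate Session API; the exception types shown are what I would expect here, not guaranteed for every mapping):

```java
try {
    // update() never falls back to an INSERT: if the row was deleted concurrently,
    // the flush fails (typically with a StaleStateException) instead of silently
    // inserting a new row the way merge() does.
    session.update(detachedUser);
} catch (NonUniqueObjectException e) {
    // A managed instance with the same id is already associated with this session;
    // this is the ugly case that has to be handled by hand, e.g. by copying the
    // detached state onto the managed copy.
}
```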
1. Is my observed Hibernate behaviour the intended behaviour?
The behavior is correct. You are just trying to do operations that are not protected against concurrent data modification :) If you have to split the operation across two sessions, find the object again in the second session and check whether it is still there, throwing an exception if it is not. If it is, lock it using em.find(entityClass, primaryKey, lockModeType), or protect the object until the end of the transaction with @Version or @Entity(optimisticLock = OptimisticLockType.ALL/DIRTY/VERSION).
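For example (a sketch with made-up entity and variable names), re-loading and locking inside the second transaction:

```java
// Inside the second transaction: re-load the row and lock it.
User current = em.find(User.class, userId, LockModeType.PESSIMISTIC_WRITE);
if (current == null) {
    // The row was deleted by a concurrent transaction in the meantime.
    throw new EntityNotFoundException("User " + userId + " no longer exists");
}
current.setName(newName);   // modified under the lock; flushed on commit
```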
2. If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
Probably: yes
3. If the answer to 2. is yes, why is nobody complaining about it?
Because if you protect your operations using pessimistic or optimistic locking the problem will disappear:)
The problem you are trying to solve is called: Non-repeatable read
I am using the Hibernate Criteria API to retrieve data. These data will just be viewed by the user; the user cannot modify them. So, is there any benefit to using readOnly? Can you suggest the pros and cons? Is there any other measure I need to consider?
Read-only entities
Hibernate tracks all objects loaded within a session in order to find modifications and persist all changes when you flush the session. If you load an entity as read-only, you instruct Hibernate not to track that entity for changes. In that way you get some performance increase.
However, the object will still stay in the session cache. If that cache grows too big, it becomes a real performance issue and you risk running out of memory. If you read many objects, it's good to evict the ones you no longer need.
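For example, with the legacy Criteria API the question mentions (the Product entity and the display helper are illustrative):

```java
// Load the rows read-only so Hibernate skips dirty checking on them.
List<Product> products = session.createCriteria(Product.class)
        .setReadOnly(true)
        .list();

for (Product p : products) {
    display(p);          // hypothetical view logic
    session.evict(p);    // drop it from the session cache once it has been shown
}
```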
If the performance of Hibernate is really an issue, then switching to pure JDBC is a better option. I never use Hibernate for loading large amounts of data (such as for reports or batch processing). For displaying lists I load only the fields I need, not whole entities (if you read only selected fields rather than whole entities, the results are always read-only).
So the answer is, yes, it will make Hibernate a bit faster, but there are other ways of gaining even more performance.
Using read-only entities is more an expression of intent. With read-only, Hibernate will not check whether an entity has changed, even if it has. This is different from entities marked as immutable, since those are guarded against change: Hibernate will error out if it detects that an immutable entity's property has changed, but not for read-only entities.
So making an entity read-only just informs Hibernate that you do not want changes to it to be persisted. Note that this may not hold for associated collections.
The performance gain is minimal unless you have special requirements or a particularly expensive dirty check.
So it is not about performance. It is a safety measure that allows you to protect your database from unintended changes to objects.
For example, you have a Location entity describing a geographical location, and a Person with a Location assigned to it. When loading the location through the person's association, you can make the location read-only; then even if a coworker (or you yourself) accidentally changes the location, that change will not be stored, while changes to the Person entity still will be. (It may be best to mark Location as immutable, but there are those rare cases where that is not sufficient.)
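A minimal sketch of that (the Person/Location accessors are assumed):

```java
Person person = (Person) session.get(Person.class, personId);
Location location = person.getLocation();

// Mark just this loaded instance as read-only: accidental modifications to it
// are ignored at flush time, while changes to person are still persisted.
session.setReadOnly(location, true);

location.setLatitude(0.0);        // silently discarded on flush
person.setName("New name");       // written to the database as usual
```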
I've got two lists of entities: One that is the current state of the rows in the DB, the other is the changes that were made to the list. How do I audit the rows that were deleted, added, and the changes made to the entities? My audit table is used by all the entities.
Entity listeners and Callback methods look like a perfect fit, until you notice the sentence that says: A callback method must not invoke EntityManager or Query methods! Because of this restriction, I can collect audits, but I can't persist them to the database :(
My solution has been a complex algorithm to discover the audits:
If the entity is in the change list and has no key, it's an add
If the entity is in the db but not the changes list, it's a delete
If the entity is in both lists, recursively compare their fields to find differences to audit (if any)
I collect these and insert them into the DB in the same transaction in which I merge the changes list. But I hate the fact that I'm writing this by hand; it seems like JPA should be able to do this logic for me.
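For what it's worth, the hand-rolled version looks roughly like this (AuditEntry, Item, byId and compareFields are all placeholders for our own classes and helpers):

```java
List<AuditEntry> audits = new ArrayList<>();
Map<Long, Item> currentById = byId(currentRows);   // rows as they exist in the DB

for (Item changed : changeList) {
    if (changed.getId() == null) {
        audits.add(AuditEntry.added(changed));              // no key yet -> an add
    } else {
        Item existing = currentById.remove(changed.getId());
        audits.addAll(compareFields(existing, changed));    // recursive field diff
    }
}

// Whatever is left was not in the change list -> a delete.
for (Item deleted : currentById.values()) {
    audits.add(AuditEntry.deleted(deleted));
}
```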
One solution we've come up with is to use an entity listener that posts the audits to a JMS queue; a consumer of the queue then inserts the audits into the database. But I don't like this solution because setting up a JMS queue is a pain. It's currently the best solution we've got, though.
I'm using EclipseLink (ideally, that's not relevant) and have found these two things that look helpful, but the JMS queue is a better solution than either of them:
http://wiki.eclipse.org/EclipseLink/FAQ/JPA#How_to_access_what_changed_in_an_object_or_transaction.3F This looks really difficult to use: you look up the fields by string name, so if I refactor my entity and forget to update this, it will fail at runtime.
http://wiki.eclipse.org/EclipseLink/Examples/JPA/History This isn't consistent with the way we currently audit. It expects a special entity_history table.
The EntityListener looks like a good approach since you are able to collect the audit information.
Have you tried persisting the audit information in a different transaction than the one persisting the changes? Perhaps obtain a reference to a stateless EJB (assuming you are using EJBs) and use methods marked with @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW). In this way the transaction persisting the original changes is suspended while the audit transaction completes. Note that you will not be able to access the updated information in this separate audit transaction, since the original one has not committed yet.
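A sketch of what that could look like (AuditWriter and AuditRecord are hypothetical names):

```java
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class AuditWriter {

    @PersistenceContext
    private EntityManager em;

    // Runs in its own transaction, independent of the one that triggered the
    // entity listener; the caller's transaction is suspended until this returns.
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void write(AuditRecord record) {
        em.persist(record);
    }
}
```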
I am trying to understand how JPA works. From what I know, if you persist an entity, that object will remain in memory until the application is closed. This means that when I look for a previously persisted entity, no query is made against the database. Assuming that no insert, update or delete is made, if the application runs long enough, all of its data might end up held in memory. Does this mean that at some point I will no longer need the database?
Edit
My problem is not with the database. I am sure that the database cannot be modified from outside the application. I am managing transactions myself, so the data gets stored in the database as soon as I commit. My question is: what happens to the entities after I commit? Are they kept in memory and act like a cache? If so, how long are they kept there? After I commit a persist, I run a select query; this select should return the object I persisted before. Will that object come from memory, or will the application query the database?
Not really. Think about it.
Your application probably isn't the only thing that will use the database. If an entity was persisted once and stored in memory, how can you be sure that, let's say, one hour later, it won't have been changed by some other means? If that happens, you will have stale data that can harm the logic of your application.
Storing data in memory and hoping that everything will be alright won't bring any benefits. That's why the data stored in the database is your primary source of information, and you should query it every time, unless you are absolutely sure that a subset of it won't change.
When you persist an entity, it is added to the persistence context, which acts like a first-level cache (in memory). When the actual write to the database happens depends on whether you use container-managed transactions or handle transactions yourself. The entity instance lives in memory as long as the transaction is not committed; when it is committed, the state is persisted to the database (or XML, etc.).
JPA can't work with only the persistence context (L1 cache) or the explicit cache (L2 cache). It always needs to be combined with a datasource, and this datasource typically points to a database that persists to stable storage.
So, the entity is in memory only as long as the transaction (which is required for JPA persist operations) isn't committed. After that it's sent to the datasource.
If the persistence context is transaction scoped (the 'normal' case), then this L1 cache is closed when the transaction ends and the entities no longer exist there. If the L1 cache somehow bothers you, you can also manage it explicitly: there are operations to clear it, and you can separate your read operations (which don't need transactions) from your write operations. If there is no transaction active when reading, there is no persistence context, so an entity never becomes attached and is never put into this L1 cache.
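A small illustration of managing the L1 cache explicitly (Customer is a made-up entity; em is assumed to be an application-managed or extended EntityManager):

```java
Customer c = em.find(Customer.class, id);       // now managed in the persistence context (L1)
// ... work with c ...
em.clear();                                      // detach everything and free the L1 cache
Customer again = em.find(Customer.class, id);   // re-reads from the database (or the L2 cache)
```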
The L2 cache, however, is not cleared when the transaction commits, and entities inside it remain available to the entire application. This L2 cache must be explicitly configured, and you as the application developer must indicate which entities should be cached in it. Via vendor-specific mechanisms (e.g. JBoss Cache, Infinispan) you can put a cap on the number of entities being cached and define so-called eviction policies.
Of course, nothing prevents you from letting the datasource point to an in-memory embedded DB, but this is outside the knowledge of JPA.
Persistence means, in short: you can shut down your app and the data is not lost.
To achieve that you need a database, or some other way of saving data so that it's not lost when you shut down the app.
To "persist" an entity means to actually save it in the data base. Sure, JPA maintains some entity information in memory in the persistence context (and this is highly dependent on configuration and programming practices), but at certain point information will be stored in the data base - for instance, when a transaction commits, or likely (but not necessarily) after flush() or merge() operations.
If you want your entities to be served from memory for a select query after committing, you need to use the query cache. Just Google around on that term and it should become clear.