What is the difference between CacheStoreMode USE and REFRESH

What is the difference between CacheStoreMode USE and REFRESH - java

The javadocs for CacheStoreMode differentiate in a point I cannot really grasp:
The javadocs for the USE mode:
Insert/update entity data into cache when read from database and when
committed into database: this is the default behavior. Does not force
refresh of already cached items when reading from database.
The javadocs for the REFRESH mode differ in the last sentence:
Forces refresh of cache for items read from database.
When an existing cached entity instance is updated when reading from database, this would typically involve overwriting the existing data. So what is the difference between forcing and not forcing a refresh in this context?
Thank you.

As far as I know:
CacheStoreMode.USE should be used if a given EntityManagerFactory has an exclusive write-access to the underlying database thus it implies that there is no chance for an entity instance stored in the shared cache to be stale.
CacheStoreMode.REFRESH should be enabled if the underlying database might be accessed by multiple commiters (i.e. EntityManagerFactory instances, applications in different JVMs, external JDBC sources) thus an entity instance stored in the shared cache may become stale.
Since CacheStoreMode.USE does not force refresh of already cached entities when reading from the database, CacheStoreMode.REFRESH does.

I think it will make difference where the most recent updated data from the database is needed, where it gets updated from the back-end, not through the application.
In my application, its the same case (but not using any cache strategy), where we have to load all data each time; as it gets modified implicitly through messaging from the external system, else we will be dealing with the stale data.
There might be few cases where scheduled jobs, external systems etc. update database directly there CacheStoreMode.REFRESH is appropriate; while for normal case CacheStoreMode.USE.
[Other than this I can't recollect any other cases, where it might make difference between these two modes]
Edit: The documentation seems confusing & too short to explain properly. Also, in case of native queries, bulk updates etc. items are skipped & aren't cached.
CacheStoreMode.USE: Only new items are put into the cache, not for already cached ones.
CacheStoreMode.REFRESH: New items are put into the cache & already existing cached items are refreshed.

Related

Is it a bad idea to keep an EntityManager open for the duration of the application lifetime?

I am writing an application in Java SE 8 and have recently migrated the database system from raw JDBC code to JPA. The interface itself is so much simpler, but I am running into an issue with the way I have designed my code which does not work well with JPA and I am unsure of how to proceed.
The primary issue I am having is that I cannot store references to my entities in code for any period of time anymore, because they immediately become out-of-date. I used to have a central persistence context where the one "true" instance of all my entities were always stored in code, and changes made to them would always be reflected everywhere because there were no duplicate instances. I realize this is not smart design when it comes to memory efficiency, but that allowed me to, for instance, implement the observer pattern and guarantee that any entity updates would be immediately visible in GUIs. But now, as soon as I load an entity from the database using JPA and close the EntityManager (as I have read so often that you must do), that instance merely represents a snapshot in time from when it was loaded and my GUIs will be waiting for updates from a dead object. Loading that entity from elsewhere in the code and making a change will do nothing, as it is a different instance altogether, with an empty list of subscribers (transient). There are a lot more cases in my code where I attempt to hold a reference to an entity for whatever purpose, and a lot of them rely on those entities being up-to-date.
I know that EntityManager is intended to be a short-lived object, but now I am thinking that it maybe wouldn't be such a bad idea after all to keep an EntityManager open for the lifetime of my program to replace that construct that I had in my old code. I quite frankly don't understand what the point of closing EntityManager so quickly is - isn't it beneficial to have your entities managed over a longer period of time? When I was first reading about how changes to managed entities are detected and persisted automatically, I hoped that that would allow me to completely detach my business logic from my persistence layer, and trust that all my changes were being saved. It was rather disillusioning to discover that in order for those entities to be managed in the first place, I would have to leave the EntityManager open for the duration of that business logic. And that would require them to be scoped higher than the method they are created in, so I could close them later. But all the literature implores the use of short-lived, low-scoped EntityManagers, which just seems like a direct contradiction.
I am somewhat at a loss for how to proceed. I would love to make full use of JPA and all of its extremely useful features, but I feel like I might be missing the point of EntityManager being short-lived. It seems like it would be so much more convenient long-lived. Can anyone give me some guidance?

Your central 'cache' with a single instance of data is a common idea, but it is difficult to manage. Some orm/JPA providers have caching built in and maintain something similar (check out EclipeLink's shared cache) but they usually have complex mechanisms that allow for limiting and managing what could be endless amounts of data that can quickly become stale. EclipseLink has tie ins to the database to get notifications when data changes, and can be configured for cache coordination when being run in different servers. Without such capabilities, your cache will be stale - and worse, your cache will have great difficulty maintaining transactional isolation. Any change to those cached objects is immediately visible to all processes, regardless of the transaction going through to the database or rolling back. Use of JPA is meant to guarantee that you only see committed data (excluding the changes you've made in the current transaction/unit of work).
To answer your specific question about keeping an EM open as generally to JPA providers: EntityManagers keep hooks to the entities read in through them so that they can track and manage all changes made to them. This can lead to very large amounts of data being held - check the forum for memory leak questions, as keeping EMs open for an extended period is the cause of quite a few. You gain object identity, but have to realize it comes at the cost of tracking everything read in through them - so you will likely have to occasionally clear the memory (em.clear()) at some key points, or find provider specific mechanics to dereference what it might be holding onto so GC can do its thing.
Other draw backs are that the EntityManager then itself becomes very large and difficult to merge changes into. Depending on how you merge changes into your app, you'll need a way to get those changes into your database. Having JPA go through very large sets of entities that builds over time to find changes to a small dataset is very inefficient, and you'll still have to find ways to refresh these entities if change are done through other entityManagers or applications.

How to force Hibernate read external database changes

I have a common database that is used by two different applications (different technologies, different deployment servers, they just use the same database).
Let's call them application #1 and application #2.
Suppose we have the following scenario:
the database contains a table called items (doesn't matter its content)
application #2 is developed in Spring Boot and it is mainly used just for reading data from the database
application #2 retrieves an item from the database
application #1 changes that item
application #2 retrieves the same item again, but the changes are not visible
What I understood by reading a lot of articles:
when application #2 retrieves the item, Hibernate stores it in the first level cache
the changes that are done to the item by application #1 are external changes and Hibernate is unaware of them, and thus, the cache is not updated (same happens when you do a manual change in the database)
you cannot disable Hibernate's first level cache.
So, my question is, can you force Hibernate into refreshing the entities every time they are read (or make it go into the database) without explicitly calling em.refresh(entity)? The problem is that the business logic module from application1 is used as a dependency in application1 so I can only call service methods (i.e. I don't have access to the entityManager or session references).

Hibernate L1 cache is roughly equivalent to a DB transaction when you run in a repeatable-read level isolation. Basically, if you read/write some data, the next time you query in the context of the same session, you will get the same data. Further, within the same process, sessions run independent of each other, which means 2 session are looking at different data in the L1 cache.
If you use repeatable read or less, then you shouldn't really be concerned about the L1 cache, as you might run into this scenario regardless of the ORM (or no ORM).
I think you only need to think about the L2 cache here. The L2 cache is what stores data and assumes only hibernate is accessing the DB, which means that if some change happens in the DB, hibernate might not know about it. If you just disable the L2 cache, you are sorted.
Further reading - Short description of hibernate cache levels

Well, if you cannot access hibernate session you are left with nothing. Any operations you want to do requires session access. For instance you can remove entity from cache after reading it like this:
session.evict(entity);
or this
session.clear();
but first and foremost you need a session. Since you calling only services you need to create service endpoints clearing session cache after serving them or modify existing endpoints to do that.

You can try to use StatelessSession, but you will lose cascading and other things.
https://docs.jboss.org/hibernate/orm/current/userguide/html_single/Hibernate_User_Guide.html#_statelesssession
https://stackoverflow.com/a/48978736/3405171

You can force to start a new transaction, so in this manner hibernate will not be read from the cache and it will redo the read from the db.
You can annotate your function in this manner
#Transactional(readOnly = true, propagation = Propagation.REQUIRES_NEW)
Requesting a new transaction, the system will generation a new hibernate session, so the data will not be in the cache.

How long are Entities persited?

I am trying to understand how JPA works. From what I know, if you persist an Entity, that object will remain in the memory until the application is closed. This means, that when I look for a previously persisted entity, there will be no query made on the database. Assuming that no insert, update or delete is made, if the application runs long enough, all the information in it might become persistent. Does this mean that at some point, I will no longer need the database?
Edit
My problem is not with the database. I am sure that the database can not be modified from outside the application. I am managing transactions by myself, so the data gets stored in the database as soon as I commit. My question is: What happens with the entities after I commit? Are they kept in the memory and act like a cache? If so, how long are they kept there? After I commit a persist, I make a select query. This select should return the object I persisted before. Will that object be brought from memory, or will the application query the database?

Not really. Think about it.
Your application probably isn't the only thing that will use the database. If an entity was persisted once and stored in memory, how can you be sure that, let's say, one hour later, it won't be changed by some other means? If that happens, you will have stale data that can harm logic of your application.
Storing data in memory and hoping that everything will be alright won't bring any benefits. That's why data stored in database is your primary source of information, and you should query it every time, unless you are absolutely sure that a subset of data won't change.

When you persist an entity an entity this will add it to the persistence context which acts like a first level cache (this is in-memory). When the actual persisting happens depends on whether you use container managed transactions or deal with transactions yourself. The entity instance will live in memory as long as the transaction is not commited, and when it is it will be persisted to the database or XML etc.

JPA can't work with only the persistence context (L1 cache) or the explicit cache (L2 cache). It always needs to be combined with a datasource, and this datasource typically points to a database that persists to stable storage.
So, the entity is in memory only as long as the transaction (which is required for JPA persist operations) isn't committed. After that it's send to the datasource.
If the transaction manager is transaction scoped (the 'normal' case) then the L1 cache (the persistence context) is closed and the entities do not longer exist there. If the L1 cache somehow bothers you, you can manage it a bit explicitly. There are operations to clear it and you could separate your read operations (which don't need transactions) from write operations. If there's no transaction active when reading, there's no persistence context, an entity becomes never attached and is thus never put into this L1 cache.
The L2 cache however is not cleared when the transaction commits and entities inside it remain available for the entire application. This L2 cache must be explicitly configured and you as an application developer must indicate which entities should be cached in it. Via vendor specific mechanisms (e.g. JBoss Cache, Infinispan) you can put a max on the number of entities being cached and set/define so-called eviction policies.
Of course, nothing prevents you from letting the datasource point to an in-memmory embedded DB, but this is outside the knowledge of JPA.

Persistence means in short terms: you can shut down your app, and the data is not lost.
To achieve that you need a database or some sort of saving data in a way that it's not lost when you shut down the app.

To "persist" an entity means to actually save it in the data base. Sure, JPA maintains some entity information in memory in the persistence context (and this is highly dependent on configuration and programming practices), but at certain point information will be stored in the data base - for instance, when a transaction commits, or likely (but not necessarily) after flush() or merge() operations.

If you want to keep your entities after committing and for a select query, you need to use the query cache. Just Google around on that term and it should be clear to you.

Database changes done by one instance of an application is not being picked up another instance using Hibernate

I have an application which can read/write changes to a database table. Another instance of the same application should be able to see the updated values in the database. i am using hibernate for this purpose. If I have 2 instances of the application running, and if i make changes to the db from one instance for the first time, the updated values can be seen from the second. But any further changes from the first instance is not reflected in the second. Please throw some light.

This seems to be a bug in your cache settings. By default, Hibernate assumes that it's the only one changing the database. This allows it to efficiently cache objects. If several parties can change tables in the DB, then you must switch off caching for those tables/instances.

You can use hibernate.connection.autocommit=true
This will make hibernate commit each SQL update to the database immediately and u should be able to see the changes from the other application.
HOWEVER, I would strongly discourage you from doing so. Like Aaron pointed out, you should only use one Hibernate SessionFactory with a database.
If you do need multiple applications to be in sync, think about using a shared cache,e.g. Gemstone.

Hibernate query cache automatically refreshed on external update?

I'm creating a service that has read-only access to the database. I have a query cache and a second level cache enabled (READ_ONLY mode) in Hibernate to speed up the service, as the tables being accessed change rarely.
My question is, if someone goes into the DB and changes the tables manually (i.e. outside of Hibernate), does the cache recognize automatically that it needs to be cleared? Is there a time limit on the cache?

Nope, the cache isn't going to scan the database for you to magically update itself when the underlying data changes. Changes that do not come through the L2 cache will not appear in it. How long it takes to time out etc. depends on your provider and whatever the default settings are. It looks like the default ehcache.xml is for 2 minutes.

If you don't go through Hibernate APIs to make your changes, the second-level-cache won't get notified and the changes won't be visible. The usual way to deal with this situation is to evict the corresponding objects from the second-level-cache programatically to force a refresh. The SessionFactory provides methods allowing to do this. From the section 19.3. Managing the caches of the documentation:
For the second-level cache, there are
methods defined on SessionFactory
for evicting the cached state of an
instance, entire class, collection
instance or entire collection role.
sessionFactory.evict(Cat.class, catId); //evict a particular Cat
sessionFactory.evict(Cat.class); //evict all Cats
sessionFactory.evictCollection("Cat.kittens", catId); //evict a particular
//collection of kittens
sessionFactory.evictCollection("Cat.kittens"); //evict all kitten collections

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.