Hibernate second level cache with Redis - will it improve performance? - java

I am currently developing an application using Spring MVC 4 and Hibernate 4. I have implemented the Hibernate second level cache for performance improvement. If I use Redis (an in-memory data structure store that can be used as a database, cache, etc.) as the cache provider, performance should increase, but will it be a drastic change?

You can expect drastic differences if you cache what is worth caching and avoid caching data that should not be cached at all. Just as beauty is in the eye of the beholder, so is performance. Here are several aspects you should keep in mind when using a Hibernate second level cache:
No custom serialization - memory intensive
If you use second level caching, you will not be able to use fast serialization frameworks such as Kryo and will have to stick with Java serialization, which is slow.
On top of this, each entity type gets a separate cache region, and within each region there is an entry for every key of every entity. In terms of memory, this is inefficient.
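To make this concrete, here is a minimal sketch of how an entity is mapped to its own region (Product and productRegion are invented names; Hibernate 4 with JPA annotations assumed):

    import javax.persistence.Entity;
    import javax.persistence.Id;

    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    // Hypothetical entity: each cached entity type gets its own region,
    // holding one entry per primary key and, with a distributed provider,
    // serialized with plain Java serialization.
    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE, region = "productRegion")
    public class Product {

        @Id
        private Long id;

        private String name;

        // getters and setters omitted for brevity
    }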
Lacks the ability to store and distribute rich objects
Most modern caches also provide compute grid functionality. Having your objects fragmented into many small pieces decreases your ability to execute distributed tasks with guaranteed data co-location. That depends a bit on the grid provider, but for many it would be a limitation.
Suboptimal performance
Depending on how much performance you need and what type of application you are building, the Hibernate second level cache can be a good or a bad choice. Good in the sense that it is plug and play... "kind of"... bad because you will never squeeze out the performance you could otherwise have gained. Also, designing rich models means more upfront work and more OOP.
Limited querying capabilities on the cache itself
That depends on the cache provider, but some providers are really not good at JOINs or WHERE clauses on anything other than the ID. If you try to build an in-memory index for a query on Hazelcast, for example, you will see what I mean.

Yes, if you use Redis, it will improve your performance.
No, it will not be a drastic change. :)
https://memorynotfound.com/spring-redis-application-configuration-example/
http://www.baeldung.com/spring-data-redis-tutorial
The above links will help you find a way to integrate Redis with your project.
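For a rough idea of what that integration looks like, here is a sketch (assuming Spring Data Redis 1.x with Jedis and a Redis instance on localhost; class and cache names are made up):

    import org.springframework.cache.CacheManager;
    import org.springframework.cache.annotation.EnableCaching;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.data.redis.cache.RedisCacheManager;
    import org.springframework.data.redis.connection.jedis.JedisConnectionFactory;
    import org.springframework.data.redis.core.RedisTemplate;

    @Configuration
    @EnableCaching
    public class RedisCacheConfig {

        @Bean
        public JedisConnectionFactory redisConnectionFactory() {
            JedisConnectionFactory factory = new JedisConnectionFactory();
            factory.setHostName("localhost"); // assumed local Redis instance
            factory.setPort(6379);
            return factory;
        }

        @Bean
        public RedisTemplate<String, Object> redisTemplate(JedisConnectionFactory connectionFactory) {
            RedisTemplate<String, Object> template = new RedisTemplate<>();
            template.setConnectionFactory(connectionFactory);
            return template;
        }

        @Bean
        public CacheManager cacheManager(RedisTemplate<String, Object> redisTemplate) {
            // Spring Data Redis 1.x style: the cache manager wraps a RedisTemplate
            return new RedisCacheManager(redisTemplate);
        }
    }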

It depends on the traffic.
If you have 1000 or more requests per second and you are low on RAM, then yes, use Redis nodes on another machine to take some of the load. It will greatly relieve your RAM and improve request speed.
But if that is not your situation, then do not use it.
Remember that you can adopt this approach later, once you see what your RAM and database connection pool usage actually look like.

Your question was already discussed here. Check this link: Application cache v.s. hibernate second level cache, which to use?
This was the most accepted answer, which I agree with:
It really depends on your application querying model and the traffic demands.
Using Redis/Hazelcast may yield the best performance since there won't be any round-trip to the DB anymore, but you end up having normalized data in the DB and a denormalized copy in your cache, which will put pressure on your cache update policies. So you gain the best performance at the cost of implementing the cache update whenever the persisted data changes.
Using the 2nd level cache is easier to set up, but it only stores entities by id. There is also a query cache, storing ids returned by a given query. So the 2nd level cache is a two-step process that you need to fine tune to get the best performance. When you execute projection queries the 2nd level object cache won't help you, since it only operates on entity loads. The main advantage of the 2nd level cache is that it's easier to keep in sync whenever data changes, especially if all your data is persisted by Hibernate.
So, if you need ultimate performance and you don't mind implementing your cache update logic that ensures a minimum eventual consistency window, then go with an external cache.
If you only need to cache entities (that usually don't change that frequently) and you mostly access them through Hibernate entity loading, then the 2nd level cache can help you.
Hope it helps!
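To make the two-step behaviour above concrete, here is a minimal sketch (assuming hibernate.cache.use_second_level_cache and hibernate.cache.use_query_cache are enabled, an open Hibernate 4 Session named session, and a hypothetical cached Product entity):

    // Loading by id can be served from the 2nd level cache:
    Product product = (Product) session.get(Product.class, 1L);

    // A cacheable query stores only the matching ids in the query cache;
    // the entities themselves are still resolved through the 2nd level cache:
    List<Product> cheapProducts = session
            .createQuery("from Product p where p.price < :maxPrice")
            .setParameter("maxPrice", 10.0)
            .setCacheable(true)
            .list();

    // A projection query bypasses the 2nd level object cache entirely:
    List<Object[]> namesAndPrices = session
            .createQuery("select p.name, p.price from Product p")
            .list();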

Related

How to analyze performance of Objectify?

Objectify is Google's API/service for storing Java objects in the Google data store. At first, my operations used to be fast (low tens of milliseconds). Now, they have become slow (400-600 ms).
Objectify also turns one operation into multiple operations, e.g. a query looks up the entity ids in an index and then retrieves some entities from memcache and others from the data store. There are annotations on the fields that affect how many operations are created. There are potentially a lot of places where something could go wrong for performance.
How can I get insight into what Objectify actually does, both to improve performance and to reduce billing (by triggering fewer and more efficient operations)?
I've looked at the Objectify documentation and searched the web extensively. I haven't been able to find a way to diagnose Objectify queries.
Look at the Stackdriver analysis of GAE RPC calls to see what's going on under the covers. It'll give you a list of the raw operations.
There really aren't that many non-obvious places where things can go wrong for performance. Hybrid queries (turning a query into a keys-only query followed by a batch get) only apply to @Cache entities. The philosophy is simple - if it's efficient to cache your entities, it's probably efficient to use the cache as much as possible. If you're unsure, eliminate @Cache.
Other than that, Objectify just translates low level Entity objects into POJOs. It's reasonably efficient at this, but you can certainly construct pathological cases. Watch out for long and expensive lifecycle methods (@OnLoad and friends). Nesting lists of lists of lists, etc. can easily create O(N^3) operations. But these should be obvious when you create them. Especially if you use @Load on Ref<?> objects. Loads aren't free.
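For reference, a rough sketch of how @Cache relates to a load by id (Objectify 5 style API; the Account entity is made up):

    import com.googlecode.objectify.annotation.Cache;
    import com.googlecode.objectify.annotation.Entity;
    import com.googlecode.objectify.annotation.Id;

    import static com.googlecode.objectify.ObjectifyService.ofy;

    // Hypothetical entity: @Cache is what enables the hybrid path, i.e. a query
    // becomes a keys-only index query followed by a batch get from memcache/datastore.
    @Entity
    @Cache
    public class Account {

        @Id Long id;
        String email;

        // A plain load by id goes through memcache first when @Cache is present.
        static Account loadById(long id) {
            return ofy().load().type(Account.class).id(id).now();
        }
    }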

Fine grained vs coarse grained domain model in an In-Memory Data Grid

I am wondering which approach is better. Should we use fine grained entities on the grid and later construct functionally rich domain objects out of the fine grained entities?
Or, alternatively, should we construct the coarse grained domain objects and store them directly on the grid, using the entities just for persistence?
Edit: I think that this question is not yet answered completely. So far we have comments from Hazelcast, GemFire and Ignite. We are missing Infinispan, Coherence .... That is for completeness' sake :)
I agree with Valentin, it mainly depends on the system you want to use. Normally I would consider storing the enriched domain objects directly; however, if you have only very few objects and their size is massive, you end up with bad distribution and unequal memory usage across the nodes. If your domain objects are "normally" sized and you have plenty of them, you shouldn't worry.
In Hazelcast it is better to store those objects directly, but be sure to use a good serialization mechanism, as Java serialization is slow. If you want to query on properties inside your domain objects, you should also consider adding indexes.
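A small sketch of both points, indexing and serialization, using the Hazelcast 3.x API (the Customer class and the map name are invented for the example):

    import java.io.Serializable;

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    public class GridIndexExample {

        // Hypothetical coarse-grained domain object. In practice you would prefer a
        // faster serialization mechanism (e.g. IdentifiedDataSerializable or Portable)
        // over plain java.io.Serializable.
        public static class Customer implements Serializable {
            private final Long id;
            private final String city;

            public Customer(Long id, String city) {
                this.id = id;
                this.city = city;
            }

            public Long getId() { return id; }
            public String getCity() { return city; }
        }

        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<Long, Customer> customers = hz.getMap("customers");

            // Index a property inside the stored value so predicate queries on it are fast.
            customers.addIndex("city", false); // unordered index, good for equality lookups

            customers.put(1L, new Customer(1L, "Berlin"));
        }
    }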
I believe it can differ from one Data Grid to another. I'm more familiar with Apache Ignite, and in this case the fine grained approach works much better, because it's more flexible and in many cases gives better data distribution and therefore better scalability. Ignite also provides rich SQL capabilities [1] that allow you to join different entities and execute indexed searches. This way you will not lose performance with a fine grained model.
[1] https://apacheignite.readme.io/docs/sql-queries
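As a rough illustration of those SQL capabilities, here is a sketch using Ignite's annotation-driven indexing (the Person cache and its fields are made up):

    import java.util.List;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.SqlFieldsQuery;
    import org.apache.ignite.cache.query.annotations.QuerySqlField;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class IgniteSqlExample {

        // Hypothetical fine-grained entity with SQL-visible, indexed fields.
        public static class Person {
            @QuerySqlField(index = true)
            private String name;

            @QuerySqlField(index = true)
            private int age;

            public Person(String name, int age) {
                this.name = name;
                this.age = age;
            }
        }

        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start()) {
                CacheConfiguration<Long, Person> cfg = new CacheConfiguration<>("person");
                cfg.setIndexedTypes(Long.class, Person.class);
                IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cfg);

                cache.put(1L, new Person("Alice", 42));

                // SQL runs against the in-memory data, using the declared indexes.
                List<List<?>> rows = cache
                        .query(new SqlFieldsQuery("select name from Person where age > ?").setArgs(30))
                        .getAll();
                System.out.println(rows);
            }
        }
    }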
One advantage of a coarse-grained object is data consistency. Everything in that object gets saved atomically. But if you split that object up into 4 small objects, you run the risk that 3 objects save and 1 fails (for whatever reason).
We use GemFire, and tend to favor coarse-grained objects...up to a point. For example our Customer object contains a list of Addresses. An alternative design would be to create one GemFire region for "Customer" and a separate GemFire region for "CustomerAddresses" and then hope you can keep those regions in sync.
The downside is that every time someone updates an Address, we re-write the entire Customer object. That's not very efficient, but our traffic patterns show that address changes are very rare (compared to all the other activity), so this works out fine.
One experience we've had, though, is the downside of using Java serialization for long-term data storage. We avoid it now, because of all the problems caused by object compatibility as objects change over time. Not to mention it becomes a headache for .NET clients to read the objects. :)

Java Map vs Backend database. Which is better for speed and for multithreading for relations?

My algorithm will likely not be used on the web. The object I describe may be used by multiple threads, however.
The original object I had designed emulated pointers.
Reduced, a symbol would map to multiple pointers, and each unique pointer would map to a single symbol.
When I was finally finished and had a working algorithm, it turns out I actually needed six maps in total (these maps are called tens of thousands of times).
Initial testing with a very very small sample set of symbols showed the program to be working very efficiently. However, I'm afraid that once I increase the number of symbols by a few thousand-fold it will become sluggish.
Once the program completes and closes, the pointers do not need to persist.
I was wondering if I should re-implement my algorithm using a database as a backend. Would this be better than using all of these maps?
The maps are stored in memory. The database would be stored on a hard drive (I have an SSD, so I'm afraid there will be a large difference in performance on my machine vs a machine using SATA/PATA). The maps should also be O(1). The maps might also become very ugly once multithreading is introduced, unless I use thread-safe maps, which would slow the program down. A database would handle these tasks efficiently.
I've formally written out the proper relations, and I'm sure I can implement it in a database if that was the best option. Which is the better option?
If you do not need to persist that data structure, do not try to back it with a database. In your place, I would run some load tests with a realistic amount of data against the data structure you already have and refine it from there if the performance is not what you expected.
Anyway, the current trend is to use relational databases on disk for persistence and to cache frequently queried data in "big hashtables" in memory for performance. I doubt falling back to a database would improve your performance.
If your data structures fit in memory, I would be shocked if using a database would be faster (not even considering the complexity of using a database implementation). By throwing away all the assumptions, features, safety and consistency that a database must maintain, you will gain performance. Even the best DB implementation, assuming enough memory to cache everything, pretty much has a ConcurrentHashMap as an upper bound on performance. As a practical matter, you won't get CHM performance even with great caching, because a DB API will require defensive copies or cache invalidations that you can avoid with your in-memory structure.
Apart from the likely performance boost simply from using an in-memory hashmap, you may also get additional performance by tuning your structure based on your specific use case. For example, perhaps the initial lookup is multi-threaded, but individual values are only accessed by a single thread. In that case, you can avoid locking those values.
Hard drives, even fast ones, are several orders of magnitude slower than memory. So if your goal is performance you should stay in memory and use maps. For thread safety you can use a ConcurrentHashMap, which relies on fine-grained locking and CAS operations, so the synchronisation penalty in a multi-threaded environment should be minimal.
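For instance, a minimal sketch of one of those symbol-to-pointer maps kept fully in memory (names are invented; Java 8+ assumed):

    import java.util.Collections;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class SymbolTable {

        // One of the hypothetical maps: each symbol maps to a set of "pointer" ids.
        private final ConcurrentMap<String, Set<Long>> symbolToPointers = new ConcurrentHashMap<>();

        // computeIfAbsent creates the bucket atomically if it is missing,
        // so no external locking is needed even with many writer threads.
        public void addPointer(String symbol, long pointerId) {
            symbolToPointers
                    .computeIfAbsent(symbol, key -> ConcurrentHashMap.newKeySet())
                    .add(pointerId);
        }

        public Set<Long> pointersFor(String symbol) {
            return symbolToPointers.getOrDefault(symbol, Collections.emptySet());
        }
    }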
You should also check whether a single thread provides enough performance - multiple threads always introduce some overhead, and they need to bring enough gains to offset it.
You may also want to check in-memory DBs such as HyperSQL or H2 Database.

multi-layered caching with ehcache

For a project I am working on, we have decided to implement Ehcache for our caching purposes in our web-based app. For the most part, we would be using it for Hibernate caching (of master tables) and query caching.
However, we are also considering method caching at the DAO layer. Honestly, I am a bit skeptical about that. Say we have a method at the DAO layer which in turn fires a query (which is already cached); would it make any sense to cache that method as well? My feeling is that we should either cache the method OR cache the query that method eventually fires.
Please do let me know your inputs!
In my experience it very much depends on your application (and the kind/structure of your data). The application I'm currently working on has 3 built-in layers of cache (all backed by Ehcache): one as the Hibernate 2nd level cache, one for short-lived hot objects at the middle tier, and one for longer-lived fat value objects at the facade layer. The caches are tuned (cache query parameters, lifetimes, sizes, ...) to complement each other well.
So a priori, I wouldn't say it doesn't work. There is a possibility you can skip the entire ORM layer with that. If you have some profiling in place (I like perf4j for that) you can at least optimize for "hot" objects that are expensive to get from your DAOs. If you are using the Spring Framework you can do this easily by applying e.g. @Cacheable annotations to your methods. Do your performance testing with (near) live data and requests if possible.
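A minimal sketch of what that looks like with the Spring cache abstraction backed by Ehcache (the DAO, the cache name and the Product entity are hypothetical):

    import java.util.Collections;
    import java.util.List;

    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.stereotype.Repository;

    @Repository
    public class ProductDao {

        // A "hot", expensive DAO method: its result is cached by the Spring cache
        // abstraction (backed by Ehcache), so repeat calls skip the query entirely.
        @Cacheable(value = "productsByCategory", key = "#categoryId")
        public List<Product> findByCategory(long categoryId) {
            // ... would fire the (possibly already query-cached) Hibernate query here
            return Collections.emptyList(); // placeholder body for the sketch
        }
    }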
In fact, I think using the Hibernate 2nd level cache is the easy/lazy thing to do (a good first step, so to speak), but the performance gains are limited. With some more specific caching you can easily get speedups of a factor of hundreds or thousands for parts (hopefully the important ones) of your application, usually at reduced load.

Ehcache with DB persistence design

I'm designing an application that has to consume live data from several sources and periodically report on it. Consumed data will be added to an Ehcache cache and reports will query it. Once the live data is consumed it needs to be persisted for recovery purposes only. If the application restarts it will prime the cache with historical data from the DB before connecting to the live data sources (which queue new data).
I'm leaning toward implementing it as a cache-as-sor with JDBC caching:
1. Receive data from source
2. Persist to DB
3. Add to cache
4. Confirm receipt with source
with 2-4 wrapped in a JTA transaction.
I also looked into Hibernate with Ehcache as a 2nd level cache, but that doesn't seem appropriate.
I'm relatively new to Ehcache so would like some advice on the right design.
For persistence, rather than do a "cache-aside", you probably would want to configure your caches to use read-through and some cache writer (either write-through, or write-behind). You can read about these here: http://ehcache.org/documentation/user-guide/concepts#cache-as-sor
Now, I'd avoid JTA, as I fear the overhead might be overkill (unless you really need XA transaction recovery), and rather opt for a fault-tolerant approach. If you opt for asynchronous persistence (write-behind), clustering your cache with Terracotta (the write-behind queue would automatically be persistent, recoverable and even HA if multiple nodes are available) is one approach to ensuring every element gets written out to the underlying SoR... all depending on your needs, I guess.
Ehcache would let you start with a single-node, unclustered approach, simply using read- and write-through caches that you could grow and fine-tune to meet your SLA. As data grows, you'd then be able to move to clustered caches and asynchronous writers (should writes become the issue) or grow your cache sizes (if reads remain the issue). Obviously, you should measure (or at least know what bottlenecks you foresee) and choose accordingly. But putting a cache in front of your RDBMS is a common and well understood pattern for scaling read (and write) access to these "slower" stores...
If you just want to have the data in a cache, Hibernate looks like overkill. All you need is JDBC, both to implement a cache loader for cache initialization and to save the data to the database periodically. Or just set up your cache to persist to disk.
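A sketch of that cache-priming step with plain JDBC and the Ehcache 2.x API (the JDBC URL, table and cache name are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Element;

    public class CachePrimer {

        // Prime the cache from the recovery DB on startup, before connecting to the live sources.
        public static void primeCache() throws Exception {
            CacheManager manager = CacheManager.getInstance();
            Cache liveData = manager.getCache("liveData"); // cache assumed to be defined in ehcache.xml

            try (Connection con = DriverManager.getConnection("jdbc:yourdb://..."); // placeholder URL
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, payload FROM live_data")) {
                while (rs.next()) {
                    liveData.put(new Element(rs.getLong("id"), rs.getString("payload")));
                }
            }
        }
    }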
Then Ehcache + Hibernate is not the solution. What you are describing here is an asynchronous event processing system in which one of the listeners waits for an "event processed successfully" signal before persisting.
NoSQL databases are a far better option in this case, unless you strictly need to rely on a relational database.
