Is it safe to cache DataSource lookups in Java EE?

I'm developing a simple Java EE 5 "routing" application. Different messages from a MQ queue are first transformed and then, according to the value of a certain field, stored in different datasources (stored procedures in different ds need to be called).
For example valueX -> dataSource1, valueY -> dataSource2. All datasources are set up in the application server with different JNDI entries. Since the routing info usually won't change while the app is running, is it safe to cache the datasource lookups? For example, I would implement a singleton which holds a hashmap where I store valueX->DataSource1. When a certain entry is not in the map, I would do the resource lookup and store the result in the map. Do I gain any performance with the cache, or are these resource lookups fast enough?
In general, what's the best way to build this kind of cache? I could use a cache for some other db lookups too. For example, the mapping valueX -> resource name is defined in a simple table in a DB. Is it better to look up the values on demand and save the result in a map, do a lookup every time, or even read and save all entries on startup? Do I need to synchronize the access? Can I just create an "enum" singleton implementation?

It is safe from an operational/change management point of view, but not from a programmer's one.
From a programmer's PoV, a DataSource configuration can be changed at runtime, and therefore one should always repeat the lookup.
But this is not how things are happening in real life.
When a change to a DataSource is to be implemented, it is done via a change management procedure. There is a change request (c/r) record, and that record states that the application will have downtime. In other words, the operations folks executing the c/r will bring the application down, make the change and bring it back up. Nobody makes changes like this on a live AS -- for safety reasons. As a result, you don't need to account for the possibility that a DS changes at runtime.
So any permanent, synchronized, shared cache is fine in this case.
Will you get a performance boost? This depends on the AS implementation. It is likely to have a cache of its own, but that cache may be more generic and therefore slower, and in fact you cannot count on its presence at all.
Do you need to build a cache? The answer usually comes from performance tests. If there is no problem, why waste time and introduce risks?
Summary: yes, build a simple cache and use it -- if it is justified by the performance increase.
Specifics of the implementation depend on your preferences. I usually have a cache that does lookups on demand and has a synchronized map of jndi->object inside. For a high-concurrency cache I'd use read/write locks instead of naive synchronized -- i.e. many reads can go in parallel, while adding a new entry gets exclusive access. But those are details that depend heavily on the application.
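The on-demand cache with a read/write lock described above could look roughly like the sketch below. It is only an illustration: the names `ResourceLookup` and `JndiLookupCache` are made up for this example, and the actual JNDI call is hidden behind the hypothetical `ResourceLookup` interface so the class can be used outside a container.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical abstraction; in a container it would wrap InitialContext.lookup(). */
interface ResourceLookup<T> {
    T lookup(String jndiName) throws Exception;
}

/** On-demand cache of JNDI name -> resource, guarded by a read/write lock. */
final class JndiLookupCache<T> {
    private final Map<String, T> cache = new HashMap<String, T>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final ResourceLookup<T> lookup;

    JndiLookupCache(ResourceLookup<T> lookup) {
        this.lookup = lookup;
    }

    T get(String jndiName) {
        // Fast path: many readers may hold the read lock at the same time.
        lock.readLock().lock();
        try {
            T cached = cache.get(jndiName);
            if (cached != null) {
                return cached;
            }
        } finally {
            lock.readLock().unlock();
        }
        // Slow path: adding a new entry needs exclusive access.
        lock.writeLock().lock();
        try {
            // Re-check, because another thread may have added the entry meanwhile.
            T cached = cache.get(jndiName);
            if (cached == null) {
                try {
                    cached = lookup.lookup(jndiName);
                } catch (Exception e) {
                    throw new IllegalStateException("Lookup failed for " + jndiName, e);
                }
                cache.put(jndiName, cached);
            }
            return cached;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Inside the application server, the `ResourceLookup` implementation would typically just call `new InitialContext().lookup(jndiName)` and cast the result to `DataSource`. Whether this beats the server's own lookup path is something only a performance test can tell.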

Related

Hibernate second level cache with Redis - will it improve performance?

I am currently developing an application using Spring MVC 4 and Hibernate 4. I have implemented the Hibernate second level cache to improve performance. If I use Redis, an in-memory data structure store used as a database, cache, etc., performance will increase, but will it be a drastic change?

You can expect drastic differences if you cache what is worth caching and avoid caching data that should not be cached at all. Performance, like beauty, is in the eye of the beholder. Here are several aspects you should keep in mind when using the Hibernate second level cache:
No custom serialization - memory intensive
If you use second level caching, you will not be able to use fast serialization frameworks such as Kryo and will have to stick to standard Java serialization, which is slow and bulky.
On top of this, each entity type gets a separate region, and within each region there is an entry for each key of each entity.
In terms of memory, this is inefficient.
Lacks the ability to store and distribute rich objects
Most modern caches also offer computing grid functionality. Having your objects fragmented into many small pieces decreases your ability to execute distributed tasks with guaranteed data co-location. That depends a little on the grid provider, but for many it would be a limitation.
Suboptimal performance
Depending on how much performance you need and what type of application you have, using the Hibernate second level cache might be a good or a bad choice. Good in that it is plug and play (kind of); bad because you will never squeeze out the performance you could otherwise have gained. Also, designing rich models means more upfront work and more OOP.
Limited querying capabilities on the cache itself
That depends on the cache provider, but some providers are really not good at JOINs with a WHERE clause on anything other than the ID. If you try to build an in-memory index for a query on Hazelcast, for example, you will see what I mean.

Yes, if you use Redis, it will improve your performance.
No, it will not be a drastic change. :)
https://memorynotfound.com/spring-redis-application-configuration-example/
http://www.baeldung.com/spring-data-redis-tutorial
The links above show how to integrate Redis with your project.

It depends on the traffic.
If you have 1000 or more requests per second and you are low on RAM, then yes, use Redis nodes on another machine to take some of the load. It will greatly relieve RAM pressure and improve request speed.
Otherwise, do not use it.
Remember that you can adopt this approach later, once you see what the RAM and database connection pool usage looks like.

Your question was already discussed here. Check this link: Application cache v.s. hibernate second level cache, which to use?
This was the most accepted answer, which I agree with:
It really depends on your application querying model and the traffic
demands.
Using Redis/Hazelcast may yield the best performance since there won't
be any round-trip to DB anymore, but you end up having a normalized
data in DB and denormalized copy in your cache which will put pressure
on your cache update policies. So you gain the best performance at the
cost of implementing the cache update whenever the persisted data
changes.
Using 2nd level cache is easier to set up but it only stores
entities by id. There is also a query cache, storing ids returned by a
given query. So the 2nd level cache is a two-step process that you
need to fine tune to get the best performance. When you execute
projection queries the 2nd level object cache won't help you, since it
only operates on entity load. The main advantage of 2nd level cache is
that it's easier to keep it in sync whenever data changes, especially
if all your data is persisted by hibernate.
So, if you need ultimate
performance and you don't mind implementing your cache update logic
that ensures a minimum eventual consistency window, then go with an
external cache.
If you only need to cache entities (that usually don't change that
frequently) and you mostly access those through Hibernate entity
loading, then 2nd level cache can help you.
Hope it helps!

When do we need to store the Hibernate Session in a ThreadLocal object?

ThreadLocal<Session> tl = new ThreadLocal<Session>();
tl.set(session);
to get the session,
Employee emp = (Employee) tl.get().get(Employee.class, 1);
If our application is web based, the web container creates a separate thread for each request.
If all these requests concurrently use the same single Session object, we will get
unwanted results in our database operations.
To avoid this, it is said to be good practice to store our session in a ThreadLocal object,
which does not allow concurrent usage of the session. But I think that, if this is correct, application performance would be very poor.
What is a good approach in the above scenario?
If I'm on the wrong track, in which situations do we need to go for ThreadLocal?
I'm new to Hibernate, so please excuse me if this kind of question is silly.
Thanks in advance.

Putting the Hibernate Session in ThreadLocal is unlikely to achieve the isolation between requests that you want. Surely you create a new Session for each request using a SessionFactory backed by a connection pooling implementation of DataSource, which means that the local reference to the Session is on the stack anyway. Changing that local reference to a member variable only complicates the code, imho.
Anyhow, ensuring isolation within a single container doesn't address the actual problem - how is data accessed efficiently while maintaining consistency within a multi-threaded environment.
There are two parts to the problem you mention - the first is that a database connection is an expensive resource, the second that you need to ensure some level of data consistency between threads/requests.
The general approach to the resource problem is to use a database connection pool (which I'd guess you're already doing). As each request is processed, connections are obtained from the pool and returned when finished but importantly the connections in the pool are maintained beyond the lifetime of a request thus avoiding the cost of creating a connection each time it is needed.
The consistency problem is a little trickier and there's no one size fits all model. What you need to be doing is thinking about what level of consistency you need - questions like does it matter if data is read at the same time it's being written, do updates absolutely have to be atomic, etc.
Once you know the answers to these questions, there are two places you need to look at consistency - in the database and in the code.
With the database you need to look at database level locks and create a scheme suitable for your application by applying the appropriate isolation levels.
With the code, things are a little more complicated. Data is often loaded and displayed for a period of time before updates are written back - no problem if there's a single user, but in a multi-user system it's possible that updates are made based on stale data or that multiple updates occur simultaneously. It may be acceptable to have a policy of last update wins, in which case it's simple, but if not you'll need to use version numbers or old/new comparisons to ensure integrity at the time the updates are applied.
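The version number check mentioned above can be illustrated with a small in-memory sketch. `VersionedRecord` and `VersionedStore` are hypothetical names standing in for a table with a version column; a real application would do the same comparison in the database, for example with an `UPDATE ... WHERE id = ? AND version = ?` statement.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Immutable snapshot of a row: its value plus the version it was read at. */
final class VersionedRecord {
    final String value;
    final long version;

    VersionedRecord(String value, long version) {
        this.value = value;
        this.version = version;
    }
}

/** Hypothetical in-memory stand-in for a table with a version column. */
final class VersionedStore {
    private final ConcurrentMap<String, VersionedRecord> rows =
            new ConcurrentHashMap<String, VersionedRecord>();

    void insert(String key, String value) {
        rows.put(key, new VersionedRecord(value, 0L));
    }

    VersionedRecord read(String key) {
        return rows.get(key);
    }

    /** Applies the update only if the row still has the version the caller read. */
    boolean update(String key, VersionedRecord readEarlier, String newValue) {
        VersionedRecord current = rows.get(key);
        if (current == null || current.version != readEarlier.version) {
            return false; // someone else updated the row since it was read
        }
        // Atomic swap: fails if the row was replaced between get() and here.
        return rows.replace(key, current, new VersionedRecord(newValue, current.version + 1));
    }
}
```

Two users who read the same row at version 0 can both try to save, but only the first save succeeds; the second sees `false` and has to reload and retry or report a conflict.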

I am not sure whether you are required to use ThreadLocal. Using ThreadLocal to store the session object is definitely not a good idea, especially when you are using Hibernate along with Spring.
A typical scheme for using Hibernate with Spring is:
Inject the sessionFactory in your DAO. I assume that you have sessionFactory already configured which is backed by a pooled datasource.
Now in your DAO class, a session can be accessed as follows.
Session session = sessionFactory.getCurrentSession();
Here is a link to related article.
Please note that this example is specific to the Hibernate 3.x APIs. This takes care of session creation, closure and thread safety internally, and it's neat too.

Retrieve all essential data on startup

In a Java web application, I would like to know if it is a proper (or "standard"?) approach to load all the essential data, such as config data, message data, code maintenance data, dropdown option data, etc. (assuming none of it is updated frequently), from the database into "static" variables when the server starts up. Or is it preferable to retrieve the data by querying the db per request?
Thanks for all your advice here.

It is perfectly valid to pull all the data that is not going to be modified during the application life-cycle out of the DB and keep it in memory as a singleton or something similar.
This is a good idea because it saves DB hits and retrieval is faster. A lot of environment specific settings and other data can also be pulled once and kept in an immutable hashmap for any future request.
In a common web-app you generally do not have so many config data/option objects that they eat up a lot of memory and cause an OOM. But if you have a table with hundreds of thousands of config entries, it is better to load objects as and when they are requested. And if you do want to keep them in memory, think of putting them in some key-value store like Memcached.

We used DB to store config values and ehcache to avoid a lot of DB hits. This way you don't need to worry about memory consumption (it will use whatever memory you have).
EhCache is one of many available DB cache solutions and can be configured on top of JPA etc.
You can configure ehcache (or many other cache providers) to deem the tables read-only, in which case it will only go to the DB if it's explicitly told to invalidate the cache. This performs pretty well. The overhead does become visible when reads occur very frequently (like 100/sec), but usually storing the config value in a local variable, avoiding reads inside loops, and passing the value on through the method stack during the invocation mitigates this well enough.
Storing values in a singleton as Java objects performs the best, but if you want to modify them without restarting the app, it becomes a little bit involved.
Here is a simple way to achieve dynamic configuration with Java objects:
private volatile ImmutableMap<String,Object> param_value;
Basically you'll have to start thinking about multi-threaded access and memory issues (although it's quite unlikely that you'll run out of memory because of configuration values, unless you have binary data as config values etc.).
In essence, I'd recommend using the DB and some cache provider unless that part of code really needs high-performance.
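To make the volatile immutable map idea above a bit more concrete, here is a minimal sketch. It assumes the JDK's `Collections.unmodifiableMap` over a private copy as a stand-in for Guava's `ImmutableMap`, and the class name `ConfigHolder` and its methods are invented for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Holds configuration values; readers never lock, reloads swap the whole map. */
final class ConfigHolder {
    // volatile: a reader sees either the old map or the new one, never a half-built map.
    private volatile Map<String, Object> paramValues =
            Collections.<String, Object>emptyMap();

    Object get(String name) {
        return paramValues.get(name);
    }

    /** Called at startup and whenever the values change in the DB. */
    void reload(Map<String, Object> freshValues) {
        // Copy first, so later changes to the caller's map cannot leak in.
        paramValues = Collections.unmodifiableMap(new HashMap<String, Object>(freshValues));
    }
}
```

The point of the design is that the map itself is never mutated after publication, so reads need no synchronization; only the reference is replaced.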

The best place to store large data retrieved by a java servlet (Tomcat)

I have a Java servlet that retrieves data from a MySQL database. In order to minimize roundtrips to the database, the data is retrieved only once, in the init() method, and is placed in a HashMap (i.e. cached in memory).
For now, this HashMap is a member of the servlet class. I need not only to store this data but also to update some values (counters, in fact) in the cached objects of the underlying hashmap value class. And there is a Timer (or cron task) to schedule dumping these counters to the DB.
So, after googling, I found 3 options for storing the cached data:
1) as now, as a member of the servlet class (but servlets can be taken out of service and put back into service by the container at will, and then the data will be lost)
2) in ServletContext (am I right that it is recommended to store only small amounts of data here?)
3) in a JNDI resource.
What is the most preferred way?

Put it in the ServletContext, but use a ConcurrentHashMap to avoid concurrency issues.
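For the counters mentioned in the question, a thread-safe holder built on ConcurrentHashMap might look like the sketch below. The class name `HitCounters` and its methods are hypothetical; the object would be stored once in the ServletContext and shared by all request threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Thread-safe counters keyed by name, suitable for sharing across request threads. */
final class HitCounters {
    private final ConcurrentMap<String, AtomicLong> counters =
            new ConcurrentHashMap<String, AtomicLong>();

    /** Increments the counter for the key and returns the new value. */
    long increment(String key) {
        AtomicLong counter = counters.get(key);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            // putIfAbsent returns the existing counter if another thread won the race.
            counter = counters.putIfAbsent(key, fresh);
            if (counter == null) {
                counter = fresh;
            }
        }
        return counter.incrementAndGet();
    }

    /** Point-in-time copy, e.g. for the timer task that dumps counters to the DB. */
    Map<String, Long> snapshot() {
        Map<String, Long> copy = new HashMap<String, Long>();
        for (Map.Entry<String, AtomicLong> e : counters.entrySet()) {
            copy.put(e.getKey(), e.getValue().get());
        }
        return copy;
    }
}
```

In a servlet this could be registered with something like `getServletContext().setAttribute("hitCounters", new HitCounters())`, where the attribute name is just an example.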

Of those 3 options, the best is to store it in the application scope, i.e. use ServletContext#setAttribute(). You'd want to use a ServletContextListener for this. In normal servlets you can access the ServletContext through the inherited getServletContext() method. In JSP you can access the attribute with ${attributename}.
If the data grows so large that it eats too much of Java's memory, then you should consider a 4th option: use a cache manager.

The most obvious way would be use something like ehcache and store the data in that. ehcache is a cache manager that works much like a hash map except the cache manager can be tweaked to hold things in memory, move them to disk, flush them, even write them into a database via a plugin etc. Depends if the objects are serializable, and whether your app can cope without data (i.e. make another round trip if necessary) but I would trust a cache manager to do a better job of it than a hand rolled solution.

If your cache can become large enough and you access it often it'll be reasonable to utilize some caching solution. For example ehcache is a good candidate and easily integrated with Spring applications, too. Documentation is here.
Also check this overview of open-source caching solutions for Java.

Update cached data in a hashtable

In order to minimize the number of database queries I need some sort of cache to store pairs of data. My approach now is a hashtable (with Strings as keys, Integers as value). But I want to be able to detect updates in the database and replace the values in my "cache". What I'm looking for is something that makes my stored pairs invalid after a preset timespan, perhaps 10-15 minutes. How would I implement that? Is there something in the standard Java package I can use?

I would use an existing solution (there are many cache frameworks).
ehcache is great; it can reset the values after a given timespan, and I bet it can do much more (it's the only one I have used).

You can either use existing solutions (see previous reply)
Or, if you want a challenge, make your own simple cache class (not recommended for a production project, but it's a great learning experience).
You will need at least 3 members
A cache data stored as hashtable object,
Next cache expiration date
Cache expiration interval set via constructor.
Then simply have public data getter methods which check the cache expiration status:
if not expired, call the hashtable's accessors;
if expired, first call the "data load" method (which is also called in the constructor to pre-populate the cache) and then call the hashtable's accessors.
For an even cooler cache class (I have implemented one in Perl at my job), there is additional functionality you can implement:
Individual per-key cache expiration (coupled with overall total cache expiration)
Auto, semi-auto, and single-shot data reload (e.g., reload the entire cache at once; reload a batch of data defined by some predefined query; or reload individual data elements piecemeal). The latter approach is very useful when your cache has many hits on the exact same keys - that way you don't need to reload the universe every time the 3 keys that are always accessed expire.
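The basic three-member design described above (data, next expiration time, expiration interval) might be sketched as follows. `DataLoader` and `TimeSource` are hypothetical helper interfaces introduced only for this example: the first stands in for the database query, the second makes expiry testable without sleeping.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical loader; in practice this would run the database query. */
interface DataLoader {
    Map<String, Integer> loadAll();
}

/** Hypothetical time source, so expiry can be exercised without waiting. */
interface TimeSource {
    long nowMillis();
}

/** Whole-cache expiration: all entries are reloaded once the interval has passed. */
final class ExpiringCache {
    private final DataLoader loader;
    private final TimeSource time;
    private final long intervalMillis;     // expiration interval, set via constructor
    private Map<String, Integer> data;     // the cached pairs
    private long expiresAtMillis;          // next expiration time

    ExpiringCache(DataLoader loader, long intervalMillis, TimeSource time) {
        this.loader = loader;
        this.intervalMillis = intervalMillis;
        this.time = time;
        reload(); // pre-populate on construction
    }

    /** Returns the cached value, reloading everything first if the cache expired. */
    synchronized Integer get(String key) {
        if (time.nowMillis() >= expiresAtMillis) {
            reload();
        }
        return data.get(key);
    }

    private void reload() {
        data = new HashMap<String, Integer>(loader.loadAll());
        expiresAtMillis = time.nowMillis() + intervalMillis;
    }
}
```

In production the `TimeSource` would simply return `System.currentTimeMillis()`, and the interval would be something like 10 to 15 minutes as in the question. Per-key expiration and partial reloads are left out to keep the sketch short.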

You could use a caching framework like OSCache, EHCache, JBoss Cache, JCS... If you're looking for something that follows a "standard", choose a framework that supports the JCache standard interface (javax.cache) aka JSR-107.
For simple needs like what you are describing, I'd look at EHCache or OSCache (I'm not saying they are basic, but they are simple to start with), they both support expiration based on time.
If I had to choose one solution, I'd recommend Ehcache, which has my preference, especially now that it has joined Terracotta. And just for the record, Ehcache provides a preview implementation of JSR107 via the net.sf.ehcache.jcache package.
