I've got a service that is distributed across 9 VMs.
I want to create a single cache that will be shared between all of them.
Up until now I was using a separate cache for each VM, which left them inconsistent with each other.
I already have an Elasticsearch cluster in use, so I was wondering if I can combine ES as a cache layer with the Spring cache abstraction.
The service runs on Java 8 + Spring Boot + ES 5.2.2.
edit:
Additional information: the original requirement for my service is to return a response in less than 100 ms. That is why I started with a simple Spring cache implementation using ConcurrentMapCacheManager, which works just fine for the speed. The cache should be refreshed roughly every hour. For now I've got about 1,300 objects that need to be cached, so on every service startup a process fills the cache with all the responses I've got, and another process wakes up every hour or so and refreshes the entries with updated data. That happens on each of the 9 VMs.
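Roughly, the current per-VM setup looks like this (a reconstruction for illustration only; the cache name, refresh method and scheduling details are my guesses, not the actual code):

```java
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
@EnableCaching
@EnableScheduling
public class LocalCacheConfig {

    @Bean
    public CacheManager cacheManager() {
        // JVM-local cache: each of the 9 VMs holds its own independent copy.
        return new ConcurrentMapCacheManager("responses");
    }
}

@Component
class CacheRefresher {

    // Runs independently on every VM, hence the 9x load on the upstream
    // service and the slight skew between the VMs' caches.
    @Scheduled(fixedRate = 3_600_000) // roughly every hour
    public void refreshAllEntries() {
        // re-fetch the ~1300 objects and repopulate the "responses" cache
    }
}
```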
The issues with this system are:
1. I would hit the upstream service 9 times each hour with requests for all the information it has (it can handle it, but it would be better if only one VM did it).
2. If I need to update a specific entry in the cache with new information, or delete it altogether, I don't have an easy way to remove it from all the VMs in my pool.
3. Since each VM runs the cache refresh at a slightly different time, the caches won't be aligned across all VMs, and the same call to the LB can return different results depending on which VM serves it.
4. For now I've got only ~1,300 objects that need to be cached, but that could grow to millions of entries, and I don't want to get stuck with out-of-memory issues.
I understand it's not the ideal use of a cache system, and I might be using the wrong term for what I want, but basically I need a good and fast name-value storage system that can be accessed across my whole service. It would be great if it could use the spring-cache abstraction, because it's really easy to use and it's already implemented in my service.
Thanks,
A.
You could easily integrate Hazelcast or EHCache as the distributed cache for your Spring application.
It might be me, but I find it a weird choice to use Elasticsearch as a cache.
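As a minimal sketch of the Hazelcast route (assuming spring-boot-starter-cache and hazelcast-spring are on the classpath; the map name and TTL are illustrative):

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class DistributedCacheConfig {

    // Spring Boot picks up this Config bean, starts an embedded Hazelcast
    // member and wires it in as the CacheManager, so all 9 VMs share one
    // distributed map instead of 9 local ones.
    @Bean
    public Config hazelcastConfig() {
        Config config = new Config();
        // Expire entries after an hour instead of refreshing them per VM.
        config.addMapConfig(new MapConfig("responses").setTimeToLiveSeconds(3600));
        return config;
    }
}
```

With that in place, an eviction triggered on one VM (e.g. via cacheManager.getCache("responses").evict(key)) is visible to all nine, which addresses points 2 and 3 of the question. Note that members discover each other via multicast by default; most cloud environments need an explicit TCP/IP join configuration.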
Related
If I have a distributed Java web application deployed in a cluster, with say 10 servlets and 10 JSPs running the show, and I want to share some data, say a variable or a simple POJO, between all the threads of all the servlets on all the machines, what is the way to do it?
No framework like Spring/Struts is used; let's say I'm only using basic Servlets and JSPs. Usually we think about the ServletConfig, ServletContext, HttpSession and HttpServletRequest objects to store information which needs to be passed/shared from one component to another. ServletContext has the largest scope because it's accessible from all the servlets and JSPs in the web app. But in the case of a distributed application, I guess the ServletContext object would be created once per JVM, so even for a single web app every machine in the cluster will have a different Java object for ServletContext, correct? So in such a scenario, what should be done to share a POJO between all the servlets on all the machines of a single web app?
If it's not possible using plain Servlets and JSPs, do any frameworks make it possible? Would appreciate any inputs. Many thanks!
In a distributed architecture, it is useful to think beyond objects and in terms of "services". There are several possible solutions, but all of them involve some form of service you can access from any of your 10 nodes.
So you could, for example, add an 11th machine and host an API on it for putting and getting objects (values/maps/etc.). That would create a shareable region between the nodes.
However, this opens up a whole world of possible issues if not done correctly, because you need to think about synchronization, deadlocks, dirty reads and other concurrent-processing concerns with a cross-JVM mindset.
Also, many systems synchronize their nodes via the database, but this approach is somewhat dated nowadays in favor of the more recent "microservices" approach, where persistence is distributed rather than monolithic.
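A deliberately naive sketch of that 11th-machine idea (the servlet, its /store path and the string-only values are all hypothetical; a real version would need eviction, persistence and authentication):

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Runs on the extra machine; the other 10 nodes GET/PUT against it over
// HTTP instead of trying to share a JVM-local object.
@WebServlet("/store")
public class SharedStoreServlet extends HttpServlet {

    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String value = store.get(req.getParameter("key"));
        if (value == null) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
        } else {
            resp.getWriter().write(value);
        }
    }

    @Override
    protected void doPut(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Naive: treats the first line of the request body as the value.
        store.put(req.getParameter("key"), req.getReader().readLine());
        resp.setStatus(HttpServletResponse.SC_NO_CONTENT);
    }
}
```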
You are using Spring already, so maybe the Spring Session project is the right choice for you - http://projects.spring.io/spring-session/. It is certainly the easiest one to get running.
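A minimal sketch, assuming spring-session-data-redis is on the classpath and Spring Boot auto-configures the connection to a reachable Redis (both of which are assumptions):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.session.data.redis.config.annotation.web.http.EnableRedisHttpSession;

// With this annotation, HttpSession attributes are stored in Redis instead
// of the local JVM, so every node behind the load balancer sees the same
// session data.
@Configuration
@EnableRedisHttpSession
public class SessionConfig {
}
```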
You can use Hazelcast, a framework like memcached but with auto-discovery for clustering. I used it for session and cache sharing on my Amazon cluster and it works like a charm.
http://hazelcast.com/use-cases/caching/
But if you want to keep it simple, you can always use memcached, as I said before:
http://memcached.org/
Sharing things between servers is:
error-prone
sometimes complicated
The most common thing to want is user session data across a load-balanced cluster of servers. If someone is talking to one server and then gets load-balanced to a different server, you want their session to keep going. Tomcat clustering does this, and it's already built in.
https://tomcat.apache.org/tomcat-7.0-doc/cluster-howto.html
The last time I played with that, it was touchy; don't count on session replication always working in every servlet container, and you'll be better off. Also, session replication is crazy expensive: once you're past a few machines, the cost (in RAM) of having all session data everywhere starts to add up quickly, and you can't add more users easily anymore.
Wanting to share things between multiple JVMs is a code smell; if you can architect around it, do so. But other than clustering, you have the two normal options:
a database. Tried, true, tested; keep details that need to change there.
an in-memory store. If it gets called on every request, and/or must be really fast for whatever reason, consider keeping it in memory; memcached is a multi-machine, in-memory key-value store that does just this.
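A minimal client sketch for the memcached option, assuming the spymemcached library (net.spy:spymemcached); the host name and key are placeholders:

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class SharedStoreExample {

    public static void main(String[] args) throws Exception {
        // One memcached server shared by every node in the cluster;
        // "cache-host" stands in for your real server.
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("cache-host", 11211));

        client.set("config:featureFlag", 3600, "on");              // TTL of one hour, in seconds
        String value = (String) client.get("config:featureFlag");  // same value from any node
        System.out.println(value);

        client.shutdown();
    }
}
```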
The simplest solution is a ConcurrentHashMap: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentHashMap.html
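A tiny sketch of that, with the caveat that a ConcurrentHashMap is only shared between threads inside one JVM, not across the machines of the cluster:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// JVM-local shared state: visible to every servlet thread in this JVM,
// but NOT to the other machines in the cluster.
public class LocalSharedState {

    private static final ConcurrentMap<String, Object> SHARED = new ConcurrentHashMap<>();

    public static void put(String key, Object value) {
        SHARED.put(key, value);
    }

    public static Object get(String key) {
        return SHARED.get(key);
    }
}
```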
If you want to scale your application across JVMs, you will need something like Hazelcast - http://hazelcast.com/
We are developing a distributed web application (3 Tomcats with a load balancer).
Currently we are looking for a caching solution. This solution should of course be cluster-safe.
We are using Spring and JPA (MySQL).
We thought about the following solution:
Create a cache server that runs a simple cache, and delegate all DB operations from each Tomcat to it (the DAO layer in the web app will communicate with that server instead of accessing the DB itself). This is appealing since the cache configuration on the cache server can be minimal.
What we are wondering about right now is:
If a complex query is passed to the cache server (i.e. a select with multiple joins and where clauses), how exactly can the standard cache structure (a map) handle it? Does it mean we have to implement a lookup for each complex query ourselves and adapt it to a map search instead of a DB query?
P.S. - there is a possibility that this architecture is flawed at its base and that is why such a weird question was raised; if that's the case, please suggest an alternative.
Best,
MySQL already comes with a query cache; see http://dev.mysql.com/doc/refman/5.1/en/query-cache-operation.html
If I understand correctly, you are trying to implement a method cache, using the arguments of your DAO methods as the key and the resulting object/list as the value.
This should work, but your concern about complex queries is valid: you will end up with a lot of entries in your cache. For a complex query you would hit the cache only if the exact same query is executed with exactly the same arguments as the one already cached. You will have to figure out whether it is worth caching those complex queries, i.e. whether there is a realistic chance they will be hit; that really depends on the application's business logic.
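A hedged sketch of such a method cache via Spring's cache abstraction (assuming @EnableCaching and a configured provider such as Ehcache; OrderDao, Order and the cache name are hypothetical):

```java
import java.util.List;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Repository;

@Repository
public class OrderDao {

    // The cache key is derived from the method arguments, so only a call
    // with the exact same (customerId, status) pair is a cache hit.
    @Cacheable("ordersByCustomerAndStatus")
    public List<Order> findOrders(long customerId, String status) {
        return runJpaQuery(customerId, status); // runs only on a cache miss
    }

    // Writes must evict the cached results, or readers will see stale data.
    @CacheEvict(value = "ordersByCustomerAndStatus", allEntries = true)
    public void saveOrder(Order order) {
        // persist via JPA ...
    }

    private List<Order> runJpaQuery(long customerId, String status) {
        throw new UnsupportedOperationException("placeholder for the real JPA query");
    }
}

class Order { /* hypothetical entity */ }
```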
Another option would be to implement a cache with multiple levels: a second-level cache and a query cache, using Ehcache and BigMemory. You might find this useful:
http://ehcache.org/documentation/integrations/hibernate
We're building a Java web application where each customer will have an instance of it with its own database schema.
It will be managed by my company, so I would like to know the best approach to running several app instances on the same Tomcat runtime, since we tried to run 3 instances on a single Tomcat and it ended up with an out-of-memory exception.
We considered running multiple Tomcat instances on the same server, but we haven't tested it yet. We are also considering a separate server for each customer.
From your experience with similar scenarios, what is your opinion?
EDITED: This application can't be multi-tenant, since there will be code customizations in some parts of it, as well as other business reasons that require an application instance per client. So please note that the application architecture is not the subject here.
Thank you,
Gyo
You want to use a multi-tenant architecture. There will be only one database and one web application instance, and every record will be qualified by the 'owner' company. You can use the subdomain/domain through which the client accesses your application to differentiate between them.
Simplistically, you add a 'domain_id' column to every table and a 'where domain_id=?' clause to every query. Each user will have an associated domain_id, which you pick up on login and put in the session. In reality there will be other considerations.
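In code, the per-tenant qualifier looks roughly like this (a sketch; the table, columns and DAO name are hypothetical, and the caller is responsible for closing the resources):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class TenantAwareDao {

    // Every query carries the tenant's domain_id, resolved from the
    // logged-in user's session by the caller.
    public ResultSet findCustomers(Connection connection, long domainId) throws SQLException {
        PreparedStatement ps = connection.prepareStatement(
                "SELECT id, name FROM customer WHERE domain_id = ?");
        ps.setLong(1, domainId);
        return ps.executeQuery();
    }
}
```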
EDITED: Based on the edit in the question, here is an additional part to the answer.
In a multi-tenant architecture it is possible to customise every instance without maintaining separate codebases. Some of the customisations can be part of a 'profile'; this is suitable for data values and flags, like currency, date format, etc. Where new functionality specific to a client is required, this can be achieved by supporting plugins.
Taking a one-time pain to fit your solution into a multi-tenant architecture will be better than the ongoing pain of maintaining several separate versions of your code, one per client. You might want to read up on the topic of 'technical debt'.
An ERP is a complex case of a business application, and you can get inspiration from reading the Openbravo Trial FAQ to get an idea of what we are saying. Openbravo is open source and you can get technical details by looking at their code.
My opinion is exactly the same as Kinjal Dixit's: your approach is wrong and will be a huge waste of resources.
If you want to be able to deploy different versions of the web app on the same server, you will have to isolate the class loading of each app, and this implies huge memory consumption. Otherwise, if all the web apps are always the same, there is no point in deploying it many times.
Having a separate server for each customer will also waste resources (multiple JVM instances, the same libraries class-loaded many times, a multiplication of the number of threads and thus of scheduling cost) and will significantly complicate deployment, especially if you plan some clustering strategy, where load balancing will probably become hell.
Moreover, if you want some feature specific to a given client, it will also become hell to manage, deploy, upgrade, etc.
A multi-tenant architecture does not necessarily imply sharing the database (you can have a DB instance per client and dispatch the requests with an interceptor at a low level), but sharing the web app is an absolute necessity.
I'd also advise providing some kind of configuration to allow enabling custom features for a given client.
I worked for a company where we encountered exactly this problem (exposing a legacy web application as a SaaS one). We started by deploying one web app per client and spent a huge amount of time on various optimizations (including class-loading factorization) to reach the "huge" number of 14 customers per server.
This was far from our performance expectations, and we finally switched to a multi-tenant architecture, keeping one DB instance per customer to avoid the significant cost of refactoring the data model. The new deployment was able to handle more than 100 customers on the same server with incomparable performance.
EDIT (according to the question update)
If you absolutely want to avoid multi-tenancy, then I'd recommend using only one servlet container (Tomcat) per server. In this case you will have to keep the default web-app class-loading isolation (as you will have custom code in different instances), which implies a high memory footprint. You should, however, put all common libraries in Tomcat's common/lib directory so they are loaded only once (see http://tomcat.apache.org/tomcat-6.0-doc/class-loader-howto.html).
I'm building a Java web application which may generate a lot of traffic in the future.
All in all it uses some quite simple queries against the database, but some kind of cache may still be necessary to keep latency low and to prevent a high database access rate.
Should I bother with a cache from the start? Is it a necessity?
Is it hard to implement, or can I use some open-source solution on top of the existing queries? And how would such a cache know when the database state has changed?
It all depends on how much traffic you expect; do you have some estimate of the maximum volume or the number of users?
Most of the time you don't need to worry about the cache from the beginning and can add a Hibernate second-level cache later on.
You can start development without a cache configured, then add one later by choosing a cache provider and plugging it in as the second-level cache provider. EHCache is a frequent choice.
You can then annotate your entities with @Cache, with different strategies, for example read-only, etc.
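A minimal sketch of such an annotated entity, assuming Hibernate with an Ehcache region factory already configured (Country is a hypothetical read-mostly entity):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY) // never updated, so the cheapest strategy
public class Country {

    @Id
    private Long id;

    private String name;

    // getters and setters omitted for brevity
}
```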
I am working on IBM WebSphere Commerce (WCS). In this framework we have an option to cache our command classes; basically they are just Java classes. While adding a new cache entry I learned that these Java classes must be serializable (implement the java.io.Serializable interface). Why is that?
Is caching basically saving the output of some execution? And in this case, will it save the sequence of bytes generated by serialization, so that whenever a request for that cached object comes in, it will just deserialize and return the object without executing the actual program? Can anyone please share some knowledge about this?
Thanks in advance,
Santosh
For caching the result of a method execution and returning it for subsequent calls, serialization is not needed.
The most likely reason it needs to be Serializable is that when you cache some data in a clustered environment, changes made to the cached data on one node have to be replicated on the other nodes of the cluster. For this replication the data needs to be serialized and sent across to another node using some remoting API.
The other reason for requiring the class to be serializable is that the cache implementation might overflow the data to disk. In this case too, the objects in the cache need to be converted to a form that can be stored on disk and recreated later.
The following passage from the Ehcache documentation explains the overflow scenario in more detail:
When an element is added to a cache and it goes beyond its maximum memory size, an existing element is either deleted, if overflowToDisk is false, or evaluated for spooling to disk, if overflowToDisk is true.
In the latter case, a check for expiry is carried out. If it is expired it is deleted; if not it is spooled. The eviction of an item from the memory store is based on the 'MemoryStoreEvictionPolicy' setting specified in the configuration file.
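To make the requirement concrete, a hedged sketch of a value object that can live in a replicating or disk-overflowing cache (ProductSummary is a hypothetical name):

```java
import java.io.Serializable;

// Serializable so the cache can turn it into bytes, ship it to another
// node, or spool it to disk and rebuild it later.
public class ProductSummary implements Serializable {

    private static final long serialVersionUID = 1L;

    private final String sku;
    private final String name;

    public ProductSummary(String sku, String name) {
        this.sku = sku;
        this.name = name;
    }

    public String getSku() {
        return sku;
    }

    public String getName() {
        return name;
    }
}
```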
Serialization saves the actual object itself, in its current state.
The reason why is WebSphere Commerce's use of WebSphere Application Server's DynaCache feature. WAS DynaCache is an in-memory Java cache that is very similar to a built-in memcached. Out of the box, the starter store uses DynaCache to cache JSPs, servlets, controllers, commands, command tasks and other Java objects. There is also caching done on the DB side. This is why, in performance tests, IBM scales much better at high volumes than other software.