I'm building a Java web application which may generate a lot of traffic in the future.
All in all it uses fairly simple database queries, but some kind of cache may still be necessary to keep latency low and to prevent a high database access rate.
Should I bother with a cache from the start? Is it a necessity?
Is it hard to implement, or can I use some open-source solution on top of the existing queries? And how would such a cache know when the database state has changed?
It all depends on how much traffic you expect. Do you have an estimate of the maximum volume or the number of users?
Most of the time you don't need to worry about the cache from the beginning; you can add a Hibernate second-level cache later on.
You can start development without a cache configured, then choose a cache provider later and plug it in as the second-level cache provider. EHCache is a frequent choice.
You can then annotate your entities with @Cache, using different strategies, for example read-only, etc.
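As a sketch (assuming Hibernate with the EHCache provider on the classpath; the Product entity is purely illustrative), the annotation might look like:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Hypothetical entity. READ_ONLY suits reference data that never changes;
// READ_WRITE adds locking overhead but supports updated entities.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)
public class Product {
    @Id
    private Long id;
    private String name;
}
```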
I've got a service that is distributed across 9 VMs.
I want to create a single cache that will be shared between all of them.
Up until now I was using a separate cache for each VM, which caused inconsistency between them.
I already have an Elasticsearch cluster in use, so I was wondering whether I can use ES as a cache layer together with the Spring cache abstraction.
The service is in Java 8 + Spring Boot + ES 5.2.2.
Edit: some additional information.
The original problem for my service is that I need to return a response in less than 100 ms. That is why I started using a simple Spring cache implementation with ConcurrentMapCacheManager, which works just fine for the speed. The cache should be refreshed roughly every hour, and for now I've got about 1,300 objects that need to be cached. So on every service startup a process fills the cache with all the responses I've got, and another process wakes up every hour or so to refresh the entries with updated data. This happens on each of the 9 VMs.
The issues with this system are:
1. I would hit another service 9 times each hour (once per VM) with requests for all the information it has. It can handle the load, but it would be better if only one VM did it.
2. If I need to update a specific entry in the cache with new information, or delete it altogether, I don't have an easy way to remove it from all the VMs in my pool.
3. Since each VM runs the cache refresh at a slightly different time, the caches won't be aligned across all VMs, and the same call through the load balancer can return different results from different VMs.
4. For now I've got only 1,300 objects that need to be cached, but this could grow to millions of entries, and I don't want to get stuck with out-of-memory issues.
I understand it's not the ideal use of a cache system, and I might be using the wrong term for what I want, but basically I need a good and fast name-value storage system that can be accessed across my service. It would be great if it could use the Spring cache abstraction, because it's really easy to use and is already implemented in my service.
Thanks,
A.
You could easily integrate Hazelcast or EHCache as the distributed cache for your Spring application.
It might be me, but I find it a weird choice to use Elasticsearch as a cache.
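As a rough sketch of the Hazelcast route (assuming the hazelcast and hazelcast-spring dependencies; the bean wiring is illustrative), each VM would join the same cluster so that @Cacheable results are shared:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.spring.cache.HazelcastCacheManager;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class CacheConfig {

    // Each of the 9 VMs starts a member of the same Hazelcast cluster;
    // entries written on one node are visible to the others.
    @Bean
    public HazelcastInstance hazelcastInstance() {
        return Hazelcast.newHazelcastInstance();
    }

    // Bridge Hazelcast into the Spring cache abstraction, so existing
    // @Cacheable/@CacheEvict annotations keep working unchanged.
    @Bean
    public CacheManager cacheManager(HazelcastInstance instance) {
        return new HazelcastCacheManager(instance);
    }
}
```

With this in place, an eviction on one VM removes the entry for the whole cluster, which addresses the stale-entry and alignment issues above.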
We are developing a distributed web application (3 Tomcats behind a load balancer).
Currently we are looking for a cache solution. This solution should be cluster-safe, of course.
We are using Spring and JPA (MySQL).
We thought about the following solution :
Create a cache server that runs a simple cache; all DB operations from each Tomcat will be delegated to it (the DAO layer in the web app will communicate with that server instead of accessing the DB itself). This is appealing since the cache configuration on the cache server can be minimal.
What we are wondering about right now is:
If a complex query is passed to the cache server (i.e. a select with multiple joins and where clauses), how exactly can the standard cache form (a map) handle this? Does it mean we have to implement a lookup for each complex query by hand and translate it into a map search instead of a DB query?
P.S. - there is a possibility that this architecture is flawed at its base, and that is why a weird question like this was raised; if that's the case, please suggest an alternative.
Best,
MySQL already comes with a query cache, see http://dev.mysql.com/doc/refman/5.1/en/query-cache-operation.html (note that the query cache was deprecated in MySQL 5.7 and removed in 8.0).
If I understand correctly, you are trying to implement a method cache, using the arguments of your DAO methods as the key and the resulting object/list as the value.
This should work, but your concern about complex queries is valid: you will end up with a lot of entries in your cache. For a complex query you would hit the cache only if the exact same query is executed with exactly the same arguments as the one in the cache. You will have to figure out whether it is worth caching those complex queries, i.e. whether there is a realistic chance they will be hit; it really depends on the application's business logic.
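The method-cache idea can be sketched with plain maps (a toy model; the class and method names are illustrative, and a real setup would use Spring's @Cacheable or similar):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Minimal method cache: the key is the method name plus its arguments,
// the value is the query result. Every distinct argument combination of
// a complex query creates its own entry, which is the blow-up described above.
public class MethodCache {
    private final Map<List<Object>, Object> cache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public <T> T getOrLoad(List<Object> key, Supplier<T> loader) {
        // Only runs the loader (i.e. hits the database) on a cache miss.
        return (T) cache.computeIfAbsent(key, k -> loader.get());
    }

    public void evict(List<Object> key) {
        cache.remove(key);
    }
}
```

A DAO method would then wrap its query, e.g. `getOrLoad(List.of("findOrders", customerId, status), () -> runQuery(...))`; the cache is hit only when the exact same arguments recur.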
Another option would be to implement a cache with multiple levels: a second-level cache and a query cache, using Ehcache and BigMemory. You might find this useful:
http://ehcache.org/documentation/integrations/hibernate
I don't want to persist any data, but I still want to use Neo4j for its graph traversal and algorithm capabilities. In an embedded database, I've configured cache_type = strong, and after all the writes I mark the transaction as failed. But my write speeds (node and relationship creation) are slow, and this is becoming a big bottleneck in my process.
So the question is: can Neo4j be run without any persistence aspects at all, as a pure in-memory API? I tried others like JGraphT, but those don't have traversal mechanisms like the ones Neo4j provides.
As far as I know, Neo4j data storage and Lucene indexes are always written to files. On Linux, at least, you could set up a ramfs file system to hold the files in memory.
See also:
Loading all Neo4J db to RAM
How many changes do you group into each transaction? You should try to group up to thousands of changes in each transaction, since committing a transaction forces the logical log to disk.
However, in your case you could instead begin your transactions with:
db.tx().unforced().begin();
Instead of:
db.beginTx();
This makes the transaction not wait for the logical log to be forced to disk, which makes small transactions much faster; however, a power outage could potentially lose the last couple of seconds of data.
The tx() method sits on GraphDatabaseAPI, which for example EmbeddedGraphDatabase implements.
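The batching suggestion above might look like this (a sketch against the old embedded Neo4j 1.x-era API that the answer refers to; the node property and batch size are illustrative):

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

public class BatchInsert {
    private static final int BATCH_SIZE = 10_000;

    // Group many writes into one transaction so the logical log is
    // forced to disk once per batch instead of once per change.
    static void createNodes(GraphDatabaseService db, int total) {
        Transaction tx = db.beginTx();
        try {
            for (int i = 0; i < total; i++) {
                Node node = db.createNode();
                node.setProperty("index", i);
                if ((i + 1) % BATCH_SIZE == 0) {
                    tx.success();
                    tx.finish();          // commit this batch
                    tx = db.beginTx();    // start the next one
                }
            }
            tx.success();
        } finally {
            tx.finish();
        }
    }
}
```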
You can try a virtual drive. It would make Neo4j persist to the drive, but it would all happen in memory.
https://thelinuxexperiment.com/create-a-virtual-hard-drive-volume-within-a-file-in-linux/
Say you have a 4-node J2EE application server cluster, all running instances of a Hibernate application. How does caching work in this situation? Does it do any good at all? Should it simply be turned off?
It seems to me that data on one particular node would quickly become stale, as other users hitting other nodes make changes to database data. In such a situation, how could Hibernate ever trust that its cache is up to date?
First of all, you should clarify which cache you're talking about; Hibernate has three of them (the first-level cache, aka session cache; the second-level cache, aka global cache; and the query cache, which relies on the second-level cache). I guess the question is about the second-level cache, so that is what I'm going to cover.
How does caching work in this situation?
If you want to cache read only data, there is no particular problem.
If you want to cache read/write data, you need a cluster-safe cache implementation (via invalidation or replication).
Does it do any good at all?
It depends on a lot of things: the cache implementation, the frequency of updates, the granularity of cache regions, etc.
Should it simply be turned off?
Second-level caching is actually disabled by default. Turn it on if you want to use it.
It seems to me that data on one particular node would become stale quickly as other users hitting other nodes make changes to database data.
Which is why you need a cluster-safe cache implementation.
In such a situation, how could Hibernate ever trust that its cache is up to date?
Simple: Hibernate trusts the cache implementation which has to offer a mechanism to guarantee that the cache of a given node is not out of date. The most common mechanism is synchronous invalidation: when an entity is updated, the updated cache sends a notification to the other members of the cluster telling them that the entity has been modified. Upon receipt of this message, the other nodes will remove this data from their local cache, if it is stored there.
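The invalidation flow can be modeled with plain maps standing in for each node's local cache (a toy sketch; a real provider such as a replicating/invalidating Ehcache configuration does this over the network):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of synchronous invalidation: each node holds a local cache,
// and an update on one node tells every peer to drop its stale copy.
public class InvalidationDemo {
    static class CacheNode {
        final Map<String, String> local = new ConcurrentHashMap<>();
        final List<CacheNode> peers = new ArrayList<>();

        void update(String key, String value) {
            local.put(key, value);
            for (CacheNode peer : peers) {
                peer.local.remove(key); // the "entity modified" notification
            }
        }
    }
}
```

After an update, the other nodes simply reload the entity from the database on their next access, so they can never serve the stale copy.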
First of all, there are two caches in Hibernate.
There is the first-level cache, which you cannot remove and which is tied to the Hibernate session. Then there is the second-level cache, which is optional and pluggable (e.g. Ehcache). It works across many requests and is most probably the cache you are referring to.
If you work in a clustered environment, then you need a second-level cache which can replicate changes across the members of the cluster. Ehcache can do that. Caching is a hard topic and you need a deep understanding in order to use it without introducing other problems; caching in a clustered environment is slightly more difficult.
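As a sketch of enabling this (property names per classic Hibernate 3/4 with the hibernate-ehcache module; adjust for your version), the configuration might look like:

```properties
# Turn on the second-level cache and the optional query cache
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=true
# Plug in Ehcache as the region factory (the cache provider)
hibernate.cache.region.factory_class=org.hibernate.cache.ehcache.EhCacheRegionFactory
```

The clustering behavior itself (replication or invalidation between the Tomcats) is then configured on the Ehcache side, per cache region.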
Is it possible to share the second-level cache between a Hibernate and an NHibernate solution? I have an environment where servers running .NET and servers running Java both access the same database.
There is some overlap in the data they access, so sharing a second-level cache would be desirable. Is it possible?
If this is not possible, what are some of the solutions other have come up with?
There is some overlap in the data they access, so sharing a 2nd level cache would be desirable. Is it possible?
This would require (and this is very likely oversimplified):
Being able to access a cache from both Java and .NET.
Having cache provider implementations for both Hibernate and NHibernate.
Being able to read/write data in a format compatible with both languages (otherwise there is no point in sharing the cache).
This sounds feasible but:
I'm not aware of an existing ready-to-use solution implementing this (my first idea was Memcache, but AFAIK Memcache stores a serialized version of the data, so this doesn't meet requirement #3, which is the most important one).
I wonder if using a language neutral format to store data would not generate too much overhead (and somehow defeat the purpose of using a cache).
If this is not possible, what are some of the solutions other have come up with?
I never had to do this, but if we're talking about a read-write cache and you use two separate caches, you'll have to invalidate a given Java cache region from the .NET side and vice versa. You'll have to write the code to handle that.
As Pascal said, it's unlikely that sharing the second-level cache is technically feasible.
However, you can think about this from a different perspective.
It's unlikely that both applications read and write the same data. So, instead of sharing the cache, what you could implement is a cache invalidation service (using the communications stack of your choice).
Example:
Application A mostly reads Customer data and writes Invoice data
Application B mostly reads Invoice data and writes Customer data
Therefore, Application A caches Customer data and Application B caches Invoice data
When Application A, for example, modifies an invoice, it sends a message to Application B and tells it to evict the invoice from the cache.
You can also evict whole entity types, collections and regions.
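A toy sketch of such an invalidation service (the class names are illustrative; in practice the eviction message would travel over JMS, AMQP, or whatever communications stack you choose):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Application A publishes "evict" messages; Application B subscribes and
// removes the matching entry from its local cache. The bus here is
// in-process only, standing in for a real message broker.
public class EvictionBus {
    private final Map<String, BiConsumer<String, Object>> subscribers = new ConcurrentHashMap<>();

    public void subscribe(String app, BiConsumer<String, Object> onEvict) {
        subscribers.put(app, onEvict);
    }

    public void publishEvict(String entityType, Object id) {
        // Notify every subscribed application that this entity changed.
        subscribers.values().forEach(s -> s.accept(entityType, id));
    }
}
```

In the Invoice example above, Application A would call `publishEvict("Invoice", invoiceId)` after modifying an invoice, and Application B's subscriber would evict that entry from its cache.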