I am implementing a memory caching system for a web application. It will have to handle objects ranging from small to large, and a growing number of cache hits (reads and writes). Depending on the configuration, it will have to support multiple caching services such as JCS, Ehcache, Memcached, SQL caching, etc.
For learning purposes, and to design a better architecture for my system, could anyone point me to some resources (for example, sample class diagrams with project source files)?
The question is totally unspecific! The best thing you can do is to work through the tutorials, examples and manuals of the caching solutions.
You should also consider distributed caching solutions like Infinispan and Hazelcast.
For in-memory only caching Guava Cache and cache2k (I work on cache2k) might be sufficient.
If you start a new architecture around caching I strongly suggest that you look into the JSR107/JCache spec, because this is the new standard way to access caching services.
We are trying to develop a system for distributed caching. Right now we have 12 applications, and each JVM loads the same cache into its own memory. The problem with this setup is redundant data: all 12 applications load the same cache.
We want a system where one or two (for failover) dedicated JVMs load the cache and the other 12 applications call these new cache JVMs.
Can someone suggest technologies or frameworks that address this need?
Thanks
Have a look at Memcached. It may offer a solution to your distributed cache needs.
Also, as @Guy Bouallet mentioned, Ehcache is a viable solution.
Ehcache is a good alternative. It can be used to cache data loaded from a database, web pages or other key/value elements in a distributed environment.
I have personally used it in several professional applications, and it has proven to be an efficient solution.
In Java Spring, how exactly do Memcached and Ehcache store data in server memory? A simple explanation or comparison would be helpful.
The best way to find out how a caching framework works internally is to go through its source code. I could not find any authoritative article detailing the internal workings of these frameworks. Here are a few points that differentiate Ehcache from Memcached.
Distributed caches like Memcached and Ehcache work like a giant hash map. "Distributed" means that the cache can be spread over multiple servers, virtually extending the storage capacity to an unlimited number of objects.
Although both Ehcache and Memcached look like a hash map, the way they work is quite different.
Ehcache is a general-purpose Java object cache, meaning it is generally used with a Java application to cache Java objects. It's generally used as an add-on to the application for caching requirements.
Ehcache is written entirely in Java, so it's a pure Java application.
Ehcache offers RESTful APIs as an interface.
Memcached is a general-purpose cache for caching any type of object.
It's a client-server scheme: a memcached server holds the actual data, and clients (available in almost all languages) talk to it. The memcached server is written in native code (C).
To use it from Java, you need a memcached client such as spymemcached.
I know I have not answered your core question regarding the internal workings of the cache frameworks, but the points I mentioned should help you select one over the other based on your requirements.
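To make the "giant hash map spread over multiple servers" idea above a little more concrete, here is a minimal sketch of how a client might decide which server holds a given key. Everything in it is an assumption for illustration: the class names are hypothetical, in-process `HashMap` instances stand in for remote cache servers, and the simple modulo hashing shown is not necessarily what any particular Ehcache or memcached client does (real clients often use consistent hashing so that adding a server remaps fewer keys).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each HashMap below stands in for one remote cache server.
class ShardedCacheDemo {

    static class ShardedCache {
        private final List<Map<String, String>> nodes = new ArrayList<>();

        ShardedCache(int nodeCount) {
            for (int i = 0; i < nodeCount; i++) {
                nodes.add(new HashMap<>());
            }
        }

        // Pick a node from the key's hash; floorMod keeps the index non-negative.
        int nodeFor(String key) {
            return Math.floorMod(key.hashCode(), nodes.size());
        }

        void set(String key, String value) {
            nodes.get(nodeFor(key)).put(key, value);
        }

        // Returns null on a cache miss, like a plain Map.
        String get(String key) {
            return nodes.get(nodeFor(key)).get(key);
        }
    }

    public static void main(String[] args) {
        ShardedCache cache = new ShardedCache(3);
        cache.set("user:42", "Alice");
        // The same key always hashes to the same node, so the lookup finds the value.
        System.out.println(cache.get("user:42")); // Alice
    }
}
```

Because every client computes the same node for the same key, the servers do not need to know about each other in this scheme; total capacity grows with the number of nodes.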
I'm looking for a solution to share a cache between two Tomcat web apps running on different hosts. The cache is being used for data synchronization, so it must be guaranteed to be up-to-date at all times between the two Tomcat instances. (Sorry, I'm not 100% sure whether the correct terminology for this requirement is "consistency" or something more specific like having ACID properties.) Another requirement, of course, is that access to the cache should be fast, with about equal numbers of writes and reads. I do have access to a shared filesystem, so that is a consideration.
I've looked at something like Ehcache, but in order to get a shared cache between the webapps I would need to either build on top of a Terracotta environment or use the new Ehcache cache server. The former (Terracotta) seems like overkill for this, while the cache web server seems like it wouldn't provide the fast performance that I want.
Another solution I've looked at is building something simple on top of a fast key-value store like Redis or memcachedb. Redis is in-memory but can easily be configured to be a centralized cache, while memcachedb is a disk-based persistent cache which could work because I have a shared filesystem.
I'm looking for suggestions on how to best solve this problem. The solution needs to be a relatively mature technology as it will be used in a production environment.
Thanks in advance!
I'm quite sure that you don't require terracotta or ehcache server if you need a distributed cache. Ehcache with one of the four replication mechanisms would do.
However, based on what you've written I guess that you're looking for more than just a cache. Memcached/Ehcache are examples of what you might call a caching layer for your application - nothing more.
If you find yourself using words like 'guaranteed', 'up-to-date' and 'ACID', you're better off using an in-memory DB like Oracle TimesTen, MySQL Cluster or Redis with disk-based persistent storage.
You can use memcached (not memcachedb) for fast and efficient caching. Redis or memcachedb could be overkill unless you want persistent caching. Memcached can be clustered very easily, and you can use the spymemcached Java client to access it. Memcached is very mature and is running on several hundred thousand, if not millions, of production servers. It can be monitored through Nagios and Munin in production.
I am new to memcached and caching in general. I have a java web application running on Ubuntu + Tomcat + MySQL on a VPS Server with 1GB of memory.
Does it make sense to add a memcached layer with about 256MB for caching? Will this be too much load on the server? Which is more appropriate: caching rendered HTML pages or database objects?
Please advise.
If you're going to cache pages, don't use memcached, use Varnish. However, there's a good chance that's not a great use of memory. Cacheing pages trades memory for computation and database work, but it does cost quite a lot of memory per page, so it's best for cases where the computation and database work needed to produce a single page amounts to a lot (or the pages are very small!). Also, consider that page cacheing won't be effective, or even possible, if you want to use per-user customisation on your pages (eg showing the number of items in a shopping cart). At least not without getting into some truly hairy shenanigans (edge-side includes, anyone?).
If you're not going to cache pages, and your app is on a single machine, then there's no point using memcached or similar. The point of cache servers like that is to make the memory on one machine work as a cache for another - like how a file server shares a disk, they're essentially memory servers. On a single machine, you might as well give all the memory to Java and cache objects on the heap.
Are you using an object-relational mapper? If so, see if it has any support for a second-level cache. The big three implementations (Hibernate, OpenJPA, and EclipseLink) all support in-memory caches. They're likely to do a much better job than you would if you did the cacheing yourself.
But if you're not using a mapper, you have no choice but to do the cacheing yourself. There are extension points in LinkedHashMap for building LRU caches, and then of course there's the people's favourite, SoftReference, in combination with a HashMap. Plus, there are probably cache implementations out there you could download and use - I'd be shocked if there wasn't something in the Apache Commons libraries.
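As a rough illustration of the LinkedHashMap extension point mentioned above, the sketch below builds a small LRU cache by overriding `removeEldestEntry`. The class names and the capacity of 2 are made up for the example, and the map is not thread-safe as written; in a web app you would likely wrap it (for instance with `Collections.synchronizedMap`) or guard access some other way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical names; a sketch rather than a production cache.
class LruCacheDemo {

    /** LRU cache built on LinkedHashMap's removeEldestEntry extension point. */
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            // accessOrder = true: iteration runs from least to most recently accessed.
            super(16, 0.75f, true);
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Invoked after each insertion; returning true evicts the eldest entry.
            return size() > maxEntries;
        }
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");       // touch "a", so "b" is now the least recently used
        cache.put("c", "3");  // over capacity: "b" is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

The third constructor argument is what turns insertion order into access order; without it the same class would behave as a FIFO cache instead.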
memcached won't add any noticeable load on your server, but it will be memory your app can't use. If you only plan to have a single app server for a while, you're better off using an in-JVM cache.
As for what to cache, the answer falls somewhere in the middle of the above. You don't want to cache exactly what's in your database, and you certainly don't want to cache the final output. You have a data model representation in your application that isn't exactly what's in the DB (e.g. a User object might be made up of multiple queries from a few different tables). Cache that kind of thing, as it's the most reusable.
There's lots of info on the memcached site that should help you understand and get going with caching in general and memcached specifically.
It might make sense to do that. Why not try a smaller size like 64 MB and see how that goes? The more resources you give to memcached, the less there is for everything else. You should try it and see what gives you the best performance.
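A minimal sketch of that idea, caching the assembled application-level object inside the JVM rather than raw rows or rendered output, might look like the following. The `User` fields, the `UserCache` class and the loader function are all hypothetical; the loader stands in for whatever combination of queries builds the object in your application.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of caching an assembled domain object on the heap.
class UserCacheDemo {

    /** Stand-in for an object assembled from several queries. */
    static final class User {
        final long id;
        final String name;
        final int orderCount;

        User(long id, String name, int orderCount) {
            this.id = id;
            this.name = name;
            this.orderCount = orderCount;
        }
    }

    static class UserCache {
        private final ConcurrentMap<Long, User> cache = new ConcurrentHashMap<>();
        private final Function<Long, User> loader; // assumed to run the real queries

        UserCache(Function<Long, User> loader) {
            this.loader = loader;
        }

        // Load on first access, then serve the same instance from memory.
        User get(long id) {
            return cache.computeIfAbsent(id, loader);
        }

        // Call after the underlying data changes so the next get reloads it.
        void invalidate(long id) {
            cache.remove(id);
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        UserCache users = new UserCache(id -> {
            loads.incrementAndGet(); // counts how often the "database" is hit
            return new User(id, "user-" + id, 3);
        });
        users.get(42L);
        users.get(42L);
        System.out.println(loads.get()); // 1
    }
}
```

This version never evicts anything on its own, so in practice it would need a size bound or expiry policy; it is only meant to show where the cache sits relative to the data model.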
Is Terracotta a distributed cache?
Although you don't specify which product you are talking about, I'm going to assume you mean the open source platform itself. The short answer is no, but it can be used to write a distributed cache, and it has been in one of their own products (Ehcache).
You can see an overview of what the core engine is about here (it seems that they are hiding the information on their open source platform behind a registration wall now). It is a clustering engine that does not use J2EE technology, and its main purpose is to simplify distributed computing development. Besides caching, obvious use cases involve high availability and scalability needs. Think of it as enabling relatively plain java code to run "in the cloud" without having to worry about a lot of the details that that might involve.
Terracotta has nothing to do with 'caching' as such, although most implementations use it for caching purposes.
Terracotta is about clustering, and Terracotta itself is implemented in Java (to my knowledge).
How Terracotta achieves clustering:
1) JVM1 running APP
2) JVM2 running APP (same)
3) JVM3 running APP (same)
Without Terracotta, all JVMs run independently without knowing about each other, performing some redundant tasks and maintaining their own independent heaps.
When you enable Terracotta (with a Terracotta server running) across these 3 JVMs (each configured to use the Terracotta server),
Terracotta gives a logical view of all 3 JVMs as a single JVM. Any object graph that you designate to be stored on the server (a root) is available to all 3 JVMs just like any local object, and each JVM can read from and write to that object; its changes are immediately (~) available to the other JVMs.
For this very reason Terracotta is used mainly for caching and distributed computing: idle JVMs with no work can process the work of a heavily loaded, lagging JVM if the unit-of-work object is designated as shareable.
Your question is unclear (Terracotta has several products) but yes, the Terracotta Platform does offer a solution for Distributed Caching.
An L2 cache is one that is external to a processor (a JVM, in our case) and shared among them. Serving as a transparent L2 cache, Terracotta turns your multicomputer into a multiprocessor. Thus, it is a distributed cache. But you seem not to get it, because you are SW guys who have never imagined that a cache can be transparent. You expect a cache to be a thing that has get/set methods and a coherence problem that you need to resolve at the application level.
Read "The Definitive Guide to Terracotta". The authors literally say that Terracotta is a distributed cache. I think they understand this better than anybody replying "no" here.