How is the performance of BigMemory of Enterprise Ehcache compared to Diskstore of Ehcache Community Edition used with RAM disk?
BigMemory lets caches use an additional memory store outside the object heap, thereby avoiding the GC overhead we would incur if all of the RAM were used as object heap. Serialization and deserialization take place when putting to and getting from this off-heap store.
Similarly, the Diskstore is also a second-level cache that stores the serialized objects on disk.
On the link above it is mentioned that the off-heap store is two orders of magnitude faster than the Diskstore. What happens if I configure the Diskstore to store its data on a RAM disk? Will BigMemory still have a noticeable performance benefit?
Are there some other optimizations done by BigMemory? Has anyone come across any such experiments that compare the two approaches?
The following is an excerpt of the answer given to this question on the Terracotta forum.
"The three big problems I'd expect you to face with open source (community edition) Ehcache disk stores are: Firstly in open source only the values are stored on disk - the keys and the meta data to map keys to values is still stored in heap (which is not true for BigMemory). This means the heap would still be the limiting factor on cache size. Secondly the open source disk store is designed to be backed by a single (conventionally spinning disk - although some people do use SSD drives now), this means the backend is less concurrent (especially with regard to writing) than Enterprise BigMemory since the bottleneck is expected to be at the hardware level. Thirdly the serialization performed by the open source disk store is less space efficient so serialized values have much larger overheads."
Related
Is there any possibility to implement a cache in memory to avoid full heap consumption?
My Spring Boot Java application uses an in-memory cache with an expiration policy set to 1 hour (the Caffeine library is used for caching). After that time all cache instances are in the old generation and require a full GC to be collected. Now, with -Xmx set to 10 GB, I can see that after a few hours of tests my cache contains around 100k instances, but in the heap (specifically in the old generation) I can find a few million instances of cached objects. Is it possible to use an in-memory cache and avoid this situation?
The problem you described is called a memory leak.
Yes, you can, but it depends on which GC you use.
For example, with G1 this problem should not appear.
So, if possible, I recommend that you switch to G1.
The pause-target flag (-XX:MaxGCPauseMillis for G1 on HotSpot; -XpauseTarget is the JRockit equivalent) is responsible for avoiding long pauses in your system, so the cleaning of your heap can be split into parts.
You can also customize the heap occupancy percentage at which a GC cycle is started: -XX:InitiatingHeapOccupancyPercent=45
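Before tuning flags like these, it can help to confirm which collector the JVM is actually running. The following is a minimal sketch using the standard java.lang.management API; the class name GcInfo is only illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcInfo {
    /** Returns the names of the collectors reported by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are -1 if the collector does not report them.
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
    }
}
```

When the JVM is started with -XX:+UseG1GC, the reported names typically start with "G1". Printing the counters before and after a test run gives a rough picture of how much time the old-generation collections cost.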
As you observed, caches and generational collectors have opposing goals. More modern collectors like G1 and Shenandoah are region-based which can allow them to better handle old gen collection. In the Shenandoah tech talks, you'll often hear their developers discussing an LRU cache as a stress test. This might not be a problem if your GC is well tuned.
You may be able to keep the cache data structures on heap, but move its entries off. That can be done by serializing the value to a ByteBuffer at the cost of access overhead. Another approach is offered by Apache Mnemonic which stores the object fields off-heap and transparently marshals the data. This avoids serialization costs but is invasive to the object model.
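The first approach (index on heap, values serialized into a ByteBuffer) can be sketched roughly as follows. This is a minimal illustration, not how Caffeine or any particular library does it: the class OffHeapValueStore is hypothetical, values are plain strings, and each entry gets its own direct buffer for simplicity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class OffHeapValueStore {
    // Keys and small buffer handles stay on heap; the value bytes live in native memory.
    private final Map<String, ByteBuffer> index = new ConcurrentHashMap<>();

    void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length);
        buf.put(bytes);
        buf.flip(); // position = 0, limit = number of bytes written
        index.put(key, buf);
    }

    String get(String key) {
        ByteBuffer buf = index.get(key);
        if (buf == null) {
            return null;
        }
        // duplicate() shares the content but has its own position, so readers do not interfere.
        ByteBuffer view = buf.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes); // copying back to the heap is the access overhead mentioned above
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Every get() pays for a copy and a decode, which is the access overhead. A production-grade store would more likely carve entries out of a few large buffers rather than allocate one direct buffer per value.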
There are fully off-heap hash tables like Oak and caches like OHC. These move as much as possible outside of the GC, but there is a lot more overhead compared to an on-heap cache. This is comparable to using a remote cache, like memcached or redis, so that might be preferred. Memcached, for instance, uses slab allocation to handle the memory churn very efficiently.
Most often you'll see a small on-heap cache is used for fast local access of the most frequently used data that is backed by a large remote cache for everything else. If you truly need a multi-GB in-process cache, then off-heap might be necessary or you may have to tune your GC to accommodate this workload.
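The small-local-plus-large-remote layout can be sketched as below. This is an assumption-laden illustration: TwoTierCache is a hypothetical class, the remote tier is modelled as a plain Function standing in for a memcached or redis client, and the local tier is a simple LRU built on LinkedHashMap rather than a real caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class TwoTierCache<K, V> {
    private final Map<K, V> nearCache;
    private final Function<K, V> remote; // stand-in for a remote cache lookup
    int remoteLookups = 0;               // exposed only to make the behaviour observable

    TwoTierCache(int maxLocalEntries, Function<K, V> remote) {
        this.remote = remote;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.nearCache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxLocalEntries;
            }
        };
    }

    synchronized V get(K key) {
        V value = nearCache.get(key);
        if (value == null) {
            remoteLookups++;
            value = remote.apply(key);
            if (value != null) {
                nearCache.put(key, value); // may evict the least recently used entry
            }
        }
        return value;
    }
}
```

Because the local tier is bounded to a small number of entries, the number of long-lived objects the collector has to trace stays small, while the bulk of the data lives outside the JVM.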
Objects in the cache stay there indefinitely if no expiration is set. What you can do is tune the JVM to avoid that situation. For example, if you are using CMS, set -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly. With these two options, the JVM starts a CMS old-generation collection cycle whenever the old generation is more than 75% full.
Can anyone explain the difference between a heap cache and a normal cache? Are they the same regarding speed and high availability, or is there a difference?
Please refer to the Ehcache documentation. Java keeps most of its objects in a memory area called the heap. This means that when using a "heap cache", Ehcache stores the data there.
Additionally there is an "off-heap cache", meaning some other structures and memory locations are used to store the data. This might be useful for bigger caches and a separation from the normal Java heap.
I have a question with respect to performance optimization.
Which is faster with respect to retrieving, from a Cache or from Java's heap?
According to the definition which I got :-
https://www.google.co.in/search?client=ubuntu&channel=fs&q=cache+vs+heap&ie=utf-8&oe=utf-8&gfe_rd=cr&ei=G7V1Ve-xDoeCoAP6goHACg#channel=fs&q=difference+between+cache+and+RAM
And if storing my data in a cache via my Java code is faster than storing it in the Java heap, should we always store data in a cache when faster access is required for complex computations and results?
Kindly advise which one is faster, and in which use cases one should be used over the other.
Thanks
You mix up different concepts.
The quote is:
The difference between RAM and cache is its performance, cost, and proximity to the CPU. Cache is faster, more costly, and closest to the CPU. Due to the cost there is much less cache than RAM. The most basic computer is a CPU and storage for data.
This is about Computer architecture and applies for all computers, regardless what programming language you are using. There is no way to directly control what data is inside the cache. The CPU cache will hold data that is requested very often automatically. Programmers can improve their programs to make it more "friendly" to a particular hardware architecture. For example if the CPU has only a small cache, the code could be optimized to work on a smaller data set.
A Java cache is something different. It is a library that caches Java objects, e.g. to save requests to an external service. A Java cache can store the object data on the heap, outside the heap in separate memory, or on disk. The heap offers the fastest access, since for any storage outside the heap the objects need to be converted to byte streams (called serialization or marshalling).
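To make the conversion step concrete, here is a minimal sketch of a round trip through standard Java serialization. The class name SerializationRoundTrip is illustrative, and real caches often plug in faster serializers than java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationRoundTrip {
    /** Converts an object into the byte stream that would be stored outside the heap. */
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Rebuilds a fresh object from the stored bytes. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An on-heap cache hands back the same object reference on every hit. A store that goes through toBytes and fromBytes returns a new, equal copy each time, and pays CPU time for both conversions.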
For an upcoming project I will keep a large amount of data (up to 10 GB) in RAM, but not as a cache. Is it possible to use BigMemory (in particular Go, i.e. the free edition) without Ehcache, simply as non-garbage-collected memory storage? I have not found a clear answer in the docs, which mostly talk about the typical integration with Ehcache.
Thank you.
Yes, EhCache is the API for BigMemory:
BigMemory Go currently uses Ehcache as its user-facing data access API.
Basically, BigMemory has been designed as another storage tier. You store things on the heap; when that is exceeded, entries go off-heap (which is BigMemory); and when that is exceeded, they go to disk. It makes sense to do so because in the NoSQL paradigm, where we want to store big data, things work well if they are in key-value form. You can store any kind of value just by making it serializable.
As for your constraint of "not as a cache", it's very much possible to configure the cache so that values don't get evicted from memory. In any case, if you use BigMemory Go, you get a limit of 32 GB, so storing 10 GB won't trigger any eviction algorithms even without any configuration.
Ehcache talks about on-heap and off-heap memory. What is the difference? What JVM args are used to configure them?
The on-heap store refers to objects that will be present in the Java heap (and also subject to GC). On the other hand, the off-heap store refers to (serialized) objects that are managed by EHCache, but stored outside the heap (and also not subject to GC). As the off-heap store continues to be managed in memory, it is slightly slower than the on-heap store, but still faster than the disk store.
The internal details involved in the management and usage of the off-heap store aren't very evident in the link posted in the question, so it would be wise to check out the details of Terracotta BigMemory, which is used to manage the off-heap store. BigMemory (the off-heap store) is meant to avoid the overhead of GC on a heap that is several megabytes or gigabytes large. BigMemory uses the memory address space of the JVM process, via direct ByteBuffers whose backing memory is not subject to GC, unlike ordinary Java objects.
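The difference between the two kinds of buffer can be seen with plain java.nio, as in this small sketch (DirectBufferDemo is an illustrative name, unrelated to Ehcache's internals). On HotSpot, -Xmx bounds the Java heap, while -XX:MaxDirectMemorySize bounds the memory available to direct buffers.

```java
import java.nio.ByteBuffer;

class DirectBufferDemo {
    /** Backed by a byte[] on the Java heap, managed by the GC. */
    static ByteBuffer onHeap(int bytes) {
        return ByteBuffer.allocate(bytes);
    }

    /** Backed by native memory outside the Java heap. */
    static ByteBuffer offHeap(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer heap = onHeap(1024);
        ByteBuffer direct = offHeap(1024);
        System.out.println(heap.isDirect() + " " + heap.hasArray());     // false true
        System.out.println(direct.isDirect() + " " + direct.hasArray()); // true false
        direct.putLong(0, 42L);                // absolute write into native memory
        System.out.println(direct.getLong(0)); // 42
    }
}
```

Only the small ByteBuffer handle is a heap object. The block it points to is not scanned or moved by the collector, which is why a large off-heap store adds little to GC pause times.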
from http://code.google.com/p/fast-serialization/wiki/QuickStartHeapOff
What is Heap-Offloading ?
Usually all non-temporary objects you allocate are managed by Java's garbage collector. Although the VM does a decent job doing garbage collection, at a certain point the VM has to do a so-called 'Full GC'. A full GC involves scanning the complete allocated heap, which means GC pauses/slowdowns are proportional to an application's heap size. So don't trust any person telling you 'Memory is Cheap'. In Java, memory consumption hurts performance. Additionally, you may get notable pauses using heap sizes > 1 GB. This can be nasty if you have any near-real-time stuff going on; in a cluster or grid, a Java process might become unresponsive and get dropped from the cluster.
However, today's server applications (frequently built on top of bloaty frameworks ;-) ) easily require heaps far beyond 4 GB.
One solution to these memory requirements, is to 'offload' parts of the objects to the non-java heap (directly allocated from the OS). Fortunately java.nio provides classes to directly allocate/read and write 'unmanaged' chunks of memory (even memory mapped files).
So one can allocate large amounts of 'unmanaged' memory and use this to save objects there. In order to save arbitrary objects into unmanaged memory, the most viable solution is the use of Serialization. This means the application serializes objects into the offheap memory, later on the object can be read using deserialization.
The heap size managed by the java VM can be kept small, so GC pauses are in the millis, everybody is happy, job done.
It is clear, that the performance of such an off heap buffer depends mostly on the performance of the serialization implementation. Good news: for some reason FST-serialization is pretty fast :-).
Sample usage scenarios:
Session cache in a server application. Use a memory mapped file to store gigabytes of (inactive) user sessions. Once the user logs into your application, you can quickly access user-related data without having to deal with a database.
Caching of computational results (queries, html pages, ..) (only applicable if computation is slower than deserializing the result object ofc).
Very simple and fast persistence using memory mapped files.
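The memory-mapped-file scenario can be sketched with java.nio alone. This is a minimal illustration rather than the FST API: MappedFileStore is a hypothetical class, and the payload is a length-prefixed UTF-8 string standing in for a serialized object.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileStore {
    /** Writes a length-prefixed record through a read-write mapping of the file. */
    static void write(Path file, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping beyond the current size extends the file to fit.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4 + bytes.length);
            map.putInt(bytes.length);
            map.put(bytes);
            map.force(); // ask the OS to flush the mapped region to storage
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the record back through a read-only mapping. */
    static String read(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[map.getInt()];
            map.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The mapped region lives outside the Java heap and is paged in by the operating system on demand, so gigabytes of inactive data can sit in the file without enlarging the heap that the GC has to manage.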
Edit: For some scenarios one might choose more sophisticated Garbage Collection algorithms such as ConcurrentMarkAndSweep or G1 to support larger heaps (but this also has its limits beyond 16GB heaps). There is also a commercial JVM with improved 'pauseless' GC (Azul) available.
The heap is the place in memory where your dynamically allocated objects live. If you used new then it's on the heap. That's as opposed to stack space, which is where the function stack lives. If you have a local variable then that reference is on the stack.
Java's heap is subject to garbage collection and the objects are usable directly.
EHCache's off-heap storage takes your regular object off the heap, serializes it, and stores it as bytes in a chunk of memory that EHCache manages. It's like storing it to disk but it's still in RAM. The objects are not directly usable in this state, they have to be deserialized first. Also not subject to garbage collection.
I'm not 100% sure; however, it sounds like the heap is a set of allocated space (in RAM) that is built into the code, either Java itself or, more likely, ehcache itself, and the off-heap RAM is its own system as well. It sounds like the off-heap store is one order of magnitude slower because it is not as organized, meaning it may not use a heap (one long contiguous region of RAM) and instead uses different address spaces, likely making it slightly less efficient.
Then of course the next tier down is hard-drive space itself.
I don't use ehcache, so you may not want to trust me, but that is what I gathered from their documentation.
The JVM doesn't know anything about off-heap memory. Ehcache implements an on-disk cache as well as an in-memory cache.