Spring too high memory usage - java

I have a webapp that combines Spring + Hibernate. I deploy it locally on a Tomcat server and see memory consumption of around 150-200 MB, which to my knowledge is not that high at all. But in the production deployment, which runs on WebSphere, I am getting 500 MB of memory usage and a very high frequency of garbage collection.
At first I blamed Hibernate for the problem (as seen in this other question), but now I am lost as to what might be causing this apparently high usage.
I tried profiling my app using MAT (Eclipse Memory Analyzer) and these are my histogram results:
I haven't got much experience profiling memory, and most of the time I feel as if I were looking at a car's engine after a breakdown, but it seems to me that the char[] references are related to Spring's internal "database" of beans. I use annotation-based configuration for Spring's MVC components and transactions:
<tx:annotation-driven />
<context:annotation-config />
Do you find any of these values to be exceptionally high? Can you help me trace the problem?
PS: this is my dominator tree for a heap dump taken nearer to a full heap

Both your problems are (usually) normal.
The garbage collector running often indicates you have a lot of processing going on, especially the kind that creates lots of objects (like JDBC work). Unless your server becomes non-responsive during garbage collection, it's nothing to worry about. If memory is not returning to the baseline (the bottom of the sawtooth in the memory graph) after garbage collection, you might have a leak.
High memory usage is also normal, especially if you've given more memory to the production JVM than to your local one. Compare your startup parameters and make sure they are the same before trying to compare performance results.
[I know this is not really an answer, but it's too long to fit in a comment]

You do realise that (unless you're using the Liberty Profile) WebSphere piles a huge number of additional JAR files onto your classpath compared to Tomcat? A significantly larger memory footprint is therefore absolutely to be expected under WebSphere.
Excessive garbage collection implies that you have not allocated enough memory to your application's JVM. Give it significantly more (2 GB, for instance) and watch the runtime memory usage graph via JConsole or a similar tool. Watch what level the memory consumption reaches once garbage collection becomes more relaxed, and use that to work out how much memory the JVM really needs.
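For example (the values here are illustrative only, not a recommendation), you would pass the same bounds to both JVMs, via JAVA_OPTS/CATALINA_OPTS for Tomcat and the generic JVM arguments in the WebSphere admin console:

-Xms512m -Xmx2048m -verbose:gc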
Note that if, after each garbage collection, the graph does not return to a safe baseline but only drops partway, then you probably have a memory leak.

I have found that I do have high memory consumption for each rendered page, and have pinpointed the issue to Apache Tiles. With Tiles enabled, each page load burdens the heap with up to 30 MB, which does feel like a lot. Disabling it results in almost no memory consumption.
Debugging the controller methods shows that almost no memory is consumed within the method itself; it is upon returning control to the ViewResolver (i.e. after the controller method ends) that the 30 MB of memory usage appears.
The problem now is: how do I reduce this memory usage?


Java memory cache

Is there any way to implement an in-memory cache that avoids filling up the heap?
My Spring Boot Java application uses an in-memory cache with an expiration policy of 1 hour (the Caffeine library is used for caching). After that time, all cache entries are in the old generation and require a full GC to be collected. Now, with -Xmx set to 10 GB, I can see that after a few hours of testing my cache contains around 100k entries, but in the heap (specifically in the old generation) I can find a few million instances of the cached objects. Is there any way to use an in-memory cache and avoid this situation?
The problem you describe is called a memory leak.
Yes, you can, but it depends on which GC you use.
In G1, for example, this problem should not appear.
So, if possible, I recommend switching to G1.
The pause-time target flag (-XX:MaxGCPauseMillis) is responsible for avoiding long pauses in your system, so you can split cleaning of the heap into parts.
You can also customize the heap-occupancy percentage that triggers the GC: -XX:InitiatingHeapOccupancyPercent=45
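Put together, that G1 tuning would look something like this on the command line (the pause target value is illustrative, and app.jar is a hypothetical application):

java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=45 -Xmx10g -jar app.jar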
As you observed, caches and generational collectors have opposing goals. More modern collectors like G1 and Shenandoah are region-based which can allow them to better handle old gen collection. In the Shenandoah tech talks, you'll often hear their developers discussing an LRU cache as a stress test. This might not be a problem if your GC is well tuned.
You may be able to keep the cache data structures on heap, but move the entries off-heap. That can be done by serializing the value to a ByteBuffer at the cost of access overhead. Another approach is offered by Apache Mnemonic, which stores the object fields off-heap and transparently marshals the data. This avoids serialization costs but is invasive to the object model.
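As a minimal sketch of that first idea, using Caffeine since the question already uses it (the UTF-8 encoding of a plain string below stands in for a real serializer):

import java.nio.charset.StandardCharsets;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class SerializedValueCache {
    public static void main(String[] args) {
        // Keys and the cache bookkeeping stay on-heap; values are flat byte[]
        // blobs, which are much cheaper for the collector to trace than
        // full object graphs.
        Cache<String, byte[]> cache = Caffeine.newBuilder()
                .maximumSize(100_000)
                .build();

        cache.put("greeting", "hello world".getBytes(StandardCharsets.UTF_8));

        byte[] raw = cache.getIfPresent("greeting");
        String value = (raw == null) ? null : new String(raw, StandardCharsets.UTF_8);
        System.out.println(value); // pays a deserialization cost on every access
    }
}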
There are fully off-heap hash tables like Oak and caches like OHC. These move as much as possible outside of the GC, but there is a lot more overhead compared to an on-heap cache. This is comparable to using a remote cache, like memcached or redis, so that might be preferred. Memcached, for instance, uses slab allocation to handle the memory churn very efficiently.
Most often you'll see a small on-heap cache used for fast local access to the most frequently used data, backed by a large remote cache for everything else. If you truly need a multi-GB in-process cache, then off-heap might be necessary, or you may have to tune your GC to accommodate this workload.
Objects stay in the cache indefinitely if no expiration is set. What you can do is tune the JVM to avoid that situation; e.g., if you are using CMS, set -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly. With these two options set, the JVM starts collecting the old generation once it is over 75% full.

How to monitor memory after major garbage collection via JMX or code

Many monitoring tools, like the otherwise fantastic JavaMelody, just monitor the current memory usage. If you want to check for memory leaks or impending out-of-memory situations, this is not particularly helpful when you have an application that generates loads of garbage which gets collected immediately. Not perfect, but IMHO much more interesting, would be to monitor the memory usage immediately after a major garbage collection. If that's high, a crash is looming.
So: can you find out the memory usage immediately after the last major garbage collection, either from Java code or via JMX? I know there are some tools like VisualVM which do this (not an option for production use), and it can be written to the garbage collection log, but I'm looking for a more straightforward solution than parsing the garbage collection logfile. :-) To be clear: I'm looking for something that can easily be used in any application in production, not an expensive debugging tool.
In case that matters: JDK 7 with -XX:+UseConcMarkSweepGC, but I am interested in general answers, too.
Information about memory available right after a GC (young or old) is available via JMX.
The garbage collector MBean has a LastGcInfo attribute, which is a composite data object that includes the memory pool sizes before and after the collection.
In addition, starting with Java 7, a JMX notification subscription can be used to receive GC events without polling.
You can find an example of code working with the GC MBean here.
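For illustration, a minimal sketch of the polling approach, using the com.sun.management extension of the MBean (non-standard, but present in HotSpot-based JDKs):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import com.sun.management.GarbageCollectorMXBean;
import com.sun.management.GcInfo;

public class AfterGcMemory {
    public static void main(String[] args) {
        for (java.lang.management.GarbageCollectorMXBean gc
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            // On HotSpot JVMs the platform beans implement the
            // com.sun.management subinterface, which adds getLastGcInfo().
            GcInfo info = ((GarbageCollectorMXBean) gc).getLastGcInfo();
            if (info == null) continue; // this collector has not run yet
            long usedAfter = 0;
            for (MemoryUsage u : info.getMemoryUsageAfterGc().values()) {
                usedAfter += u.getUsed();
            }
            System.out.printf("%s: %d bytes used after last GC%n", gc.getName(), usedAfter);
        }
    }
}

For the push model, the same MBeans emit com.sun.management.GarbageCollectionNotificationInfo notifications on Java 7+, which a NotificationListener can subscribe to.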
Dynatrace is probably an option... it's a very powerful monitoring tool (not only for memory).
http://www.dynatrace.com/en/index.html
A very crude way would be to monitor the minima of Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory() over time. At least that does not require you to know intimate details about the memory pools, as monitoring LastGcInfo in Alexey Ragozin's answer does. It might still require you to get notified about garbage collections.
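A sketch of that crude approach (the one-second sampling interval and fixed window are arbitrary choices):

public class UsedMemoryFloor {
    public static void main(String[] args) throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        long floor = Long.MAX_VALUE;
        for (int i = 0; i < 60; i++) {     // sample once a second for a minute
            long used = rt.totalMemory() - rt.freeMemory();
            floor = Math.min(floor, used); // the minimum approximates the post-GC baseline
            Thread.sleep(1000);
        }
        System.out.printf("approximate post-GC floor: %d bytes%n", floor);
    }
}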

Is garbage collection detrimental to the performance of this type of program

I'm building a program that will live on an AWS EC2 instance and (probably) be invoked periodically via a cron job. The program will 'crawl'/'poll' specific websites that we've partnered with, index/aggregate their content, and update our database. I'm thinking Java is a perfect fit for this application. Some members of our engineering team are concerned about the performance penalty of Java's garbage collection and are suggesting we use C++.
Are these valid concerns? This is an application that will be invoked perhaps once every 30 minutes via cron job, and as long as it finishes its task within that time frame, the performance should be acceptable, I would assume. I'm not sure whether garbage collection would be a performance issue, since I would assume the server will have plenty of memory, and the act of tracking how many objects point to an area of memory and declaring that memory free when the count reaches zero doesn't seem too detrimental to me.
No, your concerns are most likely unfounded.
GC can be a concern when dealing with large heaps and fragmented memory (which require a stop-the-world collection), or with medium-lived objects that are promoted to the old generation but then quickly dereferenced (which causes excessive GC, but can be fixed by resizing the ratio of new to old space).
A web crawler is very unlikely to fit either of those profiles: you probably don't need a massive old generation, and you should have relatively short-lived objects (the in-memory page representation while you parse out its data), which the young-generation collector deals with efficiently.
We have an in-house crawler (Java) that happily handles 2 million pages per day, including some additional post-processing per page, on commodity hardware (2 GB RAM); the main constraint is bandwidth. GC is a non-issue.
As others have mentioned, GC is rarely an issue for throughput-sensitive applications (such as a crawler), but it can (if one is not careful) be an issue for latency-sensitive apps (such as a trading platform).
The typical concern C++ programmers have for GC is one of latency. That is, as you run a program, periodic GCs interrupt the mutator and cause spikes in latency. Back when I used to run Java web applications for a living, I had a couple customers who would see latency spikes in the logs and complain about it — and my job was to tune the GC to minimize the impact of those spikes. There are some relatively complicated advances in GC over the years to make monstrous Java applications run with consistently low latency, and I'm impressed with the work of the engineers at Sun (now Oracle) who made that possible.
However, GC has always been very good at handling tasks with high throughput, where latency is not a concern. This includes cron jobs. Your engineers have unfounded concerns.
Note: A simple experimental GC reduced the cost of memory allocation / freeing to less than two instructions on average, which improved throughput, but this design is fairly esoteric and requires a lot of memory, which you don't have on EC2.
The simplest GCs around offer a tradeoff between large heap (high latency, high throughput) and small heap (lower latency, lower throughput). It takes some profiling to get it right for a particular application and workload, but these simple GCs are very forgiving in a large heap / high throughput / high latency configuration.
Fetching and parsing websites will take far more time than the garbage collector; its impact will probably be negligible. Moreover, automatic memory management is often more efficient than manual management via new/delete when dealing with lots of small objects (such as strings). Not to mention that garbage-collected memory is easier to use.
I don't have any hard numbers to back this up, but code that does a lot of small string manipulations (lots of small allocations and deallocations in a short period of time) should be much faster in a garbage-collected environment.
The reason is that modern GCs 're-pack' the heap on a regular basis by moving objects from an 'eden' space to survivor spaces and then to a tenured-object heap, and they are heavily optimized for the case where many small objects are allocated and then deallocated quickly.
For example, constructing a new string in Java (on any modern JVM) is as fast as a stack allocation in C++. By contrast, unless you're doing fancy string-pooling stuff in C++, you'll be really taxing your allocator with lots of small and quick allocations.
Plus, there are several other good reasons to consider Java for this sort of app: it has better out-of-the-box support for network protocols, which you'll need for fetching website data, and it is much more robust against the possibility of buffer overflows in the face of malicious content.
Garbage collection (GC) is fundamentally a space-time tradeoff. The more memory you have, the less time your program will need to spend performing garbage collection. As long as you have a lot of memory available relative to the maximum live size (total memory in use), the main performance hit of GC -- whole-heap collections -- should be a rare event. Java's other advantages (notably robustness, security, portability, and an excellent networking library) make this a no-brainer.
For some hard data to share with your colleagues showing that GC performs as well as malloc/free with plenty of available RAM, see:
"Quantifying the Performance of Garbage Collection vs. Explicit Memory Management", Matthew Hertz and Emery D. Berger, OOPSLA 2005.
This paper provides empirical answers to an age-old question: is garbage collection faster/slower/the same speed as malloc/free? We introduce oracular memory management, an approach that lets us measure unaltered Java programs as if they used malloc and free. The result: a good GC can match the performance of a good allocator, but it takes 5X more space. If physical memory is tight, however, conventional garbage collectors suffer an order-of-magnitude performance penalty.
Paper: PDF
Presentation slides: PPT, PDF

Difference between "on-heap" and "off-heap"

Ehcache talks about on-heap and off-heap memory. What is the difference? What JVM args are used to configure them?
The on-heap store refers to objects that are present in the Java heap (and therefore subject to GC), whereas the off-heap store refers to (serialized) objects that are managed by EHCache but stored outside the heap (and therefore not subject to GC). As the off-heap store is still managed in memory, it is slightly slower than the on-heap store, but still faster than the disk store.
The internal details of how the off-heap store is managed and used aren't very evident from the link posted in the question, so it would be wise to check out the details of Terracotta BigMemory, which is used to manage the off-heap store. BigMemory (the off-heap store) is meant to avoid the overhead of GC on a heap that is several megabytes or gigabytes large. BigMemory uses the memory address space of the JVM process, via direct ByteBuffers that are not subject to GC, unlike other native Java objects.
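To make the split concrete, here is a sketch of a cache with both an on-heap and an off-heap tier in the newer Ehcache 3 API (the question predates it, so treat the exact builder calls as illustrative). The JVM args involved are -Xmx for the heap and -XX:MaxDirectMemorySize for the direct-buffer budget the off-heap tier draws from:

import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.EntryUnit;
import org.ehcache.config.units.MemoryUnit;

public class TieredCache {
    public static void main(String[] args) {
        CacheManager manager = CacheManagerBuilder.newCacheManagerBuilder()
                .withCache("tiered", CacheConfigurationBuilder
                        .newCacheConfigurationBuilder(Long.class, String.class,
                                ResourcePoolsBuilder.newResourcePoolsBuilder()
                                        .heap(1_000, EntryUnit.ENTRIES) // hot entries, GC-managed
                                        .offheap(64, MemoryUnit.MB)))   // serialized, outside GC
                .build(true);

        Cache<Long, String> cache = manager.getCache("tiered", Long.class, String.class);
        cache.put(1L, "cached value");
        System.out.println(cache.get(1L)); // off-heap hits pay a deserialization cost
        manager.close();
    }
}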
from http://code.google.com/p/fast-serialization/wiki/QuickStartHeapOff
What is Heap-Offloading?
Usually all non-temporary objects you allocate are managed by Java's garbage collector. Although the VM does a decent job of garbage collecting, at a certain point it has to do a so-called 'full GC'. A full GC involves scanning the complete allocated heap, which means GC pauses/slowdowns are proportional to an application's heap size. So don't trust any person telling you 'memory is cheap'. In Java, memory consumption hurts performance. Additionally, you may get notable pauses using heap sizes > 1 GB. This can be nasty if you have any near-real-time stuff going on; in a cluster or grid, a Java process might become unresponsive and get dropped from the cluster.
However, today's server applications (frequently built on top of bloaty frameworks ;-) ) easily require heaps far beyond 4 GB.
One solution to these memory requirements is to 'offload' parts of the objects to the non-Java heap (directly allocated from the OS). Fortunately, java.nio provides classes to directly allocate, read, and write 'unmanaged' chunks of memory (even memory-mapped files).
So one can allocate large amounts of 'unmanaged' memory and use it to store objects there. In order to store arbitrary objects in unmanaged memory, the most viable solution is serialization: the application serializes objects into the off-heap memory, and later the objects can be read back using deserialization.
The heap size managed by the Java VM can then be kept small, so GC pauses are in the millis; everybody is happy, job done.
It is clear that the performance of such an off-heap buffer depends mostly on the performance of the serialization implementation. Good news: for some reason, FST serialization is pretty fast :-).
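A bare-bones sketch of that round trip, using JDK serialization as a stand-in (FST would be a faster drop-in replacement):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;

public class OffHeapRoundTrip {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1 << 20); // 1 MB outside the GC'd heap

        // Serialize an object into the unmanaged buffer, length-prefixed.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject("some inactive session state");
        }
        byte[] bytes = bos.toByteArray();
        offHeap.putInt(bytes.length).put(bytes);

        // Later: read the bytes back and deserialize.
        offHeap.flip();
        byte[] back = new byte[offHeap.getInt()];
        offHeap.get(back);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(back))) {
            System.out.println(ois.readObject());
        }
    }
}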
Sample usage scenarios:
Session cache in a server application. Use a memory mapped file to store gigabytes of (inactive) user sessions. Once the user logs into your application, you can quickly access user-related data without having to deal with a database.
Caching of computational results (queries, HTML pages, ...) (only applicable if computation is slower than deserializing the result object, of course).
Very simple and fast persistence using memory-mapped files.
Edit: For some scenarios, one might choose more sophisticated garbage collection algorithms such as Concurrent Mark-Sweep or G1 to support larger heaps (but these also have their limits beyond 16 GB heaps). There is also a commercial JVM with an improved 'pauseless' GC (Azul) available.
The heap is the place in memory where your dynamically allocated objects live. If you used new then it's on the heap. That's as opposed to stack space, which is where the function stack lives. If you have a local variable then that reference is on the stack.
Java's heap is subject to garbage collection and the objects are usable directly.
EHCache's off-heap storage takes your regular object off the heap, serializes it, and stores it as bytes in a chunk of memory that EHCache manages. It's like storing it on disk, except it's still in RAM. The objects are not directly usable in this state; they have to be deserialized first. They are also not subject to garbage collection.
I'm not 100% sure, but it sounds like the heap is a set of allocated space (in RAM) built into the functionality of either Java itself or, more likely, Ehcache, while the off-heap RAM is its own system. It sounds like the latter is an order of magnitude slower, as it is less organized: it may not use an actual heap (one contiguous stretch of RAM) and instead uses various address spaces, likely making it slightly less efficient.
Then, of course, the next tier down is hard-drive space itself.
I don't use Ehcache, so you may not want to trust me, but this is what I gathered from their documentation.
The JVM doesn't know anything about off-heap memory. Ehcache implements an on-disk cache as well as an in-memory cache.

What versions of Java are slow for GC logging?

I've been told by my company's support team that some versions of Java have a significant performance impact when we turn on -verbose:gc. However, I can't figure out whether this is the case or not.
Was this logging slow(ish) at some point, and when did it stop?
The reason I ask is that there's some hesitation about applying this to a production environment to investigate potential memory leaks (and whether we can stop doing periodic restarts of the system...).
Specifically I'm talking about Java 1.4.2, which I think introduced the argument, and I'd like to know which service pack it applies up to.
I know you asked about the impact of verbose:gc (Amir is correct), but based on the comments I see you are investigating a memory leak.
Is it possible for you to get a histogram of your environment? Verbose GC will only show you that there is a memory leak, not where the memory is sitting.
You mention Java 1.4.2; is that your current version? If you are using 1.5 or higher, you can use
jmap -histo <pid> > file.txt
This will give you a breakdown of all the objects in memory. It will freeze your JVM for a time that depends on the amount of memory in the system (2 GB can freeze for a minute or so even on good hardware), so test this on a development system first. I know you don't want to impact your production environment, but this is a necessary evil in tracking down the source of the problem. Do a capture right before the periodic restart to lessen the impact.
I suggest that you do the following:
Write a benchmark that is likely to stress the garbage collector (create large linked data structures with weak references, etc.); a toy sketch is given at the end of this answer.
Install a copy of the same JVM version you use in production on a test box.
Run the benchmark with various GC logging settings, including the settings you want to run in production, measuring the performance impact on the benchmark.
If you do this right, it will give you solid evidence of the likely performance impact on your production server.
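A toy benchmark along those lines (the sizes are arbitrary; tune them so a run takes a noticeable amount of time on your box):

import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class GcStress {
    static final class Node {
        final byte[] payload = new byte[1024]; // ~1 KB per node to create churn
        Node next;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<WeakReference<Node>> refs = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            Node head = null;
            for (int j = 0; j < 100; j++) {      // build a 100-node linked list
                Node n = new Node();
                n.next = head;
                head = n;
            }
            refs.add(new WeakReference<>(head)); // weakly reachable, so collectible
        }
        long live = refs.stream().filter(r -> r.get() != null).count();
        System.out.printf("elapsed: %d ms, lists surviving: %d%n",
                (System.nanoTime() - start) / 1_000_000, live);
    }
}

Run it once per GC-logging configuration and compare the elapsed times.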
