Understanding heap graph from VisualVM - Java

I have a program which essentially reads many files (a mix of PDFs, usually below 10 MB, plus XML files of a few KB) from disk and uploads them one at a time into a legacy system. I let it run under VisualVM over the weekend, and this morning I came back to a graph showing that heap usage was rather uneven over the hour it took the program to run. I was expecting a roughly level amount of heap usage over time.
I am looking for an explanation for this. This is a 64-bit Oracle JVM on Ubuntu 17.04 (Desktop, not Server) with 32 GB RAM on a 4-core i5-2400 (no hyperthreading). The program is essentially single-threaded, uses about 50% of one core, and took the expected time to run.
I understand that memory which is not fully used gets released over time. What I do not understand is why the usage goes down over time, as the load should be quite evenly distributed. Am I seeing the result of CPU throttling while the system is otherwise idle? Is some JVM optimization kicking in?

Actual GC logs would be more helpful, but it's probably just the GC heuristics adjusting their decisions about young and old heap size to meet throughput and pause time goals.
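For reference, a minimal sketch of how such logging could be enabled (MyProgram is a placeholder for your main class; the flag spelling changed with the unified logging introduced in Java 9):

java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log MyProgram
java -Xlog:gc*:file=gc.log MyProgram

The first form is for Java 8 and earlier, the second for Java 9 and later.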
As long as you don't place limits on the JVM it will happily use more memory than the bare minimum to keep the program running. If it did not you would experience very very frequent and inefficient GCs. In other words, heap size = n * live object set size with n > 1. Depending on the collector used there may be no upper limit for n other than available memory since ParallelGC defaults MaxHeapFreeRatio to 100 while G1 uses 70.
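If you do want the collector to hand memory back more aggressively, those ratios can be set explicitly; a hedged example with arbitrary values (MyProgram is again a placeholder, and how strictly the ratios are honored varies by collector and JDK version):

java -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 MyProgram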
The heuristics are far from perfect, they can paint themselves into a corner in some edge-cases or unnecessarily fluctuate between different equilibrium points. But the graph itself does not indicate a problem.

Related

JVM Garbage Collector suddenly consumes 100% CPU after running for several hours

I've got a strange problem in my Clojure app.
I'm using http-kit to write a websocket-based chat application.
Clients are rendered using React as a single-page app; the first thing they do when they navigate to the home page (after signing in) is create a websocket to receive things like real-time updates and any chat messages. You can see the site here: www.csgoteamfinder.com
The problem I have is that after some indeterminate amount of time (it might be 30 minutes after a restart, or even 48 hours) the JVM running the chat server suddenly starts consuming all the CPU. When I inspect it with NR (New Relic) I can see that all that time is being used by the garbage collector; at this stage I have no idea what it's doing.
I've taken a number of screenshots where you can see the effect.
You can see a number of spikes; those spikes correspond to large increases in CPU usage caused by the garbage collector. To free up CPU I usually have to restart the JVM. I have been relying on receiving a CPU alert from NR in my Slack account to make sure I jump on these quickly... but I really need to get to the root of the problem.
My initial thought was that I was possibly holding onto the socket reference when the client closed it at their end, but this is not the case. I've been looking at socket count periodically and it is fairly stable.
Any ideas of where to start?
Kind regards, Jason.
It's hard to imagine what could have caused such an issue, but the first thing I would do is take a heap dump at the time of the crash. This can be enabled with the -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path_to_your_heap_dump> JVM args. As a general practice, don't increase the heap size beyond the physical memory available on your server machine. In some rare cases the JVM is unable to dump the heap because the process is doomed; in such cases you can use gcore (if you're on Linux; not sure about Windows).
Once you grab the heap dump, analyse it with MAT (Eclipse Memory Analyzer). I have debugged such applications before and this worked perfectly to pin down memory-related issues. MAT allows you to dissect the heap dump in depth, so you're sure to find the cause of your memory issue, unless it is simply that you have allocated too small a heap.
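If the process is still alive when the CPU spike hits, you can also take a dump on demand with jmap, a standard JDK tool (<pid> is the chat server's process id and the output path is just an example; :live forces a collection first so only reachable objects end up in the dump):

jmap -dump:live,format=b,file=/tmp/chat-server.hprof <pid>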
If your program is spending a lot of CPU time in garbage collection, that means that your heap is getting full. Usually this means one of two things:
You need to allocate more heap to your program (via -Xmx).
Your program is leaking memory.
Try the former first. Allocate an insane amount of memory to your program (16GB or more, in your case, based on the graphs I'm looking at). See if you still have the same symptoms.
If the symptoms go away, then your program just needed more memory. Otherwise, you have a memory leak. In this case, you need to do some memory profiling. In the JVM, the way this is usually done is to use jmap to generate a heap dump, then use a heap dump analyser (such as jhat or VisualVM) to look at it.
(Fair disclosure: I'm the creator of a jhat fork called fasthat.)
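For completeness, a dump like the one above can be served straight from the command line with jhat (or a fork such as fasthat); the port shown is jhat's default, and the file name matches the jmap example earlier:

jhat -port 7000 /tmp/chat-server.hprof

Then browse to http://localhost:7000 and look at the instance counts per class.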
Most likely your tenured space is filling up, triggering a full collection. At that point the GC uses all the CPUs for several seconds at a time.
To diagnose why this is happening you need to look at your rate of promotion (how much data is moving from the young generation to the tenured space).
I would look at increasing the young generation size to decrease the rate of promotion. You could also look at using CMS, as this has shorter pause times (though it uses more CPU).
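A hedged sketch of what that might look like on a Java 8 HotSpot JVM (the young generation size is a placeholder to adapt to your heap, and MyChatServer is a made-up name; -XX:+PrintTenuringDistribution logs how much data survives each minor GC, which makes the promotion rate visible):

java -Xmn1g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintTenuringDistribution MyChatServer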
Things to try in order:
Reduce the heap size
Count the number of objects of each class, and see if the numbers make sense
Do you have big byte[] arrays that live past generation 1?
Change or tune GC algorithm
Use high-availability, i.e. more than one JVM
Switch to Erlang
You have triggered a global GC. GC time grows faster than linearly with the amount of memory, so actually reducing the heap space will trigger the global GC more often and make each one faster.
You can also experiment with changing the GC algorithm. We had a system where the global GC went down from 200 s (happening 1-2 times per 24 hours) to 12 s. Yes, the system was at a complete standstill for 3 minutes, and no, the users were not happy :-) You could try -XX:+UseConcMarkSweepGC
http://www.fasterj.com/articles/oraclecollectors1.shtml
You will always have stops like this with the JVM and similar runtimes; it is more a question of how often you get them and how fast the global GC is. You should make a heap dump and get a count of the objects of each class. Most likely you will see that you have millions of one of them: somehow you are keeping a reference to them unnecessarily in an ever-growing cache, session store, or similar.
http://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks001.html#CIHCAEIH
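A quick way to get that per-class count without a full dump is a class histogram from jmap (<pid> is the JVM process id; :live triggers a collection first so only reachable objects are counted):

jmap -histo:live <pid>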
You can also start using a high-availability solution with at least 2 nodes, so that when one node is busy doing GC, the other node will have to handle the total load for a time. Hopefully, you will not get the global GC on both systems at the same time.
Big binary objects like byte[] and similar are a real problem. Do you have those?
At some point these need to be compacted by the global GC, and this is a slow operation. Many data-processing JVM-based solutions actually avoid storing all data as plain POJOs on the heap, and instead manage their own heap-like structures, in order to overcome this problem.
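As an illustration of that idea (not necessarily how those products do it), large binary payloads can be kept outside the garbage-collected heap with direct buffers; a minimal sketch:

import java.nio.ByteBuffer;

public class OffHeapPayload {
    public static void main(String[] args) {
        // 64 MB allocated outside the Java heap: the collector never has to
        // copy or compact this data, only the tiny ByteBuffer wrapper object.
        ByteBuffer payload = ByteBuffer.allocateDirect(64 * 1024 * 1024);

        // Read and write like any other buffer.
        payload.putLong(0, System.currentTimeMillis());
        System.out.println("stored timestamp: " + payload.getLong(0));

        // The native memory is freed once the buffer becomes unreachable;
        // the total amount can be capped with -XX:MaxDirectMemorySize.
    }
}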
Another solution is to switch from the JVM to Erlang. Erlang is near real-time, and it gets by without the concept of a global GC of the whole heap; Erlang has many small heaps. You can read a little about it at
https://hamidreza-s.github.io/erlang%20garbage%20collection%20memory%20layout%20soft%20realtime/2015/08/24/erlang-garbage-collection-details-and-why-it-matters.html
Erlang is slower than the JVM, since it copies data, but the performance is much more predictable. It is difficult to have both. I have a websocket-based Erlang solution, and it really works well.
So you have run into a problem that is expected and normal for the JVM, the Microsoft CLR and similar. It will get worse and more common over the next couple of years as heap sizes grow.

Increasing heap memory utilization on Java Tomcat application

I have a Java application running on a Tomcat7 instance. I am using Java 8.
Now within my application there is a webservice of the format : http://zzz.xxx.zzz.xxx/someWebservice?somepar1=rrr&somePar2=yyy. This returns a String value of max 10 characters.
I have now started load testing this service using JMeter. I am putting a load of 100 concurrent connections and getting a throughput of roughly 150 requests/second. The server is a 4-core, 6 GB machine and is only running Tomcat (the application instance). The database instance is running on a separate machine. The JVM is running with min 2 GB and max 4 GB memory allocation. Max perm size is 512 MB. Tomcat has enough threads to cater to my load (max connections/threads/executor values have been set up correctly).
I am now trying to optimize this service, and in order to do so I am analyzing memory consumption. I am using JConsole for this. My CPU usage is not a concern, but when I look at the memory (heap) usage, I feel something is not right. What I observe is a sawtooth-shaped graph, which I know is normal, as regular GC clears heap memory.
My concern is that this sawtooth-shaped graph has an upward trend. I mean that the troughs in the sawtooth seem to be increasing over time. With this trend my server eventually reaches max heap memory in an hour or so and then stabilizes at 4 GB. I believed that if I put on a CONSTANT load, then the heap memory utilization should have a constant trend, i.e. a sawtooth graph with its peaks and troughs aligned. Since there is an upward trend, I suspect that there is a memory leak: variables which accumulate over time and, since the GC isn't able to clear them, cause an increase over a period of time. I am attaching a screenshot.
Heap Usage
Questions:
1). Is this normal behavior? If yes, then why does the heap continuously increase despite no change in load? I don't believe that a load of 100 threads should saturate a 4 GB heap in roughly 30 minutes.
2). What could be the potential reasons here? Do I need to look at memory leaks? Any JVM analyzer apart from JConsole which can help me pinpoint the variables which the GC is unable to clear?
The see-saw pattern most likely stems from minor collections; the dip around 14:30 is then a major collection, which you did not take into account in your reasoning.
Your load may simply be so low that it needs a long time to reach a stable state.
With this trend eventually my server reaches max heap memory in an hour or so and then stabilizes at 4GB.
That supports the conclusion, provided you're not seeing any OOMEs.
But there's only so much one can deduce from such charts. If you want to know more you should enable GC logging and inspect the log outputs instead.
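If restarting Tomcat with logging enabled is inconvenient, another option is to sample the same counters live with jstat, a standard JDK tool (<pid> is the Tomcat JVM's process id, 1000 is the sampling interval in milliseconds):

jstat -gcutil <pid> 1000

The O column (old generation occupancy, in percent) shows whether the troughs really keep creeping up between full collections.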

java heap size grows by portions up to some limit

I am writing a simple server application in Java, and I'm now running some benchmarks on it using Apache Benchmark.
Right after startup, the resident memory used by the server is about 40M. Then I make a series of 1000 requests (100 concurrent requests). After each series I check the memory usage again. The result seems very strange to me.
During the first runs of the benchmark, 99% of the requests are processed in ≈20 msecs and the remaining 1% in about 300 msecs (!). Meanwhile, the memory usage grows. After 5-7 runs this growth stops at 120M and the requests begin to run much faster, about 10 msecs per request. Moreover, the time and memory values remain the same when I increase the number of requests.
Why could this happen? If there were a memory leak, my server would require more and more resources. I can only guess that this is because of some adaptive JVM memory allocation mechanism which increases the heap size in advance.
The heap starts out at Xms and grows as needed up to the specified limit Xmx.
It is normal for the JVM to take a while to "warm up" - all sorts of optimisations happen at run time.
If the memory climbs up to a point and then stops climbing, you are fine.
If the memory climbs indefinitely and the program eventually throws an OutOfMemoryError then you have a problem.
From your description it looks as if the former is true, so no memory leak.
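In other words, what you saw was the heap being resized between those two bounds; a minimal sketch of controlling that (the values and server.jar are placeholders):

java -Xms40m -Xmx512m -jar server.jar
java -Xms128m -Xmx128m -jar server.jar

The first form lets the heap grow from 40 MB as needed; the second pins it so no resizing happens at all.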

Tuning Garbage Collection parameters in Java

I have a server-side Java component which has a huge memory demand at startup that mellows down gradually. As an example, at startup the memory requirement might shoot up to 4g, which after the initial surge is over goes down to 2g. I have configured the component to start with 5g of memory and the component starts well; the used memory surges up to 4g and then comes down close to 2g. The memory consumed by the heap at this point still hovers around 4g, and I want to bring this down (essentially keep the free memory down to a few hundred MB rather than 2g). I tried using MinHeapFreeRatio and MaxHeapFreeRatio, lowering them from the default values, but this resulted in garbage collection not being triggered after the initial spike, and the used memory stayed at a higher than usual level. Any pointers would greatly help.
First, I'd ask why you are worried about freeing up 2 GB of RAM on a server? 2 GB of RAM costs about $100 or less. If this is on a desktop, I guess I can understand the concern.
If you really do have a good reason to think about it, this may be tied to the garbage collection algorithm you are using. Some algorithms will release unused memory back to the OS, some will not. There are some graphs and such related to this at http://www.stefankrause.net/wp/?p=14 . You might want to try the G1 collector, as it seems to release memory back to the OS easily.
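A hedged example of trying that out (component.jar is a placeholder, the 5g maximum matches the figure from the question, and how eagerly memory is returned still depends on the collector and JDK version):

java -XX:+UseG1GC -Xms1g -Xmx5g -jar component.jar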
Edit from comments
What if they all choose to max out their load at once? Are you OK with some of them paging memory to disk and slowing the server to a crawl? I would run them on a server with enough memory to run ALL applications at max heap, plus another 4-6 GB for the OS / caching. Servers with 32 or 64 GB are pretty common, and you can get more.
You have to remember that the JVM reserves virtual memory on startup and never gives it back to the OS (until the program exits). The most you can hope for is that unused memory gets swapped out (and this is not a good plan). If you don't want the application to use more than a certain amount of memory, you have to restrict it to that amount.
If you have a number of components which each use a spike of memory, you should consider rewriting them to use less memory on startup, or running multiple components in the same JVM so the extra memory is less significant.

Tuning JVM (GC) for high responsive server application

I am running an application server on Linux 64bit with 8 core CPUs and 6 GB memory.
The server must be highly responsive.
After some inspection I found that the application running on the server creates a rather huge number of short-lived objects, and has only about 200~400 MB of long-lived objects (as long as there is no memory leak).
After reading http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
I use these JVM options
-server -Xms2g -Xmx2g -XX:MaxPermSize=256m -XX:NewRatio=1 -XX:+UseConcMarkSweepGC
Result: the minor GC takes 0.01 ~ 0.02 sec, the major GC takes 1 ~ 3 sec
the minor GC happens constantly.
How can I further improve or tune the JVM?
larger heap size? but will it take more time for GC?
larger NewSize and MaxNewSize (for young generation)?
other collector? parallel GC?
is it a good idea to let major GC take place more often? and how?
Result: the minor GC takes 0.01 ~ 0.02 sec, the major GC takes 1 ~ 3 sec the minor GC happens constantly.
Unless you are reporting pauses, I would say that the CMS collector is doing what you have asked it to do. By definition, CMS will use a larger percentage of the CPU than the Serial and Parallel collectors. This is the penalty you pay for low pause times.
If you are seeing 1 to 3 second pause times, I'd say that you need to do some tuning. I'm no expert, but it looks like you should start by reducing the value of CMSInitiatingOccupancyFraction from the default value of 92.
Increasing the heap size will improve the "throughput" of the GC. But if your problem is long pauses, increasing the heap size is likely to make the problem worse.
Careful ... GC can be a hairy subject if you are not cautious. Within any runtime (JVM for Java / CLR for .NET) there are several processes that take place. Generally there are two kinds of collection: young generational garbage collection (young gen GC) and old generational garbage collection (old gen GC). The young gen GC happens on a regular basis and is commonly responsible for your smaller pauses / hiccups. The old gen GC is normally what is going on when you see the long "stop the world" pauses.
Why, you might ask? The reason you get pauses with your runtime / JVM is that when the runtime does its cleanup of the heap it has to go through what is called a phase change. It stops the threads running your application in order to mark and swap pointers to optimize your available memory. Young gen is faster, as it is mainly releasing objects that are only temporary. Old gen, however, evaluates all the objects on the heap, and when you run low on memory it will kick in to free up much-needed memory.
Why the caution? Old gen gets exponentially worse in pause time the more heap you use. At 2-4 GB of total heap size you should be fine on modern runtimes like Java 6 (JDK 1.6+). Once you go beyond that threshold you will see exponential increases in pause times. I have run into some clients that have to restart servers, since in some rare cases where the heap is large, GC pause times can take longer than a full restart.
There are some new tools out there that are pretty cool and can give you a leading edge on evaluating whether GC is your pain point. JHiccup is one, and it is free from the Azul Systems website. At this time I think it is only for Linux though. They also have a JVM with a rebuilt GC algorithm that runs pauseless ... but if you are on a single-server deployment with a non-critical app it may not be cost effective (that one is not free).
To sum up: if your runtime / JVM / CLR heap is less than 2 GB, adding more memory will help. Be sure to give yourself some overhead. You never want to hit 100% of heap size / memory size if possible; that is when the long pauses are at their longest. Give yourself an extra 20%+ of memory over what you think you will need, so the GC algorithms have room to move objects around for optimization. If you ever plan to go large ... there is one tool that fixes the circa-1990 JVM technology (Azul Systems' Zing JVM), but it is not free. They do offer an open source tool to diagnose GC issues. The JVM (as I have tried it) also has a really cool thread-level visibility tool that lets you report on any leaks, bugs, or locks in production without overhead (some trick with offloading data the JVM already deals with, plus time stamping). That has saved tons of dev test time ... but again, not for small apps.
Stay below 4 GB. Give extra headroom. And if you want you can turn on these flags to monitor GC for Java / JVM:
java -verbose:gc myProgram
java -Xloggc:D:/log/myLogFile.log -XX:+PrintGCDetails myProgram
You may try some of the other collectors HotSpot offers. There is more than one.
If you are on Linux go ahead and try the JHiccup tool as well. It is free.
You may be interested in trying the low-pause Garbage-First collector instead of concurrent mark-sweep (although it's not necessarily more performant for all collections, it's supposed to have a better worst case). It's enabled by -XX:+UseG1GC and is supposed to be really awesomesauce, but you may want to give it a thorough evaluation before using it in production. It has probably improved since, but it seems to have been a bit buggy a year ago, as seen in Experience with JDK 1.6.x G1 (“Garbage First”)
It is perfectly fine for the garbage collection to run in parallel with your program, if you have plenty of cpu, which you do.
What you want, is to make absolutely certain that you will not run into a scenario where the garbage collection PAUSES your main program.
Have you tried simply not stating any flags except saying you want the server VM (for the Sun JVM), and then put your server under heavy load to see how it behaves? Only then can you see, if you get any improvements from tinkering with options.
This actually sounds like a throughput app and should probably use the throughput collector. I would balance the size of the new gen making it big enough to not GC too often and small enough to prevent long pauses. 20ms sounds like a long minor GC to me. I also suspect your survivor space is set too large and is just being wasted. If you don't have much surviving to old gen, you shouldn't have that much surviving your minor GCs.
In the end, you should use jvmstat and VisualGC to really get a feel for how your app is using memory.
For a highly responsive server application, I think you want to see the major GC happen less frequently. Here is a list of parameters that would help.
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSWaitDuration=300000
-XX:GCTimeRatio=40
A larger heap size may not help with low pause times, as long as your app doesn't run out of memory.
Larger NewSize and MaxNewSize would help throughput, but may not help with low pause times. If you choose to take this approach, you may consider giving the GC threads more execution time by setting -XX:GCTimeRatio lower. The point is to remember to take a holistic view when tuning the JVM.
I think previous posters have missed something very obvious: the permanent generation size is too low. If the system uses 200 to 400 MB of permanent generation, then it may be best to set the max perm gen to 400 MB. The PermGen size should also be set to the same value. You will then never run out of permanent generation space.
Currently it looks like the JVM has to spend a lot of time moving objects in and out of the permanent generation. This can take time. The JVM tries to allocate contiguous memory areas to Java objects; this speeds up memory access due to hardware-level features. In order to do that, it is very helpful to have plenty of buffer in memory. If the permanent generation is almost full, newly discovered permanent objects must be split or existing objects must be shuffled. This is what triggers a full GC, as well as causing long full-GC pauses.
The question states that the Permanent Generation size has already been measured- if this has not been done, it should be measured using a tool. These tools process logs generated by the JVM with the verboseGC option turned on.
All the mark-and-sweep options listed above may not be needed with this basic improvement.
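Concretely, on a pre-Java-8 HotSpot JVM the sizing suggested above would look something like this (MyServer is a placeholder; the 400m figure is the measurement quoted above, and on Java 8+ the permanent generation was replaced by Metaspace, where the corresponding flags are -XX:MetaspaceSize and -XX:MaxMetaspaceSize):

java -XX:PermSize=400m -XX:MaxPermSize=400m -verbose:gc MyServer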
People throw GC options as solutions without evaluating how mature they have proven to be in actual use.
