We are having frequent outages in our app: the heap grows over time to the point where the GC takes a lot of CPU time and runs for several minutes, drastically degrading the app's performance. The app is written in JSF and runs on a Tomcat server.
In the meantime, we have:
Increased the heap size from 15 GB to 26 GB (-Xms27917287424 -Xmx27917287424)
Taken several heap dumps (we are trying to pin down the problem using these)
Activated GC logging (typical flags are sketched below)
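For reference, GC logging can be enabled with flags along these lines (the log path is illustrative; the Print* form applies to Java 8 and earlier, the -Xlog form to Java 9 and later):

java -Xloggc:/var/log/app/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps ...
java -Xlog:gc*:file=/var/log/app/gc.log:time,uptime ...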
With the larger heap, the GC no longer runs for as long, but it still takes a lot of CPU and freezes the app.
So the questions are:
Is this normal? When the GC executes it frees memory, so I think this probably isn't a memory leak (am I right?)
Is there a way to optimize the GC, or is this behavior just a symptom of something wrong in the app itself?
How can I monitor and analyze this without taking a heap dump?
UPDATE:
I upgraded JSF from 2.2 to 2.3 because some heap dumps pointed to JSF using a lot of memory.
That didn't work out, and yesterday we had an outage again, though this time it looked a little different (from my point of view). This time we also had to restart Tomcat, because after a while the app stopped working entirely.
In this case, the garbage collector is running even though the old-gen heap is not full, and the new-generation GC is running all the time.
What could be the cause of this?
As has been said in the comments, the behaviour of the application does not look unreasonable. Your code is continually allocating objects, which fills up the heap and causes the GC to run. There does not appear to be a memory leak, since GC reclaims a lot of space and the overall used space is not continually increasing.
What does appear to be an issue is that a significant number of objects are being promoted to the old-gen before being collected. Major GC cycles are more expensive in terms of CPU due to the relocation and remapping of objects (assuming you're using a compacting algorithm).
To reduce this, you could try increasing the size of the young generation. This will have happened when you increased the overall heap size but not by enough. Ideally, you want the majority of objects to be collected during a minor GC cycle since this is effectively free (the GC does nothing to the objects in Eden space as they are collected). You can do this with the -XX:NewRatio= or -XX:NewSize= flags. You could also try changing the survivor space sizes, again to increase the number of objects collected before tenuring. (use the -XX:SurvivorRatio= flag for this).
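As a sketch only (the sizes here are illustrative starting points, not values tuned for your workload), those flags could be combined in either of these ways:

java -Xms26g -Xmx26g -XX:NewRatio=2 -XX:SurvivorRatio=8 ...
java -Xms26g -Xmx26g -XX:NewSize=8g -XX:MaxNewSize=8g -XX:SurvivorRatio=8 ...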
For monitoring, I find Flight Recorder and Mission Control very useful as you can drill down into details of how many objects of specific types are allocated. It's also easy to connect to a running JVM or take dumps for later analysis.
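For example, a recording can be started on an already-running JVM with jcmd, or at JVM startup (the pid, duration and file name are placeholders; on older Oracle JDK 8 builds you may additionally need -XX:+UnlockCommercialFeatures -XX:+FlightRecorder):

jcmd <pid> JFR.start duration=120s filename=/tmp/recording.jfr
java -XX:StartFlightRecording=duration=120s,filename=/tmp/recording.jfr ...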
Related
I am facing an issue where my Java application's heap grows with an increased number of requests to the application, but then it does not release the unused heap memory.
Here is the description:
My Java application starts with a heap of 200 MB, of which around 100 MB is in use.
As the number of requests increases, heap usage goes up to 1 GB.
Once request processing finishes, the used heap drops back to normal, but the unused/free heap space remains at 1 GB.
I have tried the -XX:-ShrinkHeapInSteps, -XX:MinHeapFreeRatio and -XX:MaxHeapFreeRatio JVM arguments, but that did not solve this.
Note: if I trigger the garbage collector manually, then it does lower the unused heap memory.
Please suggest how we can lower the unused heap memory.
The heap will not shrink back if -Xms is high; -Xms essentially overrides the free-ratio settings. There are other factors to consider too: with the parallel GC you can't shrink the heap, as the parallel collector doesn't allow that.
Also, when the parallel GC is not used, the JVM can only relinquish the memory after a full GC.
So essentially, not much can be done here. The JVM doesn't relinquish memory to the OS in order to avoid having to recreate it later. Memory allocation is expensive work, so the JVM will hold on to that memory for some time, and since memory management is controlled by the JVM, it is not always possible to force things here.
One downside of shrinking the heap is that the JVM would have to recreate the memory space over and over as requests come in, so those clients would always see somewhat higher latency. If the memory space is already in place, however, the next stream of clients will see lower latency, so your amortized performance will improve.
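If returning memory to the OS matters more than that amortized latency, a minimal sketch of a configuration that permits shrinking might look like this (assuming a collector that gives memory back, e.g. G1; all values illustrative):

java -Xms256m -Xmx1g -XX:+UseG1GC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -jar app.jar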
The support team in our company reported that our API process is consuming too much RAM, so I started analyzing the issue. I used VisualVM and JConsole to monitor the process, and when I call an endpoint that makes JNI calls into C++ native code, I see the resident memory (RES) of my process in the Linux top monitoring tool grow above the maximum heap size (the -Xmx JVM flag). For example, with -Xmx3G, top shows 6G after hitting the endpoint many times.
Initially we thought it could be a native memory leak, but I ran a test where I called the GC manually with System.gc() every five seconds, and RES memory behaved properly.
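The test looked roughly like this (a minimal sketch of the experiment described above, not production code; note that System.gc() is only a hint to the JVM):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicGcTest {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Request a collection every five seconds while the endpoint is being exercised
        scheduler.scheduleAtFixedRate(System::gc, 5, 5, TimeUnit.SECONDS);
    }
}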
With that result in mind, I started studying JVM flags and garbage collection. I made a test with the serial collector and RES memory did well, but since it's an API I preferred to keep the G1GC collector. Having garbage collection run more often, without calling it explicitly, seemed like an interesting fix. The solution I found was to set the new-generation size to a low value: analyzing JConsole, I noticed the significant spikes in RES memory were directly related to spikes in the Eden space.
So I set the flag -Xmn256M to try to control this RES memory spike, which doesn't seem to be allocated inside the normal heap space, and to force garbage collection to run more often, at least collections from young to old gen. However, I've noticed G1GC is making many young collections and no old collections.
As you can see, young collections ran 47 times, but old collections never ran.
So, in summary: I've controlled spikes in resident memory by running young collections frequently with G1GC, because I noticed that garbage collection fixes this memory surplus (RES >> max heap size) even though I wasn't able to explain why, and even though I know frequent young collections may affect application performance a little, since G1GC's young generation is collected by stopping all application threads (from Java Performance by Scott Oaks). Is this a bad idea? Should I dedicate more time to diagnosing a memory leak (Java or native C++)?
My time on this task is almost up.
I have an instance of zookeeper that has been running for some time... (Java 1.7.0_131, ZK 3.5.1-1), with -Xmx10G -XX:+UseParallelGC.
Recently there was a leadership change, and the memory usage on most instances in the quorum went from ~200MB to 2GB+. I took a jmap dump, and what I found interesting was that there was a lot of byte[] serialization data (>1GB) that had no GC root but hadn't been collected.
(This is ByteArrayOutputStream, DataOutputStream, org.apache.jute.BinaryOutputArchive, and HeapByteBuffer data.)
Looking at the GC log, shortly before the election change the full GC was running every 4-5 minutes. After the election, the tenuring threshold increases from 1 to 15 (the max) and the full GC runs less and less often; eventually it doesn't run at all on some days.
After several days, suddenly, and mysteriously to me, something changes and the memory plummets back to ~200MB, with full GC again running every 4-5 minutes.
What I'm confused about here is how so much memory can have no GC root and still not get collected by a full GC. I even tried triggering GC.run from jcmd a few times.
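For reference, that diagnostic command looks like this (pid is a placeholder):

jcmd <pid> GC.run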
I wondered if something in ZK native land was holding onto this memory, or leaking this memory... which could explain it.
I'm looking for any debugging suggestions. I'm planning to upgrade to Java 1.8, and maybe ZK 3.5.4, but I would really like to root-cause this before moving on.
So far I've used visualvm, GCviewer and Eclipse MAT.
(Solid vertical black lines are full GC. Yellow is young generation).
I am not an expert on ZK. However, I have been tuning JVMs on WebLogic for a while, and based on this information I suspect your configuration (-Xmx10G -XX:+UseParallelGC, with no -Xms) is causing repeated expansion and shrinking of the heap. Try setting -Xms10G and -Xmx10G to avoid this resizing. Importantly, each time the heap is resized a full GC is executed, so avoiding resizing is a good way to minimize the number of full garbage collections.
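Concretely, that would look something like this (keeping your existing collector flag; other options elided):

java -Xms10G -Xmx10G -XX:+UseParallelGC ...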
Please read this:
"When a Hotspot JVM starts, the heap, the young generation and the perm generation space are allocated to their initial sizes determined by the -Xms, -XX:NewSize, and -XX:PermSize parameters respectively, and increment as-needed to the maximum reserved size, which are -Xmx, -XX:MaxNewSize, and -XX:MaxPermSize. The JVM may also shrink the real size at runtime if the memory is not needed as much as originally specified. However, each resizing activity triggers a Full Garbage Collection (GC), and therefore impacts performance. As a best practice, we recommend that you make the initial and maximum sizes identical."
Source: http://www.oracle.com/us/products/applications/aia-11g-performance-tuning-1915233.pdf
If you could provide your gc.log, it would be useful to analyse this case thoroughly.
Best regards,
RCC
I'm monitoring a Java application running on a Java 6 JVM.
Here is a screenshot of the jvisualvm panel:
I notice that when the heap size is small (before 12:39 in the picture) the garbage collector runs frequently.
Then I ran a memory-expensive task a couple of times (from 12:39 to 12:41) and the heap space grew. Why does the garbage collector run less frequently from that point on?
After an hour or more, if I avoid running the expensive tasks, the heap space slowly decreases.
Why does the used heap space take so long to decrease?
Is there something I can do to avoid this behaviour?
Does the new Java 8 VM behave differently?
Is there something I can do to avoid this behaviour?
Set -XX:MaxHeapFreeRatio=30 -XX:MinHeapFreeRatio=15; that will shrink the heap size more aggressively. Note that not all GC implementations yield the memory they don't use back to the OS. At least G1 does, but that's not available on Java 6.
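On a Java 6 HotSpot JVM that could look like this (the application jar is a placeholder):

java -XX:MinHeapFreeRatio=15 -XX:MaxHeapFreeRatio=30 -jar app.jar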
The behaviour looks normal.
Up until 12:39 on your attached profile snapshot there isn't a lot of GC going on.
Then you run your tasks, and as objects become unreachable they become eligible for GC; the collector's mark phase identifies the live objects and the sweep removes the rest.
You do not necessarily need to worry about the size of the heap unless you are maxing out and crashing frequently due to some memory leak. The GC will take care of removing eligible objects from the heap and you are limited in terms of how you can impact GC (unless of course you switch GC implementation).
Each major release of the platform includes some JVM and GC changes/improvements, but the behaviour of the application will be very similar on HotSpot 7/8. Try it.
Modern JVMs have highly optimized garbage collectors and you shouldn't need to worry about how/when it reclaims memory, but more about making sure you release objects so that they become eligible for collection. How often after startup do you experience out of memory issues?
If you are getting crashes due to out of memory configure the JVM to take a heap dump on exit:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=date.hprof
I'm running a memory-intensive app on a machine with 16 GB of RAM and an 8-core processor, on Java 1.6, all running on CentOS release 5.2 (Final). Exact JVM details are:
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
I'm launching the app with the following command line options:
java -XX:+UseConcMarkSweepGC -verbose:gc -server -Xmx10g -Xms10g ...
My application exposes a JSON-RPC API, and my goal is to respond to requests within 25ms. Unfortunately, I'm seeing delays up to and exceeding 1 second and it appears to be caused by garbage collection. Here are some of the longer examples:
[GC 4592788K->4462162K(10468736K), 1.3606660 secs]
[GC 5881547K->5768559K(10468736K), 1.2559860 secs]
[GC 6045823K->5914115K(10468736K), 1.3250050 secs]
Each of these garbage collection events was accompanied by a delayed API response of very similar duration to the length of the garbage collection shown (to within a few ms).
Here are some typical examples (these were all produced within a few seconds):
[GC 3373764K->3336654K(10468736K), 0.6677560 secs]
[GC 3472974K->3427592K(10468736K), 0.5059650 secs]
[GC 3563912K->3517273K(10468736K), 0.6844440 secs]
[GC 3622292K->3589011K(10468736K), 0.4528480 secs]
The thing is, I thought UseConcMarkSweepGC would avoid this, or at least make it extremely rare. On the contrary, delays exceeding 100 ms occur almost once a minute or more (although delays of over 1 second are considerably rarer, perhaps once every 10 or 15 minutes).
The other thing is that I thought only a full GC would pause application threads, yet these don't appear to be full GCs.
It may be relevant to note that most of the memory is occupied by an LRU memory cache that makes use of soft references.
Any assistance or advice would be greatly appreciated.
First, check out the Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning documentation, if you haven't already done so. This documentation says:
the concurrent collector does most of its tracing and sweeping work with the application threads still running, so only brief pauses are seen by the application threads. However, if the concurrent collector is unable to finish reclaiming the unreachable objects before the tenured generation fills up, or if an allocation cannot be satisfied with the available free space blocks in the tenured generation, then the application is paused and the collection is completed with all the application threads stopped. The inability to complete a collection concurrently is referred to as concurrent mode failure and indicates the need to adjust the concurrent collector parameters.
and a little bit later on...
The concurrent collector pauses an application twice during a concurrent collection cycle.
I notice that those GCs don't seem to be freeing very much memory. Perhaps many of your objects are long-lived? You may wish to tune the generation sizes and other GC parameters. 10 GB is a huge heap by many standards, and I would naively expect GC to take longer on such a huge heap. Still, 1 second is a very long pause time and indicates either that something is wrong (your program is generating a large number of unneeded objects or difficult-to-reclaim objects, or something else) or that you simply need to tune the GC.
Usually, I would tell someone that if they have to tune GC then they have other problems they need to fix first. But with an application of this size, I think you fall into the territory of "needing to understand GC much more than the average programmer."
As others have said, you need to profile your application to see where the bottleneck is. Is your PermGen too large for the space allocated to it? Are you creating unnecessary objects? jconsole works to at least show a minimum of information about the VM. It's a starting point. As others have indicated however, you very likely need more advanced tools than this.
Good luck.
Since you mention your desire to cache, I'm guessing that most of your huge heap is occupied by that cache. You might want to limit the size of the cache so that you are sure it never attempts to grow large enough to fill the tenured generation. Don't rely on SoftReference alone to limit the size. As the old generation fills with soft references, older references will be cleared and become garbage. New references (perhaps to the same information) will be created, but cleared quickly because free space is in short supply. Eventually, the tenured space is full of garbage and needs to be cleaned.
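A minimal sketch of a size-bounded LRU cache along those lines, built on LinkedHashMap's access ordering (the class name and capacity handling are illustrative, not the poster's actual cache):

import java.util.LinkedHashMap;
import java.util.Map;

// Evicts the least-recently-used entry once a fixed capacity is reached,
// so the cache can never grow to fill the tenured generation.
public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, i.e. LRU iteration
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}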
Consider adjusting the -XX:NewRatio setting too. The default is 2 (a 1:2 new-to-old ratio), meaning that one-third of the heap is allocated to the new generation. For a large heap, this is almost always too much. You might want to try something like 9, which would keep 9 GB of your 10 GB heap for the old generation.
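As a sketch, keeping the 10 GB heap from the question, that would be (other flags elided; this yields roughly a 9 GB old generation and a 1 GB new generation):

java -Xms10g -Xmx10g -XX:NewRatio=9 ...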
Turns out that part of the heap was getting swapped out to disk, so that garbage collection had to pull a bunch of data off the disk back into memory.
I resolved this by setting Linux's "swappiness" parameter to 0 (so that it wouldn't swap data out to disk).
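For reference, on Linux that can be done like this (requires root; the second line makes the setting persistent across reboots):

sysctl -w vm.swappiness=0
echo "vm.swappiness = 0" >> /etc/sysctl.conf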
Here are some things I have found which could be significant.
JSON-RPC can generate a lot of objects. Not as many as XML-RPC, but still something to watch for. In any case, you appear to be generating as much as 100 MB of objects per second, which means your GC is running a high percentage of the time and is likely adding to your random latency. Even though the GC is concurrent, your hardware/OS is very likely to exhibit non-ideal random latency under load.
Have a look at your memory bank architecture. On Linux the command is numactl --hardware. If your VM is split across more than one memory bank, this will increase your GC times significantly. (It will also slow down your application, as these accesses can be significantly less efficient.) The harder you work the memory subsystem, the more likely the OS will have to shift memory around (often in large amounts), and you get dramatic pauses as a result (100 ms is not surprising). Don't forget your OS does more than just run your app.
Consider compacting/reducing the memory consumption of your cache. If you are using multiple GB of cache it is worth looking at ways to cut memory consumption further than you have already.
I suggest you profile your app with memory allocation tracing AND CPU sampling on at the same time. This can yield very different results and often points to the cause of these sorts of problems.
Using these approaches, the latency of an RPC call can be reduced to below 200 microseconds, and GC times reduced to 1-3 ms, affecting fewer than 1 in 300 calls.
I'd also suggest GCViewer and a profiler.
Some places to start looking:
https://visualvm.dev.java.net/
http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstat.html
http://www.javaperformancetuning.com/tools/gcviewer/index.shtml
Also, I'd run the code through a profiler. I like the one in NetBeans, but there are others as well. You can view the GC behaviour in real time. VisualVM does that too, but I haven't run it yet (I've been looking for a reason to, but haven't had the time or the need yet).
I haven't personally used such a huge heap, but I've experienced very low latency in general using the following switches on Oracle/Sun Java 1.6.x:
-Xincgc -XX:+UseConcMarkSweepGC -XX:CMSIncrementalSafetyFactor=50
-XX:+UseParNewGC
-XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=2 -XX:ParallelGCThreads=2
-XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=5
-XX:GCTimeRatio=90 -XX:MaxGCPauseMillis=20 -XX:GCPauseIntervalMillis=1000
The important parts are, in my opinion, the use of CMS for the tenured generation and ParNewGC for the young generation. In addition, this adds a pretty big safety factor to CMS (50% instead of the default 10%) and requests short pause times. Since you're targeting a 25 ms response time, I'd try setting -XX:MaxGCPauseMillis to an even smaller value. You could also try using more than two cores for concurrent GC, but I would guess that is not worth the CPU usage.
You should probably also check the HotSpot JVM GC cheat sheet.