Confusing Zookeeper Memory usage - java

I have an instance of zookeeper that has been running for some time... (Java 1.7.0_131, ZK 3.5.1-1), with -Xmx10G -XX:+UseParallelGC.
Recently there was a leadership change, and the memory usage on most instances in the quorum went from ~200MB to 2GB+. I took a jmap dump, and what I found interesting was that there was a lot of byte[] serialization data (>1GB) that had no GC root, yet hadn't been collected.
(Mostly ByteArrayOutputStream, DataOutputStream, org.apache.jute.BinaryOutputArchive, and HeapByteBuffer instances.)
Looking at the GC log: shortly before the leadership change, a full GC was running every 4-5 minutes. After the election, the tenuring threshold increases from 1 to 15 (the max) and the full GC runs less and less often; eventually it doesn't run at all on some days.
After several days, suddenly, and mysteriously to me, something changes and the memory plummets back to ~200MB, with full GC again running every 4-5 minutes.
What I'm confused about here is how so much memory can have no GC root and yet not get collected by a full GC. I even tried triggering GC.run from jcmd a few times.
I wondered whether something in ZK native land was holding onto or leaking this memory, which could explain it.
I'm looking for any debugging suggestions. I'm planning to upgrade to Java 1.8 and maybe ZK 3.5.4, but would really like to root-cause this before moving on.
So far I've used visualvm, GCviewer and Eclipse MAT.
(Solid vertical black lines are full GC. Yellow is young generation).

I am not an expert on ZK. However, I have been tuning JVMs on WebLogic for a while, and based on this information I suspect your configuration (-Xmx10G -XX:+UseParallelGC with no -Xms) is causing the heap to expand and shrink. You should therefore try using -Xms10G together with -Xmx10G to avoid this resizing. Importantly, each time the heap is resized a full GC is executed, so avoiding resizing is a good way to minimize the number of full garbage collections.
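As a minimal sketch of how that might be wired in (the exact hook depends on how you launch ZooKeeper; conf/zookeeper-env.sh with SERVER_JVMFLAGS is one common place, so treat the file name as an assumption for your setup):
# conf/zookeeper-env.sh - pin the heap so it never resizes, avoiding resize-triggered full GCs
export SERVER_JVMFLAGS="-Xms10G -Xmx10G -XX:+UseParallelGC"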
Please read this:
"When a Hotspot JVM starts, the heap, the young generation and the perm generation space are allocated to their initial sizes determined by the -Xms, -XX:NewSize, and -XX:PermSize parameters respectively, and increment as-needed to the maximum reserved size, which are -Xmx, -XX:MaxNewSize, and -XX:MaxPermSize. The JVM may also shrink the real size at runtime if the memory is not needed as much as originally specified. However, each resizing activity triggers a Full Garbage Collection (GC), and therefore impacts performance. As a best practice, we recommend that you make the initial and maximum sizes identical."
Source: http://www.oracle.com/us/products/applications/aia-11g-performance-tuning-1915233.pdf
If you could provide your gc.log, it would be useful to analyse this case thoroughly.
Best regards,
RCC

Related

Make ZGC run often

ZGC does not run often enough. The GC logs show that it runs once every 2-3 minutes for my application, and because of this my memory usage goes high between GC cycles (as high as 90%). After GC, it drops to as low as 20%.
How can I increase the GC frequency so that it runs more often?
-XX:ZCollectionInterval=N - set maximum gap between collections to N seconds.
-XX:ZUncommitDelay=M - set the delay until unused memory is returned to the OS to M seconds.
Before tuning the GC, I would recommend investigating why this is happening; there might be an issue or bug in your application.
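As an illustration, a sketch of how those two flags might be combined on the command line (the values here are arbitrary placeholders, not recommendations):
# force a ZGC cycle at least every 5 seconds and return unused memory to the OS after 30 seconds
# (on JDK 11-14 you also need -XX:+UnlockExperimentalVMOptions before -XX:+UseZGC)
java -XX:+UseZGC -XX:ZCollectionInterval=5 -XX:ZUncommitDelay=30 -jar yourapp.jar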
Some notes about these GC options:
-XX:ZUncommitDelay=M (check whether it is supported by your Linux kernel)
-XX:+ZProactive: enables proactive GC cycles when using ZGC. This option is enabled by default. ZGC will start a proactive GC cycle if doing so is expected to have minimal impact on the running application. This is useful if the application is mostly idle or allocates very few objects, but you still want to keep the heap size down and allow reference processing to happen even when there is a lot of free space on the heap.
More details about ZGC configuration options can be found at:
ZGC Home Page.
Oracle Documentation
Presently (as of JDK 17), ZGC's primary strategy is to wait until the last possible moment before the heap fills up and then do a collection. Its goals are:
Avoid unnecessary CPU load by running GC only when it's necessary.
Start the GC early enough so that it will finish before the heap actually fills up (since the heap filling up would be bad, leading to a temporary application stall).
It does this by measuring how fast your app is allocating memory, how long the GC takes to run, and predicting at what point it should start the GC. You can find the exact algorithm in the source code.
ZGC also exposes some knobs for running GC more often (i.e., proactively), but honestly I don't find them terribly effective. You can find more info in my other answer. G1 does a better job of being proactive, but whether that's good or not depends on your use case. (It sounds like you care more about throughput than memory usage, so I think you should prefer ZGC's behavior.)
However, if you find that ZGC is making mistakes in predicting when the heap will fill up and that your application really is hitting stalls, please share that info here or on the ZGC mailing list.

JConsole heap dump much smaller than memory usage

We have a few containers running Java processes with Docker. One thing we've been noticing is the huge amount of memory taken up just by running a simple Spring Boot app, without even including our own code (just to try to get some kind of memory profile independent of any issues we might introduce).
What I saw was that the memory consumed by Docker/the JVM was hovering around 2.5GB. We did have a decent amount of extra deps included (Camel, Hibernate, some Spring Boot deps), but that wasn't what really threw me off. What threw me off was that despite Docker saying the app consumed 2.5GB of memory, running JConsole against it showed it consuming up to 1GB (dropping to ~200MB after a GC and slowly climbing). The memory footprint reported by Docker remained where it was after the GC as well (2.5GB).
Furthermore, when I dumped the heap to see what kinds of objects were taking up that space, the heap was only 33MB after I loaded the .hprof file into MAT. None of this makes much sense to me. Currently, I'm looking at the non-heap space in JConsole reported at 115MB while the heap space is at 331MB.
I've already read a ton (on SO and other sites) about the JVM memory regions, including several things specifically reporting that heap dumps might be smaller, but none of them were this far off as far as I could tell. Beyond that, many of them suggested watching out for the GC that runs whenever a heap dump is taken, and for the MAT setting that shows or hides unreachable objects. All of this was taken into account before posting here, and now I just feel like something else is at play that I can't capture myself and haven't found online.
I fully expect that the numbers might be a little off but it seems extreme that they're off by a factor of 10 in the best case scenario and off by nearly a factor of 100 when looking at the docker-reported memory usage.
Does anyone know what I might be missing here?
EDIT: This is also an app running with Java 8, not yet running with Java 11. It's on the JIRA board to do but not yet planned for.
EDIT2: Adding screenshots. Spike in the JConsole screen shot is from running GC.
JConsole gives you the amount of committed memory: 3311616 KiB ~= 3GiB
This is how much memory your java process consumes, as seen by the OS.
It is unrelated to how much heap is currently in use to hold Java objects, also reported by JConsole as 130237 kbyte ~= 130 MiB.
This is also unrelated to how many objects are actually alive: by default MAT will remove unreachable objects when you load the heap dump. You can change this under Preferences -> Memory Analyzer -> Keep Unreachable Objects (see the MAT documentation). So if you have a lot of short-lived objects, the difference can be quite massive.
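How the dump is taken matters as well; with a HotSpot JDK, jmap can either dump everything or only live (reachable) objects, and the live variant forces a full GC first, so the two files can differ a lot (pid and file names are placeholders):
# full dump, including unreachable objects still sitting on the heap
jmap -dump:format=b,file=all.hprof <pid>
# live objects only (triggers a full GC before dumping)
jmap -dump:live,format=b,file=live.hprof <pid>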
I see that it also reports a max heap of about 9GiB. That means you have set the -Xmx parameter to a large value.
HotSpot GCs are not very good at giving memory back. They tend to use all the space available to them (up to the max heap size set by -Xmx) and then never decommit the heap, effectively keeping it reserved for the Java process instead of releasing it to the OS.
If you want to minimize the memory footprint of your process from the OS perspective, I recommend that you set a lower -Xmx, maybe -Xmx1g, so as to not allow Java to grow too much (of course, -Xmx will also need to be high enough to accommodate your application workload!).
If you really want an adaptive heap, you can also switch to G1 (-XX:+UseG1GC) and a more recent Java, as the HotSpot team has delivered some improvements there recently.
Dave
OS monitoring tools will show you the amount of memory that is allocated by a process. So this:
means that your Java process has 2.664G of memory allocated (Java heap + metaspace).
JConsole shows you the memory that your code is "consuming" (ignoring the metaspace).
I see 2 possible explanations:
You have set -Xms to a huge value.
You have a lot of static code (or other content) loaded into your metaspace.
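A quick way to check both hypotheses on a HotSpot JVM, assuming you can attach to the running process (the pid is a placeholder):
# show the flags the JVM is running with (InitialHeapSize/MaxHeapSize correspond to -Xms/-Xmx)
jcmd <pid> VM.flags
# sample heap and metaspace usage/capacity every second
jstat -gc <pid> 1000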

JVM Garbage Collector suddenly consumes 100% CPU after running for several hours

I've got a strange problem in my Clojure app.
I'm using http-kit to write a websocket based chat application.
Clients are rendered using React as a single-page app; the first thing they do when they navigate to the home page (after signing in) is create a websocket to receive things like real-time updates and any chat messages. You can see the site here: www.csgoteamfinder.com
The problem I have is after some indeterminate amount of time, it might be 30 minutes after a restart or even 48 hours, the JVM running the chat server suddenly starts consuming all the CPU. When I inspect it with NR (New Relic) I can see that all that time is being used by the garbage collector -- at this stage I have no idea what it's doing.
I've taken a number of screenshots where you can see the effect.
You can see a number of spikes; those spikes correspond to large increases in CPU usage because of the garbage collector. To free up CPU I usually have to restart the JVM. I have been relying on receiving a CPU alert from NR in my Slack account to make sure I jump on these quickly... but I really need to get to the root of the problem.
My initial thought was that I was possibly holding onto the socket reference when the client closed it at their end, but this is not the case. I've been looking at socket count periodically and it is fairly stable.
Any ideas of where to start?
Kind regards, Jason.
It's hard to imagine what could have caused such an issue. But the first thing I would do is take a heap dump at the time of the crash. This can be enabled with the -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path_to_your_heap_dump> JVM args. As a general practice, don't increase the heap size beyond the physical memory available on your server machine. In some rare cases the JVM is unable to dump the heap because the process is doomed; in such cases you can use gcore (if you're on Linux, not sure about Windows).
Once you grab the heap dump, analyse it with MAT. I have debugged such applications, and this worked perfectly to pin down any memory-related issues. MAT allows you to dissect the heap dump in depth, so you're sure to find the cause of your memory issue, unless it is simply the case that you have allocated too little heap space.
If your program is spending a lot of CPU time in garbage collection, that means that your heap is getting full. Usually this means one of two things:
You need to allocate more heap to your program (via -Xmx).
Your program is leaking memory.
Try the former first. Allocate an insane amount of memory to your program (16GB or more, in your case, based on the graphs I'm looking at). See if you still have the same symptoms.
If the symptoms go away, then your program just needed more memory. Otherwise, you have a memory leak. In this case, you need to do some memory profiling. In the JVM, the way this is usually done is to use jmap to generate a heap dump, then use a heap dump analyser (such as jhat or VisualVM) to look at it.
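For example, with a HotSpot JDK that still ships jhat (up to Java 8), that workflow might look like this (pid and file name are placeholders; jhat serves the dump at http://localhost:7000 by default):
# dump only live objects (forces a full GC first)
jmap -dump:live,format=b,file=heap.hprof <pid>
# browse the dump in a web UI
jhat heap.hprof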
(Fair disclosure: I'm the creator of a jhat fork called fasthat.)
Most likely your tenured space is filling up, triggering a full collection. During a full collection the GC uses all the CPUs, sometimes for seconds at a time.
To diagnose why this is happening, you need to look at your rate of promotion (how much data is moving from the young generation to the tenured space).
I would look at increasing the young generation size to decrease the rate of promotion. You could also look at using CMS, as it has shorter pause times (though it uses more CPU).
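As a sketch of the kind of flags involved (the sizes are placeholders to adjust for your workload, chat-server.jar stands in for however you launch your app, and -XX:+PrintTenuringDistribution is only there to make the promotion rate visible in the GC log):
java -Xms4g -Xmx4g -Xmn1g -XX:+UseConcMarkSweepGC \
     -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -Xloggc:gc.log \
     -jar chat-server.jar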
Things to try in order:
Reduce the heap size
Count the number of objects of each class, and see if the numbers makes sense
Do you have big byte[] that lives past generation 1?
Change or tune GC algorithm
Use high-availability, i.e. more than one JVM
Switch to Erlang
You have triggered a global GC. GC time grows faster than linearly with the amount of memory, so actually reducing the heap space will trigger the global GC more often and make each run faster.
You can also experiment with changing the GC algorithm. We had a system where the global GC went down from 200s (happening 1-2 times per 24 hours) to 12s. Yes, the system was at a complete standstill for 3 minutes; no, the users were not happy :-) You could try -XX:+UseConcMarkSweepGC
http://www.fasterj.com/articles/oraclecollectors1.shtml
You will always have stops like this for the JVM and similar runtimes; it is more about how often you get them and how fast the global GC is. You should make a heap dump and get the count of the different objects of each class. Most likely, you will see that you have millions of instances of one of them; somehow you are keeping pointers to them unnecessarily, in an ever-growing cache, in sessions, or similar.
http://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks001.html#CIHCAEIH
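A quick way to get that per-class count without a full dump, assuming a HotSpot JDK (the pid is a placeholder):
# per-class instance counts and sizes, live objects only (this forces a full GC)
jmap -histo:live <pid> | head -n 30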
You can also start using a high-availability solution with at least 2 nodes, so that when one node is busy doing GC, the other node will have to handle the total load for a time. Hopefully, you will not get the global GC on both systems at the same time.
Big binary objects like byte[] and similar are a real problem. Do you have those?
At some point these need to be compacted by the global GC, and this is a slow operation. Many JVM-based data-processing solutions actually avoid storing all data as plain POJOs on the heap, and implement their own heap management in order to overcome this problem.
Another solution is to switch from the JVM to Erlang. Erlang is near real-time, and it gets by without the concept of a global GC of the whole heap; Erlang has many small heaps. You can read a little about it at
https://hamidreza-s.github.io/erlang%20garbage%20collection%20memory%20layout%20soft%20realtime/2015/08/24/erlang-garbage-collection-details-and-why-it-matters.html
Erlang is slower than the JVM, since it copies data, but its performance is much more predictable. It is difficult to have both. I have a websocket Erlang-based solution, and it really works well.
So you have run into a problem that is expected and normal for the JVM, the Microsoft CLR and similar runtimes. It will get worse and more common over the next couple of years as heap sizes grow.

JVM consumes 100% CPU with a lot of GC

After running for a few days, the CPU load of my JVM is about 100%, with about 10% of that in GC (screenshot).
Memory consumption is near the max (about 6 GB).
Tomcat is extremely slow in that state.
Since it's too much for a comment, I'll write it up as an answer:
Looking at your charts, it seems to be using CPU for non-GC tasks; peak "GC activity" seems to stay within 10%.
So on first impression it would seem that your task is simply CPU-bound, so if that's unexpected, maybe you should do some CPU profiling on your Java application to see if something pops out.
Apart from that, based on the comments I suspect that physical memory filling up might evict file caches and memory-mapped things, leading to increased page faults which force the CPU to wait for IO.
Freeing up 500MB on a manual GC out of a 4GB heap does not seem all that much. Most GCs try to keep pause times low as their primary goal, keep the total time spent in GC within some bound as a secondary goal, and only when the other goals are met do they try to reduce memory footprint as a tertiary goal.
Before recommending further steps you should gather more statistics/provide more information since it's hard to even discern what your actual problem is from your description.
monitor page faults
figure out which GC algorithm is used in your setup and how it is tuned (-XX:+PrintFlagsFinal; see the example after this list)
log GC activity - I suspect it's pretty busy with minor GCs and is thus blowing its pause-time or CPU-load goals
perform allocation profiling of your application (anything creating excessive garbage?)
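A sketch of how the two JVM-side items above might be gathered (the grep pattern and log path are just examples):
# print the effective GC-related flags the JVM is actually running with
java -XX:+PrintFlagsFinal -version | grep -iE 'use.*gc|newsize|survivorratio'
# restart the app with GC logging enabled (Java 8 style flags)
java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/app/gc.log -jar app.jar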
You also have to be careful to distinguish problems caused by the Java heap reaching its sizing limit from problems caused by the OS exhausting its physical memory.
TL;DR: Unclear problem, more information required.
Or if you're lazy/can afford it just plug in more RAM / remove other services from the machine and see if the problem goes away.
Here is what I have learned to check for GC problems:
Give the JVM enough memory, e.g. -Xmx2G.
If memory is not sufficient and no more RAM is available on the host, analyze the heap dump (e.g. with jvisualvm).
Turn on Concurrent Mark and Sweep:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Check the garbage collection log: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log
My solution:
I finally solved that problem by tuning the cache sizes.
The cache sizes were too big, so memory became scarce.
If you want to keep the memory of your server free, you can simply try the VM parameter
-Xmx2G //or any different value
This ensures your program never takes more than 2 gigabytes of RAM. But be aware that under a high workload the server may get an OutOfMemoryError.
Since an old-generation (full) GC may block your whole server from working for some seconds, Java will try to avoid a full garbage collection.
The RAM limitation may trigger a full GC earlier (or even allow more objects to be collected by the young-generation GC).
In my opinion (more guessing than actually knowing): I don't think another algorithm would help much here.

Tuning JVM (GC) for high responsive server application

I am running an application server on Linux 64bit with 8 core CPUs and 6 GB memory.
The server must be highly responsive.
After some inspection I found that the application running on the server creates a rather huge number of short-lived objects, and has only about 200~400 MB of long-lived objects (as long as there is no memory leak).
After reading http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
I use these JVM options
-server -Xms2g -Xmx2g -XX:MaxPermSize=256m -XX:NewRatio=1 -XX:+UseConcMarkSweepGC
Result: the minor GC takes 0.01~0.02 sec, the major GC takes 1~3 sec, and the minor GC happens constantly.
How can I further improve or tune the JVM?
larger heap size? but will it take more time for GC?
larger NewSize and MaxNewSize (for young generation)?
other collector? parallel GC?
is it a good idea to let major GC take place more often? and how?
Result: the minor GC takes 0.01 ~ 0.02 sec, the major GC takes 1 ~ 3 sec the minor GC happens constantly.
Unless you are reporting pauses, I would say that the CMS collector is doing what you have asked it to do. By definition, CMS will use a larger percentage of the CPU than the Serial and Parallel collectors. This is the penalty you pay for low pause times.
If you are seeing 1 to 3 second pause times, I'd say that you need to do some tuning. I'm no expert, but it looks like you should start by reducing the value of CMSInitiatingOccupancyFraction from the default value of 92.
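As a rough sketch of what that might look like (the 70% threshold is just an illustrative starting point, not a recommendation):
# start concurrent CMS cycles earlier, at 70% old-gen occupancy, and don't let ergonomics override it
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly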
Increasing the heap size will improve the "throughput" of the GC. But if your problem is long pauses, increasing the heap size is likely to make the problem worse.
Careful... GC can be a hairy subject if you are not cautious. Within any runtime (the JVM for Java / the CLR for .NET) there are several processes that take place. Generally there is an early-stage optimization of memory (young generational garbage collection / young gen GC) and a later one (old generational garbage collection / old gen GC). The young gen GC happens on a regular basis and is commonly responsible for your smaller pauses / hiccups. The old gen GC is normally what is going on when you see the long "stop the world" pauses.
Why, you might ask? The reason you get pauses with your runtime / JVM is that when the runtime does its cleanup of the heap it has to go through what is called a phase change. It stops the threads running your application in order to mark and swap pointers to optimize your available memory. Young gen is faster, as it is mainly releasing objects that are only temporary. Old gen, however, evaluates all the objects on the heap, and when you run out of memory it will kick in to free up much-needed memory.
Why the caution? Old gen gets exponentially worse in pause time the more heap you use. At 2-4 GB of total heap size you should be fine on modern runtimes like Java 6 (JDK 1.6+). Once you go beyond that threshold you will see exponential increases in pause times. I have run into some clients that have to restart servers - in some rare cases where a heap is large, GC pause times can take longer than a full restart.
There are some new tools out there that are pretty cool and can give you a leading edge on evaluating whether GC is your pain point. jHiccup is one, and it is free from the Azul Systems website. At this time I think it is only for Linux, though. They also have a JVM with a rebuilt GC algorithm that runs pauseless... but if you are on a single-server deployment with a non-critical app it may not be cost-effective (that one is not free).
To sum up - if your runtime / JVM / CLR heap is less than 2 GB adding more memory will help. Be sure to give yourself some overhead. You never want to hit 100% Heap size / memory size if possible. That is when the long pauses are the longest. Give yourself an extra 20%+ memory over what you think you will need. That way you have room for the GC algorithms to move objects around for optimization. If you ever plan to go large ... there is one tool that fixes the circa 1990 JVM technology (Azul Systems Zing JVM), but it is not free. They do offer an open source tool to diagnose GC issues. The JVM (as I have tried it) also has a really cool thread level visibility tool that lets you report on any leaks, bugs, or locks in production without overhead (some trick with offloading data the JVM already deals with and time stamping). That has saved tons of dev test time ... but again, not for small apps.
Stay below 4 GB. Give extra headroom. And if you want you can turn on these flags to monitor GC for Java / JVM:
java -verbose:gc myProgram
java -Xloggc:D:/log/myLogFile.log -XX:+PrintGCDetails myProgram
You may try some of the other collectors HotSpot offers. There is more than one.
If you are on Linux go ahead and try the JHiccup tool as well. It is free.
You may be interested in trying the low-pause Garbage-First collector instead of concurrent mark-sweep (although it's not necessarily more performant for all collections, it's supposed to have a better worst case). It's enabled by -XX:+UseG1GC and is supposed to be really awesomesauce, but you may want to give it a thorough evaluation before using it in production. It has probably improved since, but it seems to have been a bit buggy a year ago, as seen in Experience with JDK 1.6.x G1 (“Garbage First”)
It is perfectly fine for the garbage collection to run in parallel with your program, if you have plenty of cpu, which you do.
What you want, is to make absolutely certain that you will not run into a scenario where the garbage collection PAUSES your main program.
Have you tried simply not setting any flags except the one saying you want the server VM (for the Sun JVM), and then putting your server under heavy load to see how it behaves? Only then can you see whether you get any improvements from tinkering with options.
This actually sounds like a throughput app and should probably use the throughput collector. I would balance the size of the new gen making it big enough to not GC too often and small enough to prevent long pauses. 20ms sounds like a long minor GC to me. I also suspect your survivor space is set too large and is just being wasted. If you don't have much surviving to old gen, you shouldn't have that much surviving your minor GCs.
In the end, you should use jvmstat and VisualGC to really get a feel for how your app is using memory.
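For instance, jstat (the command-line side of jvmstat) can show generation usage and GC counts live; the pid is a placeholder and the 1000 is the sampling interval in milliseconds:
# generation utilization percentages plus young and full GC counts and accumulated times
jstat -gcutil <pid> 1000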
For a highly responsive server application, I think you want to see the major GC happen less frequently. Here is a list of parameters that would help.
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSWaitDuration=300000
-XX:GCTimeRatio=40
A larger heap size may not help with low pause times, as long as your app doesn't run out of memory.
A larger NewSize and MaxNewSize would help throughput, but may not help with low pause times. If you choose to take this approach, you may consider giving the GC threads more execution time by setting -XX:GCTimeRatio lower. The point is to remember to take a holistic view when tuning the JVM.
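Put together, a command line using the flags above might look like this (the values are copied from the list and the question; treat them as a starting point to measure against, not a drop-in config, and server.jar is a placeholder for your application):
java -server -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC \
     -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark \
     -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 \
     -XX:CMSWaitDuration=300000 -XX:GCTimeRatio=40 \
     -jar server.jar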
I think previous posters have missed something very obvious: the permanent generation size is too low. If the system uses 200 to 400 MB of permanent generation, then it may be best to set the max perm gen to 400 MB. The initial PermGen size should also be set to the same value. You will then never run out of permanent generation space.
Currently, it looks like the JVM has to spend a lot of time moving objects in and out of the permanent generation. This can take time. The JVM tries to allocate contiguous memory areas to Java objects; this speeds up memory access thanks to hardware-level features. In order to do that, it is very helpful to have plenty of buffer in memory. If the permanent generation is almost full, newly discovered permanent objects must be split or existing objects must be shuffled. This is what triggers a full GC, as well as causing long full GC pauses.
The question states that the permanent generation size has already been measured; if this has not been done, it should be measured using a tool. Such tools process logs generated by the JVM with the verbose GC option turned on.
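The sizing suggested above would look like this on the command line (400 MB is the figure from this answer, and these flags only apply to Java 7 and earlier, since PermGen was replaced by Metaspace in Java 8):
# fix the permanent generation at 400 MB so it never has to grow or shrink
-XX:PermSize=400m -XX:MaxPermSize=400m -verbose:gc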
All the mark-and-sweep options listed above may not be needed once this basic improvement is made.
People throw GC options as solutions without evaluating how mature they have proven to be in actual use.
