JVM Optimizations for Docker and DC/OS

I'm moving a bare-metal Java application (a JAR on JDK 8) to Docker containers and DC/OS. I am noticing an odd pattern in the containers: we set -Xmx to 32 GB and allocate a 36 GB Docker container. Every few hours the application spikes in old-gen memory allocation and the GC gets stuck in a loop (maxing out the CPU) while it tries to reclaim the heap.
Are there any optimizations or tools I can use to see why we are spiking so fast in that 1-5 second interval? Are there any gotchas I might need to be aware of with Docker and the JVM?
We are using the default GC.

Just for future reference:
We are using JDK 8, and it seems Oracle has recently added some experimental flags for running in Docker. I believe the cause was that the GC was sizing its threads from the host rather than respecting the container's CPU limits from cgroups. The experimental flags seem to have fixed our "off the rails" issue.
https://blogs.oracle.com/java-platform-group/java-se-support-for-docker-cpu-and-memory-limits
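For future readers, a minimal launch sketch with those experimental flags (available from JDK 8u131; the jar name and GC thread count are placeholders, not values from our setup):
java -XX:+UnlockExperimentalVMOptions \
     -XX:+UseCGroupMemoryLimitForHeap \
     -XX:ParallelGCThreads=4 \
     -jar app.jar
On JDK 8u191 and later, container detection is enabled by default via -XX:+UseContainerSupport, so the experimental flags above are no longer needed.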

Usually you would want to avoid gigantic applications with > 30 GB of memory and, if you have the option of a container platform like DC/OS, split the application into smaller parts with lower memory requirements.
In general, regarding GC and heap size: with big heaps, a full GC can take a long time. Personally I have experienced full-GC freezes of a minute or more with a heap size quite similar to the 30 GB you mention.
About Java in containers: the JVM actually needs more memory than you configure with -Xmx. So if you specify a memory limit of 2 GB for your DC/OS (Marathon) application, you cannot set -Xmx2G, because that memory limit is a hard limit: if the process inside the container exceeds it, the container will be killed. Because the JVM temporarily reserves more memory than configured in -Xmx, this is quite likely to happen. In general I would suggest using around 75% of the configured container memory as the value for -Xmx.
You could have a look at newer JRE versions, which support -XX:+UseCGroupMemoryLimitForHeap. This is an experimental flag that makes the JVM derive its memory limits from the container's cgroup; see https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/ for more information.
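As a rough sketch of the 75% rule (the numbers are purely illustrative, and the jar name is a placeholder): for a 2 GB Marathon/DC/OS memory limit you would configure something like
java -Xmx1536m -jar app.jar
leaving roughly 25% headroom for metaspace, thread stacks, code cache, and other off-heap memory.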

Related

JConsole heap dump much smaller than memory usage

We have a few containers running Java processes with Docker. One thing we've been noticing is the huge amount of memory taken up just by running a simple Spring Boot app, without even including our own code (just to try and get some kind of memory profile independent of any issues we might introduce).
What I saw was that the memory consumed by Docker/the JVM was hovering around 2.5 GB. We did have a decent amount of extra dependencies included (Camel, Hibernate, some Spring Boot deps), but that wasn't what really threw me off. Despite Docker reporting 2.5 GB of memory consumed for the app, running JConsole against it showed it consuming up to 1 GB (down to ~200 MB after a GC and slowly climbing). The memory footprint reported by Docker remained where it was after the GC as well (2.5 GB).
Furthermore, when I dumped the heap to see what kinds of objects are taking up that space, the heap was only 33 MB after I loaded the .hprof file into MAT. None of this makes much sense to me. Currently, JConsole reports the non-heap space at 115 MB while the heap space is at 331 MB.
I've already read a lot (on SO and other sites) about the JVM memory regions, including reports that heap dumps can be smaller than live usage, but none of them were this far off as far as I could tell. Beyond that, the usual suggestions (that a GC is run whenever a heap dump is taken, and that MAT has a setting to show or hide unreachable objects) were all taken into account before posting here. Now I just feel like something else is at play that I can't capture myself and haven't found online.
I fully expect the numbers to be a little off, but it seems extreme that they're off by a factor of 10 in the best case and by nearly a factor of 100 when looking at the Docker-reported memory usage.
Does anyone know what I might be missing here?
EDIT: This is also an app running on Java 8, not yet on Java 11. It's on the JIRA board to do but not yet planned.
EDIT 2: Adding screenshots. The spike in the JConsole screenshot is from running a GC.
JConsole gives you the amount of committed memory: 3311616 KiB ≈ 3 GiB.
This is how much memory your Java process consumes, as seen by the OS.
It is unrelated to how much heap is currently in use to hold Java objects, also reported by JConsole as 130237 KiB ≈ 130 MiB.
It is also unrelated to how many objects are actually alive: by default, MAT removes unreachable objects when you load the heap dump. You can change this via Preferences -> Memory Analyzer -> Keep Unreachable Objects (see the MAT documentation). So if you have a lot of short-lived objects, the difference can be quite large.
I also see that it reports a max heap of about 9 GiB, which means you have set the -Xmx parameter to a large value.
HotSpot GCs are not very good at reclaiming unused memory. They tend to use all the space available to them (the max heap size, set by -Xmx) and then never decommit the heap, effectively keeping it reserved for the Java process instead of releasing it to the OS.
If you want to minimize the memory footprint of your process from the OS perspective, I recommend setting a lower -Xmx, maybe -Xmx1g, so as not to let the heap grow too large (of course, -Xmx also needs to be high enough to accommodate your application's workload!).
If you really want an adaptive heap, you can also switch to G1 (-XX:+UseG1GC) and a more recent Java version, as the HotSpot team has delivered some improvements there recently.
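A minimal sketch of the flags suggested above (the 1 GB cap is only the illustrative value from this answer, and the jar name is a placeholder; size the heap against your actual workload):
java -Xmx1g -XX:+UseG1GC -jar app.jar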
Dave
OS monitoring tools show you the amount of memory allocated to a process. So the 2.664 GB figure in your screenshot means that your Java process has 2.664 GB of memory allocated (Java heap + metaspace).
JConsole shows you the memory that your code is "consuming" (ignoring the metaspace).
I see 2 possible explanations:
You have set -Xms to a huge value
You have a lot of static code (or other content) loaded into your metaspace.
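If you want to see where the gap between heap usage and process size goes, one option (a sketch, assuming you can restart the JVM with an extra flag; the jar name and PID are placeholders) is HotSpot's Native Memory Tracking, which breaks the footprint down into heap, metaspace, thread stacks, GC structures, and so on:
java -XX:NativeMemoryTracking=summary -jar app.jar
jcmd <pid> VM.native_memory summary
The first command starts the JVM with tracking enabled (it adds a small overhead); the second asks the running JVM for the breakdown.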

Java GC runs cause increase of mem usage on OS

Over the last few days I have seen some behaviour that breaks one of my assumptions. Maybe someone can give me an explanation for the following.
The facts
We are running Java Applications in Docker Containers in K8s in GCP
Our base image is openjdk:8u171-jre-slim-stretch
We engaged the flags -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap => so resource limits should be obtained from the container and not from the host => IMO that works
We engaged -XX:MaxRAMFraction=2, so we get about half of the container's memory limit for the heap => IMO that works (a quick way to verify this is sketched after this list)
There is no -Xmx setting for the JVM
The only way to influence the JVMs memory is via Kubernetes resource limits => which is exactly what we want
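One way to verify the container-derived heap size (a sketch; run it inside the container image with the same flags listed above):
java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap \
     -XX:MaxRAMFraction=2 -XX:+PrintFlagsFinal -version | grep -i maxheapsize
The MaxHeapSize value printed should come out to roughly half of the pod's memory limit.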
The thing I do not get
When GC kicks in (Scavenge or G1) there is an increase in container memory usage, but not in the JVM's own reported memory usage. Why does the GC consume memory from the container? OK, it copies objects around, e.g. promoting survivors from the young generation into the old generation, so I understand that for a short time more memory is used. But I would have expected that memory to be attributed to the JVM process, not to some kernel process.
While setting memory and CPU limits on one of our K8s clusters running Java pods, we faced an issue where the pod memory usage was shown as much higher in Prometheus (and some pods even crashed due to probe errors).
On further analysis, we figured out that the GC was working hard to free up memory because the limits were too small compared to the rate of object creation, causing CPU cycles to be spent on GC rather than on actual work. We avoided that by increasing the pod memory limit.
So maybe you can increase the maximum memory limit provided to the pods and see if this goes away. I am not exactly sure whether this will help you, but I thought of sharing it in case it might.

Process Memory Vs Heap -- JVM

We have a web application deployed on a Tomcat server. There are certain scheduled jobs that we run, after which the heap memory peaks and then settles down; everything seems fine.
However, the system admin is complaining that the memory usage ('top' on Linux) keeps increasing the more scheduled jobs are run.
What's the correlation between the heap memory and the process memory reported by the OS? Can it be controlled by any JVM settings? I used JConsole to monitor the system.
I forced garbage collection through JConsole and the heap usage came down; however, the memory usage on Linux remained high and never decreased.
Any ideas or suggestions would be of great help.
The memory allocated by the JVM process is not the same as the heap size. The used heap size can go down without an actual reduction in the space allocated by the JVM. The JVM has to receive a trigger indicating it should shrink the heap. As @Xepoch mentions, this is controlled by -XX:MaxHeapFreeRatio.
However, the system admin is complaining that the memory usage ('top' on Linux) keeps increasing the more scheduled jobs are run.
That's because you very likely have some sort of memory leak. System admins tend to complain when they see processes slowly chew up more and more space.
Any ideas or suggestions would be of great help.
Have you looked at the number of threads? Is your application creating its own threads and sending them off to deadlock and wait idly forever?
Are you integrating with any third party APIs which may be using JNI?
What is likely being observed is the virtual size, not the resident set size, of the Java process(es). If you have a goal of a small footprint, you may want to omit -Xms or any minimum size from the JVM heap arguments and adjust the default 70% -XX:MaxHeapFreeRatio= to a smaller number to allow more aggressive heap shrinkage.
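A sketch of what that might look like for Tomcat (the values are illustrative and need to be tuned against your workload):
export JAVA_OPTS="-Xmx512m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40"
Omitting -Xms lets the heap start small, and lowering -XX:MaxHeapFreeRatio from its 70% default (together with -XX:MinHeapFreeRatio) tells the JVM to shrink the heap sooner after a GC.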
In the meantime, please provide more detail about what exactly was observed when you say the Linux memory never decreased. Which metric?
You can use -Xmx and -Xms settings to adjust the size of the heap. With tomcat you can set an environment variable before starting:
export JAVA_OPTS="-Xms256m -Xmx512m"
This initially creates a heap of 256MB, with a max size of 512MB.
Some more details:
http://confluence.atlassian.com/display/CONF25/Fix+'Out+of+Memory'+errors+by+increasing+available+memory

Java memory usage on Linux

I'm running a handful of Java application servers that are all running the latest versions of Tomcat 6 and Sun's Java 6 on top of CentOS 5.5 Linux. Each server runs multiple instances of Tomcat.
I'm setting the -Xmx450m -XX:MaxPermSize=192m parameters to control how large the heap and permgen will grow. These settings apply to all the Tomcat instances across all of the Java Application servers, totaling about 70 Tomcat instances.
Here is a typical memory usage of one of those Tomcat instances as reported by Psi-probe
Eden = 13M
Survivor = 1.5M
Perm Gen = 122M
Code Cache = 19M
Old Gen = 390M
Total = 537M
CentOS however is reporting RAM usage for this particular process at 707M (according to RSS) which leaves 170M of RAM unaccounted for.
I am aware that the JVM itself and some of its dependency libraries must be loaded into memory, so I decided to fire up pmap -d to find out their memory footprint.
According to my calculations that accounts for about 17M.
Next there is the Java thread stack, which is 320 KB per thread on the 32-bit JVM for Linux.
Again, I use Psi-probe to count the number of threads on that particular JVM, and the total is 129 threads. So 129 × 320 KB ≈ 42 MB.
I've read that NIO uses memory outside of the heap, but we don't use NIO in our applications.
So here I've calculated everything that comes to (my) mind. And I've only accounted for 60M of the "missing" 170M.
What am I missing?
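One way to hunt for the remaining resident memory (a sketch; the PID is a placeholder) is to sort the process mappings by their resident size:
pmap -x <pid> | sort -k3 -n | tail -20
Large anonymous regions beyond the heap typically belong to thread stacks, the code cache, or native allocations, and memory-mapped files show up here as well.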
Try using the incremental garbage collector, via the -Xincgc command line option.
It's a little more aggressive in its overall GC effort, and it has a special happy little anomaly: it actually hands back some of its unused memory to the OS, unlike the default and other GC choices!
This makes the JVM consume a lot less memory, which is especially good if you're running multiple JVMs on one machine, at the expense of some performance (though you might not notice it). -Xincgc seems to be a little secret, because no one ever brings it up... it's been there for ages (since the 90s, even).
Arnar, during JVM initialization the JVM reserves memory (via mmap or malloc) of the size specified by -Xmx and MaxPermSize, so the JVM will reserve 450 + 192 = 642 MB of heap plus permgen space at the start of the process. So the Java memory ceiling for the application is not 537 MB but 642 MB. If you redo the calculation with that number, it accounts for most of your missing memory. Hope it helps.
Java allocates as much virtual memory as it might need up front; however, the resident size reflects how much you actually use. Note: many of the libraries and threads have their own overheads, and even though you don't use direct memory yourself, that doesn't mean none of the underlying libraries do. E.g. if you use NIO, it will use some direct memory even if you use heap ByteBuffers.
Lastly, 100 MB is worth about £8. It may be that it's not worth spending too much time worrying about it.
Not a direct answer, but, have you also considered hosting multiple sites within the same Tomcat instance? This could save you some memory at the expense of some additional configuration.
Arnar, the JVM also mmaps all jar files in use, which goes through NIO and will contribute to the RSS. I don't believe those are accounted for in any of your measurements above. Do you by chance have a significant number of large jar files? If so, the pages used for those could be your missing memory.
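A quick way to check (a sketch; the PID is a placeholder) is to list the mapped jar files together with their resident pages:
pmap -x <pid> | grep '\.jar'
The RSS column shows how much of each mapped jar is actually resident in memory.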

What are your JVM 6 memory settings for JBoss AS 5?

I'm using an ICEfaces application that runs on JBoss; my current heap size is set to
-Xms1024m -Xmx1024m -XX:MaxPermSize=256m
What is your recommendation for adjusting the memory parameters for JBoss AS 5 (5.0.1 GA) on JVM 6?
According to this article:
AS 5 is known to be greedy when it comes to PermGen. When starting, it often throws OutOfMemoryError: PermGen space.
This can be particularly annoying during development, when you are frequently hot-deploying an application. In this case, JBoss QA recommends raising the permgen size and allowing class unloading and permgen sweeping:
-XX:PermSize=512m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
But this is more FYI; I'm not suggesting you apply this configuration blindly (as people wrote in the comments, "if it ain't broken, don't fix it").
Regarding your heap size, always keep in mind: the bigger the heap, the longer the major GC. Now, when you say "it was definitely too small", I don't really know what this means (what errors, symptoms, etc). To my knowledge, a 1024m heap is actually pretty big for a webapp and should really be more than enough for most of them. Just beware of the major GC duration.
Heap: start with 512 MB and set the cap at a level you believe your app should never reach, and that will not make your server start swapping.
Permgen: that's usually stable enough once the app has loaded all the classes it uses. If you have tested the app and it works with 256 MB, then leave it at that.
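As a sketch, those recommendations translate to something like the following in the JBoss start script or run.conf (the values are the ones suggested above, not a universal recommendation):
JAVA_OPTS="-Xms512m -Xmx1024m -XX:MaxPermSize=256m"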
@wds: It's definitely not a good idea to set the heap maximum as high as possible, for two reasons:
Large heaps make full GC take longer. If you have PermGen scanning enabled, a large PermGen space will take longer to GC as well.
JBoss AS on Linux can leave unused I/O handles open long enough that Linux cleans them up forcibly, blocking all processes on the machine until it is complete (this might take over a minute!). If you forget to turn off the hot-deploy scanner, this will happen much more frequently.
This would happen maybe once a week in my application until I:
decreased -Xms to a point where JBoss AS startup was beginning to slow down
decreased -Xmx to a point where full GCs happened more frequently, so the Linux I/O handle clean up stopped
For developers I think it's fine to increase PermGen, but in production you probably want to use only what is necessary to avoid long GC pauses.
