Over the last few days I have seen some behaviour that breaks a few of my assumptions. Maybe someone can give me an explanation of the following.
The facts
We are running Java applications in Docker containers on K8s in GCP
Our base image is openjdk:8u171-jre-slim-stretch
We enabled the flags -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap => so resource limits should be obtained from the container and not from the host => IMO that works
We set -XX:MaxRAMFraction=2, so we get about half of the container's memory limit for the heap => IMO that works (see the example command after this list)
There is no -Xmx setting for the JVM
The only way to influence the JVM's memory is via Kubernetes resource limits => which is exactly what we want
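For reference, a minimal sketch of what the container entrypoint then looks like under these assumptions (app.jar is a placeholder), plus one way to verify what heap size the JVM actually derives inside the container:

java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -jar app.jar
# verify the computed heap limit from inside the container
java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -XX:+PrintFlagsFinal -version | grep -i maxheapsize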
The thing I do not get
When GC kicks in (Scavenge or G1) there is an increase in container memory usage, but not in the JVM memory usage. Why does the GC consume memory from the container? OK, it copies stuff, e.g. from the young gen to the old gen, so I understand that for a short time more memory is used. But I had expected that this would be taken from the JVM process and not from some kernel process.
While setting memory and CPU limits in one of our K8s clusters running Java pods, we faced an issue where pod memory usage was shown as much higher than expected in Prometheus (and some pods even crashed due to probe errors).
On further analysis, we figured out that the GC was working hard to free up memory because the limits were too small compared to the rate of object creation, causing CPU cycles to be spent on GC rather than on actual work. We avoided that by increasing the pod memory limit.
So maybe you can increase the maximum memory limits provided to the pods and see if this goes away. I am not sure whether this will help you, but I thought I'd share it in case it might.
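If it helps, one way to raise the limit (a sketch; the deployment name and values are placeholders):

kubectl set resources deployment my-app --limits=memory=2Gi,cpu=1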
Related
The goal is to understand what should be tuned in order for the Java process to stop restarting itself.
We have a Java Spring Boot backend application with Hazelcast running that restarts instead of garbage collecting.
Environment is:
Amazon Corretto 17.0.3
The only memory tuning parameters supplied are:
-XX:+UseContainerSupport -XX:MaxRAMPercentage=80.0
The memory limit in Kubernetes is 2Gi, so the heap gets about 1.6Gi.
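A quick way to double-check the derived heap size, assuming you can run java inside the container:

java -XX:+UseContainerSupport -XX:MaxRAMPercentage=80.0 -XX:+PrintFlagsFinal -version | grep -i maxheapsize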
Graphs of memory usage:
The huge drop towards the end is where I performed a heap dump. Performing the dump led to a drastic decrease in memory usage (due to a full GC?).
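For what it's worth, if the dump was taken with jmap and the live option, that alone forces a full GC before the file is written, which could explain the drop (pid is a placeholder):

jmap -dump:live,format=b,file=heap.hprof <pid>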
The GC appears to be working against me here. If the memory dump is not performed, the container hits what appears to be a memory limit, is restarted by Kubernetes, and continues in this cycle. Are there tuning parameters that I have missed, or is this a clear memory leak (perhaps due to Hazelcast metrics, https://github.com/hazelcast/hazelcast/issues/16672)?
The JVM determines which garbage collector (GC) to use based on the amount of memory and CPU given to the application. By default, it will use the Serial GC if the RAM is under 2GB or there are fewer than 2 CPU cores. For a Kubernetes server application, the Serial GC is not a great choice: it runs in a single thread and it seems to wait until the heap is near the max limit to reclaim heap space. It also pauses the application a lot, which can lead to health check failures or scaling due to momentarily higher CPU usage.

What has worked best for us is to force the use of the G1 collector. It is a concurrent collector that runs side by side with your app and tries its best to minimize application pausing. I would suggest setting your CPU limit to at least 1 and setting your RAM limit to however much you think your application is going to use, plus a little overhead. To force the G1 collector, add the following option to your java command line: -XX:+UseG1GC.
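A short sketch of both steps (checking the default choice, then forcing G1; app.jar is a placeholder):

# check which collector the JVM picks by default inside the container
java -XX:+PrintFlagsFinal -version | grep -E 'UseSerialGC|UseParallelGC|UseG1GC'
# force G1
java -XX:+UseG1GC -jar app.jar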
I'm moving a bare-metal Java application (JAR, JDK 8) to Docker containers and DC/OS. I am noticing an odd pattern on the containers: we set -Xmx to 32 GB and allocate a 36 GB Docker container. Every few hours or so the application will spike in old-gen memory allocation and the GC will get stuck in a loop (maxing the CPU) while it tries to reclaim the heap.
Are there any optimizations or tools I can use to see why we are spiking so fast in that 1-5 second interval? Are there any gotchas I might need to be aware of with Docker and the JVM?
We are using default GC
Just for future reference:
We are using JDK 8, and it seems Oracle has just recently added some experimental flags for running in Docker. I believe the issue could have been that when the GC was allocating threads it wasn't respecting the Docker CPU count from the cgroup, and so sized its thread pools too large. The experimental flags seem to have fixed our "off the rails" issue.
https://blogs.oracle.com/java-platform-group/java-se-support-for-docker-cpu-and-memory-limits
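As a hedged example, these are the experimental cgroup flags referenced above; -XX:ParallelGCThreads is one way to pin the GC thread count explicitly (an assumption on my part, not necessarily what the blog prescribes; the value 4 and app.jar are placeholders):

java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:ParallelGCThreads=4 -jar app.jar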
If you have the possibility to use a container platform like DC/OS, you usually want to avoid gigantic applications with > 30GB of memory and instead split your application into smaller parts with lower memory requirements.
In general, about GC and heap size: if you have big heap sizes, a full GC can take a long time. Personally, I have experienced full-GC freezes of up to a minute or more with a heap size quite similar to your 30GB.
About Java in containers: the JVM actually needs more memory than you configure with -Xmx. So if you specify a memory limit of 2GB for your DC/OS (Marathon) application, you cannot set -Xmx2G, because this memory restriction is a hard limit. If the process inside the container exceeds the memory limit, the container will be killed. Because the JVM temporarily reserves more memory than configured in -Xmx, this is quite likely to happen. In general, I would suggest using around 75% of your configured memory as the value for -Xmx.
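For example, with a 2GB container limit the 75% rule would give roughly -Xmx1536m (a sketch; the image and jar names are placeholders):

docker run -m 2g my-image java -Xmx1536m -jar app.jar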
You could also have a look at newer JRE versions, which support -XX:+UseCGroupMemoryLimitForHeap. This is a JRE flag to apply the cgroup container limits to memory consumption; see https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/ for more information.
I have a Java application running on a Tomcat7 instance. I am using Java 8.
Now within my application there is a webservice of the format: http://zzz.xxx.zzz.xxx/someWebservice?somepar1=rrr&somePar2=yyy. It returns a String value of at most 10 characters.
I have now started load testing this service using JMeter. I am putting a load of 100 concurrent connections and getting a throughput of roughly 150 requests/second. The server is a 4-core, 6GB machine and is only running Tomcat (the application instance). The database instance is running on a separate machine. The JVM is running with a 2GB minimum and 4GB maximum heap allocation; max perm size is 512 MB. Tomcat has enough threads to cater to my load (max connections/threads/executor values have been set up correctly).
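For reference, a sketch of how those settings would typically be supplied to Tomcat, assuming CATALINA_OPTS is used (on Java 8, PermGen was replaced by Metaspace, so the perm-size setting would map to -XX:MaxMetaspaceSize rather than -XX:MaxPermSize):

export CATALINA_OPTS="-Xms2g -Xmx4g -XX:MaxMetaspaceSize=512m"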
I am now trying to optimize this service, and in order to do so I am analyzing memory consumption using JConsole. My CPU usage is not a concern, but when I look at the memory (heap) usage, I feel something is not right. What I observe is a sawtooth-shaped graph, which I know is expected, as regular GC clears heap memory.
My concern is that this sawtooth-shaped graph has an upward trend: the troughs in the sawtooth seem to be rising over time. With this trend, my server eventually reaches max heap memory in an hour or so and then stabilizes at 4GB. I believed that if I put a CONSTANT load on the server, then heap memory utilization should have a constant trend, i.e. a sawtooth graph with its peaks and troughs aligned. Given the upward trend, I suspect a memory leak: objects that accumulate over time and, since GC isn't able to clear them, drive usage up over time. I am attaching a screenshot.
Heap Usage
Questions:
1) Is this normal behavior? If yes, then why does the heap continuously increase despite no change in load? I don't believe that a load of 100 threads should saturate a 4GB heap in roughly 30 minutes.
2) What could be the potential reasons here? Do I need to look at memory leaks? Is there any JVM analyzer apart from JConsole which can help me pinpoint the objects the GC is unable to clear?
The see-saw pattern most likely stems from minor collections; the dip around 14:30 is then a major collection, which you did not take into account in your reasoning.
Your load may simply be so low that it needs a long time to reach a stable state.
With this trend eventually my server reaches max heap memory in an hour or so and then stabilizes at 4GB.
That supports the conclusion above, as long as you're not seeing any OOMEs.
But there's only so much one can deduce from such charts. If you want to know more, you should enable GC logging and inspect the log output instead.
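On JDK 8 that means passing flags like the following to the Tomcat JVM (for example via CATALINA_OPTS; the log file name is arbitrary):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log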
After running for a few days, the CPU load of my JVM is about 100%, with about 10% of it spent in GC (screenshot).
Memory consumption is near the max (about 6 GB).
Tomcat is extremely slow in that state.
Since it's too much for a comment, I'll write it up as an answer:
Looking at your charts, the JVM seems to be using CPU for non-GC tasks; peak "GC activity" stays within 10%.
So on first impression it would seem that your workload is simply CPU-bound. If that's unexpected, you should do some CPU profiling on your Java application to see if something pops out.
Apart from that, based on the comments, I suspect that physical memory filling up might evict file caches and memory-mapped data, leading to increased page faults which force the CPU to wait for IO.
Freeing up 500MB on a manual GC out of a 4GB heap does not seem all that much. Most GCs try to keep pause times low as their primary goal, keep the total time spent in GC within some bound as a secondary goal, and only when the other goals are met do they try to reduce memory footprint as a tertiary goal.
Before recommending further steps, you should gather more statistics / provide more information, since it's hard to even discern what your actual problem is from your description.
monitor page faults
figure out which GC algorithm is used in your setup and how it's tuned (-XX:+PrintFlagsFinal); see the example commands after this list
log GC activity - I suspect it's pretty busy with minor GCs and thus exceeding its pause time or CPU load goals
perform allocation profiling of your application (anything creating excessive garbage?)
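A sketch of commands for the first two points above (pid is a placeholder):

# minor/major page faults and memory of the JVM process
ps -o min_flt,maj_flt,rss,vsz -p <pid>
# which collector and heap sizing flags are actually in effect
java -XX:+PrintFlagsFinal -version | grep -iE 'UseG1GC|UseParallelGC|UseConcMarkSweepGC|MaxHeapSize|NewSize'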
You also have to be careful to distinguish problems caused by the Java heap reaching its sizing limit from problems caused by the OS exhausting its physical memory.
TL;DR: Unclear problem, more information required.
Or, if you're lazy or can afford it, just plug in more RAM / remove other services from the machine and see if the problem goes away.
I learned to check the following when facing GC problems:
Give the JVM enough memory, e.g. -Xmx2G.
If memory is not sufficient and no more RAM is available on the host, analyze a heap dump (e.g. with jvisualvm).
Turn on Concurrent Mark and Sweep:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Check the garbage collection log: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log
My Solution:
But I finally solved the problem by tuning the cache sizes.
The cache sizes were too big, so memory got scarce.
If you want to keep the memory of your server free, you can simply try the VM parameter
-Xmx2G //or any different value
This ensures your program never takes more than 2 gigabytes of RAM. But be aware that in case of high workload the server may get an OutOfMemoryError.
Since an old-generation (full) GC may block your whole server for some seconds, Java will try to avoid a full garbage collection.
The RAM limitation may trigger a full GC more easily (or even cause more objects to be collected by the young-generation GC).
In my opinion (more guessing than actually knowing): I don't think another algorithm will help much here.
We have a web application deployed on a Tomcat server. There are certain scheduled jobs which we run, after which the heap memory peaks and settles down; everything seems fine.
However, the system admin is complaining that memory usage ('top' on Linux) keeps increasing the more scheduled jobs are run.
What's the correlation between the heap memory and the process memory reported by the OS? Can it be controlled by any JVM settings? I used JConsole to monitor the system.
I forced garbage collection through JConsole and the heap usage came down; however, the memory usage on Linux remained high and never decreased.
Any ideas or suggestions would be of great help.
The memory allocated by the JVM process is not the same as the heap size. The used heap size can go down without an actual reduction in the space allocated by the JVM. The JVM has to receive a trigger indicating it should shrink the heap. As @Xepoch mentions, this is controlled by -XX:MaxHeapFreeRatio.
However, the system admin is complaining that memory usage ('top' on Linux) keeps increasing the more scheduled jobs are run.
That's because you very likely have some sort of memory leak. System admins tend to complain when they see processes slowly chew up more and more space.
Any ideas or suggestions would be of great help.
Have you looked at the number of threads? Is your application creating its own threads and sending them off to deadlock and wait idly forever?
Are you integrating with any third party APIs which may be using JNI?
What is likely being observed is the virtual size, not the resident set size, of the Java process(es). If you have a goal of a small footprint, you may want to omit -Xms (or any minimum heap size argument) and adjust the default 70% -XX:MaxHeapFreeRatio= to a smaller number to allow more aggressive heap shrinkage.
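A hedged sketch of what that could look like for a Tomcat JVM (the ratio values are only illustrative; the exact shrinking behaviour depends on the collector in use):

# note: no -Xms / minimum heap size, so the heap is allowed to shrink back
export JAVA_OPTS="-XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30"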
In the meantime, please provide more detail about what was observed behind the comment that the Linux memory never decreased. Which metric were you looking at?
You can use the -Xmx and -Xms settings to adjust the size of the heap. With Tomcat you can set an environment variable before starting it:
export JAVA_OPTS="-Xms256m -Xmx512m"
This initially creates a heap of 256MB, with a max size of 512MB.
Some more details:
http://confluence.atlassian.com/display/CONF25/Fix+'Out+of+Memory'+errors+by+increasing+available+memory