We are running a set of Java applications in Docker containers on OpenShift. On a regular basis we experience OOM kills of our containers.
To analyse this issue we set up Instana and Grafana for monitoring. In Grafana we have graphs for each of our containers showing memory metrics, e.g. JVM heap, memory.usage and memory.total_rss. From these graphs we know that both the heap and the memory.total_rss of our containers are pretty stable at a certain level over a week, so we assume that we do not have a memory leak in our Java applications. However, memory.total is constantly increasing over time, and after a couple of days it goes beyond the configured memory limit of the Docker container. As far as we can see this doesn't cause OpenShift to kill the container immediately, but sooner or later it happens.
We increased the memory limit of all our containers and this seems to help, since OpenShift is not killing our containers that often anymore. However, we still see in Grafana that memory.total significantly exceeds the configured memory limit of our containers after a couple of days (the RSS memory is fine).
To better understand OpenShift's OOM killer: does anybody know which memory metric OpenShift takes into account to decide whether a container has to be killed? Is the configured container memory limit related to memory.usage, to memory.total_rss, or to something completely different?
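For reference, a hedged sketch of how to read the raw cgroup counters from inside such a container (the paths assume cgroup v1; on cgroup v2 the equivalent files are memory.current and memory.max). The kernel's per-cgroup OOM killer fires when the memory charged to the cgroup (RSS plus page cache plus kernel memory, after reclaim) cannot be kept below memory.limit_in_bytes, which is derived from the configured container limit:
cat /sys/fs/cgroup/memory/memory.limit_in_bytes   # the limit the kernel enforces
cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # everything currently charged to the cgroup
grep -E 'total_rss|total_cache' /sys/fs/cgroup/memory/memory.stat   # breakdown into RSS and page cache
cat /sys/fs/cgroup/memory/memory.failcnt          # how often the limit has been hit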
Thanks in advance for your help.
Can mounting the JRE directory from the host system reduce RAM usage by sharing heap space? Or will this cause problems?
I have a lot of containers running a Java service inside. The problem is that sometimes, when the services are under very heavy workload, they (eventually) need a lot of heap space. When I assign each container (for example) -Xmx2g, then I pretty quickly run out of RAM on my host system. Unfortunately, once Java has allocated heap, it is not freed again (neither for the container's RAM nor the host's RAM). Restarting the container frees the memory that was allocated for the heap during the peak, but for a container with Solr inside it will (probably) take several hours to index all the data again, which makes downtime possible only on the weekend.
The idea is to use a common JRE on the host system to share the heap space between the individual services. I could probably assign -Xmx the following value (only an example): 250m times the number of services, plus 3g for the workload peaks. This way I would use much less memory, because the services would share the heap space.
Is there a flaw in my idea, or could it really be worth doing?
Maybe someone has already faced such a problem and solved it in another way?
I don't think it is a good idea to share memory between containers. Docker is designed to isolate different environments and reduce the effects containers have on each other. So running each service with its own JVM is the intended way to use Docker and other container runtimes.
Also, if you shared memory, it would be hard to migrate the container.
I already found a solution here (shrinking the Java heap space): https://stackoverflow.com/a/4952645/2893873
I assumed that shrinking the Java heap space was not possible, but it is. I think it is a better solution than sharing the JVM between containers.
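For reference, a minimal sketch of the kind of options the linked answer is based on. The flag names are standard HotSpot options, but the values are purely illustrative and would need tuning per service; whether the uncommitted pages are actually returned to the OS also depends on the garbage collector in use:
export JAVA_OPTS='-Xms256m -Xmx2g -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20'
# With a low MaxHeapFreeRatio the JVM shrinks the committed heap back towards -Xms
# after a workload peak, instead of holding on to the peak allocation forever.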
Since moving to Tomcat 8/Java 8, every now and then the Tomcat server is OOM-killed (OOM = out-of-memory kill by the Linux kernel).
How can I prevent the Tomcat server be OOM-killed?
Can this be the result of a memory leak? I guess I would get a normal Out-of-memory message, but no OOM-kill. Correct?
Should I change settings in the HEAP size?
Should I change settings for the MetaSpace size?
Knowing which Tomcat process was killed, how can I retrieve information that would let me reconfigure the Tomcat server?
First, check that the OOM kill isn't being triggered by another process on the system, and that the server isn't overloaded with other processes. It could be that Tomcat is being unfairly targeted by the OOM killer while some other greedy process is the real culprit.
The maximum heap size (-Xmx) should be set smaller than the physical RAM on the server. If it is larger, paging will cause desperately poor performance during garbage collection.
If it's caused by the metaspace growing in an unbounded fashion, then you need to find out why that is happening. Simply setting a maximum metaspace size will cause an OutOfMemoryError once you reach the limit you've set, and raising the limit is pointless, because eventually you'll hit any higher limit you set.
Run your application and, before it crashes (not easy of course, you'll need to judge it), capture diagnostics from the Tomcat process: kill -3 prints a thread dump plus a heap summary to stdout, and a heap dump (via jmap or jcmd) lets you analyse why metaspace is growing so big. It's usually caused by dynamically loading classes. Is this something your application is doing? More likely, it's some framework doing this. (NB: the OOM killer will kill -9 the Tomcat process, and you won't be able to run diagnostics after that, so you need to let the app run and intervene before this happens.)
Also check out this question - there's an intriguing answer which claims that an obscure fix to an XML binding setting cleared the problem (highly questionable, but may be worth a try): java8 "java.lang.OutOfMemoryError: Metaspace"
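A hedged sketch of how that pre-crash diagnosis could be captured (the PID lookup and file paths are placeholders; jstat and jcmd ship with the JDK):
PID=$(pgrep -f org.apache.catalina.startup.Bootstrap)   # placeholder way to find the Tomcat PID
jstat -gcmetacapacity $PID 5000                          # watch metaspace capacity every 5 seconds
jcmd $PID GC.class_histogram > histogram.txt             # which classes are loaded, and how many instances
jcmd $PID GC.heap_dump /tmp/tomcat-heap.hprof            # full heap dump for offline analysis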
Another very good solution is transforming your application into a Spring Boot JAR (Docker) application. Normally such an application has a lot less memory consumption.
So, the steps to get huge improvements (if you can move to a Spring Boot application):
Migrate to Spring Boot application. In my case, this took 3 simple actions.
Use a light-weight base image. See below.
VERY IMPORTANT - use the Java memory balancing options. See the last line of the Dockerfile below. This reduced my running container's RAM usage from over 650MB to ONLY 240MB, running smoothly. So, SAVING over 400MB of the 650MB!!
This is my Dockerfile:
FROM openjdk:8-jdk-alpine
ENV JAVA_APP_JAR your.jar
ENV AB_OFF true
EXPOSE 8080
ADD target/$JAVA_APP_JAR /deployments/
CMD ["java","-XX:+UnlockExperimentalVMOptions", "-XX:+UseCGroupMemoryLimitForHeap", "-jar","/deployments/your.jar"]
We have an application deployed on Kubernetes. One of the services is an app running in the JVM.
Our application is faulty; it consumes too much memory. We are hitting the limit set in the Replication Controller, which makes it restart the pod.
Is it a good idea to use the replication controller for this? Or is it a better idea to limit the memory on the JVM (set it to something below the replication controller limit) and use something else inside the pod to restart our application?
If the JVM stopped with an OutOfMemoryError, I could use the memory dumps written by the JVM. Right now I'm blind as to what occupied the memory.
Thank you for your reply!
In fact, this is not the responsibility of the Deployment / ReplicationController. The restarts are handled within the Pod itself. As for handling memory, this is not a trivial issue and you should control it on both levels (things like heap sizes etc.) so that your app avoids hitting the limit, but if it does go AWOL, the limits on the pod will handle it, and that is perfectly OK (especially if you run more than one pod, so you have HA).
The thing here is that it is a bit tricky to tune memory limits well, and it can probably best be done with trial and error plus monitoring of pod/container metrics (Prometheus/Grafana to the rescue :) ).
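One hedged option to avoid being blind about what occupied the memory (the flags are standard HotSpot options; MaxRAMPercentage needs JDK 10+ or 8u191+, ExitOnOutOfMemoryError needs 8u92+; the dump path and percentage are placeholders) is to let the JVM fail first and leave a dump behind, instead of waiting for the pod-level kill:
export JAVA_OPTS='-XX:MaxRAMPercentage=75.0 -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps'
# /dumps would need to be an emptyDir or persistent volume mounted into the pod,
# otherwise the .hprof file disappears together with the restarted container.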
Let's say I have a very large Java application that's deployed on Tomcat. Over the course of a few weeks, the server will run out of memory, application performance is degraded, and the server needs a restart.
Obviously the application has some memory leaks that need to be fixed.
My question is.. If the application were deployed to a different server, would there be any change in memory utilization?
Certainly the services offered by the application server might vary in their memory utilization, and if the server ships with its own VM -- i.e., if you're using J9 or JRockit with one server and Oracle's JVM with another -- there are bound to be differences. One area that does matter is class loading: some app servers behave better than others during administrative operations. Warm-starting the application after a configuration change can result in serious memory leaks due to class-loading problems on some server/VM combinations.
But none of these are really going to help you with an application that leaks. It's the program using the memory, not the server, so changing the server isn't going to affect much of anything.
There will probably be a slight difference in memory utilisation, but only in as much as the footprint differs between servlet containers. There is also a slight chance that you've encountered a memory leak with the container - but this is doubtful.
The most likely issue is that your application has a memory leak - in any case, the cause is more important than a quick fix - what would you do if the 'new' container just happens to last an extra week etc? Moving the problem rarely solves it...
You need to start analysing the application's heap memory to locate the source of the problem. If your application is crashing with an OOME, you can add this to the JVM arguments:
-XX:+HeapDumpOnOutOfMemoryError
If the performance just degrades until you restart the container manually, you should get into the routine of triggering periodic heap dumps (see the sketch after the links below). A timeline of dumps is often the most helpful, as you can see which object populations just grow over time.
To do this, you'll need a heap analysis tool:
JHat or IBM Heap Analyser or whatever your preference :)
Also see this question:
Recommendations for a heap analysis tool for Java?
Update:
And this may help (for obvious reasons):
How do I analyze a .hprof file?
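A hedged sketch of such a periodic-dump routine (the PID and path are placeholders; jmap ships with the JDK):
PID=12345   # placeholder: the Tomcat process id
jmap -dump:live,format=b,file=/tmp/heap-$(date +%Y%m%d-%H%M).hprof $PID
# Run this e.g. daily from cron; comparing the dumps in a heap analyser shows
# which object populations only ever grow.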
Tomcat 5.5.x and 6.0.x
Grails 1.6.x
Java 1.6.x
OS CentOS 5.x (64bit)
VPS server with 384M of memory
JAVA_OPTS: tried many combinations, including the following
export JAVA_OPTS='-Xms128M -Xmx512M -XX:MaxPermSize=1024m'
export JAVA_OPTS='-server -Xms128M -Xmx128M -XX:MaxPermSize=256M'
(As advised by http://www.grails.org/Deployment)
I have created a blank Grails application, i.e. simply by running grails create-app, and then WARed it
I am running Tomcat on a VPS Server
When I simply start the Tomcat server, with no apps deployed, the free memory is about 236M and the used memory is about 156M
When I deploy my "blank" application, the memory consumption spikes to 360M, and finally the Tomcat instance is killed as soon as it uses up all the free memory
As you have seen, my app is as light as it can be.
Not sure why the memory consumption is as high as it is.
I am actually troubleshooting a real application, but have narrowed down to this scenario which is easier to share and explain.
UPDATE
I tested the same "blank" application on my local Tomcat 5.5.x on Windows and it worked fine
The memory consumption of the Java process shot from 32M to 107M, but it did not crash and remained within acceptable limits
So the hunt for answer continues... I wonder if something is wrong about my Linux box. Not sure what though...
UPDATE 2
Also see this http://www.grails.org/Grails+Test+On+Virtual+Server
It confirms my belief that my simple-blank app should work on my configuration.
It is a false economy to try to run a long-running Java application in the minimum possible amount of memory. The garbage collector, and hence the application, will run much more efficiently if it has plenty of regular heap memory. Give an application too little heap and it will spend too much time garbage collecting.
(This may seem a bit counter-intuitive, but trust me: the effect is predictable in theory and observable in practice.)
EDIT
In practical terms, I'd suggest the following approach:
Start by running Tomcat + Grails with as much memory as you can possibly give it so that you have something that runs. (Set the permgen size to the default ... unless you have clear evidence that Tomcat + Grails are exhausting permgen.)
Run the app for a bit to get it to a steady state and figure out what its average working set is. You should be able to figure that out from a memory profiler, or by examining the GC logging.
Then set the Java heap size to be (say) twice the measured working set size or more. (This is the point I was trying to make above.)
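As a purely illustrative, hedged example (the numbers are placeholders and would have to come out of the measurement step above; -XX:MaxPermSize applies to the Java 6 JVM used here):
export JAVA_OPTS='-server -Xms128m -Xmx128m -XX:MaxPermSize=96m -verbose:gc'
# e.g. a measured steady-state working set of ~64m doubled to a 128m heap, plus a
# modest permgen for Grails; -verbose:gc keeps the GC behaviour visible so the
# numbers can be revisited.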
Actually, there is another possible cause of your problems. Even though you are telling Java to use a heap of a given size, it may be that it is unable to do so. When the JVM requests memory from the OS, there are a couple of situations in which the OS will refuse.
If the machine (real or virtual) that you are running the OS on does not have any more unallocated "real" memory, and the OS's swap space is fully allocated, it will have to refuse requests for more memory.
It is also possible (though unlikely) that per-process memory limits are in force. That would cause the OS to refuse requests beyond that limit.
Finally, note that Java uses more virtual memory than can be accounted for by simply adding the stack, heap and permgen numbers together. There is the memory used by the executable and shared libraries, memory used for I/O buffers, and possibly other things.
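A hedged sketch of how those OS-side constraints could be checked on the CentOS box (standard Linux commands; the PID is a placeholder):
free -m                                # is swap already fully allocated?
ulimit -a                              # any per-process limits (max memory size, virtual memory)?
grep -i vm /proc/<tomcat-pid>/status   # VmSize vs VmRSS: total virtual vs resident memory of the JVM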
384MB is pretty small. I'm running a small Grails app in a 512MB VPS at enjoyvps.net (not affiliated in any way, just a happy customer) and it's been running for months at just under 200MB. I'm running a 32-bit Linux and JDK, though - no sense wasting all that memory on 64-bit pointers if you don't have access to much memory anyway.
Can you try deploying a Tomcat monitoring webapp, e.g. psiprobe, and see where the memory is being used?