Spring Boot app in Docker container consumes high CPU without being used - java

I have a very strange issue. I have a Java web app (Spring Boot 1.5) which runs inside a Docker container.
At some point the app starts consuming the CPU quite heavily.
So I was thinking that the app itself has a bug of some sort.
BUT
If I remove the app from the load balancer, so that it no longer accepts any connections, the app continues to consume a lot of CPU, even though it is not accessed at all.
I continue to see a lot of GC log entries from the app in the log file.
It seems that the JVM keeps running GC on the young generation every 300ms, even though the app should be completely idle (and it is idle, as there is nothing in the log file)!
The app itself is just a website using Spring Boot. Nothing really special there (no scheduled tasks or anything like that).
Any idea what might be going on here? Can it be Docker related?
Thanks in advance

OK,
it turns out this had nothing to do with Docker. It was a bug in the app, creating many (unnecessary) short-lived objects which needed GC.
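For anyone chasing something similar: the pattern behind this kind of bug is typically a background loop that keeps allocating temporary objects even when no requests arrive. A minimal, purely hypothetical sketch (all names invented for illustration):
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a polling loop that churns through temporary objects,
// keeping the young generation busy even though the application is "idle".
public class IdleAllocationDemo {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            List<String> scratch = new ArrayList<>();
            for (int i = 0; i < 100_000; i++) {
                scratch.add("status-" + i); // thousands of short-lived strings per pass
            }
            Thread.sleep(10); // and the loop starts over a moment later, forever
        }
    }
}
Something like this reproduces the symptom above almost exactly: young-generation collections every few hundred milliseconds with zero traffic.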

Related

How to prevent a Spring Boot / Tomcat (Java8) process from being OOM-killed?

Since moving to Tomcat8/Java8, now and then the Tomcat server is OOM-killed. OOM = Out-of-memory kill by the Linux kernel.
How can I prevent the Tomcat server from being OOM-killed?
Can this be the result of a memory leak? I guess I would get a normal Out-of-memory message, but no OOM-kill. Correct?
Should I change settings in the HEAP size?
Should I change settings for the MetaSpace size?
Knowing which Tomcat process is killed, how can I retrieve info so that I can reconfigure the Tomcat server?
First, check that the OOM kill isn't being triggered by another process on the system, and that the server isn't overloaded with other processes. It could be that Tomcat is being unfairly targeted by the OOM killer when some other greedy process is the culprit.
The heap's maximum size (-Xmx) should be set smaller than the physical RAM on the server. If it is more than this, then paging will cause desperately poor performance when garbage collecting.
If it's caused by the metaspace growing in an unbounded fashion, then you need to find out why that is happening. Simply setting a maximum size for metaspace will cause an OutOfMemoryError once you reach the limit you've set, and raising the limit is pointless, because eventually you'll hit any higher limit you set.
Run your application and, before it crashes (not easy, of course, but you'll need to judge it), kill -3 the Tomcat process. Then analyse the heap and try to find out why metaspace is growing so big. It's usually caused by dynamically loading classes. Is this something your application is doing? More likely, it's some framework doing this. (N.B. the OOM killer will kill -9 the Tomcat process, and you won't be able to gather diagnostics after that, so you need to let the app run and intervene before this happens.)
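If you do suspect dynamic class loading, here is a hedged sketch of what such a leak looks like at its core; the class and variable names are invented, and real frameworks do this far less obviously (via proxies, generated serializers, scripting engines and the like):
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: every iteration defines a brand-new proxy class,
// and holding on to the loaders prevents those classes from ever being unloaded,
// so metaspace grows without bound while the heap stays small.
public class MetaspaceGrowthDemo {
    private static final List<ClassLoader> loaders = new ArrayList<>();

    public static void main(String[] args) {
        InvocationHandler handler = (proxy, method, methodArgs) -> null;
        while (true) {
            ClassLoader loader = new ClassLoader() { };
            Proxy.newProxyInstance(loader, new Class<?>[] { Runnable.class }, handler);
            loaders.add(loader);
        }
    }
}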
Also check out this question - there's an intriguing answer which claims an obscure fix to an XML binding setting cleared the problem (highly questionable, but may be worth a try): java8 "java.lang.OutOfMemoryError: Metaspace"
Another very good solution is transforming your application into a Spring Boot JAR (Docker) application. Normally such an application has a lot less memory consumption.
So, the steps to get huge improvements (if you can move to a Spring Boot application):
Migrate to a Spring Boot application. In my case, this took 3 simple actions.
Use a light-weight base image. See below.
VERY IMPORTANT - use the Java memory balancing options. See the last line of the Dockerfile below. This reduced my running container's RAM usage from over 650MB to ONLY 240MB. Running smoothly. So, SAVING over 400MB out of 650MB!!
This is my Dockerfile:
FROM openjdk:8-jdk-alpine
ENV JAVA_APP_JAR your.jar
ENV AB_OFF true
EXPOSE 8080
ADD target/$JAVA_APP_JAR /deployments/
CMD ["java","-XX:+UnlockExperimentalVMOptions", "-XX:+UseCGroupMemoryLimitForHeap", "-jar","/deployments/your.jar"]

Should Kubernetes restart my JVM app when going out of memory?

We have an application deployed on Kubernetes. One of the services is an app running in the JVM.
Our application is faulty; it consumes too much memory. We are hitting the limit set in the Replication Controller, which makes it restart the pod.
Is it a good idea to use the replication controller for this? Or is it a better idea to limit the memory on the JVM (set it to something below the replication controller limit) and use something else inside the pod to restart our application?
If the JVM stopped with an out-of-memory exception, I could use the heap dumps written by the JVM. Now I'm blind as to what occupied the memory.
Thank you for your reply!
In fact this is not a responsibility of the Deployment / ReplicationController. The restarts are handled within the Pod itself. As for handling memory, this is not a trivial issue and you should control it on both levels (things like heap sizes etc.) so that your app avoids hitting the limit; but if it does go off the rails, the limits on the pod will handle it, and that is perfectly OK (especially if you run more than one pod, so you have HA).
The thing here is that it is a bit tricky to tune memory limits well, and it can probably best be done with trial and error plus monitoring of pod/container metrics (Prometheus/Grafana to the rescue :) ).
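One hedged addition: if you set -Xmx below the pod limit as suggested, it is the JVM rather than the kernel that runs out of memory, and it can then write a dump for you before exiting. The dump path below is just a placeholder and should point at a mounted volume:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/oom.hprof -XX:+ExitOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError (available since 8u92) makes the JVM exit immediately on OutOfMemoryError, so the pod is restarted through the normal restart policy instead of limping along half-dead, and you are no longer blind as to what occupied the memory.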

Performance --- response time slows down

We have a Java EE application (JSP/servlet, JDBC) running on an Apache Tomcat server. The response time slows down over time, and it slows down at a faster rate when the application is used continuously.
The response time is back to normal after a restart of the server.
I connected JConsole to the server and am attaching a screenshot of the heap memory usage, which goes up during intensive work; the garbage collector kicks in periodically and the memory usage comes down.
However, when testing towards the end, the response time does not improve even after triggering the garbage collector manually.
I also checked the connections and they seem to be closing properly, i.e. I do not notice any leaked connections.
Any help is appreciated.
Attach with jvisualvm in the JDK. It allows you to profile Tomcat and find where the time goes.
My guess right now is the database connections. Either they go stale or the pool runs dry.
How much slower are the response times? Have you done any profiling or logging to help determine which parts of your app are slower? It might be useful to set up a simple servlet and see if it also slows down as the other one does. That might tell you whether Tomcat or something in your app is slowing down.
Did you fine-tune your Tomcat memory settings? Perhaps you need to increase the perm gen size a bit.
e.g. -XX:MaxPermSize=512M
You can know it for sure if you can get a heap dump and load it into a tool like MemoryAnalyzer.
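For reference, a heap dump for MemoryAnalyzer can usually be taken with the JDK's own jmap tool; the PID placeholder below stands for the Tomcat process id and the file path is arbitrary:
jmap -dump:live,format=b,file=/tmp/tomcat-heap.hprof <tomcat-pid>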

CPU Usage Spikes in WebSphere 6.1

First, just a bit of background:
One of our customers is experiencing CPU usage spikes for WebSphere instances running one of our web apps (other instances with other apps are fine). They have a test environment and a live environment (both iSeries) which both experience the problem - with a single app per instance setup. We have deployed this application locally in our own test environments and also for many other customers all on iSeries with no similar problems.
What's actually happening:
Every second or so, the WebSphere process's CPU usage jumps to anywhere from 7%-20%, even though there are no requests being processed at the time. The customer has reported seeing spikes as high as 30%. These spikes average out to 1.5% CPU overall; the other WebSphere instances typically use 0%-0.1% when idle.
My investigations so far
So, I had a look at the threads. One thread in their test environment was using ~350 CPU cycles per second. A similar thread in their live environment was using ~1500 CPU cycles per second (reflecting that it has a bigger CPU). The call stack for these threads looks like
Type Program Statement Procedure
QLESPI QSYS 17 LE_Create_Thread2__FP12crtt >
QJVALIBJVM QSYS 7 startThread__FPv
J com/ibm/ws/util/Threa > run
J com/ibm/ws/util/Threa > run
J com/ibm/ws/util/Threa > getTask
J com/ibm/ws/util/Bound > poll
The entire class name from the bottom line is com/ibm/ws/util/BoundedBuffer. I asked the customer to do a JVM Dump for me - the only additional information I got from this was the thread name:
Thread: 00002F82 Deferrable Alarm : 11
Now for my questions:
Can any of you identify the problem, given these symptoms? (Maybe that's a long shot!)
What is Deferrable Alarm? From the JVM dump, I can see 4 threads with this name. The other three seem to be doing just fine. By debugging my local WebSphere (on Windows) and adding breakpoints in the BoundedBuffer class, I can see that BoundedBuffers are polling and periodically invoking some listener.
I don't have access to the WebSphere console for the customer machines, and they aren't owning up to having made any config changes. I can ask them to check the console for me though - what should I be asking them to look at?
I have telnet access to the customer boxes, is there anything else I can investigate here? Looking at the WebSphere profile files, etc? Which files should I be looking at?
Because the Call Stack and JVM Dump don't explicitly reference our code, is it safe to assume that this is a configuration problem?
It's been a long question, so thanks for reading this far.
30 April Update (1)
This morning I've noticed that this behaviour only happens after the first request of the day has been processed (irrespective of which Web Service is invoked). This points the finger back at our application or Apache Axis. Could it be that this is just normal behaviour?!
30 April Update (2)
So it seems that this CPU activity is some kind of housekeeping activity for the web-container or maybe something within Apache Axis. I've now observed this happening on a few different web-applications on a few different servers. Applications with no web component don't suffer the same additional CPU overhead.
I'd imagine if it is housekeeping work, that "tuning" it somehow could be counter productive - by that, I mean that making the App Server idle better would probably negatively affect the amount of "real" work it can do.
You could try profiling and taking heap dumps of the application; that could answer a few questions related to memory and CPU usage.
I would recommend following the must-gather documentation provided by IBM and raising a PMR, alongside your own investigation. Things you might suspect:
Garbage collection (unlikely on low application utilization)
Timers or tasks (such as java.util.Timer or commonj work manager)
Pretest connection that has a complex SQL query (in the DataSource's WebSphere Application Server data source properties)
I would also recommend using a profiler to determine the cause; YourKit is a pretty decent one.
Very instinctively (being unfamiliar with iSeries platforms) I would look at disk IO related issues. Can you describe the disk subsystem? Can you see if your app is spending an unusually large amount of time in iowait?
I know this doesn't quite match your problem, but it might be worth a look if you're running a version prior to WAS 6.1 patch 17.
http://www-01.ibm.com/support/docview.wss?uid=swg24018437
Hope this helps. Cheers John
My best guess is that some type of monitoring is being done on the instance, like Tivoli etc. Have you ruled out any GC activity?
HTH Tom
Most application servers are implemented in Java itself, and so is WebSphere. Apart from serving client requests, these servers have to do other periodic jobs, such as resource pool management. Performing these jobs creates some temporary objects that need to be garbage collected.
Depending on how much heap you have allocated, its usage, and the garbage collector settings, the garbage collector will be invoked. I'd try to see whether it is the garbage collector thread that is taking up your CPU. To do this, connect the jconsole utility to the remote WebSphere process for a day and see if there is any correlation between heap usage and CPU usage.
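If the remote WebSphere JVM isn't already reachable from jconsole, the generic Sun/Oracle management properties below usually make it so when added to the server JVM's generic arguments. This is only a hedged suggestion: the port number is arbitrary, WebSphere also has its own administrative connectors, and on anything other than a locked-down test system you would want authentication and SSL enabled rather than disabled.
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false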
I am also experiencing this very same issue: [Deferrable Alarm : x] together with BoundedBuffer. The only difference I have is that this is on a Windows 7 64-bit machine. There is absolutely no Tivoli or other batch process running, and no requests being made; the single instance is just idle.
I can run the application in DEBUG mode and pause the Deferrable Alarm thread, and the CPU spikes stop; when I resume it, they start again.
I've checked disk activity and network activity, and there is nothing happening there.
I am running WebSphere 6.1.0.27 .

Java Application Server Performance

I've got a somewhat dated Java EE application running on Sun Application Server 8.1 (aka SJSAS, precursor to Glassfish). With 500+ simultaneous users the application becomes unacceptably slow and I'm trying to assist in identifying where most of the execution time is spent and what can be done to speed it up. So far, we've been experimenting and measuring with LoadRunner, the app server logs, Oracle statpack, snoop, adjusting the app server acceptor and session (worker) threads, adjusting Hibernate batch size and join fetch use, etc but after some initial gains we're struggling to improve matters more.
OK, with that introduction to the problem, here's the real question: if you had a slow Java EE application running on a box whose CPU and memory use never went above 20%, and while running with 500+ users you observed two things: 1) that requesting even static files within the same app server JVM process was exceedingly slow, and 2) that requesting a static file outside of the app server JVM process but on the same box was fast, what would you investigate?
My thoughts initially jumped to the application server threads, both acceptor and session threads, thinking that even requests for static files were being queued, waiting for an available thread, and if the CPU/memory weren't really taxed then more threads were in order. But then we upped both the acceptor and session threads substantially and there was no improvement.
Clarification Edits:
1) Static files should be served by a web server rather than an app server. I am using the fact that in our case this (unfortunately) is not the configuration so that I can see the app server performance for files that it doesn't execute -- therefore excluding any database performance costs, etc.
2) I don't think there is a proxy between the requesters and the app server but even if there was it doesn't seem to be overloaded because static files requested from the same application server machine but outside of the application's JVM instance return immediately.
3) The JVM heap size (Xmx) is set to 1GB.
Thanks for any help!
SunONE itself is a pain in the ass. I had the very same problem, and you know what? A simple redeploy of the same application to WebLogic reduced the memory consumption and CPU consumption by about 30%.
SunONE is a reference implementation server, and shouldn't be used for production (don't know about Glassfish).
I know this answer doesn't really help, but I've noticed considerable pauses even in very simple operations, such as getting a bean instance from a pool.
Maybe trying to deploy JBoss or WebLogic on the same machine would give you a hint?
P.S. You shouldn't serve static content from under the application server (though I do it too sometimes, when CPU is abundant).
P.P.S. 500 concurrent users is quite a high load; I'd definitely put SunONE behind a caching proxy or an Apache that serves the static content.
After using a Sun performance monitoring tool we found that the garbage collector was running every couple of seconds and that only about 100MB out of the 1GB heap was being used. So we tried adding the following JVM options and, so far, this new configuration has greatly improved performance.
-XX:+DisableExplicitGC -XX:+AggressiveHeap
See http://java.sun.com/docs/performance/appserver/AppServerPerfFaq.html
Our lesson: don't leave JVM option tuning and garbage collection adjustments to the end. If you're having performance trouble, look at these settings early in your troubleshooting process.
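A small hedged addendum: to see how often the collector actually runs before and after a change like this, GC logging on the HotSpot JVMs of that era can be switched on with the options below; the log file name is arbitrary.
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log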
