Analysing Memory Issues - Java

We have an issue where memory usage on our JVMs increases gradually, and this eventually impacts CPU performance by increasing CPU time.
We are trying to take a heap dump to analyse the issue, but we wanted to understand the typical procedure - does looking at the GC log and the heap dump provide the required information?
What are the other things which one needs to watch out for?

Please look at the heap dump and the GC log and try to analyse whether any major task is going on at the time when GC activity reaches its peak - for example, cron jobs scheduled at that time on your server.
You will get an idea as to what exactly is affecting your application.

The approach I take is to use a memory profiler to reduce both the amount of memory retained after a GC and the amount of garbage being created and discarded. Reducing the amount of garbage can improve performance and reduce noise in the profiling results.
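As an example, here is a minimal sketch (the class and method names are made up, not from the original answer) of the kind of change an allocation profiler typically points you to: replacing per-iteration garbage with a reused builder.

    import java.util.List;

    class AllocationExample {
        // Before: each '+' concatenation allocates new intermediate objects,
        // so a memory profiler would show this method as a big source of garbage.
        static String slow(List<String> parts) {
            String out = "";
            for (String p : parts) {
                out = out + p + ",";
            }
            return out;
        }

        // After: one StringBuilder is reused for the whole loop, cutting the allocation rate.
        static String fast(List<String> parts) {
            StringBuilder sb = new StringBuilder();
            for (String p : parts) {
                sb.append(p).append(',');
            }
            return sb.toString();
        }
    }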

Here is how we analyzed the problem:
We looked at the recent code changes to identify the areas that had changed
Took a heap dump
Looked at a thread dump (not very useful for our current analysis). The heap dump told us which kinds of objects were consuming the most memory; these were cache objects.
But the recent code changes helped us explain our current situation. The interesting thing was that we had increased timeouts for some of the threads we were forking. This caused a lot more concurrency, and the result was increased memory usage, since some objects were now held longer before becoming GC-able. Our use cases made such contention plausible.
Overall, what I took away was that heap dumps, thread dumps and recent code changes - all looked at together - helped us understand the issue. No single one of them made sense in isolation.
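For reference, the thread dump mentioned above can also be captured from inside the running process. A minimal sketch using the standard ThreadMXBean API (printing to stdout is just for illustration; this is not how the original poster did it):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ThreadDumper {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            // true/true asks for locked monitors and ownable synchronizers as well
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                System.out.print(info);
            }
        }
    }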

Related

Make ZGC run more often

ZGC does not run often enough. The GC logs show that it runs once every 2-3 minutes for my application, and because of this my memory usage goes high between GC cycles (as high as 90%). After a GC it drops to as low as 20%.
How can I increase the GC frequency so that it runs more often?
-XX:ZCollectionInterval=N - sets the maximum gap between collections to N seconds.
-XX:ZUncommitDelay=M - sets the delay until unused memory is returned to the OS to M seconds.
Before tuning the GC, I would recommend investigating why this is happening. There might be an issue/bug in your application.
Some notes about these GC options:
-XX:ZUncommitDelay=M (check whether it is supported by your Linux kernel)
-XX:+ZProactive: Enables proactive GC cycles when using ZGC. By default, this option is enabled. ZGC will start a proactive GC cycle if doing so is expected to have minimal impact on the running application. This is useful if the application is mostly idle or allocates very few objects, but you still want to keep the heap size down and allow reference processing to happen even when there is a lot of free space on the heap.
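Putting the options above together, a hedged example of what a launch command might look like (the heap size, interval values and app.jar are purely illustrative, not recommendations):

    java -XX:+UseZGC \
         -XX:ZCollectionInterval=5 \
         -XX:ZUncommitDelay=300 \
         -XX:+ZProactive \
         -Xmx4g \
         -jar app.jar
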
More details about ZGC configuration options can be found at:
ZGC Home Page.
Oracle Documentation
Presently (as of JDK 17), ZGC's primary strategy is to wait until the last possible moment before the heap fills up and then do a collection. Its goals are:
Avoid unnecessary CPU load by running GC only when it's necessary.
Start the GC early enough so that it will finish before the heap actually fills up (since the heap filling up would be bad, leading to a temporary application stall).
It does this by measuring how fast your app is allocating memory, how long the GC takes to run, and predicting at what point it should start the GC. You can find the exact algorithm in the source code.
ZGC also exposes some knobs for running GC more often (ie, proactively), but honestly I don't find them terribly effective. You can find more info in my other answer. G1 does a better job of being proactive, but whether that's good or not depends on your use-case. (It sounds like you care more about throughput than memory usage, so I think you should prefer ZGC's behavior.)
However, if you find that ZGC is making mistakes in predicting when the heap will fill up and that your application really is hitting stalls, please share that info here or on the ZGC mailing list.
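If you want to confirm how often the GC actually runs without digging through GC logs, here is a minimal sketch that polls the platform GC MXBeans (the 10-second interval is arbitrary, and the exact bean names and counts depend on the collector and JDK version, so treat the output as indicative only):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcWatcher {
        public static void main(String[] args) throws InterruptedException {
            // Poll the platform GC beans and print cumulative counts/times.
            // Loops forever for illustration; in practice you might expose this via a management endpoint.
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.printf("%s: count=%d, time=%dms%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                Thread.sleep(10_000);
            }
        }
    }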

JVM memory usage pattern issue

I recently configured a JBoss application with an application monitoring tool (StatsD) that captures the JVM utilization of the application. Even without a single user using the application, the memory usage reaches around 90-95% (850-970 MB) of the allocated heap (1024 MB).
A minor GC runs every time the memory reaches 90-95%. Please see the screenshot below.
I would appreciate help understanding what the reason(s) for such a memory pattern could be.
* No batch jobs or background processes are running.
This just looks like normal behavior to me. The heap space used rises gradually to a point where a GC runs. Then the GC reclaims a lot of free space and the heap space used drops steeply. Then repeat.
It looks like you have stats from two separate JVMs in the same graph, but I guess you knew that. (You have obscured the labels on the graph that could explain that.)
The only other thing I can glean from this is that the memory allocation rate is on the high side, given that it is causing the GC to run that frequently. It may be advisable to do some GC tuning, but I would only advise that if application-level performance is suffering. (And it may well be that the real problem is application efficiency rather than GC performance.)
Then:
But I ran into an issue here too: the heap dump file is ~1 GB, but when I load it into Eclipse MAT it only shows 11 MB. Most of the heavy objects are listed under the "unreachable objects" section of MAT. Please let me know why a 1 GB dump only shows up as 11 MB in MAT, if you have any idea or have used MAT for analysis.
That is also easy to explain. The "unreachable objects" are garbage. You must have run the heap dump tool at an instant when the heap usage was close to one of the peaks.
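If you want a dump that contains only reachable objects in the first place, you can request a "live" dump, which forces a full GC before the file is written (jmap's -dump:live option does the same from the command line). A minimal sketch using the HotSpot diagnostic MXBean; the output path is illustrative, and this assumes a standard JDK with the jdk.management module:

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class LiveHeapDump {
        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            HotSpotDiagnosticMXBean hotspot = ManagementFactory.newPlatformMXBeanProxy(
                    server, "com.sun.management:type=HotSpotDiagnostic", HotSpotDiagnosticMXBean.class);
            // live = true forces a full GC first, so the dump contains only reachable objects
            hotspot.dumpHeap("/tmp/live-heap.hprof", true);
        }
    }
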
Stepping back, it is not clear to me what you are actually looking for here:
If you are just curious to understand what the monitoring looks like, this is what a JVM normally looks like.
If you are trying to investigate a performance problem (GC pauses, etc) you need to look at the other evidence.
If you are looking for evidence of a memory leak, you are looking in the wrong place. These graphs won't help with that. You need to look at the JVM's behavior over the long term. Look for things like long term trends in the "saw tooth" such as the level of the bottom of troughs trending upwards. And to investigate a suspected memory leak you need to compare MAT analyses for dumps taken over time.
But bear in mind, increasing memory usage over time is not necessarily a memory leak. It could be the application or a library caching things. A properly implemented cache will release objects if the JVM starts running out of memory.
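As an illustration of that last point, here is a minimal sketch (not a production cache) of a memory-sensitive cache built on SoftReference, whose values the GC is free to clear when the heap comes under pressure:

    import java.lang.ref.SoftReference;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Values are held via SoftReference, so the GC may clear them before an OutOfMemoryError.
    public class SoftCache<K, V> {
        private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();

        public V get(K key, Function<K, V> loader) {
            SoftReference<V> ref = map.get(key);
            V value = (ref != null) ? ref.get() : null;  // null if never cached or already cleared by GC
            if (value == null) {
                value = loader.apply(key);               // recompute and re-cache on a miss
                map.put(key, new SoftReference<>(value));
            }
            return value;
        }
    }

Note that this sketch is not atomic under concurrent misses: two threads may both compute the same value, which is usually acceptable for a cache.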

Painfully Slow JVM Not Caused by Memory Leak?

I'm programming in Java using Eclipse, and after the JVM has been running for a couple of hours my program tends to slow to a trickle. What is normally printed (or executed) in a fraction of a second takes a couple of minutes or hours.
I'm aware this is usually caused by a memory leak in the program. However, I'm under the impression that a memory leak slows down the PC because the majority of CPU power goes to garbage collection. When I take a look at Task Manager I only see 22-25% of CPU being used at the moment (it has remained steady for the last couple of hours) and approx. 35% of memory free on my machine.
Could the slowing down of my program be caused by something other than a memory leak, or is it for sure a memory leak (which means I now need to take a hard look to track down the source of the leak)? And if it is a leak, why would CPU usage be relatively low?
Thanks
Sometimes this happens when you have cyclic relationships between your objects or entities. The JVM tries to read or bind the data by looping through the same set of objects, which drastically affects its performance and most of the time even crashes the application. As in the previous answer, you can use jconsole to check when this happens and take action. Hope you get the idea; maybe this is not the case here, but it is what came to my mind when I read your question.
cheers!!!
Well, first of all, a memory leak or any other malfunction in the JVM doesn't affect your PC or any other part of your computer unless you are referencing some external resource which is choking. To answer your question: generally speaking, while there is a possibility that the slowdown is caused by the CPU, in your case, since your program/process is slowing down gradually, there is most likely a memory leak in your code.
You could use any profiler / jvisualvm to monitor the memory usage and object state to nail down the issue.
You may be aware that a modern computer system has more than one CPU core. A single-threaded program will use only a single core, which is consistent with Task Manager reporting an overall CPU usage of 25% (1 core fully loaded, 3 cores idle = 25% of total CPU capacity used).
Garbage collection can cause slowdowns, but usually only does so if the JVM is memory constrained. To verify whether it is garbage collection, you can use jconsole or jvisualvm (which are part of the JDK) to see how much CPU time was spent doing garbage collection.
To investigate why your program is slow, using a profiler is usually the most efficient approach.
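As a rough, hedged way to quantify from inside the process how much time is going to garbage collection, you can compare cumulative GC time against total process CPU time. Note that getCollectionTime() reports approximate elapsed time rather than CPU time, and the cast below assumes a HotSpot-based JDK, so treat the result as an estimate only:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import com.sun.management.OperatingSystemMXBean;

    public class GcShare {
        public static void main(String[] args) {
            OperatingSystemMXBean os =
                    (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
            long gcMillis = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                gcMillis += Math.max(0, gc.getCollectionTime());   // -1 means unsupported
            }
            long cpuMillis = os.getProcessCpuTime() / 1_000_000;   // nanoseconds -> milliseconds
            System.out.printf("GC time: %d ms, process CPU time: %d ms (~%.1f%%)%n",
                    gcMillis, cpuMillis, cpuMillis > 0 ? 100.0 * gcMillis / cpuMillis : 0.0);
        }
    }
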
I don't think we can say anything straightforward about this issue. You need to check the behaviour of your program using jconsole or jvisualvm, which are part of your JDK.

JVM Garbage Collector suddenly consumes 100% CPU after running for several hours

I've got a strange problem in my Clojure app.
I'm using http-kit to write a websocket based chat application.
Clients are rendered using React as a single-page app; the first thing they do when they navigate to the home page (after signing in) is create a websocket to receive things like real-time updates and any chat messages. You can see the site here: www.csgoteamfinder.com
The problem I have is after some indeterminate amount of time, it might be 30 minutes after a restart or even 48 hours, the JVM running the chat server suddenly starts consuming all the CPU. When I inspect it with NR (New Relic) I can see that all that time is being used by the garbage collector -- at this stage I have no idea what it's doing.
I've taken a number of screenshots where you can see the effect.
You can see a number of spikes; those spikes correspond to large increases in CPU usage because of the garbage collector. To free up CPU I usually have to restart the JVM. I have been relying on receiving a CPU alert from NR in my Slack account to make sure I jump on these quickly... but I really need to get to the root of the problem.
My initial thought was that I was possibly holding onto the socket reference when the client closed it at their end, but this is not the case. I've been looking at socket count periodically and it is fairly stable.
Any ideas of where to start?
Kind regards, Jason.
It's hard to imagine what could have caused such an issue, but the first thing I would do is take a heap dump at the time of the crash. This can be enabled with the -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path_to_your_heap_dump> JVM args. As a general practice, don't increase the heap size beyond the physical memory available on your server machine. In some rare cases the JVM is unable to dump the heap because the process is doomed; in such cases you can use gcore (if you're on Linux, not sure about Windows).
Once you grab the heap dump, analyse it with MAT. I have debugged such applications and this worked perfectly to pin down memory-related issues. MAT allows you to dissect the heap dump in depth, so you're sure to find the cause of your memory issue, unless it is simply that you have allocated too small a heap.
If your program is spending a lot of CPU time in garbage collection, that means that your heap is getting full. Usually this means one of two things:
You need to allocate more heap to your program (via -Xmx).
Your program is leaking memory.
Try the former first. Allocate an insane amount of memory to your program (16GB or more, in your case, based on the graphs I'm looking at). See if you still have the same symptoms.
If the symptoms go away, then your program just needed more memory. Otherwise, you have a memory leak. In this case, you need to do some memory profiling. In the JVM, the way this is usually done is to use jmap to generate a heap dump, then use a heap dump analyser (such as jhat or VisualVM) to look at it.
(Fair disclosure: I'm the creator of a jhat fork called fasthat.)
Most likely your tenured space is filling up, triggering a full collection. When that happens the GC uses all the CPUs, sometimes for seconds at a time.
To diagnose why this is happening you need to look at your rate of promotion (how much data is moving from the young generation to the tenured space).
I would look at increasing the young generation size to decrease the rate of promotion. You could also look at using CMS, as it has shorter pause times (though it uses more CPU).
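Purely as a hedged illustration of that kind of tuning (the sizes and app.jar are invented, and note that CMS was deprecated in JDK 9 and removed in JDK 14), the flags might look like:

    java -Xms4g -Xmx4g \
         -Xmn2g \
         -XX:+UseConcMarkSweepGC \
         -jar app.jar

Here -Xmn sets the young generation size; a larger young generation gives short-lived objects more time to die before they are promoted to the tenured space.
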
Things to try in order:
Reduce the heap size
Count the number of objects of each class, and see if the numbers make sense
Do you have big byte[] arrays that live past generation 1?
Change or tune GC algorithm
Use high-availability, i.e. more than one JVM
Switch to Erlang
You have triggered a global GC. GC time grows faster than linearly with the amount of memory, so reducing the heap size will actually trigger the global GC more often and make each one faster.
You can also experiment with changing the GC algorithm. We had a system where the global GC went down from 200 s (it happened 1-2 times per 24 hours) to 12 s. Yes, the system was at a complete standstill for 3 minutes, and no, the users were not happy :-) You could try -XX:+UseConcMarkSweepGC.
http://www.fasterj.com/articles/oraclecollectors1.shtml
You will always have pauses like this with the JVM and similar runtimes; it is more a question of how often you get them and how fast the global GC is. You should take a heap dump and get the count of the different objects of each class. Most likely you will see that you have millions of instances of one of them: somehow you are keeping references to them unnecessarily, in an ever-growing cache, sessions, or similar.
http://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks001.html#CIHCAEIH
You can also start using a high-availability solution with at least 2 nodes, so that when one node is busy doing GC, the other node will have to handle the total load for a time. Hopefully, you will not get the global GC on both systems at the same time.
Big binary objects like byte[] and similar are a real problem. Do you have those?
At some point these need to be compacted by the global GC, and that is a slow operation. Many data-processing JVM-based solutions avoid storing all data as plain POJOs on the heap and instead manage their own memory off-heap in order to overcome this problem.
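One common off-heap technique is to keep large binary blobs in direct buffers, which live outside the Java heap and therefore never need to be moved or compacted by the collector; a minimal sketch (the 256 MB size is arbitrary, and direct memory is still capped by -XX:MaxDirectMemorySize):

    import java.nio.ByteBuffer;

    public class OffHeapBuffer {
        public static void main(String[] args) {
            // Direct buffers are allocated outside the Java heap, so this large blob
            // does not add to the work the garbage collector has to do at compaction time.
            ByteBuffer blob = ByteBuffer.allocateDirect(256 * 1024 * 1024);
            blob.put(0, (byte) 42);
            System.out.println("capacity=" + blob.capacity() + ", first byte=" + blob.get(0));
        }
    }
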
Another solution is to switch from the JVM to Erlang. Erlang is near real-time, and it gets there by not having a global GC of the whole heap; Erlang has many small heaps instead. You can read a little about it at
https://hamidreza-s.github.io/erlang%20garbage%20collection%20memory%20layout%20soft%20realtime/2015/08/24/erlang-garbage-collection-details-and-why-it-matters.html
Erlang is slower than the JVM, since it copies data, but its performance is much more predictable. It is difficult to have both. I have a websocket-based Erlang solution, and it really works well.
So you have run into a problem that is expected and normal for the JVM, the Microsoft CLR, and similar runtimes. It will get worse and more common over the next couple of years as heap sizes grow.

Java code's CPU usage changing

I ran into this observation when I was trying to give a rough estimate of my code's runtime: the CPU usage stayed at around 10% for the first part of the code, which involved traversing two strings and tossing things into a HashSet. When the code reached the second part (two nested for loops, plus a gigantic array allocation), the CPU usage skyrocketed to 50%.
I don't think I did anything related to multithreading. Also, if I slightly modify the first part (something really simple like adding another for loop), the CPU usage varies a lot too. Why is this happening? I'm a bit curious about what is actually consuming the CPU.
This isn't necessarily a direct correlation between the code being run and the CPU usage, as the JVM's CPU usage will fluctuate based on what is in the heap at that point in time while it manages garbage collection. If it suddenly went up to 50% and then back down, this may be a coincidence, with garbage collection kicking in at the same time.
You can get more detailed information by running jvisualvm.exe, which is located in your JDK's bin folder. I'm not sure how much CPU usage info is available out of the box, but there are at least plugins that will give you CPU usage info. This is also a graphical way to see heap usage, as per david99world's answer.
http://visualvm.java.net/index.html
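As a rough way to check whether GC is behind such a spike, you can also compare the JVM's cumulative GC time before and after the suspect phase. This is a hedged sketch where runSecondPart() is a made-up stand-in for the nested loops and the big array allocation:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class PhaseGcDelta {
        // Sum of reported GC time across all collectors, in milliseconds.
        static long totalGcTime() {
            long ms = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                ms += Math.max(0, gc.getCollectionTime());
            }
            return ms;
        }

        public static void main(String[] args) {
            long before = totalGcTime();
            runSecondPart();   // hypothetical placeholder for the second part of the code
            long after = totalGcTime();
            System.out.println("GC time during this phase: " + (after - before) + " ms");
        }

        static void runSecondPart() {
            int[] big = new int[10_000_000];   // illustrative large allocation
            for (int i = 0; i < big.length; i++) big[i] = i;
        }
    }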
