I am running a big Minecraft (Spigot) server with a proprietary, heavily obfuscated AntiCheat plugin as well as our custom set of battle-tested plugins. The server creates and unloads new worlds many times, leaving CraftPlayer and CraftWorld objects for the GC after each unload.
We are running on Java 13 (the same happens on Java 12) with G1GC and the so-called "Aikar GC flags":
java -XX:+UseLargePagesInMetaspace -XX:+UseG1GC
-XX:+UnlockExperimentalVMOptions -XX:MaxGCPauseMillis=100
-XX:TargetSurvivorRatio=90 -XX:G1NewSizePercent=50
-XX:G1MaxNewSizePercent=80 -XX:G1MixedGCLiveThresholdPercent=35
-XX:+ParallelRefProcEnabled -jar Spigot.jar
AntiCheat keeps weak references to CraftPlayers, and yet they leak; once the leak grows large enough, the server's CPU spins very heavily until the JVM ultimately dies with an OutOfMemoryError. Manually running System.gc() does not clean these up either.
When AntiCheat is disabled, the leak is gone.
The ONLY GC-root reference in the heap dump to all of the leaked CraftWorlds and CraftPlayers is the entry in the WeakHashMap, with the CraftPlayer as the key. (CraftPlayer and CraftWorld cross-reference each other before being normally GC'd.)
There's a way you can create a leak with a WeakHashMap: the stale "expired" entries won't be deleted if you aren't actively calling get()/put(), which triggers expungeStaleEntries(). Maybe that would be the cause if AntiCheat were using something like Map<String, WeakHashMap<...>>, but that's not the case; it is using one major WeakHashMap.
And there's a catch: with that kind of leak you would see that the WeakHashMap's reference queue is non-empty. And yet it's empty, as I checked! I also noticed the WeakHashMap is wrapped by Collections.synchronizedMap, but it doesn't seem to me like that could be the problem.
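For reference, here is a minimal sketch of the stale-entry behaviour described above (the class name is made up, and System.gc() is only a hint, so the exact output is not guaranteed):
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class StaleEntryDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Object, byte[]> map = Collections.synchronizedMap(new WeakHashMap<>());

        Object key = new Object();
        byte[] value = new byte[10 * 1024 * 1024]; // 10 MB payload
        WeakReference<byte[]> valueProbe = new WeakReference<>(value);
        map.put(key, value);

        key = null;   // drop our strong references; only the map's internal entry remains
        value = null;
        System.gc();
        Thread.sleep(200);

        // The key's weak reference is cleared, but the stale entry (and the 10 MB value
        // it holds) stays strongly reachable from the map's table until some map
        // operation triggers expungeStaleEntries().
        System.out.println("value collected? " + (valueProbe.get() == null)); // typically false

        map.size();   // any access expunges stale entries
        System.gc();
        Thread.sleep(200);
        System.out.println("value collected after access? " + (valueProbe.get() == null)); // typically true
    }
}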
Is this really a GC bug caused by one of the flags?
This pattern repeats every time and in every heap dump:
Related
My Java application consumes more and more RAM, and old objects stay in memory.
I need a tool to see all the objects still in memory and a way to remove them.
I also need a tool for performance testing.
Have you tried forcing the garbage collector to run with System.gc()? If it's not properly clearing out your objects, it is likely you have a lot of memory leaks.
I recommend running a profiler on your application to find objects that are no longer used but cannot be released by the garbage collector due to lingering references.
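If it helps, here is one way to capture a heap dump programmatically on a HotSpot-based JVM, which you can then open in a profiler such as VisualVM or Eclipse MAT (the file name is just an example):
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumpSketch {
    public static void main(String[] args) throws Exception {
        // Dump only live (reachable) objects so the profiler shows what the GC cannot free.
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diag.dumpHeap("leak-investigation.hprof", true);
    }
}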
We're running a Jersey (1.x) based service in Tomcat on AWS in an array of ~20 instances. Periodically an instance "goes bad": over the course of about 4 hours its heap and CPU usage increase until the heap is exhausted and the CPU is pinned. At that point it gets automatically removed from the load balancer and eventually killed.
Examining heap dumps from these instances, ~95% of the memory has been used up by an instance of java.lang.ref.Finalizer which is holding onto all sorts of stuff, but most or all of it is related to HTTPS connections (sun.net.www.protocol.https.HttpsURLConnectionImpl, sun.security.ssl.SSLSocketImpl, and various crypto objects). These are connections that we're making to an external webservice using Jersey's client library. A heap dump from a "healthy" instance doesn't indicate any sort of issue.
Under relatively low load instances run for days or weeks without issue. As load increases, so does the frequency of instance failure (several per day by the time average CPU gets to ~40%).
Our JVM args are:
-XX:+UseG1GC -XX:MaxPermSize=256m -Xmx1024m -Xms1024m
I'm in the process of adding JMX logging for garbage collection metrics, but I'm not entirely clear what I should be looking for. At this point I'm primarily looking for ideas of what could kick off this sort of failure or additional targets for investigation.
Is it possibly a connection leak? I'm assuming you have checked for that?
I've had similar issues with GC bugs. Depending on your JVM version, it looks like you are using an experimental (and potentially buggy) feature. You can try disabling G1 and using the default garbage collector. Also, depending on your version, you might be running into the GC overhead limit, where the collector bails out because it is spending too long calculating what can and can't be reclaimed. The -XX:-UseGCOverheadLimit flag might help if it is available in your JVM.
Java uses a single finalizer thread to clean up dead objects. Your machine's symptoms are consistent with a pileup of backlogged finalizations. If the finalizer thread slows down too much (because some object takes a long time to finalize), the resulting accumulation of finalizer queue entries could cause the finalizer thread to fall further and further behind the incoming objects until everything grinds to a halt.
You may find profiling useful in determining what objects are slowing the finalizer thread.
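As a rough check from inside the JVM, you could also poll the number of objects waiting for finalization; a count that keeps climbing suggests the single finalizer thread cannot keep up (the class name here is just an example sketch):
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class FinalizerBacklogWatch {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            // Number of objects whose finalizers have not yet run.
            System.out.println("objects pending finalization: "
                    + memory.getObjectPendingFinalizationCount());
            Thread.sleep(5_000);
        }
    }
}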
This ultimately turned out to be caused by a JVM bug (unfortunately I've lost the link to the specific one we tracked it down to). Upgrading to a newer version of OpenJDK (we ended up with OpenJDK 1.7.0_50) solved the issue without us making any changes to our code.
I'm running a Java server using Apache Thrift and profiling it I found memory (Old Gen) is always growing, as shown by this graph:
The sharp drop at the end of the graph is because I clicked "Perform GC".
I understand there's a memory leak here. So I ran a leak detector (MAT) and it reported as follows:
One instance of "com.sun.jmx.remote.internal.ArrayNotificationBuffer"
loaded by "" occupies 7,844,208 (77.22%) bytes.
I never use this class myself, so I assume Apache Thrift uses it internally. I also found this ArrayNotificationBuffer memory leak, which is actually an old, known, and fixed JDK bug.
So I have some questions about this:
Why is there such a drop in allocated memory when I click "Perform GC"? Isn't the automatically-run GC the same? Why doesn't it garbage-collect this memory then?
I use OpenJDK (7u55-2.4.7-1ubuntu1~0.12.04.2), and as far as I can see all bugs relating to ArrayNotificationBuffer are quite old and fixed, so why is this happening, and how do I fix it?
The fact that the allocation was cleared when you triggered a GC just means it was a legitimate chunk of memory that would eventually have been released. If your heap is large and other allocation requests do not fail, an old-gen collection can be deferred for a while.
As for the buffer, I would speculate that a JMX notification listener was registered but is not handling emitted notifications in a timely manner, but it's hard to say.
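For illustration only, here is a hypothetical sketch of what such a listener registration can look like; this is not the actual Thrift/JMX setup, and whether and which notifications are emitted depends on the JVM:
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.ObjectName;

public class JmxListenerSketch {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();

        // If a handler like this is slow, or a remote JMX client subscribes but never
        // fetches its notifications, emitted notifications can pile up in the
        // connector's internal buffer.
        NotificationListener listener = (Notification n, Object handback) ->
                System.out.println(n.getType() + ": " + n.getMessage());

        for (ObjectName name : server.queryNames(
                new ObjectName("java.lang:type=GarbageCollector,*"), null)) {
            server.addNotificationListener(name, listener, null, null);
        }
        Thread.sleep(60_000); // keep the JVM alive long enough to observe notifications
    }
}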
Yesterday I deployed my first Grails (2.3.6) app to a dev server and began monitoring it. I just got an automated monitor stating that CPU was pinned on this machine, and so I SSHed into it. I ran top and discovered that it was my Java app's PID that was pinning the server. I also noticed memory was at 40%. After a few seconds, the CPU stopped pinning, went down to a normal level, and memory went back down into the ~20% range. Classic major GC.
While it was collecting, I did a heap dump. After the GC, I then opened the dump in JVisualVM and saw that most of the memory was being allocated for an org.codehaus.groovy.runtime.metaclass.MetaMethodIndex.Entry class. There were almost 250,000 instances of these in total, eating up about 25 MB of memory.
I googled this class and took a look at its ultra-helpful Javadocs. So I still have no idea what this class does.
But googling it also brought up about a dozen or so related articles (some of them SO questions) involving this class and a PermGen/classloader leak with Grails/Groovy apps. And while it seems that my app did in fact clean up these 250K instances with a GC, it is still troubling that there were so many instances of it, and that the GC pinned the CPU for over 5 minutes.
My questions:
What is this class and what is Groovy doing with it?
Can someone explain this answer to me? Why would -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled help this particular problem?
Why is this class particularly troublesome for the PermGen?
Groovy is a dynamic language; every method call is dispatched dynamically. To optimise that, Groovy creates a MetaClass for every java.lang.Class in the MetaClassRegistry. These MetaClass instances are created on demand and stored using weak references.
The reason you see a lot of org.codehaus.groovy.runtime.metaclass.MetaMethodIndex.Entry is that Groovy keeps a map of classes and methods in memory so that they can be quickly dispatched by the runtime. Depending on the size of the application this can, as you have discovered, amount to thousands of classes, with each class having dozens or sometimes hundreds of methods.
However, there is no "memory leak" in Groovy or Grails; what you are seeing is normal behaviour. Your application is running low on memory, probably because it hasn't been allocated enough, and this in turn causes MetaClass instances to be garbage collected. Now say, for example, you have a loop:
for(str in strings) {
println str.toUpperCase()
}
In this case we are calling a method on the String class. If you are running low on memory, what will happen is that for each iteration of the loop the MetaClass will be garbage collected and then recreated for the next iteration. This can dramatically slow down an application and lead to the CPU being pinned, as you have seen. This state is commonly referred to as "metaclass churn" and is a sign your application is running low on heap memory.
If Groovy were not garbage collecting these MetaClass instances, then yes, that would mean there is a memory leak in Groovy; but the fact that it is collecting them is a sign that all is well, except that you have not allocated enough heap memory in the first place. That is not to say there isn't a memory leak in another part of the application that is eating up all the available memory and leaving not enough for Groovy to operate correctly.
As for the other answer you refer to, adding class unloading and PermGen tweaks won't actually do anything to resolve your memory issues unless you are dynamically parsing classes at runtime. PermGen space is used by the JVM to store dynamically created classes. Groovy allows you to compile classes at runtime using GroovyClassLoader.parseClass or GroovyShell.evaluate. If you are continuously parsing classes, then yes, adding class-unloading flags can help. See also this post:
Locating code that is filling PermGen with dead Groovy code
However, a typical Grails application does not dynamically compile classes at runtime and hence tweaking PermGen and class unloading settings won't actually achieve anything.
You should verify if you have allocated enough heap memory using the -Xmx flag and if not allocate more.
I have a problem where the JVM is not able to perform GC in time and the application freezes. The "solution" is to connect to the application with JConsole and suggest that the JVM run a garbage collection. Needless to say, this is very poor behaviour for an application. Are there any JVM options to make it perform GC sooner or more often? Or is there some other real solution to this problem?
The problem appears not to be a lack of memory, but that the GC is not able to complete a collection in time before new data is sent to the application; the GC seems to start collecting too late. If a collection is suggested early enough via the System.gc() button in JConsole, the problem does not occur.
The young generation is collected by 'PS Scavenge', which is a parallel collector.
The old generation is collected by 'PS MarkSweep', which is a parallel mark-and-sweep collector.
You should check for memory leaks.
I'm pretty sure you won't get an OutOfMemoryError unless there's no memory to be released and no more memory available.
There is System.gc(), which does exactly what you described: it suggests to the JVM that a garbage collection should take place. (There are also command-line arguments for the JVM that can serve as directives for the memory manager.)
However, if you're running out of memory during an allocation, it typically means that the JVM did attempt a garbage collection first and it failed to release the necessary memory. In that case, you probably have memory leaks (in the sense of keeping unnecessary references) and you should get a memory profiler to check that. This is important because if you have memory leaks, then more frequent garbage collections will not solve your problem - except that maybe they will postpone its manifestation, giving you a false sense of security.
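For illustration, here is a minimal sketch of a leak in the sense of "keeping unnecessary references" (the names are made up); no amount of extra GC runs will free memory that is still reachable like this:
import java.util.ArrayList;
import java.util.List;

public class LingeringReferenceLeak {
    // A collection that is only ever added to keeps every element strongly reachable,
    // so the GC can never reclaim them, no matter how often it runs.
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void main(String[] args) {
        while (true) {
            CACHE.add(new byte[1024 * 1024]); // eventually ends in OutOfMemoryError
        }
    }
}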
From the Java specification:
OutOfMemoryError: The Java Virtual Machine implementation has run out
of either virtual or physical memory, and the automatic storage
manager was unable to reclaim enough memory to satisfy an object
creation request.
You can deploy JavaMelody on your server and add your application to it; it will give you a detailed report of your memory leaks and memory usage. With this you will be able to optimize your system and code correctly.
I guess either your application requires more memory to run efficiently (try tuning your JVM by setting parameters like -Xms512M -Xmx1024M),
or
there is a memory leak which is exhausting the memory.
You should check the memory consumption pattern of your application, e.g. how much memory it occupies when it is processing load vs. when it is idle.
If you observe constantly rising memory peaks, it could point to a possible memory leak.
One of the best threads on the memory leak issue is How to find a Java Memory Leak
Another good one is http://www.ibm.com/developerworks/library/j-leaks/
Additionally,
you may receive an OOME if you're loading a lot of classes (say, all classes present in your rt.jar). Since loaded classes reside in PermGen rather than heap memory, you may also want to increase your PermGen size using the -XX:MaxPermSize switch.
And, of course, you're free to choose a garbage collector – ParallelGC, ConcMarkSweepGC (CMS) or G1GC (G1).
Please be aware that there are APIs in Java that may cause memory leaks by themselves (without any programmer error), e.g. java.lang.String#substring() (see here).
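For context, here is a small sketch of the substring behaviour that point alludes to; it applies to JDKs before 7u6, where substring() shared the original backing char[] (newer JDKs copy the characters):
public class SubstringLeakSketch {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000_000; i++) {
            sb.append('x');
        }
        String huge = sb.toString();

        // On pre-7u6 JDKs this 16-character token still referenced huge's entire
        // char[], keeping roughly 20 MB alive for as long as the token was retained.
        String token = huge.substring(0, 16);

        // Common workaround on those versions: copy the characters into a fresh String.
        String safeToken = new String(token);

        huge = null;
        System.out.println(token.length() + " / " + safeToken.length());
    }
}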
If your application freezes, but gets unfrozen by a forced GC, then your problem is very probably not the memory, but some other resource leak, which is alleviated by running finalizers on dead objects. Properly written code must never rely on finalizers to do the cleanup, so try to find any unclosed resources in your application.
You can start the JVM with more memory:
java -Xms512M -Xmx1024M
will start the JVM with 512 MB of memory, allowing it to grow to a gigabyte.
You can use System.gc() to suggest to the VM to run the garbage collector. There is no guarantee that it will run immediately.
I doubt that will help, but it might work. Another thing you could look at is increasing the maximum memory size of the JVM. You can do this by passing the command-line argument -Xmx512m. This would give 512 megabytes of heap instead of the default 128.
You can use JConsole to view the memory usage of your application. This can help to see how the memory usage develops which is useful in detecting memory leaks.