Java process memory growing indefinitely. Memory leak? - java

We have a Java process running on Solaris 10 serving about 200-300 concurrent users. The administrators have reported that the memory used by the process increases significantly over time. It reaches 2 GB in a few days and never stops growing.
We dumped the heap and analysed it using the Eclipse Memory Analyzer (MAT), but weren't able to see anything out of the ordinary there. The heap size was very small.
After adding memory stat logging to our application, we found a discrepancy between the memory usage reported by the "top" utility, which the administrators use, and the usage reported by the MemoryMXBean and Runtime APIs.
Here is the output from both.
Memory usage information
From the Runtime library
Free memory: 381MB
Allocated memory: 74MB
Max memory: 456MB
Total free memory: 381MB
From the MemoryMXBean library.
Heap Committed: 136MB
Heap Init: 64MB
Heap Used: 74MB
Heap Max: 456MB
Non Heap Committed: 73MB
Non Heap Init: 4MB
Non Heap Used: 72MB
Current idle threads: 4
Current total threads: 13
Current busy threads: 9
Current queue size: 0
Max threads: 200
Min threads: 8
Idle Timeout: 60000
PID USERNAME NLWP PRI NICE SIZE RES STATE TIME CPU COMMAND
99802 axuser 115 59 0 2037M 1471M sleep 503:46 0.14% java
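For completeness, here is a minimal sketch of the kind of logging that can produce the Runtime/MemoryMXBean figures above, using only the standard java.lang.management API (the class name and the exact derivation of "Allocated"/"Total free" are illustrative, not our production code):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryStatsLogger {
    private static final long MB = 1024L * 1024L;

    public static void main(String[] args) {
        // Figures from the Runtime API
        Runtime rt = Runtime.getRuntime();
        System.out.println("Free memory: " + rt.freeMemory() / MB + "MB");
        System.out.println("Allocated memory: " + (rt.totalMemory() - rt.freeMemory()) / MB + "MB");
        System.out.println("Max memory: " + rt.maxMemory() / MB + "MB");
        System.out.println("Total free memory: "
                + (rt.freeMemory() + rt.maxMemory() - rt.totalMemory()) / MB + "MB");

        // Figures from the MemoryMXBean
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        System.out.println("Heap Committed: " + heap.getCommitted() / MB + "MB");
        System.out.println("Heap Init: " + heap.getInit() / MB + "MB");
        System.out.println("Heap Used: " + heap.getUsed() / MB + "MB");
        System.out.println("Heap Max: " + heap.getMax() / MB + "MB");
        System.out.println("Non Heap Committed: " + nonHeap.getCommitted() / MB + "MB");
        System.out.println("Non Heap Init: " + nonHeap.getInit() / MB + "MB");
        System.out.println("Non Heap Used: " + nonHeap.getUsed() / MB + "MB");
    }
}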
How can this be? The top command reports so much more usage; I was expecting RES to be close to heap + non-heap.
pmap -x, however, reports most of the memory as being in the heap:
Address Kbytes RSS Anon Locked Mode Mapped File
*102000 56 56 56 - rwx---- [ heap ]
*110000 3008 3008 2752 - rwx---- [ heap ]
*400000 1622016 1621056 1167568 - rwx---- [ heap ]
*000000 45056 45056 45056 - rw----- [ anon ]
Can anyone please shed some light on this? I'm completely lost.
Thanks.
Update
This does not appear to be an issue on Linux.
Also, based on Peter Lawrey's response, the "heap" reported by pmap is the native heap, not the Java heap.

I have encountered a similar problem and found a resolution:
Solaris 11
JDK10
REST application using HTTPS (Jetty server)
There was a significant increase of the C heap (observed via pmap) over time
I decided to do some stress tests with libumem.
So I started the process with
UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1
and stressed the application with HTTPS requests.
After a while I connected to the process with mdb.
In mdb I used the command ::findleaks and it showed this as a leak:
libucrypto.so.1`ucrypto_digest_init
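For reference, the whole sequence, condensed (the jar name and PID here are placeholders, not the real ones from our deployment):
UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1 java -jar rest-app.jar
mdb -p <pid>
> ::findleaks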
So it seems that the OracleUcrypto JCA (Java Cryptography Architecture) provider has some issues on Solaris.
The problem was resolved by updating the $JAVA_HOME/conf/security/java.security file:
I changed the priority of OracleUcrypto to 3 and the SUN implementation to 1
security.provider.3=OracleUcrypto
security.provider.2=SunPKCS11 ${java.home}/conf/security/sunpkcs11-solaris.cfg
security.provider.1=SUN
After this the problem disappeared.
This also explains why there is no problem on Linux, since different JCA provider implementations are in play there.

In garbage-collected environments, holding on to unused references is effectively a leak: it prevents the GC from reclaiming the objects they point to. It's really easy to keep references around by accident.
A common culprit is hashtables. Another is arrays or vectors which are logically cleared (by resetting the use index to 0) but where the slots above the use index still point to live objects.
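A minimal sketch of the second case (a hypothetical class, not taken from the question): a stack that resets its index on pop but leaves the old reference in the backing array, so the GC cannot reclaim the popped object.

public class LeakyStack {
    private Object[] elements = new Object[1024];
    private int size;

    public void push(Object o) {
        elements[size++] = o;
    }

    // Leaky version: the slot above 'size' still references the popped object.
    public Object pop() {
        return elements[--size];
    }

    // Fixed version: null out the slot so the object becomes unreachable.
    public Object popAndClear() {
        Object o = elements[--size];
        elements[size] = null;
        return o;
    }
}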

Related

How to set max non-heap memory in a Java 8 (Spring Boot) application?

I have 20 Spring Boot (2.3) embedded Tomcat applications running on a Linux machine with 8 GB of RAM. All of them are Java 1.8 apps. The machine was running out of memory, and Linux started killing some of my app processes as a result.
Using Linux top and Spring Boot Admin, I noticed that the max heap size was set to 2 GB:
java -XX:+PrintFlagsFinal -version | grep HeapSize
As a result, each of the 20 apps is trying to get 2 GB of heap (1/4 of physical memory). Using Spring Boot Admin I could see that only ~128 MB was actually being used, so I reduced the max heap size to 512 MB via java -Xmx512m ... Now, Spring Boot Admin shows:
1.33 GB is allocated to non-heap space but only 121 MB is being used. Why is so much being allocated to non-heap space? How can I reduce it?
Update
According to top, each Java process is taking around 2.4 GB (VIRT):
KiB Mem : 8177060 total, 347920 free, 7127736 used, 701404 buff/cache
KiB Swap: 1128444 total, 1119032 free, 9412 used. 848848 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2547 admin 20 0 2.418g 0.372g 0.012g S 0.0 4.8 27:14.43 java
...
Update 2
I ran jcmd 7505 VM.native_memory for one of the processes and it reported:
7505:
Native Memory Tracking:
Total: reserved=1438547KB, committed=296227KB
- Java Heap (reserved=524288KB, committed=123808KB)
(mmap: reserved=524288KB, committed=123808KB)
- Class (reserved=596663KB, committed=83423KB)
(classes #15363)
(malloc=2743KB #21177)
(mmap: reserved=593920KB, committed=80680KB)
- Thread (reserved=33210KB, committed=33210KB)
(thread #32)
(stack: reserved=31868KB, committed=31868KB)
(malloc=102KB #157)
(arena=1240KB #62)
- Code (reserved=254424KB, committed=27120KB)
(malloc=4824KB #8265)
(mmap: reserved=249600KB, committed=22296KB)
- GC (reserved=1742KB, committed=446KB)
(malloc=30KB #305)
(mmap: reserved=1712KB, committed=416KB)
- Compiler (reserved=1315KB, committed=1315KB)
(malloc=60KB #277)
(arena=1255KB #9)
- Internal (reserved=2695KB, committed=2695KB)
(malloc=2663KB #19903)
(mmap: reserved=32KB, committed=32KB)
- Symbol (reserved=20245KB, committed=20245KB)
(malloc=16817KB #167011)
(arena=3428KB #1)
- Native Memory Tracking (reserved=3407KB, committed=3407KB)
(malloc=9KB #110)
(tracking overhead=3398KB)
- Arena Chunk (reserved=558KB, committed=558KB)
(malloc=558KB)
First of all: no, 1.33 GB is not allocated. On the screenshot you have 127 MB of non-heap memory allocated; the 1.33 GB is the maximum limit.
I see your metaspace is about 80 MB, which should not pose a problem. The rest of the memory can consist of many things: compressed class space, code cache, native buffers, etc.
To get a detailed view of what is eating up the off-heap memory, you can query the MBean java.lang:type=MemoryPool,name=*, for example via VisualVM with the MBeans plugin.
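If you prefer doing this in code rather than through VisualVM, the same per-pool figures are exposed by the standard MemoryPoolMXBean API; a minimal sketch (the class name is illustrative):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MemoryPoolDump {
    public static void main(String[] args) {
        // Prints every memory pool (heap and non-heap), e.g. Metaspace,
        // Compressed Class Space, CodeHeap segments and the GC generations.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-30s %-15s used=%,d B committed=%,d B%n",
                    pool.getName(), pool.getType(),
                    pool.getUsage().getUsed(), pool.getUsage().getCommitted());
        }
    }
}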
However, your apps may simply be using too much native memory. For example, I/O buffers from Netty (backed by java.nio.DirectByteBuffer) may be the culprit. If so, you can limit the caching of DirectByteBuffers with the system property -Djdk.nio.maxCachedBufferSize, or cap direct memory outright with -XX:MaxDirectMemorySize.
For a definitive answer about what exactly is eating your RAM, you'd have to create a heap dump and analyze it.
So, to answer your question "Why is so much being allocated to non-heap space? How can I reduce it?": there is not actually a lot allocated to non-heap space. Most of it is native buffers for I/O and JVM internals, and there is no universal switch or flag to limit all the different caches and pools at once.
Now to address the elephant in the room: I think your real issue stems from simply having very little RAM. You've said you are running 20 JVM instances limited to 512 MB of heap each on an 8 GB machine. That is unsustainable: 20 x 512 MB = 10 GB of heap, which is more than you can accommodate with 8 GB of total RAM, and that is before you even count the off-heap/native memory. You need to either provide more hardware resources, decrease the JVM count, or further decrease the heap/metaspace and other limits (which I strongly advise against).
In addition to what has already been stated, here's a very good article about the Metaspace in the JVM, which by default reserves about 1 GB (though it may not actually use that much). So that's another thing you can tune using the flag -XX:MaxMetaspaceSize if you have many small apps and want to decrease the amount of memory used/reserved.
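As a rough illustration only (the jar name and the concrete sizes are placeholders that you would tune against what NMT and the MBeans report for your apps), a start command that caps the main off-heap areas alongside the heap could look like:
java -Xmx512m -Xss256k -XX:MaxMetaspaceSize=128m -XX:ReservedCodeCacheSize=64m -XX:MaxDirectMemorySize=64m -jar app.jar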

Accounting for Java memory consumption

We are running a Java Spring Boot application on AWS. The platform we use is Tomcat 8.5 with Java 8 running on 64-bit Amazon Linux/3.3.6. The machines have 4 GB of RAM. We run this Java application with the JVM args -Xmx and -Xms set to 1536m. The problem we are facing is that these instances quite frequently go into a warning state due to 90%+ memory usage. Now we are trying to account for memory usage process by process.
To start with, we just ran the top command on these machines. Here is part of the output.
top - 11:38:13 up 4:39, 0 users, load average: 0.90, 0.84, 0.90
Tasks: 101 total, 1 running, 73 sleeping, 0 stopped, 0 zombie
Cpu(s): 31.8%us, 3.7%sy, 5.6%ni, 57.2%id, 0.3%wa, 0.0%hi, 1.5%si, 0.0%st
Mem: 3824468k total, 3717908k used, 106560k free, 57460k buffers
Swap: 0k total, 0k used, 0k free, 300068k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2973 tomcat 20 0 5426m 2.2g 0 S 37.1 60.6 173:54.98 java
As you can see, Java is taking 2.2 GB of memory, while we have given -Xmx as 1.5 GB. We are aware that -Xmx only restricts the heap, so we wanted to analyse where exactly this extra 0.7 GB is going. Towards that end, we decided to use New Relic. Here is the graph of non-heap memory usage.
The total non-heap memory usage we could see comes to around ~200 MB. So with this 200 MB plus the 1.5 GB heap, we expect the total memory consumed by Java to be 1.7 GB. This 1.7 GB figure is also confirmed by the New Relic graphs below:
As mentioned earlier, the top command tells us that Java is taking 2.2 GB of memory, yet we could only account for 1.7 GB using New Relic. How can we reconcile this extra 0.5 GB of memory?
There's more than what you see on New Relic's non-heap memory usage graph.
E.g. there are also thread stacks, which can occupy up to 1 MB per thread.
There's a JVM feature called Native Memory Tracking that you can use to track some of the non-heap memory usage.
There can still be native allocations that aren't tracked at all.
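A minimal sketch of that workflow (the PID and the rest of the command line are placeholders): start the JVM with NMT enabled, then query it with jcmd:
java -XX:NativeMemoryTracking=summary <your usual options> -jar app.jar
jcmd <pid> VM.native_memory summary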
I suggest you look at these excellent resources from @apangin:
Java using much more memory than heap size (or size correctly Docker memory limit)
Memory footprint of a Java process by Andrei Pangin: https://www.youtube.com/watch?v=c755fFv1Rnk&list=PLRsbF2sD7JVqPgMvdC-bARnJ9bALLIM3Q&index=6

JVM leaking memory outside heap and buffer pools

We have a Java application running as a long-running service (actual up-time for this JVM: 31 days 3 hrs 35 min).
According to the Windows Task Manager, the process uses 1,075,384,320 B, nearly one GB.
The heap size of the JVM is restricted to 256 MB (-Xmx256m).
Memory-Data
Memory:
Size: 268,435,456 B
Max: 268,435,456 B
Used: 100,000,000 up to 200,000,000 B
- no leak here
Buffer Pools
Direct:
Count: 137
Memory Used and Total Capacity: 1,348,354 B
Mapped:
Count: 0
Memory Used and Total Capacity: 0 B
- no leak here
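For reference, figures like the ones above can be collected in code via the standard platform MXBeans; a minimal sketch (the class name is illustrative):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class BufferPoolStats {
    public static void main(String[] args) {
        // Heap usage, as shown under "Memory" above
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("Heap used: " + memory.getHeapMemoryUsage().getUsed() + " B");

        // The "direct" and "mapped" pools, as shown under "Buffer Pools" above
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + ": count=" + pool.getCount()
                    + ", memoryUsed=" + pool.getMemoryUsed() + " B"
                    + ", totalCapacity=" + pool.getTotalCapacity() + " B");
        }
    }
}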
My question: where does the JVM use the additional memory?
Additional information:
Java: version 1.8.0_74 32 bit (Oracle)
Classes:
Total loaded: 17,248
Total unloaded: 35,761
Threads:
Live: 273
Live peak: 285
Daemon: 79
Total started: 486,282
After a restart it takes some days for the process size to grow, so of course a regular restart would help, and maybe a newer Java version would also solve the problem, but I would like an explanation for this behaviour, e.g. a known bug in 1.8.0 before update 111, fixed in ... - I have not found anything yet.
We have about 350 such installations in different places, so changing is not easy.
Don't forget to run your JVM in server mode on long running tasks!
The reason for this kind of memory leak was the JVM running in client mode.
Our solution runs in a number of chain stores on old Windows XP 32-bit PCs.
The default for JVMs on this platform is client mode.
In most cases we run the 32-bit JRE 1.8.0_74. With our application, this JVM leaks memory in "Thread Arena Space"; nothing ever seems to be returned.
After switching to server mode by setting the -server parameter on JVM start, the problems disappeared.
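Concretely, the only change was adding the flag to the start command; roughly (the jar name and heap size here are placeholders, not our real start script):
java -server -Xmx256m -jar our-service.jar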
There are two common reasons for off-heap memory consumption:
Your application or one of the libraries you use (e.g. a JDBC driver) performs something natively (via a module invoked through JNI).
Your application or one of the libraries uses off-heap memory in other ways, e.g. with direct buffers or with some "clever" use of sun.misc.Unsafe.
You can verify this by tracking native memory usage with jcmd, as explained here (you'll need to restart the application).
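Assuming you can restart with Native Memory Tracking enabled, a minimal sketch of that workflow (the PID and application arguments are placeholders): take a baseline, then diff after the footprint has grown to see which NMT category is increasing:
java -XX:NativeMemoryTracking=detail <your usual options> -jar app.jar
jcmd <pid> VM.native_memory baseline
jcmd <pid> VM.native_memory detail.diff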

Difference between Resident Set Size (RSS) and Java total committed memory (NMT) for a JVM running in Docker container

Scenario:
I have a JVM running in a Docker container. I did some memory analysis using two tools: 1) top 2) Java Native Memory Tracking. The numbers look confusing and I am trying to find what's causing the differences.
Question:
RSS is reported as 1272 MB for the Java process, and the total Java memory is reported as 790.55 MB. How can I explain where the rest of the memory, 1272 - 790.55 = 481.45 MB, went?
Why I want to keep this issue open even after looking at this question on SO:
I did see the answer and the explanation makes sense. However, after getting output from Java NMT and pmap -x, I am still not able to concretely map which Java memory addresses are actually resident and physically mapped. I need some concrete explanation (with detailed steps) to find what's causing this difference between RSS and the Java total committed memory.
Top Output
Java NMT
Docker memory stats
Graphs
I have had a Docker container running for more than 48 hours. Now, when I look at a graph which contains:
Total memory given to the docker container = 2 GB
Java Max Heap = 1 GB
Total committed (JVM) = always less than 800 MB
Heap Used (JVM) = always less than 200 MB
Non Heap Used (JVM) = always less than 100 MB.
RSS = around 1.1 GB.
So, what's eating the memory between 1.1 GB (RSS) and 800 MB (Java total committed memory)?
You have some clue in "Analyzing java memory usage in a Docker container" from Mikhail Krestjaninoff:
(And to be clear: in May 2019, three years later, the situation did improve with OpenJDK 8u212.)
Resident Set Size is the amount of physical memory currently allocated and used by a process (without swapped out pages). It includes the code, data and shared libraries (which are counted in every process which uses them)
Why does docker stats info differ from the ps data?
The answer to the first question is very simple: Docker has a bug (or a feature, depending on your mood): it includes file caches in the total memory usage info. So we can just avoid this metric and use the ps info about RSS.
Well, ok - but why is RSS higher than Xmx?
Theoretically, in the case of a Java application,
RSS = Heap size + MetaSpace + OffHeap size
where OffHeap consists of thread stacks, direct buffers, mapped files (libraries and jars) and the JVM code itself.
Since JDK 1.8.40 we have Native Memory Tracker!
As you can see, I’ve already added -XX:NativeMemoryTracking=summary property to the JVM, so we can just invoke it from the command line:
docker exec my-app jcmd 1 VM.native_memory summary
(This is what the OP did)
Don't worry about the "Unknown" section - it seems that NMT is an immature tool and can't deal with CMS GC (this section disappears when you use another GC).
Keep in mind that NMT displays "committed" memory, not "resident" memory (which you get through the ps command). In other words, a memory page can be committed without being counted as resident (until it is directly accessed).
That means that NMT results for non-heap areas (heap is always preinitialized) might be bigger than RSS values.
(that is where "Why does a JVM report more committed memory than the linux process resident set size?" comes in)
As a result, despite the fact that we set the jvm heap limit to 256m, our application consumes 367M. The “other” 164M are mostly used for storing class metadata, compiled code, threads and GC data.
The first three points are often constants for an application, so the only thing that increases with the heap size is the GC data.
This dependency is linear, but the "k" coefficient (y = kx + b) is much less than 1.
More generally, this seems to be tracked in docker issue 15020, which reports a similar problem since docker 1.7:
I'm running a simple Scala (JVM) application which loads a lot of data into and out of memory.
I set the JVM to 8G heap (-Xmx8G). I have a machine with 132G memory, and it can't handle more than 7-8 containers because they grow well past the 8G limit I imposed on the JVM.
(docker stat was reported as misleading before, as it apparently includes file caches into the total memory usage info)
docker stat shows that each container itself is using much more memory than the JVM is supposed to be using. For instance:
CONTAINER CPU % MEM USAGE/LIMIT MEM % NET I/O
dave-1 3.55% 10.61 GB/135.3 GB 7.85% 7.132 MB/959.9 MB
perf-1 3.63% 16.51 GB/135.3 GB 12.21% 30.71 MB/5.115 GB
It almost seems that the JVM is asking the OS for memory, which is allocated within the container, and the JVM is freeing memory as its GC runs, but the container doesn't release the memory back to the main OS. So... memory leak.
Disclaimer: I am not an expert
I had a production incident recently where, under heavy load, pods had a big jump in RSS and Kubernetes killed them. There was no OutOfMemoryError, but Linux stopped the process in the most hardcore way.
There was a big gap between RSS and the total space reserved by the JVM. Heap memory, native memory, threads: everything looked OK, yet RSS was big.
It turned out to be due to how glibc malloc works internally. There are big gaps in memory where malloc takes chunks of memory from. If there are a lot of cores on your machine, malloc tries to adapt and give every core its own arena to take free memory from, to avoid resource contention. Setting export MALLOC_ARENA_MAX=2 solved the issue (see the sketch after the links below). You can find more about this situation here:
Growing resident memory usage (RSS) of Java Process
https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
https://www.gnu.org/software/libc/manual/html_node/Malloc-Tunable-Parameters.html
https://github.com/jeffgriffith/native-jvm-leaks
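As a minimal sketch of the workaround in a container entrypoint (the heap size and jar name are placeholders, not from the incident described here):
export MALLOC_ARENA_MAX=2
exec java -Xmx1g -jar app.jar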
P.S. I don't know why there was a jump in RSS memory. Pods are built on Spring Boot + Kafka.

How can I track down a non-heap JVM memory leak in JBoss AS 5.1?

After upgrading to JBoss AS 5.1, running JRE 1.6.0_17 on CentOS 5 Linux, the JRE process runs out of memory after about 8 hours (it hits the 3 GB maximum on a 32-bit system). This happens on both servers in the cluster under moderate load. Java heap usage settles down, but the overall JVM footprint just continues to grow. The thread count is very stable and maxes out at 370 threads with a thread stack size of 128 KB.
The footprint of the JVM reaches 3 GB, then it dies with:
java.lang.OutOfMemoryError: requested 32756 bytes for ChunkPool::allocate. Out of swap space?
Internal Error (allocation.cpp:117), pid=8443, tid=1667668880
Error: ChunkPool::allocate
Current JVM memory args are:
-Xms1024m -Xmx1024m -XX:MaxPermSize=256m -XX:ThreadStackSize=128
Given these settings (1024 MB heap + 256 MB PermGen + roughly 370 threads x 128 KB ≈ 46 MB of stacks, plus JVM overhead such as the code cache), I would expect the process footprint to settle in at around 1.5 GB. Instead, it just keeps growing until it hits 3 GB.
It seems none of the standard Java memory tools (Eclipse MAT, jmap, etc.) can tell me what on the native side of the JVM is eating all this memory. pmap on the PID just gives me a bunch of [ anon ] allocations, which don't really help much. This memory problem occurs even though, as far as I can tell, I have no JNI nor java.nio classes loaded.
How can I troubleshoot the native/internal side of the JVM to find out where all the non-heap memory is going?
Thank you! I am rapidly running out of ideas and restarting the app servers every 8 hours is not going to be a very good solution.
As @Thorbjørn suggested, profile your application.
If you need more memory, you could go for a 64-bit kernel and JVM.
Attach with jvisualvm (included in the JDK) to get an idea of what is going on; jvisualvm can attach to a running process.
Walton:
I had a similar issue and posted my question/findings in https://community.jboss.org/thread/152698.
Please try adding -Djboss.vfs.forceCopy=false to the Java startup parameters to see if it helps.
Warning: even if it cuts down the process size, you need to test further to make sure everything is all right.
