Java process getting killed, likely due to the Linux OOM killer

My Java process is getting killed after some time. The heap settings are 2 GB minimum and 3 GB maximum, with the parallel GC. The pmap command shows more than forty 64 MB anonymous blocks, which seem to be what is triggering the Linux OOM killer.
Error:
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 71827456 bytes for committing reserved memory.
Possible reasons:
  The system is out of physical RAM or swap space
  In 32 bit mode, the process size limit was hit
Possible solutions:
  Reduce memory load on the system
  Increase physical memory or swap space
  Check if swap backing store is full
  Use 64 bit Java on a 64 bit OS
  Decrease Java heap size (-Xmx/-Xms)
  Decrease number of Java threads
  Decrease Java thread stack sizes (-Xss)
  Set larger code cache with -XX:ReservedCodeCacheSize=
This output file may be truncated or incomplete.
Out of Memory Error (os_linux.cpp:2673), pid=21171, tid=140547280430848
JRE version: Java(TM) SE Runtime Environment (8.0_51-b16) (build 1.8.0_51-b16)
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.51-b03 mixed mode linux-amd64 compressed oops)
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
When we tried reducing the heap to 512 MB minimum and 2 GB maximum, along with G1GC, we saw a limited number of 64 MB blocks (around 18) and the process did not get killed.
But with the heap at 2 GB minimum and 3 GB maximum, and G1GC, we see a high number of 64 MB blocks.
As per the documentation, the maximum number of 64 MB blocks (malloc arenas) for a 64-bit system with 2 cores should be 2 * 8 = 16, but we see more than 16.

This answer tries to deal with your observations about the memory blocks, MALLOC_ARENA_MAX and so on. I'm not an expert on native memory allocators; this is based on the Malloc Internals page in the Glibc Wiki.
You have read PrestoDB issue 8993 as implying that glibc malloc will allocate at most MALLOC_ARENA_MAX x number_of_threads blocks of memory for the native heap. According to "Malloc Internals", this is not necessarily true.
If the application requests a large enough block, the implementation will call mmap directly rather than using an arena. (The threshold is given by the M_MMAP_THRESHOLD option.)
If an existing arena fills up and compaction fails, the implementation will attempt to grow the arena by calling sbrk or mmap.
These factors mean that MALLOC_ARENA_MAX does not limit the number of mmap'd blocks.
Note that the purpose of arenas is to reduce contention when there are lots of threads calling malloc and free. But it comes with the risk that more memory will be lost due to fragmentation. The goal of MALLOC_ARENA_MAX tuning is to reduce memory fragmentation.
So far, you haven't shown us any clear evidence that your memory problems are due to fragmentation. Other possible explanations are:
your application has a native memory leak, or
your application is simply using a lot of native memory.
Either way, it looks like MALLOC_ARENA_MAX tuning has not helped.
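If you want to poke at the arena behaviour yourself, a rough probe like the one below can help. It is only a sketch: the class name, thread count and buffer sizes are invented for illustration, direct ByteBuffer allocation is assumed to reach native malloc, and whether you actually see extra 64 MB regions in pmap depends on your glibc version and its thresholds.

import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ArenaProbe {
    // Strong references so the native memory is not freed while pmap is running.
    static final List<ByteBuffer> RETAINED = new ArrayList<>();

    public static void main(String[] args) throws Exception {
        int threads = 32;                            // arbitrary; more threads, more potential arenas
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try { start.await(); } catch (InterruptedException e) { return; }
                for (int j = 0; j < 64; j++) {
                    // Small direct allocations go through native malloc; concurrent
                    // calls from many threads are what can trigger extra arenas.
                    ByteBuffer b = ByteBuffer.allocateDirect(32 * 1024);
                    b.put(0, (byte) 1);
                    synchronized (RETAINED) { RETAINED.add(b); }
                }
            }).start();
        }
        start.countDown();                           // release all threads at once
        String pid = ManagementFactory.getRuntimeMXBean().getName().split("@")[0]; // "pid@host" on HotSpot
        System.out.println("Now inspect with: pmap -x " + pid);
        Thread.sleep(600_000);                       // keep the process alive for inspection
    }
}

Running it once with the default environment and once with MALLOC_ARENA_MAX exported to a small value, and counting the 64 MB anonymous regions in pmap each time, is one way to tell whether arenas are even part of your picture.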

That doesn't look like the Linux OOM killer.
The symptoms you describe indicate that you have run out of physical memory and swap space. In fact, the error message says exactly that:
There is insufficient memory for the Java Runtime Environment to continue. Native memory allocation (mmap) failed to map 71827456 bytes for committing reserved memory. Possible reasons:
The system is out of physical RAM or swap space
In 32 bit mode, the process size limit was hit
A virtual memory system works by mapping the virtual address space to a combination of physical RAM pages and disk pages. At any given time, a given page may live in RAM or on disk. If an application asks for more virtual memory (e.g. using an mmap call), the OS may have to say "can't". That is what has happened here.
The solutions are as the message says:
get more RAM,
increase the size of swap space, or
limit the amount of memory that the application asks for ... in various ways.
The G1GC parameters (apart from the max heap size) are largely irrelevant. My understanding is that the max heap size is the total amount of (virtual) memory that the Java heap is allowed to occupy.
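If it helps, you can ask a running JVM what it believes its heap ceiling is versus what it has actually claimed so far. This is just an illustrative snippet (the class name is made up); maxMemory() roughly reflects -Xmx, while totalMemory() is what the heap currently occupies.

public class HeapLimits {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() ~ the -Xmx ceiling; totalMemory() = heap currently claimed
        // from the OS; freeMemory() = the unused part of the claimed heap.
        System.out.printf("max=%d MB, committed=%d MB, free=%d MB%n",
                rt.maxMemory() >> 20, rt.totalMemory() >> 20, rt.freeMemory() >> 20);
    }
}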
So if this is not the Linux OOM killer, what is the OOM killer, then?
In fact, the OOM killer is a mechanism that identifies applications that are causing dangerous performance problems by doing too much paging. As I mentioned at the start, virtual memory consists of pages that live either in RAM or on disk. In general, the application doesn't know whether any given VM page is RAM-resident or not. The operating system just takes care of it.
If the application tries to use (read from or write to) a page that is not RAM resident, a "page fault" occurs. The OS handles this by:
suspending the application thread
finding a spare RAM page
reading the disk page into the spare RAM page
resuming the application thread ... which can then access the memory at the address.
In addition, the operating system needs to maintain a pool of "clean" pages; i.e. pages where the RAM and disk versions are the same. This is done by scanning for pages that have been modified by the application and writing them to disk.
If an application is behaving "nicely", then the amount of paging activity is relatively modest, and threads don't get suspended often. But if there is a lot of paging, you can get to the point where the paging I/O is a bottleneck. In the worst case, the whole system will lock up.
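To make the page-fault mechanism concrete, here is a small sketch (the file path, file size and the 4 KB page-size assumption are all arbitrary) that maps a large sparse file and touches every page; each first touch is a page fault that the OS services transparently while the thread is suspended.

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class PageFaultDemo {
    public static void main(String[] args) throws Exception {
        long size = 1L << 30;                         // 1 GiB of virtual address space
        try (RandomAccessFile f = new RandomAccessFile("/tmp/pagefault.demo", "rw")) {
            f.setLength(size);                        // sparse file; nothing is RAM-resident yet
            MappedByteBuffer map = f.getChannel()
                    .map(FileChannel.MapMode.READ_WRITE, 0, size);
            long t0 = System.nanoTime();
            for (long i = 0; i < size; i += 4096) {
                map.put((int) i, (byte) 1);           // first touch of each page causes a fault
            }
            System.out.printf("touched %d pages in %d ms%n",
                    size / 4096, (System.nanoTime() - t0) / 1_000_000);
        }
    }
}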
The OOM killer's purpose is to identify processes that are causing the dangerously high paging rates, and ... kill them.
If a JVM process is killed by the OOM killer, it doesn't get a chance to print an error message (like the one you got). The process gets a "SIGKILL": instant death.
But ... if you look in the system logfiles, you should see a message that says that such and such process has been killed by the OOM killer.
There are lots of resources that explain the OOM killer:
What killed my process and why?
"How to configure the Linux Out-ofMemory Killer"
"Out of memory management"

OutOfMemoryException in Java process, but Used Heap is about half of Used Size

We are running a process that has a cache that consumes a lot of memory.
But the number of objects in that cache stays stable during execution, while memory usage grows without limit.
We have run Java Flight Recorder to try to work out what is happening.
In that report, we can see that UsedHeap is about half of UsedSize, and I cannot find any explanation for that.
The JVM exits and dumps an OutOfMemory report that you can find here:
https://frojasg1.com/stackOverflow/20210423.outOfMemory/hs_err_pid26210.log
Here is the whole Java Flight Recorder report:
https://frojasg1.com/stackOverflow/20210423.outOfMemory/test.7z
Does anybody know why this OutOfMemory error is arising?
Maybe I should change the question ... and ask: why are there almost 10 GB of used memory that is not used in the heap?
The log file says this:
# Native memory allocation (mmap) failed to map 520093696 bytes for committing reserved memory.
So what has happened is that the JVM has requested a ~500MB chunk of memory from the OS via an mmap system call and the OS has refused.
When I looked at more of the log file, it is clear that G1GC itself is requesting more memory, and it looks like it is doing so while trying to expand the heap [1].
I can think of a couple of possible reasons for the mmap failure:
The OS may be out of swap space to back the memory allocation.
Your JVM may have hit the per-process memory limit. (On UNIX / Linux this is implemented as a ulimit.)
If your JVM is running in a Docker (or similar) container, you may have exceeded the container's memory limit.
This is not a "normal" OOME. It is actually a mismatch between the memory demands of the JVM and what is available from the OS.
It can be addressed at the OS level; i.e. by removing or increasing the limit, or adding more swap space (or possibly more RAM).
It could also be addressed by reducing the JVM's maximum heap size. This will stop the GC from trying to expand the heap to an unsustainable size [2]. Doing this may also result in the GC running more often, but that is better than the application dying prematurely from an avoidable OOME.
[1] Someone with more experience in G1GC diagnosis may be able to discern more from the crash dump, but it looks like normal heap expansion behavior to me. There is no obvious sign of a "huge" object being created.
[2] Working out what the sustainable size actually is would involve analyzing the memory usage for the entire system, and looking at the available RAM and swap resources and the limits. That is a system administration problem, not a programming problem.
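As a starting point for that system-level check, the JVM itself can report the host's RAM and swap figures through the com.sun.management extension of OperatingSystemMXBean (available on HotSpot; the cast below will fail on JVMs that don't expose it, and container limits are not reflected in these numbers):

import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

public class HostMemoryCheck {
    public static void main(String[] args) {
        // Cast to the com.sun.management interface to get the extended counters.
        OperatingSystemMXBean os = (OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        System.out.printf("physical: %d of %d MB free%n",
                os.getFreePhysicalMemorySize() >> 20, os.getTotalPhysicalMemorySize() >> 20);
        System.out.printf("swap:     %d of %d MB free%n",
                os.getFreeSwapSpaceSize() >> 20, os.getTotalSwapSpaceSize() >> 20);
    }
}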
Maybe I should change the question ... and ask: why are there almost 10 GB of used memory that is not used in the heap?
What you are seeing is the difference between the memory that is currently allocated to the heap and the heap limit that you have set. The JVM doesn't actually request all of the heap memory from the OS up front. Instead, it requests more memory incrementally ... if required ... at the end of a major GC run.
So while the total heap size appears to be ~24GB, the actual memory allocated is substantially less than that.
Normally, that is fine. The GC asks the OS for more memory and adds it to the relevant pools for the memory allocators to use. But in this case, the OS cannot oblige, and G1GC pulls the plug.
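You can watch this incremental behaviour from inside the JVM. A minimal sketch (the class name and allocation sizes are invented; run it with modest values such as -Xms256m -Xmx4g, which are only examples) prints the heap's used, committed and max figures as it retains more and more data:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapGrowthDemo {
    public static void main(String[] args) {
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            retained.add(new byte[64 * 1024 * 1024]);   // hold on to another 64 MB
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("used=%d MB committed=%d MB max=%d MB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        }
    }
}

The committed figure climbs toward max only as the allocations demand it; it is that climb that fails with the mmap error when the OS cannot supply the next chunk.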

What happens to java process if the physical memory is very low on system

I have a Java process running some tasks. After a couple of hours, multiple other applications are opened on the system, leaving very little physical memory available.
So, if the system has no physical memory (or very little) left, how would my Java process respond to such a situation? Would it throw an 'out of memory' exception?
When RAM is exhausted the OS will usually use swap or pagefile to provide virtual memory:
RAM is a limited resource, whereas for most practical purposes, virtual memory is unlimited. There can be many processes, and each process has its own 2 GB of private virtual address space. When the memory being used by all the existing processes exceeds the available RAM, the operating system moves pages (4-KB pieces) of one or more virtual address spaces to the computer’s hard disk. This frees that RAM frame for other uses. In Windows systems, these “paged out” pages are stored in one or more files (Pagefile.sys files) in the root of a partition.
Paging usually results in a severe performance penalty because even modern SSD storage is not as fast as RAM. If the memory scarcity continues, the system may start thrashing.
Assuming that the JVM does not require more memory, e.g. it is already constrained by -Xmx and has allocated all the memory it is allowed, it will continue to run. Usually, when memory is exhausted, the OS will not allow new processes to start; e.g. attempting to start a new JVM process will result in the following error:
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
At the end of the day it depends on the OS configuration. Usually this situation is not something you want to spend time investigating, because RAM is much cheaper than a developer's time.

Java 8 JVM hangs, but does not crash/ heap dump when out of memory

When running out of memory, Java 8 running Tomcat 8 never stops after a heap dump. Instead, it just hangs as it maxes out memory. The server becomes very slow and unresponsive because of extensive GC as it slowly approaches the maximum memory. The memory graph in JConsole flatlines after hitting the maximum. 64-bit Linux, Java version "1.8.0_102", Tomcat 8.
I have set -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath. Does anyone know how to force a heap dump instead of the JVM getting into an unresponsive / very-slow-response mode?
Does anyone know how to force a heap dump instead of the JVM getting into an unresponsive / very-slow-response mode?
You need to use -XX:+UseGCOverheadLimit. This tells the GC to throw an OOME (or dump the heap, if you have configured that) when the percentage of time spent garbage collecting gets too high. This should be enabled by default on a recent JVM ... but you might have disabled it.
You can adjust the "overhead" thresholds at which the collector gives up using -XX:GCTimeLimit=... and -XX:GCHeapFreeLimit=...; see https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc-ergonomics.html
The effect of the "overhead" limits is that your application gets the GC failures earlier. Hopefully, this avoids the "death spiral" effect where the GC uses a larger and larger proportion of time to collect smaller and smaller amounts of actual garbage.
The other possibility is that your JVM is taking a very long time to dump the heap. That might occur if the real problem is that your JVM is causing virtual memory thrashing because Java's memory usage is significantly greater than the amount of physical memory.
jmap is the utility that will create a heap dump for any running JVM. This will allow you to create a heap dump before a crash:
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr014.html
It will be a matter of timing, though, to know when you should create it. You can take subsequent heap dumps and use tools to compare them. I highly recommend the Eclipse Memory Analyzer Tool (MAT) and its dominator tree view to identify potential memory issues (https://www.eclipse.org/mat/).
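If the timing is the hard part, you can also trigger the dump from inside the JVM via the HotSpotDiagnosticMXBean, for example from a small monitoring thread. The 90% threshold, the polling interval and the output path below are arbitrary choices for illustration:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapDumpOnThreshold {
    public static void main(String[] args) throws Exception {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        while (true) {
            MemoryUsage heap = mem.getHeapMemoryUsage();
            if (heap.getUsed() > 0.9 * heap.getMax()) {       // 90% is an arbitrary threshold
                diag.dumpHeap("/tmp/threshold.hprof", true);  // true = dump only live objects
                System.out.println("heap dump written to /tmp/threshold.hprof");
                break;
            }
            Thread.sleep(5_000);                              // poll every 5 seconds
        }
    }
}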

How does the JVM incur paging in/out even though I limit the maximum heap space size?

I am a newbie in Java.
I run an application on top of a distributed framework implemented in Java.
The application is a disk and network I/O intensive job.
Each machine has 32 GB of memory. I run 4 workers per machine and assign a 7 GB maximum heap to each of them. So in total, 28 GB of memory is reserved for the JVMs. The remaining 4 GB is left for the OS (CentOS 7). There are no heavy programs running concurrently.
Surprisingly, when I monitor system resource usage with dstat, significant amounts of paging are occurring.
How can that be possible? I restricted the memory usage of the JVMs!
I appreciate your help, thanks.
The JVM does not page out memory. The operating system does. How and when the OS chooses which pages to evict depends on configuration.
And setting -Xmx only configures the upper limit for the managed heap within the JVM. It does not restrict file mappings, direct memory allocations, native libraries or the page caches kept in memory whenever you do IO.
So you have not really "reserved" 28GB for JVMs, because the OS knows nothing about that and the JVMs know nothing of the other JVMs.
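A small experiment makes the last point visible. The sketch below (the class name, sizes and the suggested flags -Xmx256m -XX:MaxDirectMemorySize=4g are all just illustrative values) keeps the managed heap tiny while the process's resident and virtual size, as shown by top or pmap, keep growing, because direct buffers live outside the heap that -Xmx controls:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class OffHeapDemo {
    public static void main(String[] args) throws Exception {
        List<ByteBuffer> offHeap = new ArrayList<>();
        for (int i = 0; i < 48; i++) {
            ByteBuffer b = ByteBuffer.allocateDirect(64 * 1024 * 1024);  // 64 MB outside the heap
            b.put(0, (byte) 1);                                          // touch it so it is really committed
            offHeap.add(b);
            long heapUsed = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
            System.out.printf("direct=%d MB, heap used=%d MB%n", (i + 1) * 64L, heapUsed >> 20);
        }
        Thread.sleep(300_000);   // keep the process alive so it can be inspected with top/pmap
    }
}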

Java virtual memory size larger than requested (or required)

I am running a number of jobs on a computing cluster, and they are killed when they go over a requested resource usage - one of these resources is virtual memory size.
In my Java startup command I use -Xmx8000m to request a maximum heap size of 8 GB. I have not yet seen my program's real memory usage go above 4 GB, but wanted to be on the safe side.
However, when I use the top command I see a virtual memory size for my Java process of 12 GB - which is right at the limit of the requested virtual memory space. I can't increase my requested VM size, as the jobs are already submitted, and the more I ask for the longer they take to be scheduled.
Does Java consistently request more virtual memory than is specified? Is this a constant amount, a constant percentage, or random? Can the heap space grow above (a) the requested heap size (8 GB) or (b) the allocated virtual memory size (12 GB)?
Edit:
Using jre-1.7.0-openjdk on Linux
This article gives a good analysis of the problem: Why does my Java process consume more memory than Xmx? Its author offers this approximate formula:
Max memory = [-Xmx] + [-XX:MaxPermSize] + number_of_threads * [-Xss]
But besides the memory consumed by your application, the JVM itself
also needs some elbow room.
- Garbage collection.
- JIT optimization.
- Off-heap allocations.
- JNI code.
- Metaspace.
But be careful, as it may depend on both the platform and the JVM vendor/version.
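As a purely illustrative calculation (the thread count and flag values here are invented, not taken from your setup), the formula already lands well above the -Xmx figure on its own:

8000 MB (-Xmx) + 256 MB (-XX:MaxPermSize) + 200 threads * 1 MB (-Xss) = 8456 MB, roughly 8.3 GB, before counting GC structures, JIT code, off-heap allocations and the other items listed above.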
This could be due to the change in malloc behavior in glibc 2.10+, where malloc now creates per-thread memory pools (arenas). The arena size on 64-bit is 64MB. After using 8 arenas on 64-bit, malloc sets the number of arenas to be number_of_cpus * 8. So if you are using a machine with many processor cores, the virtual size is set to a large amount very quickly, even though the actual memory used (resident size) is much smaller.
Since you are seeing top report a 12 GB virtual size, you are probably using a 64-bit machine with 24 cores or hardware threads, giving 24 * 8 * 64 MB = 12 GB. The amount of virtual memory allocated varies with the number of cores on the machine your job happens to run on, so this check is not meaningful.
If you are using Hadoop or YARN and get this warning, set yarn.nodemanager.vmem-check-enabled in yarn-site.xml to false.
References:
See #6 on this page:
http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
which links to more in-depth discussion on this page:
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage
Note this is already partially answered on this Stack Overflow page:
Container is running beyond memory limits
If you are really keen to investigate the issue and you are on Linux, then check the PID of your JVM process and view the file /proc/<PID>/smaps. There you will find the whole OS process memory map, as seen by the kernel. You will see how much heap (the OS heap) the process uses, which memory regions are mapped from files (libraries), etc.
PS: you can also find various tools on the Internet to analyze the smaps file.
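For a quick summary rather than reading the raw file, a few lines of Java can add up the resident-set figures in smaps. This sketch reads the current process's own map (/proc/self/smaps); for another process, substitute its /proc/<PID>/smaps and run with sufficient privileges:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class SmapsSummary {
    public static void main(String[] args) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get("/proc/self/smaps"))) {
            // Each mapping has an "Rss: <n> kB" line; sum them for the whole process.
            long rssKb = lines.filter(l -> l.startsWith("Rss:"))
                    .mapToLong(l -> Long.parseLong(l.replaceAll("[^0-9]", "")))
                    .sum();
            System.out.printf("resident set of this JVM: %d MB%n", rssKb / 1024);
        }
    }
}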
