Why does a Java container in Kubernetes take more memory than its limits? - java

I am running different Java containers in Kubernetes with OpenJDK 11.0.8, Payara Micro, and WildFly 20. Resources are defined as follows:
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"
OpenJDK is running with default memory settings which means a Heap Ratio of 25%.
So I assume that 1Gi is the upper limit of memory the container can consume. But after some days this upper limit is exceeded, as the container's memory consumption increases slowly but steadily. In particular, it is the non-heap memory that is increasing, not the heap memory.
So I have two questions: why is the non-heap memory increasing, and why do the container/JDK/GC ignore the container memory limit?
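(For reference, the heap ceiling the JVM derives from the cgroup limit can be confirmed from inside the container with a minimal sketch like the following, which is my own check and not part of the deployment; with a 1Gi limit and the default 25% ratio it should print roughly 256 MB, matching the -XX:MaxHeapSize value shown further below.)
public class MemCheck {
    public static void main(String[] args) {
        // Prints the max heap size the JVM computed from the container limit.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxHeapMb + " MB");
    }
}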
Example Measurement
This is some example data about the pod and the memory consumption to illustrate the situation.
POD information (I skipped irrelevant data here):
$ kubectl describe pod app-5df66f48b8-4djbs -n my-namespace
Containers:
  app:
    ...
    State:          Running
      Started:      Sun, 27 Sep 2020 13:06:44 +0200
    ...
    Limits:
      memory:  1Gi
    Requests:
      memory:  512Mi
    ...
QoS Class:       Burstable
Check Memory usage with kubectl top:
$ kubectl top pod app-5df66f48b8-4djbs -n my-namespace
NAME                   CPU(cores)   MEMORY(bytes)
app-587894cd8c-dpldq   72m          1218Mi
Checking memory limit inside the pod:
[root@app-5df66f48b8-4djbs jboss]# cat /sys/fs/cgroup/memory/memory.limit_in_bytes
1073741824
VM Flags inside the pod/jvm:
[root@app-5df66f48b8-4djbs jboss]# jinfo -flags <PID>
VM Flags:
-XX:CICompilerCount=2 -XX:InitialHeapSize=16777216 -XX:MaxHeapSize=268435456 -XX:MaxNewSize=89456640 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=5570560 -XX:NonNMethodCodeHeapSize=5825164 -XX:NonProfiledCodeHeapSize=122916538 -XX:OldSize=11206656 -XX:ProfiledCodeHeapSize=122916538 -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseSerialGC
Heap Info inside the POD/jvm:
[root@app-5df66f48b8-4djbs jboss]# jcmd <PID> GC.heap_info
68:
def new generation total 78656K, used 59229K [0x00000000f0000000, 0x00000000f5550000, 0x00000000f5550000)
eden space 69952K, 77% used [0x00000000f0000000, 0x00000000f34bfa10, 0x00000000f4450000)
from space 8704K, 59% used [0x00000000f4cd0000, 0x00000000f51e7ba8, 0x00000000f5550000)
to space 8704K, 0% used [0x00000000f4450000, 0x00000000f4450000, 0x00000000f4cd0000)
tenured generation total 174784K, used 151511K [0x00000000f5550000, 0x0000000100000000, 0x0000000100000000)
the space 174784K, 86% used [0x00000000f5550000, 0x00000000fe945e58, 0x00000000fe946000, 0x0000000100000000)
Metaspace used 122497K, capacity 134911K, committed 135784K, reserved 1165312K
class space used 15455K, capacity 19491K, committed 19712K, reserved 1048576K
VM Metaspace Info:
$ jcmd 68 VM.metaspace
68:
Total Usage - 1732 loaders, 24910 classes (1180 shared):
Non-Class: 5060 chunks, 113.14 MB capacity, 104.91 MB ( 93%) used, 7.92 MB ( 7%) free, 5.86 KB ( <1%) waste, 316.25 KB ( <1%) overhead, deallocated: 4874 blocks with 1.38 MB
Class: 2773 chunks, 19.04 MB capacity, 15.11 MB ( 79%) used, 3.77 MB ( 20%) free, 256 bytes ( <1%) waste, 173.31 KB ( <1%) overhead, deallocated: 1040 blocks with 412.14 KB
Both: 7833 chunks, 132.18 MB capacity, 120.01 MB ( 91%) used, 11.69 MB ( 9%) free, 6.11 KB ( <1%) waste, 489.56 KB ( <1%) overhead, deallocated: 5914 blocks with 1.78 MB
Virtual space:
Non-class space: 114.00 MB reserved, 113.60 MB (>99%) committed
Class space: 1.00 GB reserved, 19.25 MB ( 2%) committed
Both: 1.11 GB reserved, 132.85 MB ( 12%) committed
Chunk freelists:
Non-Class:
specialized chunks: 43, capacity 43.00 KB
small chunks: 92, capacity 368.00 KB
medium chunks: (none)
humongous chunks: (none)
Total: 135, capacity=411.00 KB
Class:
specialized chunks: 18, capacity 18.00 KB
small chunks: 64, capacity 128.00 KB
medium chunks: (none)
humongous chunks: (none)
Total: 82, capacity=146.00 KB
Waste (percentages refer to total committed size 132.85 MB):
Committed unused: 128.00 KB ( <1%)
Waste in chunks in use: 6.11 KB ( <1%)
Free in chunks in use: 11.69 MB ( 9%)
Overhead in chunks in use: 489.56 KB ( <1%)
In free chunks: 557.00 KB ( <1%)
Deallocated from chunks in use: 1.78 MB ( 1%) (5914 blocks)
-total-: 14.62 MB ( 11%)
MaxMetaspaceSize: unlimited
CompressedClassSpaceSize: 1.00 GB
InitialBootClassLoaderMetaspaceSize: 4.00 MB
Answer
The reason for the high values was incorrect metric data reported by kube-prometheus. After uninstalling kube-prometheus and installing metrics-server instead, all data is displayed correctly by kubectl top; it now shows the same values as docker stats. I do not know why kube-prometheus computed wrong data; in fact, it was reporting double the actual values for all memory data. I will investigate this issue.
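For anyone verifying which source to trust, the numbers can be cross-checked directly (the pod name follows the examples above; the container ID is whatever docker ps reports):
$ kubectl top pod app-5df66f48b8-4djbs -n my-namespace
$ docker stats --no-stream <container-id>
$ cat /sys/fs/cgroup/memory/memory.usage_in_bytes    # run inside the pod
If kubectl top disagrees with the other two, the metrics pipeline rather than the JVM is the suspect.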

Related

JAXB unmarshalling a 2.7GB XML file

I have an Adobe Premiere Pro project file that I've unzipped, so it is now an XML file detailing my film edit project.
The file is now 2.7GB...
I created an XSD schema from the XML file with IntelliJ's function for this, and then, using IntelliJ's JAXB tools, generated a package full of classes representing the XML hierarchy from the .xsd.
My code for unmarshalling is as follows:
import mypackage.PremiereDataType;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import java.io.File;
public class Main {
    public static void main(String[] args) throws JAXBException {
        JAXBContext jaxbContext = JAXBContext.newInstance("mypackage");
        Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
        JAXBElement<PremiereDataType> premiereDataJAXB =
                (JAXBElement<PremiereDataType>) unmarshaller.unmarshal(
                        new File("20200108_WLM_MW_v706_JGal_review_R2_markers_CONSOLIDATE.xml"));
        System.out.println(premiereDataJAXB);
    }
}
My machine has 32gb of Ram.
I'm not up to scratch on how to fine-tune memory usage in IntelliJ, but I suspect I'm running out of memory?
After fifteen minutes or so the program crashes with an EXCEPTION_ACCESS_VIOLATION.
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000713e4f85, pid=27780, tid=0x000000000000394c
#
# JRE version: Java(TM) SE Runtime Environment (8.0_251-b08) (build 1.8.0_251-b08)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.251-b08 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# V [jvm.dll+0x104f85]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\malcolm\IdeaProjects\pprojConsolidator\hs_err_pid27780.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
Process finished with exit code 1
The log file for the crash is below (truncated so this question doesn't violate the character limit).
Would the lines indicating that "object space" has reached "99% used" be an indication of the problem?
Can I increase a limit somewhere so there is more object space?
Any help getting a handle on this would be greatly appreciated!
Thanks!
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000713e4f85, pid=27780, tid=0x000000000000394c
#
# JRE version: Java(TM) SE Runtime Environment (8.0_251-b08) (build 1.8.0_251-b08)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.251-b08 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# V [jvm.dll+0x104f85]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
--------------- T H R E A D ---------------
Current thread (0x0000000002eec800): GCTaskThread [stack: 0x0000000015020000,0x0000000015120000] [id=14668]
siginfo: ExceptionCode=0xc0000005, reading address 0xffffffffffffffff
Registers:
RAX=0x000000001d4b0000, RBX=0x0000000071b0af90, RCX=0x0000000000000001, RDX=0x1fffffffe2399d81
RSP=0x000000001511f820, RBP=0x00000000292d0eb0, RSI=0xffffffffda000000, RDI=0x0000000000000001
R8 =0x007fffffff88e676, R9 =0x1fffffffffffffff, R10=0x00000000000024f7, R11=0x0912412248244900
R12=0x00000006138cec10, R13=0x000000000000000a, R14=0x00000006138cebd8, R15=0x000000000000000a
RIP=0x00000000713e4f85, EFLAGS=0x0000000000010202
Top of Stack: (sp=0x000000001511f820)
0x000000001511f820: 00000005c35048e8 0000000000000003
0x000000001511f830: 00000005c35048f0 00000000713f5269
0x000000001511f840: 00000006038cec10 00000000716dc62b
0x000000001511f850: 0000000071b0af90 ffffffffda000000
0x000000001511f860: 00000000292d0eb0 000000007141b6f3
0x000000001511f870: ffffffffda000000 00000000713ed2fc
0x000000001511f880: 00000006038cec10 00000006138cebec
0x000000001511f890: 00000006138cebe8 0000000000000004
0x000000001511f8a0: 00000006038cec10 000000007141b5ea
0x000000001511f8b0: 15c5cf35000024f7 00000000713f54a6
0x000000001511f8c0: 00000006138cebd8 00000006138cebd4
0x000000001511f8d0: 00000000292d0eb0 000000007141b6f3
0x000000001511f8e0: 0000000000000000 0000000002ee9d00
0x000000001511f8f0: 0000000002eceb90 0000000000000000
0x000000001511f900: 15c5cf35000024f7 00000000716dff80
0x000000001511f910: 00000000292d0eb0 15c5cf36000024f7
Instructions: (pc=0x00000000713e4f85)
0x00000000713e4f65: 48 89 74 24 10 57 48 83 ec 20 48 8b 41 20 4c 8b
0x00000000713e4f75: c2 0f b6 ca 49 c1 e8 06 80 e1 3f bf 01 00 00 00
0x00000000713e4f85: 4a 8b 1c c0 4a 8d 34 c0 48 d3 e7 48 8b c3 48 0b
0x00000000713e4f95: c7 48 3b c3 74 27 0f 1f 44 00 00 4c 8b c3 48 8b
Register to memory mapping:
RAX=0x000000001d4b0000 is an unknown value
RBX=0x0000000071b0af90 is an unknown value
RCX=0x0000000000000001 is an unknown value
RDX=0x1fffffffe2399d81 is an unknown value
RSP=0x000000001511f820 is an unknown value
RBP=0x00000000292d0eb0 is an unknown value
RSI=0xffffffffda000000 is an unknown value
RDI=0x0000000000000001 is an unknown value
R8 =0x007fffffff88e676 is an unknown value
R9 =0x1fffffffffffffff is an unknown value
R10=0x00000000000024f7 is an unknown value
R11=0x0912412248244900 is an unknown value
R12=0x00000006138cec10 is an oop
javax.xml.bind.JAXBElement
- klass: 'javax/xml/bind/JAXBElement'
R13=0x000000000000000a is an unknown value
R14=0x00000006138cebd8 is an oop
[Ljava.lang.Object;
- klass: 'java/lang/Object'[]
- length: 10
R15=0x000000000000000a is an unknown value
Stack: [0x0000000015020000,0x0000000015120000], sp=0x000000001511f820, free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [jvm.dll+0x104f85]
V [jvm.dll+0x3fc62b]
V [jvm.dll+0x10d2fc]
V [jvm.dll+0x13b5ea]
V [jvm.dll+0x3fff80]
V [jvm.dll+0x3fd8ff]
V [jvm.dll+0x3fad6e]
V [jvm.dll+0x2a001a]
C [msvcr100.dll+0x21d9f]
C [msvcr100.dll+0x21e3b]
C [KERNEL32.DLL+0x17bd4]
C [ntdll.dll+0x6ce51]
--------------- P R O C E S S ---------------
[..]
Other Threads:
0x0000000026ec6000 VMThread [stack: 0x000000002bfb0000,0x000000002c0b0000] [id=25136]
0x00000000295dc800 WatcherThread [stack: 0x000000002d7b0000,0x000000002d8b0000] [id=29304]
=>0x0000000002eec800 (exited) GCTaskThread [stack: 0x0000000015020000,0x0000000015120000] [id=14668]
VM state:at safepoint (normal execution)
VM Mutex/Monitor currently owned by a thread: ([mutex/lock_event])
[0x0000000002eb0d50] Threads_lock - owner thread: 0x0000000026ec6000
[0x0000000002eb0b50] Heap_lock - owner thread: 0x0000000002eb3800
heap address: 0x00000005c1c00000, size: 8164 MB, Compressed Oops mode: Zero based, Oop shift amount: 3
Narrow klass base: 0x0000000000000000, Narrow klass shift: 3
Compressed class space size: 1073741824 Address: 0x00000007c0000000
Heap:
PSYoungGen total 1679360K, used 750592K [0x0000000715f00000, 0x00000007ba300000, 0x00000007c0000000)
eden space 750592K, 100% used [0x0000000715f00000,0x0000000743c00000,0x0000000743c00000)
from space 928768K, 0% used [0x0000000743c00000,0x0000000743c00000,0x000000077c700000)
to space 928768K, 0% used [0x0000000781800000,0x0000000781800000,0x00000007ba300000)
ParOldGen total 5573632K, used 5573217K [0x00000005c1c00000, 0x0000000715f00000, 0x0000000715f00000)
object space 5573632K, 99% used [0x00000005c1c00000,0x0000000715e98680,0x0000000715f00000)
Metaspace used 14801K, capacity 14959K, committed 16000K, reserved 1062912K
class space used 1695K, capacity 1741K, committed 1920K, reserved 1048576K
Card table byte_map: [0x0000000012370000,0x0000000013370000] byte_map_base: 0x000000000f562000
Marking Bits: (ParMarkBitMap*) 0x0000000071b0af90
Begin Bits: [0x0000000015520000, 0x000000001d4b0000)
End Bits: [0x000000001d4b0000, 0x0000000025440000)
Polling page: 0x0000000000e50000
CodeCache: size=245760Kb used=12460Kb max_used=12807Kb free=233299Kb
bounds [0x0000000002fb0000, 0x0000000003c50000, 0x0000000011fb0000]
total_blobs=3495 nmethods=3202 adapters=201
compilation: enabled
Compilation events (10 events):
Event: 1014.137 Thread 0x00000000294db800 3322 4 java.lang.reflect.Constructor::newInstance (87 bytes)
Event: 1014.137 Thread 0x000000002953e000 3323 % ! 4 com.sun.xml.internal.bind.v2.runtime.unmarshaller.StructureLoader::startElement # 136 (433 bytes)
Event: 1014.137 Thread 0x0000000029541000 3324 4 sun.reflect.DelegatingConstructorAccessorImpl::newInstance (9 bytes)
Event: 1014.138 Thread 0x0000000029541000 nmethod 3324 0x0000000003b6c990 code [0x0000000003b6cac0, 0x0000000003b6cb58]
Event: 1014.138 Thread 0x000000002953f800 3325 ! 3 sun.reflect.GeneratedConstructorAccessor152::newInstance (49 bytes)
Event: 1014.138 Thread 0x00000000294d5800 3326 ! 4 com.sun.xml.internal.bind.v2.runtime.ClassBeanInfoImpl::createInstance (91 bytes)
Event: 1020.495 Thread 0x000000002953f800 nmethod 3325 0x000000000331e7d0 code [0x000000000331e9a0, 0x000000000331f018]
Event: 1026.864 Thread 0x00000000294db800 nmethod 3322 0x000000000339b3d0 code [0x000000000339b560, 0x000000000339b870]
Event: 1026.864 Thread 0x000000002953c000 3327 ! 3 sun.reflect.GeneratedConstructorAccessor153::newInstance (49 bytes)
Event: 1033.175 Thread 0x000000002953c000 nmethod 3327 0x000000000331d2d0 code [0x000000000331d480, 0x000000000331d928]
GC Heap History (10 events):
Event: 1064.811 GC heap after
Heap after GC invocations=180 (full 161):
PSYoungGen total 1679360K, used 750591K [0x0000000715f00000, 0x00000007ba300000, 0x00000007c0000000)
eden space 750592K, 99% used [0x0000000715f00000,0x0000000743bffc70,0x0000000743c00000)
from space 928768K, 0% used [0x0000000743c00000,0x0000000743c00000,0x000000077c700000)
to space 928768K, 0% used [0x0000000781800000,0x0000000781800000,0x00000007ba300000)
ParOldGen total 5573632K, used 5573217K [0x00000005c1c00000, 0x0000000715f00000, 0x0000000715f00000)
object space 5573632K, 99% used [0x00000005c1c00000,0x0000000715e98680,0x0000000715f00000)
Metaspace used 14801K, capacity 14959K, committed 16000K, reserved 1062912K
class space used 1695K, capacity 1741K, committed 1920K, reserved 1048576K
}
Event: 1064.811 GC heap before
{Heap before GC invocations=181 (full 162):
PSYoungGen total 1679360K, used 750592K [0x0000000715f00000, 0x00000007ba300000, 0x00000007c0000000)
eden space 750592K, 100% used [0x0000000715f00000,0x0000000743c00000,0x0000000743c00000)
from space 928768K, 0% used [0x0000000743c00000,0x0000000743c00000,0x000000077c700000)
to space 928768K, 0% used [0x0000000781800000,0x0000000781800000,0x00000007ba300000)
ParOldGen total 5573632K, used 5573217K [0x00000005c1c00000, 0x0000000715f00000, 0x0000000715f00000)
object space 5573632K, 99% used [0x00000005c1c00000,0x0000000715e98680,0x0000000715f00000)
Metaspace used 14801K, capacity 14959K, committed 16000K, reserved 1062912K
class space used 1695K, capacity 1741K, committed 1920K, reserved 1048576K
Event: 1071.135 GC heap after
Heap after GC invocations=181 (full 162):
PSYoungGen total 1679360K, used 750591K [0x0000000715f00000, 0x00000007ba300000, 0x00000007c0000000)
eden space 750592K, 99% used [0x0000000715f00000,0x0000000743bffe50,0x0000000743c00000)
from space 928768K, 0% used [0x0000000743c00000,0x0000000743c00000,0x000000077c700000)
to space 928768K, 0% used [0x0000000781800000,0x0000000781800000,0x00000007ba300000)
ParOldGen total 5573632K, used 5573217K [0x00000005c1c00000, 0x0000000715f00000, 0x0000000715f00000)
object space 5573632K, 99% used [0x00000005c1c00000,0x0000000715e98680,0x0000000715f00000)
Metaspace used 14801K, capacity 14959K, committed 16000K, reserved 1062912K
class space used 1695K, capacity 1741K, committed 1920K, reserved 1048576K
}
Event: 1071.135 GC heap before
{Heap before GC invocations=182 (full 163):
PSYoungGen total 1679360K, used 750592K [0x0000000715f00000, 0x00000007ba300000, 0x00000007c0000000)
eden space 750592K, 100% used [0x0000000715f00000,0x0000000743c00000,0x0000000743c00000)
from space 928768K, 0% used [0x0000000743c00000,0x0000000743c00000,0x000000077c700000)
to space 928768K, 0% used [0x0000000781800000,0x0000000781800000,0x00000007ba300000)
ParOldGen total 5573632K, used 5573217K [0x00000005c1c00000, 0x0000000715f00000, 0x0000000715f00000)
object space 5573632K, 99% used [0x00000005c1c00000,0x0000000715e98680,0x0000000715f00000)
Metaspace used 14801K, capacity 14959K, committed 16000K, reserved 1062912K
class space used 1695K, capacity 1741K, committed 1920K, reserved 1048576K
Event: 1077.405 GC heap after
Heap after GC invocations=182 (full 163):
PSYoungGen total 1679360K, used 750591K [0x0000000715f00000, 0x00000007ba300000, 0x00000007c0000000)
eden space 750592K, 99% used [0x0000000715f00000,0x0000000743bfff28,0x0000000743c00000)
from space 928768K, 0% used [0x0000000743c00000,0x0000000743c00000,0x000000077c700000)
to space 928768K, 0% used [0x0000000781800000,0x0000000781800000,0x00000007ba300000)
ParOldGen total 5573632K, used 5573217K [0x00000005c1c00000, 0x0000000715f00000, 0x0000000715f00000)
object space 5573632K, 99% used [0x00000005c1c00000,0x0000000715e98680,0x0000000715f00000)
Metaspace used 14801K, capacity 14959K, committed 16000K, reserved 1062912K
class space used 1695K, capacity 1741K, committed 1920K, reserved 1048576K
}
Event: 1077.406 GC heap before
{Heap before GC invocations=183 (full 164):
PSYoungGen total 1679360K, used 750592K [0x0000000715f00000, 0x00000007ba300000, 0x00000007c0000000)
eden space 750592K, 100% used [0x0000000715f00000,0x0000000743c00000,0x0000000743c00000)
from space 928768K, 0% used [0x0000000743c00000,0x0000000743c00000,0x000000077c700000)
to space 928768K, 0% used [0x0000000781800000,0x0000000781800000,0x00000007ba300000)
ParOldGen total 5573632K, used 5573217K [0x00000005c1c00000, 0x0000000715f00000, 0x0000000715f00000)
object space 5573632K, 99% used [0x00000005c1c00000,0x0000000715e98680,0x0000000715f00000)
Metaspace used 14801K, capacity 14959K, committed 16000K, reserved 1062912K
class space used 1695K, capacity 1741K, committed 1920K, reserved 1048576K
Event: 1083.711 GC heap after
Heap after GC invocations=183 (full 164):
PSYoungGen total 1679360K, used 750591K [0x0000000715f00000, 0x00000007ba300000, 0x00000007c0000000)
eden space 750592K, 99% used [0x0000000715f00000,0x0000000743bfff90,0x0000000743c00000)
from space 928768K, 0% used [0x0000000743c00000,0x0000000743c00000,0x000000077c700000)
to space 928768K, 0% used [0x0000000781800000,0x0000000781800000,0x00000007ba300000)
ParOldGen total 5573632K, used 5573217K [0x00000005c1c00000, 0x0000000715f00000, 0x0000000715f00000)
object space 5573632K, 99% used [0x00000005c1c00000,0x0000000715e98680,0x0000000715f00000)
Metaspace used 14801K, capacity 14959K, committed 16000K, reserved 1062912K
class space used 1695K, capacity 1741K, committed 1920K, reserved 1048576K
}
Event: 1083.711 GC heap before
{Heap before GC invocations=184 (full 165):
PSYoungGen total 1679360K, used 750592K [0x0000000715f00000, 0x00000007ba300000, 0x00000007c0000000)
eden space 750592K, 100% used [0x0000000715f00000,0x0000000743c00000,0x0000000743c00000)
from space 928768K, 0% used [0x0000000743c00000,0x0000000743c00000,0x000000077c700000)
to space 928768K, 0% used [0x0000000781800000,0x0000000781800000,0x00000007ba300000)
ParOldGen total 5573632K, used 5573217K [0x00000005c1c00000, 0x0000000715f00000, 0x0000000715f00000)
object space 5573632K, 99% used [0x00000005c1c00000,0x0000000715e98680,0x0000000715f00000)
Metaspace used 14801K, capacity 14959K, committed 16000K, reserved 1062912K
class space used 1695K, capacity 1741K, committed 1920K, reserved 1048576K
Event: 1090.012 GC heap after
Heap after GC invocations=184 (full 165):
PSYoungGen total 1679360K, used 750591K [0x0000000715f00000, 0x00000007ba300000, 0x00000007c0000000)
eden space 750592K, 99% used [0x0000000715f00000,0x0000000743bfffc8,0x0000000743c00000)
from space 928768K, 0% used [0x0000000743c00000,0x0000000743c00000,0x000000077c700000)
to space 928768K, 0% used [0x0000000781800000,0x0000000781800000,0x00000007ba300000)
ParOldGen total 5573632K, used 5573217K [0x00000005c1c00000, 0x0000000715f00000, 0x0000000715f00000)
object space 5573632K, 99% used [0x00000005c1c00000,0x0000000715e98680,0x0000000715f00000)
Metaspace used 14801K, capacity 14959K, committed 16000K, reserved 1062912K
class space used 1695K, capacity 1741K, committed 1920K, reserved 1048576K
}
Event: 1090.013 GC heap before
{Heap before GC invocations=185 (full 166):
PSYoungGen total 1679360K, used 750592K [0x0000000715f00000, 0x00000007ba300000, 0x00000007c0000000)
eden space 750592K, 100% used [0x0000000715f00000,0x0000000743c00000,0x0000000743c00000)
from space 928768K, 0% used [0x0000000743c00000,0x0000000743c00000,0x000000077c700000)
to space 928768K, 0% used [0x0000000781800000,0x0000000781800000,0x00000007ba300000)
ParOldGen total 5573632K, used 5573217K [0x00000005c1c00000, 0x0000000715f00000, 0x0000000715f00000)
object space 5573632K, 99% used [0x00000005c1c00000,0x0000000715e98680,0x0000000715f00000)
Metaspace used 14801K, capacity 14959K, committed 16000K, reserved 1062912K
class space used 1695K, capacity 1741K, committed 1920K, reserved 1048576K
Deoptimization events (10 events):
Event: 1.946 Thread 0x0000000002eb3800 Uncommon trap: reason=unstable_if action=reinterpret pc=0x0000000003927838 method=com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext$State.<init>(Lcom/sun/xml/internal/bind/v2/runtime/unmarshaller/UnmarshallingContext;Lcom/sun/xml/inte
Event: 1.946 Thread 0x0000000002eb3800 Uncommon trap: reason=unstable_if action=reinterpret pc=0x0000000003945470 method=com.sun.org.apache.xerces.internal.util.SymbolTable.addSymbol([CII)Ljava/lang/String; # 64
Event: 1.947 Thread 0x0000000002eb3800 Uncommon trap: reason=unstable_if action=reinterpret pc=0x00000000039038e8 method=com.sun.org.apache.xerces.internal.util.SymbolTable.addSymbol([CII)Ljava/lang/String; # 64
Event: 2.090 Thread 0x0000000002eb3800 Uncommon trap: reason=unstable_if action=reinterpret pc=0x0000000003922ef4 method=com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read([CII)I # 131
Event: 2.131 Thread 0x0000000002eb3800 Uncommon trap: reason=unstable_if action=reinterpret pc=0x0000000003ac741c method=com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next()I # 1171
Event: 2.496 Thread 0x0000000002eb3800 Uncommon trap: reason=unstable_if action=reinterpret pc=0x0000000003b44128 method=java.io.BufferedInputStream.read1([BII)I # 22
Event: 2.497 Thread 0x0000000002eb3800 Uncommon trap: reason=unstable_if action=reinterpret pc=0x0000000003a847e8 method=java.io.BufferedInputStream.read1([BII)I # 22
Event: 2.801 Thread 0x0000000002eb3800 Uncommon trap: reason=null_check action=make_not_entrant pc=0x00000000039ba58c method=com.sun.xml.internal.bind.v2.ClassFactory.create0(Ljava/lang/Class;)Ljava/lang/Object; # 31
Event: 2.802 Thread 0x0000000002eb3800 Uncommon trap: reason=null_check action=make_not_entrant pc=0x0000000003984854 method=com.sun.xml.internal.bind.v2.ClassFactory.create0(Ljava/lang/Class;)Ljava/lang/Object; # 31
Event: 2.802 Thread 0x0000000002eb3800 Uncommon trap: reason=null_check action=make_not_entrant pc=0x000000000397c384 method=com.sun.xml.internal.bind.v2.ClassFactory.create0(Ljava/lang/Class;)Ljava/lang/Object; # 31
Classes redefined (0 events):
No events
Internal exceptions (10 events):
Event: 0.407 Thread 0x0000000002eb3800 Exception <a 'java/lang/ArrayIndexOutOfBoundsException'> (0x00000007186c7798) thrown at [C:\jenkins\workspace\8-2-build-windows-amd64-cygwin\jdk8u251\737\hotspot\src\share\vm\runtime\sharedRuntime.cpp, line 605]
Event: 0.423 Thread 0x0000000002eb3800 Exception <a 'java/lang/ArrayIndexOutOfBoundsException': 42> (0x00000007187e0ad8) thrown at [C:\jenkins\workspace\8-2-build-windows-amd64-cygwin\jdk8u251\737\hotspot\src\share\vm\interpreter\interpreterRuntime.cpp, line 368]
Event: 0.461 Thread 0x0000000002eb3800 Exception <a 'java/lang/ClassCastException': sun.reflect.generics.reflectiveObjects.WildcardTypeImpl cannot be cast to java.lang.Class> (0x00000007194c3120) thrown at [C:\jenkins\workspace\8-2-build-windows-amd64-cygwin\jdk8u251\737\hotspot\src\share\vm\i
Event: 1.028 Thread 0x0000000029452800 Implicit null exception at 0x00000000034196fd to 0x0000000003419b59
Event: 1.562 Thread 0x0000000002eb3800 Implicit null exception at 0x000000000360b3e1 to 0x000000000360b711
Event: 1.562 Thread 0x0000000002eb3800 Implicit null exception at 0x00000000037f0c52 to 0x00000000037f2085
Event: 1.639 Thread 0x0000000002eb3800 Implicit null exception at 0x00000000033a305e to 0x00000000033a3219
Event: 2.801 Thread 0x0000000002eb3800 Implicit null exception at 0x00000000039b7f51 to 0x00000000039ba551
Event: 2.802 Thread 0x0000000002eb3800 Implicit null exception at 0x00000000039839be to 0x0000000003984831
Event: 2.802 Thread 0x0000000002eb3800 Implicit null exception at 0x000000000397b78e to 0x000000000397c379
Events (10 events):
Event: 1064.811 Executing VM operation: ParallelGCFailedAllocation done
Event: 1064.811 Executing VM operation: ParallelGCFailedAllocation
Event: 1071.135 Executing VM operation: ParallelGCFailedAllocation done
Event: 1071.135 Executing VM operation: ParallelGCFailedAllocation
Event: 1077.405 Executing VM operation: ParallelGCFailedAllocation done
Event: 1077.405 Executing VM operation: ParallelGCFailedAllocation
Event: 1083.711 Executing VM operation: ParallelGCFailedAllocation done
Event: 1083.711 Executing VM operation: ParallelGCFailedAllocation
Event: 1090.012 Executing VM operation: ParallelGCFailedAllocation done
Event: 1090.012 Executing VM operation: ParallelGCFailedAllocation
Dynamic libraries:
0x00007ff764880000 - 0x00007ff7648b7000 C:\Program Files\Java\jdk1.8.0_251\bin\java.exe
0x00007ffb76aa0000 - 0x00007ffb76c90000 C:\WINDOWS\SYSTEM32\ntdll.dll
0x00007ffb12250000 - 0x00007ffb12263000 C:\Program Files\AVAST Software\Avast\aswhook.dll
0x00007ffb75370000 - 0x00007ffb75422000 C:\WINDOWS\System32\KERNEL32.DLL
0x00007ffb74820000 - 0x00007ffb74ac4000 C:\WINDOWS\System32\KERNELBASE.dll
0x00007ffb75d20000 - 0x00007ffb75dc3000 C:\WINDOWS\System32\ADVAPI32.dll
0x00007ffb767b0000 - 0x00007ffb7684e000 C:\WINDOWS\System32\msvcrt.dll
0x00007ffb766d0000 - 0x00007ffb76767000 C:\WINDOWS\System32\sechost.dll
0x00007ffb755f0000 - 0x00007ffb75710000 C:\WINDOWS\System32\RPCRT4.dll
0x00007ffb76850000 - 0x00007ffb769e4000 C:\WINDOWS\System32\USER32.dll
0x00007ffb746a0000 - 0x00007ffb746c1000 C:\WINDOWS\System32\win32u.dll
0x00007ffb76780000 - 0x00007ffb767a6000 C:\WINDOWS\System32\GDI32.dll
0x00007ffb741d0000 - 0x00007ffb74364000 C:\WINDOWS\System32\gdi32full.dll
0x00007ffb745b0000 - 0x00007ffb7464e000 C:\WINDOWS\System32\msvcp_win.dll
0x00007ffb744b0000 - 0x00007ffb745aa000 C:\WINDOWS\System32\ucrtbase.dll
0x00007ffb69610000 - 0x00007ffb69894000 C:\WINDOWS\WinSxS\amd64_microsoft.windows.common-controls_6595b64144ccf1df_6.0.18362.836_none_e6c4b943130f18ed\COMCTL32.dll
0x00007ffb75dd0000 - 0x00007ffb76106000 C:\WINDOWS\System32\combase.dll
0x00007ffb74ad0000 - 0x00007ffb74b50000 C:\WINDOWS\System32\bcryptPrimitives.dll
0x00007ffb75710000 - 0x00007ffb7573e000 C:\WINDOWS\System32\IMM32.DLL
0x0000000071b90000 - 0x0000000071c62000 C:\Program Files\Java\jdk1.8.0_251\jre\bin\msvcr100.dll
0x00000000712e0000 - 0x0000000071b8b000 C:\Program Files\Java\jdk1.8.0_251\jre\bin\server\jvm.dll
0x00007ffb76770000 - 0x00007ffb76778000 C:\WINDOWS\System32\PSAPI.DLL
0x00007ffb63260000 - 0x00007ffb63269000 C:\WINDOWS\SYSTEM32\WSOCK32.dll
0x00007ffb769f0000 - 0x00007ffb76a5f000 C:\WINDOWS\System32\WS2_32.dll
0x00007ffb704a0000 - 0x00007ffb704c4000 C:\WINDOWS\SYSTEM32\WINMM.dll
0x00007ffb6cc90000 - 0x00007ffb6cc9a000 C:\WINDOWS\SYSTEM32\VERSION.dll
0x00007ffb70470000 - 0x00007ffb7049d000 C:\WINDOWS\SYSTEM32\WINMMBASE.dll
0x00007ffb74650000 - 0x00007ffb7469a000 C:\WINDOWS\System32\cfgmgr32.dll
0x00007ffb6ac90000 - 0x00007ffb6ac9f000 C:\Program Files\Java\jdk1.8.0_251\jre\bin\verify.dll
0x00007ffb686d0000 - 0x00007ffb686f9000 C:\Program Files\Java\jdk1.8.0_251\jre\bin\java.dll
0x00007ffb684e0000 - 0x00007ffb68503000 C:\Program Files\Java\jdk1.8.0_251\jre\bin\instrument.dll
0x00007ffb686b0000 - 0x00007ffb686c6000 C:\Program Files\Java\jdk1.8.0_251\jre\bin\zip.dll
0x00007ffb74c70000 - 0x00007ffb75354000 C:\WINDOWS\System32\SHELL32.dll
0x00007ffb75980000 - 0x00007ffb75a29000 C:\WINDOWS\System32\shcore.dll
0x00007ffb73a30000 - 0x00007ffb741ae000 C:\WINDOWS\System32\windows.storage.dll
0x00007ffb73a00000 - 0x00007ffb73a23000 C:\WINDOWS\System32\profapi.dll
0x00007ffb73970000 - 0x00007ffb739ba000 C:\WINDOWS\System32\powrprof.dll
0x00007ffb73960000 - 0x00007ffb73970000 C:\WINDOWS\System32\UMPDC.dll
0x00007ffb74b50000 - 0x00007ffb74ba2000 C:\WINDOWS\System32\shlwapi.dll
0x00007ffb739e0000 - 0x00007ffb739f1000 C:\WINDOWS\System32\kernel.appcore.dll
0x00007ffb741b0000 - 0x00007ffb741c7000 C:\WINDOWS\System32\cryptsp.dll
0x00007ffb67c40000 - 0x00007ffb67c5a000 C:\Users\malcolm\AppData\Local\JetBrains\IntelliJ IDEA 2020.1.2\bin\breakgen64.dll
0x00007ffb435e0000 - 0x00007ffb435fa000 C:\Program Files\Java\jdk1.8.0_251\jre\bin\net.dll
0x00007ffb731d0000 - 0x00007ffb73237000 C:\WINDOWS\system32\mswsock.dll
0x00007ffb6bb20000 - 0x00007ffb6bd14000 C:\WINDOWS\SYSTEM32\dbghelp.dll
VM Arguments:
jvm_args: -javaagent:C:\Users\malcolm\AppData\Local\JetBrains\IntelliJ IDEA 2020.1.2\lib\idea_rt.jar=61834:C:\Users\malcolm\AppData\Local\JetBrains\IntelliJ IDEA 2020.1.2\bin -Dfile.encoding=UTF-8
java_command: com.qhf.Main
java_class_path (initial): C:\Program Files\Java\jdk1.8.0_251\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_251\jre\lib\rt.jar;C:\Users\malcolm\IdeaProjects\pprojConsolidator\out\production\pprojConsolidator;C:\Users\malcolm\AppData\Local\JetBrains\IntelliJ IDEA 2020.1.2\lib\idea_rt.jar
Launcher Type: SUN_STANDARD
Environment Variables:
PATH=C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64\compiler;C:\ProgramData\Oracle\Java\javapath;;C:\Program Files (x86)\HighPoint Technologies, Inc\HighPoint RocketStor Manager\Service;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files (x86)\Windows Kits\10\Microsoft Application Virtualization\Sequencer\;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Users\malcolm\AppData\Local\Programs\Python\Python36\Scripts\;C:\Users\malcolm\AppData\Local\Programs\Python\Python36\;C:\Users\malcolm\AppData\Local\Microsoft\WindowsApps;C:\FFmpeg\bin;;C:\Users\malcolm\AppData\Local\Programs\Microsoft VS Code\bin;C:\Users\malcolm\AppData\Local\JetBrains\IntelliJ IDEA 2020.1.2\bin;
USERNAME=malcolm
OS=Windows_NT
PROCESSOR_IDENTIFIER=AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD
[...]
That's a large file. Set the VM startup settings -Xms and -Xmx as large as possible.
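For example (illustrative values for the 32GB machine mentioned in the question, not measured requirements):
java -Xms4g -Xmx24g com.qhf.Main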
It is a crash rather than an OutOfMemoryError, so it might be worth trying JDK 9+. The string tables are also smaller there (if you have Latin characters), which might give a better chance of unmarshalling the XML. If you are stuck, using XML SAX handlers may allow you to extract it in sub-sections, but with a lot more coding effort.
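A minimal sketch of the SAX approach (the element and attribute names here are hypothetical; Premiere's actual schema will differ):

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MarkerScanner {
    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("project.xml"), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Only the elements you react to matter; nothing else is retained in memory.
                if ("Marker".equals(qName)) {
                    System.out.println("marker: " + attrs.getValue("name"));
                }
            }
        });
    }
}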
I switched to Java 13 and jumped through the hoops to get JAXB working with that release.
No more EXCEPTION_ACCESS_VIOLATION!

How do I decide on a suitable TLABSize setting for a Java application?

My Java application, running on a single-CPU ARMv7 (32-bit) device with Java 14, occasionally crashes after running under load for a number of hours, and it always fails in ThreadLocalAllocBuffer::resize()
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb6cd515e, pid=1725, tid=1733
#
# JRE version: OpenJDK Runtime Environment (14.0+36) (build 14+36)
# Java VM: OpenJDK Client VM (14+36, mixed mode, serial gc, linux-arm)
# Problematic frame:
# V  [libjvm.so+0x48015e]  ThreadLocalAllocBuffer::resize()+0x85
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
--------------- S U M M A R Y ------------
Command Line: -Duser.home=/mnt/app/share/log -Djdk.lang.Process.launchMechanism=vfork -Xms150m -Xmx900m -Dcom.mchange.v2.log.MLog=com.mchange.v2.log.jdk14logging.Jdk14MLog -Dorg.jboss.logging.provider=jdk -Djava.util.logging.config.class=com.jthink.songkong.logging.StandardLogging --add-opens=java.base/java.lang=ALL-UNNAMED lib/songkong-6.9.jar -r
Host: Marvell PJ4Bv7 Processor rev 1 (v7l), 1 cores, 1G, Buildroot 2014.11-rc1
Time: Fri Apr 24 19:36:54 2020 BST elapsed time: 37456 seconds (0d 10h 24m 16s)
--------------- T H R E A D ---------------
Current thread (0xb6582a30): VMThread "VM Thread" [stack: 0x7b716000,0x7b796000] [id=3625] _threads_hazard_ptr=0x7742f140
Stack: [0x7b716000,0x7b796000], sp=0x7b7946b0, free space=505k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x48015e] ThreadLocalAllocBuffer::resize()+0x85
[error occurred during error reporting (printing native stack), id 0xb, SIGSEGV (0xb) at pc=0xb6b4ccae]
Now this must surely be a bug in the JVM, but as it's not one of the standard Java platforms and I don't have a simple test case, I cannot see it getting fixed anytime soon, so I am trying to work around it. It's also worth noting that it crashed in ThreadLocalAllocBuffer::accumulate_statistics_before_gc() when I used Java 11, which is why I moved to Java 14 to try to resolve the issue.
As the issue is with TLABs, one solution is to disable them with -XX:-UseTLAB, but that makes the code run slower on an already slow machine.
So I think another solution is to disable resizing with -XX:-ResizeTLAB, but then I need to work out a suitable size and specify it using -XX:TLABSize=N. But I am not sure what N actually represents and what would be a suitable size to set.
I tried setting -XX:TLABSize=1000000, which seems to me to be quite large?
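(For reference, a fixed-size run would combine the flags along these lines; the size value here is illustrative only, and -XX:TLABSize takes a value in bytes:)
java -XX:-ResizeTLAB -XX:TLABSize=262144 -jar yourapp.jar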
I have some logging set with
-Xlog:tlab*=debug,tlab*=trace:file=gc.log:time:filecount=7,filesize=8M
but I don't really understand the output.
[2020-05-19T15:43:43.836+0100] ThreadLocalAllocBuffer::compute_size(132) returns 250132
[2020-05-19T15:43:43.837+0100] TLAB: fill thread: 0x0026d548 [id: 871] desired_size: 976KB slow allocs: 0 refill waste: 15624B alloc: 0.25725 1606KB refills: 1 waste 0.0% gc: 0B slow: 0B fast: 0B
[2020-05-19T15:43:43.853+0100] ThreadLocalAllocBuffer::compute_size(6) returns 250006
[2020-05-19T15:43:43.854+0100] TLAB: fill thread: 0xb669be48 [id: 32635] desired_size: 976KB slow allocs: 0 refill waste: 15624B alloc: 0.00002 0KB refills: 1 waste 0.0% gc: 0B slow: 0B fast: 0B
[2020-05-19T15:43:43.910+0100] ThreadLocalAllocBuffer::compute_size(4) returns 250004
[2020-05-19T15:43:43.911+0100] TLAB: fill thread: 0x76c1d6f8 [id: 917] desired_size: 976KB slow allocs: 0 refill waste: 15624B alloc: 0.91261 8085KB refills: 1 waste 0.0% gc: 0B slow: 0B fast: 0B
[2020-05-19T15:43:43.962+0100] ThreadLocalAllocBuffer::compute_size(2052) returns 252052
[2020-05-19T15:43:43.962+0100] TLAB: fill thread: 0x76e06f10 [id: 534] desired_size: 976KB slow allocs: 4 refill waste: 15688B alloc: 0.13977 1612KB refills: 2 waste 0.2% gc: 0B slow: 4520B fast: 0B
[2020-05-19T15:43:43.982+0100] ThreadLocalAllocBuffer::compute_size(28878) returns 278878
[2020-05-19T15:43:43.983+0100] TLAB: fill thread: 0x76e06f10 [id: 534] desired_size: 976KB slow allocs: 4 refill waste: 15624B alloc: 0.13977 1764KB refills: 3 waste 0.3% gc: 0B slow: 10424B fast: 0B
[2020-05-19T15:43:44.023+0100] ThreadLocalAllocBuffer::compute_size(4) returns 250004
[2020-05-19T15:43:44.023+0100] TLAB: fill thread: 0x7991df20 [id: 32696] desired_size: 976KB slow allocs: 0 refill waste: 15624B alloc: 0.00132 19KB refills: 1 waste 0.0% gc: 0B slow: 0B fast: 0B
Update
I reran with the -XX:+HeapDumpOnOutOfMemoryError option added, and this time it showed:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1600.hprof ...
but then the dump itself failed with
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xb6a81b9a, pid=1600, tid=1606
#
# JRE version: OpenJDK Runtime Environment (14.0+36) (build 14+36)
# Java VM: OpenJDK Client VM (14+36, mixed mode, serial gc, linux-arm)
# Problematic frame:
# V [libjvm.so+0x22eb9a] DumperSupport::dump_field_value(DumpWriter*, char, oopDesc*, int)+0x91
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /mnt/system/config/Apps/SongKong/songkong/hs_err_pid1600.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
I am not clear if the dump failed because of ulimit or something else, but
java_pid1600.hprof was created, yet it was empty.
I was also monitoring the process with jstat -gc and jstat -gcutil. I paste the end of the output here; to me it does not look like there was a particular memory problem before the crash, although I am only checking every 5 seconds, so maybe that is the issue?
[root@N1-0247 bin]# ./jstat -gc 1600 5s
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT CGC CGCT GCT
........
30720.0 30720.0 0.0 0.0 245760.0 236647.2 614400.0 494429.2 50136.0 49436.9 0.0 0.0 5084 3042.643 155 745.523 - - 3788.166
30720.0 30720.0 0.0 28806.1 245760.0 244460.2 614400.0 506541.7 50136.0 49436.9 0.0 0.0 5085 3043.887 156 745.523 - - 3789.410
30720.0 30720.0 28760.4 0.0 245760.0 245760.0 614400.0 514809.7 50136.0 49437.2 0.0 0.0 5086 3044.895 157 751.204 - - 3796.098
30720.0 30720.0 0.0 231.1 245760.0 234781.8 614400.0 514809.7 50136.0 49437.2 0.0 0.0 5087 3044.895 157 755.042 - - 3799.936
30720.0 30720.0 0.0 0.0 245760.0 190385.5 614400.0 519650.7 50136.0 49449.6 0.0 0.0 5087 3045.905 159 758.890 - - 3804.795
30720.0 30720.0 0.0 0.0 245760.0 190385.5 614400.0 519650.7 50136.0 49449.6 0.0 0.0 5087 3045.905 159 758.890 - - 3804.795
[root@N1-0247 bin]# ./jstat -gcutil 1600 5s
S0 S1 E O M CCS YGC YGCT FGC FGCT CGC CGCT GCT
..............
99.70 0.00 100.00 75.54 98.56 - 5080 3037.321 150 724.674 - - 3761.995
0.00 29.93 99.30 75.55 98.56 - 5081 3038.403 151 728.584 - - 3766.987
0.00 100.00 99.30 75.94 98.56 - 5081 3039.405 152 728.584 - - 3767.989
100.00 0.00 99.14 76.14 98.56 - 5082 3040.366 153 734.088 - - 3774.454
0.00 96.58 99.87 78.50 98.57 - 5083 3041.366 154 737.960 - - 3779.325
56.99 0.00 100.00 78.50 98.58 - 5084 3041.366 154 741.880 - - 3783.246
0.00 0.00 96.29 80.47 98.61 - 5084 3042.643 155 745.523 - - 3788.166
0.00 93.77 99.47 82.44 98.61 - 5085 3043.887 156 745.523 - - 3789.410
93.62 0.00 100.00 83.79 98.61 - 5086 3044.895 157 751.204 - - 3796.098
0.00 0.76 95.53 83.79 98.61 - 5087 3044.895 157 755.042 - - 3799.936
0.00 0.00 77.47 84.58 98.63 - 5087 3045.905 159 758.890 - - 3804.795
0.00 0.00 77.47 84.58 98.63 - 5087 3045.905 159 758.890 - - 3804.795
Update: latest run
I configured GC logging, and I get many
Pause Young (Allocation Failure)
entries. Does this indicate I need to make the eden space larger?
[2020-05-29T14:00:22.668+0100] GC(44) Pause Young (GCLocker Initiated GC)
[2020-05-29T14:00:22.739+0100] GC(44) DefNew: 43230K(46208K)->4507K(46208K) Eden: 41088K(41088K)->0K(41088K) From: 2142K(5120K)->4507K(5120K)
[2020-05-29T14:00:22.739+0100] GC(44) Tenured: 50532K(102400K)->50532K(102400K)
[2020-05-29T14:00:22.740+0100] GC(44) Metaspace: 40054K(40536K)->40054K(40536K)
[2020-05-29T14:00:22.740+0100] GC(44) Pause Young (GCLocker Initiated GC) 91M->53M(145M) 72.532ms
[2020-05-29T14:00:22.741+0100] GC(44) User=0.07s Sys=0.00s Real=0.07s
[2020-05-29T14:00:25.196+0100] GC(45) Pause Young (Allocation Failure)
[2020-05-29T14:00:25.306+0100] GC(45) DefNew: 45595K(46208K)->2150K(46208K) Eden: 41088K(41088K)->0K(41088K) From: 4507K(5120K)->2150K(5120K)
[2020-05-29T14:00:25.306+0100] GC(45) Tenured: 50532K(102400K)->53861K(102400K)
[2020-05-29T14:00:25.307+0100] GC(45) Metaspace: 40177K(40664K)->40177K(40664K)
[2020-05-29T14:00:25.307+0100] GC(45) Pause Young (Allocation Failure) 93M->54M(145M) 111.252ms
[2020-05-29T14:00:25.308+0100] GC(45) User=0.08s Sys=0.02s Real=0.11s
[2020-05-29T14:00:29.248+0100] GC(46) Pause Young (Allocation Failure)
[2020-05-29T14:00:29.404+0100] GC(46) DefNew: 43238K(46208K)->4318K(46208K) Eden: 41088K(41088K)->0K(41088K) From: 2150K(5120K)->4318K(5120K)
[2020-05-29T14:00:29.405+0100] GC(46) Tenured: 53861K(102400K)->53861K(102400K)
[2020-05-29T14:00:29.405+0100] GC(46) Metaspace: 40319K(40792K)->40319K(40792K)
[2020-05-29T14:00:29.406+0100] GC(46) Pause Young (Allocation Failure) 94M->56M(145M) 157.614ms
[2020-05-29T14:00:29.406+0100] GC(46) User=0.07s Sys=0.00s Real=0.16s
[2020-05-29T14:00:36.466+0100] GC(47) Pause Young (Allocation Failure)
[2020-05-29T14:00:36.661+0100] GC(47) DefNew: 45406K(46208K)->5120K(46208K) Eden: 41088K(41088K)->0K(41088K) From: 4318K(5120K)->5120K(5120K)
[2020-05-29T14:00:36.662+0100] GC(47) Tenured: 53861K(102400K)->55125K(102400K)
[2020-05-29T14:00:36.662+0100] GC(47) Metaspace: 40397K(40920K)->40397K(40920K)
[2020-05-29T14:00:36.663+0100] GC(47) Pause Young (Allocation Failure) 96M->58M(145M) 196.531ms
[2020-05-29T14:00:36.663+0100] GC(47) User=0.09s Sys=0.01s Real=0.19s
[2020-05-29T14:00:40.523+0100] GC(48) Pause Young (Allocation Failure)
[2020-05-29T14:00:40.653+0100] GC(48) DefNew: 44274K(46208K)->2300K(46208K) Eden: 39154K(41088K)->0K(41088K) From: 5120K(5120K)->2300K(5120K)
[2020-05-29T14:00:40.653+0100] GC(48) Tenured: 55125K(102400K)->59965K(102400K)
[2020-05-29T14:00:40.654+0100] GC(48) Metaspace: 40530K(41048K)->40530K(41048K)
[2020-05-29T14:00:40.654+0100] GC(48) Pause Young (Allocation Failure) 97M->60M(145M) 131.365ms
[2020-05-29T14:00:40.655+0100] GC(48) User=0.11s Sys=0.01s Real=0.14s
[2020-05-29T14:00:43.936+0100] GC(49) Pause Young (Allocation Failure)
[2020-05-29T14:00:44.100+0100] GC(49) DefNew: 43388K(46208K)->5120K(46208K) Eden: 41088K(41088K)->0K(41088K) From: 2300K(5120K)->5120K(5120K)
Update: GC analysis done by gceasy
Okay, so this is useful: I uploaded the log to gceasy.org, and it clearly shows that shortly before the crash the heap size was significantly higher and approaching the 900MB limit, even after a number of full GCs, so I think it basically ran out of heap space.
What is a little frustrating is that I have the
-XX:+HeapDumpOnOutOfMemoryError
option enabled, but when it crashes it reports an issue trying to create the dump file, so I cannot get one.
And when I process the same file on Windows with the same heap size setting, it succeeds without failure. But I'm going to run again with GC logging enabled and see if it reaches similar levels even if it doesn't actually fall over.
Ran again (this builds on changes made in the previous run and doesn't show the start of the run), but to me the memory usage is higher yet looks quite normal (sawtooth pattern), with no particular difference before the crash.
Update
With the last run I reduced the max heap from 900MB to 600MB, and I also monitored with vmstat. You can see clearly below where the application crashed, but it doesn't seem we were approaching particularly low memory at that point.
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 0 0 57072 7812 1174128 0 0 5360 0 211 558 96 4 0 0 0
1 0 0 55220 7812 1176184 0 0 2048 0 203 467 79 21 0 0 0
3 0 0 61296 7812 1169096 0 0 2036 44 193 520 96 4 0 0 0
2 0 0 59808 7812 1171144 0 0 2048 32 212 522 96 4 0 0 0
1 0 0 59436 7812 1171144 0 0 0 0 180 307 83 17 0 0 0
1 0 0 59436 7812 1171144 0 0 0 0 179 173 100 0 0 0 0
1 0 0 59436 7812 1171128 0 0 0 0 179 184 100 0 0 0 0
2 1 0 51764 7816 1158452 0 0 4124 52 190 490 80 20 0 0 0
3 0 0 63428 7612 1146388 0 0 20472 48 251 533 86 14 0 0 0
2 0 0 63428 7616 1146412 0 0 4 0 196 508 99 1 0 0 0
2 0 0 84136 7616 1146400 0 0 0 0 186 461 84 16 0 0 0
2 0 0 61436 7608 1148960 0 0 24601 0 325 727 77 23 0 0 0
4 0 0 60196 7648 1150204 0 0 1160 76 232 611 98 2 0 0 0
4 0 0 59204 7656 1151052 0 0 52 376 305 570 80 20 0 0 0
3 0 0 59204 7656 1151052 0 0 0 0 378 433 96 4 0 0 0
1 0 0 762248 7768 1151420 0 0 106 0 253 660 74 26 0 0 0
0 0 0 859272 8188 1151892 0 0 417 0 302 550 9 26 64 1 0
0 0 0 859272 8188 1151892 0 0 0 0 111 132 0 0 100 0 0
Based on your jstat data and their explanation here: https://docs.oracle.com/en/java/javase/11/tools/jstat.html#GUID-5F72A7F9-5D5A-4486-8201-E1D1BA8ACCB5
I would not expect an OutOfMemoryError from the heap space just yet, based on the slow and steady rate at which the old generation is filling up and the small size of the from and to spaces (not that I know whether your application might allocate a huge array anytime soon), unless:
initial heap size (-Xms) is smaller than the max (-Xmx) and
Linux has overcommitted virtual memory
If you do overcommit (and who doesn't), maybe you should keep an eye on Linux with vmstat 1, or gather data frequently for sar.
But I do wonder why you are not using garbage collection logging with -Xlog:gc*:stderr, or to a file with -Xlog:gc*:file=, and maybe analyzing that with https://gceasy.io/, as it is very low overhead (unless writing to the log file is slow) and very precise. For more information on the logging syntax see https://openjdk.java.net/jeps/158 and https://openjdk.java.net/jeps/271
java -Xlog:gc*:stderr -jar yourapp.jar
and analyze those logs with great ease with tools like these:
https://gceasy.io/
JClarity Censum
This should give similar information to jstat, and more, in real time (as far as I know)
I think you may already be on the wrong track:
It is more likely that your process has a general problem with allocating memory than that there are two different bugs in two different Java versions.
Have you already checked whether the process has enough memory? A segmentation fault can also occur when the process runs out of memory. I would also check the configuration of the swap file. Years ago I got inexplicable segfaults with Java 8, also somewhere in a resize or allocation method; in my case the size of the OS's swap file was set to zero.
What error do you see at the top of the error log file? You copied only the information for the single thread.
UPDATE
You definitely do not have a problem with the GC. If the GC were overloaded, you would at some point get a java.lang.OutOfMemoryError with the message:
GC Overhead limit exceeded
The GC tries to collect garbage, but it also has CPU constraints. Concrete behavior depends on the actual GC implementation, but usually garbage will accumulate (see your big old generation) before the GC uses more CPU cycles. So increased heap usage is completely normal as long as you do not get the mentioned OOM error.
The segmentation faults in the native code are an indicator that there's something wrong with accessing native memory. You even get segmentation faults when the JVM tries to generate a dump. This is an additional indicator for a general problem with accessing native memory.
What's still unanswered is whether you really have enough native memory for all the processes running on your host.
Linux's overcommitment of memory usually triggers the OOM killer. But there are situations where the OOM killer is not triggered (see the kernel documentation for details). In such cases it is possible that a process may die with a SIGSEGV. Like other native applications also the JVM makes use of mmap. Also the man pages of mmap mention that depending on the used parameters a SIGSEGV may occur upon a write if no physical memory is available.
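On a standard Linux system, the overcommit policy and the current commitment can be inspected directly, for example:
cat /proc/sys/vm/overcommit_memory
grep -i commit /proc/meminfo    # shows CommitLimit and Committed_AS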

JVM with -Xmx500m actually consumes 1GB of memory

I'm trying to host a Swing Java application on a VPS for a lengthy period in "always-on" mode, and I am willing to fit it into a 1GB shape (Ubuntu as the host OS).
The application is started with "-Xmx500m -XX:+UseConcMarkSweepGC", and it seems reasonable that it should fit (with all other supplementary stuff) into 1GB of total RAM, but ... after running the application for 2-3 days, top says that the java application alone eats almost 1GB (see the USED column) - twice the specified "-Xmx500m":
KiB Mem : 1009136 total, 66084 free, 867128 used, 75924 buff/cache
KiB Swap: 716796 total, 9388 free, 707408 used. 28472 avail Mem
PID VIRT RES SWAP USED SHR S %CPU %MEM TIME+ COMMAND
2401 3076084 529648 467832 997480 1900 S 31.0 52.5 916:05.68 java
2218 285544 37548 74720 112268 8800 S 6.9 3.7 269:12.60 Xvnc4
2388 709104 11448 9744 21192 7760 S 6.9 1.1 24:45.88 mate-terminal
883 643820 10932 2540 13472 5624 S 0.0 1.1 1:48.43 do-agent
2327 544648 4092 5992 10084 2436 S 3.4 0.4 6:38.31 clock-applet
Actual 'heap usage' within the application is around 350MB (displayed by the application itself).
From jstat -gc 2401 I see usage of only around 630MB. Where are the other ~360MB? What am I missing? Is it possible to reduce memory usage via some JVM options?
S0C S1C S0U S1U EC EU OC OU
8512.0 8512.0 0.0 2161.2 68160.0 67668.5 426816.0 249483.9
MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
112464.0 108027.1 14160.0 13222.6 48701 1484.856 124 590.113 2074.968
https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jstat.html
S0C - Current survivor space 0 - 8MB
S1C - Current survivor space 1 - 8MB
EC - Current eden space capacity - 68 MB
OC - Current old space capacity - 426 MB
MC - Metaspace capacity - 112 MB
CCSC - Compressed class space capacity - 14 MB
--> 636 MB
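The gap between that ~636MB and what top reports is typically native overhead that jstat does not show: thread stacks, code cache, GC side structures, direct buffers. Native Memory Tracking can itemize it; a sketch using standard JDK tooling (app.jar stands in for the actual application):
java -XX:NativeMemoryTracking=summary -Xmx500m -XX:+UseConcMarkSweepGC -jar app.jar
jcmd <pid> VM.native_memory summary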
USED is the sum of RES and SWAP.
You have over 500MB in real memory and over 400MB in swap, so this is the normal behaviour.
Here is the manual of top, with more explanations about swap.

Java code runtime difference between two different platforms

I have deployed Java code on two different servers. The code performs file-writing operations.
On the local server, the parameters are:
uname -a
SunOS snmi5001 5.10 Generic_120011-14 sun4u sparc SUNW,SPARC-Enterprise
ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 389296
coredump(blocks) unlimited
nofiles(descriptors) 20000
vmemory(kbytes) unlimited
Java Version:
java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) Server VM (build 1.5.0_12-b04, mixed mode)
On a different (let's say MIT) server:
uname -a
SunOS au11qapcwbtels2 5.10 Generic_147440-05 sun4u sparc SUNW,Sun-Fire-15000
ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 8192
coredump(blocks) unlimited
nofiles(descriptors) 256
vmemory(kbytes) unlimited
java -version
java version "1.5.0_32"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_32-b05)
Java HotSpot(TM) Server VM (build 1.5.0_32-b05, mixed mode)
The problem is that the code runs significantly slower on the MIT server.
Because of the difference in nofiles and stack between the two OS configurations, I thought that changing ulimit -s and ulimit -n might make a difference.
I cannot change the parameters on the MIT server without confirming the problem, so I decreased the ulimit parameters on the local server and retested. But the code finished execution in the same time.
I have no idea which difference between the OS parameters could be causing this.
Any help is appreciated; I will post more parameters if anyone tells me what to look for.
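To confirm the problem independently of the application, one option is a minimal file-writing benchmark run on both servers. A hypothetical Java 1.5-compatible sketch (file name and sizes are arbitrary):

import java.io.FileOutputStream;
import java.io.IOException;

public class WriteBench {
    public static void main(String[] args) throws IOException {
        byte[] block = new byte[8192];
        long start = System.currentTimeMillis();
        FileOutputStream out = new FileOutputStream("bench.dat");
        for (int i = 0; i < 131072; i++) {   // 131072 blocks x 8KB = 1GB
            out.write(block);
        }
        out.close();
        System.out.println("Wrote 1GB in " + (System.currentTimeMillis() - start) + " ms");
    }
}

If this sketch shows the same gap between the servers, the difference lies in the OS/storage stack rather than in the application or the ulimit settings.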
EDIT:
For MIT Server
Number of CPUs: psrinfo -p
24
psrinfo -pv
The physical processor has 2 virtual processors (0 4)
UltraSPARC-IV+ (portid 0 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (1 5)
UltraSPARC-IV+ (portid 1 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (2 6)
UltraSPARC-IV+ (portid 2 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (3 7)
UltraSPARC-IV+ (portid 3 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (32 36)
UltraSPARC-IV+ (portid 32 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (33 37)
UltraSPARC-IV+ (portid 33 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (34 38)
UltraSPARC-IV+ (portid 34 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (35 39)
UltraSPARC-IV+ (portid 35 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (64 68)
UltraSPARC-IV+ (portid 64 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (65 69)
UltraSPARC-IV+ (portid 65 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (66 70)
UltraSPARC-IV+ (portid 66 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (67 71)
UltraSPARC-IV+ (portid 67 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (96 100)
UltraSPARC-IV+ (portid 96 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (97 101)
UltraSPARC-IV+ (portid 97 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (98 102)
UltraSPARC-IV+ (portid 98 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (99 103)
UltraSPARC-IV+ (portid 99 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (128 132)
UltraSPARC-IV+ (portid 128 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (129 133)
UltraSPARC-IV+ (portid 129 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (130 134)
UltraSPARC-IV+ (portid 130 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (131 135)
UltraSPARC-IV+ (portid 131 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (224 228)
UltraSPARC-IV+ (portid 224 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (225 229)
UltraSPARC-IV+ (portid 225 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (226 230)
UltraSPARC-IV+ (portid 226 impl 0x19 ver 0x24 clock 1800 MHz)
The physical processor has 2 virtual processors (227 231)
UltraSPARC-IV+ (portid 227 impl 0x19 ver 0x24 clock 1800 MHz)
kstat cpu_info:
module: cpu_info instance: 231
name: cpu_info231 class: misc
brand UltraSPARC-IV+
chip_id 227
clock_MHz 1800
core_id 231
cpu_fru hc:///component=SB7
cpu_type sparcv9
crtime 587.102844985
current_clock_Hz 1799843256
device_ID 9223937394446500460
fpu_type sparcv9
implementation UltraSPARC-IV+ (portid 227 impl 0x19 ver 0x24 clock 1800 MHz)
pg_id 48
snaptime 19846866.5310415
state on-line
state_begin 1334854522
For the local server I could only get the kstat info:
module: cpu_info instance: 0
name: cpu_info0 class: misc
brand SPARC64-VI
chip_id 1024
clock_MHz 2150
core_id 0
cpu_fru hc:///component=/MBU_A/CPUM0
cpu_type sparcv9
crtime 288.5675516
device_ID 250691889836161
fpu_type sparcv9
implementation SPARC64-VI (portid 1024 impl 0x6 ver 0x93 clock 2150 MHz)
snaptime 207506.8330168
state on-line
state_begin 1354493257
module: cpu_info instance: 1
name: cpu_info1 class: misc
brand SPARC64-VI
chip_id 1024
clock_MHz 2150
core_id 0
cpu_fru hc:///component=/MBU_A/CPUM0
cpu_type sparcv9
crtime 323.4572206
device_ID 250691889836161
fpu_type sparcv9
implementation SPARC64-VI (portid 1024 impl 0x6 ver 0x93 clock 2150 MHz)
snaptime 207506.8336113
state on-line
state_begin 1354493292
Similarly, there are 59 instances in total.
Memory for the local server (vmstat):
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 s4 s1 in sy cs us sy id
0 0 0 143845984 93159232 431 895 1249 30 29 0 2 6 0 -0 1 3284 72450 6140 11 3 86
Memory for the MIT server (vmstat):
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr m0 m1 m2 m3 in sy cs us sy id
0 0 0 180243376 184123896 81 786 248 15 15 0 0 3 14 -0 4 1854 7563 2072 1 1 98
df -h for MIT server:
Filesystem Size Used Available Capacity Mounted on
/dev/md/dsk/d0 7.9G 6.7G 1.1G 86% /
/devices 0K 0K 0K 0% /devices
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 171G 1.7M 171G 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
sharefs 0K 0K 0K 0% /etc/dfs/sharetab
/platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap2.so.1
7.9G 6.7G 1.1G 86% /platform/sun4u-us3/lib/libc_psr.so.1
/platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1
7.9G 6.7G 1.1G 86% /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1
/dev/md/dsk/d3 7.9G 6.6G 1.2G 85% /var
swap 6.0G 56K 6.0G 1% /tmp
swap 171G 40K 171G 1% /var/run
swap 171G 0K 171G 0% /dev/vx/dmp
swap 171G 0K 171G 0% /dev/vx/rdmp
/dev/md/dsk/d5 2.0G 393M 1.5G 21% /home
/dev/vx/dsk/appdg/oravl
2.0G 17M 2.0G 1% /ora
/dev/md/dsk/d60 1.9G 364M 1.5G 19% /apps/stats
/dev/md/dsk/d4 16G 2.1G 14G 14% /var/crash
/dev/md/dsk/d61 1005M 330M 594M 36% /opt/controlm6
/dev/vx/dsk/appdg/oraproductvl
10G 2.3G 7.6G 24% /ora/product
/dev/md/dsk/d63 963M 1.0M 904M 1% /var/opt/app
/dev/vx/dsk/dmldg/appsdmlsvtvl
1.0T 130G 887G 13% /apps/dml/svt
/dev/vx/dsk/appdg/homeappusersvl
20G 19G 645M 97% /home/app/users
/dev/vx/dsk/dmldg/appsdmlmit2vl
20G 66M 20G 1% /apps/dml/mit2
/dev/vx/dsk/dmldg/datadmlmit2vl
1.9T 1.1T 773G 61% /data/dml/mit2
/dev/md/dsk/d62 9.8G 30M 9.7G 1% /usr/openv/netbackup/logs
df -h for local server:
Filesystem Size Used Available Capacity Mounted on
/dev/dsk/c0t0d0s0 20G 7.7G 12G 40% /
/devices 0K 0K 0K 0% /devices
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 140G 1.6M 140G 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
fd 0K 0K 0K 0% /dev/fd
/dev/dsk/c0t0d0s5 9.8G 9.3G 483M 96% /var
swap 140G 504K 140G 1% /tmp
swap 140G 80K 140G 1% /var/run
swap 140G 0K 140G 0% /dev/vx/dmp
swap 140G 0K 140G 0% /dev/vx/rdmp
/dev/dsk/c0t0d0s6 9.8G 9.4G 403M 96% /opt
/dev/vx/dsk/eva8k/tlkhome
2.0G 66M 1.8G 4% /tlkhome
/dev/vx/dsk/eva8k/tlkuser4
48G 26G 20G 57% /tlkuser4
/dev/vx/dsk/eva8k/ST82
1.1G 17M 999M 2% /ST_A_82
/dev/vx/dsk/eva8k/tlkuser11
37G 37G 176M 100% /tlkuser11
/dev/vx/dsk/eva8k/oravl97
20G 12G 7.3G 63% /oravl97
/dev/vx/dsk/eva8k/tlkuser5
32G 23G 8.3G 74% /tlkuser5
/dev/vx/dsk/eva8k/mbtlkproj1
2.0G 18M 1.9G 1% /mbtlkproj1
/dev/vx/dsk/eva8k/Oravol98
38G 25G 12G 68% /oravl98
/dev/vx/dsk/eva8k_new/tlkuser15
57G 57G 0K 100% /tlkuser15
/dev/vx/dsk/eva8k/Oravol1
39G 16G 22G 42% /oravl01
/dev/vx/dsk/eva8k/Oravol99
30G 8.3G 20G 30% /oravl99
/dev/vx/dsk/eva8k/tlkuser9
18G 13G 4.8G 73% /tlkuser9
/dev/vx/dsk/eva8k/oravl08
32G 25G 6.3G 81% /oravl08
/dev/vx/dsk/eva8k/oravl07
46G 45G 1.2G 98% /oravl07
/dev/vx/dsk/eva8k/Oravol3
103G 90G 13G 88% /oravl03
/dev/vx/dsk/eva8k_new/tlkuser12
79G 79G 0K 100% /tlkuser12
/dev/vx/dsk/eva8k/Oravol4
88G 83G 4.3G 96% /oravl04
/dev/vx/dsk/eva8k/oravl999
10G 401M 9.0G 5% /oravl999
/dev/vx/dsk/eva8k_new/tlkuser14
54G 39G 15G 73% /tlkuser14
/dev/vx/dsk/eva8k/Oravol2
85G 69G 14G 84% /oravl02
/dev/vx/dsk/eva8k/sdkhome
1.0G 17M 944M 2% /sdkhome
/dev/vx/dsk/eva8k/tlkuser7
44G 36G 7.8G 83% /tlkuser7
/dev/vx/dsk/eva8k/tlkproj1
1.0G 17M 944M 2% /tlkproj1
/dev/vx/dsk/eva8k/tlkuser3
35G 29G 5.9G 84% /tlkuser3
/dev/vx/dsk/eva8k/tlkuser10
29G 29G 2.7M 100% /tlkuser10
/dev/vx/dsk/eva8k/oravl05
30G 29G 1.2G 97% /oravl05
/dev/vx/dsk/eva8k/oravl06
36G 34G 1.6G 96% /oravl06
/dev/vx/dsk/eva8k/tlkuser6
29G 27G 2.1G 93% /tlkuser6
/dev/vx/dsk/eva8k/tlkuser2
36G 30G 5.8G 84% /tlkuser2
/dev/vx/dsk/eva8k/tlkuser1
66G 49G 16G 75% /tlkuser1
/dev/vx/dsk/eva8k_new/tlkuser13
84G 77G 7.0G 92% /tlkuser13
/dev/vx/dsk/eva8k_new/tlkuser16
44G 37G 6.4G 86% /tlkuser16
/dev/vx/dsk/eva8k/db2
1.0G 593M 404M 60% /opt/db2V8.1
/dev/vx/dsk/eva8k/WebSphere6029
3.0G 2.2G 776M 75% /opt/WebSphere6029
/dev/vx/dsk/eva8k/websphere6
2.0G 88M 1.8G 5% /opt/websphere6
/dev/vx/dsk/eva8k/wli
4.0G 1.4G 2.5G 36% /opt/wli10gR3MP1
/dev/vx/dsk/eva8k/user
2.0G 19M 1.9G 1% /user/telstra/history
dvcinasdm3:/oracle_cdrom/data
576G 576G 206M 100% /oracle_cdrom
dvcinasdm2:/system_kits
822G 818G 4.2G 100% /system_kits
dvcinasdm2:/db_share 295G 283G 13G 96% /db_share
dvcinas2dm2:/system_data/data
315G 283G 32G 90% /system_data
dvcinas2dm2:/ossinfra/data
49G 18G 32G 36% /ossinfra
For the local server, the command /usr/sbin/prtpicl -v | egrep "devfs-path|driver-name|subsystem-id" | nawk '/:subsystem-id/ { print $0; getline; print $0; getline; print $0; }' | nawk -F: '{ print $2 }' gives:
subsystem-id 0x13a1
devfs-path /pci@0,600000/pci@0/pci@8/pci@0/scsi@1
driver-name mpt
subsystem-id 0x1648
devfs-path /pci@0,600000/pci@0/pci@8/pci@0/network@2
driver-name bge
subsystem-id 0x1648
devfs-path /pci@0,600000/pci@0/pci@8/pci@0/network@2,1
driver-name bge
subsystem-id 0xfc11
devfs-path /pci@0,600000/pci@0/pci@8/pci@0,1/SUNW,emlxs@1
driver-name emlxs
subsystem-id 0x125e
devfs-path /pci@3,700000/network
driver-name e1000g
subsystem-id 0x125e
devfs-path /pci@3,700000/network
driver-name e1000g
subsystem-id 0x13a1
devfs-path /pci@10,600000/pci@0/pci@8/pci@0/scsi@1
driver-name mpt
subsystem-id 0x1648
devfs-path /pci@10,600000/pci@0/pci@8/pci@0/network
driver-name bge
subsystem-id 0x1648
devfs-path /pci@10,600000/pci@0/pci@8/pci@0/network
driver-name bge
subsystem-id 0xfc11
devfs-path /pci@10,600000/pci@0/pci@8/pci@0,1/SUNW,emlxs@1
driver-name emlxs
For the MIT server it gives:
subsystem-id 0xfc00
devfs-path /pci@3d,600000/SUNW,emlxs@1
driver-name emlxs
subsystem-id 0xfc00
devfs-path /pci@3d,600000/SUNW,emlxs@1,1
driver-name emlxs
subsystem-id 0xfc00
devfs-path /pci@5d,600000/SUNW,emlxs@1
driver-name emlxs
subsystem-id 0xfc00
devfs-path /pci@5d,600000/SUNW,emlxs@1,1
driver-name emlxs
At the start of the I/O-consuming code, iostat -d c3t50001FE1502613A9d7 5 shows:
1161 37 134 0 0 0 0 0 0 329 24 2
3 2 3 0 0 0 0 0 0 554 71 10
195 26 6 0 0 0 0 0 0 853 108 19
37 6 4 0 0 0 0 0 0 1134 143 10
140 8 7 0 0 0 0 0 0 3689 86 7
173 24 85 0 0 0 0 0 0 9914 74 9
0 0 0 0 0 0 0 0 0 12323 114 2
13 9 41 0 0 0 0 0 0 10609 117 2
0 0 0 0 0 0 0 0 0 10746 72 2
sd0 sd1 sd4 ssd134
kps tps serv kps tps serv kps tps serv kps tps serv
1 0 3 0 0 0 0 0 0 11376 137 2
2 0 10 0 0 0 0 0 0 11980 157 3
231 39 14 0 0 0 0 0 0 10584 140 3
785 175 5 0 0 0 0 0 0 13503 170 2
9 4 32 0 0 0 0 0 0 11597 168 2
7 1 6 0 0 0 0 0 0 11555 106 2
On the MIT server, iostat shows:
0.0 460.4 0.0 4029.2 0.4 0.6 0.9 1.2 2 11 c6t5006048452A79BD6d206
0.0 885.2 0.0 8349.3 0.5 0.8 0.6 0.9 3 24 c4t5006048452A79BD9d206
0.0 660.0 0.0 5618.8 0.5 0.7 0.7 1.0 2 18 c6t5006048452A79BD6d206
0.0 779.1 0.0 7408.6 0.3 0.7 0.4 0.8 2 21 c4t5006048452A79BD9d206
0.0 569.8 0.0 4893.9 0.3 0.5 0.5 1.0 2 15 c6t5006048452A79BD6d206
0.0 521.5 0.0 5433.6 0.2 0.5 0.3 0.9 1 16 c4t5006048452A79BD9d206
0.0 362.8 0.0 3134.8 0.2 0.4 0.6 1.1 1 10 c6t5006048452A79BD6d206
So we can see that the kps for the local server is much higher than that of the MIT server during the period of maximum I/O operations.
Conclusions on the local and MIT server
A quick glance at your machines:
Local server is a small-chassis Sun Enterprise machine with SPARC64 VI CPUs, possibly an M4000. You are writing data to an external file system (called eva8k_new) over multipathed PCIe slots using a direct SCSI connection. This machine is 3-5 years old.
MIT server is a SunFire 15000 - an old, mainframe-class Solaris server. It has 12 dual-core UltraSPARC IV+ CPUs in the hardware partition that you are running in (the physical chassis can be logically split into several different hardware partitions which cannot see each other at all). You are writing to a SAN over a 1Gb/s or 2Gb/s fibre channel (the LUN might be called dmldg) on multipathed PCI slots. This machine is at least 7 years old, but the technology is 10 years old.
The storage systems used on the local and MIT servers are both external. The performance of the storage depends on a number of factors, including the I/O speed of the physical interface (PCI vs. PCIe) and the interconnect (1 or 2 Gb/s fibre channel on the SunFire). This article explains how to get this information.
Theoretical performance problems
The performance of your application may be gated on one of several bottlenecks (assuming no code problems and network latencies/bottlenecks):
CPU: If your CPU were faster, you could get the application to go faster.
Single-threaded: some applications are bottlenecked on a single thread, so adding threads/cores does not improve performance.
Multi-thread capable: sometimes, if the application is multi-threaded, adding more threads/cores improves performance.
Storage IO bandwidth or IOPS: The application is reading from or writing to storage system (including disks). Adding disks, changing RAID type, adding disk cache and other things may improve IO or IOPS; alternatively you might change to another storage subsystem.
IO bandwidth is the maximum amount of data that can pass per second; it typically saturates first when streaming data to or from a disk.
IOPS (IO operations per second) is the maximum number of IO commands (read or write) that can be processed per second. Typically this saturates first for processes that are searching for or in files, or (re)writing small chunks.
Looking at your issue, we can do a quick check:
If the issue is CPU, then:
You should see the CPU utilisation for the java process in top to be very high during program execution (90-99%)
The problem is not likely to be threading, because the SunFire MIT server has a good number of cores available; so if the bottleneck is CPU, it is single-thread performance.
The UltraSPARC IV+ is quite a lot slower than the SPARC64 VI. This is easily a noticeable drop, so it might be the reason the MIT server is slower.
If the issue is IO, then:
You will see the CPU utilization for the java process in top to be low (probably 50% or lower, but possibly as high as 80% or so as a rule of thumb)
You will see the IO to the disk subsystem, measured using iostat, saturate - that is, immediately rise to a fixed number and not really 'peak' over this number. The following options might be useful: iostat -d <disk> 5. The throughput value and number of operations/sec will be higher on the local server and lower on the MIT server
You need to speak to the administrator to see if a faster storage system is available for the MIT server.
All the above is assuming that other processes on the servers are not interfering with the operation of your program - clearly another high-cpu process or one writing a lot to the same disk will affect the performance greatly.
Conclusions
From the CPU data you provide, there is no evidence of a CPU bottleneck.
From the iostat data you provide, as you comment, the IO on the SunFire is significantly below that of the local server. This is likely the result of the attached storage, namely at least one of:
Lower performance of the MIT server's PCI slots vs. PCIe in the local server
The probable 1 Gb/s fibre channel being slower than the (possibly faster) SCSI-attached storage on the local server
Older and slower disks on the SunFire vs. the local attached storage
(Note that the same SAN appears connected to the local server, so this could be tested).
With clear evidence of hardware being the cause of the performance difference, there is little that can be done.
Some things may improve the general performance of the application, though. It's a good idea to run a Java profiler on the application. Examples include NetBeans and JProfiler.
The profiler will identify which IO operations are the problem. You might be able to:
Generally improve the algorithm at the bottleneck
Use a caching layer to aggregate multiple write operations before writing once
If you are using the original Java I/O classes (in java.io), you could rewrite the application to use Java NIO (see the sketch after this list)
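As an illustration only (a minimal sketch, not your code - the file name, buffer size, and loop count are made up), writing through a FileChannel with a reusable direct ByteBuffer looks roughly like this:
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class NioWriteSketch {
    public static void main(String[] args) throws IOException {
        FileChannel channel = new FileOutputStream("out.dat").getChannel();
        try {
            // One reusable direct buffer instead of a new array per write
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            for (int block = 0; block < 1024; block++) {
                buf.clear();
                buf.put(new byte[buf.remaining()]); // stand-in for real application data
                buf.flip();
                while (buf.hasRemaining()) {
                    channel.write(buf); // a single write() may not drain the buffer
                }
            }
        } finally {
            channel.close();
        }
    }
}
A direct buffer avoids an extra copy between the Java heap and the operating system, which can matter for heavy streaming writes.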
EDIT: Thoughts on a caching layer
Assumption: the problematic IO operation is either repeatedly writing small chunks to disk and flushing them, or performing random-access write-to-disk operations. Your application may already be streaming to disk efficiently, in which case caching would not be useful.
When you have an expensive or slow operation in an application, you will want to minimize the number of times it is invoked - ideally to the theoretical minimum which hopefully is 1. However your code may not be doing so - for example you are using an OutputStream and writing small chunks to it and flushing to disk. In this case, you may write each disk block (8k) many times, each time with just a little more data.
Instead, you could use a RAM cache to consolidate all the writes; when you know there will be no more writes to a block, you write it exactly once to disk. For streaming, Java has BufferedOutputStream for the simple cases: when you obtain the FileOutputStream instance for the File, wrap it in a BufferedOutputStream and use only the BufferedOutputStream (a minimal sketch follows).
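For example (a minimal sketch - the file name and buffer size are placeholders):
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedWriteSketch {
    public static void main(String[] args) throws IOException {
        // 64 KB buffer: many small write() calls become one large disk write
        OutputStream out = new BufferedOutputStream(
                new FileOutputStream("out.dat"), 64 * 1024);
        try {
            for (int i = 0; i < 1000000; i++) {
                out.write(i & 0xFF); // tiny writes hit the RAM buffer, not the disk
            }
        } finally {
            out.close(); // flushes the remaining buffered bytes
        }
    }
}
Each tiny write() lands in the RAM buffer, and the disk only sees full buffer flushes, so each 8k disk block is written once rather than many times.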
If, however, you are performing true random-access writes (e.g. using a java.io.RandomAccessFile) and moving the file pointer with RandomAccessFile.seek(), you may want to consider writing a write cache in RAM. Precisely what this would look like depends wholly on your file data structure, but you might want to start with a block paging mechanism; a rough sketch follows. Chapter 1 of Java NIO has an introduction to those concepts, but hopefully you either don't need to go there or you find a close match in the NIO API.
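A very rough sketch of the block-paging idea (entirely hypothetical - the block size, the unbounded HashMap, and the flush policy would all need to be adapted to your file's data structure):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

public class BlockCacheSketch {
    private static final int BLOCK_SIZE = 8192; // one cached page = one disk block
    private final RandomAccessFile file;
    private final Map<Long, byte[]> dirtyBlocks = new HashMap<Long, byte[]>();

    public BlockCacheSketch(RandomAccessFile file) {
        this.file = file;
    }

    // Random-access writes only touch the in-RAM copy of the block.
    public void write(long pos, byte b) throws IOException {
        long blockNo = pos / BLOCK_SIZE;
        byte[] block = dirtyBlocks.get(blockNo);
        if (block == null) {
            block = new byte[BLOCK_SIZE];
            file.seek(blockNo * BLOCK_SIZE);
            file.read(block); // load existing contents once (may be short at EOF)
            dirtyBlocks.put(blockNo, block);
        }
        block[(int) (pos % BLOCK_SIZE)] = b;
    }

    // Each dirty block is written exactly once, however often it was modified.
    public void flush() throws IOException {
        for (Map.Entry<Long, byte[]> e : dirtyBlocks.entrySet()) {
            file.seek(e.getKey() * BLOCK_SIZE);
            file.write(e.getValue());
        }
        dirtyBlocks.clear();
    }
}
A real implementation would bound the cache and evict least-recently-used blocks, but the principle is the same: each dirty block reaches the disk once.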
If you are concerned about performance, I wouldn't use such an old version of Java. It's quite likely that the OS calls and native code generated for one architecture are sub-optimal, and I would expect the newer architecture to suffer.
Can you compare Java 7 between these machines?
The ulimit output suggests the first machine has many more resources. Which model of CPUs and how much memory do the two machines have?

Why does 64 bit JVM throw Out Of Memory before xmx is reached?

I am wrestling with large memory requirements for a java app.
In order to address more memory I have switched to a 64 bit JVM and am using a large xmx.
However, when the xmx is above 2GB the app seems to run out of memory earlier than expected.
When running with an xmx of 2400M and looking at GC info from -verbosegc I get...
[Full GC 2058514K->2058429K(2065024K), 0.6449874 secs]
...and then it throws an out of memory exception. I would expect it to increase the heap above 2065024K before running out of memory.
In a trivial example I have a test program that allocates memory in a loop and prints out information from Runtime.getRuntime().maxMemory() and Runtime.getRuntime().totalMemory() until it eventually runs out of memory.
Running this over a range of xmx values it appears that Runtime.getRuntime().maxMemory() reports about 10% less than xmx and that total memory will not grow beyond 90% of Runtime.getRuntime().maxMemory().
I am using the following 64bit jvm:
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
Here is the code:
import java.util.ArrayList;

public class XmxTester {
    private static String xmxStr;
    private long maxMem;
    private long usedMem;
    private long totalMemAllocated;
    private long freeMem;
    private ArrayList<byte[]> list;

    /**
     * @param args args[0] is the xmx value, echoed into the CSV output
     */
    public static void main(String[] args) {
        xmxStr = args[0];
        XmxTester xmxtester = new XmxTester();
    }

    public XmxTester() {
        byte[] mem = new byte[1024 * 1024 * 50]; // initial 50 MB allocation
        list = new ArrayList<byte[]>();
        while (true) {
            printMemory();
            eatMemory();
        }
    }

    private void eatMemory() {
        byte[] mem = null;
        try {
            mem = new byte[1024 * 1024]; // grab another 1 MB
        } catch (Throwable e) {
            // on OutOfMemoryError, print one CSV line and stop
            System.out.println(xmxStr + "," + ConvertMB(maxMem) + ","
                    + ConvertMB(totalMemAllocated) + "," + ConvertMB(usedMem)
                    + "," + ConvertMB(freeMem));
            System.exit(0);
        }
        list.add(mem); // hold a reference so the GC cannot reclaim it
    }

    private void printMemory() {
        maxMem = Runtime.getRuntime().maxMemory();
        freeMem = Runtime.getRuntime().freeMemory();
        totalMemAllocated = Runtime.getRuntime().totalMemory();
        usedMem = totalMemAllocated - freeMem;
    }

    double ConvertMB(long bytes) {
        int CONVERSION_VALUE = 1024;
        return Math.round(bytes / Math.pow(CONVERSION_VALUE, 2));
    }
}
I use this batch file to run it over multiple xmx settings. It includes references to a 32 bit JVM because I wanted a comparison with a 32 bit JVM - obviously that call fails as soon as xmx is larger than about 1500M.
@echo off
set java64=<location of 64bit JVM>
set java32=<location of 32bit JVM>
set xmxval=64
:start
SET /a xmxval = %xmxval% + 64
%java64% -Xmx%xmxval%m -XX:+UseCompressedOops -XX:+DisableExplicitGC XmxTester %xmxval%
%java32% -Xms28m -Xmx%xmxval%m XmxTester %xmxval%
if %xmxval% == 4500 goto end
goto start
:end
pause
This spits out a CSV which, when put into Excel, looks like this (apologies for my poor formatting here):
32 bit
XMX  max mem  total mem  used mem  free mem  % of xmx used before out-of-memory exception
128 127 127 125 2 98.4%
192 191 191 189 1 99.0%
256 254 254 252 2 99.2%
320 318 318 316 1 99.4%
384 381 381 379 2 99.5%
448 445 445 443 1 99.6%
512 508 508 506 2 99.6%
576 572 572 570 1 99.7%
640 635 635 633 2 99.7%
704 699 699 697 1 99.7%
768 762 762 760 2 99.7%
832 826 826 824 1 99.8%
896 889 889 887 2 99.8%
960 953 953 952 0 99.9%
1024 1016 1016 1014 2 99.8%
1088 1080 1080 1079 1 99.9%
1152 1143 1143 1141 2 99.8%
1216 1207 1207 1205 2 99.8%
1280 1270 1270 1268 2 99.8%
1344 1334 1334 1332 2 99.9%
64 bit (same columns as above)
128 122 122 116 6 90.6%
192 187 187 180 6 93.8%
256 238 238 232 6 90.6%
320 285 281 275 6 85.9%
384 365 365 359 6 93.5%
448 409 409 402 6 89.7%
512 455 451 445 6 86.9%
576 512 496 489 7 84.9%
640 595 595 565 30 88.3%
704 659 659 629 30 89.3%
768 683 682 676 6 88.0%
832 740 728 722 6 86.8%
896 797 772 766 6 85.5%
960 853 832 825 6 85.9%
1024 910 867 860 7 84.0%
1088 967 916 909 6 83.5%
1152 1060 1060 1013 47 87.9%
1216 1115 1115 1068 47 87.8%
1280 1143 1143 1137 6 88.8%
1344 1195 1174 1167 7 86.8%
1408 1252 1226 1220 6 86.6%
1472 1309 1265 1259 6 85.5%
1536 1365 1317 1261 56 82.1%
1600 1422 1325 1318 7 82.4%
1664 1479 1392 1386 6 83.3%
1728 1536 1422 1415 7 81.9%
1792 1593 1455 1448 6 80.8%
1856 1650 1579 1573 6 84.8%
1920 1707 1565 1558 7 81.1%
1984 1764 1715 1649 66 83.1%
2048 1821 1773 1708 65 83.4%
2112 1877 1776 1769 7 83.8%
2176 1934 1842 1776 66 81.6%
2240 1991 1899 1833 65 81.8%
2304 2048 1876 1870 6 81.2%
2368 2105 1961 1955 6 82.6%
2432 2162 2006 2000 6 82.2%
Why does it happen?
Basically, there are two strategies that the JVM / GC can use to decide when to give up and throw an OOME.
It can keep going and going until there is simply not enough memory after garbage collection to allocate the next object.
It can keep going until the JVM is spending more than a given percentage of time running the garbage collector.
The first approach has the problem that for a typical application the JVM will spend a larger and larger percentage of its time running the GC, in an ultimately futile effort to complete the task.
The second approach has the problem that it might give up too soon.
The actual behaviour of the GC in this area is governed by JVM options (-XX:...). Apparently, the default behaviour differs between 32 and 64 bit JVMs. This kind of makes sense, because (intuitively) the "out of memory death spiral" effect for a 64 bit JVM will last longer and be more pronounced.
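For HotSpot specifically, the behaviour is controlled by the GC overhead limit flags. A hedged illustration (the heap value is just the one from the question, and the numbers shown are the documented defaults):
java -Xmx2432m -XX:+UseGCOverheadLimit -XX:GCTimeLimit=98 -XX:GCHeapFreeLimit=2 XmxTester 2432
With the overhead limit enabled (the default), the JVM throws an OutOfMemoryError once more than GCTimeLimit percent of total time is spent in GC while less than GCHeapFreeLimit percent of the heap is recovered; -XX:-UseGCOverheadLimit disables the early give-up entirely.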
My advice would be to leave this issue alone. Unless you really need to fill every last byte of memory with stuff, it is better for the JVM to die early and avoid wasting lots of time. You can then restart it with more memory and get the job done.
Clearly, your benchmark is atypical. Most real programs simply don't try to grab all of the heap. It is possible that your application is atypical too. But it is also possible that your application is suffering from a memory leak. If that is the case, you should be investigating the leak rather than trying to figure out why you can't use all of memory.
However my issue is mainly with why it does not honor my xmx setting.
It is honoring it! The -Xmx is the upper limit on the heap size, not the criterion for deciding when to give up.
I have set an XMX of 2432M but asking the JVM to return its understanding of max memory returns 2162M.
It is returning the maximum amount of memory that it will attempt to make available to your application, not the raw -Xmx figure; in HotSpot the reported value excludes one of the two survivor spaces, since only one of them can hold live data at any time.
Why does it 'think' the max memory is 11% less than the xmx?
See above.
Furthermore why when the heap hits 2006M does it not extend the heap to at least 2162 ?
I presume that it is because the JVM has hit the "too much time spent garbage collecting" threshold.
Does this mean in 64 bit JVMs one should fudge the XMX setting to be 11% higher than the intended maximum ?
Not in general. The fudge factor depends on your application. For instance, an application with a larger rate of object churn (i.e. more objects created and discarded per unit of useful work) is likely to die with an OOME sooner.
I can predict the requirements based on db size and have a wrapper that adjusts xmx; however, I have the 11% problem whereby my monitoring suggests the app needs 2 GB, so I set a 2.4 GB xmx. However, instead of having the expected 400 MB of 'headroom', the JVM only allows the heap to grow to 2006M.
IMO, the solution is to simply add an extra 20% (or more) on top of what you are currently adding. Assuming that you have enough physical memory, giving the JVM a larger heap is going to reduce overall GC overheads and make your application run faster.
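To put numbers on it (simple arithmetic from the measurements above): the 64 bit table shows roughly 83% of -Xmx being usable, so for 2 GB of data you would want -Xmx of at least 2048 / 0.83 ≈ 2470M, and for 2 GB plus the intended 400M of headroom (2048 + 400) / 0.83 ≈ 2950M.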
Other tricks you could try are setting -Xmx and -Xms to the same value and adjusting the tuning parameter that sets the maximum "time spent garbage collecting" ratio (GCTimeLimit, shown above).
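A hedged example combining both suggestions (the heap size is just the value from the question):
java -Xms2432m -Xmx2432m -XX:GCTimeLimit=98 XmxTester 2432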
