Debugging JVM memory leak

Debugging JVM memory leak - java

I have a Java application that uses a native library for some of its functionality. It uses JNI to control the native library and also receives asynchronous callback from the library. You can think of it as a Java frontend and native backend that communicate with each other.
I am facing a memory leak. Shortly after I start the application, the memory slowly but steadily increases. So I tried to look what could cause the leak.
First, I tried replacing the Java frontend with a simple C++ text interface. That way, the application doesn't use Java in any way - and the leaks stopped. So the problem must be in Java frontend.
So I fired up the jvisualVM to see if the heap increases - and it turned out it doesn't. The Java heap size was fairly constant. I even launched the program with xmx32m, but the memory kept increasing well past 100m without any OutOfMemoryErrors. In fact, the jvisualVM showed Java heap at about 7m.
So I dug deeper into the program with WinDbg. I analyzed the heap patterns with !heap -s command and I got this:
Heaps on a freshly run program:
0:059> !heap -s
LFH Key : 0x382288b9
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
00330000 00000002 2048 1704 2048 22 71 2 0 0 LFH
005b0000 00001002 1088 212 1088 68 3 2 0 0 LFH
00aa0000 00001002 1088 108 1088 15 7 2 0 0 LFH
004f0000 00001002 15424 12876 15424 1372 89 9 0 1 LFH
...
0:059> !heap -stat -h 004f0000
heap # 004f0000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
2b110 20 - 562200 (60.36)
98 166e - d5150 (9.33)
6cd20 1 - 6cd20 (4.77)
...
Heaps on a program that has been running for about half an hour:
0:046> !heap -s
LFH Key : 0x5e47ba72
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
006b0000 00000002 2048 1744 2048 46 92 2 0 0 LFH
00200000 00001002 1088 220 1088 68 3 2 0 0 LFH
00950000 00001002 1088 108 1088 15 7 2 0 0 LFH
001b0000 00001002 47808 31936 47808 1855 102 12 0 0 LFH
...
0:046> !heap -stat -h 001b0000
heap # 001b0000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
98 59d1 - 355418 (36.67)
2b110 10 - 2b1100 (29.61)
6cd20 1 - 6cd20 (4.68)
...
Now it can be clearly seen that the leaks are caused by a growing number of blocks with size 98. But when I try to analyze one of the blocks with !heap -p -a, I get:
*** ERROR: Symbol file could not be found. Defaulted to export symbols for jvm.dll
without any stack trace. So the blocks are allocated somewhere inside the jvm.dll, and because there are no pdbs for JVM, I cannot debug the leak further.
I managed to pinpoint where the leak is occuring in my code. All callbacs to the Java frontend pass through one function:
void callback(JNIEnv *env, int stream, double value, char *callbackName){
jclass jni = env->FindClass("nativ/Callbacks");
jmethodID callbackMethodID = env->GetStaticMethodID(jni, callbackName, "(ID)V");
jvalue params[2];
params[0].i = (long)(stream);
params[1].d = value;
env->CallStaticVoidMethodA(jni, callbackMethodID, params); //commenting this out stops the leaks
}
When I comment out the last command, the leaks stop, but I get no feedback back to the frontend.
Could this be a JVM bug? How do I find out?

malloc() internally calls HeapAlloc(). I guess you need a 'Release' method to release the memory allocated by JVM, as long as your library hold reference to JVM's internal state.

Related

How do I decide on a suitable TLABSIZE setting for a Java application?

My Java application on an single cpu arm7 (32bit) device using Java 14 is occasionally crashing
after running under load for a number of hours, and is always failing in ThreadLocalAllocBuffer::resize()
A fatal error has been detected by the Java Runtime Environment:
#
SIGSEGV (0xb) at pc=0xb6cd515e, pid=1725, tid=1733
#
JRE version: OpenJDK Runtime Environment (14.0+36) (build 14+36)
Java VM: OpenJDK Client VM (14+36, mixed mode, serial gc, linux-arm)
Problematic frame:
V
#
No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
If you would like to submit a bug report, please visit:
https://bugreport.java.com/bugreport/crash.jsp
#
--------------- S U M M A R Y ------------
Command Line: -Duser.home=/mnt/app/share/log -Djdk.lang.Process.launchMechanism=vfork -Xms150m -Xmx900m -Dcom.mchange.v2.log.MLog=com.mchange.v2.log.jdk14logging.Jdk14MLog -Dorg.jboss.logging.provider=jdk -Djava.util.logging.config.class=com.jthink.songkong.logging.StandardLogging --add-opens=java.base/java.lang=ALL-UNNAMED lib/songkong-6.9.jar -r
Host: Marvell PJ4Bv7 Processor rev 1 (v7l), 1 cores, 1G, Buildroot 2014.11-rc1
Time: Fri Apr 24 19:36:54 2020 BST elapsed time: 37456 seconds (0d 10h 24m 16s)
--------------- T H R E A D ---------------
Current thread (0xb6582a30): VMThread "VM Thread" [stack: 0x7b716000,0x7b796000] [id=3625] _threads_hazard_ptr=0x7742f140
Stack: [0x7b716000,0x7b796000], sp=0x7b7946b0, free space=505k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x48015e] ThreadLocalAllocBuffer::resize()+0x85
[error occurred during error reporting (printing native stack), id 0xb, SIGSEGV (0xb) at pc=0xb6b4ccae]
Now this must surely be bug in JVM, but as its not one of the standard Java platforms and I dont have a simple test case I cannot see it getting fixed anytime soon, so I am trying to workaround it. Its also worth noting that it crashed with ThreadLocalAllocBuffer::accumulate_statistics_before_gc() when I used Java 11 which is why I moved to Java 14 to try and resolve the issue.
As the the issue is with TLABs one solution is to disable TLABS with -XX:-UseTLAB but that makes the code run slower on an already slow machine.
So I think another solution is to disable resizing with -XX:-ResizeTLAB, but then I need to know work out a suitable size and specify that using -XX:TLABSize=N. But I am not sure what N actually represents and what would be a suitable size to set
I tried setting -XX:TLABSize=1000000 which seems to me to be quite large ?
I have some logging set with
-Xlog:tlab*=debug,tlab*=trace:file=gc.log:time:filecount=7,filesize=8M
but I don't really understand the output.
[2020-05-19T15:43:43.836+0100] ThreadLocalAllocBuffer::compute_size(132) returns 250132
[2020-05-19T15:43:43.837+0100] TLAB: fill thread: 0x0026d548 [id: 871] desired_size: 976KB slow allocs: 0 refill waste: 15624B alloc: 0.25725 1606KB refills: 1 waste 0.0% gc: 0B slow: 0B fast: 0B
[2020-05-19T15:43:43.853+0100] ThreadLocalAllocBuffer::compute_size(6) returns 250006
[2020-05-19T15:43:43.854+0100] TLAB: fill thread: 0xb669be48 [id: 32635] desired_size: 976KB slow allocs: 0 refill waste: 15624B alloc: 0.00002 0KB refills: 1 waste 0.0% gc: 0B slow: 0B fast: 0B
[2020-05-19T15:43:43.910+0100] ThreadLocalAllocBuffer::compute_size(4) returns 250004
[2020-05-19T15:43:43.911+0100] TLAB: fill thread: 0x76c1d6f8 [id: 917] desired_size: 976KB slow allocs: 0 refill waste: 15624B alloc: 0.91261 8085KB refills: 1 waste 0.0% gc: 0B slow: 0B fast: 0B
[2020-05-19T15:43:43.962+0100] ThreadLocalAllocBuffer::compute_size(2052) returns 252052
[2020-05-19T15:43:43.962+0100] TLAB: fill thread: 0x76e06f10 [id: 534] desired_size: 976KB slow allocs: 4 refill waste: 15688B alloc: 0.13977 1612KB refills: 2 waste 0.2% gc: 0B slow: 4520B fast: 0B
[2020-05-19T15:43:43.982+0100] ThreadLocalAllocBuffer::compute_size(28878) returns 278878
[2020-05-19T15:43:43.983+0100] TLAB: fill thread: 0x76e06f10 [id: 534] desired_size: 976KB slow allocs: 4 refill waste: 15624B alloc: 0.13977 1764KB refills: 3 waste 0.3% gc: 0B slow: 10424B fast: 0B
[2020-05-19T15:43:44.023+0100] ThreadLocalAllocBuffer::compute_size(4) returns 250004
[2020-05-19T15:43:44.023+0100] TLAB: fill thread: 0x7991df20 [id: 32696] desired_size: 976KB slow allocs: 0 refill waste: 15624B alloc: 0.00132 19KB refills: 1 waste 0.0% gc: 0B slow: 0B fast: 0B
Update
I reran with -XX:+HeapDumpOnOutOfMemoryError option added, and this time it showed:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1600.hprof ...
but then the dump itself failed with
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xb6a81b9a, pid=1600, tid=1606
#
# JRE version: OpenJDK Runtime Environment (14.0+36) (build 14+36)
# Java VM: OpenJDK Client VM (14+36, mixed mode, serial gc, linux-arm)
# Problematic frame:
# V [libjvm.so+0x22eb9a] DumperSupport::dump_field_value(DumpWriter*, char, oopDesc*, int)+0x91
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /mnt/system/config/Apps/SongKong/songkong/hs_err_pid1600.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
I am not clear if the dump failed because of ulimit or soemthing else, but
java_pid1600.hprof was created but was empty
I was also monitoring the process with jstat -gc, and jstat -gcutil. I paste the end of the putput here, to me it does not look like there was a particular memory problem before the crash, although I am only checking every 5 seconds so maybe that is the issue ?
[root#N1-0247 bin]# ./jstat -gc 1600 5s
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT CGC CGCT GCT
........
30720.0 30720.0 0.0 0.0 245760.0 236647.2 614400.0 494429.2 50136.0 49436.9 0.0 0.0 5084 3042.643 155 745.523 - - 3788.166
30720.0 30720.0 0.0 28806.1 245760.0 244460.2 614400.0 506541.7 50136.0 49436.9 0.0 0.0 5085 3043.887 156 745.523 - - 3789.410
30720.0 30720.0 28760.4 0.0 245760.0 245760.0 614400.0 514809.7 50136.0 49437.2 0.0 0.0 5086 3044.895 157 751.204 - - 3796.098
30720.0 30720.0 0.0 231.1 245760.0 234781.8 614400.0 514809.7 50136.0 49437.2 0.0 0.0 5087 3044.895 157 755.042 - - 3799.936
30720.0 30720.0 0.0 0.0 245760.0 190385.5 614400.0 519650.7 50136.0 49449.6 0.0 0.0 5087 3045.905 159 758.890 - - 3804.795
30720.0 30720.0 0.0 0.0 245760.0 190385.5 614400.0 519650.7 50136.0 49449.6 0.0 0.0 5087 3045.905 159 758.890 - - 3804.795
[root#N1-0247 bin]# ./jstat -gc 1600 5s
S0 S1 E O M CCS YGC YGCT FGC FGCT CGC CGCT GCT
..............
99.70 0.00 100.00 75.54 98.56 - 5080 3037.321 150 724.674 - - 3761.995
0.00 29.93 99.30 75.55 98.56 - 5081 3038.403 151 728.584 - - 3766.987
0.00 100.00 99.30 75.94 98.56 - 5081 3039.405 152 728.584 - - 3767.989
100.00 0.00 99.14 76.14 98.56 - 5082 3040.366 153 734.088 - - 3774.454
0.00 96.58 99.87 78.50 98.57 - 5083 3041.366 154 737.960 - - 3779.325
56.99 0.00 100.00 78.50 98.58 - 5084 3041.366 154 741.880 - - 3783.246
0.00 0.00 96.29 80.47 98.61 - 5084 3042.643 155 745.523 - - 3788.166
0.00 93.77 99.47 82.44 98.61 - 5085 3043.887 156 745.523 - - 3789.410
93.62 0.00 100.00 83.79 98.61 - 5086 3044.895 157 751.204 - - 3796.098
0.00 0.76 95.53 83.79 98.61 - 5087 3044.895 157 755.042 - - 3799.936
0.00 0.00 77.47 84.58 98.63 - 5087 3045.905 159 758.890 - - 3804.795
0.00 0.00 77.47 84.58 98.63 - 5087 3045.905 159 758.890 - - 3804.795
Update Latest run
Configured gclogging, i get many
Pause Young (Allocation Failure)
errors, does this indicate I need to make the eden space larger?
[2020-05-29T14:00:22.668+0100] GC(44) Pause Young (GCLocker Initiated GC)
[2020-05-29T14:00:22.739+0100] GC(44) DefNew: 43230K(46208K)->4507K(46208K) Eden: 41088K(41088K)->0K(41088K) From: 2142K(5120K)->4507K(5120K)
[2020-05-29T14:00:22.739+0100] GC(44) Tenured: 50532K(102400K)->50532K(102400K)
[2020-05-29T14:00:22.740+0100] GC(44) Metaspace: 40054K(40536K)->40054K(40536K)
[2020-05-29T14:00:22.740+0100] GC(44) Pause Young (GCLocker Initiated GC) 91M->53M(145M) 72.532ms
[2020-05-29T14:00:22.741+0100] GC(44) User=0.07s Sys=0.00s Real=0.07s
[2020-05-29T14:00:25.196+0100] GC(45) Pause Young (Allocation Failure)
[2020-05-29T14:00:25.306+0100] GC(45) DefNew: 45595K(46208K)->2150K(46208K) Eden: 41088K(41088K)->0K(41088K) From: 4507K(5120K)->2150K(5120K)
[2020-05-29T14:00:25.306+0100] GC(45) Tenured: 50532K(102400K)->53861K(102400K)
[2020-05-29T14:00:25.307+0100] GC(45) Metaspace: 40177K(40664K)->40177K(40664K)
[2020-05-29T14:00:25.307+0100] GC(45) Pause Young (Allocation Failure) 93M->54M(145M) 111.252ms
[2020-05-29T14:00:25.308+0100] GC(45) User=0.08s Sys=0.02s Real=0.11s
[2020-05-29T14:00:29.248+0100] GC(46) Pause Young (Allocation Failure)
[2020-05-29T14:00:29.404+0100] GC(46) DefNew: 43238K(46208K)->4318K(46208K) Eden: 41088K(41088K)->0K(41088K) From: 2150K(5120K)->4318K(5120K)
[2020-05-29T14:00:29.405+0100] GC(46) Tenured: 53861K(102400K)->53861K(102400K)
[2020-05-29T14:00:29.405+0100] GC(46) Metaspace: 40319K(40792K)->40319K(40792K)
[2020-05-29T14:00:29.406+0100] GC(46) Pause Young (Allocation Failure) 94M->56M(145M) 157.614ms
[2020-05-29T14:00:29.406+0100] GC(46) User=0.07s Sys=0.00s Real=0.16s
[2020-05-29T14:00:36.466+0100] GC(47) Pause Young (Allocation Failure)
[2020-05-29T14:00:36.661+0100] GC(47) DefNew: 45406K(46208K)->5120K(46208K) Eden: 41088K(41088K)->0K(41088K) From: 4318K(5120K)->5120K(5120K)
[2020-05-29T14:00:36.662+0100] GC(47) Tenured: 53861K(102400K)->55125K(102400K)
[2020-05-29T14:00:36.662+0100] GC(47) Metaspace: 40397K(40920K)->40397K(40920K)
[2020-05-29T14:00:36.663+0100] GC(47) Pause Young (Allocation Failure) 96M->58M(145M) 196.531ms
[2020-05-29T14:00:36.663+0100] GC(47) User=0.09s Sys=0.01s Real=0.19s
[2020-05-29T14:00:40.523+0100] GC(48) Pause Young (Allocation Failure)
[2020-05-29T14:00:40.653+0100] GC(48) DefNew: 44274K(46208K)->2300K(46208K) Eden: 39154K(41088K)->0K(41088K) From: 5120K(5120K)->2300K(5120K)
[2020-05-29T14:00:40.653+0100] GC(48) Tenured: 55125K(102400K)->59965K(102400K)
[2020-05-29T14:00:40.654+0100] GC(48) Metaspace: 40530K(41048K)->40530K(41048K)
[2020-05-29T14:00:40.654+0100] GC(48) Pause Young (Allocation Failure) 97M->60M(145M) 131.365ms
[2020-05-29T14:00:40.655+0100] GC(48) User=0.11s Sys=0.01s Real=0.14s
[2020-05-29T14:00:43.936+0100] GC(49) Pause Young (Allocation Failure)
[2020-05-29T14:00:44.100+0100] GC(49) DefNew: 43388K(46208K)->5120K(46208K) Eden: 41088K(41088K)->0K(41088K) From: 2300K(5120K)->5120K(5120K)
Updated with gc analysis done by gceasy
Okay so this is useful I uploaded log to gceasy.org and it clearly shows that shortly before it crashed heap size was significantly higher and approaching the 900mb limit,even after a number of full gcs, so I think basically it ran out of heap space.
What is a little frustrating is I have the
-XX:+HeapDumpOnOutOfMemoryError
option enabled, but when it crashes it reports an issue trying to do create the dump file so I cannot get one.
And when I process the same file on Windows with the same setting for heap size it suceeds without failure, But Im goinf to run again ewith gclogging enabled and see if it reaches simailr levels even if it doesnt actually fall over.
Ran again (this is building on chnages made in previous run and doesnt show start of run) but to me the memory usage is higher but looks quite normal (sawtooth pattern) with no particular differenc ebefore the crash.
Update
With last run I reduced max heap from 900MB to 600MB, but I also monitored with vmstat, Yo can see clearly below where the applciation crashed but It doesn't seem we were approaching particularly ow memory at this point.
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 0 0 57072 7812 1174128 0 0 5360 0 211 558 96 4 0 0 0
1 0 0 55220 7812 1176184 0 0 2048 0 203 467 79 21 0 0 0
3 0 0 61296 7812 1169096 0 0 2036 44 193 520 96 4 0 0 0
2 0 0 59808 7812 1171144 0 0 2048 32 212 522 96 4 0 0 0
1 0 0 59436 7812 1171144 0 0 0 0 180 307 83 17 0 0 0
1 0 0 59436 7812 1171144 0 0 0 0 179 173 100 0 0 0 0
1 0 0 59436 7812 1171128 0 0 0 0 179 184 100 0 0 0 0
2 1 0 51764 7816 1158452 0 0 4124 52 190 490 80 20 0 0 0
3 0 0 63428 7612 1146388 0 0 20472 48 251 533 86 14 0 0 0
2 0 0 63428 7616 1146412 0 0 4 0 196 508 99 1 0 0 0
2 0 0 84136 7616 1146400 0 0 0 0 186 461 84 16 0 0 0
2 0 0 61436 7608 1148960 0 0 24601 0 325 727 77 23 0 0 0
4 0 0 60196 7648 1150204 0 0 1160 76 232 611 98 2 0 0 0
4 0 0 59204 7656 1151052 0 0 52 376 305 570 80 20 0 0 0
3 0 0 59204 7656 1151052 0 0 0 0 378 433 96 4 0 0 0
1 0 0 762248 7768 1151420 0 0 106 0 253 660 74 26 0 0 0
0 0 0 859272 8188 1151892 0 0 417 0 302 550 9 26 64 1 0
0 0 0 859272 8188 1151892 0 0 0 0 111 132 0 0 100 0 0

Based on your jstat data and their explanation here: https://docs.oracle.com/en/java/javase/11/tools/jstat.html#GUID-5F72A7F9-5D5A-4486-8201-E1D1BA8ACCB5
I would not expect OutOfMemoryError just yet from the HeapSpace based on the slow and steady rate of the Old Generation filling up and the small size of the from and to space (not that I know whether your application might allocate a huge array anytime soon) unless:
initial heap size (-Xms) is smaller than the max (-Xmx) and
Linux has overcomitted virtual memory
If you do overcommit (and who doesn't) maybe you should keep an eye on Linux with vmstat 1 or gathering data frequently for sar
But I do wonder why you are not using Garbage Collection logging with -Xlog:gc*:stderr or to a file with -Xlog:gc*:file= and maybe analyze that with https://gceasy.io/ as it is very low overhead (unless writing to the logfile is slow) and very precise. For more information on the logging syntax see: https://openjdk.java.net/jeps/158 and https://openjdk.java.net/jeps/271
java -Xlog:gc*:stderr -jar yourapp.jar
and analyze those logs with great ease with tools like these:
https://gceasy.io/
JClarity Censum
This should give similar information as jstack and more in realtime (as far as I know)

I think you may already be on the wrong track:
It is more likely that your process has a general problem with allocating memory than that there are two different bugs in two different Java versions.
Have you already checked whether the process has enough memory? A segmentation fault can also occur when the process runs out of memory. I would also check the configuration of the swap file. Years ago I got inexplicable segfaults with Java 8 also somewhere in a resize or allocation method. In my case the size of the OS's swap file was set to zero.
What error do you see on top of the error log file? You only copied the information of the single thread.
UPDATE
You definitely do not have a problem with GC. If GC would be overloaded you would some when get an java.lang.OutOfMemoryError with the message:
GC Overhead limit exceeded
GC tries to collect garbage but it also has CPU constraints. Concrete behavior depends on the actual GC implementation but usually garbage will accumulate (see your big OldGen) before the GC uses more CPU cycles. So an increased heap usage is completely normal as long as you do not get the mentioned OOM error.
The segmentation faults in the native code are an indicator that there's something wrong with accessing native memory. You even get segmentation faults when the JVM tries to generate a dump. This is an additional indicator for a general problem with accessing native memory.
What's still unanswered is whether you really have enough native memory for all the processes running on your host.
Linux's overcommitment of memory usually triggers the OOM killer. But there are situations where the OOM killer is not triggered (see the kernel documentation for details). In such cases it is possible that a process may die with a SIGSEGV. Like other native applications also the JVM makes use of mmap. Also the man pages of mmap mention that depending on the used parameters a SIGSEGV may occur upon a write if no physical memory is available.

Who sends a SIGKILL to my process mysteriously on ubuntu server

UPDATES on Oct 25:
Now I found out what's causing the problem.
1) The child process kills itself, that's why strace/perf/auditctl cannot track it down.
2) The JNI call to create a process is triggered from a Java thread. When the thread eventually dies, it's also destroying the process that it creates.
3) In my code to fork and execve() a child process, I have the code to monitor parent process death and kill my child process with the following line: prctl( PR_SET_PDEATHSIG, SIGKILL ); My fault that I didn't pay special attention to this flag before b/c it's considered as a BEST PRACTICE for my other projects where child process is forked from the main thread.
4) If I comment out this line, the problem is gone. The original purpose is to kill the child process when the parent process is gone. Even w/o this flag, it's still the correct behavior. Seems like the ubuntu box default behavior.
5) Finally found it's a kernel bug, fixed in kernel version 3.4.0, my ubuntu box from AWS is kernel version 3.13.0-29-generic.
There are a couple of useful links to the issues:
a) http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them
b) prctl(PR_SET_PDEATHSIG, SIGNAL) is called on parent thread exit, not parent process exit.
c) https://bugzilla.kernel.org/show_bug.cgi?id=43300
UPDATES on Oct 15:
Thanks so much for all the suggestions. I am investigating from one area of the system to another area. It's hard 2 find a reason.
I am wondering 2 things.
1) why are powerful tools such as strace, auditctl and perf script not able to track down who caused the kill?
2) Is +++ killed by SIGKILL +++ really means its killed from signal?
ORIGINAL POST
I have a long running C process launched from a Java application server in Ubuntu 12 through the JNI interface. The reason I use JNI interface to start a process instead of through Java's process builder is b/c of the performance reasons. It's very inefficient for java process builder to do IPC especially b/c extra buffering introduces very long delay.
Periodically it is terminated by SIGKILL mysteriously. The way I found out is through strace, which says: "+++ killed by SIGKILL +++"
I checked the following:
It's not a crash.
It's not a OOM. Nothing in dmesg. My process uses only 3.3% of 1Gbytes of memory.
Java layer didn't kill the process. I put a log in the JNI code if the code terminates the process, but no log was written to indicate that.
It's not a permission issue. I tried to run as sudo or a different user, both cases causes the process to be killed.
If I run the process locally in a shell, everything works fine. What's more, in my C code for my long-running process, I ignore the signal SIGHUP. Only when it's running as a child process of Java server, it gets killed.
The process is very CPU intensive. It's using 30% of the CPU. There are lots of voluntary context switch and nonvoluntary_ctxt_switches.
(NEW UPDATE) One IMPORTANT thing very likely related to why my process is killed. If the process do some heavy lifting, it won't be killed, however, sometimes it's doing little CPU intensive work. When that happens, after a while, roughly 1 min, it is killed. It's status is always S(Sleeping) instead of R(Running). It seems that the OS decides to kill the process if it was idle most of the time, and not kill the process if it was busy.
I suspect Java's GC is the culprit, however, Java will NEVER garbage collect a singleton object associated with JNI. (My JNI object is tied to that singleton).
I am puzzled by the reason why it's terminated. Does anyone has a good suggestion how to track it down?
p.s.
On my ubuntu limit -a result is:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 7862
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 7862
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I tried to increase the limits, and still does not solve the issue.
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Here is proc status when I run cat /proc/$$$/status
Name: mimi_coso
State: S (Sleeping)
Tgid: 2557
Ngid: 0
Pid: 2557
PPid: 2229
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0
VmPeak: 146840 kB
VmSize: 144252 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 36344 kB
VmRSS: 34792 kB
VmData: 45728 kB
VmStk: 136 kB
VmExe: 116 kB
VmLib: 23832 kB
VmPTE: 292 kB
VmSwap: 0 kB
Threads: 1
SigQ: 0/7862
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000004
SigIgn: 0000000000011001
SigCgt: 00000001c00064ee
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000001fffffffff
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: 7fff
Cpus_allowed_list: 0-14
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 16978
nonvoluntary_ctxt_switches: 52120
strace shows:
$ strace -p 22254 -s 80 -o /tmp/debug.lighttpd.txt
read(0, "SGI\0\1\0\0\0\1\0c\0\0\0\t\0\0T\1\2248\0\0\0\0'\1\0\0(\0\0"..., 512) = 113
read(0, "SGI\0\1\0\0\0\1\0\262\1\0\0\10\0\1\243\1\224L\0\0\0\0/\377\373\222D\231\214"..., 512) = 448
sendto(3, "<15>Oct 10 18:34:01 MixCoder[271"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
write(1, "SGO\0\0\0\0 \272\1\0\0\t\0\1\253\1\243\273\0\0\0\0'\1\0\0\0\0\0\1\242"..., 454) = 454
sendto(3, "<15>Oct 10 18:34:01 MixCoder[271"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
write(1, "SGO\0\0\0\0 \341\0\0\0\10\0\0\322\1\254Z\0\0\0\0/\377\373R\4\0\17\21!"..., 237) = 237
read(0, "SGI\0\1\0\0\0\1\0)\3\0\0\t\0\3\32\1\224`\0\0\0\0'\1\0\0\310\0\0"..., 512) = 512
read(0, "\344u\233\16\257\341\315\254\272\300\351\302\324\263\212\351\225\365\1\241\225\3+\276J\273\37R\234R\362z"..., 512) = 311
read(0, "SGI\0\1\0\0\0\1\0\262\1\0\0\10\0\1\243\1\224f\0\0\0\0/\377\373\222d[\210"..., 512) = 448
sendto(3, "<15>Oct 10 18:34:01 MixCoder[271"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
write(1, "SGO\0\0\0\0 %!\0\0\t\0\0+\1\243\335\0\0\0\0\27\0\0\0\0\1B\300\36"..., 8497) = 8497
sendto(3, "<15>Oct 10 18:34:01 MixCoder[271"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
write(1, "SGO\0\0\0\0 \341\0\0\0\10\0\0\322\1\254t\0\0\0\0/\377\373R\4\0\17\301\31"..., 237) = 237
read(0, "SGI\0\1\0\0\0\1\0\262\1\0\0\10\0\1\243\1\224\200\0\0\0\0/\377\373\222d/\200"..., 512) = 448
sendto(3, "<15>Oct 10 18:34:01 MixCoder[271"..., 107, MSG_NOSIGNAL, NULL, 0) = 107
write(1, "SGO\0\0\0\0 \341\0\0\0\10\0\0\322\1\254\216\0\0\0\0/\377\373R\4\0\17\361+"..., 237) = 237
read(0, "SGI\0\1\0\0\0\1\0\221\0\0\0\t\0\0\202\1\224\210\0\0\0\0'\1\0\0P\0\0"..., 512) = 159
read(0, unfinished ...)
+++ killed by SIGKILL +++

Assuming that you have root access on your machine, you can enable audit on kill(2) syscall to gather such information.
root # auditctl -a exit,always -F arch=b64 -S kill -F a1=9
root # auditctl -l
LIST_RULES: exit,always arch=3221225534 (0xc000003e) a1=9 (0x9) syscall=kill
root # sleep 99999 &
[2] 11688
root # kill -9 11688
root # ausearch -sc kill
time->Tue Oct 14 00:38:44 2014
type=OBJ_PID msg=audit(1413272324.413:441376): opid=11688 oauid=52872 ouid=0 oses=20 ocomm="sleep"
type=SYSCALL msg=audit(1413272324.413:441376): arch=c000003e syscall=62 success=yes exit=0 a0=2da8 a1=9 a2=0 a3=0 items=0 ppid=6107 pid=6108 auid=52872 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsg
id=0 tty=pts2 ses=20 comm="bash" exe="/bin/bash" key=(null)
The other way is to set up kernel tracing which may be an over-kill when audit system can do same work.

Finally I figured out the reason why.
The child process kills itself and it's a linux kernel bug.
Details:
1) The child process kills itself, that's why strace/perf/auditctl cannot track it down.
2) The JNI call to create a process is triggered from a Java thread. When the thread eventually dies, it's also destroying the process that it creates.
3) In my code to fork and execve() a child process, I have the code to monitor parent process death and kill my child process with the following line: prctl( PR_SET_PDEATHSIG, SIGKILL ); I didn't pay special attention to this flag before b/c it's considered as a BEST PRACTICE for my other projects where child process is forked from the main thread.
4) If I comment out this line, the problem is gone. The original purpose is to kill the child process when the parent process is gone. Even w/o this flag, it's still the correct behavior. Seems like the ubuntu box default behavior.
5) From this article, https://bugzilla.kernel.org/show_bug.cgi?id=43300. it's a kernel bug, fixed in kernel version 3.4.0, my ubuntu box from AWS is kernel version 3.13.0-29-generic.
My machine configuration:
===>Ubuntu 14.04 LTS
===>3.13.0-29-generic
Some useful links to the issues:
a) http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them
b) prctl(PR_SET_PDEATHSIG, SIGNAL) is called on parent thread exit, not parent process exit
c) https://bugzilla.kernel.org/show_bug.cgi?id=43300

As I mentioned earlier, the other choice is to use kernel trace, which can be done by perf tool.
# apt-get install linux-tools-3.13.0-35-generic
# perf list | grep kill
syscalls:sys_enter_kill [Tracepoint event]
syscalls:sys_exit_kill [Tracepoint event]
syscalls:sys_enter_tgkill [Tracepoint event]
syscalls:sys_exit_tgkill [Tracepoint event]
syscalls:sys_enter_tkill [Tracepoint event]
syscalls:sys_exit_tkill [Tracepoint event]
# perf record -a -e syscalls:sys_enter_kill sleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.634 MB perf.data (~71381 samples) ]
// Open a new shell to kill.
$ sleep 9999 &
[1] 2387
$ kill -9 2387
[1]+ Killed sleep 9999
$ echo $$
9014
// Dump the trace in your original shell.
# perf script
...
bash 9014 [001] 1890350.544971: syscalls:sys_enter_kill: pid: 0x00000953, sig: 0x00000009

Troubleshooting unbounded Java Resident Set Size(RSS) growth

I have a standalone Java application which has:
-Xmx1024m -Xms1024m -XX:MaxPermSize=256m -XX:PermSize=256m
Over the course of time it hogs more and more memory, starts to swap(and slow down) and eventually died a number of times(not OOM+dump, just died, nothing on /var/log/messages).
What I've tried so far:
Heap dumps: live objects take 200-300Mb out of 1G heap --> ok with heap
Number of live threads is rather constant(~60-70) --> ok with thread stacks
JMX stops answering at some point(mb it answers but timeout is lower)
Turn off swap - it dies faster
strace - seems everything slows down a bit, app still haven't died, and not sure for which things look there
Checking top: VIRT grows to 5.5Gb, RSS to 3.7 Gb
Checking vmstat(obviously we start to swap):
--------------------------procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
Sun Jul 22 16:10:26 2012: r b swpd free buff cache si so bi bo in cs us sy id wa st
Sun Jul 22 16:48:41 2012: 0 0 138652 2502504 40360 706592 1 0 169 21 1047 206 20 1 74 4 0
. . .
Sun Jul 22 18:10:59 2012: 0 0 138648 24816 58600 1609212 0 0 124 669 913 24436 43 22 34 2 0
Sun Jul 22 19:10:22 2012: 33 1 138644 33304 4960 1107480 0 0 100 536 810 19536 44 22 23 10 0
Sun Jul 22 20:10:28 2012: 54 1 213916 26928 2864 578832 3 360 100 710 639 12702 43 16 30 11 0
Sun Jul 22 21:10:43 2012: 0 0 629256 26116 2992 467808 84 176 278 1320 1293 24243 50 19 29 3 0
Sun Jul 22 22:10:55 2012: 4 0 772168 29136 1240 165900 203 94 435 1188 1278 21851 48 16 33 2 0
Sun Jul 22 23:10:57 2012: 0 1 2429536 26280 1880 169816 6875 6471 7081 6878 2146 8447 18 37 1 45 0
sar also shows steady system% growth = swapping:
15:40:02 CPU %user %nice %system %iowait %steal %idle
17:40:01 all 51.00 0.00 7.81 3.04 0.00 38.15
19:40:01 all 48.43 0.00 18.89 2.07 0.00 30.60
20:40:01 all 43.93 0.00 15.84 5.54 0.00 34.70
21:40:01 all 46.14 0.00 15.44 6.57 0.00 31.85
22:40:01 all 44.25 0.00 20.94 5.43 0.00 29.39
23:40:01 all 18.24 0.00 52.13 21.17 0.00 8.46
12:40:02 all 22.03 0.00 41.70 15.46 0.00 20.81
Checking pmap gaves the following largest contributors:
000000005416c000 1505760K rwx-- [ anon ]
00000000b0000000 1310720K rwx-- [ anon ]
00002aaab9001000 2079748K rwx-- [ anon ]
Trying to correlate addresses I've got from pmap from stuff dumped by strace gave me no matches
Adding more memory is not practical(just make problem appear later)
Switching JVM's is not possible(env is not under our control)
And the question is:
What else can I try to track down the problem's cause or try to work around it?

Something in your JVM is using an "unbounded" amount of non-Heap memory. Some possible candidates are:
Thread stacks.
Native heap allocated by some native code library.
Memory-mapped files.
The first possibility will show up as a large (and increasing) number of threads when you take a thread stack dump. (Just check it ... OK?)
The second one you can (probably) eliminate if your application (or some 3rd part library it uses) doesn't use any native libraries.
The third one you can eliminate if your application (or some 3rd part library it uses) doesn't use memory mapped files.
I would guess that the reason that you are not seeing OOME's is that your JVM is being killed by the Linux OOM killer. It is also possible that the JVM is bailing out in native code (e.g. due to a malloc failure not being handled properly), but I'd have thought that a JVM crash dump would be the more likely outcome ...

Problem was in a profiler library attached - it recorded CPU calls/allocation sites, thus required memory to store that.
So, human factor here :)

There is a known problem with Java and glibc >= 2.10 (includes Ubuntu >= 10.04, RHEL >= 6).
The cure is to set this env. variable:
export MALLOC_ARENA_MAX=4
There is an IBM article about setting MALLOC_ARENA_MAX
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en
This blog post says
resident memory has been known to creep in a manner similar to a
memory leak or memory fragmentation.
search for MALLOC_ARENA_MAX on Google or SO for more references.
You might want to tune also other malloc options to optimize for low fragmentation of allocated memory:
# tune glibc memory allocation, optimize for low fragmentation
# limit the number of arenas
export MALLOC_ARENA_MAX=2
# disable dynamic mmap threshold, see M_MMAP_THRESHOLD in "man mallopt"
export MALLOC_MMAP_THRESHOLD_=131072
export MALLOC_TRIM_THRESHOLD_=131072
export MALLOC_TOP_PAD_=131072
export MALLOC_MMAP_MAX_=65536

Why does 64 bit JVM throw Out Of Memory before xmx is reached?

I am wrestling with large memory requirements for a java app.
In order to address more memory I have switch to a 64 bit JVM and am using a large xmx.
However, when the xmx is above 2GB the app seems to run out of memory earlier than expected.
When running with an xmx of 2400M and looking at GC info from -verbosegc I get...
[Full GC 2058514K->2058429K(2065024K), 0.6449874 secs]
...and then it throws an out of memory exception. I would expect it to increase the heap above 2065024K before running out of memory.
In a trivial example i have a test program that allocates memory in a loop and prints out information from Runtime.getRuntime().maxMemory() and Runtime.getRuntime().totalMemory() until it eventually runs out of memory.
Running this over a range of xmx values it appears that Runtime.getRuntime().maxMemory() reports about 10% less than xmx and that total memory will not grow beyond 90% of Runtime.getRuntime().maxMemory().
I am using the following 64bit jvm:
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
Here is the code:
import java.util.ArrayList;
public class XmxTester {
private static String xmxStr;
private long maxMem;
private long usedMem;
private long totalMemAllocated;
private long freeMem;
private ArrayList list;
/**
* #param args
*/
public static void main(String[] args) {
xmxStr = args[0];
XmxTester xmxtester = new XmxTester();
}
public XmxTester() {
byte[] mem = new byte[(1024 * 1024 * 50)];
list = new ArrayList();
while (true) {
printMemory();
eatMemory();
}
}
private void eatMemory() {
// TODO Auto-generated method stub
byte[] mem = null;
try {
mem = new byte[(1024 * 1024)];
} catch (Throwable e) {
System.out.println(xmxStr + "," + ConvertMB(maxMem) + ","
+ ConvertMB(totalMemAllocated) + "," + ConvertMB(usedMem)
+ "," + ConvertMB(freeMem));
System.exit(0);
}
list.add(mem);
}
private void printMemory() {
maxMem = Runtime.getRuntime().maxMemory();
freeMem = Runtime.getRuntime().freeMemory();
totalMemAllocated = Runtime.getRuntime().totalMemory();
usedMem = totalMemAllocated - freeMem;
}
double ConvertMB(long bytes) {
int CONVERSION_VALUE = 1024;
return Math.round((bytes / Math.pow(CONVERSION_VALUE, 2)));
}
}
I use this batch file to run it over multiple xmx settings. Its includes references to a 32 bit JVM, I wanted a comparison to a 32bit jvm - obviously this call fails as soon as xmx is larger than about 1500M
#echo off
set java64=<location of 64bit JVM>
set java32=<location of 32bit JVM>
set xmxval=64
:start
SET /a xmxval = %xmxval% + 64
%java64% -Xmx%xmxval%m -XX:+UseCompressedOops -XX:+DisableExplicitGC XmxTester %xmxval%
%java32% -Xms28m -Xmx%xmxval%m XmxTester %xmxval%
if %xmxval% == 4500 goto end
goto start
:end
pause
This spits out a csv which when put into excel looks like this (apologies for my poor formatting here)
32 bit
XMX max mem total mem free mem %of xmx used before out of mem exception
128 127 127 125 2 98.4%
192 191 191 189 1 99.0%
256 254 254 252 2 99.2%
320 318 318 316 1 99.4%
384 381 381 379 2 99.5%
448 445 445 443 1 99.6%
512 508 508 506 2 99.6%
576 572 572 570 1 99.7%
640 635 635 633 2 99.7%
704 699 699 697 1 99.7%
768 762 762 760 2 99.7%
832 826 826 824 1 99.8%
896 889 889 887 2 99.8%
960 953 953 952 0 99.9%
1024 1016 1016 1014 2 99.8%
1088 1080 1080 1079 1 99.9%
1152 1143 1143 1141 2 99.8%
1216 1207 1207 1205 2 99.8%
1280 1270 1270 1268 2 99.8%
1344 1334 1334 1332 2 99.9%
64 bit
128 122 122 116 6 90.6%
192 187 187 180 6 93.8%
256 238 238 232 6 90.6%
320 285 281 275 6 85.9%
384 365 365 359 6 93.5%
448 409 409 402 6 89.7%
512 455 451 445 6 86.9%
576 512 496 489 7 84.9%
640 595 595 565 30 88.3%
704 659 659 629 30 89.3%
768 683 682 676 6 88.0%
832 740 728 722 6 86.8%
896 797 772 766 6 85.5%
960 853 832 825 6 85.9%
1024 910 867 860 7 84.0%
1088 967 916 909 6 83.5%
1152 1060 1060 1013 47 87.9%
1216 1115 1115 1068 47 87.8%
1280 1143 1143 1137 6 88.8%
1344 1195 1174 1167 7 86.8%
1408 1252 1226 1220 6 86.6%
1472 1309 1265 1259 6 85.5%
1536 1365 1317 1261 56 82.1%
1600 1422 1325 1318 7 82.4%
1664 1479 1392 1386 6 83.3%
1728 1536 1422 1415 7 81.9%
1792 1593 1455 1448 6 80.8%
1856 1650 1579 1573 6 84.8%
1920 1707 1565 1558 7 81.1%
1984 1764 1715 1649 66 83.1%
2048 1821 1773 1708 65 83.4%
2112 1877 1776 1769 7 83.8%
2176 1934 1842 1776 66 81.6%
2240 1991 1899 1833 65 81.8%
2304 2048 1876 1870 6 81.2%
2368 2105 1961 1955 6 82.6%
2432 2162 2006 2000 6 82.2%

Why does it happen?
Basically, there are two strategies that the JVM / GC can use to decide when to give up and throw an OOME.
It can keep going and going until there is simply not enough memory after garbage collection to allocate the next object.
It can keep going until the JVM is spending more than a given percentage of time running the garbage collector.
The first approach has the problem that for a typical application the JVM will spend a larger and larger percentage of its time running the GC, in an ultimately futile effort to complete the task.
The second approach has the problem that it might give up too soon.
The actual behaviour of the GC in this area is governed by JVM options (-XX:...). Apparently, the default behaviour differs between 32 and 64 bit JVMs. This kind of makes sense, because (intuitively) the "out of memory death spiral" effect for a 64 bit JVM will last longer and be more pronounced.
My advice would be to leave this issue alone. Unless you really need to fill every last byte of memory with stuff it is better for the JVM to die early and avoid wasting lots of time. You can then restart it with more memory and get the job done.
Clearly, your benchmark is atypical. Most real programs simply don't try to grab all of the heap. It is possible that your application is atypical too. But it is also possible that your application is suffering from a memory leak. If that is the case, you should be investigating the leak rather than trying to figure out why you can't use all of memory.
However my issue is mainly with why it does not honor my xmx setting.
It is honoring it! The -Xmx is the upper limit on the heap size, not the criterion for deciding when to give up.
I have set an XMX of 2432M but asking the JVM to return its understanding of max memory returns 2162M.
It is returning the max memory that it has used, not the max memory it is allowed to use.
Why does it 'think' the max memory is 11% less than the xmx?
See above.
Furthermore why when the heap hits 2006M does it not extend the heap to at least 2162 ?
I presume that it is because the JVM has hit the "too much time spent garbage collecting" threshold.
Does this mean in 64 bit JVMs one should fudge the XMX setting to be 11% higher than the intended maximum ?
Not in general. The fudge factor depends on your application. For instance, an application with a larger rate of object churn (i.e. more objects created and discarded per unit of useful work) is likely to die with an OOME sooner.
I can predict the requirments based on db size and have a wrapper that adjusts xmx, howeveri have the 11% problem whereby my montioring suggests the app needs 2 GB, so I set a 2.4GB xmx. however instead of having an expected 400MB of 'headroom' the jvm only allows the heap to grow to 2006M.
IMO, the solution is to simply add an extra 20% (or more) on top of what you are currently adding. Assuming that you have enough physical memory, giving the JVM a larger heap is going to reduce overall GC overheads and make your application run faster.
The other tricks that you could try is to set -Xmx and -Xms to the same value and adjusting the tuning parameter that sets the maximum "time spent garbage collecting" ratio.

Does anyone here know a good, cross-platform way to get the process list?

Okay, I got into a conversation with a friend about Ada (I'm the local proponent here), and in his project he's having a pain trying to get Java (using JNI) to get the applications running on the client machine (only Windows, Mac, and Linux) to get a listing of applications.
I'm not familiar with Macs at all, and my Linux experience is mostly user-end within academia.
So, my question is this: does anyone know a good cross-platform way to get the process-list?
My solution would be to use a package spec with a general function returning the list in the manner the Java expects it and throw together three different bodies for each of the platforms which would get the process-list according to that system and compile the (resultant) three binaries for those targets individually.
Is there a [good] way to do it w/o resorting to three different versions?
(This is an Ada question, but Java solutions are welcome.)

Java has no cross-platform API to list running processes. ProcessBuilder may be used to excecute the ps command, as shown here and here. The (rough) equivalent in Ada would be GNAT.Os_Lib.Spawn in GNAT. I'm not sure about other implementations.

JavaSysMon can provide a list of running processes (as well as other system information) in a platform-independent manner. Currently it supports Mac OS X, Linux, Windows, and Solaris. As an added bonus, it is BSD licensed.
Wiki
JavaDocs

You were almost at the Ada solution. As you only want 1 procedure to execute & look at the system call (top/ps in linux/unix) i would suggest a separate procedure. This will live in its own directory, and only be referenced by the correct compilation (per os). As for the actual commands per os, that is not part of my answer.

Do you just mean to get a list of running processes?
If so, you can just Google the commands to get this (1) the name of the OS on which the program is running, then (2) run Runtime.getRuntime.exec(stringCommandToGetProcessList); to based on #1, and output the results.
You don't need a different Java binary for every OS. You only need one. Just Google the command to find the OS name/version, and the command to get the list of running applications.
You also don't need JNI to do this. Use the Runtime class to run commands as if they were on the command-line.
There's no cross-platform way to do it, because the commands are different on each OS. But since there's only three major OS's (maybe a dozen total that you want to support, in some crazy extreme example), then it's just a matter of making a list of the 12 different commands to do this.
On Macs, and many Linux versions, OS name/version:
$ uname -a
Darwin normalocity 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64
Running processes (by highest usage):
$ top
Processes: 92 total, 5 running, 87 sleeping, 408 threads 20:38:35
Load Avg: 0.18, 0.20, 0.17 CPU usage: 7.26% user, 1.95% sys, 90.78% idle
SharedLibs: 6272K resident, 7300K data, 0B linkedit. MemRegions: 12204 total, 730M resident, 29M private, 393M shared.
PhysMem: 1076M wired, 1184M active, 1859M inactive, 4119M used, 4062M free.
VM: 207G vsize, 1041M framework vsize, 1851231(0) pageins, 603(0) pageouts.
Networks: packets: 1727104/1746M in, 984226/269M out. Disks: 295257/6745M read, 397634/15G written.
PID COMMAND %CPU TIME #TH #WQ #PORT #MRE RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID
12547 top 3.5 00:00.26 1/1 0 24 34 1208K 264K 1784K 17M 2378M 12547 12217 running 0
12217 bash 0.0 00:00.08 1 0 17 25 1328K 856K 1988K 17M 2378M 12217 12211 sleeping 502
12212 bash 0.0 00:00.08 1 0 17 25 1276K 856K 1980K 9688K 2378M 12212 12200 sleeping 502
12211 login 0.0 00:00.01 1 0 22 54 512K 312K 1648K 11M 2379M 12211 12196 sleeping 0
12202 bash 0.0 00:00.07 1 0 17 25 1276K 856K 1980K 9688K 2378M 12202 12199 sleeping 502
12201 bash 0.0 00:00.07 1 0 17 25 1276K 856K 1980K 9688K 2378M 12201 12198 sleeping 502
12200 login 0.0 00:00.01 1 0 22 54 512K 312K 1648K 11M 2379M 12200 12196 sleeping 0
12199 login 0.0 00:00.01 1 0 22 54 512K 312K 1648K 11M 2379M 12199 12196 sleeping 0
12198 login 0.0 00:00.01 1 0 22 54 512K 312K 1648K 11M 2379M 12198 12196 sleeping 0
12196 Terminal 33.9 00:01.84 5 1 114- 137 5736K+ 32M 23M+ 90M 2768M 12196 300 sleeping 502
11803- Google Chrom 0.0 04:06.79 7 1 99 365 45M 84M 79M 112M 1199M 11788 11788 sleeping 502
11800- Google Chrom 0.0 00:00.25 7 1 98 215 9632K 77M 23M 110M 1090M 11788 11788 sleeping 502
11799- Google Chrom 0.0 00:07.92 7 1 99 288 25M 82M 43M 109M 1108M 11788 11788 sleeping 502
11797- Google Chrom 0.0 00:01.49 7 1 99 316 27M 81M 48M 111M 1109M 11788 11788 sleeping 502
11796- Google Chrom 0.0 00:00.44 4 1 91 115 2824K 65M 8304K 96M 1012M 11788 11788 sleeping 502
11795- Google Chrom 0.0 00:00.96 7 1 98 215 9172K 77M 23M 111M 1091M 11788 11788 sleeping 502
11794- Google Chrom 0.0 00:07.64 8 1 100 294 20M 75M 36M 113M 1101M 11788 11788 sleeping 502
11793- Google Chrom 0.0 00:01.42 8 1 95 185 9732K 73M 24M 104M 1057M 11788 11788 sleeping 502
11788- Google Chrom 0.6 04:04.31 30 1 307 390 61M 110M 96M 254M 1298M 11788 300 sleeping 502
4328 ssh-agent 0.0 00:00.19 2 1 33 63 1300K 396K 2688K 59M 2420M 4328 300 sleeping 502
3855- Microsoft Of 0.0 00:36.14 4 1 121 337 12M 30M 22M 93M 1027M 3855 300 sleeping 502
492 AppleSpell 0.0 00:10.56 2 1 34 72 4608K 9028K 10M 88M 2469M 492 300 sleeping 502

Ada doesn't really have the concept of "processes" within the language. In fact, Ada code can run on platforms that do not support heavy processes at all (eg: Many smallish embedded platforms, like vxWorks).
That means you are going to have to use some kind of API (most likely supplied by your OS) to get that information.
If your OS supports POSIX, you may be able to use Posix bindings like Florist to get that info. There are full Unix subsystems available for Windows (Cygwin) and I believe MacOS is built on a flavor of Unix. So it might be possible to use Unix as sort of a lingua-franca so you can get your process info from a single (POSIX) API.
Now where Java is concerned, there are two issues: The Java language and the Java Platform (JVM). Java language fans like to conflate the two, but there are actually Ada compilers that target the JVM, and they can call all the same JVM API's that code written in the Java language can call. If there's one that allows Java programs to get a list of all the threads or processes that the JVM knows about, you could call that same routine from Ada too (if it is running under the JVM as well).

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.