Foolproof detection of OutOfMemory in Java

TL;DR: Is there a foolproof (!) way I can detect from my master JVM that my slave JVM spawned via 2 intermediate scripts has experienced an OutOfMemory error on Linux?
Long version:
I'm running some sort of application launcher. Basically it receives some input and reacts by spawning a slave Java application to process said input. This happens via a Python script (to correctly handle remote kill commands for it) which in turn calls a bash script (generated by Gradle, which sets up the classpath) to actually spawn the slave.
The slave contains a worker thread and a monitor thread that makes callbacks to a remote host for status updates. If status updates fail to occur for a set amount of time, the slave gets killed by the launcher. The reason for it not responding CAN be an OutOfMemoryError, but it can also be something else. I need to differentiate an OutOfMemoryError in the slave from some other error which caused it to stop working.
I don't just want to monitor memory usage and say once it reaches like 90% "ok that's enough". It may very well be that the GC succeeds in cleaning up sufficiently for the workload to finish. I only want to know if it failed to clean up and the JVM died because not enough memory could be freed.
What I have tried:
Use the -XX:OnOutOfMemoryError option on the slave JVM to run a script which in turn creates an empty flag file. The launcher then checks for the existence of the flag file if the slave died. Worked like a charm on Windows, did not work at all on Linux because of a bug which makes executing the OnOutOfMemoryError command require roughly as much free memory as the slave's Xmx (the fork duplicates the JVM's address space). See https://bugs.openjdk.java.net/browse/JDK-8027434 for the bug. => Solution discarded because the slave needs the entire memory of the machine.
try{ longWork(); } catch (OutOfMemoryError e) { createOomFlagFile(); System.exit(100); } This does work in some cases. However, there are also cases where it does not: the monitor thread simply stops sending status updates, no exception occurs, and no OOM flag file gets created. I know from SSHing onto the machine, though, that Java is eating all the memory available on the system and the whole system is slow.
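Roughly, the top level of the slave looks like this (longWork() and createOomFlagFile() are my own methods):

public static void main(String[] args) {
    try {
        longWork();                  // the actual slave workload
    } catch (OutOfMemoryError e) {
        createOomFlagFile();         // marker file the launcher checks for after the slave dies
        System.exit(100);            // distinct exit code so the launcher can tell OOM apart
    }
}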
Is there some (elegant) foolproof way to detect this which I am missing?

You shouldn't wait for the OutOfMemoryError. My suggestion is that you track memory consumption from the master application via Java management beans (JMX) and issue warnings when memory consumption gets critical. I have never done that myself, so I cannot be more precise about how to do it, but maybe you can find out, or others here can provide a solution.
Edit: this is the relevant MXBean: http://docs.oracle.com/javase/7/docs/api/java/lang/management/MemoryMXBean.html
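As a rough, untested sketch of how that could look inside the slave (class name and the 90% threshold are just illustrative): a collection-usage threshold only fires after a GC has run and failed to push usage back below the limit, which is much closer to "the GC could not clean up enough" than a plain usage check.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import javax.management.NotificationEmitter;

public class MemoryWatcher {
    public static void install() {
        // Ask to be notified only after a GC has run and usage is still above the threshold
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isCollectionUsageThresholdSupported()) {
                long max = pool.getUsage().getMax();
                if (max > 0) {
                    pool.setCollectionUsageThreshold((long) (max * 0.9));
                }
            }
        }
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        ((NotificationEmitter) memoryBean).addNotificationListener((notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(notification.getType())) {
                // e.g. write a flag file or notify the master over the existing callback channel
                System.err.println("Heap still above threshold after GC: " + notification.getMessage());
            }
        }, null, null);
    }
}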

Related

Java process finishes after some time for no apparent reason

I have a Java .jar file that I launch on an AWS instance in detached mode, so it keeps running when I exit the SSH session.
The app does some network work and is expected to run for days until it finishes its task.
I have added logging all over the app, including at the end of the main method, and I also wrapped everything in a global try/catch with logging in the catch block.
Still, after some days I SSH in and see that the app has simply stopped running. No exceptions were logged, and the main method did not complete because the log statement at the end never triggered. It seems the process was just killed in the middle of its work. Sometimes it runs for 5 hours, sometimes for 3-4 days before stopping.
I have no idea what the cause could be. I expect the Java process to run until it finishes or until it crashes. Am I missing something?
upd:
It is an AWS t2.micro, I think (the free-tier one). It runs Ubuntu 18.04.3 LTS.
You need to monitor the server and the application. The first thing to look at is your instance's CloudWatch statistics for any CPU or memory spikes. If you find one, you know what you need to fix if you want to keep running your application on a micro instance. For further reading:
Monitoring Your Instances Using CloudWatch
Alternatively, you can collect and dump the Java process statistics regularly while the application is running. This can give insight into heap, stack, and CPU usage. Check this SO post for further details:
How do I monitor the computer's CPU, memory, and disk usage in Java?
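As a minimal illustration (class and thread names are made up), something like this could log heap usage and system load once a minute, so the last log lines before the app disappears show whether memory was climbing:

import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class UsageLogger {
    public static void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "usage-logger");
            t.setDaemon(true);   // don't keep the JVM alive just for logging
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            Runtime rt = Runtime.getRuntime();
            long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            // -1 if the platform does not report a load average
            double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
            System.out.printf("heap used: %d MB, system load: %.2f%n", usedMb, load);
        }, 0, 1, TimeUnit.MINUTES);
    }
}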

Does Wildfly kill another Wildfly?

We experienced a (at least in our eyes) strange problem:
We have two Wildfly 8.1 installations on the same Linux machine (CentOS 6.6), running the same applications in different versions and listening on different ports.
Now, we discovered that all of a sudden, when starting one of them, the other one got killed. We then discovered that the amount of free memory was low due to other leaking processes. When we killed those, both Wildflys ran correctly again.
Since I don't think that Linux itself decided to kill another random process, I assume that JBoss either has some mechanism to free memory by killing something it assumes is no longer needed, or that (maybe through wrong configuration) there are resources used by both of them, leading to one of them getting killed when it cannot obtain them.
Did anyone experience something similar or know of a mechanism of that sort?
Most probably it was the linux OOM Killer.
You can verify if one of the servers was killed by it by checking the logfiles:
grep -i kill /var/log/messages*
And if it was, you should see something like:
host kernel: Out of Memory: Killed process 2592
The OOM killer uses the following algorithm when determining which process to kill:
The function select_bad_process() is responsible for choosing a process to kill. It decides by stepping through each running task and calculating how suitable it is for killing with the function badness(). The badness is calculated as follows; note that the square roots are integer approximations calculated with int_sqrt():
badness_for_task = total_vm_for_task / (sqrt(cpu_time_in_seconds) * sqrt(sqrt(cpu_time_in_minutes)))
This has been chosen to select a process that is using a large amount of memory but is not that long lived. Processes which have been running a long time are unlikely to be the cause of memory shortage so this calculation is likely to select a process that uses a lot of memory but has not been running long.
You can manually see the badness of each process by reading the oom_score file in the process directory in /proc
cat /proc/10292/oom_score

Thread starts and fails to stop with Tomcat. What's happening?

I have a multi-threaded Java program that I run on a Tomcat server. While the threads are still running (some executing tasks, some still waiting for something to return, and all kinds of things), assume I stop the server all of a sudden. When I do, I get a warning on the Tomcat terminal saying a thread named X is still running and the server is being stopped, so this might lead to a memory leak. What is the OS actually trying to tell me here? Can someone help me understand this? I have run this program on my system several times, I have stopped the server abruptly 3 times, and I have seen this message every time. Have I ruined my server (I mean my system)? Did I do something very dangerous?
Please help.
Thanks in advance!
When I do, I get a warning on the Tomcat terminal saying a thread named X is still running and the server is being stopped, so this might lead to a memory leak. What is the OS actually trying to tell me here?
Tomcat (not the OS) is surmising from this extra thread that some part of your code forked a thread that may not be properly cleaning itself up. It is thinking that maybe this thread is forked more than once and if your process runs for a long time, it could fill up usable memory which would cause the JVM to lock up or at least get very slow.
Have I ruined my server (I mean my system)? Did I do something very dangerous?
No, no. This is about the Tomcat process itself. It is worried that this memory leak may stop its ability to do its job as software -- nothing more. Unless you see more than one such thread, or until you see memory problems with your server (use jconsole for this), I would take it only as a warning and a caution.
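The usual fix is to stop whatever threads the webapp itself started when the webapp is stopped, typically from a ServletContextListener. A minimal sketch (class name and pool size are illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class WorkerLifecycle implements ServletContextListener {
    private ExecutorService pool;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        pool = Executors.newFixedThreadPool(4);   // threads this webapp owns
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        pool.shutdownNow();                        // interrupt workers on undeploy/stop
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}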
It sounds like your web server is forking processes which are not terminated when you stop the server. Those could lead to a memory leak because they represent processes that will never die unless you reboot or manually terminate them with the kill command.
I doubt that you will permanently damage your system, unless those orphaned processes are doing something bad, but that would be unrelated to you stopping the server. You should probably do something like ps aux | grep tomcat to find the leftover processes and then do the following:
1. Kill them so they don't take up more system resources.
2. Figure out why they persist when the server is stopped. This sounds like a misbehaving server.

How do I know what's stopping a Spring webapp from shutting down?

We've got a somewhat grown Spring webapp (on Tomcat 7) that is very slow to shut down (which has a negative impact on the performance of our continuous delivery).
My suspicion is that there must be some bean that is blocking (or taking very long) in its @PreDestroy method.
So far I've ensured that it's not related to a thread (pool) that is not shut down correctly, by giving distinct names to every pool, thread and timer and ensuring that they are either daemon threads or are shut down correctly.
Has anybody ever solved a situation like this and can give me a hint on how to cope with it?
BTW: killing the Tomcat process is not an option; we really need a clean shutdown for our production system.
Profiling would be the nuclear option. It's probably easy to get a picture of what's happening (especially if it is just blocked threads, since that state will be long-lived) just using thread dumps. If you take two dumps separated by a few seconds and they show the same or similar output for one or more threads, then that is probably the bottleneck. You can get a thread dump using jstack or "kill -3" (on a sensible operating system).
And if you're on Windows, selecting the Java console window and hitting Ctrl+Pause will dump to that window; just hit Enter to resume execution.
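If jstack isn't at hand, an equivalent dump can also be produced from inside the JVM, e.g. from a diagnostic endpoint. A small sketch (class name is illustrative; note that ThreadInfo.toString() truncates very deep stacks):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    public static void dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // true, true: include locked monitors and ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.err.print(info);   // ThreadInfo.toString() prints the stack trace
        }
    }
}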

Java: exit a program without quitting the JVM

I want to exit a Java process and free all its resources before it finishes its normal run, if a certain condition is met. I don't, however, want to quit the JVM, as I have other Java programs running at the same time. Does return; do the above, or is there a better way to do it?
Thanks.
There is one JVM process per running Java application. If you exit that application, its JVM process shuts down. However, this does not affect other Java processes.
You need to understand the JVM mechanism and clarify the terminology.
Let's use the following as datum for the terminology.
Threads are divisions of concurrently processed flows within a process.
A process is an OS-level unit of execution. The OS manages processes. A process is terminated by sending a termination signal to the OS; the signal may be sent by the process itself or by another process that has the applicable privilege.
Within a process, you can create process-level threads. Process-level threads are normally facilitated by the OS's process management, but they are initiated by the process and terminated by the process. Therefore, process-level threads are not the same as processes.
An application is a collection of systems, programs and/or threads that cooperate in various forms. A program or process within an application may terminate without terminating the whole application.
Within the context of JVM terminology, a program may be one of the following.
A program run per JVM process: each program consumes one JVM process and is invoked by supplying the classpath of Java bytecode and specifying the main entry point found on that classpath. When you terminate such a Java program, the whole JVM process that ran it also terminates.
A program run per process-level thread: for example, an application run within a Tomcat or JEE server runs as a thread within the JEE process. The JEE process is itself a program consuming one JVM process. When you terminate such an application program, the JEE process does not terminate.
You may also initiate process-level threads within a Java program. You may write code that terminates a thread, but that would not terminate the process (unless it is the last remaining non-daemon thread in the process). The JVM garbage collector takes care of freeing resources, so you do not need to free resources yourself after a process-level thread terminates.
The above response is simplified for comprehension. Please read up on OS design and threading to facilitate a better understanding of processes and the JVM mechanism.
If the other threads running concurrently are not daemon threads, leaving main will not terminate the VM. The other threads will continue running.
I completely missed the point though.
If you start each program in a separate JVM, calling System.exit() in one of them will not influence the others, they're entirely different processes.
If you're starting them through a single script or something, depending on how it is written, something else could be killing the other processes. Without precise information about how you start these apps, there's really no telling what is going on.
@aix's answer is probably apropos to your question. Each time you run the java command (or the equivalent) you get a different JVM instance. Calling System.exit() in one JVM instance won't cause other JVM instances to exit. (Try it and see!)
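A quick way to try it: the sketch below (class name is made up) launches a second JVM running the same class, and the child's System.exit() ends only the child process.

import java.io.IOException;

public class ExitDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length > 0 && args[0].equals("child")) {
            System.out.println("child exiting");
            System.exit(1);                        // ends only the child JVM
        }
        // launch a second JVM running this same class in "child" mode
        Process child = new ProcessBuilder(
                System.getProperty("java.home") + "/bin/java",
                "-cp", System.getProperty("java.class.path"),
                "ExitDemo", "child")
                .inheritIO()
                .start();
        System.out.println("child exit code: " + child.waitFor());
        System.out.println("parent still running");
    }
}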
It is possible to create a framework in which you do run multiple programs within the same JVM. Indeed this is effectively what you do when you run a "bean shell". The same sort of thing happens when your "programs" are services (or webapps, or whatever you call them) running in some application server framework.
The bad news is that if you do this kind of thing, there is no entirely reliable way to make an individual "program" go away. In particular, if the program is not designed to be cooperative (e.g. if it doesn't check for interrupts), you will have to resort to the DEPRECATED Thread.stop() method and friends. And those methods can have nasty consequences for the JVM and the other programs running in it.
In theory, the solution to that problem is to use Isolates. Unfortunately, I don't think that any mainstream JVMs support Isolates.
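For completeness, the cooperative alternative to Thread.stop() is a worker that checks for interrupts, so the framework can ask it to go away cleanly. A sketch (method names are placeholders):

public class CooperativeWorker implements Runnable {
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                doOneUnitOfWork();
            } catch (InterruptedException e) {
                // restore the flag and fall out of the loop
                Thread.currentThread().interrupt();
            }
        }
        cleanUp();
    }

    private void doOneUnitOfWork() throws InterruptedException {
        Thread.sleep(100);   // placeholder for real work that honours interruption
    }

    private void cleanUp() {
        // release resources owned by this "program"
    }
}

// Elsewhere: the worker thread is stopped with workerThread.interrupt(), not Thread.stop().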
Some common use cases leading to this kind of requirement can be addressed with tools like Nailgun or Drip.
Nailgun allows you to run what appear to be multiple independent executions of a command-line program, but they all happen in the same JVM, so repeated JVM start-up time does not have to be endured. If these executions interact with global state, the JVM will get polluted over time and things will start to break.
Drip uses a new JVM for each execution, but it always keeps a pre-created JVM with the correct classpath and options ready. This is less performant, but it can guarantee correctness through isolation.
