We experienced a (at least in our eyes) strange problem:
We have two Wildfly 8.1 installations on the same Linux machine (CentOS 6.6), running the same applications in different versions and listening on different ports.
Now we discovered that, all of a sudden, starting one of them killed the other. We then found that the amount of free memory was low due to other leaking processes. Once we killed those, both Wildflys ran correctly again.
Since I don't think Linux itself decided to kill another random process, I assume that JBoss either has some mechanism to free memory by killing something it assumes is no longer needed, or that there are resources (perhaps due to misconfiguration) shared by both instances, so that one of them gets killed when it cannot obtain them.
Did anyone experience something similar or know of a mechanism of that sort?
Most probably it was the Linux OOM killer.
You can verify whether one of the servers was killed by it by checking the log files:
grep -i kill /var/log/messages*
And if it was, you should see something like:
host kernel: Out of Memory: Killed process 2592
The OOM killer uses the following algorithm when determining which process to kill:
The function select_bad_process() is responsible for choosing a process to kill. It decides by stepping through each running task and calculating how suitable it is for killing with the function badness(). The badness is calculated as follows; note that the square roots are integer approximations calculated with int_sqrt():
badness_for_task = total_vm_for_task / (sqrt(cpu_time_in_seconds) * sqrt(sqrt(cpu_time_in_minutes)))
This has been chosen to select a process that is using a large amount of memory but is not that long lived. Processes which have been running a long time are unlikely to be the cause of memory shortage so this calculation is likely to select a process that uses a lot of memory but has not been running long.
You can manually see the badness of each process by reading the oom_score file in its process directory under /proc:
cat /proc/10292/oom_score
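If you prefer to check this from Java rather than from the shell, a minimal sketch along these lines should work (Linux only; the PID is just the example one from above):

import java.io.BufferedReader;
import java.io.FileReader;

public class OomScore {
    public static void main(String[] args) throws Exception {
        // PID to inspect; 10292 is just the example PID from above
        String pid = args.length > 0 ? args[0] : "10292";
        // /proc/<pid>/oom_score holds the badness value the OOM killer would use for this process
        BufferedReader reader = new BufferedReader(new FileReader("/proc/" + pid + "/oom_score"));
        try {
            System.out.println("oom_score of process " + pid + ": " + reader.readLine());
        } finally {
            reader.close();
        }
    }
}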
Related
TL;DR: Is there a foolproof (!) way I can detect from my master JVM that my slave JVM spawned via 2 intermediate scripts has experienced an OutOfMemory error on Linux?
Long version:
I'm running some sort of application launcher. Basically it receives some input and reacts by spawning a slave Java application to process said input. This happens via a Python script (to correctly handle remote kill commands), which in turn calls a Bash script (generated by Gradle, which sets up the classpath) to actually spawn the slave.
The slave contains a worker thread and a monitor thread that makes callbacks to a remote host for status updates. If status updates fail to occur for a set amount of time, the slave gets killed by the launcher. The reason for it not responding CAN be an OutOfMemoryError, but there can also be other reasons. I need to differentiate an OutOfMemoryError of the slave from some other error which caused it to stop working.
I don't just want to monitor memory usage and say once it reaches like 90% "ok that's enough". It may very well be that the GC succeeds in cleaning up sufficiently for the workload to finish. I only want to know if it failed to clean up and the JVM died because not enough memory could be freed.
What I have tried:
Use the -XX:OnOutOfMemoryError flag as a JVM option for the slave, which calls a script that in turn creates an empty flag file. I then check with the launcher for the existence of the flag file if the slave died. Worked like a charm on Windows; did not work at all on Unix because there is a funky bug which causes the execution of the OnOutOfMemoryError command to require the same amount of memory (Xmx) the slave has used. See https://bugs.openjdk.java.net/browse/JDK-8027434 for the bug. => Solution discarded because the slave needs the entire memory of the machine.
try {
    longWork();
} catch (OutOfMemoryError e) {
    createOomFlagFile();
    System.exit(100);
}
This does work in some cases. However, there are also cases where this does not happen and the monitor thread simply stops sending status updates. No exception occurs and no OOM flag file gets created. I know from SSHing onto the machine, though, that Java is eating all the memory available on the system and the whole system is slow.
Is there some (elegant) foolproof way to detect this which I am missing?
You shouldn't wait for the OutOfMemory. My suggestion is that you track memory consumption from the master application via Java management beans and issue warnings when memory consumption gets critical. I have never done that myself, so I cannot be more precise on how to do it, but maybe you can find out, or others here can provide a solution.
Edit: this is the respective MXBean http://docs.oracle.com/javase/7/docs/api/java/lang/management/MemoryMXBean.html
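A rough sketch of what such a check could look like inside the JVM you want to watch (the 90% threshold and the 5-second polling interval are just placeholder choices):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapWatcher implements Runnable {
    public void run() {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        while (!Thread.currentThread().isInterrupted()) {
            MemoryUsage heap = memoryBean.getHeapMemoryUsage();
            // getMax() is defined here because the slave is started with an explicit -Xmx
            double usedFraction = (double) heap.getUsed() / heap.getMax();
            if (usedFraction > 0.9) {
                // placeholder reaction: notify the launcher, write a flag file, etc.
                System.err.println("Heap usage critical: " + (int) (usedFraction * 100) + "%");
            }
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}

To watch the slave from the master instead, you would read the same MemoryMXBean over a remote JMX connection rather than locally, but the idea is the same.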
I currently work on a WebLogic Java EE project, where from time to time the application executes a Perl script to do some batch jobs. In the application the script is invoked as
Process p = Runtime.getRuntime().exec(cmdString);
Though it is a dangerous way to run it, it was working properly until we had a requirement to execute the script synchronously inside a for loop. After a couple of runs we are getting
java.io.IOException: Not enough space
probably because the OS is running out of virtual memory while exec-ing inside the for loop. As a result we are not able to run the script on the server at all.
I am desperately looking for a safer and better way to run the Perl script, one where we don't need to fork the parent process, or at least one that doesn't eat up all the swap space!
The spec is as follows:
Appserver - Weblogic 9.52
JDK - 1.5
OS - SunOS 5.10
Sun-Fire-T200
I've had something similar on a couple of occasions. Since the child process is a fork of the (very large) parent, it shares all of the parent's memory (using copy-on-write). What I discovered was that the kernel needs to be able to ensure that it could copy all of the memory pages before forking the child; on a 32-bit OS you run out of virtual headroom really fast.
Possible solutions:
Use a 64-bit OS and JVM; this pushes the issue so far down the road that it doesn't matter.
Host your script in another process (like HTTPD) and poke it with an HTTP request to invoke it.
Create a Perl server which reads Perl scripts over the network and executes them one by one (a client-side sketch follows below).
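A rough sketch of the client side of such a helper, assuming a hypothetical script server that listens on port 9999, takes one script path per line and answers with the exit status (all of these details are made up; the point is that the WebLogic JVM never forks):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class ScriptClient {
    public static void main(String[] args) throws Exception {
        // hypothetical helper service; host, port and line-based protocol are assumptions
        Socket socket = new Socket("localhost", 9999);
        try {
            PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            out.println("/opt/app/scripts/batch_job.pl"); // path of the script to run (example)
            String exitStatus = in.readLine();            // helper replies with the script's exit code
            System.out.println("Script finished with status: " + exitStatus);
        } finally {
            socket.close();
        }
    }
}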
If you want to keep your code unchanged and have enough disk free space, you can just add a sufficiently large swap area to your OS.
Assuming you need 10 GB, here is how you do it with UFS:
mkfile 10g /export/home/10g-swap
swap -a /export/home/10g-swap
echo "/export/home/10g-swap - - swap - no -" >> /etc/vfstab
If you use ZFS, that would be:
zfs create -V 10gb rpool/swap1
swap -a /dev/zvol/dsk/rpool/swap1
Don't worry about that large a swap; this won't have any performance impact, as the swap will only be used for virtual memory reservation, not paging.
Otherwise, as already suggested in previous replies, one way to avoid the virtual memory issue you experience would be to use a helper program, i.e. a small service that you contact through a network socket (or a higher-level protocol like SSH) and that executes the Perl script "remotely".
Note that the issue has nothing to do with a 32-bit or 64-bit JVM; it is just that Solaris doesn't overcommit memory, and this is by design.
I just can't figure out why I get this error. It is not always shown, but once it appears, my application refuses to accept connections (it can't create new socket threads, nor other threads I create in my Java application; for some of them I use a ThreadPool).
top and htop show me there is ~900 MB of 2048 MB used,
and there is also enough heap memory, about 200 MB free.
cat /proc/sys/kernel/threads-max outputs:
1196032
Also, everything worked fine a few days ago; it's a multiplayer online game, and we had over 200 users online (~500 threads in total). But now, even with 80 users online (~200 threads), after 10 minutes or a few hours my application somehow gets broken with this OutOfMemoryError. In that case I restart my application, and again it works only for this short period of time.
I am very curious whether the JVM acts strangely on a VPS, since other VPSes on the same physical machine also use the JVM. Is that even possible?
Is there some sort of limit set by the provider that is not visible to me?
Or is there some sort of server attack?
I should also mention that, around the time this error occurs, Munin sometimes fails to log data for about 10 minutes. Looking at the graph images, there is just white space, as if Munin were not working at all. And again, there is about 1 GB of memory free, as htop tells me at that time.
It might also be the case that I somehow introduced a bug in my application and started getting this error after an update. But even so, where do I begin debugging?
Try increasing the stack size (-Xss).
You seem to host your app on some remote VPS server. Are you sure the server, not your development box, has sufficient RAM? People very often confuse their own machine with the remote machine.
Because if Bash is running out of memory too, it is obviously a system memory issue, not an app memory issue. Post the results of free -m and ulimit -a on the remote machine to get more data.
If you suspect your provider of using some trojanized htop, free, or ulimit, you can test the real available memory with a simple C program that mallocs 70-80% of your available RAM and writes random bytes into it, in no more than 10 lines of ANSI C code. You can compile it statically on your box to avoid any crooked libc, and then transfer it with scp. That being said, I have heard rumors of VPS providers giving less than promised, but I have never encountered one.
Well, moving from a VPS to a dedicated server solved my problem.
Additionally, I found this:
https://serverfault.com/questions/168080/java-vm-problem-in-openvz
This might be exactly the case, because on the VPS I had, the value for "privvmpages" was really too low. It seems there really is some weird JVM behaviour on a VPS.
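For anyone hitting the same thing: on an OpenVZ container you can check those limits yourself. A small sketch that prints the privvmpages line from /proc/user_beancounters (the file is OpenVZ-specific and may require root to read):

import java.io.BufferedReader;
import java.io.FileReader;

public class BeanCounters {
    public static void main(String[] args) throws Exception {
        // OpenVZ-specific file; the columns are roughly: resource held maxheld barrier limit failcnt
        BufferedReader reader = new BufferedReader(new FileReader("/proc/user_beancounters"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // privvmpages is the counter that caps virtual memory allocations in the container
                if (line.contains("privvmpages")) {
                    System.out.println(line.trim());
                }
            }
        } finally {
            reader.close();
        }
    }
}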
As I already wrote in the comments, at some point even other programs (ls, top, htop, less) were not able to start, although enough memory was available/free.
And the provider really did make some changes to their system.
And also, thank you everyone for the very fast replies and for helping me solve this mystery.
You should try the JRockit VM; it works perfectly on my OpenVZ VPS and consumes much less memory than the Sun/Oracle JVM.
I'm running a Java application that is supposed to process a batch of files through some decisioning module. I need to process the batch for about 50 hours. The problem I'm facing is that the process runs fine for about an hour and then starts to idle. So I did this: I run the JVM for one hour and then shut it down, restarting the JVM after 30 minutes, but for some reason the second run takes almost 4-5 hours to do what the first run does in 1 hour. Any help or insights would be greatly appreciated.
I am running this on a 64-bit Windows R2 server with 2 Intel quad-core processors (2.53 GHz) and 24 GB RAM. The Java version is 1.6.0_22 (64-bit); memory allotted to the application is heap (16 GB) and PermGen (2 GB).
The external module is also running on a JVM, and I am shutting that down too, but I have a feeling that it is holding on to memory even after shutdown. Before I start the JVM, RAM usage is 1 GB; after I end it, it tends to stay at about 3 GB. Is there any way I can ask Java to forcibly release that memory?
Are you sure the JVM you are trying to close is indeed closed?
Once a process ends, all of the RAM it had allocated is released. There's no way for a process to hang on to it once it closes, which also means there's no way for you to tell it to do so. This is handled by the operating system.
Frankly, this sounds like the JVM is still running, or something else is eating the RAM. Also, it sounds like you're trying to work around a vicious bug instead of hunting it down and killing it?
I suspect the JVM isn't exiting at all. I see this symptom regularly with a JBoss instance that grinds to a halt with OutOfMemoryExceptions. When I kill the JVM (via a SIGTERM/SIGQUIT), the JVM doesn't exit in a timely fashion since it's consuming all its resources throwing/handling OOM exceptions.
Check your logs and process table carefully to ensure the JVM quits properly. At that point the OS will clear all resources related to that process (including the consumed memory)
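If the external module is started from your own Java code, one way to be sure it is really gone before you restart anything is to hold on to the Process handle and wait for it. A rough sketch (the command line is made up):

public class ModuleRunner {
    public static void main(String[] args) throws Exception {
        // example command line for the external module; adjust to the real one
        ProcessBuilder builder = new ProcessBuilder("java", "-jar", "decisioning-module.jar");
        builder.redirectErrorStream(true);
        Process module = builder.start();
        // ... run the batch; in real code the module's output stream should also be drained ...
        module.destroy();                // ask the OS to terminate the child JVM
        int exitCode = module.waitFor(); // blocks until the process has really exited and its memory is freed
        System.out.println("Module exited with code " + exitCode);
    }
}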
I've noticed something in the process.
After I shut down the JVM, if I delete all the files that I wrote to the file system, the RAM usage comes back to the original 1 GB.
Does this lead to anything, and can I possibly do something about it?
Out of interest: Have you tried splitting up the process so that it can run in parallel?
A 50-hour job is a big job! You have a decent machine there (8 cores and 24 GB RAM), and you should be able to parallelise some parts of it.
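A rough sketch of how the batch could be fanned out over the available cores with an ExecutorService (the input directory, pool size and processFile method are stand-ins for the real decisioning step):

import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelBatch {
    public static void main(String[] args) throws InterruptedException {
        File[] batch = new File("/data/batch").listFiles();     // example input directory (assumed to exist)
        ExecutorService pool = Executors.newFixedThreadPool(8); // one worker per core
        for (final File file : batch) {
            pool.submit(new Runnable() {
                public void run() {
                    processFile(file); // stand-in for the real per-file decisioning call
                }
            });
        }
        pool.shutdown();                            // accept no new tasks, finish what was submitted
        pool.awaitTermination(50, TimeUnit.HOURS);  // generous upper bound for the whole batch
    }

    private static void processFile(File file) {
        // placeholder for the actual per-file work
    }
}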
I'm having problems with jetty crashing intermittently, I'm using Jetty 6.1.24.
I'm running a neo4j Spring MVC webapp; Jetty will stay running for approximately 1 hour and then I have to restart it. It is running on a small Amazon EC2 instance, Debian with 1.7 GB of RAM.
I start Jetty using java -Xmx900m -server -jar start.jar
I am connecting to the server using PuTTY; when Jetty crashes the PuTTY session disconnects, so I cannot see what error caused it to crash.
I would like to be able to see if it is an error generated by Spring, but I'm not sure how to log the output from the Spring app with Jetty. Or is it Jetty itself, or a memory issue? What would be the best way to monitor Jetty? I cannot recreate this on my local machine running Windows. What do you think would be the best way to approach this? Thanks
This isn't really a programmer question; perhaps it'll be moved over to ServerFault.
You didn't specifically state which operating system you're using, but I'm hazarding a guess at some Linux distribution. You have two options for figuring out what's wrong:
Start your session in screen. Screen will live for as long as the actual machine is powered on, until you reboot the operating system (or you exit screen).
You start screen like this:
screen
and you get a new prompt where you can start your program (cd foo, jetty, etc.). When you're happy and you just need to go somewhere, you can disconnect the screen by hitting CTRL+A and then CTRL+D. You'll drop back to the place you were before you invoked screen.
To get back to the screen you type screen -R, which means resume an existing screen. You should see Jetty again.
The nice thing is that if you lose the connection (or you close PuTTY by accident or whatever), you can use screen -list to get a list of running screens, then forcibly detach them with -D and reattach them to the current PuTTY session with -R, no harm done!
Use nohup. Nohup more or less detaches the process you're running from the console, so none of its output comes to the terminal. You start your program in the normal fashion, but you add the word nohup to your command.
For example:
nohup ls -l &
After ls -l is complete, your output is stored in nohup.out.
When you say crash, do you mean the JVM segfaults and disappears? If that's the case, I'd check and make sure you aren't exhausting the machine's available memory. Java on Linux will crash when the system memory gets so low that the JVM cannot allocate up to its maximum memory. For example, you've set the max JVM memory to 500 MB, of which it's using 250 MB at the moment; however, the Linux OS only has 128 MB available. This produces unstable results and the JVM will segfault.
On Windows the JVM is better behaved in this scenario and throws an OutOfMemoryError when the system is running low on memory.
Validate how much system memory is available around the time of your crashes.
Verify if other processes on your box are eating up a lot of memory. Turn off anything that could be competing with the JVM.
Run jconsole and connect it to your JVM. That will tell you how memory is being used in your JVM process and give you a history to look back through when it does crash.
Eliminate any native code you might be loading into the JVM when doing this type of testing.
I believe Jetty has some native code to do high volume request processing. Make sure that's not being used. You want to isolate the crashes to Java and NOT some strange native lib. If you take out the native stuff and find it works then you have your answer as to what's causing it. If it continues to crash then it very well could be what I'm describing.
You can force the JVM to allocate all the memory at startup with -Xms900m; that can make sure the JVM doesn't fight with other processes for memory. Once it has the full Xmx amount allocated, it won't crash. Not a solution, but you can easily test it this way.
When you start java, redirect both outputs (stdout and stderr) to a file:
Using Bash:
java -Xmx900m -server -jar start.jar > stdout.txt 2> stderr.txt
After the crash, inspect those files.
If the crash is due to a signal (like SEGV, a segmentation fault), there should be a file dumped by the JVM in the directory where you started java. For the Sun VM (HotSpot), it's something like hs_err_pid12121.log (here 12121 is the process ID).
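On the application side you can also make sure nothing dies silently by installing a default uncaught-exception handler early during startup; a small sketch (the log file path is arbitrary, and this only catches errors thrown inside the JVM, not a process killed by the OS):

import java.io.FileWriter;
import java.io.PrintWriter;

public class FatalErrorLogger {
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(Thread t, Throwable e) {
                try {
                    // append the failure to a file so it survives the dropped PuTTY session
                    PrintWriter log = new PrintWriter(new FileWriter("/var/log/myapp/fatal.log", true));
                    log.println("Uncaught error in thread " + t.getName());
                    e.printStackTrace(log);
                    log.close();
                } catch (Exception ignored) {
                    // nothing sensible left to do if even logging fails
                }
            }
        });
    }
}

Calling install() once at startup is enough; it at least tells you whether the Spring app itself blew up, as opposed to the whole JVM disappearing.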
PuTTY disconnecting STRONGLY hints that the server is running out of memory and is shutting down processes left and right. It is probably your Jetty instance growing too big.
The easiest thing to do now is to add 1-2 GB more swap space and try again. Also note that you can use jvisualvm to attach to the Jetty instance to get runtime information directly.