I'm debugging my app; it is supposed to run for several hours once deployed.
I left the app running and found that it crashed after 4-5 hours with an Out of Memory error.
I'm on a Mac, OSX 10.8.2.
What I'm seeing in Activity Monitor is that the process has a stable Real Memory size (around 350 MB), but its Virtual Memory size is slowly increasing.
Is it normal? Can this be the origin of my problem?
Thanks as always for your support
I'm going to answer my own question to help anyone with the same issue...
After a lot of debugging, and after breaking my app apart into little chunks, it looks like the memory leak is created by a PGraphics object ONLY when its render mode is set to P3D.
I don't know why, and the issue itself isn't solved, but having found the cause I was able to code a workaround.
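For illustration, here is a minimal Processing-style sketch of one workaround pattern of this kind (this is an assumption for illustration only; the actual workaround used above isn't shown): allocate a single P3D offscreen buffer once in setup() and reuse it every frame, instead of creating PGraphics objects repeatedly.

    // Hypothetical workaround sketch: allocate one P3D offscreen buffer in setup()
    // and reuse it every frame, instead of creating PGraphics objects repeatedly.
    PGraphics pg;  // single reusable offscreen buffer

    void setup() {
      size(800, 600, P3D);
      pg = createGraphics(400, 400, P3D);   // allocated once
    }

    void draw() {
      pg.beginDraw();
      pg.background(0);
      pg.translate(pg.width / 2, pg.height / 2);
      pg.box(100);                          // stand-in for the real 3D drawing
      pg.endDraw();
      image(pg, 0, 0);                      // blit the reused buffer to the screen
    }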
Good bet that your application is accumulating data without ever releasing it. If you're using anything dynamically allocated, like HashMaps or ArrayLists or the like, those are prime suspects. Depending on how big your code is, you may have to start reducing your codebase and monitoring memory usage over 10-minute spans to find out at what point memory no longer accumulates.
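To make the "accumulating data" pattern concrete, here is a hypothetical sketch (the class and names are made up, not taken from the app in question) of the kind of long-lived map that produces exactly this slow growth, plus a crude way to watch heap usage from inside the program:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical illustration: a long-lived map that is only ever added to.
    // Each entry is tiny, but over hours of uptime the map grows without bound.
    public class LeakExample {
        private static final Map<Long, byte[]> CACHE = new HashMap<Long, byte[]>();

        static void handleRequest(long id) {
            CACHE.put(id, new byte[1024]); // put, but never removed
        }

        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            for (long i = 0; ; i++) {
                handleRequest(i);
                if (i % 100000 == 0) {
                    long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
                    System.out.println("entries=" + i + " usedHeapMb=" + usedMb);
                }
            }
        }
    }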
The application is hitting a slowness issue and generating heapdump files; the heapdump file is 1.2 GB, and I need to run my ha456.jar with 8.4 GB of RAM just to be able to open it.
Before this, when I analyzed a heapdump, I would look for the biggest Leak Size and check the Leak Suspect value, and I could see which class or which method of my application was holding the big memory. Then I would try to fix the code so that it would run with better performance.
This time, I can't really pinpoint which module/method of my application is causing the out-of-memory issue. The following are some screenshots from my HeapAnalyzer:
For me, these are just common classes, for example java/lang/Object, java/lang/Long, or java/util/HashMap. I can't really tell which method of my application is causing the out of memory.
I'd appreciate your advice on how to analyze this.
Finding a memory leak is always very difficult, even for someone sitting in front of the code, let alone from this far away. So I can only give you some suggestions:
you have a heap dump: filter it by your own objects and analyze which ones are created in the largest numbers (a small helper for taking such a dump from inside the application is sketched after this list)
run your application and monitor it with VisualVM; use the application a little and then force a GC run... 9 times out of 10, the objects whose count does not decrease significantly, or does not reset completely, are your memory leak
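If it helps, here is a small hypothetical helper (names are illustrative) for taking the dump from inside the application itself; it relies on the HotSpot-specific com.sun.management diagnostic MXBean, so it only applies to HotSpot-based JVMs. The resulting .hprof can then be opened in VisualVM, MAT, or a heap analyzer:

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    // Hypothetical helper: write a heap dump from inside the running application.
    public class HeapDumper {
        public static void dump(String path) throws Exception {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(path, true); // true = dump only live objects (runs a GC first)
        }

        public static void main(String[] args) throws Exception {
            dump("leak-suspect.hprof");
            System.out.println("Heap dump written to leak-suspect.hprof");
        }
    }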
This may be happening because a lot of records of type Long are being read from somewhere like a database or a queue. There could be a cartesian join or something of that sort. Once I had a ton of strings causing an OOM, and the culprit was a logger accumulating logs.
A couple of thoughts-
When you get the OOM error, trace it back to the suspect method.
Get a thread dump and see which threads are active and what they are executing (a programmatic alternative to jstack is sketched below).
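For the thread-dump part, besides running jstack against the PID, a small hypothetical snippet like this (class name made up for illustration) can print one from inside the JVM:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // Hypothetical snippet: print a thread dump programmatically to see which
    // threads are alive and what they are executing when memory starts climbing.
    public class ThreadDumpUtil {
        public static void printThreadDump() {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
                System.out.print(info); // thread name, state, locks and stack frames
            }
        }

        public static void main(String[] args) {
            printThreadDump();
        }
    }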
It's a vague question, so please feel free to ask for any specific data.
We have a Tomcat server running two web services. One is built using Spring. We use MySQL for 90% of the work and Mongo for caching of JSONs (10%). The other web service is written using Grails. Both services are medium-sized codebases (about 35k lines of code each).
Computation only happens when there is an HTTP request (no batch processing), with about 2000 database hits per request (I know it's humongous; we are working on it). The request rate is about 30 req/min. For one particular request there is image processing, which is quite memory-expensive. No JNI anywhere.
We have found a weird behavior. Last night, I can confirm that there were no requests to the server for about 12 hours. But when I look at the memory consumption, it is very confusing:
Without any requests, the memory keeps jumping from 500 MB to 1.2 GB (a 700 MB jump is worrisome). There is no computation on the server side, as mentioned. I am not sure if it's a memory leak:
The memory usage comes down. (Things would have been way easier if the memory didn't come down.)
This behavior is reproducible with caches based on SoftReference or the like, together with full GCs. But I am not using them anywhere (not sure if something else is).
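For context, this is the kind of SoftReference-based cache being referred to (a hypothetical minimal example, not code from this application):

    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical minimal SoftReference cache: values stay on the heap until the
    // collector needs the space, so used memory climbs and then drops sharply after
    // a full GC even while the server is idle.
    public class SoftCache<K, V> {
        private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

        public synchronized void put(K key, V value) {
            map.put(key, new SoftReference<V>(value));
        }

        public synchronized V get(K key) {
            SoftReference<V> ref = map.get(key);
            return ref == null ? null : ref.get(); // null once the GC has reclaimed the value
        }
    }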
What else could be the reason? Is it a cause for worry?
PS: We have had Out of Memory crashes (not errors but JVM crashes) quite frequently very recently.
This is actually normal behavior. You're just seeing garbage collection occur.
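One hypothetical way to confirm that (class name below is illustrative) is to log the collector counters and check that the drops in used memory line up with collections:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Hypothetical check: print GC counts and accumulated times every 10 seconds.
    // If the periodic drops from ~1.2 GB back to ~500 MB coincide with these
    // counters increasing, it is ordinary garbage collection, not a leak.
    public class GcWatcher {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.println(gc.getName()
                            + " collections=" + gc.getCollectionCount()
                            + " timeMs=" + gc.getCollectionTime());
                }
                Thread.sleep(10000);
            }
        }
    }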
I just can't figure out why I get this error. It is not always shown, but once it appears my application refuses to accept connections (it can't create new socket threads, nor the other threads I create in my Java application; for some of them I use a ThreadPool).
top and htop show me there are ~900 MB of 2048 MB used,
and there is also enough heap memory, about 200 MB free.
cat /proc/sys/kernel/threads-max outputs:
1196032
Also, everything worked fine a few days ago; it's a multiplayer online game, and we had over 200 users online (~500 threads in total). But now, even with 80 users online (~200 threads), after 10 minutes or a few hours my application somehow gets broken with this OutOfMemoryError. In that case I restart my application, and again it works only for that short period of time.
I am very curious: could the JVM act strangely on a VPS, given that other VPSes on the same physical machine also run the JVM? Is that even possible?
Is there some sort of limit set by the provider that is not visible to me?
Or is there some sort of server attack?
I should also mention that around the time this error occurs, munin sometimes fails to log data for about 10 minutes. Looking at the graph images, there is just white space, as if munin were not working at all. And again, there is about 1 GB of memory free, as htop tells me at that time.
It might also be the case that I somehow introduced a bug in my application and started getting this error after an update I made. But even so, where do I begin debugging?
try increasing the stack size (-Xss)
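As background, if the OutOfMemoryError here is the "unable to create new native thread" variety (which the symptoms suggest, but this is an assumption), it comes from native thread/stack limits rather than the Java heap, which is why free heap doesn't help. A hypothetical probe like the one below (class name made up) shows that pattern; the fourth Thread constructor argument requests a per-thread stack size, the in-code analogue of -Xss:

    // Hypothetical probe: keep starting parked daemon threads until thread creation
    // fails. Each thread needs native stack space outside the Java heap, so this can
    // throw OutOfMemoryError even when plenty of heap is free.
    public class ThreadLimitProbe {
        public static void main(String[] args) {
            long count = 0;
            try {
                while (true) {
                    Thread t = new Thread(null, new Runnable() {
                        public void run() {
                            try {
                                Thread.sleep(Long.MAX_VALUE); // park the thread forever
                            } catch (InterruptedException ignored) {
                            }
                        }
                    }, "probe-" + count, 256 * 1024); // requested per-thread stack size
                    t.setDaemon(true);
                    t.start();
                    count++;
                }
            } catch (OutOfMemoryError e) {
                System.out.println("Thread creation failed after " + count + " threads: " + e);
            }
        }
    }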
You seem to host your app on some remote VPS server. Are you sure the server, and not your development box, has sufficient RAM? People very often confuse their own machine with the remote machine.
Because if Bash is running out of memory too, it is obviously a system memory issue, not an app memory issue. Post the results of free -m and ulimit -a on the remote machine to get more data.
If you distrust your provider to the point of suspecting a trojanized htop, free, and ulimit, you can test the real available memory with a simple C program in which you allocate 70-80% of your available RAM with malloc and assign random bytes to it, in no more than 10 lines of ANSI C code. You can compile it statically on your box to avoid any crooked libc, and then transfer it with scp. That being said, I've heard rumors of VPS providers giving less than promised, but I've never encountered one.
Well, moving from a VPS to a dedicated server solved my problem.
Additionally, I found this:
https://serverfault.com/questions/168080/java-vm-problem-in-openvz
This might be exactly the case, because on the VPS I had, the value for "privvmpages" was really too low. It seems there really is some weird JVM behaviour on a VPS.
As I already wrote in the comments, even other programs (ls, top, htop, less) were unable to start at some point, although enough memory was available/free.
And... the provider really did make some changes to their system.
And also, thank you everyone for the very fast replies and for helping me solve this mystery.
You should try the JRockit VM; it works perfectly on my OpenVZ VPS and consumes much less memory than the Sun/Oracle JVM.
I am running a third-party RMI server app providing exactly one method ("getImage()" returns an image as byte[]). The implementation of this method (getting the image via a SOAP web service) is provided by me.
The problem when running this RMI server is the high CPU consumption (measured with jvisualvm): 65% of CPU time goes into "sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run()", and in second place, with 15%, is "sun.net.www.http.KeepAliveCache.run()". The "real" work (scaling the image) comes in 4th place.
The server is running on Windows Server 2003. I guess there is something wrong with resource/connection handling? But is this an implementation problem or a Windows configuration problem?
Another observation: if CPU utilization is high, memory utilization also goes up. The question is whether this is because the GC can't do its work or because many images are waiting to be delivered. All I can say is that the memory is used for byte[].
so any ideas what to do?
thx in advance
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run() is the method that calls your remote implementation in the server, after unmarshalling the arguments and before marshalling the result. The timings probably mean that it takes more time to return the image over the wire as the RMI result than it does to scale the image.
I can only speculate that the byte array could be the image but it could be any number of things.
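For comparison, here is a bare-bones hypothetical skeleton of such a service (interface and class names invented here): the time attributed to ConnectionHandler.run() covers unmarshalling the call, invoking getImage(), and marshalling the returned byte[] back over the socket, so a large payload is billed to that frame rather than to the scaling code.

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;
    import java.rmi.server.UnicastRemoteObject;

    // Hypothetical skeleton of the one-method RMI service described above.
    interface ImageService extends Remote {
        byte[] getImage() throws RemoteException;
    }

    public class ImageServer implements ImageService {
        public byte[] getImage() throws RemoteException {
            // Placeholder: the real implementation fetches the image via SOAP and scales it.
            return new byte[1024 * 1024];
        }

        public static void main(String[] args) throws Exception {
            ImageServer server = new ImageServer();
            ImageService stub = (ImageService) UnicastRemoteObject.exportObject(server, 0);
            Registry registry = LocateRegistry.createRegistry(1099);
            registry.rebind("ImageService", stub);
            System.out.println("ImageService bound on port 1099");
        }
    }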
BTW, are you running a multi-core machine? I am also having this problem and found that VisualVM pointed at that same culprit, at around 50% CPU utilization on a quad-core Win7 machine and 100% on a single-core WinXP machine. I am running Jetty for the server.
Sorry I can't answer the real question yet, but I hope to hear a solution here, or to be able to share one soon. Since you ran into this a few months back, maybe you have already found one and can share it?
We've been debugging this JBoss server problem for quite a while. After about 10 hours of work, the server goes into 100% CPU panic attacks and just stalls. During this time you cannot run any new programs, so you can't even use kill -QUIT to get a stack trace. These high 100% SYS CPU loads last 10-20 seconds and repeat every few minutes.
We have been working on this for quite a while. We suspect it has something to do with the GC, but cannot confirm it with a smaller program. We are running on i386 32-bit, RHEL5 and Java 1.5.0_10, using -client and the ParNew GC.
Here's what we have tried so far:
We limited the CPU affinity so we can actually use the server when the high load hits. With strace we see an endless loop of SIGSEGV and then the signal return.
We tried to reproduce this with a Java program. It's true that SYS CPU% climbs high with WeakHashMap or when accessing null pointers. The problem was that fillInStackTrace took a lot of user CPU%, and that's why we never reached 100% SYS CPU.
We know that after 10 hours of stress, GC goes crazy and full GC sometimes takes 5 seconds. So we assume it has something to do with memory.
jstack during that period showed all threads as blocked. pstack during that time showed a MarkSweep stack trace occasionally, so we can't be sure about this either. Sending SIGQUIT yielded nothing: Java dumped the stack trace AFTER the SYS% load period was over.
We're now trying to reproduce this problem with a small fragment of code so we can ask Sun.
If you know what's causing it, please let us know. We're open to ideas and we are clueless, any idea is welcome :)
Thanks for your time.
Thanks to everybody for helping out.
Eventually we upgraded (only half of the Java servers) to JDK 1.6 and the problem disappeared. Just don't use 1.5.0_10 :)
We managed to reproduce these problems just by accessing null pointers (which boosts SYS instead of user CPU, and kills the entire Linux box).
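For anyone else trying to reproduce it, a minimal hypothetical loop along the lines of what's described (class name is made up): HotSpot normally implements such implicit null checks with a trapped SIGSEGV, which matches the endless SIGSEGV/sigreturn loop seen under strace.

    // Hypothetical reproduction sketch: dereference a field that is always null in a
    // tight loop. HotSpot implements such implicit null checks with a trapped SIGSEGV.
    // On the affected 1.5.0_10 setup this kind of loop reportedly drove SYS CPU
    // instead of user CPU.
    public class NullDerefLoop {
        static Object target; // never assigned, stays null

        public static void main(String[] args) {
            long faults = 0;
            while (true) {
                try {
                    target.toString(); // throws NullPointerException every iteration
                } catch (NullPointerException expected) {
                    faults++;
                    if (faults % 10000000 == 0) {
                        System.out.println("null dereferences so far: " + faults);
                    }
                }
            }
        }
    }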
Again, thanks to everyone.
If you're certain that GC is the problem (and it does sound like it based on your description), then adding the -XX:+HeapDumpOnOutOfMemoryError flag to your JBoss settings might help (in JBOSS_HOME/bin/run.conf).
You can read more about this flag here. It was originally added in Java 6, but was later back-ported to Java 1.5.0_07.
Basically, you will get a "dump file" if an OutOfMemoryError occurs, which you can then open in various profiling tools. We've had good luck with the Eclipse Memory Analyzer.
This won't give you any "free" answers, but if you truly have a memory leak, then this will help you find it.
Have you tried profiling the application? There are some good profilers that can run on production servers. They should tell you whether GC is running into trouble, and with which objects.
I had a similar issue with JBoss (JBoss 4, Linux 2.6) last year. I think in the end it did turn out to be related to an application bug, but it was definitely very hard to figure out. I would keep trying to send a 'kill -3' to the process, to get some kind of stack trace and figure out what is blocking. Maybe add logging statements to see if you can figure out what is setting it off. You can use 'lsof' to figure out what files it has open; this will tell you if there is a leak of some resource other than memory.
Also, why are you running JBoss with -client instead of -server? (Not that I think it will help in this case, just a general question).
You could try adding the command-line option -verbose:gc, which should print GC activity and heap sizes to stdout. Pipe stdout to a file and see if the high CPU times line up with a major GC.
I remember having similar issues with JBoss on Windows. Periodically the CPU would go to 100%, and the Windows-reported memory usage would suddenly drop down to around 2.5 MB, far less than JBoss could possibly run in, and after a few seconds it would build itself back up. It was as if the entire server came down and restarted itself. I eventually tracked my issue down to a prepared statement cache in Apache Commons that never expired.
If it does seem to be a memory issue, then you can start taking periodic heap dumps and comparing the two, or use something like JProbe Memory profiler to track everything.