I am running a third-party RMI server app that provides exactly one method ("getImage()", which returns an image as byte[]). The implementation of this method (fetching the image via a SOAP web service) is provided by me.
The problem with running this RMI server is its high CPU consumption (measured with jvisualvm): 65% of CPU time goes into "sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run()", and in second place, with 15%, is "sun.net.www.http.KeepAliveCache.run()". The "real" work (scaling the image) comes in 4th place.
The server is running on Windows Server 2003. I guess there is something wrong with resource/connection handling, but is this an implementation problem or a Windows configuration problem?
Another observation: when CPU utilization is high, memory utilization also goes up. Is this because the GC can't do its work, or because many images are waiting to be delivered? All I can say is that the memory is used for byte[].
So, any ideas what to do?
Thanks in advance.
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run() is the method that calls your remote implementations in the server, after unmarshalling the arguments and before marshalling the result. The timings probably mean that it takes more time to return the image over the wire as the RMI result than it does to scale the image.
I can only speculate that the byte array could be the image but it could be any number of things.
BTW, are you running a multi-core machine? I am also having this problem and found that VisualVM pointed at that same culprit at around 50% CPU utilization on a quad-core Win7 machine and 100% on a single-core WinXP machine. I am running Jetty for the server.
Sorry I can't answer the real question yet, but hope to hear a solution here or to be able to share one here soon. Since you ran into this a few months back maybe you already found one and can share?
It's a vague question, so please feel free to ask for any specific data.
We have a Tomcat server running two web services. One is built using Spring. We use MySQL for 90% of tasks and Mongo for caching JSON (10%). The other web service is written using Grails. Both services are medium-sized codebases (about 35k lines of code each).
Computation only happens when there is an HTTP request (no batch processing), with about 2000 database hits per request (I know it's humongous; we are working on it). The request rate is about 30 req/min. One particular request involves image processing, which is quite memory-expensive. There is no JNI anywhere.
We have found a weird behavior. Last night, I can confirm that there was no request to the server for about 12 hours. But when I look at the memory consumption, it is very confusing:
Without any requests, the memory keeps jumping from 500 MB to 1.2 GB (a 700 MB jump is worrisome). There is no computation on the server side, as mentioned. I am not sure if it's a memory leak:
The memory usage comes back down. (Things would have been way easier if the memory didn't come down.)
This behavior is reproducible with caches based on SoftReference or similar, together with full GCs. But I am not using them anywhere (not sure if something else is using them).
What else could be the reason? Is it a cause for worry?
PS: We have had out-of-memory crashes (not errors, but JVM crashes) quite frequently very recently.
This is actually normal behavior. You're just seeing garbage collection occur.
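One way to check this yourself is to log the used heap at intervals: a sawtooth that climbs and then drops back is allocation followed by collection, which is not what a leak looks like. A minimal sketch, assuming it runs inside the server JVM; the class and method names are made up for illustration and are not part of Tomcat or Spring:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration only.
final class HeapSampler {
    private static final long MB = 1024L * 1024L;

    /** Returns the currently used heap in megabytes. */
    static long usedHeapMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / MB;
    }

    /** Formats one log line with used and committed heap. */
    static String sample() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return "heap used=" + heap.getUsed() / MB
                + "MB committed=" + heap.getCommitted() / MB + "MB";
    }
}
```

Calling HeapSampler.sample() from a scheduled task every minute or so and plotting the "used" value should make the pattern visible.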
I'm debugging my app, which should run for several hours at a time when deployed.
I left the app running and found it had crashed after 4-5 hours with an Out of Memory Error.
I'm on a Mac, OS X 10.8.2.
What I'm seeing in Activity Monitor is that the process has a stable Real Memory Size (around 350 MB), but its Virtual Memory Size is slowly increasing.
Is it normal? Can this be the origin of my problem?
Thanks as always for your support
I'm going to answer my own question to help anyone with the same issue...
After a lot of debugging and breaking my app apart into small chunks, it looks like my memory leak is created by the PGraphics object, and ONLY if its render mode is set to P3D.
I don't know why. The issue is not solved, but having found the problem I could code a workaround.
Good bet that your application is accumulating data without ever releasing it. If you're using anything dynamically allocating like HashMaps or ArrayLists or the like, those are prime suspects. Depending on how big your code is, you may have to start reducing your codebase and monitoring memory usage over 10 minute spans to find out at what point memory no longer accumulates.
I just can't figure out why I get this error. It is not always shown, but once it appears, my application refuses to accept connections (it can't create new socket threads, nor the other threads I create in my Java application, some of which use a ThreadPool).
top and htop show me that ~900 MB of 2048 MB is used,
and there is also enough heap memory, about 200 MB free.
cat /proc/sys/kernel/threads-max outputs:
1196032
Also, everything worked fine a few days ago. It's a multiplayer online game, and we had over 200 users online (~500 threads in total). But now, even with 80 users online (~200 threads), after anywhere from 10 minutes to a few hours my application breaks with this OutOfMemoryError. When that happens I restart my application, and again it works only for a short period of time.
I am very curious whether the JVM can act strangely on a VPS, since other VPSes on the same physical machine also use a JVM. Is that even possible?
Is there some sort of limit set by the provider that is not visible to me?
Or is there some sort of server attack?
I should also mention that around the time this error occurs, munin sometimes fails to log data for about 10 minutes. Looking at the graph images, there is just white space, as if munin were not working at all. And again, htop tells me there is about 1 GB of memory free at that time.
It might also be the case that I somehow introduced a bug in my application and started getting this error after an update. But even so, where do I begin debugging?
Try increasing the stack size (-Xss).
You seem to be hosting your app on a remote VPS. Are you sure the server, not your development box, has sufficient RAM? People very often confuse their own machine with the remote machine.
If Bash is running out of memory too, it is obviously a system memory issue, not an app memory issue. Post the results of free -m and ulimit -a on the remote machine to get more data.
If you distrust your provider and suspect trojanized htop, free and ulimit binaries, you can test the real available memory with a simple C program that allocates 70-80% of your available RAM with malloc and writes random bytes into it, in no more than 10 lines of ANSI C code. You can compile it statically on your own box to avoid any crooked libc, and then transfer it with scp. That being said, I have heard rumors of VPS providers giving less than promised but have never encountered any.
Well, moving from a VPS to a dedicated server solved my problem.
Additionally, I found this:
https://serverfault.com/questions/168080/java-vm-problem-in-openvz
This might be exactly the case, because on the VPS I had, the value for "privvmpages" was really too low. It seems there really is some weird JVM behaviour on a VPS.
As I already wrote in the comments, at some point even other programs (ls, top, htop, less) were not able to start, although enough memory was available/free.
And the provider really did make some changes to their system.
Also, thank you everyone for the very fast replies and for helping me solve this mystery.
You should try the JRockit VM. It works perfectly on my OpenVZ VPS and consumes much less memory than the Sun/Oracle JVM.
I've developed a web application using the following tech stack:
Java
MySQL
Scala
Play Framework
DavMail integration (for calendar and Exchange server)
JavaMail
Akka actors
For the first few days, the application runs smoothly and without lags. But after 5 days or so, the application gets really slow! I have no clue how to profile this, since I have huge dependencies and this kind of thing is hard to reproduce. I have looked into the memory and everything seems okay.
Any pointers on the matter?
Try using VisualVM - you can monitor gc behaviour, memory usage, heap, threads, cpu usage etc. You can use it to connect to a remote VM.
`visualvm` is also a great tool for such purposes; you can connect to a remote JVM as well and see what's inside.
I suggest doing this:
take a snapshot of the application after it has been running for a few hours and again after 5 days
compare thread counts
compare object counts, looking for increasing numbers
see if your program spends more time in particular methods on the 5th day than on the 1st one
check disk space; maybe you are running out of it
jconsole comes with the JDK and is an easy tool to spot bottlenecks. Connect it to your server, look into memory usage, GC times, take a look at how many threads are alive because it could be that the server creates many threads and they never exit.
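If attaching jconsole is awkward, the same thread numbers can be read from inside the JVM through the standard ThreadMXBean. A rough sketch; the class name and log format are invented for illustration:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical probe: logs the counters jconsole shows on its Threads tab.
final class ThreadCountProbe {
    /** Number of threads currently alive in this JVM. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** One log line with live, peak and total-started thread counts. */
    static String snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return "live=" + threads.getThreadCount()
                + " peak=" + threads.getPeakThreadCount()
                + " startedTotal=" + threads.getTotalStartedThreadCount();
    }
}
```

A "live" value that keeps growing over days while the load stays flat would point at threads that are created and never exit.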
I agree with tulskiy. On top of that, you could also use JMeter if your investigations with jconsole are inconclusive.
The probable causes of the performance degradation are threads (that are created but never exit) and memory leaks: if you allocate more and more memory, you may encounter performance degradation before you get the OutOfMemoryError (this happened to me a few weeks ago).
To rule out your database, you can monitor slow queries (and/or queries that are not using an index) using the slow query log.
see: http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
I would hazard a guess that you have a missing index, and it has only become apparent as your data volumes have increased.
Yet another profiler is YourKit.
It is commercial, but has a trial period (two weeks).
Actually, I first tried VisualVM as #axel22 suggested, but our remote server was only reachable over SSH and we had problems connecting with VisualVM (not saying that it is impossible; I just gave up after a few hours).
You might just want to try the 'play status' command, which will list web app state (threads, jobs, etc). This might give you a hint on what's going on.
So guys, in this specific case, I was running Play in developer mode, which makes the compiler run every now and then.
After changing to production mode, everything was lightning fast, with no more problems. But thanks for all the help.
We have a Java ERP-type application. Communication between server and client is via RMI. In peak hours there can be up to 250 users logged in, and about 20 of them are working at the same time. This means that about 20 threads are live at any given time in peak hours.
The server can run for hours without any problems, but then all of a sudden response times get higher and higher, up to several minutes.
We are running on Windows 2008 R2 with Sun's JDK 1.6.0_16. We have been using perfmon and Process Explorer to see what is going on. The only thing we find odd is that when the server starts to slow down, the number of handles the java.exe process has open is around 3500. I'm not saying that this is the actual problem.
I'm just curious whether there are some guidelines I should follow to pinpoint the problem. What tools should I use?
Can you access the log configuration of this application?
If you can, you should change the log level to "DEBUG". Tracing the DEBUG logs of a request could give you useful information about the contention point.
If you can't, profiler tools can help you:
VisualVM (free, and a good product)
Eclipse TPTP (free, but more complicated than VisualVM)
JProbe (not free but very powerful; it is my favorite Java profiler, but it is expensive)
If the application has been developed with JMX control points, you can plug in a JMX viewer to get information.
If you want to stress the application to trigger the problem (to verify whether it is a load problem), you can use stress tools like JMeter.
Sounds like the garbage collector cannot keep up and starts "stop-the-world" collecting for some reason.
Attach with jvisualvm (in the JDK) at startup and have a look at the collected data when the performance drops.
The problem you're describing is quite typical, but general as well. Causes can range from memory leaks and resource contention to bad GC policies and heap/PermGen space allocation. To pinpoint the exact problems with your application, you need to profile it (I am aware of tools like YourKit and JProfiler). If you profile your application wisely, only a few application cycles are needed to reveal the problems; otherwise profiling isn't very easy itself.
In a similar situation, I wrote some simple profiling code myself. Basically, I used a ThreadLocal that holds a "StopWatch" (based on a LinkedHashMap), and I then inserted code like this at various points in the application: watch.time("OperationX");
Then, after the thread finishes a task, I'd call watch.logTime(); and the class would write a log line that looks like this: [DEBUG] StopWatch time:Stuff=0, AnotherEvent=102, OperationX=150
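For anyone who wants to try this, here is a rough sketch of what such a class could look like. It is a reconstruction from the description above, not the original code: the current() accessor, the millisecond units and returning the log line as a String (instead of writing to a logger) are my assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-thread stopwatch; only time() and logTime() are named in the post.
final class StopWatch {
    private static final ThreadLocal<StopWatch> CURRENT =
            ThreadLocal.withInitial(StopWatch::new);

    // LinkedHashMap keeps labels in the order they were first recorded.
    private final Map<String, Long> timings = new LinkedHashMap<>();
    private long lastMark = System.nanoTime();

    /** Returns the stopwatch bound to the calling thread. */
    static StopWatch current() {
        return CURRENT.get();
    }

    /** Records the milliseconds elapsed since the previous mark under the given label. */
    void time(String label) {
        long now = System.nanoTime();
        timings.merge(label, (now - lastMark) / 1_000_000, Long::sum);
        lastMark = now;
    }

    /** Builds the log line for the finished task and resets the stopwatch. */
    String logTime() {
        StringBuilder sb = new StringBuilder("[DEBUG] StopWatch time:");
        String sep = "";
        for (Map.Entry<String, Long> e : timings.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = ", ";
        }
        timings.clear();
        lastMark = System.nanoTime();
        return sb.toString();
    }
}
```

Usage would be StopWatch.current().time("OperationX"); after each step, and StopWatch.current().logTime() passed to your logger when the task ends. Because the instance is thread-local, no synchronization is needed.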
After this I wrote a simple parser that generates CSV from this log (per code path). The best thing you can do is to create a histogram (easily done in Excel). Averages, medians and even modes can fool you. I highly recommend creating a histogram.
Together with this histogram, you can create line graphs using the average/median/mode (whichever represents the data best; you can determine this from the histogram).
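If you would rather not go through Excel, the bucketing itself is only a few lines. A small sketch, assuming the timings are non-negative milliseconds already parsed out of the log; the class name and the fixed bucket width are my own choices:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts samples per fixed-width bucket.
final class LatencyHistogram {
    /**
     * Returns a sorted map from each bucket's lower bound (in ms)
     * to the number of samples that fall into that bucket.
     */
    static Map<Long, Integer> bucket(long[] samplesMs, long widthMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long sample : samplesMs) {
            long lowerBound = Math.floorDiv(sample, widthMs) * widthMs;
            counts.merge(lowerBound, 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, the samples 5, 12, 18 and 150 with a width of 10 land in buckets 0, 10, 10 and 150. The single slow call stands out immediately, while the average (46.25) describes none of the four requests.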
This way, you can be 100% sure exactly what operation is taking time. If you can't determine the culprit, binary search is your friend (make the events finer-grained).
Might sound really primitive, but works. Also, if you make a library out of it, you can use it in any project. It's also cool because you can easily turn it on in production as well..
Aside from the GC that others have mentioned, try taking thread dumps every 5-10 seconds for about 30 seconds during your slowdown. It could be that DB calls, a web service, or some other dependency becomes slow. If you look at the thread dumps, you will be able to see threads which don't appear to move, and you can narrow down your culprit that way.
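If you can't easily run an external tool against the process, a dump can also be produced from inside the JVM with the standard ThreadMXBean. A minimal sketch; the class name and the output format are made up, and scheduling it every few seconds is left to you:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: renders a simple text dump of all live threads.
final class ThreadDumper {
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip lock details to keep the dump cheap.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Threads that show the same top frames in several consecutive dumps are the ones that "don't appear to move".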
From the GC standpoint, do you monitor your CPU usage during these times? If the GC is running frequently, you will see a jump in your overall CPU usage.
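To correlate CPU spikes with GC activity, you can read the collector counters through the standard GarbageCollectorMXBean and compare two readings taken a few seconds apart. A rough sketch with invented class and method names:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: sums GC activity over all collectors in this JVM.
final class GcStats {
    /** Total milliseconds spent in GC since JVM start (collectors reporting -1 are skipped). */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** One line per collector: name, collection count, accumulated time. */
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }
}
```

If totalGcMillis() grows by nearly as much as the wall-clock time between two readings, the JVM is spending most of its time collecting.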
If only this was a Solaris box, prstat would be your friend.
For acute issues like this a quick jstack <pid> should quickly point out the problem area. Probably no need to get all fancy on it.
If I had to guess, I'd say HotSpot jumped in and tightly optimised some badly written code. NetBeans grinds to a halt where it uses a WeakHashMap with newly created objects to cache file data. When optimised, the entries can be removed from the map straight after being added. Obviously, if the cache is being relied upon, much file activity follows. You probably won't see the drive light up, because it'll all be cached by the OS.
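The WeakHashMap effect is easy to show in isolation. This is only an illustrative sketch with a made-up class name, not NetBeans code. Whether the entry is actually cleared depends on when the collector runs, so no particular result is guaranteed:

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo: a cache entry whose key nobody else references.
final class WeakCacheDemo {
    /** Returns how many entries survive a GC request (0 or 1). */
    static int entriesAfterGc() {
        Map<Object, String> cache = new WeakHashMap<>();
        // The key is only weakly reachable as soon as put() returns.
        cache.put(new Object(), "file data");
        System.gc(); // a request only; the JVM may ignore it
        return cache.size();
    }
}
```

If this returns 0, the "cached" value was dropped right after being added, and every lookup would have to reload the data.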