We have a Java EE application (JSP/servlet, JDBC) running on an Apache Tomcat server. The response time degrades over time, and it degrades faster when the application is used continuously.
The response time is back to normal after a restart of the server.
I connected JConsole to the server and am attaching a screenshot of the heap memory usage: it goes up during intensive work, the garbage collector kicks in periodically, and the usage comes back down.
However, when testing towards the end, the response time does not come down even after triggering the garbage collector manually.
I also checked the connections and they seem to be closing properly, i.e. I do not notice any connections being left open.
Any help is appreciated.
Attach with jvisualvm in the JDK. It allows you to profile Tomcat and find where the time goes.
My guess right now is the database connections. Either they go stale or the pool runs dry.
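If the pool is the problem, a dry pool usually shows up as requests blocking on connection checkout. A toy sketch of that failure mode (a `Semaphore` stands in for the pool; all names here are hypothetical, not from the question):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of a fixed-size connection pool: every "borrow" that is never
// returned permanently shrinks the pool until callers start timing out.
public class PoolDrySketch {
    // Borrow a "connection" with a timeout; false means the pool is dry.
    static boolean tryBorrow(Semaphore pool, long timeoutMs) {
        try {
            return pool.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Semaphore pool = new Semaphore(3); // a pool of 3 "connections"

        // Simulate 3 requests that check out a connection but never close it.
        for (int i = 0; i < 3; i++) {
            pool.acquire(); // borrowed, close() never called -- the leak
        }

        // The next request waits for a free connection and times out:
        System.out.println(tryBorrow(pool, 100) ? "got a connection" : "pool exhausted");
    }
}
```

In a real app the fix is making sure every code path returns the connection, e.g. with try-with-resources around the JDBC `Connection`.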
How much slower are the response times? Have you done any profiling or logging to help identify which parts of your app are slower? It might be useful to set up a simple servlet and see if it also slows down as the other does; that would tell you whether Tomcat itself or something in your app is slowing down.
Did you fine-tune your Tomcat memory settings? Perhaps you need to increase the PermGen size a bit.
e.g. -XX:MaxPermSize=512M
You can know for sure if you get a heap dump and load it into a tool like Eclipse Memory Analyzer (MAT).
We have a Java application, deployed in CloudFoundry, that occasionally throws an OOM error, due to requests that have a large payload as result.
When this happens, CloudFoundry kills the app and restarts it.
When the application is running on a development machine (rather than in CF), the OOM does not result in a crash (but does display an "out of heap memory" message in the output); usually the request-handler thread ends and the memory that was allocated for the request is garbage-collected. The application continues to run and successfully serves more requests.
Is there a way to configure CF to avoid restarting the app on OOM?
Thanks.
The short answer is no. The platform will always kill your app when you exceed the memory limit that you've assigned to it. This is the intended behavior. You cannot bypass this because that would essentially mean that your application has no memory limit.
On a side note, I would highly recommend using the Java buildpack v4.x (latest), if you are not already. It is much better about configuring the JVM such that you get meaningful errors like JVM OOME's instead of just letting your application crash. It also dumps helpful diagnostic information when this happens that will direct you to the source of the problem.
One other side note...
the OOM does not result in a crash (but does display an "out of heap memory" message in the output); usually the request-handler thread ends and the memory that was allocated for the request is garbage-collected.
You don't want to rely on this behavior. Once an OOME happens in the JVM, all bets are off. It may be able to recover and it may put your application into a horrible and unusable state. There's no way to know because there's no way to know exactly where the OOME will strike. When you get an OOME, the best course of action is to obtain any diagnostic information that you need and restart. This is exactly what the Java buildpack (v4+) does when your app runs on CF.
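To illustrate why in-process recovery is risky (a simplified, hypothetical sketch, not CF-specific): even when one thread catches the error, shared state can be left half-updated, because the OOME can strike at any allocation:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: an OutOfMemoryError can strike mid-operation, leaving shared
// state half-updated even if the catching thread itself "recovers".
public class OomeSketch {
    static List<byte[]> sharedState = new ArrayList<>();

    static boolean handleRequest(int chunks, int chunkBytes) {
        try {
            for (int i = 0; i < chunks; i++) {
                sharedState.add(new byte[chunkBytes]); // may die partway through
            }
            return true;
        } catch (OutOfMemoryError e) {
            // "Recovered" -- but sharedState now holds a partial update, and
            // any other thread may have hit the same error anywhere at all.
            return false;
        }
    }

    public static void main(String[] args) {
        // Deliberately impossible request: allocates until the heap is gone.
        boolean ok = handleRequest(Integer.MAX_VALUE, 8 * 1024 * 1024);
        System.out.println("request ok: " + ok + ", partial items left behind: "
                + sharedState.size());
        sharedState.clear(); // let the partial allocation be collected
    }
}
```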
Hope that helps!
It's a vague question, so please feel free to ask for any specific data.
We have a Tomcat server running two web services. One is built using Spring; we use MySQL for 90% of tasks and Mongo for caching JSON (the other 10%). The other web service is written using Grails. Both services are medium-sized codebases (about 35k lines of code each).
Computation only happens when there is an HTTP request (no batch processing), with about 2000 database hits per request (I know it's humongous; we are working on it). The request rate is about 30 req/min. One particular request involves image processing, which is quite memory-expensive. No JNI anywhere.
We have found some weird behavior. Last night, I can confirm that there were no requests to the server for about 12 hours. But when I look at the memory consumption, it is very confusing:
Without any requests, the memory keeps jumping from 500 MB to 1.2 GB (a 700 MB jump is worrisome). There is no computation on the server side, as mentioned. I am not sure if it's a memory leak:
The memory usage comes down. (Things would have been way easier if the memory didn't come down.)
This behavior is reproducible with caches based on SoftReference or the like, which full GCs clear. But I am not using them anywhere (not sure if something else is using them).
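For reference, a SoftReference-based cache produces exactly this sawtooth: the value stays cached until the collector needs memory, then gets dropped. A minimal, self-contained sketch (not our code; sizes are arbitrary):

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

public class SoftCacheSketch {
    // Returns true if the softly-referenced value is still live after we
    // flood the heap. The JVM guarantees all soft references are cleared
    // before it throws OutOfMemoryError, so this should return false.
    static boolean survivesPressure(SoftReference<byte[]> ref) {
        try {
            List<byte[]> hog = new ArrayList<>();
            while (ref.get() != null) {
                hog.add(new byte[8 * 1024 * 1024]); // 8 MB chunks of pressure
            }
        } catch (OutOfMemoryError expected) {
            // soft refs are guaranteed cleared before this is thrown
        }
        return ref.get() != null;
    }

    public static void main(String[] args) {
        SoftReference<byte[]> cached =
                new SoftReference<>(new byte[16 * 1024 * 1024]); // the "cache"
        System.out.println("cache survives pressure: " + survivesPressure(cached));
    }
}
```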
What else can be the reason? Is it a cause for worry?
PS: We have had Out of Memory crashes (not errors, but JVM crashes) quite frequently very recently.
This is actually normal behavior. You're just seeing garbage collection occur.
I've developed a web application using the following tech stack:
Java
Mysql
Scala
Play Framework
DavMail integration (for calender and exchange server)
Javamail
Akka actors
For the first few days, the application runs smoothly and without lag. But after 5 days or so, the application gets really slow! And now I have no clue how to profile this, since I have huge dependencies and it's hard to reproduce this kind of thing. I have looked into the memory and everything seems okay.
Any pointers on the matter?
Try using VisualVM - you can monitor gc behaviour, memory usage, heap, threads, cpu usage etc. You can use it to connect to a remote VM.
`visualvm` is also a great tool for such purposes; you can connect to a remote JVM as well and see what's inside.
I suggest doing this:
take a snapshot of the application after it has run for a few hours, and again after 5 days
compare thread counts
compare object counts, search for increasing numbers
see if your program spends more time in particular methods on the 5th day than on the 1st
check for disk space, maybe you are running out of it
jconsole comes with the JDK and is an easy tool to spot bottlenecks. Connect it to your server, look into memory usage, GC times, take a look at how many threads are alive because it could be that the server creates many threads and they never exit.
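Alongside jconsole, the live thread count can also be sampled from code via `ThreadMXBean`; a count that only ever grows usually means threads are created per request and never exit. A minimal sketch (the "leaked" thread is a made-up example):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountSketch {
    // Current number of live threads in this JVM, including daemon threads.
    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    public static void main(String[] args) throws InterruptedException {
        int before = liveThreads();

        // Hypothetical leak: a thread that sleeps forever and never exits.
        Thread leaked = new Thread(() -> {
            try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) {}
        });
        leaked.setDaemon(true); // daemon, so this demo JVM can still exit
        leaked.start();

        Thread.sleep(100); // give the thread time to register
        System.out.println("threads before: " + before + ", after: " + liveThreads());
    }
}
```

Sampling this periodically (or watching the same number in jconsole's Threads tab) makes a leak obvious long before the server falls over.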
I agree with tulskiy. On top of that, you could also use JMeter if the investigations you make with jconsole are inconclusive.
The probable causes of the performance degradation are threads (that are created but never exit) and memory leaks: if you allocate more and more memory, you may see performance degrade well before you get an OutOfMemoryError (this happened to me a few weeks ago).
To rule out your database, you can monitor slow queries (and/or queries that are not using an index) using the slow query log
see: http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
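For reference, the slow query log can be enabled in my.cnf (MySQL 5.1 option names; the file path and threshold are example values):

```ini
slow_query_log                = 1
slow_query_log_file           = /var/log/mysql/slow.log
long_query_time               = 1   ; seconds; queries slower than this are logged
log_queries_not_using_indexes = 1
```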
I would hazard a guess that you have a missing index, and it has only become apparent as your data volumes have increased.
Yet another profiler is Yourkit.
It is commercial, but with trial period (two weeks).
Actually, I first tried VisualVM as @axel22 suggested, but our remote server was only reachable over SSH and we had problems connecting via VisualVM (not saying that it's impossible; I just surrendered after a few hours).
You might just want to try the 'play status' command, which will list web app state (threads, jobs, etc). This might give you a hint on what's going on.
So guys, in this specific case, I was running Play in development mode, which makes the compiler run every now and then.
After changing to production mode, everything was lightning fast, with no more problems. Thanks for all the help.
We've been debugging this JBoss server problem for quite a while. After about 10 hours of uptime, the server goes into 100% CPU panic attacks and just stalls. During this time you cannot run any new programs, so you can't even use kill -QUIT to get a stack trace. These 100% SYS CPU loads last 10-20 seconds and repeat every few minutes.
We have been working on this for quite a while. We suspect it has something to do with the GC, but cannot confirm it with a smaller program. We are running on i386 32-bit, RHEL5 and Java 1.5.0_10, using -client and the ParNew GC.
Here's what we have tried so far:
We limited the CPU affinity so we can actually use the server when the high load hits. With strace we see an endless loop of SIGSEGV followed by the sigreturn.
We tried to reproduce this with a Java program. It's true that SYS CPU% climbs high with WeakHashMap or when accessing null pointers. The problem was that fillInStackTrace took a lot of user CPU%, which is why we never reached 100% SYS CPU.
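A minimal sketch of that kind of repro (simplified; the class name and iteration count are made up):

```java
// Hypothetical repro sketch: hammering null-pointer dereferences. Each throw
// fills in a stack trace, which burns user CPU; the implicit null check
// (a SIGSEGV trapped by the JVM) is what can show up as SYS time.
public class NpeRepro {
    static long triggerNpes(int iterations) {
        Object target = null;
        long caught = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                target.hashCode(); // always throws NullPointerException
            } catch (NullPointerException e) {
                caught++; // fillInStackTrace dominates the user-CPU cost
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        System.out.println("NPEs caught: " + triggerNpes(100_000));
    }
}
```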
We know that after 10 hours of stress, GC goes crazy and full GC sometimes takes 5 seconds. So we assume it has something to do with memory.
jstack during that period showed all threads as blocked. pstack during that time, showed MarkSweep stack trace occasionally, so we can't be sure about this as well. Sending SIGQUIT yielded nothing: Java dumped the stack trace AFTER the SYS% load period was over.
We're now trying to reproduce this problem with a small fragment of code so we can ask Sun.
If you know what's causing it, please let us know. We're open to ideas and we are clueless, any idea is welcome :)
Thanks for your time.
Thanks to everybody for helping out.
Eventually we upgraded (only half of the Java servers) to JDK 1.6 and the problem disappeared. Just don't use 1.5.0_10 :)
We managed to reproduce these problems just by accessing null pointers (which boosts SYS instead of USR CPU, and can take down the entire Linux box).
Again, thanks to everyone.
If you're certain that GC is the problem (and it does sound like it based on your description), then adding the -XX:+HeapDumpOnOutOfMemoryError flag to your JBoss settings might help (in JBOSS_HOME/bin/run.conf).
You can read more about this flag here. It was originally added in Java 6, but was later back-ported to Java 1.5.0_07.
Basically, you will get a "dump file" if an OutOfMemoryError occurs, which you can then open in various profiling tools. We've had good luck with the Eclipse Memory Analyzer.
This won't give you any "free" answers, but if you truly have a memory leak, then this will help you find it.
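If you'd rather not wait for the OOME to fire the flag, a dump can also be triggered programmatically through the HotSpot-specific `HotSpotDiagnosticMXBean` (sketch; the output file name is just an example):

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Trigger a heap dump on demand (HotSpot-specific API), as an alternative
// to waiting for -XX:+HeapDumpOnOutOfMemoryError to fire.
public class HeapDumper {
    static void dumpHeap(String path, boolean liveObjectsOnly) throws Exception {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, liveObjectsOnly); // path should end in .hprof
    }

    public static void main(String[] args) throws Exception {
        String path = "snapshot.hprof";  // example file name; must not already exist
        dumpHeap(path, true);            // true = dump only live (reachable) objects
        System.out.println("dump written: " + new File(path).length() + " bytes");
    }
}
```

The resulting .hprof file opens directly in the Eclipse Memory Analyzer mentioned above.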
Have you tried profiling the application? There are some good profilers that can run on production servers. They should tell you whether GC is running into trouble, and with which objects.
I had a similar issue with JBoss (JBoss 4, Linux 2.6) last year. I think in the end it did turn out to be related to an application bug, but it was definitely very hard to figure out. I would keep trying to send a 'kill -3' to the process, to get some kind of stack trace and figure out what is blocking. Maybe add logging statements to see if you can figure out what is setting it off. You can use 'lsof' to figure out what files it has open; this will tell you if there is a leak of some resource other than memory.
Also, why are you running JBoss with -client instead of -server? (Not that I think it will help in this case, just a general question).
You could try adding the command line option -verbose:gc which should print GC and heap sizes out to stdout. pipe stdout to a file and see if the high cpu times line up with a major gc.
I remember having similar issues with JBoss on Windows. Periodically the CPU would go to 100%, and the Windows-reported memory usage would suddenly drop to around 2.5 MB, far less than JBoss could possibly run in, then build itself back up over a few seconds, as if the entire server had come down and restarted itself. I eventually tracked my issue down to a prepared statement cache in Apache Commons that never expired.
If it does seem to be a memory issue, then you can start taking periodic heap dumps and comparing the two, or use something like JProbe Memory profiler to track everything.
I've got a somewhat dated Java EE application running on Sun Application Server 8.1 (aka SJSAS, precursor to Glassfish). With 500+ simultaneous users the application becomes unacceptably slow and I'm trying to assist in identifying where most of the execution time is spent and what can be done to speed it up. So far, we've been experimenting and measuring with LoadRunner, the app server logs, Oracle statpack, snoop, adjusting the app server acceptor and session (worker) threads, adjusting Hibernate batch size and join fetch use, etc but after some initial gains we're struggling to improve matters more.
Ok, with that introduction to the problem, here's the real question: If you had a slow Java EE application running on a box whose CPU and memory use never went above 20% and while running with 500+ users you showed two things: 1) that requesting even static files within the same app server JVM process was exceedingly slow, and 2) that requesting a static file outside of the app server JVM process but on the same box was fast, what would you investigate?
My thoughts initially jumped to the application server threads, both acceptor and session threads, thinking that even requests for static files were being queued, waiting for an available thread, and if the CPU/memory weren't really taxed then more threads were in order. But then we upped both the acceptor and session threads substantially and there was no improvement.
Clarification Edits:
1) Static files should be served by a web server rather than an app server. I am using the fact that in our case this (unfortunately) is not the configuration so that I can see the app server performance for files that it doesn't execute -- therefore excluding any database performance costs, etc.
2) I don't think there is a proxy between the requesters and the app server but even if there was it doesn't seem to be overloaded because static files requested from the same application server machine but outside of the application's JVM instance return immediately.
3) The JVM heap size (Xmx) is set to 1GB.
Thanks for any help!
SunONE itself is a pain in the ass. I had the very same problem, and you know what? A simple redeploy of the same application to WebLogic reduced the memory and CPU consumption by about 30%.
SunONE is a reference-implementation server and shouldn't be used for production (I don't know about Glassfish).
I know this answer doesn't really help, but I've noticed considerable pauses even in very simple operations, such as getting a bean instance from a pool.
Maybe trying to deploy JBoss or WebLogic on the same machine would give you a hint?
P.S. You shouldn't serve static content from the application server (though I do it too sometimes, when CPU is abundant).
P.P.S. 500 concurrent users is quite a high load; I'd definitely put SunONE behind a caching proxy or an Apache that serves the static content.
After using a Sun performance monitoring tool we found that the garbage collector was running every couple of seconds and that only about 100 MB of the 1 GB heap was being used. So we tried adding the following JVM options and, so far, this new configuration has greatly improved performance.
-XX:+DisableExplicitGC -XX:+AggressiveHeap
See http://java.sun.com/docs/performance/appserver/AppServerPerfFaq.html
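For context, -XX:+DisableExplicitGC matters because any code, including third-party libraries buried in your dependencies, can force a full collection with a single call (trivial sketch):

```java
// Why -XX:+DisableExplicitGC can help: System.gc() requests a full,
// stop-the-world collection. Frequent calls like this, hidden inside a
// library, are a classic cause of "GC running every couple of seconds".
// With the flag set, these calls become no-ops.
public class ExplicitGcSketch {
    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.gc(); // a full-GC request, unless disabled by the flag
        }
        System.out.println("requested 3 explicit GCs");
    }
}
```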
Our lesson: don't leave JVM option tuning and garbage collection adjustments to the end. If you're having performance trouble, look at these settings early in your troubleshooting process.