I am trying to detect what causes massive spikes in our Java Struts-based web application deployed on JBoss. I have used YourKit and VisualVM to take dumps and have analysed them, but the spikes are momentary and by the time a dump is taken nothing remains.
The question is: is there a way to detect what is causing a spike at runtime?
Here are a couple of ideas:
Examine your request logs to see if there is any correlation between the spikes and either request volumes or specific request types.
Run the JVM with GC logging enabled and look for correlations.
Enable debug-level logging in your application and look for correlations. (Be cautious with this one, because turning on more application logging could change the performance characteristics.)
(On Linux / Unix) run vmstat and iostat and look for correlations with extra disc activity or swapping/paging.
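For example, run during a spike window, something like the following will show whether disk activity or paging jumps at the same time (the 5-second interval is just an illustration):

vmstat 5
iostat -x 5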
If you have a spike in the object creation rate or in the number / size of non-garbage objects, this is most likely caused by your application rather than the JVM or operating system. There is a good chance that it is due to a transient change in the nature of the application's workload; e.g. it is getting a spike in requests, or there is some unusual request that involves creating a lot of objects. Focus on the request and application logs.
As garbage collection is a likely cause of such an issue, I'd recommend enabling garbage collection logging in the JVM using these command-line options:
-Xloggc:<path and filename to log to>
-XX:+PrintGCDetails
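For example (the log path and application jar below are placeholders, not from your setup), the options might be added to the launch command like this, and you can then line the GC log timestamps up against the times of the spikes:

java -Xloggc:/var/log/myapp/gc.log -XX:+PrintGCDetails -jar myapp.jar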
Related
I have a Grails/Spring application which runs in a servlet container on a web server like Tomcat. Sometimes my app crashes because the JVM reaches its maximum allowed memory (Xmx).
The error that follows is a "java.lang.OutOfMemoryError" because the Java heap space is full.
To prevent this error I want to check from within my app how much memory is in use and how much memory the current JVM has remaining.
How can I access these parameters from within my application?
Try to understand when the OOM is thrown instead of trying to manipulate it from within the application. Also, even if you are able to capture those values from within your application, how would you prevent the error? By calling GC explicitly? Bear in mind what
the Java Virtual Machine specification says:
OutOfMemoryError: The Java virtual machine implementation has run out of either virtual or physical memory, and the automatic storage manager was unable to reclaim enough memory to satisfy an object creation request.
Therefore, GC is guaranteed to run before an OOM is thrown. Your application throws an OOME only after it has just run a full garbage collection and discovered that it still doesn't have enough free heap to proceed.
This points to either a memory leak or, more generally, an application with high memory requirements. As a rule of thumb, if the OOM is thrown within a short span of starting the application, the application usually just needs more memory; if your server runs fine for some time and then throws an OOM, it is most likely a memory leak.
To discover the memory leak, use the tools mentioned in the other answers. I use New Relic to monitor my application and check the frequency of GC runs.
PS Scavenge (aka minor GC, the parallel scavenge collector) runs on the young generation only, and PS MarkSweep (aka major GC, the parallel mark-and-sweep collector) runs on the old generation. When both run, it's considered a full GC. Minor GC runs are pretty frequent; a full GC is comparatively less frequent. Note the consumption of the different heap spaces to analyse your application.
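If you would rather watch GC activity from inside the JVM than through an external monitor like New Relic, a minimal sketch using the standard management beans looks roughly like this (the collector names "PS Scavenge" / "PS MarkSweep" only show up if you are running the parallel collectors):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    // Prints a line per collector, e.g. "PS Scavenge" (minor GC) and "PS MarkSweep" (major GC).
    public static void logGcStats() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, totalTimeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}

Calling this periodically (or exposing it via JMX) lets you see how often minor and full GCs happen and how long they take.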
You can also try the following option -
If you get OOMs too often, start Java with the right options, get a heap dump and analyse it with jhat or with the Eclipse Memory Analyzer (http://www.eclipse.org/mat/):
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path to dump file>
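Once a dump file exists (the file name below is a placeholder), jhat can serve it up for browsing:

jhat -port 7000 java_pid1234.hprof

and you can then inspect the heap contents at http://localhost:7000.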
You can try the Grails Melody plugin, which displays this info at the URL /monitoring relative to your context.
To prevent this error I want to check from within my app how much
memory is in use and how much memory the current JVM has remaining.
I don't think it is the best idea to proceed this way. It is much better to investigate what actually breaks your app and to eliminate the error or put a limit in place there. There could be many different scenarios, and your app can become unpredictable. To sum up: capturing the memory level for monitoring purposes is OK (though there are many dedicated tools for that), but in my opinion depending on these values in application logic is not recommended and is bad practice.
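For the monitoring case only, a minimal sketch of reading those numbers from within the JVM (not something to drive application logic with) is:

public class MemorySnapshot {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();         // the -Xmx ceiling the heap can grow to
        long allocated = rt.totalMemory(); // heap currently reserved by the JVM
        long free = rt.freeMemory();       // unused portion of the reserved heap
        long used = allocated - free;
        System.out.println("used MB: " + used / (1024 * 1024)
                + ", remaining MB (approx): " + (max - used) / (1024 * 1024));
    }
}

java.lang.management.MemoryMXBean gives the same information broken down into heap and non-heap usage if you need more detail.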
To do this you would use a profiler to profile your application and JVM, rather than having code to monitor such metrics inside your application.
Profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or frequency and duration of function calls
Here are some good java profilers:
http://visualvm.java.net/ (Free)
http://www.ej-technologies.com/products/jprofiler/overview.html (Paid)
Let's say I have a very large Java application that's deployed on Tomcat. Over the course of a few weeks, the server will run out of memory, application performance is degraded, and the server needs a restart.
Obviously the application has some memory leaks that need to be fixed.
My question is.. If the application were deployed to a different server, would there be any change in memory utilization?
Certainly the services offered by the application server might vary in their memory utilization, and if the server includes its own VM -- i.e., if you're using J9 or JRockit with one server and Oracle's JVM with another -- there are bound to be differences. One relevant area that does matter is class loading: some app servers behave better than others during administrative operations such as redeployment. Warm-starting the application after a configuration change can result in serious memory leaks due to class-loading problems on some server/VM combinations.
But none of these are really going to help you with an application that leaks. It's the program using the memory, not the server, so changing the server isn't going to affect much of anything.
There will probably be a slight difference in memory utilisation, but only in as much as the footprint differs between servlet containers. There is also a slight chance that you've encountered a memory leak with the container - but this is doubtful.
The most likely issue is that your application has a memory leak - in any case, the cause is more important than a quick fix - what would you do if the 'new' container just happens to last an extra week etc? Moving the problem rarely solves it...
You need to start analysing the application's heap memory to locate the source of the problem. If your application is crashing with an OOME, you can add this to the JVM arguments:
-XX:+HeapDumpOnOutOfMemoryError
If the performance is just degrading until you restart the container manually, you should get into the routine of triggering periodic heap dumps. A timeline of dumps is often the most helpful, as you can see which groups of objects just keep growing over time.
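If you want to script that routine from inside the JVM rather than calling jmap externally, a rough sketch using the HotSpot-specific diagnostic bean is below; it assumes a Sun/Oracle HotSpot JVM and a writable /tmp, and you would wire it to a Timer or whatever scheduler you prefer:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class PeriodicHeapDump {
    public static void dump() throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // live = true dumps only reachable objects, which keeps the timeline of files smaller
        bean.dumpHeap("/tmp/heap-" + System.currentTimeMillis() + ".hprof", true);
    }
}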
To do this, you'll need a heap analysis tool:
JHat or IBM Heap Analyser or whatever your preference :)
Also see this question:
Recommendations for a heap analysis tool for Java?
Update:
And this may help (for obvious reasons):
How do I analyze a .hprof file?
I have a J2EE Java application which processes SOAP requests. In our production environment (HP-UX, OC4J, Java 5) we have about 20 threads running for this process, and we sometimes see one thread pausing for ~15 seconds. So far I haven't succeeded in replicating the problem in our preproduction environment, and I'm scared of breaking things and violating SLAs if I use jconsole and associated tools on our production server.
Does anyone have any inspiration? I know about http://java.sun.com/j2se/1.5/pdf/jdk50_ts_guide.pdf but I lack the experience to dare use it straight in production (plus, the HP-UX guys threw some of these tools out of the toolbox, replacing them with HPjmeter).
Also, although this suggests a GC problem to me, I don't yet know enough to prove or disprove this theory and I am open to other suggestions.
We connect jconsole (and other tools) straight to production regularly. There is no significant overhead for us; the instrumentation is already happening within the JVM, so you'd just be connecting a remote process to read the published values. I say go for it!
Either way, you really need to see what's going on on the box. Take thread dumps, or do some internal instrumentation. By internal instrumentation, I mean recording key measures within the code and exposing those somehow. It's essentially what the JVM does (exposing them via JMX), but rolling your own gives you more specificity. For example, I frequently record request/response or other critical-path performance timings internally.
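As a sketch of what I mean by rolling your own (the class and metric names here are invented for illustration, not taken from your code), you can record critical-path timings and periodically log or expose the summary:

import java.util.concurrent.atomic.AtomicLong;

public class RequestTimings {
    private static final AtomicLong count = new AtomicLong();
    private static final AtomicLong totalNanos = new AtomicLong();
    private static final AtomicLong maxNanos = new AtomicLong();

    // Wrap the critical path:
    //   long start = System.nanoTime();
    //   ... handle the SOAP request ...
    //   RequestTimings.record(System.nanoTime() - start);
    public static void record(long elapsedNanos) {
        count.incrementAndGet();
        totalNanos.addAndGet(elapsedNanos);
        long prev;
        do {
            prev = maxNanos.get();
        } while (elapsedNanos > prev && !maxNanos.compareAndSet(prev, elapsedNanos));
    }

    // Log this once a minute (or expose it via JMX) to spot the ~15 second outliers.
    public static String summary() {
        long n = count.get();
        long avgMs = (n == 0) ? 0 : totalNanos.get() / n / 1000000L;
        return "requests=" + n + ", avgMs=" + avgMs + ", maxMs=" + maxNanos.get() / 1000000L;
    }
}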
Oh, and one more thing: you can set up your app to use an agent to provide even more information. Typically this would be to plug in a profiler (like JProfiler or YourKit), but this does usually add more overhead and isn't recommended for production.
It's also worth thinking about the cost of not getting the information you need out of the VM. For example, is the cost of not fixing the issue more or less than the cost of a small percentage drop in performance when monitoring?
More scientifically, this article has some comments. It suggests up to 7% overhead (contradicting my previous point); a previous article from 2006 suggests 3-4%, but both are highly contextual results. For example, CPU-intensive applications may or may not be affected more than IO-bound ones.
So a more appropriate answer from me (rather than just "go for it") would be to understand the impact it would have for your application in your environment through measurement. Run representative tests in an environment similar to production, with jconsole connected and disconnected, and see for yourself.
Also see this stackoverflow question.
There are a few things that you can do on HP-UX to get additional information from a running Java process. If you send the PROF signal to the JVM, it will toggle the generation of a GC log (as if you had used the -Xverbosegc command line option). Generating the GC log is very inexpensive, so you should be able to turn this on in production without affecting the performance.
If you send the USR2 signal to the JVM, it starts profiling (same as -Xeprof). If you send the signal a second time, it turns off the profiling. This will have a noticeable performance impact, though it is smaller than what you would see from an external, third-party profiler.
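In practice that just means sending the signal to the Java process id, along the lines of (the pid is a placeholder):

kill -PROF <jvm_pid>   # toggles GC logging on/off
kill -USR2 <jvm_pid>   # toggles -Xeprof profiling on/off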
You can analyze the resulting data files using HPjmeter. HPjmeter can also connect to a running JVM for real-time monitoring. With Java 5, you need to start the JVM with the -agentlib option. If you were using Java 6, you could attach to the running JVM without needing any extra command line options.
So I've been trying to track down a good way to monitor when the JVM might potentially be heading towards an OOM situation. The best way that seems to work with our app is to track back-to-back concurrent mode failures through CMS. This indicates that the tenured pool is filling up faster than it can actually be cleaned up, or it's reclaiming very little.
The JMX bean for tracking GCs has very generic information such as memory usage before/after and the like. This information has been relatively inconsistent at best. Is there a better way I can be monitoring this potential warning sign of a dying JVM?
Assuming you're using the Sun JVM, I am aware of two options:
memory management MXBeans (the API reference starts here), which you appear to be using already, though note that there are some HotSpot-specific internal ones you can get access to; see this blog for an example of how to use them, and the sketch after this list for the standard pool beans.
jstat: the command reference is here; you'll want the -gccause option. You can either write a script to launch this and parse the output or, theoretically, you could spawn a process from the host JVM (or another one) that then reads the output stream from jstat to detect the GC causes. I don't think the cause reporting is 100% comprehensive though. I don't know a way to get this info programmatically from Java code.
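To illustrate the first option, here is a rough sketch that arms a collection-usage threshold on the tenured pool, so you get a flag when the old generation is still mostly full immediately after a collection, i.e. the collector is reclaiming very little. The "Old Gen" name matching and the 90% figure are assumptions on my part; with CMS the pool is typically called "CMS Old Gen".

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class TenuredWatch {
    // Call once at startup: flag the old-gen pool when >90% is still used *after* a collection.
    public static void armThreshold() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isCollectionUsageThresholdSupported()
                    && pool.getName().contains("Old Gen")) {
                long max = pool.getUsage().getMax(); // -1 if the pool has no defined maximum
                if (max > 0) {
                    pool.setCollectionUsageThreshold((long) (max * 0.9));
                }
            }
        }
    }

    // Poll this from your monitoring; repeated 'true' results are the warning sign.
    public static boolean oldGenStillFullAfterGc() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.isCollectionUsageThresholdSupported()
                    && pool.isCollectionUsageThresholdExceeded()) {
                return true;
            }
        }
        return false;
    }
}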
With the standard JRE 1.6 GC, heap utilization can trend upwards over time, with the garbage collector running less and less frequently depending on the nature of your application and your maximum specified heap size. That said, it is hard to say what is going on without more information.
A few methods to investigate further:
You could take a heap dump of your application while it is running using jmap, and then inspect the heap using jhat to see which objects are in the heap at any given time.
You could also run your application with -XX:+HeapDumpOnOutOfMemoryError, which will automatically produce a heap dump on the first out-of-memory error that the JVM encounters.
You could create a monitoring bean specific to your application, and create accessor methods you can hit with a remote JMX client; for example, methods that return the sizes of queues and other collections that are likely places of memory utilization in your program (a minimal sketch follows).
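A minimal sketch of such a bean (the names and the queue here are invented for illustration) follows the standard MBean naming convention, where the interface name is the class name plus "MBean"; the two types would live in separate source files:

// --- WorkQueueStatsMBean.java: the management interface a JMX client sees ---
public interface WorkQueueStatsMBean {
    int getQueueSize();
}

// --- WorkQueueStats.java: the implementation, registered with the platform MBean server ---
import java.lang.management.ManagementFactory;
import java.util.Queue;
import javax.management.ObjectName;

public class WorkQueueStats implements WorkQueueStatsMBean {
    private final Queue<?> queue;

    public WorkQueueStats(Queue<?> queue) {
        this.queue = queue;
    }

    public int getQueueSize() {
        return queue.size();
    }

    // Register once at startup; JConsole/VisualVM can then read the value remotely.
    public static void register(Queue<?> queue) throws Exception {
        ManagementFactory.getPlatformMBeanServer().registerMBean(
                new WorkQueueStats(queue),
                new ObjectName("com.example.myapp:type=WorkQueueStats"));
    }
}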
HTH
What are the most likely causes for application server failure?
For example: "out of disk space" is more likely than "2 of the drives in a RAID 4 setup die simultaneously".
My particular environment is Java, so Java-specific answers are welcome, but not required.
EDIT: just to clarify, I'm looking for downtime-related crashes (out of memory is a good example), not just one-time issues (like a temporary network glitch).
If you are trying to keep an application server up, start monitoring it. Nagios, Big Sister, and other Network Monitoring tools can be very useful.
Watch memory availability / usage, disk availability / usage, CPU availability / usage, etc.
The most common reason why a server goes down is rarely the same reason twice. Someone "fixes" the last-most-common-reason, and a new-most-common-reason is born.
Edwin is right - you need monitoring to understand what the problem is. Or better - understand what the problem is AND prevent it from causing downtime.
You should not only track resource consumption but also demand. The difference between the two shows you if you have sized your server correctly.
There are a ton of open source tools like Nagios, CollectD, etc. that can give you server-specific data; that's only monitoring though, not prevention. Librato Silverline (disclosure: I work there) allows you to monitor individual processes and then throttle the resources they use by placing them in application containers for which you define resource policies.
If your server is 8 cores or less you can use it for free.
"Out of Memory" exception due to memory leaks.
All sorts of things can cause a server to crash, ranging from busted hardware (e.g. disk failures) to faulty code: a memory leak resulting in an out-of-memory error, a network failure that got rethrown as a runtime exception and was never caught, a SEGFAULT in servers that aren't Java servers, etc.
At first, it is usually memory leaks, disk space problems, or endless loops eating up the CPU.
Once you monitor those issues and set up correct logging and warning mechanisms, they turn meta on you... and exploding error handling becomes a possible reason for a full lockup: an error occurs (or more likely two, in an unhappy combination), but when the handler tries to write to the log files or send a warning (by mail or something) it gets another error, which it tries to handle by writing to the log file or sending a warning, and so on... and this continues until one of the resources gives out. It may lead to skyrocketing server load, memory problems, filling disk space, or locked-up network traffic, which means the server won't be accessible for a remote user to correct the problem, etc.