I've asked for help on this before, here, but the issue still exists and the previously accepted answer doesn't explain it. I've also read most every article and SO thread on this topic, and most point to application leaks or modest overhead, neither of which I believe explain what I'm seeing.
I have a fairly large web service (application alone is 600MB), which when left alone grows to 5GB or more as reported by the OS. The JVM, however, is limited to 1GB (Xms, Xmx).
I've done extensive memory testing and have found no leaks whatsoever. I've also run Oracle's Java Mission Control (basically the JMX Console) and verified that actual JVM use is only about 1GB. So that means about 4GB are being consumed by Tomcat itself, native memory, or the like.
I don't think JNI or the like is to blame, as this particular installation has been mostly unused. All it's been doing is periodically checking the database for work requests and periodically monitoring its resource consumption. Also, this hasn't been a problem until recently, after years of use.
The JMX Console does report a high level of fragmentation (70%). But can that alone explain the additional memory consumption?
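(One way to break the non-heap usage down further, assuming the Java 8 installation since Native Memory Tracking needs JDK 8, would be to start Tomcat with NMT enabled and then query it with jcmd. Note that this only covers memory the JVM itself allocates, not memory malloc'd by arbitrary native libraries; the PID is a placeholder.)
-XX:NativeMemoryTracking=summary
jcmd <tomcat-pid> VM.native_memory summary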
Most importantly, though, is not so much why is this happening, but how can I fix/configure it so that it stops happening. Any suggestions?
Here are some of the details of the environment:
Windows Server 2008 R2 64-bit
Java 1.7.0_25
Tomcat 7.0.55
JvmMx, JvmMs = 1000 (1 GB)
Thread count: 60
Perm Gen: 90MB
Also seen on:
Windows 2012 R2 64-bit
Java 1.8.0_121
Related
OS: Windows Server 2008 R2 SP1
Web Server: IIS 7.5
JSP/Servlet Engine: Tomcat 5.5.28 (32-bit)
PHP: 5.4.14
Java: JRE SE 1.6.0_20 (32-bit)
Apache Isapi Connector hooks into Tomcat from IIS
PHP-Java Bridge 6.2.1
BMC AR System 7.5 Patch 6
Tomcat Initial and Max Memory: 1024 MB, 1024 MB
I am using a Java web application called AR System. After installing the PHP-Java Bridge, I started seeing java.lang.OutOfMemoryError: PermGen space errors in the Tomcat logs. (I see in Windows Task Manager that there are 6 PHP-CGI.exe processes, all similar in memory footprint, give or take 5 MB.) It would occur every other day or so, then shortened to every day, sometimes twice a day. When it happens, the application hangs and I have to restart it, so I added a Windows Task to restart Tomcat during non-peak hours to give me some cushion. I suspected a memory leak and started doing some research. Normally, Tomcat sits at around 300-350 MB. With the PHP-Java Bridge, memory jumped up significantly; in fact, the error has occurred anywhere from 450-600 MB.
I learned that the default PermGen is 64 MB and that PermGen should be set to between 1/4 and 1/3 of Tomcat's memory (sorry, I don't recall the link). Tomcat is running under Windows Services at this point, and I added the following to its properties:
-XX:+UseConcMarkSweepGC
-XX:+CMSPermGenSweepingEnabled
-XX:+CMSClassUnloadingEnabled
-XX:PermSize=128M
-XX:MaxPermSize=256M
These options enable GC on PermGen and increase its size from the default 64 MB to 128-256 MB. Memory slowly climbed all the way to 800-850 MB, but the application wasn't hanging during peak hours, although I still had Tomcat restart intentionally during non-peak hours via a Windows Task. If I remove the restart it MIGHT eventually hang, but I haven't tried it.
I still suspected a memory leak. I installed a trial version of AppDynamics to monitor the application, its memory, and run leak detection. Additionally, to use tools like VisualVM and Memory Analyzer (MAT), I disabled the Tomcat Windows service and ran Tomcat from the Windows command line, via catalina.bat. I appended Java options to the file; I made sure Tomcat memory was 1024 MB, PermGen was 128/256 MB, and ensured the PHP-Java Bridge and AppDynamics were running. As of right now, PermGen is holding at 163 MB used, and AppDynamics' Automatic Leak Detection did not detect any leaks in any Java Collections.
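(For what it's worth, PermGen usage can also be read from inside the same JVM via the memory-pool MBeans. This is only a sketch with a made-up class name, meant to be called from somewhere inside the webapp, such as a JSP or a context listener:)
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class PermGenStats {
    // Reads the local JVM's memory pools, so it must run inside the JVM being watched.
    public static String summary() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // The pool is named "Perm Gen", "PS Perm Gen" or "CMS Perm Gen" depending on the collector.
            if (pool.getName().contains("Perm Gen")) {
                long usedMb = pool.getUsage().getUsed() / (1024 * 1024);
                long maxMb = pool.getUsage().getMax() / (1024 * 1024);
                sb.append(pool.getName()).append(": ").append(usedMb)
                  .append(" MB used of ").append(maxMb).append(" MB max");
            }
        }
        return sb.toString();
    }
}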
I fired up MAT, created a heap dump and analyzed for leaks. When I ran it yesterday, it found three possible suspects:
net.sf.ehcache.Cache
net.sf.ehcache.store.DiskStore
org.apache.catalina.loader.WebappClassLoader
When I ran it today, it found 2 possible suspects:
java.util.HashMap
org.apache.jasper.servlet.JasperLoader
So, with MAT and AppDynamics, it appears that no memory leaks were detected for classes directly related to the PHP-Java Bridge JAR files. I haven't tried Plumbr because I can't find the free beta version; the free version detects leaks, but you have to pay to see them.
Again, I don't have a source link at this time, but I recall reading that Tomcat 5.x can have performance and memory leak issues. Of course, that doesn't mean everybody will have those issues, just a select number. I know Tomcat 6 and Tomcat 7 redesigned their memory management, or how they structure memory. I also spoke with someone from BMC, the maker of AR System, and they said the current version of AR System I'm using could suffer from performance and memory issues. But, again, none of this was a problem before the PHP-Java Bridge; it was only after I installed it that this PermGen memory issue started.
Since the tools above did not report any leaks, does that mean there are no leaks and the PHP-Java Bridge just needed more than 64 MB of PermGen memory? Or is there an inherent problem with my version of Tomcat, and installing the PHP-Java Bridge just broke the proverbial camel's back?
Upgrading to a newer version of AR System and Tomcat is not an option. If there is a leak, I can uninstall the PHP-Java Bridge or continue trying to find a leak and fix it.
Any help would be appreciated.
Thank you.
Update 1
With MAT, I looked at the thread overview and stacks, and you can see below that the PHP-Java Bridge contributes about 2/3 of the total heap memory of Tomcat. That's a lot of memory! I think there is a leak, I do. I can't find any information on the PHP-Java Bridge having inherent memory leak issues. But, to me, it appears that the problem is not that Tomcat is leaking. Ideas?
AppDynamics couldn't find any leaks, even when I manually added classes that were suspected in MAT. What I'm wondering is perhaps the PermGen error is a symptom of that case where the program has no leak and needs more PermGen memory allotted. It would be helpful to know if the PHP-Java Bridge is designed to eat a lot of memory, this much memory; maybe it's optimized for 64-bit, since the current setup is a 32-bit Java Web application. If I knew that this bridge needs a lot of memory, I would say OK, fine, and go from there. But it certainly appears as if there is a memory leak somewhere in the chain.
Update 2
I've been running Plumbr now for 2 hours and almost 10 minutes. I see that Tomcat memory is shooting up to 960 MB and probably will continue to climb. For those familiar with the program, the Java web application has been analyzed 3 times. So far, no leaks have been reported. If it stays this way, then the two conclusions I've arrived at are a) there are no leaks or b) there is a leak and, somehow, both AppDynamics and Plumbr missed it. If there are truly no leaks with this set of applications working together, then it must be that the Bridge uses a lot of memory and needs more PermGen memory than Tomcat's default, 64 MB -- at the very least, for 32-bit Java web applications.
Let's say I have a very large Java application that's deployed on Tomcat. Over the course of a few weeks, the server will run out of memory, application performance is degraded, and the server needs a restart.
Obviously the application has some memory leaks that need to be fixed.
My question is: if the application were deployed to a different server, would there be any change in memory utilization?
Certainly the services offered by the application server might vary in their memory utilization, and if the server includes its own unique VM -- i.e., if you're using J9 or JRockit with one server and Oracle's JVM with another -- there are bound to be differences. One relevant area that does matter is class loading: some app servers have better behavior than others with regard to administration. Warm-starting the application after a configuration change can result in serious memory leaks due to class loading problems on some server/VM combinations.
But none of these are really going to help you with an application that leaks. It's the program using the memory, not the server, so changing the server isn't going to affect much of anything.
There will probably be a slight difference in memory utilisation, but only in as much as the footprint differs between servlet containers. There is also a slight chance that you've encountered a memory leak with the container - but this is doubtful.
The most likely issue is that your application has a memory leak - in any case, the cause is more important than a quick fix - what would you do if the 'new' container just happens to last an extra week etc? Moving the problem rarely solves it...
You need to start analysing the application's heap memory to locate the source of the problem. If your application is crashing with an OOME, you can add this to the JVM arguments:
-XX:+HeapDumpOnOutOfMemoryError
If the performance is just degrading until you restart the container manually, you should get into the routine of triggering periodic heap dumps. A timeline of dumps is often the most helpful, as you can see which object stores just keep growing over time.
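For example (the PID and file paths are placeholders), a scheduled task or cron job can take a dump every few hours with jmap:
jmap -dump:live,format=b,file=/dumps/heap-1.hprof <pid>
And alongside the OOME flag above, -XX:HeapDumpPath can point the automatic dump at a directory with enough disk space:
-XX:HeapDumpPath=/dumps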
To do this, you'll need a heap analysis tool:
JHat or IBM Heap Analyser or whatever your preference :)
Also see this question:
Recommendations for a heap analysis tool for Java?
Update:
And this may help (for obvious reasons):
How do I analyze a .hprof file?
We have developed an application using Java 6 on 32-bit Windows (dual core & 3 GB RAM).
If we install it on a 64-bit Windows OS, will it perform better because of the resource advantages of 64-bit (same OS, different bitness)? The 64-bit machine has a quad-core processor and more than 4 GB of RAM. Is there any difference in the JVM between 32-bit and 64-bit?
Thank you in advance for your feedback.
Extra info
I am doing Security Information Event management Sys.(SIEM) - log management.
We have 4 important parts:
Collector - to collect logs from devices/systems
Aggregator - to aggregate the syslog into metadata for reporting
Real Time Monitoring - to display real-time analysis reports/charts and a dashboard that must update every second
GUI - a Struts2 app that runs the web GUI, log analytics, backup and other things
So far, the most CPU and memory are used by: 1. Collector, 2. Real Time Monitoring, 3. Aggregator.
Right now, on 32-bit, the collector can receive up to 2000 logs per second; beyond that it crashes with a heap memory error. So we used Tanuki Software's wrapper to automatically restart the collector service: we use Tanuki to split up the memory usage and to auto-restart once a heap error is detected.
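(For reference, this is roughly the kind of wrapper.conf setup meant here; the property names are the standard Tanuki ones, but the values are illustrative rather than our exact configuration.)
# heap given to the wrapped JVM, in MB (illustrative values)
wrapper.java.initmemory=256
wrapper.java.maxmemory=1024
# restart the service when an OutOfMemoryError appears in the console output
wrapper.filter.trigger.1=java.lang.OutOfMemoryError
wrapper.filter.action.1=RESTART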
Our objective is to increase throughput from 2000 logs per second to the maximum possible by taking advantage of 64-bit.
We let Java handle GC automatically; what matters most is that we can process more logs per second without any problems.
Switching to a 64-bit JVM doesn't guarantee any performance differences. You will, however see a huge difference in the amount of RAM that can be allocated. On 32-bit Windows, the maximum amount of RAM that could be allocated for the heap maxed out at around 1.6 GB.
If you see a lot of swapping with your application on the 32-bit machine, then switching to the 64-bit machine and adding sufficient RAM is likely to improve your performance. You might also be able to make design choices that favor faster, but more memory hungry algorithms where such choices exist.
As of this writing, you will probably not see significant difference between running your app on a 32-bit JVM and a 64-bit JVM on the exact same hardware. Eventually, support for 32-bit operating systems and JVMs will probably be discontinued, but that's a different concern than performance.
I strongly recommend you start out by profiling your app first to see where your performance hot spots are.
It's a common misconception that 64-bit automatically means better performance than 32-bit. See e.g. this JVM faq and this MS Windows 7 FAQ.
It really depends on the nature of your application and where your performance bottlenecks are.
If you have relatively un-tuned garbage collection, and your application is latency sensitive (i.e. must respond to a user request such as an http request quickly), adding more memory can actually worsen your GC pauses.
Is your application multi-threaded, as most web servers are? If so, going from 2 to 4 cores will very likely help if you don't have significant locking / contention issues.
If you look into GC tuning, you might want to try parallel GC on the 4 core cpu. This can significantly reduce GC pause times while incurring some extra overhead. For a latency sensitive app I worked on this was definitely worth it.
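For example, on a Sun/Oracle JVM of that vintage, a reasonable starting point for the 4-core box (values are illustrative and worth benchmarking against your own workload) would be:
-XX:+UseParallelGC
-XX:+UseParallelOldGC
-XX:ParallelGCThreads=4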
Please feel free to reply with more info - we could use some context on your app, its workload, in-memory working set, etc.
Tomcat 5.5.x and 6.0.x
Grails 1.6.x
Java 1.6.x
OS CentOS 5.x (64bit)
VPS server with 384 MB of memory
JAVA_OPTS : tried many combinations- including the following
export JAVA_OPTS='-Xms128M -Xmx512M -XX:MaxPermSize=1024m'
export JAVA_OPTS='-server -Xms128M -Xmx128M -XX:MaxPermSize=256M'
(As advised by http://www.grails.org/Deployment)
I have created a blank Grails application, i.e. simply by running the command grails create-app, and then WARed it
I am running Tomcat on a VPS Server
When I simply start the Tomcat server, with no apps deployed, the free memory is about 236M
and used memory is about 156M
When I deploy my "blank" application, the memory consumption spikes to 360M and finally the Tomcat instance is killed as soon as it takes up all free memory
As you have seen, my app is as light as it can be.
Not sure why the memory consumption is as high it is.
I am actually troubleshooting a real application, but have narrowed down to this scenario which is easier to share and explain.
UPDATE
I tested the same "blank" application on my local Tomcat 5.5.x on Windows and it worked fine
The memory consumption of the Java process shot from 32 MB to 107 MB. But it did not crash and it remained under acceptable limits
So the hunt for answer continues... I wonder if something is wrong about my Linux box. Not sure what though...
UPDATE 2
Also see this http://www.grails.org/Grails+Test+On+Virtual+Server
It confirms my belief that my simple-blank app should work on my configuration.
It is a false economy to try to run a long running Java-based application in the minimal possible memory. The garbage collector, and hence the application will run much more efficiently if it has plenty of regular heap memory. Give an application too little heap and it will spend too much time garbage collecting.
(This may seem a bit counter-intuitive, but trust me: the effect is predictable in theory and observable in practice.)
EDIT
In practical terms, I'd suggest the following approach:
Start by running Tomcat + Grails with as much memory as you can possibly give it so that you have something that runs. (Set the permgen size to the default ... unless you have clear evidence that Tomcat + Grails are exhausting permgen.)
Run the app for a bit to get it to a steady state and figure out what its average working set is. You should be able to figure that out from a memory profiler, or by examining the GC logging (see the flags sketched just below these steps).
Then set the Java heap size to be (say) twice the measured working set size or more. (This is the point I was trying to make above.)
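For the GC-logging route mentioned in the second step, the usual flags on Java 6 look like this (the log path is just an example):
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xloggc:/path/to/gc.log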
Actually, there is another possible cause for your problems. Even though you are telling Java to use heaps of a given size, it may be that it is unable to do this. When the JVM requests memory from the OS, there are a couple of situations where the OS will refuse.
If the machine (real or virtual) that you are running the OS on does not have any more unallocated "real" memory, and the OS's swap space is fully allocated, it will have to refuse requests for more memory.
It is also possible (though unlikely) that per-process memory limits are in force. That would cause the OS to refuse requests beyond that limit.
Finally, note that Java uses more virtual memory than can be accounted for by simply adding the stack, heap and permgen numbers together. There is the memory used by the executable + DLLs, memory used for I/O buffers, and possibly other stuff.
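To rule those OS-level causes in or out on the VPS, a few quick checks (the PID is a placeholder) are:
free -m                                # real memory and swap actually available
ulimit -a                              # per-process limits for the current shell
grep -i vm /proc/<tomcat-pid>/status   # the process's virtual and resident sizes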
384MB is pretty small. I'm running a small Grails app in a 512MB VPS at enjoyvps.net (not affiliated in any way, just a happy customer) and it's been running for months at just under 200MB. I'm running a 32-bit Linux and JDK though, no sense wasting all that memory in 64-bit pointers if you don't have access to much memory anyway.
Can you try deploying a Tomcat monitoring webapp, e.g. psiprobe, and see where the memory is being used?
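If installing another webapp isn't convenient, roughly the same picture is available from the JDK's own tools (PID is a placeholder):
jmap -heap <tomcat-pid>           # heap and PermGen configuration and current usage
jstat -gcutil <tomcat-pid> 5000   # GC / space utilisation, sampled every 5 seconds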
I've got a somewhat dated Java EE application running on Sun Application Server 8.1 (aka SJSAS, precursor to Glassfish). With 500+ simultaneous users the application becomes unacceptably slow and I'm trying to assist in identifying where most of the execution time is spent and what can be done to speed it up. So far, we've been experimenting and measuring with LoadRunner, the app server logs, Oracle statpack, snoop, adjusting the app server acceptor and session (worker) threads, adjusting Hibernate batch size and join fetch use, etc but after some initial gains we're struggling to improve matters more.
Ok, with that introduction to the problem, here's the real question: If you had a slow Java EE application running on a box whose CPU and memory use never went above 20% and while running with 500+ users you showed two things: 1) that requesting even static files within the same app server JVM process was exceedingly slow, and 2) that requesting a static file outside of the app server JVM process but on the same box was fast, what would you investigate?
My thoughts initially jumped to the application server threads, both acceptor and session threads, thinking that even requests for static files were being queued, waiting for an available thread, and if the CPU/memory weren't really taxed then more threads were in order. But then we upped both the acceptor and session threads substantially and there was no improvement.
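(One way to test the queuing theory directly, assuming the JDK in use ships the tool, is to capture a few thread dumps while a slow static-file request is in flight and see what the request threads are doing; the PID is a placeholder.)
jstack <appserver-pid> > threads-1.txt
# or send the process SIGQUIT (kill -QUIT <pid> on Unix) to dump threads to stdout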
Clarification Edits:
1) Static files should be served by a web server rather than an app server. I am using the fact that in our case this (unfortunately) is not the configuration so that I can see the app server performance for files that it doesn't execute -- therefore excluding any database performance costs, etc.
2) I don't think there is a proxy between the requesters and the app server but even if there was it doesn't seem to be overloaded because static files requested from the same application server machine but outside of the application's JVM instance return immediately.
3) The JVM heap size (Xmx) is set to 1GB.
Thanks for any help!
SunONE itself is a pain in the ass. I had the very same problem, and you know what? A simple redeploy of the same application to WebLogic reduced the memory and CPU consumption by about 30%.
SunONE is a reference implementation server, and shouldn't be used for production (don't know about Glassfish).
I know this answer doesn't really help, but I've noticed considerable pauses even in very simple operations, such as getting a bean instance from a pool.
Maybe trying to deploy JBoss or WebLogic on the same machine would give you a hint?
P.S. You shouldn't serve static content from under application server (though I do it too sometimes, when CPU is abundant).
P.P.S. 500 concurrent users is quite a high load; I'd definitely put SunONE behind a caching proxy or Apache that serves the static content.
After using a Sun performance monitoring tool, we found that the garbage collector was running every couple of seconds and that only about 100MB out of the 1GB heap was being used. So we tried adding the following JVM options and, so far, this new configuration has greatly improved performance.
-XX:+DisableExplicitGC -XX:+AggressiveHeap
See http://java.sun.com/docs/performance/appserver/AppServerPerfFaq.html
Our lesson: don't leave JVM option tuning and garbage collection adjustments to the end. If you're having performance trouble, look at these settings early in your troubleshooting process.