I have a production web application (Struts, iBatis, Hibernate) running in Tomcat that hangs while serving requests after 6-7 days of uptime, but runs fine again after a thread dump is taken.
I am having a hard time figuring out why that is the case.
I was just wondering whether anyone else has ever encountered something similar.
Maybe this will help you find the cause of your problem.
I have enabled JMX on Tomcat
(set these optional VM arguments when starting Tomcat):
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=30188 (whatever port you want JMX to run on for Tomcat)
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
I then wrote a little app that monitors memory usage (via JMX) and notifies me if the memory usage goes over, say, 80%.
I would then know as soon as something starts to go wrong. Then I get a histogram of in-memory objects with jmap (see http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html for how to get that).
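A minimal sketch of that kind of monitor (the host/port has to match the JMX flags above; the "notify" part here is reduced to a println):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Polls heap usage of a remote JVM over JMX and warns above 80%.
public class HeapWatcher {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:30188/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            while (true) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                double ratio = (double) heap.getUsed() / heap.getMax();
                if (ratio > 0.80) {
                    System.out.printf("Heap at %.0f%% - time for a histogram%n", ratio * 100);
                }
                Thread.sleep(30000);
            }
        } finally {
            connector.close();
        }
    }
}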
In the end it turned out that one of my EJB QL queries was causing a huge amount of memory to be used.
Hope it helps in some way.
First of all, try to reproduce this in a test environment. You can use JMeter to stress the app. You can start Tomcat with -verbose:gc and -XX:+PrintGCDetails, which will give you more insight into what is happening while GC runs. Then, when the site is not responding, take a thread dump, and if this unblocks the site, have a look at the GC details for more info.
How do I troubleshoot/optimize CPU usage in a Spring Boot application? Are the allocated resources sufficient for an application with a total user base of around 300k? The application isn't heavy at all; it just calls third-party APIs, does the necessary checks and returns the response.
How do I identify the exact code that could be using more resources than normally required? I found out somewhere that one way is to take the process/thread IDs shown by the top command, get a thread dump, and look up the corresponding hexadecimal value of the ID that is using the most CPU. This wasn't easily achievable, as some of the suggested commands didn't work. I would appreciate any help or suggestions.
Thanks in advance.
(Screenshots: htop output during the spike, and htop when it's normal.)
The process of collecting a thread stack is no different for a Spring Boot app; before it is containerized, a Boot app is still a jar. If you suspect that it is your application that is contributing to the high CPU, then run the jar, attach a profiler to it, and trace the code contributing to the high CPU under load. If you cannot do that, take a thread dump of the running jar/java process and use any free or open-source tool to analyze the trace. The same approach applies to the containerized application as well.
Follow these steps to take a thread dump of a Java/Boot app running inside a Docker container:
docker exec -it <containerName> jstack <pid> > someFile.txt
(Inside a single-process container the Java process is often PID 1.) Take multiple snapshots for better visibility and comparison.
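If you want to narrow down the busy threads from inside the JVM rather than via top, a rough sketch using ThreadMXBean is shown below: it samples per-thread CPU time over a window and prints the heaviest threads. This assumes you run it inside the target application (for example from a temporary diagnostic endpoint); the threshold and window are arbitrary.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Samples per-thread CPU time twice and prints threads that burned
// more than half a second of CPU during the window. Run inside the
// target JVM; as a separate process it only sees its own threads.
public class CpuHogFinder {
    public static void main(String[] args) throws Exception {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = threads.getThreadCpuTime(ids[i]);
        }
        Thread.sleep(5000); // sampling window
        for (int i = 0; i < ids.length; i++) {
            long delta = threads.getThreadCpuTime(ids[i]) - before[i];
            ThreadInfo info = threads.getThreadInfo(ids[i]);
            if (info != null && delta > 500000000L) { // nanoseconds
                System.out.println(info.getThreadName() + " used "
                        + (delta / 1000000) + " ms CPU");
            }
        }
    }
}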
If you have not added the JMX options to the JVM command line, do that to begin with:
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=10000
-Dcom.sun.management.jmxremote.rmi.port=10000
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
Then, on your local machine, start "jmc" (Java Mission Control) from your JDK bin folder and connect to your Spring Boot server.
You will then be able to see all the threads and enable both CPU load and thread lock monitoring on all active threads.
Be aware, though, that the above opens up the JVM to unauthenticated access, so keep the port safe.
Next, if your JVM gets into trouble, send it a "kill -3" (SIGQUIT), which tells the JVM to print a dump of all thread stacks. For memory problems, take a heap dump instead (for example with jmap); that can then be read with Eclipse MAT in order to analyze the JVM's inner doings.
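If you'd rather trigger the heap dump from code than from the command line, here is a small sketch using the HotSpot diagnostic MXBean (HotSpot-specific; the output path is an example):

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Writes an .hprof heap dump that Eclipse MAT can open.
public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // true = dump only live (reachable) objects
        diag.dumpHeap("/tmp/app-heap.hprof", true);
    }
}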
Another way is to install Jolokia in your server, which exposes the same info over HTTP.
I have a server with 4 CPU's and 16GB of RAM.
There is a Weblogic Admin server and 2 managed servers and a Tomcat server running in this Ubuntu Machine.
The resource utilization explodes at times, which is very unusual. This has never happened before, and I think it has something to do with the Java parameters that I used.
Have a look at this:
Weblogic Cluster:
Admin Server : qaas-01
Managed Servers : qams-01, qams-02
In the image below you can see that the Java processes associated with the above are multiplying and consuming too much memory.
I figured out that this is more generic and not specific to WebLogic.
A lot of processes are behaving the same way.
In the picture below it's the Apache Tomcat and Jenkins slave processes that are replicating and consuming memory.
Can anyone help me identify the real issue?
This question is quite broad, so start by looking into why it may be happening. Also post your JVM flags and anything you changed that may be causing this.
First you need to figure out what is taking up your CPU time.
Check the WebLogic console to generate a stack trace and see what is going on. You may need to sit and watch the CPU so you can capture the trace when it spikes. You can also force a stack trace using jstack. To get a Java stack trace you may need to sudo and execute it as the user running the server; otherwise you get an OS-level thread dump, which may not be as useful. Read up on jstack.
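For example (assuming the server runs as a user called weblogic; adjust to your setup):
sudo -u weblogic jstack -l <pid> > /tmp/threads.txt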
If the above does not give enough info as to why the CPU spiked, and since this is Ubuntu, you can run:
timeout 20 strace -cvf -p {SERVER PID HERE} -o strace_digest.txt
This will run strace for 20 seconds and report on which OS calls are being made most frequently. This can give you a hint as to what is going on.
Enable and check the garbage collection log and see how often GC runs; the JVM may not have enough memory. See if there is a correlation between GC runs and the CPU spikes.
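For a HotSpot JVM up to Java 8, enabling the GC log looks something like this (the log path is an example; on Java 9+ the unified-logging form is -Xlog:gc*:file=gc.log):
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/app/gc.log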
I don't think there is a definitive way to solve a CPU spike just by looking at top, but the above is a start to get you debugging.
We have trouble with Tomcat 5.5, which stops at night on our production servers (Linux CentOS 4.8), and we have no idea why it stops...
There is nothing in Tomcat's catalina.out and nothing in any application log.
We tried different things to find why the server stops:
configure Tomcat to be able to generate a core dump
instrument the System.exit() method with Javassist to find out if the method was called
add a shutdown hook to the JVM (with Runtime.getRuntime().addShutdownHook(); see the sketch after this list)
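Such a hook is only a few lines; roughly (a generic sketch, not our exact code):

Runtime.getRuntime().addShutdownHook(new Thread() {
    @Override
    public void run() {
        // Runs on an orderly shutdown (System.exit, SIGTERM),
        // but not on a hard JVM crash or SIGKILL.
        System.err.println("Shutdown hook triggered at " + new java.util.Date());
        Thread.dumpStack();
    }
});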
None of them worked: we have no core dump, and neither the exit instrumentation nor the shutdown hook was called.
My conclusion is:
The VM is not being terminated properly but is crashing without any log.
Any idea or log to read to find why Tomcat stops?
1) Make sure you know where stderr is redirected and check whether anything got printed there.
2) Check the memory limits on Tomcat and how much free memory the system has. Review the Linux system logs under /var/log to see if anything suspicious happened during that time. For example, the kernel can kill a process (almost) without a trace in the application's own logs if the system is running low on memory.
We've run 5.5 in production for years and never had any unexplained shutdowns, FWIW.
This worked for me.
As suggested in other answers here, I checked the system logs in /var/log/messages but got permission denied. So I used the dmesg command instead and found this in the logs:
"Out of memory: Kill process 14606 (java) score 106 or sacrifice child".
In the output I also noticed that free swap was 0 K. I ran the top command to confirm the same. So somehow there was high memory usage, which caused the OS to kill my Tomcat process.
After spending hours, I finally found the reason.
ps -ef | grep tomcat showed that there were several Tomcat processes running for the same application. It seems that earlier Tomcat shutdowns had not completed successfully, and for some reason the processes were not killed even after the shutdown, which was causing the high memory usage.
So I killed all the running Tomcat processes with kill, and the swap memory was freed.
Started Tomcat again; it worked fine. :)
Tomcat 7 has an option (via the security manager) to prevent calls to System.exit or something similar: http://ci.apache.org/projects/tomcat/tomcat7/docs/security-manager-howto.html .
Maybe there's a similar option for the 5.5 version; check the documentation.
There are options to redirect the output to the same console that you use to start Tomcat. On Unix-based systems this output goes to the logs; on Windows it stays with the console if not redirected.
Most probably there is a stack overflow (StackOverflowError). This is typical Tomcat behavior when it happens, for example when you try to serialize beans with cyclic dependencies to JSON or XML without handling the cycles.
Every time I've had this issue (several times), it has been this one. All other stops are usually logged properly (OutOfMemoryError etc.).
This type of stop leaves no trace anywhere.
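As a minimal illustration of the cycle problem (plain Java, no serialization library involved; the class names are made up):

// Two beans that reference each other. Any naive recursive traversal
// (toString, reflection-based serialization without cycle handling)
// recurses forever and dies with a StackOverflowError.
class Order {
    Customer customer;
    @Override public String toString() { return "Order{customer=" + customer + "}"; }
}

class Customer {
    Order lastOrder;
    @Override public String toString() { return "Customer{lastOrder=" + lastOrder + "}"; }
}

public class CycleDemo {
    public static void main(String[] args) {
        Order o = new Order();
        Customer c = new Customer();
        o.customer = c;
        c.lastOrder = o;
        System.out.println(o); // throws java.lang.StackOverflowError
    }
}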
Sometimes when I redeploy a war too many times, JBoss gives a java.lang.OutOfMemoryError: PermGen space error. Is it possible to monitor JBoss with another Java program that is not running inside JBoss, to make sure it has not run out of memory, and if it has, automatically restart JBoss?
I would expect that you can monitor the memory consumption via JMX and the MemoryMXBean. You can do this interactively via JConsole, or code up a simple monitor to do this automatically.
Here are some details on how to do this in-process, but you can do it remotely as well. See the JMX docs for more info.
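As a hedged sketch of such an in-process monitor, the snippet below sets a usage threshold on the permanent generation pool and listens for the threshold-exceeded notification; the "Perm" name match and the 90% threshold are assumptions, and pool names vary between JVMs:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

public class PermGenWatcher {
    public static void install() {
        // Set a 90% usage threshold on the permanent generation pool(s).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Perm")
                    && pool.isUsageThresholdSupported()
                    && pool.getUsage().getMax() > 0) {
                pool.setUsageThreshold((long) (pool.getUsage().getMax() * 0.9));
            }
        }
        // The platform MemoryMXBean emits a notification when a threshold is crossed.
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener(new NotificationListener() {
            public void handleNotification(Notification n, Object handback) {
                if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
                    System.err.println("PermGen above threshold - schedule a restart");
                }
            }
        }, null, null);
    }
}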
Alternatively, you can run the process under the Java Service Wrapper and have it shut down/restart the process depending on messages coming out of stdout/stderr. That may be a simple way to perform your restart automatically. However, I'd prefer the JMX solution in the long term, so you can get advance warning of issues (and perhaps tie them to their underlying cause).
I would suggest HypericHQ. It's a very good standalone application that can monitor your JBoss instances, alert you when PermGen or heap space runs low, and even trigger a restart if required. It's a complex beast, but worth the investment.
I am using Rational Application Developer v7.0, which ships with an integrated test environment. When I debug my web app, the server startup time in debug mode is close to 5-6 minutes, enough time to take a coffee break!
At times it so pisses me off that I start cursing IBM for building an operating system instead of an app server: it spawns 20+ processes and useless services, with no documented configuration for tuning it to start any faster.
I am sure there are many Java developers out there who would agree with me on this. I tried disabling the default apps and a set of services via the admin console, but that hasn't helped much.
I have no web services, no enterprise beans, no queues, just a simple web app that needs a connection pool. Have you done something in the past to make your integrated test environment start fast in debug mode and consume less RAM?
UPDATE:
I tried disabling a few services (internationalization, default apps, etc.), and the WebSphere server went from bad to worse. Not only does it still take a horrifying time to start up, it now keeps freezing every now and then for up to 2 minutes. :-( It seems optimization is not always such a good thing!
The best way to debug server code is to use remote debugging.
First you need to add the following to the JVM params in the server start script:
-Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005
This will cause the JVM to listen on the specified port; from your IDE you can then start a remote debug session against that port and debug as if the code were running in the same process.
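On Java 5 and later JVMs the same thing is usually written with the newer agentlib form (the port number is just an example):
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005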
Working this way means you don't have to restart the server so frequently, and hence side-steps your problem with WebSphere's start-up time.
You can get some odd results if the binaries on the server and the source in the IDE get out of sync, but on the whole that's not a problem.
One of the main reasons is that you have a large application with many modules, classes, manifests, XML descriptors and so on, combined with the fact that the WebSphere Application Server startup process is essentially single-threaded (applications may only be started in separate threads if they have equal startup weight). Another reason is that the Eclipse EMF and JST frameworks are very I/O intensive during startup and publish/deploy.
Yet another reason for the tedious start-up is the annotation scanning that occurs during publish/deploy. This annotation scanning can be controlled and tuned in various ways; have a look at this site:
http://wasdynacache.blogspot.se/2012/05/how-to-speed-up-annotation-processing.html
First of all, examine and evaluate your hardware: CPU, memory and disk. Is your processor running at 100% for a long time during start-up? If so, the processor may be too weak. Is paging occurring? Then you may have to put in some more RAM. The WebSphere/Eclipse JST and EMF frameworks are very I/O intensive, so you should consider investing in an SSD. You should also make sure that other processes on your machine (virus protection software, etc.) don't steal hardware resources from the WebSphere Java processes.
So for the hardware:
1. Processor - a reasonably fast one; since publish and startup are mostly single-threaded, you do not need that many CPU cores.
2. Memory - you will need at least 512 MB of physical RAM; this depends on the size of your application, of course.
3. Storage - I would definitely go for a fast SSD, since the underlying Eclipse framework is I/O intensive.
Here are some tricks to reduce the footprint of the start-up phase. Before applying these settings, make sure you record a baseline start-up so that you can observe the difference, i.e. the reduced start-up time.
JVM args: -Xverify:none -Xquickstart -Xnoclassgc -XX:+UseNUMA -XtlhPrefetch -Xgcthreads4 (I have 4 virtual processors on my machine)
Extend the heap size to match the demands of your application.
Disable the autostart of the application to reduce publish time.
Disable PMI and unnecessary tracing.
Profile your application during startup and fix any bottlenecks you find.
Other JVM arguments that may gain performance:
com.ibm.cacheLocalHost=true
com.ibm.ws.classloader.zipFileCacheSize=512
com.ibm.ws.classloader.resourceRequestCacheSize=1024
com.ibm.ws.management.event.pull_notification_timeout=20000
com.ibm.ws.amm.scan.context.filter.packages=true
org.eclipse.jst.j2ee.commonarchivecore.disableZip=true
JVM arguments that will make the WebSphere Application Server stop immediately:
com.ibm.ejs.sm.server.quiesceTimeout=0
com.ibm.ejs.sm.server.quiesceInactiveRequestTime=1000
Webcontainer properties:
com.ibm.wsspi.jsp.disableTldSearch=true
com.ibm.wsspi.jsp.disableResourceInjection=true
JVM arguments that may be specified in eclipse.ini (note that the heap parameters are configured for the conditions of my environment):
-Dcom.ibm.ws.management.event.max_polling_interval=5000
-Xquickstart
-Xverify:none
-Xmxcl25000
-Xjit:dataTotal=65536
-Xcodecache64m
-Xscmx48m
-Xnolinenumbers
-Xmnx64m
-Xmx1446m
-XX:+UseCompressedOops
-XX:+UseNUMA
5 to 6 minutes is not normal. I use RAD and WAS every day and get decent startup times. Which version of WAS are you running and how much RAM do you have?
If you share several workspaces and projects with the same WAS profile, consider creating a new WAS profile for your workspace.
You have probably tried this already, but here's a simple checklist of things to try first. Make sure that your server settings in RAD have the following options enabled:
Optimize server for testing and developing
Run server with resources on the workspace
Minimize application files copied to the server
Uncheck "Enable universal test client" if you don't need it.
In the admin console you can verify some server settings such as
Run in development mode
Parallel start
Start components as needed
You can also uninstall the ivt app that is installed by default when a new WAS profile is created. Then the usual things, such as a drive that is not too fragmented and a page file size that is properly set.
And one last thing that you probably know already: republish to your server instead of restarting it.
That's one reason why Spring was born.
You don't even have to give up all the niceties like JMS, remoting, etc. You'd be better off with Tomcat, ActiveMQ, and OpenEJB.
Anything but WebSphere.
There are some hints and tips for tuning RAD 6 on developerWorks that may help; many of these also apply to RAD 7.
I have seen a similar list for RAD 7, I'll post it if I can find it.
I did find some tuning tips for Portal on RAD 7.
I would say my experience with the test environment has been suboptimal. I now tend to use Tomcat/Pluto configured for remote debugging, with an external launch configuration to manage it from within bare Eclipse, and rely on having appropriate JNDI configurations to abstract the underlying server.
If you are coding to the relevant APIs, it shouldn't matter for development purposes that you're not on WebSphere. If you do have a WebSphere-specific issue you can always crank up the beast to debug it.
If you have no EJBs, no JMS, etc., just deploy under a standalone servlet container such as Tomcat or Jetty; you'll be amazed how fast it is :-). I'm being ironic here, but it's true!
If the connection pool really is the only app-server feature you use, then why not simply use Apache Commons DBCP (http://commons.apache.org/dbcp/), drop webfear altogether, and use Jetty instead? That should reduce your startup time to about 5 seconds. You can then easily switch back to WebSphere later for your production environment if you really feel the need to.
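A minimal sketch of such a pool with Commons DBCP 1.x (the driver class, URL and credentials below are placeholders):

import javax.sql.DataSource;
import org.apache.commons.dbcp.BasicDataSource;

// Standalone connection pool, no app server required.
public class PoolFactory {
    public static DataSource createPool() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("org.hsqldb.jdbcDriver"); // placeholder driver
        ds.setUrl("jdbc:hsqldb:mem:testdb");            // placeholder URL
        ds.setUsername("sa");
        ds.setPassword("");
        ds.setMaxActive(10); // DBCP 1.x name; DBCP 2 calls this maxTotal
        return ds;
    }
}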
WAS V7 addresses some of these problems by allowing you to configure what gets started when the app server starts up.
So if and when you migrate to WAS V7 you might see some improvements in this area.