Profiling WebSphere with hprof for CPU SAMPLES output - java

I'm trying to profile WebSphere using hprof over IBM stack (J9 JVM on AIX / Linux). Specifically, I'm interested in the CPU samples output from hprof, and particularly, the startup time (from the time WS is started until it is "ready for business").
The problem is, I can't get the CPU samples output in the hprof result file.
I'm using the following JVM argument to configure hprof: -Xrunhprof:cpu=samples,file=path-to-hprof.txt, which dumps the hprof output in ASCII format. According to the generated hprof output, the CPU SAMPLES output is only generated at program exit:
HEAP DUMP, SITES, CPU SAMPLES|TIME and MONITOR DUMP|TIME records are generated
at program exit.
So, to shut down WebSphere gracefully after it has successfully started, I'm using the stopServer.sh script and expecting the CPU SAMPLES output to be present in the resulting java.hprof.txt file after shutdown completes, but it isn't.
What am I doing wrong? Is there a better method for using hprof with WebSphere and generating CPU profiling output? Any help will be much appreciated!
Edit: I'm running WebSphere version 8.0.0.11 over IBM J9 VM (build 2.6, JRE 1.6.0 20150619_253846) on RHEL 7.5.
P.S.: I also looked for a way to close WS from the management console GUI, but couldn't find one.
P.P.S.: In the meantime I'm using the very nice jvmtop tool with the --profile <pid> option, but that provides only partial insight and, as opposed to hprof, has to be attached on the fly, so some parts of the execution are lost.

Thanks to @kgibm's helpful hints, I realized I was on the right track, and went back the next day to try again. Surprisingly, this time it worked! The hprof file was generated with the expected WebSphere CPU samples output.
I kept experimenting to figure out what I got wrong in the first place. Here's what I think has happened:
At first, I had a couple of native agents specified in the WebSphere JVM arguments. The combination of these agents made WS run much slower. When I killed WS, a few seconds passed between the Server server1 stop completed message being printed and hprof.txt being completely written. I believe I was too quick to view hprof.txt, before the CPU samples output was actually written.
Then, while troubleshooting this issue, I added the doe=n parameter to the hprof argument. doe stands for Dump On Exit and defaults to y. Only later did I realize that this was probably wrong since, as quoted above, the CPU samples output is only generated at exit.
I think that these two issues together contributed to my confusion, so when I started clean, everything was OK.
Perhaps it is worth clarifying in the hprof documentation that the doe=n option conflicts with cpu=samples, and possibly with the other options that write on exit as well (I didn't see such an indication in the docs, but it's possible I missed it).
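Based on the above, a flag combination that keeps the exit-time records intact might look like this (a sketch: the interval, depth, and file path are illustrative choices, and doe=y is simply the default made explicit):

```
# hprof generic JVM argument: sample-based CPU profiling, ASCII output.
# doe=y (Dump On Exit, the default) must NOT be overridden with doe=n,
# because the CPU SAMPLES records are only written at program exit.
-Xrunhprof:cpu=samples,interval=10,depth=10,doe=y,file=/tmp/java.hprof.txt
```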

Related

Heavy resource utilization - WebLogic server

I have a server with 4 CPU's and 16GB of RAM.
There is a WebLogic Admin server, 2 managed servers, and a Tomcat server running on this Ubuntu machine.
The resource utilization explodes at times, which is very unusual. This has never happened before, and I think it has something to do with the Java parameters that I used.
Have a look at this:
WebLogic Cluster:
Admin Server : qaas-01
Managed Servers : qams-01, qams-02
In the image below you will be able to see that the Java processes associated with the above are multiplying and consuming too much memory.
Figured out that this is more generic and not specific to Weblogic.
A lot of processes are behaving the same way.
In the picture below it's the Apache Tomcat and Jenkins slave processes that are replicating and consuming memory.
Can anyone help me identify the real issue?
This question is quite broad, so start by looking into why it may be happening. Post your JVM flags as well, and mention anything you changed that may be causing this.
First you need to figure out what is taking up your CPU time.
Check the WebLogic admin console for a way to generate a stack trace to see what is going on. You may need to sit and watch the CPU so you can capture a trace when it spikes. You can also force a stack trace using jstack. To get a Java stack trace you may need to sudo and execute it as the user running the server; otherwise you get an OS thread dump, which may not be as useful. Read about jstack.
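As a concrete sketch of the jstack step, the following takes several dumps in a row so repeatedly-hot stacks stand out when compared. The process-name pattern (weblogic.Server) and running it as the server's user are assumptions about a typical install:

```shell
#!/bin/sh
# Take several thread dumps a few seconds apart while the CPU is spiking.
# DUMP_CMD defaults to "jstack -l"; it is overridable so the sketch can be
# tried without a running JVM.
take_dumps() {
    pid=$1; count=${2:-3}; interval=${3:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        ${DUMP_CMD:-jstack -l} "$pid" > "threaddump_$i.txt"
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}

# Usage (run as the user that owns the server process):
#   take_dumps "$(pgrep -f 'weblogic.Server' | head -n 1)" 3 5
```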
If the above does not give enough info as to why the CPU spiked, and since this is Ubuntu, you can run:
timeout 20 strace -cvf -p {SERVER PID HERE} -o strace_digest.txt
This will run strace for 20 seconds and report on which OS calls are being made most frequently. This can give you a hint as to what is going on.
Enable and check the garbage collection log to see how often GC runs; the JVM may not have enough memory. See if there is a correlation between GC runs and CPU spikes.
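To enable the GC log on a HotSpot-style JVM of that era, flags along these lines can be added to the server start script (a sketch; the log path is an illustrative choice):

```
# Print each collection with timestamps so GC events can be correlated
# with the observed CPU spikes.
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/weblogic/gc.log
```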
I don't think there is a definitive way to diagnose a CPU spike just by looking at top, but the above is a start to get you debugging.

Is there a way to profile a java application without the use of an external tool?

External tools are giving me trouble. Is there a way to get simple cpu usage/time spent per function without the use of some external gui tool?
I've been trying to profile my Java program with VisualVM, but I'm having terrible, soul crushing, ambition killing results. It will only display heap usage; what I'm interested in is CPU usage, but that panel simply says Not supported for this JVM. It doesn't tell me which JVM to use, by the way. I've downloaded JDK 6 and launched it using that, and I made sure my program targets the same VM, but nothing! Still the same, unhelpful error message.
My needs are pretty simple. I just want to find out where the program is spending its time. Python has an excellent built in profiler that print out where time was spent in each function, both with per call, and total time formats. That's really the extent of what I'm looking for right now. Anyone have any suggestions?
It's not pretty, but you could use the built in hprof profiling mechanism, by adding a switch to the command line.
-Xrunhprof:cpu=times
There are many options available; see the Oracle documentation page for HPROF for more information.
So, for example, if you had an executable jar you wanted to profile, you could type:
java -Xrunhprof:cpu=times -jar Hello.jar
When the run completes, you'll have a (large) text file called "java.hprof.txt".
That file will contain a pile of interesting data, but the part you're looking for is the part which starts:
CPU TIME (ms) BEGIN (total = 500) Wed Feb 27 16:03:18 2013
rank   self  accum   count trace  method
   1  8.00%  8.00%    2000 301837 sun.nio.cs.UTF_8$Encoder.encodeArrayLoop
   2  5.40% 13.40%    2000 301863 sun.nio.cs.StreamEncoder.writeBytes
   3  4.20% 17.60%    2000 301844 sun.nio.cs.StreamEncoder.implWrite
   4  3.40% 21.00%    2000 301836 sun.nio.cs.UTF_8.updatePositions
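Since the file is plain text, the hottest methods can be pulled out with a short script once the run completes; a sketch that assumes the CPU TIME section layout shown above:

```shell
#!/bin/sh
# Print the top-N methods (self-time percentage and method name) from the
# "CPU TIME" section of an hprof text file.
hprof_top() {
    file=$1; n=${2:-5}
    awk '/^CPU TIME \(ms\) BEGIN/ { in_sec = 1; next }
         /^CPU TIME \(ms\) END/   { in_sec = 0 }
         in_sec && $1 ~ /^[0-9]+$/ { print $2, $6 }' "$file" | head -n "$n"
}

# Usage: hprof_top java.hprof.txt 10
```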
Alternatively, if you've not already done so, I would try installing the VisualVM-Extensions, VisualGC, Threads Inspector, and at least the Swing, JVM, Monitor, and Jvmstat Tracer Probes.
Go to Tools->Plugins to install them. If you need more details, comment, and I'll extend this answer further.

Profile Entire Java Program Execution in VisualVM

In Java profiling, it seems like all (free) roads nowadays lead to the VisualVM profiler included with JDK6. It looks like a fine program, and everyone touts how you can "attach it to a running process" as a major feature. The problem is, that seems to be the only way to use it on a local process. I want to be able to start my program in the profiler, and track its entire execution.
I have tried using the -Xrunjdwp option described in how to profile application startup with visualvm, but between the two transport methods (shared memory and server), neither is useful for me. VisualVM doesn't seem to have any integration with the former, and VisualVM refuses to connect to localhost or 127.0.0.1, so the latter is no good either. I also tried inserting a simple read of System.in into my program to insert a pause in execution, but in that case VisualVM blocks until the read completes, and doesn't allow you to start profiling until after execution is under way. I have also tried looking into the Eclipse plugin but the website is full of dead links and the launcher just crashes with a NullPointerException when I try to use it (this may no longer be accurate).
Coming from C, this doesn't seem like a particularly difficult task to me. Am I just missing something or is this really an impossible request? I'm open to any kinds of suggestions, including using a different (also free) profiler, and I'm not averse to the command line.
Consider using HPROF and opening the data file with a tool like HPjmeter - or just reading the resulting text file in your favorite editor.
Command used: javac -J-agentlib:hprof=heap=sites Hello.java
SITES BEGIN (ordered by live bytes) Fri Oct 22 11:52:24 2004
          percent         live       alloc'ed  stack class
 rank   self  accum    bytes objs    bytes objs trace name
    1 44.73% 44.73%  1161280 14516  1161280 14516 302032 java.util.zip.ZipEntry
    2  8.95% 53.67%   232256 14516   232256 14516 302033 com.sun.tools.javac.util.List
    3  5.06% 58.74%   131504     2   131504     2 301029 com.sun.tools.javac.util.Name[]
    4  5.05% 63.79%   131088     1   131088     1 301030 byte[]
    5  5.05% 68.84%   131072     1   131072     1 301710 byte[]
HPROF is capable of presenting CPU usage, heap allocation statistics,
and monitor contention profiles. In addition, it can also report
complete heap dumps and states of all the monitors and threads in the
Java virtual machine.
The best way to solve this problem without modifying your application, is to not use VisualVM at all. As far as other free options are concerned, you could use either Eclipse TPTP or the Netbeans profiler, or whatever comes with your IDE.
If you can modify your application to suspend its state while you set up the profiler in VisualVM, it is quite possible to do so using the VisualVM Eclipse plugin. I'm not sure why you are getting the NullPointerException, since it appears to work on my workstation. You'll need to configure the plugin by providing the path to the jvisualvm binary and the path of the JDK; this is done by visiting the VisualVM configuration dialog at Windows -> Preferences -> Run/Debug -> Launching -> VisualVM Configuration (as shown in the screenshot below).
You'll also need to configure your application to start with the VisualVM launcher, instead of the default JDT launcher.
All application launches from Eclipse will now result in VisualVM tracking the new local JVM automatically, provided that VisualVM is already running. If VisualVM is not running, the plugin will launch it, but the application will continue running in the meantime.
Inferring from the previous sentence, it is evident that having the application halt in the main() method before performing any processing is quite useful. But that is not the main reason for suspending the application. Apparently, neither VisualVM nor its Eclipse plugin allows automatically starting the CPU or memory profilers, which means these profilers have to be started manually, and hence the application must be suspended until they are.
Additionally, it is worth noting that adding the flags -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=y to the JVM startup will not help, in the case of VisualVM, to suspend the application and set up the profilers. These flags are meant for profilers that can actually connect to the open port of the JVM using the JDWP protocol. VisualVM does not use this protocol, so you would have to connect to the application using JDB or a remote debugger; but that would not resolve the problem of profiler configuration, as VisualVM (at least as of Java 6 update 26) does not allow you to configure the profilers on a suspended process: it simply does not display the Profiler tab.
This is now possible with the startup profiler plugin to VisualVM.
The advice with -Xrunjdwp is incorrect. It just enables the debugger, and with suspend=y it waits for a debugger to attach. Since VisualVM is not a debugger, it does not help you. However, inserting a System.in read or Thread.sleep() will pause the startup and allow VisualVM to attach to your application. Be sure to read Profiling with VisualVM 1 and Profiling with VisualVM 2 to better understand the profiler settings. Note also that instead of profiling, you can use the 'Sampler' tab in VisualVM, which is more suitable for profiling entire Java program execution. As others mentioned, you can also use the NetBeans Profiler, which directly supports profiling application startup.
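A minimal sketch of the pause-at-startup idea from the answers above. The system property name profiler.pause.ms and the 15-second default are illustrative choices, not standard flags:

```java
// Hold the JVM at the top of main() long enough to attach VisualVM and
// start the Sampler/Profiler before any real work begins.
public class PausedStartup {

    // Pause length, overridable with -Dprofiler.pause.ms=<millis>.
    static long pauseMillis() {
        return Long.getLong("profiler.pause.ms", 15_000L);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Attach the profiler now; continuing in "
                + pauseMillis() + " ms...");
        Thread.sleep(pauseMillis());
        // ... real application startup continues here ...
    }
}
```

Launch with, e.g., java -Dprofiler.pause.ms=30000 PausedStartup, attach VisualVM to the new process, start the Sampler, and let the pause elapse.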

My JBoss server hits 100% SYS CPU on Linux; what can cause this?

We've been debugging this JBoss server problem for quite a while. After about 10 hours under load, the server goes into 100% CPU panic attacks and just stalls. During this time you cannot run any new programs, so you can't even run kill -QUIT to get a stack trace. These 100% SYS CPU loads last 10-20 seconds and repeat every few minutes.
We suspect it has something to do with the GC, but cannot confirm it with a smaller program. We are running on 32-bit i386, RHEL5, and Java 1.5.0_10, using -client and the ParNew GC.
Here's what we have tried so far:
We limited the CPU affinity so we can actually use the server when the high load hits. With strace we see an endless loop of SIGSEGV and then the sigreturn.
We tried to reproduce this with a Java program. It's true that SYS CPU% climbs high with WeakHashMap or when accessing null pointers. The problem was that fillInStackTrace took a lot of user CPU%, and that's why we never reached 100% SYS CPU.
We know that after 10 hours of stress, GC goes crazy and full GC sometimes takes 5 seconds. So we assume it has something to do with memory.
jstack during that period showed all threads as blocked. pstack during that time occasionally showed a MarkSweep stack trace, so we can't be sure about this either. Sending SIGQUIT yielded nothing: Java dumped the stack trace AFTER the SYS% load period was over.
We're now trying to reproduce this problem with a small fragment of code so we can ask Sun.
If you know what's causing it, please let us know. We're open to ideas and we are clueless, any idea is welcome :)
Thanks for your time.
Thanks to everybody for helping out.
Eventually we upgraded (only half of the Java servers) to JDK 1.6 and the problem disappeared. Just don't use 1.5.0_10 :)
We managed to reproduce these problems just by accessing null pointers (which boosts SYS instead of USR, and brings down the entire Linux machine).
Again, thanks to everyone.
If you're certain that GC is the problem (and it does sound like it based on your description), then adding the -XX:+HeapDumpOnOutOfMemoryError flag to your JBoss settings might help (in JBOSS_HOME/bin/run.conf).
You can read more about this flag here. It was originally added in Java 6, but was later back-ported to Java 1.5.0_07.
Basically, you will get a "dump file" if an OutOfMemoryError occurs, which you can then open in various profiling tools. We've had good luck with the Eclipse Memory Analyzer.
This won't give you any "free" answers, but if you truly have a memory leak, then this will help you find it.
Have you tried profiling the application? There are some good profilers that can run on production servers. They should tell you whether GC is running into trouble, and with which objects.
I had a similar issue with JBoss (JBoss 4, Linux 2.6) last year. I think in the end it did turn out to be related to an application bug, but it was definitely very hard to figure out. I would keep trying to send a 'kill -3' to the process, to get some kind of stack trace and figure out what is blocking. Maybe add logging statements to see if you can figure out what is setting it off. You can use 'lsof' to figure out what files it has open; this will tell you if there is a leak of some resource other than memory.
Also, why are you running JBoss with -client instead of -server? (Not that I think it will help in this case, just a general question).
You could try adding the command line option -verbose:gc, which should print GC activity and heap sizes to stdout. Pipe stdout to a file and see if the high CPU times line up with a major GC.
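Once stdout is captured to a file, lining up major collections with the CPU spikes can be scripted; a sketch assuming timestamped output in the classic "<seconds>: [Full GC ...]" form (e.g. with -XX:+PrintGCTimeStamps added):

```shell
#!/bin/sh
# Print the timestamp (seconds since JVM start) of every major collection
# found in a captured GC log, so they can be compared with CPU-spike times.
full_gc_times() {
    grep -E '^[0-9]+\.[0-9]+: \[Full GC' "$1" | cut -d: -f1
}

# Usage: full_gc_times gc_stdout.log
```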
I remember having similar issues with JBoss on Windows. Periodically the CPU would go to 100%, the Windows-reported memory usage would suddenly drop to around 2.5 MB (far less than JBoss needs to run), and after a few seconds it would build itself back up, as if the entire server had come down and restarted itself. I eventually tracked my issue down to a prepared statement cache in Apache Commons that never expired.
If it does seem to be a memory issue, then you can start taking periodic heap dumps and comparing the two, or use something like JProbe Memory profiler to track everything.

HPjmeter-like graphical tool to view -agentlib:hprof profiling output

What tools are available to view the output of the built-in JVM profiler? For example, I'm starting my JVM with:
-agentlib:hprof=cpu=times,thread=y,cutoff=0,format=a,file=someFile.hprof.txt
This generates output in the hprof ("JAVA PROFILE 1.0.1") format.
I have had success in the past using HPjmeter to view these output files in a reasonable way. However, for whatever reason the files that are generated using the current version of the Sun JVM fail to load in the current version of HPjmeter:
java.lang.NullPointerException
at com.hp.jmeter.f.jb.a(Unknown Source)
at com.hp.jmeter.f.a.a(Unknown Source)
at com.hp.c.a.j.z.run(Unknown Source)
Exception in thread "HPeprofDataFileReaderThread" java.lang.AssertionError: null pointer exception from loader
at com.hp.jmeter.f.a.a(Unknown Source)
at com.hp.c.a.j.z.run(Unknown Source)
(Why would they obfuscate the bytecode for a free product?!)
Two questions arise from this:
Does anyone know the cause of this HPjmeter error? (EDIT: Yes--see below)
What other tools exist to read hprof files? And why are there none from Sun (are there)?
I know that Eclipse TPTP and other tools can monitor JVMTI data on the fly, but I need a solution that can process the generated hprof files after the fact, since the deployed machine only has a JRE (not a JDK) installed.
EDIT: A very helpful HPjmeter developer replied to my question on an HP ITRC forum indicating that heap=dump needs to be included in the -agentlib options temporarily until a bug in HPjmeter is fixed. This information makes HPjmeter viable again, but I will still leave the question open to see if anyone knows of any other tools.
EDIT: As of version 4.0.00 of HPjmeter (available 05/2009) this bug has been fixed.
YourKit Java Profiler is able to read hprof snapshots (I am not sure if only for memory profiling or for CPU as well). It is not free, but it is by far the best Java profiler I have ever used. It presents the results in a clear, intuitive way and performs well on large data sets. The documentation is also pretty good.
For viewing and analyzing the output of hprof=samples or hprof=cpu I have used PerfAnal with good results. The GUI is a bit spartan, but very useful.
PerfAnal is a free download (GPL, originally an example project in the book Java Programming on Linux).
See this article:
http://www.oracle.com/technetwork/articles/javase/perfanal-137231.html
for more information and the download.
Normally you can just run
java -jar PerfAnal.jar java.hprof.txt
You may need to fiddle with -Xmx for large hprof files.
I am not 100% sure it'll work (it sounds like it will) and I am not sure it'll show it in the format you want... but have you thought about the VisualVM?
I believe it'll open up the resulting file.
I have been using Eclipse Memory Analyzer successfully to analyze various performance problems. First, install the tool in Eclipse as described on the project web page.
After that, you can create a dump file knowing the pid of the jvm to be analyzed
jmap -dump:format=b,file=<filename>.hprof <jvm_pid>
Then just import the .hprof file into Eclipse. It has some automatic reports that try to point out the possible problems (for me they usually do not work).
Edit:
Answering the comment: you are right, it is more like a leak finder for Java. For performance problems, I have played with JRat for small projects. It shows time consumed per method, the number of times each method is called, the hierarchy of calls, etc. The only problem is that, as far as I know, it does not support .hprof files. To use it, you need to execute your program adding a VM argument:
-javaagent:<path>/shiftone-jrat.jar
This will generate a directory with the profile captured by the tool. Then, execute
java -jar shiftone-jrat.jar
And open the trace. Even though it is a simple tool, I think it could be useful.
