Why aren't all methods displayed in the VisualVM profiler?

I am using VisualVM to see where my application is slow, but it does not show all methods; in particular, it probably does not show all of the methods that are delaying the application.
I have a real-time application (sound processing) and am missing my timing budget by a few hundred microseconds.
Is it possible that VisualVM hides methods which are themselves fast?
UPDATE 1
I found the slow method through the sampler and some guessing. It was a toString() method called from debug logging: the logging was turned off, but the toString() call was still consuming time.
Adjusting the settings helped, and now I know how to see it: it depended on the "Start profiling from" option.
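For anyone hitting the same thing: the usual fix is to make sure the toString() is never evaluated when the log level is off. A minimal sketch, assuming an SLF4J-style logger (the question doesn't name the logging framework, and the class and method names here are made up):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GuardedLogging {
    private static final Logger log = LoggerFactory.getLogger(GuardedLogging.class);

    void process(Object frame) {
        // Eager concatenation builds the string (and calls frame.toString())
        // even when DEBUG logging is turned off:
        // log.debug("processing " + frame);

        // Guarding the call, or using a {} placeholder, defers toString()
        // until the logger actually needs the message:
        if (log.isDebugEnabled()) {
            log.debug("processing {}", frame);
        }
    }
}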

Other than the filters mentioned by Ryan Stewart, here are a couple of additional reasons why methods may not appear in the profiler:
Sampling profilers are inherently stochastic: a sample of the current stack of all threads is taken every N ms. Some methods which actually executed but which weren't caught in any sample during your run simply won't appear. This is generally not too problematic, since the very fact that they didn't appear in any sample means that, with very high probability, these are methods which aren't taking up a large part of your runtime.
When using instrumentation-based profiling in VisualVM (called "CPU profiling"), you need to define the entry point for profiled methods (the "Start profiling from" option). I have found this fails for methods in the default package, and also won't pick up time in methods which are currently running when the profiler is attached (for the duration of the current invocation; it will get later invocations). This is probably because the instrumented method won't be swapped in until the current invocation finishes.
Sampling is subject to a potentially serious issue with stack-trace-based profiling: samples are only taken at safepoints in the code. When a trace is requested, each thread is forced to a safepoint and then its stack is captured. In some cases you may have a hot spot in your code which does no safepoint polling (common for simple loops that the JIT can guarantee terminate after a fixed number of iterations), interleaved with a bit of code that does have a safepoint poll. Your stacks will always show your process in the safepoint-bearing code, never in the safepoint-free code, even though the latter may be taking the majority of CPU time.
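To make the last point concrete, here is a minimal, hypothetical sketch (the names are made up, and whether the poll is actually elided depends on the JVM and JIT settings): the inner counted int loop is exactly the kind of loop the JIT can compile without safepoint polls, so stack samples tend to land on the surrounding code instead, even though the inner loop burns most of the CPU.

public class SafepointBiasDemo {
    // A simple counted loop: the JIT can prove it terminates after a fixed
    // number of iterations, so it may omit safepoint polls inside it.
    // Stack samples rarely land here, even if this is where the CPU time goes.
    static long hotCountedLoop(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i * 31L;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        for (long iter = 0; ; iter++) {              // this outer back-edge does poll for safepoints
            total += hotCountedLoop(1_000_000);      // but most of the CPU time is spent in here
            if (iter % 100_000 == 0) {
                System.out.println(total);           // keeps 'total' live so the work isn't eliminated
            }
        }
    }
}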

I don't have it in front of me at the moment, but before you start profiling, there's a settings pane that's hidden by default and lets you enter regexes for filtering out methods. By default, it filters out a lot of the core JDK stuff.

I had the same problem with my pet project. I added a package name and the problem was solved; I don't understand why. VisualVM 1.4.1, jdk1.8.0_181 and jdk-10.0.2, Windows 10.

Related

When doing performance analysis with JProfiler, can I display a sliding-window-last-X-seconds view?

I have a java application that, from time to time, seems to have hiccups where it lags a lot / becomes unresponsive for a few seconds, then continues like normal again. This isn't associated with any disk or network output, but CPU usage goes way up for a short time when this happens.
I'd like to use JProfiler to see what happens during that time, but I don't know what triggers the behaviour (so I can't just move my application to that point, then start CPU recording), and leaving CPU recording on all the time until a hiccup occurs doesn't help much either, since that will include the CPU percentages of everything up to that point in the calculation, distracting from what's using CPU now.
So what I'd like is a view that shows me "average CPU usage by method over the last X seconds", that throws away all data that's older than X seconds automatically, and calculates just the averages over those last X samples (assuming 1 sample per second). I wasn't able to find any option that allows me to do this; is this something that JProfiler just doesn't support, or haven't I looked hard enough?
These kinds of exceptional circumstances can be analyzed with JProfiler's "exceptional method runs" feature.
In the call tree, select the method that shows the performance spike and select "Add As Exceptional Method" from the context menu.
Then you can see the slowest invocations separately, with all other invocations merged into a single node.
This screencast shows the entire feature:
http://blog.ej-technologies.com/2011/02/methods-statistics-and-exceptional.html

Find the most executed section of Java source code

I am using NetBeans IDE 7.4. I want to find the lines of code where most of the running time is spent. I have heard a little about profilers, which can be used for thread monitoring and so on.
But I don't know (exactly!) how to find the section(s) of code that are most frequently executed in my program. I want to find out the mechanisms provided by the JVM for that, not only how to use third-party packages (profilers etc.).
You can profile the CPU with VisualVM and you'll find which methods are consuming CPU. You have to set up a filter (regex) to focus on your classes.
Suppose line L of your code, if removed, would cut the overall wall-clock time in its thread by 50%. Then if you, at a random time, dump the stacks of all running threads, locate your thread, and disregard all the levels that are not your code, there is a 50% chance that you will see line L in the remaining lines.
So if you do this 10 times, you will see line L about 5 times, give or take.
In fact, any line of your code that you see on more than one stack sample will, if you can remove or bypass it, save you a healthy fraction of time, guaranteed.
What's more, this method (while crude) will find any speedup that profilers can find, and more that they can't.
The math behind it can be found here.
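(For a quick back-of-the-envelope version: if a line is on the stack a fraction f of the time, the chance of seeing it in at least one of n independent samples is 1 - (1 - f)^n, and the expected number of samples showing it is n * f. With f = 0.5 and n = 10 that's about a 99.9% chance of seeing it at least once, and about 5 samples showing it, as above.)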
Example: A worker thread is spending 80% of its time doing I/O and allocating memory in the process of parsing XML to build a data structure. You can see that the XML comes from a data structure in a different piece of code in the same thread. It's big code - you wouldn't have known this without the samples pointing it out. You only have to take two or three samples before you see it twice. Just bypass the XML - 5x speedup.
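If you want to try this without attaching an external profiler, here is a rough sketch of the idea using Thread.getAllStackTraces() from inside the JVM (the class name and sample count are my own choices; running jstack <pid> a few times from the command line gives you the same information):

import java.util.Map;

// Start this on a daemon thread inside the application you want to inspect.
public final class StackSampler implements Runnable {
    @Override
    public void run() {
        try {
            // Take 10 samples, one second apart, and print every thread's stack.
            // Lines of *your* code that appear in several samples are the candidates.
            for (int sample = 1; sample <= 10; sample++) {
                System.out.println("=== sample " + sample + " ===");
                for (Map.Entry<Thread, StackTraceElement[]> entry
                        : Thread.getAllStackTraces().entrySet()) {
                    System.out.println(entry.getKey().getName());
                    for (StackTraceElement frame : entry.getValue()) {
                        System.out.println("    at " + frame);
                    }
                }
                Thread.sleep(1000);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
// Usage inside your application:
//   Thread sampler = new Thread(new StackSampler(), "stack-sampler");
//   sampler.setDaemon(true);
//   sampler.start();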

Blocking Behaviour - Concurrent Data Structures Java

I'm currently running a highly concurrent benchmark which accesses a ConcurrentSkipListMap from java.util.concurrent. I'm finding that threads are getting blocked inside its get method, more precisely here:
java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:828)
java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1626)
(This was obtained by printing the stack trace of each individual thread at 10-second intervals.) The situation is still not resolved after several minutes.
Is this expected behaviour for these collections? Which other concurrent collections are likely to experience blocking?
Having tested it, I see similar behaviour with ConcurrentHashMap:
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:994)
This could well be a spurious result.
When you ask Java for a dump of all its current stack traces, it tells each thread to wait when it gets to a yield point, then it captures the traces, and then it resumes all the threads. As you can imagine, this means that yield points are over-represented in these traces; these include synchronized methods, volatile accesses, etc. ConcurrentSkipListMap.head, a volatile field, is accessed in doGet.
See this paper for a more detailed analysis.
Solaris Studio has a profiler that captures stack traces from the OS and translates them to Java stack traces. This does away with the bias toward yield points and gives you more accurate results; you might find that doGet goes away almost entirely. I've only had luck running it on Linux, and even then it's not out-of-the-box. If you're interested, ask me in the comments how to set it up, I'd be happy to help.
As an easier approach, you could wrap your calls to ConcurrentSkipListMap.get with System.nanoTime() to get a sanity check on whether this is really where your time is going. Figure out how much time you're spending in that method, and confirm whether it's about what you'd expect given that the profiler says you're spending such-and-such a percentage of your time there.
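A minimal sketch of that sanity check (the map's key/value types and the field names are placeholders, not from the original benchmark):

import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

public class GetTimer {
    private final ConcurrentSkipListMap<Long, String> map = new ConcurrentSkipListMap<>();
    private final AtomicLong nanosInGet = new AtomicLong();
    private final AtomicLong calls = new AtomicLong();

    String timedGet(long key) {
        long start = System.nanoTime();
        try {
            return map.get(key);
        } finally {
            nanosInGet.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
        }
    }

    void report(long wallClockNanos) {
        // Compare this percentage with what the profiler claims for doGet.
        double pct = 100.0 * nanosInGet.get() / wallClockNanos;
        System.out.printf("get(): %d calls, %.1f%% of wall-clock time%n", calls.get(), pct);
    }
}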
Shameless self-plug: a few months ago I created a simple program that demonstrates this for a presentation at work. If you run it under a profiler, it should show SpinWork.work appearing a lot, while HardWork.work doesn't show up at all, even though the latter actually takes much more time; it just doesn't contain yield points.
Well, it isn't blocking in its truest form. Blocking implies the suspension of thread activity. ConcurrentSkipListMap is non-blocking: it will spin until it succeeds. But it also guarantees it will eventually succeed (that is, it shouldn't get into an infinite loop).
That being said, unless you are doing many, many gets and puts concurrently, I don't see how you can be spending so much time in this method.
If you can re-create it with an example and share with us that may help.
ConcurrentHashMap.get performs a volatile read, which means the CPU must finish all outstanding writes before it can perform the read. This is called a STORE/LOAD barrier. Depending on how much is going on in the other threads/cores, this can take a long time. See https://www.cs.umd.edu/users/pugh/java/memoryModel/jsr-133-faq.html.
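For context (this is the standard Java memory model illustration, not code from the question): a volatile read is more expensive than a plain load because it establishes a happens-before edge with the preceding volatile write, so the reader is guaranteed to see everything written before that write, and the JIT cannot cache or reorder the access freely.

public class VolatileVisibility {
    private int payload;            // plain field
    private volatile boolean ready; // volatile flag

    // Writer thread
    void publish() {
        payload = 42;   // plain write
        ready = true;   // volatile write: happens-before any later read that sees 'true'
    }

    // Reader thread
    int consume() {
        if (ready) {            // volatile read: pays the memory-ordering cost
            return payload;     // guaranteed to observe 42, never a stale value
        }
        return -1;              // flag not published yet
    }
}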

Analyze the "runnable" thread dump under high load server side

A thread dump from a Java-based application is easy to get but hard to analyze!
There is something interesting we can see from the thread dump.
Suppose we have a heavily loaded Java web app. I often take 10 or 15 thread dump files during peak time (under high load) to get a broad set of data. First, there is no doubt that we need to tune the code whose threads are in the Blocked and Monitor states. But I can't dig much further into the remaining Runnable threads.
So, if a method appears in the thread dumps many times, can we say it is slower than the others on a heavily loaded server? Of course, we could use proper profiling tools to check that, but the thread dumps may give us the same useful information, especially when we are in the production environment.
Thank you in advance!
Vance
I would look carefully at the call stack of the thread(s) in each dump, regardless of the thread's state, and ask "What exactly is it doing or waiting for at that point in time, and why?"
You don't just look at functions on the call stack, you look at the lines of code where the functions are called. That tells you the local reason for the call. If you combine the local reasons for the calls on the stack, that gives you the complete reason (the "why chain") for what the thread was doing at that time.
What you're looking for is bad reasons that appear on more than one snapshot.
(It only takes one unnecessary call on the stack to make that whole sample improvable, so the deeper the stack, the better the hunting.)
Since they're bad, they can be fixed and you will get improved performance. The amount of the performance improvement is roughly the fraction of snapshots that show them.
That's this technique.
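One way to make "appears on more than one snapshot" concrete in production is to take a handful of jstack dumps during the peak and count how often each frame of your own code shows up. A rough sketch (the package prefix and file handling are placeholders, and the parsing is deliberately naive):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class DumpFrameCounter {
    public static void main(String[] args) throws IOException {
        // args: jstack output files taken during peak load, e.g. dump1.txt dump2.txt ...
        String myPackage = "com.mycompany.";   // placeholder: your application's package prefix
        Map<String, Integer> counts = new HashMap<>();

        for (String file : args) {
            for (String line : Files.readAllLines(Paths.get(file))) {
                String trimmed = line.trim();
                // jstack prints application frames as "at com.mycompany.Foo.bar(Foo.java:42)"
                if (trimmed.startsWith("at " + myPackage)) {
                    counts.merge(trimmed, 1, Integer::sum);
                }
            }
        }

        // Frames near the top of this list are the ones worth explaining or fixing.
        counts.entrySet().stream()
              .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
              .limit(20)
              .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}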
I'd say that if a method appears very often in a thread dump, you'd have to either:
optimize that method, since it is called many times, or
check whether that method is called too often.
If you see that the threads spend lots of time in a particular method, there might also be a bug (like one we hit when using a special regex that suffered from a bug in the regex engine). So you'd need to investigate that.

Profiling native methods in Java - strange results

I have been using Yourkit 8.0 to profile a mathematically-intensive application running under Mac OS X (10.5.7, Apple JDK 1.6.0_06-b06-57), and have noticed some strange behavior in the CPU profiling results.
For instance, I did a profiling run using sampling, which reported that 40% of the application's 10-minute runtime was spent in the StrictMath.atan method. I found this puzzling, but I took it at its word and spent a bit of time replacing atan with an extremely simple polynomial fit.
When I ran the application again, it took almost exactly the same time as before (10 minutes) - but my atan replacement showed up nowhere in the profiling results. Instead, the runtime percentages of the other major hotspots simply increased to make up for it.
To summarize:
RESULTS WITH StrictMath.atan (native method)
Total runtime: 10 minutes
Method 1: 20%
Method 2: 20%
Method 3: 20%
StrictMath.atan: 40%
RESULTS WITH simplified, pure Java atan
Total runtime: 10 minutes
Method 1: 33%
Method 2: 33%
Method 3: 33%
(Methods 1,2,3 do not perform any atan calls)
Any idea what is up with this behavior? I got the same results using EJ-Technologies' JProfiler. It seems like the JDK profiling API reports inaccurate results for native methods, at least under OS X.
This can happen because of inconsistencies in when samples are taken. For example, if a method accounts for a fair amount of total time but each invocation doesn't take very long to execute, it is possible for the sampling to miss it. Also, I think garbage collection never happens during a sample, but if some code causes a lot of garbage collection it can contribute greatly to a slowdown without appearing in the samples.
In similar situations I've found it very helpful to run twice, once with tracing and once with sampling. If a method appears in both, it is probably using a lot of CPU; otherwise it could well just be an artifact of the sampling process.
Since you're using a Mac, you might try Apple's Shark profiler (free download from ADC) which has Java support and Apple's performance group has put a fair amount of time into the tool.
As Nick pointed out, sampling can be misleading if the sample interval is close enough to the function's execution time and the profiler rarely checks when the function is actually executing. I don't know whether YourKit supports this but in Shark you can change the sampling interval to something other than the default 10ms and see if the results are substantially different.
There's also a separate call-tracing mode which will record every function entry/return. This completely avoids the possibility of aliasing errors, but it collects a ton of data and has higher overhead, which could matter if your app is doing any sort of real-time processing.
You may want to look at the parameters that are being passed into the three methods. It may be that the time is being spent generating return values or in methods that are creating a lot of temporary objects.
I find YourKit greatly exaggerates the cost of calling sub-methods (due to its logging method, I assume). If you only follow the advice that the profiler gives you, you'll end up just merging functions with no real gain, since HotSpot usually does this excellently on its own.
Therefore, I'd highly advise testing batches completely outside profilers too, to get a better idea of whether changes are really beneficial (it may seem obvious, but this cost me some development time).
It is worth noting that Java methods can be inlined if they are small enough; native methods, however, are inlined under different rules. If a method is inlined, it doesn't appear in the profiler (certainly not in YourKit, anyway).
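If you want to check whether a suspect method is actually being inlined, HotSpot can print its inlining decisions. Something along these lines (diagnostic flags, so the exact output varies by JVM version; MyApp is a placeholder for your main class):

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining MyApp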
Profilers can be like that.
This is the method I use.
Works every time.
And this is why.
