VisualVM is showing me that a particular method is taking a long time to execute.
Are there any widely used strategies for examining the time performance of a Java method?
My gut feeling is that the sluggish response time comes from a method somewhere down the call hierarchy from the one VisualVM is reporting, but when it comes to performance, I think getting some hard numbers is better than fishing around in the code based on an assumption.
VisualVM should be showing you the methods which use the most CPU. If the biggest user is your method, the time is being spent in the method itself rather than in a method you are calling, unless you are calling many methods which individually look small but add up to more in total.
I suggest you take the difference between your method's total and the sum of the methods it calls; that is how much your method itself is adding while being profiled. Note: how much it adds when not profiled could be less, as the profiler has an overhead.
You need to use tools like JProfiler, YourKit, etc. You can profile your code in depth and pinpoint exactly which method is taking the most time, and you can drill as far down the call hierarchy as you want with these tools.
Related
I am using VisualVM to see where my application is slow, but it does not show all methods, and probably not all of the methods that delay the application.
I have a real-time application (sound processing) and am short by a few hundred microseconds.
Is it possible that VisualVM hides methods which are fast in themselves?
UPDATE 1
Using the sampler and some guessing, I found the slow method: it was a toString() method called from debug logging which was turned off, but still consuming time.
The settings helped, and now I know how to see it: it depended on the "Start profiling from" option.
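For anyone hitting the same trap: the cost comes from building the log message eagerly, before the logger decides to discard it. A minimal sketch, assuming an SLF4J-style logger (the class name and the frame parameter are illustrative, not from the original question):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class AudioPipeline {
        private static final Logger log = LoggerFactory.getLogger(AudioPipeline.class);

        void process(Object frame) {
            // Costly even when DEBUG is off: toString() and the string
            // concatenation run before the logger discards the message.
            log.debug("processing " + frame);

            // Cheap when DEBUG is off: the {} placeholder defers toString()
            // until the logger knows the message will actually be written.
            log.debug("processing {}", frame);

            // Equivalent explicit guard, useful with older logging APIs.
            if (log.isDebugEnabled()) {
                log.debug("processing " + frame);
            }
        }
    }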
Other than the filters mentioned by Ryan Stewart, here are a couple of additional reasons why methods may not appear in the profiler:
Sampling profiles are inherently stochastic: a sample of the current stack of all threads is taken every N ms. Some methods which actually executed but which aren't caught in any sample during your run just won't appear. This is generally not too problematic since the very fact they didn't appear in any sample, means that with very high probability these are methods which aren't taking up a large part of your runtime.
When using instrumentation-based profiling in VisualVM (called "CPU profiling"), you need to define the entry point for profiled methods (the "Start profiling from" option). I have found this fails for methods in the default package, and it also won't pick up time in methods which are currently running when the profiler is attached (for the duration of the current invocation; it will catch later invocations). This is probably because the instrumented method isn't swapped in until the current invocation finishes.
Sampling is subject to a potentially serious issue with stack-trace-based profiling: samples are only taken at safepoints in the code. When a trace is requested, each thread is forced to a safepoint, and then the stack is captured. In some cases you may have a hot spot in your code which does no safepoint polling (common for simple counted loops that the JIT can prove terminate after a fixed number of iterations), interleaved with a bit of code that does have a safepoint poll. Your stacks will always show your process in the safepoint-bearing code, never in the safepoint-free code, even though the latter may account for the majority of CPU time.
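A minimal sketch of that pattern (whether the poll is actually elided depends on the JVM version and flags, e.g. -XX:+UseCountedLoopSafepoints changes it, so treat this as illustrative):

    public class SafepointBias {
        // A counted loop over an int index: HotSpot can often prove it
        // terminates, and may omit safepoint polls inside the body, so a
        // sampling profiler cannot observe threads while they are in here.
        static double hot(double[] data) {
            double sum = 0;
            for (int i = 0; i < data.length; i++) {
                sum += Math.sqrt(data[i]) * Math.sin(data[i]);
            }
            return sum;
        }

        public static void main(String[] args) {
            double[] data = new double[10_000_000];
            java.util.Arrays.fill(data, 42.0);
            double total = 0;
            for (int iter = 0; iter < 100; iter++) {
                total += hot(data); // samples tend to land at call boundaries
            }                       // like this one, not inside hot()
            System.out.println(total);
        }
    }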
I don't have it in front of me at the moment, but before you start profiling there's a settings pane, hidden by default, that lets you enter regexes for filtering out methods. By default, it filters out a lot of the core JDK stuff.
I had the same problem with my pet project. I added a package name and the problem was solved, though I don't understand why. VisualVM 1.4.1, jdk1.8.0_181 and jdk-10.0.2, Windows 10.
Assume I have a loop (any while or for) like this:
loop {
    // a long block of code
}
From the point of view of time complexity, should I split this code into parts, write a function outside the loop, and call that function repeatedly?
I read something about functions long ago, that calling a function repeatedly costs extra time or memory or something like that, but I don't remember it exactly. Can you provide some good references about things like this (time complexity, coding style)?
Can you also recommend a book or tutorial about heap memory, overheads, etc., which affect the performance of a program?
The performance difference is probably very minimal in this case. I would concentrate on clarity rather than performance until you identify this portion of your code to be a serious bottleneck.
It really does depend on what kind of code you're running in the loop, however. If you're just doing a tiny mathematical operation that isn't going to take any CPU time, but you're doing it a few hundred thousand times, then inlining the calculation might make sense. Anything more expensive than that, though, and performance shouldn't be an issue.
There is an overhead of calling a function.
So if the "long code" is fast compared to this overhead (and your application cares about performance), then you should definitely avoid the overhead.
However, if the performance is not noticeably worse, it's better to make the code more readable by using a function (or better, multiple functions).
Rule one of performance optimisation: measure it.
Personally, I go for readable code first and then optimise it IF NECESSARY. Usually, it isn't necessary :-)
See the first line in CHAPTER 3 - Measurement Is Everything
"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil." - Donald Knuth
In this case, the difference in performance will probably be minimal between the two solutions, so writing clearer code is the way to do it.
There really isn't a simple "tutorial" on performance; it is a very complex subject, and one that even seasoned veterans often don't fully understand. Anyway, to give you more of an idea of what the overhead of "calling" a function is: basically you are "freezing" the state of your function (in Java there are no "functions" per se; they are all called methods), calling the method, and then "unfreezing" the state your method was in before.
The "freezing" essentially consists of pushing state information (where you were in the method, what the values of the variables were, etc.) onto the stack; "unfreezing" consists of popping the saved state off the stack and updating the control structures to where they were before you called the function. Naturally, memory operations are far from free, but the VM is pretty good at keeping the performance impact to an absolute minimum.
Now keep in mind that Java is almost entirely heap-based; the only things that really have to get pushed onto the stack are pointer values (small), your place in the program (again small), whatever primitives are local to your method, and a tiny bit of control information, nothing else. Furthermore, although you cannot explicitly inline in Java (though I'm sure there are bytecode editors out there that essentially let you do that), most VMs, including the most popular HotSpot VM, will do this automatically for you. http://java.sun.com/developer/technicalArticles/Networking/HotSpot/inlining.html
So the bottom line is pretty much zero performance impact; if you want to verify this for yourself, you can always run benchmarking and profiling tools, and they should be able to confirm it.
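If you do want to measure it, here is a minimal sketch using JMH, the OpenJDK micro-benchmark harness (JMH must be on the classpath; the class and method names are illustrative). It compares a small method call against the same expression written inline; on a warmed-up HotSpot VM the two should measure essentially the same, because the call gets inlined:

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class CallOverheadBench {
        private int x = 42;

        // Small method: a prime candidate for JIT inlining.
        private int compute(int v) {
            return v * 31 + 7;
        }

        @Benchmark
        public int viaMethodCall() {
            return compute(x);   // the work behind a method call
        }

        @Benchmark
        public int inlinedByHand() {
            return x * 31 + 7;   // the same work written inline
        }
    }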
From an execution speed point of view it shouldn't matter, and if you still believe this is a bottleneck, it is easy to measure.
From a development performance perspective, it is a good idea to keep the code short. I would vote for turning the loop contents into one (or more) properly named methods.
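As a sketch of what that refactoring looks like (the names and types here are placeholders, not from the original question):

    import java.util.List;

    class OrderProcessor {
        // The loop now reads as a summary of the steps; the JIT is free
        // to inline these small methods, so the clarity costs nothing.
        void processAll(List<String> orders) {
            for (String order : orders) {
                validate(order);
                price(order);
                persist(order);
            }
        }

        private void validate(String order) { /* e.g. null/format checks */ }
        private void price(String order)    { /* e.g. compute totals */ }
        private void persist(String order)  { /* e.g. write to storage */ }
    }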
Forget it! You can't gain any performance by doing the JIT's job for it. Let the JIT inline it for you. Keep methods short for readability and also for performance, as the JIT works better with short methods.
There are micro-optimizations which may help you gain some performance, but don't even think about them. I suggest the following rules:
Write clean code using appropriate objects and algorithms for readability and for performance.
In case the program is too slow, profile and identify the critical parts.
Think about improving them using better objects and algorithms.
As a last resort, you may also consider micro-optimizations.
Just curious to know if there is a list of steps I can use as guidelines when debugging performance issues, to pinpoint what is taking the most time. There are a myriad of tools, starting with logging, timing methods, load-test tools, timing database queries, and so on...
Considering that there are so many different things, is there a short list of the things to check first?
If so, please let me know.
Check the machine physically has enough RAM. Paging will kill applications.
Check what proportion of the application's time is spent in garbage collection (a quick way to measure this is sketched after this list). A high proportion means you'll benefit from heap or GC tuning.
Run the app in a profiler and put it through its paces, monitoring CPU usage. Look for those methods where the CPU spends all its time. A good profiler will let you roll up the time spent in third party code where you have no control, allowing you to identify the hot spots in your own code.
For the top hot spots in your application, work out where the time is being spent. Is it all I/O? Calculations? Multi-thread synchronisation? Object allocation/deallocation? A poor algorithm with n-squared or worse complexity? Something else?
Deal with the hot spots one at a time, and each time you change something, measure and work out whether you've actually improved anything. Roll back the ineffective changes and work out where the problem has moved to if the change worked.
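For the GC check above, a minimal sketch using the standard java.lang.management MXBeans (this reports JVM-wide totals for the current process):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcShare {
        public static void main(String[] args) {
            long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
            long gcMs = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                long t = gc.getCollectionTime(); // cumulative ms; -1 if unavailable
                if (t > 0) {
                    gcMs += t;
                }
            }
            System.out.printf("GC time: %d ms (%.1f%% of %d ms uptime)%n",
                    gcMs, 100.0 * gcMs / uptimeMs, uptimeMs);
        }
    }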
There is nothing really specific to Java about something like this; with any language/framework/tool you should follow the same pattern:
Measure the performance before you change a single thing
Hypothesize about possible causes/fixes
Implement the change
Measure performance after the change to compare with #1
Repeat until happy
Measure
MEASURE!!!!!
Compare apples to apples. Don't run your tests on a busy subnet (and especially don't try to justify this ludicrous practice by saying, "I want the circumstances to be realistic").
Measure: capture time stamps at each discrete step (see the sketch after this list).
Note that although there is a relationship, throughput and response time are not the same thing
After you make a change... MEASURE!!!!! Never say to yourself, "it seems better." How do you know it's better? Compare measurement 1 to measurement 2.
Test one thing at a time. Don't create one uber performance suite that attempts to simulate realistic conditions. That's too much, and you are setting yourself up to be overwhelmed. Test for message size. Test for concurrency. Test in isolation.
Once you start to isolate the bottlenecks, the next steps will feel more natural and fine-tuning your tests will become easier; at that point you may choose to hook up a profiler to investigate GC/CPU performance and memory consumption (VisualVM is good and free). The point is to treat performance issues like a binary search: start by measuring everything, and continually subdivide the problem until it reveals itself.
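For the "capture time stamps at each discrete step" advice, a minimal sketch (the three pipeline stages are hypothetical stand-ins for whatever your application actually does):

    public class StepTimer {
        public static void main(String[] args) {
            long t0 = System.nanoTime();
            byte[] payload = loadInput();       // step 1
            long t1 = System.nanoTime();
            byte[] result = transform(payload); // step 2
            long t2 = System.nanoTime();
            store(result);                      // step 3
            long t3 = System.nanoTime();

            System.out.printf("load: %d ms, transform: %d ms, store: %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, (t3 - t2) / 1_000_000);
        }

        // Hypothetical stand-ins for the real pipeline stages.
        static byte[] loadInput()          { return new byte[1 << 20]; }
        static byte[] transform(byte[] in) { return in.clone(); }
        static void store(byte[] out)      { /* e.g. write to disk */ }
    }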
The first and most important step in any kind of performance tuning is to identify what is slow and measure just how slow it is. In most cases (particularly if the performance problem is easy to reproduce), a profiler is the most effective tool for that, as it will give you detailed statistics on execution time, broken down to single methods, without your having to manually instrument the program.
Check DB queries.
Avoid building Statements inside loops; statements created in a loop slow the application down. Use a PreparedStatement or CallableStatement instead, as in the sketch below.
Capture time stamps at each discrete step.
Identify the hot spots where the time is being spent, such as I/O, calculations, multi-threaded synchronization and garbage collection, and look for poor algorithms.
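A minimal JDBC sketch of the prepared-statement point (the table and class names are illustrative): prepare once outside the loop, bind and batch inside it, so the database parses and plans the SQL a single time.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    class UserWriter {
        void insertAll(Connection conn, List<String> names) throws SQLException {
            String sql = "INSERT INTO users(name) VALUES (?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (String name : names) {
                    ps.setString(1, name); // bind, don't re-parse
                    ps.addBatch();
                }
                ps.executeBatch();         // one round trip per batch
            }
        }
    }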
I have been using YourKit 8.0 to profile a mathematically intensive application running under Mac OS X (10.5.7, Apple JDK 1.6.0_06-b06-57), and have noticed some strange behavior in the CPU profiling results.
For instance, I did a profiling run using sampling, which reported that 40% of the application's 10-minute runtime was spent in the StrictMath.atan method. I found this puzzling, but I took it at its word and spent a bit of time replacing atan with an extremely simple polynomial fit.
When I ran the application again, it took almost exactly the same time as before (10 minutes) - but my atan replacement showed up nowhere in the profiling results. Instead, the runtime percentages of the other major hotspots simply increased to make up for it.
To summarize:
RESULTS WITH StrictMath.atan (native method)
Total runtime: 10 minutes
Method 1: 20%
Method 2: 20%
Method 3: 20%
StrictMath.atan: 40%
RESULTS WITH simplified, pure Java atan
Total runtime: 10 minutes
Method 1: 33%
Method 2: 33%
Method 3: 33%
(Methods 1,2,3 do not perform any atan calls)
Any idea what is up with this behavior? I got the same results using EJ-Technologies' JProfiler. It seems like the JDK profiling API reports inaccurate results for native methods, at least under OS X.
This can happen because of inconsistencies in when samples are taken. For example, if a method accumulates a fair amount of time overall but each invocation completes quickly, it is possible for the sampling to miss it. Also, I think garbage collection never happens during a sample, so if some code causes a lot of garbage collection, it can contribute greatly to a slowdown without appearing in the samples.
In similar situations I've found it very helpful to run twice, once with tracing and once with sampling. If a method appears in both, it is probably using a lot of CPU; otherwise it could well just be an artifact of the sampling process.
Since you're using a Mac, you might try Apple's Shark profiler (free download from ADC) which has Java support and Apple's performance group has put a fair amount of time into the tool.
As Nick pointed out, sampling can be misleading if the sample interval is close enough to the function's execution time and the profiler rarely checks when the function is actually executing. I don't know whether YourKit supports this but in Shark you can change the sampling interval to something other than the default 10ms and see if the results are substantially different.
There's also a separate call-tracing mode which records every function entry/return. This completely avoids the possibility of aliasing errors, but it collects a ton of data and has higher overhead, which could matter if your app is doing any sort of real-time processing.
You may want to look at the parameters that are being passed into the three methods. It may be that the time is being spent generating return values or in methods that are creating a lot of temporary objects.
I find YourKit greatly exaggerates the cost of calling sub-methods (due to its logging method, I assume). If you only follow the advice the profiler gives you, you'll end up just merging functions with no real gain, as HotSpot usually handles this excellently.
Therefore, I'd highly advise testing batches completely outside profilers too, to get a better idea of whether changes are really beneficial (it may seem obvious, but this cost me some development time).
It's worth noting that Java methods can be inlined if they are small enough; native methods, however, are inlined under different rules. If a method is inlined, it doesn't appear in the profiler (certainly not in YourKit, anyway).
Profilers can be like that.
This is the method I use.
Works every time.
And this is why.
I am wondering if there is any performance differences between
String s = someObject.toString();
System.out.println(s);
and
System.out.println(someObject.toString());
Looking at the generated bytecode, there seem to be differences. Is the JVM able to optimize this bytecode at runtime so that both solutions perform the same?
In this simple case, solution 2 of course seems more appropriate, but sometimes I would prefer solution 1 for readability, and I just want to be sure not to introduce performance decreases in critical code sections.
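For reference, a self-contained version of the two variants that can be fed to javap -c; the bytecode notes in the comments are approximate, but the gist is that variant 1 only adds an astore/aload pair for the local:

    public class ToStringVariants {
        public static void main(String[] args) {
            Object someObject = new java.util.Date();

            // Variant 1: invokevirtual toString, astore, getstatic out,
            // aload, invokevirtual println
            String s = someObject.toString();
            System.out.println(s);

            // Variant 2: getstatic out, aload, invokevirtual toString,
            // invokevirtual println (the result stays on the operand stack)
            System.out.println(someObject.toString());
        }
    }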
The creation of a temporary variable (especially something as small as a String) is inconsequential to the speed of your code, so you should stop worrying about this.
Try measuring the actual time spent in this part of your code and I bet you'll find there's no performance difference at all. The time it takes to call toString() and print out the result takes far longer than the time it takes to store a temporary value, and I don't think you'll find a measurable difference here at all.
Even if the bytecode looks different here, it's because javac is naive and your JIT Compiler does the heavy lifting for you. If this code really matters for speed, then it will be executed many, many times, and your JIT will select it for compilation to native code. It is highly likely that both of these compile to the same native code.
Finally, why are you calling System.out.println() in performance-critical code? If anything here is going to kill your performance, that will.
If you have critical code sections that demand performance, avoid using System.out.println(). There is more overhead incurred by going to standard output than there ever will be with a variable assignment.
Do solution 1.
Edit: or solution 2
There is no* code critical enough that the difference between your two samples makes any difference at all. I encourage you to test this; run both a few million times, and record the time taken.
Pick the more readable and maintainable form.
* Exaggerating for effect. If you have code critical enough, you've studied it to learn this.
The generated bytecode is not a good measure of the performance of a given piece of code, since this bytecode will get analysed, optimised and (in the case of the server compiler) re-analysed and re-optimised if it is deemed to be a performance bottleneck.
When in doubt, use a profiler.
Compared to output to the console, I doubt that any difference in performance between the two is going to be measurable. Don't optimize before you have measured and confirmed that you have a problem.