I am working on a simple text markup Java Library which should be, amongst other requirements, fast.
For that purpose, I did some profiling, but the results give me worse numbers than those measured when running in non-profiled mode.
So my question is: how reliable is the profiling? Does it just give an informational ratio of the time spent in methods? Does it take the JIT compiler into account, or is the code only interpreted in profiling mode? I use the NetBeans Profiler and Sun JDK 1.6.
Thanks.
When profiling, you'll always incur a performance penalty: something has to measure the start/stop times of methods and keep track of the objects on the heap (for memory profiling), so there is a management overhead.
However, it will give you clear pointers to find out where bottlenecks are. I tend to look for the methods where the most cumulative time is spent and check whether optimisations can be made. It is also useful to determine whether methods are called unnecessarily.
With methods that are very small, take the profile results with a pinch of salt: sometimes the process of measuring can take more time than the method call itself and skew the results (it might appear that a small, often-called method has more of a performance impact than it really does).
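As a rough illustration of why this happens, here is a small sketch (not a rigorous benchmark, and not from the original question): it compares a tiny method called in a plain loop with the same loop wrapped in the kind of per-call timestamping an instrumenting profiler performs on method entry and exit.

    public class TimerOverheadDemo {

        private static int counter;
        private static long overheadSink;

        // A deliberately tiny method, like a getter or a small helper.
        private static void tinyMethod() {
            counter++;
        }

        public static void main(String[] args) {
            final int calls = 10_000_000;

            // Plain loop: just call the tiny method.
            long start = System.nanoTime();
            for (int i = 0; i < calls; i++) {
                tinyMethod();
            }
            long plain = System.nanoTime() - start;

            // "Instrumented" loop: read the clock on entry and exit of every call,
            // roughly what an instrumenting profiler has to do.
            start = System.nanoTime();
            for (int i = 0; i < calls; i++) {
                long t0 = System.nanoTime();
                tinyMethod();
                overheadSink += System.nanoTime() - t0;
            }
            long instrumented = System.nanoTime() - start;

            System.out.printf("plain: %d ns, instrumented: %d ns (counter=%d, sink=%d)%n",
                    plain, instrumented, counter, overheadSink);
        }
    }

The second loop will typically take noticeably longer even though the work being "profiled" is identical, which is exactly the skew described above.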
Hope that helps.
Because of instrumentation, profiled code will on average run slower than non-profiled code. Measuring speed is not the purpose of profiling, however.
The profiling output will point you to bottlenecks, places that threads spend more time in, code that behaves worse than expected, or possible memory leaks.
You can use those hints to improve said methods and profile again until you are happy with the results.
A profiler will not be a solution to a coding style that is x% slower than optimal, however; you still need to spend time fine-tuning those parts of your code that are used more often than others.
I'm not surprised by the fact that you get worse results when profiling your application, as instrumenting Java code will typically slow its execution. This is actually nicely captured by the Wikipedia page on profiling, which mentions that instrumentation can cause changes in the performance of a program, potentially causing inaccurate results and heisenbugs (due to the observer effect: observers affect what they are observing by the mere act of observing it).
That said, if you want to measure speed, I think you're not using the right tool. Profilers are used to find bottlenecks in an application (and for that, you don't really care about the overall impact). But if you want to benchmark your library, you should use a performance testing tool (for example, something like JMeter) that will be able to give you an average execution time per call. You will get much better and more reliable results with the right tool.
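If pulling in a full tool is overkill for a small library, even a hand-rolled timing loop with a warm-up phase will give you a usable average time per call. A minimal sketch follows; the process() method is only a stand-in for whatever your markup library actually exposes, not its real API.

    public class MarkupBenchmark {

        public static void main(String[] args) {
            final String sample = "*hello* _world_";
            final int warmupCalls = 200_000;
            final int measuredCalls = 1_000_000;

            // Warm-up phase so the JIT has a chance to compile the hot paths
            // before we start timing.
            for (int i = 0; i < warmupCalls; i++) {
                process(sample);
            }

            long sink = 0;   // keep results "used" so they are not optimised away
            long start = System.nanoTime();
            for (int i = 0; i < measuredCalls; i++) {
                sink += process(sample).length();
            }
            long elapsed = System.nanoTime() - start;

            System.out.printf("avg %.1f ns/call (sink=%d)%n",
                    (double) elapsed / measuredCalls, sink);
        }

        // Placeholder for the library call being benchmarked.
        private static String process(String input) {
            return input.replace("*", "<b>").replace("_", "<i>");
        }
    }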
Profiling should have no influence on the JIT compiler. The code inserted to profile your application, however, will slow the methods down quite a bit.
Profilers work on several different models: either they insert code to see how long and how often methods are running, or they only take samples by repeatedly polling which code is currently executing.
The first will slow down your code quite a bit, while the second is not 100% accurate, as it may miss some method calls.
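To make the sampling model concrete, here is a deliberately naive sketch of a sampler built on nothing but Thread.getAllStackTraces(). Real sampling profilers are far more careful about overhead and bias, but the principle of periodically recording what each thread is executing is the same.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class MiniSampler {

        public static void main(String[] args) {
            final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

            // Sampling thread: every 10 ms, record the topmost frame of every live thread.
            Thread sampler = new Thread(() -> {
                try {
                    while (true) {
                        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                            StackTraceElement[] stack = e.getValue();
                            if (stack.length > 0) {
                                counts.merge(stack[0].toString(), 1, Integer::sum);
                            }
                        }
                        Thread.sleep(10);
                    }
                } catch (InterruptedException ignored) {
                }
            });
            sampler.setDaemon(true);
            sampler.start();

            doSomeWork();   // the code being "profiled"

            // Print the most frequently sampled frames.
            counts.entrySet().stream()
                  .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                  .limit(10)
                  .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
        }

        private static void doSomeWork() {
            double x = 0;
            for (int i = 0; i < 50_000_000; i++) {
                x += Math.sqrt(i);
            }
            System.out.println(x);
        }
    }

Note how a very short method may never show up at all if the sampler happens to miss it between two samples, which is the inaccuracy mentioned above.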
Profiled code is bound to run slower as mentioned in most of the previous comments. I would say, use profiling to measure only the relative performance of various parts of your code (say methods). Do not use the measures from a profiler as an indicator of how the code would perform overall (unless you want a worst-case measure, in which case what you have would be an overestimated value.)
I have found I get different results depending on which profiler I use. However, the results are often valid, just a different perspective on the problem. Something I often do when profiling CPU usage is to enable memory allocation profiling. This often gives me different CPU results (due to the increased overhead caused by the memory profiling) and can point me to some useful places to optimise.
Related
I am exploring different techniques to optimize my application code to boost performance. For this I was going through various algorithms for time and space complexity, which is where I came to know about JIT assembly logs, which can also be useful. I have tried this on some of my sample code but have not found much to optimize.
Does it really help to boost the performance?
Analysing the assembly code is useful for understanding super low level micro-tuning optimisations. However, there is likely to be far more low hanging fruit for you to worry about.
I would start by looking at your:
IO: are your network, disk activity, and network services working efficiently? Use system monitoring tools to measure the performance of your system.
Memory allocation: are you producing a minimum of garbage? Use a memory profiler (a rough first check is sketched after this list).
CPU consumption: are there any methods which seem to be called more often than they should, or which could be optimised? Use a CPU profiler.
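As a very rough first check for the memory point above (before reaching for a full memory profiler), you can compare garbage-collection activity before and after a piece of work using the standard management beans. This is only a sketch; the workload here is a deliberately allocation-heavy placeholder.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcCheck {

        public static void main(String[] args) {
            printGcStats("before");
            runWorkload();   // replace with the code you suspect
            printGcStats("after");
        }

        // Placeholder workload that deliberately produces a lot of garbage.
        private static void runWorkload() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000_000; i++) {
                sb.append(Integer.toString(i));
                if (sb.length() > 10_000) {
                    sb = new StringBuilder();
                }
            }
        }

        // Sums collection counts and accumulated GC time over all collectors.
        private static void printGcStats(String label) {
            long count = 0;
            long millis = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                count += gc.getCollectionCount();
                millis += gc.getCollectionTime();
            }
            System.out.printf("%s: %d collections, %d ms total GC time%n", label, count, millis);
        }
    }

A big jump in collections for a small amount of work is a hint that a memory profiler is worth running.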
A good low-level profiler is Java Flight Recorder. I would only use this AFTER you have checked all of the above; however, this is usually as low as I ever need to go.
To go lower, there are tools like JITWatch, which can show you the bytecode and assembly for any line of code (and much more), and JMH with perfasm, which can show you your hot assembly instructions. These tools are not simple to set up and work best for very short/simple sections of code.
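For reference, a JMH benchmark needs very little code. The sketch below assumes the standard JMH annotations (org.openjdk.jmh) and a project set up from the JMH archetype; the benchmarked operation is just an example.

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Warmup;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Warmup(iterations = 5)
    @Measurement(iterations = 5)
    @Fork(1)
    @State(Scope.Thread)
    public class StringConcatBenchmark {

        private String a = "hello";
        private String b = "world";

        @Benchmark
        public String concat() {
            return a + b;   // returning the result keeps it from being dead-code eliminated
        }
    }

Running the benchmark jar with -prof perfasm (which needs the hsdis disassembler library installed) is what prints the hot assembly instructions mentioned above; the exact jar name depends on how your build is set up.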
Usually the biggest headaches are outside your Java code. If you get to the point of worrying about assembly, you either have a trivial micro-benchmark or a very highly tuned system.
JIT is one complicated universe in itself. I believe understanding how it is designed to work will be more useful than actually looking at the logs. Optimizations like Escape Analysis have rather interesting behavioral patterns. For the same set of input on the same instructions, JIT may or may not do certain optimizations based on the overall JVM state. In fact for successive JVM runs, it is not guaranteed that JIT will behave in the same way.
So, bottom line: understand how the JVM works. Understand how Java works and when to use what. Frankly, don't worry too much about JIT optimizations unless you really need to.
PS: I am not saying that looking at low-level logs is bad. I am just saying that the logs and behavior will not be consistent across system architectures (simplest example: memory barriers are emitted in different ways by different JVMs).
I wonder why I should use JMH for benchmarking if I can switch off JIT?
Doesn't JMH just suppress optimizations that could equally be prevented by disabling the JIT?
TL;DR: Assess Formula 1 performance by riding a bicycle on the same track.
The question is very odd, especially if you ask yourself a simple follow-up question. What would be the point of running the benchmark in conditions that are drastically different from your production environment? In other words, how would knowledge gained running in interpreted mode apply to the real world?
The issue is not black and white here: you need optimizations to happen as they happen in the real world, and you need them broken in some carefully selected places to make a good experimental setup. That's what JMH is doing: it provides the means for constructing the experimental setups. JMH samples explain the intricacies and scenarios quite well.
And, well, benchmarking is not about fighting the compiler only. Lots and lots of non-compiler (and non-JVM!) issues need to be addressed. Of course, it can be done by hand (JMH is not magical, it's just a tool that was also written by humans), but you will spend most of your time budget addressing simple issues, while having no time left to address the really important ones, specific to your experiment.
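To make "broken in some carefully selected places" concrete, this is the kind of pattern the JMH samples walk through (a sketch in their spirit, not copied from them): keep inputs in @State fields so the JIT cannot constant-fold them, and keep results alive by returning them or sinking them into a Blackhole so the benchmark body is not dead-code eliminated.

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Thread)
    public class PitfallsBenchmark {

        private double x = Math.PI;   // a field, not a constant, so it cannot be folded away

        @Benchmark
        public void wrong_deadCode() {
            Math.log(x);              // result unused: the JIT may remove the call entirely
        }

        @Benchmark
        public double right_returnValue() {
            return Math.log(x);       // returning the value keeps the computation alive
        }

        @Benchmark
        public void right_blackhole(Blackhole bh) {
            bh.consume(Math.log(x));  // Blackhole.consume has the same effect
        }
    }

Everything else (warm-up, forking, timing, statistics) is handled by the harness, which is exactly the "time budget" point above.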
The JIT is not bulletproof and almighty. For instance, it will not kick in before a certain piece of code is run a certain number of times, or it will not kick in if a piece of bytecode is too large/too deeply buried/etc. Also, consider live instrumentation which, more often than not, prevents the JIT from operating at all (think profiling).
Therefore the interest remains in being able to either turn it off or on; if a piece of code is indeed time critical, you may want to know how it will perform depending on whether the JIT kicks in.
But those situations are very rare, and are becoming more and more rare as the JVM (therefore the JIT) evolves.
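If you want to see the "kick in after a certain number of runs" behaviour for yourself, a tiny program run with -XX:+PrintCompilation is enough; run it with -Xint instead and nothing gets compiled at all. The exact thresholds depend on the JVM version and tiered-compilation settings, so treat the loop count below as a placeholder.

    public class JitKickIn {

        // Run with:  java -XX:+PrintCompilation JitKickIn
        // and watch when hotMethod appears in the compilation log.
        // With:      java -Xint JitKickIn
        // the JIT is disabled and no compilation messages appear.
        public static void main(String[] args) {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += hotMethod(i);
            }
            System.out.println(sum);
        }

        private static int hotMethod(int i) {
            return (i * 31) ^ (i >>> 3);
        }
    }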
To see whether I can really benefit from native code (written in C) by using JNI (instead of writing a complete Java application), I want to measure the overhead of calling through JNI. What is the best way to measure this overhead?
I wouldn't use a profiler to do quantitative performance testing. Profiling tends to introduce distortions into the actual timing numbers.
I'd create a benchmark that performed one of the actual calculations that you are considering doing in C and compare the C + JNI + Java version against a pure Java version. Be sure that you are comparing apples to apples; i.e. profile and optimize both versions before you benchmark them.
To do the actual benchmarking, I'd construct a loop that performed the calculation a large number of times, record the timings over a large number of iterations and compare. Make sure that you take account of JVM warmup effects; e.g. class loading, JIT compilation and heap warmup.
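A sketch of such a loop-based comparison is below. Everything on the native side is hypothetical: the library name nativecalc and the nativeCalc() method stand in for whatever C function you actually expose through JNI, and javaCalc() is a placeholder for the pure-Java equivalent of the same calculation.

    public class JniOverheadBenchmark {

        static {
            System.loadLibrary("nativecalc");   // hypothetical native library
        }

        // Hypothetical native implementation of the calculation.
        private static native double nativeCalc(double input);

        // Pure-Java version of the same calculation.
        private static double javaCalc(double input) {
            return Math.sqrt(input) * 1.5;
        }

        public static void main(String[] args) {
            final int warmupCalls = 1_000_000;
            final int measuredCalls = 10_000_000;
            double sink = 0;

            // Warm up both paths so class loading and JIT compilation happen before timing.
            for (int i = 0; i < warmupCalls; i++) {
                sink += javaCalc(i);
                sink += nativeCalc(i);
            }

            long start = System.nanoTime();
            for (int i = 0; i < measuredCalls; i++) {
                sink += javaCalc(i);
            }
            long javaTime = System.nanoTime() - start;

            start = System.nanoTime();
            for (int i = 0; i < measuredCalls; i++) {
                sink += nativeCalc(i);
            }
            long jniTime = System.nanoTime() - start;

            System.out.printf("java: %d ms, jni: %d ms (sink=%.1f)%n",
                    javaTime / 1_000_000, jniTime / 1_000_000, sink);
        }
    }

If the JNI version only wins for large inputs, the per-call overhead is eating the benefit, which is exactly what this kind of benchmark should reveal.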
Like Thihara, I doubt that using C + JNI will help much. And even if it does, you need to take account of the downsides of JNI; e.g. C code portability, platform specific build issues ... and possible JVM hard crashes if your native code has bugs.
Measuring the overhead alone may give you strange results. I'd code a small part of the performance-critical code in both Java and C++ and measure the program performance, e.g., using caliper (microbenchmarking is quite a complicated thing and hardly anybody gets it right).
I would not use any profiler, especially C++ profiler, since the performance of the C++ part alone doesn't matter and since profilers may distort the results.
Use a C++ profiler and a Java profiler. They are available in IDEs for Java; I can only assume the same is true for C++. And whatever test you design, please run it through a substantial number of loops to minimize environmental errors.
Oh, and do post the results back, since I'm also curious to see whether there is any improvement in using native code over modern JVMs. Chances are, though, that you won't see a huge performance improvement from native code.
For university, I perform bytecode modifications and analyze their influence on the performance of Java programs. Therefore, I need Java programs (in the best case, ones used in production) and appropriate benchmarks. For instance, I already have HyperSQL and measure its performance with the benchmark program PolePosition. The Java programs run on a JVM without a JIT compiler. Thanks for your help!
P.S.: I cannot use programs to benchmark the performance of the JVM or of the Java language itself (such as Wide Finder).
Brent Boyer wrote a nice article series for IBM developerWorks, "Robust Java benchmarking", which is accompanied by a micro-benchmarking framework based on a sound statistical approach. See the article and the resources page.
Since you are doing this for university, you might be interested in Andy Georges, Dries Buytaert, Lieven Eeckhout: "Statistically Rigorous Java Performance Evaluation", OOPSLA 2007.
Caliper is a tool provided by Google for micro-benchmarking. It will provide you with graphs and everything. The folks who put this tool together are very familiar with the principle of "Premature Optimization is the root of all evil," (to jwnting's point) and are very careful in explaining the role of benchmarking.
Any experienced programmer will tell you that premature optimisation is worse than no optimisation.
It's a waste of resources at best, and a source of infinite future (and current) problems at worst.
Without context, any application, even with benchmark logs, will tell you nothing.
I may have a loop in there that takes 10 hours to complete; the benchmark will show it taking almost forever, but I don't care because it's not performance-critical.
Another loop takes only a millisecond but that may be too long because it causes me to fail to catch incoming data packets arriving at 100 microsecond intervals.
Two extremes, but both can happen (even in the same application), and you'd never know unless you knew that application, how it is used, what it does, under which conditions and requirements.
If a user interface takes half a second to render, that may be too long or it may be no problem at all: what's the context? What are the users' expectations?
I have an application, which I have re-factored so that I believe it is now faster. One can't possibly feel the difference, but in theory, the application should run faster. Normally I would not care, but as this is part of my project for my master's degree, I would like to support my claim that the re-factoring did not only lead to improved design and 'higher quality', but also an increase in performance of the application (a small toy-thing - a train set simulation).
I have toyed with the latest VisualVM thing today for about four hours but I couldn't get anything helpful out of it. There isn't (or I haven't found it) a way to simply compare the profiling results taken from the two versions (pre- and post- refactoring).
What would be the easiest, most straightforward way of simply telling the slower from the faster version of the application? The difference between the two must have had an impact on performance. Thank you.
I would suggest creating a few automated tests that simulate real usage of the application. Create enough tests to have a decent benchmark.
Run the test suite for both versions of the app under various loads.
That should give you a fairly realistic picture of real-world performance. Measuring at a lower level may not give you an accurate answer.
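A minimal harness along those lines might look like the sketch below; runScenario() is a placeholder for a realistic end-to-end use of the application (say, one full simulation run), and you would run the same harness against the pre- and post-refactoring builds and compare the reported numbers.

    public class SimulationBenchmark {

        public static void main(String[] args) {
            final int warmupRuns = 20;
            final int measuredRuns = 100;

            // Let class loading and JIT compilation settle before measuring.
            for (int i = 0; i < warmupRuns; i++) {
                runScenario();
            }

            long[] samples = new long[measuredRuns];
            for (int i = 0; i < measuredRuns; i++) {
                long start = System.nanoTime();
                runScenario();
                samples[i] = System.nanoTime() - start;
            }

            double mean = 0;
            for (long s : samples) {
                mean += s;
            }
            mean /= samples.length;

            double variance = 0;
            for (long s : samples) {
                variance += (s - mean) * (s - mean);
            }
            double stdDev = Math.sqrt(variance / samples.length);

            System.out.printf("mean %.2f ms, std dev %.2f ms over %d runs%n",
                    mean / 1e6, stdDev / 1e6, measuredRuns);
        }

        // Placeholder: call into the application here, e.g. drive the train-set
        // simulation through a fixed scenario.
        private static void runScenario() {
            double x = 0;
            for (int i = 0; i < 200_000; i++) {
                x += Math.sin(i);
            }
            if (x == 42) {
                System.out.println(x);   // keep the work observable
            }
        }
    }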
I assume you can find a good way to measure the difference, and you can say it is due to the refactoring if there's nothing else you did, but I would be bothered by that, because that's not really understanding why it's faster.
Here's an example of really aggressive performance tuning.
What convinces me is if it can be shown that
A particular line of code, or small set of lines, is directly responsible for an approximate fraction F% of overall wall-clock time,
it is shown that that line or those lines are not really necessary, in the sense that a way can be found to use them a lot less or maybe not at all,
and that change results in a reduction in overall wall-clock time of approximately F%.