I wonder why I should use JMH for benchmarking if I can just switch off the JIT.
Isn't JMH's job to suppress optimizations, which could equally be prevented by disabling the JIT?
TL;DR: Assessing Formula 1 performance by riding a bicycle on the same track.
The question is very odd, especially if you ask yourself a simple follow-up question: what would be the point of running the benchmark under conditions that are drastically different from your production environment? In other words, how would knowledge gained running in interpreted mode apply to the real world?
The issue is not black and white here: you need optimizations to happen as they happen in the real world, and you need them broken in some carefully selected places to make a good experimental setup. That's what JMH does: it provides the means for constructing such experimental setups. The JMH samples explain the intricacies and scenarios quite well.
And, well, benchmarking is not about fighting the compiler only. Lots and lots of non-compiler (and non-JVM!) issues need to be addressed. Of course, it can be done by hand (JMH is not magical, it's just a tool that was also written by humans), but you will spend most of your time budget addressing simple issues, while having no time left to address the really important ones, specific to your experiment.
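For illustration, here is a minimal sketch of what such a JMH setup looks like (the class, field, and method names are invented; see the official JMH samples for the authoritative patterns). The @State object keeps the input from being treated as a compile-time constant, and the Blackhole keeps the result from being dead-code eliminated, while every other optimization runs exactly as it would in production:

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Thread)
    public class ExampleBenchmark {

        // held in @State so the JIT cannot constant-fold it away
        int input = 42;

        @Benchmark
        public void measure(Blackhole bh) {
            // consuming the result prevents dead-code elimination of the call
            bh.consume(compute(input));
        }

        private int compute(int v) {
            return v * v;   // stand-in for the code you actually want to measure
        }
    }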
The JIT is not bulletproof and almighty. For instance, it will not kick in before a certain piece of code is run a certain number of times, or it will not kick in if a piece of bytecode is too large/too deeply buried/etc. Also, consider live instrumentation which, more often than not, prevents the JIT from operating at all (think profiling).
Therefore the interest remains in being able to either turn it off or on; if a piece of code is indeed time critical, you may want to know how it will perform depending on whether the JIT kicks in.
But those situations are very rare, and are becoming more and more rare as the JVM (therefore the JIT) evolves.
Related
Is there a way to achieve JIT performance while removing JIT overhead? Preferably by compiling a class file to a native image.
I have investigated GCJ, but even for a simple program, the performance of GCJ's output is much worse than that of the Java JIT.
You could try Excelsior JET:
http://www.excelsior-usa.com/jet.html
I've had good experiences with it in the past (but that was a long time ago).
There have been in the past a number of "static" compilers for Java, but I don't know that any are currently available. To the best of my knowledge the last one in use was the "Java Transformer" for the IBM iSeries "Classic JVM", but that JVM was deprecated in favor of the J9 JVM.
The "Java Transformer" did quite well, but, as others have noted, it could not take advantage of all of the info that a JITC has available at runtime (though it did manage to take advantage of some of the runtime info).
(And it should be noted that "JITC overhead" is really minimal. Compilation occurs pretty quickly and efficiently in most cases. The problem is that compilation doesn't even start until the interpreter has run long enough to collect statistics and trigger the JITC.)
The simplest solution is often to warm up your code on startup. If you have a server-based application, the cost of startup isn't as important as the cost when the service is used. In this situation you can warm up all the critical code by calling it 10K-20K times, which triggers all of that code to compile.
This can take less than a second in simple cases, so it has very little impact on startup, and it means you are running compiled code when the service is used.
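A minimal sketch of that kind of startup warm-up (the criticalOperation method is a hypothetical stand-in for your own hot path):

    public class ServiceMain {

        public static void main(String[] args) {
            // Call the hot path enough times to cross the JIT compilation
            // threshold before any real request arrives.
            for (int i = 0; i < 20_000; i++) {
                criticalOperation("warmup-" + i);
            }
            // ... now start accepting real traffic with already-compiled code
        }

        // stand-in for the code path that matters when the service is used
        static int criticalOperation(String input) {
            return input.hashCode();
        }
    }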
If you have a client-based application, you usually have a lot of processing power for just one user, in which case the cost of the background JIT is less important.
The moral of the story is: check that you have a problem to solve before diving into a solution. Very often, questions on Stack Overflow are about problems which either a) have already been solved or b) are not a significant problem in the first place.
Measuring the extent of your problem or performance is the best guide as to what matters and what doesn't. If you don't measure, you are just guessing. (Even if you have ten-plus years' experience performance-tuning Java systems.)
I have just found my answer here:
Why is Java faster when using a JIT vs. compiling to machine code?
Quote from top answer:
This means that you cannot write an AOT compiler which covers ALL Java programs, as there is information available only at runtime about the characteristics of the program.
I'd recommend finding the root cause of the inferior performance of your Java code before trying AOT compilation or rewriting any portions in C++.
Head over to http://www.javaperformancetuning.com/ for tons of information and links.
For university, I perform bytecode modifications and analyze their influence on the performance of Java programs. Therefore, I need Java programs (ideally ones used in production) and appropriate benchmarks. For instance, I already have HyperSQL and measure its performance with the benchmark program PolePosition. The Java programs run on a JVM without a JIT compiler. Thanks for your help!
P.S.: I cannot use programs to benchmark the performance of the JVM or of the Java language itself (such as Wide Finder).
Brent Boyer wrote a nice article series for IBM developerWorks, Robust Java benchmarking, which is accompanied by a micro-benchmarking framework based on a sound statistical approach. See the article and its resources page.
Since you are doing this for university, you might be interested in Andy Georges, Dries Buytaert, Lieven Eeckhout: "Statistically Rigorous Java Performance Evaluation", OOPSLA 2007.
Caliper is a tool provided by Google for micro-benchmarking. It will provide you with graphs and everything. The folks who put this tool together are very familiar with the principle that "premature optimization is the root of all evil" (to jwnting's point) and are very careful in explaining the role of benchmarking.
Any experienced programmer will tell you that premature optimisation is worse than no optimisation.
It's a waste of resources at best, and a source of infinite future (and current) problems at worst.
Without context, any application, even with benchmark logs, will tell you nothing.
I may have a loop in there that takes 10 hours to complete; the benchmark will show it taking almost forever, but I don't care because it's not performance-critical.
Another loop takes only a millisecond, but that may be too long because it causes me to miss incoming data packets arriving at 100-microsecond intervals.
Two extremes, but both can happen (even in the same application), and you'd never know unless you knew that application, how it is used, what it does, under which conditions and requirements.
If a user interface takes half a second to render, that may be too long or it may be no problem at all. What's the context? What are the user's expectations?
Optimization based on escape analysis is a planned feature for ProGuard. In the meantime, are there any existing tools like ProGuard that already perform optimizations requiring escape analysis?
Yes, I think the Soot framework performs escape analysis.
What do you expect from escape analysis at the compiler level? Java classes are more like object files in C: they are linked in the JVM, hence escape analysis can be performed only at the single-method level, which is of limited use and will hamper debugging (e.g. you will have lines of code through which you cannot step).
In Java's design, the compiler is quite dumb: it checks for correctness (like Lint), but doesn't try to optimize. The smart pieces are put in the JVM, which uses multiple optimization techniques to yield well-performing code on the current platform, under the current conditions. Since the JVM knows all the code that is currently loaded, it can assume a lot more than the compiler and perform speculative optimizations which are reverted the moment the assumptions are invalidated. The HotSpot JVM can replace code with a more optimized version on the fly while the function is running (e.g. in the middle of a loop as the code gets 'hotter').
When not in the debugger, variables with non-overlapping lifetimes are collapsed, invariants are hoisted out of loops, loops are unrolled, and so on. All this happens in the JIT-ted code and depends on how much time is spent in the function (it does not make much sense to spend time optimizing code that never runs). If we performed some of these optimizations up front, the JIT would have less freedom and the overall result might be a net negative.
Another optimization is stack allocation of objects that do not escape the current method. This is done in certain cases, though I read a paper somewhere suggesting that the time needed for rigorous escape analysis, weighed against the time gained by the resulting optimizations, makes it not worth it, so the current strategy is more heuristic.
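As a small illustration, this is the shape of code that HotSpot's escape analysis targets; whether the allocation is actually eliminated depends on the JVM version and on flags such as -XX:+DoEscapeAnalysis (on by default in modern HotSpot), and the class names here are invented for the example:

    public class EscapeDemo {

        static final class Point {
            final double x, y;
            Point(double x, double y) { this.x = x; this.y = y; }
        }

        // The Point never escapes this method, so the JIT may scalar-replace it
        // and allocate nothing on the heap at all once the method is compiled.
        static double length(double x, double y) {
            Point p = new Point(x, y);
            return Math.sqrt(p.x * p.x + p.y * p.y);
        }

        public static void main(String[] args) {
            double sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += length(i, i + 1);   // hot loop, so the JIT compiles length()
            }
            System.out.println(sum);
        }
    }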
Overall, the more information the JVM has about your original code, the better it can optimize it. And the optimizations the JVM does are constantly improving, hence I would think about compile-time code optimizations only for very restricted, basic JVMs, such as those on mobile phones. In those cases you want to run your application through an obfuscator anyway (to shorten class names, etc.).
I am working on a simple text-markup Java library which should be, amongst other requirements, fast.
For that purpose, I did some profiling, but the results give me worse numbers than those measured when running in non-profiled mode.
So my question is: how reliable is the profiling? Does it give just an informational ratio of the time spent in methods? Does it take the JIT compiler into account, or does profiling mode run only interpreted code? I use NetBeans Profiler and Sun JDK 1.6.
Thanks.
When profiling, you'll always incur a performance penalty, as something has to measure the start/stop time of methods and keep track of the objects on the heap (for memory profiling), so there is management overhead.
However, it will give you clear pointers to find out where bottlenecks are. I tend to look for the methods where the most cumulative time is spent and check whether optimisations can be made. It is also useful to determine whether methods are called unnecessarily.
With methods that are very small, take the profile results with a pinch of salt: sometimes the process of measuring can take more time than the method call itself and skew the results (it might appear that a small, often-called method has more of a performance impact than it really does).
Hope that helps.
Because of instrumentation, profiled code will on average run slower than non-profiled code. Measuring speed is not the purpose of profiling, however.
The profiling output will point you to bottlenecks, places that threads spend more time in, code that behaves worse than expected, or possible memory leaks.
You can use those hints to improve said methods and profile again until you are happy with the results.
A profiler will not be a solution to a coding style that is x% slower than optimal, however; you still need to spend time fine-tuning those parts of your code that are used more often than others.
I'm not surprised that you get worse results when profiling your application, as instrumenting Java code will typically slow down its execution. This is nicely captured by the Wikipedia page on profiling, which mentions that instrumentation can cause changes in the performance of a program, potentially causing inaccurate results and heisenbugs (due to the observer effect: observers affect what they are observing by the mere act of observing it).
That said, if you want to measure speed, I think you're not using the right tool. Profilers are used to find bottlenecks in an application (and for that, you don't really care about the overall impact). But if you want to benchmark your library, you should use a performance-testing tool (for example, something like JMeter) that can give you an average execution time per call. You will get much better and more reliable results with the right tool.
Profiling itself should have no influence on the JIT compiler. The code inserted to profile your application, however, will slow the methods down quite a bit.
Profilers work on several different models: either they insert code to see how long and how often methods are running, or they only take samples by repeatedly polling which code is currently executing.
The first will slow down your code quite a bit, while the second is not 100% accurate, as it may miss some method calls.
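As a toy illustration of the sampling model (not how real profilers are implemented, just the idea), one can periodically snapshot all thread stacks; any call shorter than the sampling interval can be missed entirely:

    import java.util.Map;

    public class TinySampler {

        public static void main(String[] args) throws InterruptedException {
            for (int sample = 0; sample < 100; sample++) {
                // snapshot every live thread's stack and report the top frame
                Map<Thread, StackTraceElement[]> stacks = Thread.getAllStackTraces();
                for (Map.Entry<Thread, StackTraceElement[]> e : stacks.entrySet()) {
                    StackTraceElement[] st = e.getValue();
                    if (st.length > 0) {
                        System.out.println(e.getKey().getName() + " -> " + st[0]);
                    }
                }
                Thread.sleep(10);   // sampling interval: calls shorter than this
                                    // may never show up in any sample
            }
        }
    }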
Profiled code is bound to run slower, as mentioned in most of the previous comments. I would say: use profiling to measure only the relative performance of the various parts of your code (say, methods). Do not use the measurements from a profiler as an indicator of how the code would perform overall (unless you want a worst-case measure, in which case what you have would be an overestimate).
I have found I get different results depending on which profiler I use. However, the results are often valid, just a different perspective on the problem. Something I often do when profiling CPU usage is to enable memory-allocation profiling as well. This often gives me different CPU results (due to the increased overhead caused by the memory profiling) and can point me to some useful places to optimise.
Possible Duplicate:
How should I unit test threaded code?
Classical unit testing is basically just putting x in and expecting y out, and automating that process. So it's good for testing anything that doesn't involve time. But most of the nontrivial bugs I've come across have had something to do with timing. Threads corrupt each other's data, or cause deadlocks. Nondeterministic behavior happens, in one run out of a million. Hard stuff.
Is there anything useful out there for "unit testing" parts of multithreaded, concurrent systems? How do such tests work? Isn't it necessary to run the subject of such a test for a long time and vary the environment in some clever manner to become reasonably confident that it works correctly?
Most of the work I do these days involves multi-threaded and/or distributed systems. The majority of bugs involve "happens-before" type errors, where the developer assumes (wrongly) that event A will always happen before event B. But every 1000000th time the program is run, event B happens first, and this causes unpredictable behavior.
Additionally, there aren't really any good tools to detect timing issues, or even data corruption caused by race conditions. Tools like Helgrind and drd from the Valgrind toolkit work great for trivial programs, but they are not very useful in diagnosing large, complex systems. For one thing, they report false positives quite frequently (Helgrind especially). For another thing, it's difficult to actually detect certain errors while running under Helgrind/drd simply because programs running under Helgrind run almost 1000x slower, and you often need to run a program for quite a long time to even reproduce the race condition. Additionally, since running under Helgrind totally changes the timing of the program, it may become impossible to reproduce a certain timing issue. That's the problem with subtle timing issues; they're almost Heisenbergian in the sense that altering a program to detect timing issues may obscure the original issue.
The sad fact is, the human race still isn't adequately prepared to deal with complex, concurrent software. So unfortunately, there's no easy way to unit-test it. For distributed systems especially, you should plan your program carefully using Lamport's happens-before diagrams to help you identify the necessary order of events in your program. But ultimately, you can't really get away from brute-force unit testing with randomly varying inputs. It also helps to vary the frequency of thread context-switching during your unit-test by, e.g. running another background process which just takes up CPU cycles. Also, if you have access to a cluster, you can run multiple unit-tests in parallel, which can detect bugs much quicker and save you a lot of time.
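A rough sketch of what such a brute-force test can look like in Java (the UnsafeCounter class is an invented example subject; the point is releasing all threads at once and repeating many runs so the rare interleaving eventually shows up):

    import java.util.concurrent.CountDownLatch;

    public class RaceTest {

        static class UnsafeCounter {
            int value;
            void increment() { value++; }   // not atomic: read-modify-write race
        }

        public static void main(String[] args) throws InterruptedException {
            for (int run = 0; run < 1_000; run++) {
                UnsafeCounter counter = new UnsafeCounter();
                int threads = 8, perThread = 10_000;
                CountDownLatch start = new CountDownLatch(1);
                CountDownLatch done = new CountDownLatch(threads);
                for (int t = 0; t < threads; t++) {
                    new Thread(() -> {
                        try { start.await(); } catch (InterruptedException e) { return; }
                        for (int i = 0; i < perThread; i++) counter.increment();
                        done.countDown();
                    }).start();
                }
                start.countDown();   // release all threads at once
                done.await();
                if (counter.value != threads * perThread) {
                    System.out.println("Run " + run + ": lost updates, value=" + counter.value);
                }
            }
        }
    }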
If you can run your tests under Linux, valgrind includes a tool called helgrind which purports to detect race conditions and potential deadlocks in programs that use pthreads; you might get some benefit from running your multithreaded code under that, since it will report potential errors even if they didn't actually occur in that particular test run.
I have never heard of anything that can.
I guess if someone were to design one, it would have to have exact control over the execution of the threads and execute all possible interleavings of the threads.
Sounds like a major task, not to mention the combinatorial explosion of interleavings for non-trivially sized threads once there are a handful or more of them...
Although, a quick search of Stack Overflow turned up... Unit testing a multithreaded application?
If the system under test is simple enough, you can control the concurrency quite well by blocking operations in external mock systems. This blocking can be done, for example, by waiting for some other operation to be started. If you can control all external calls, this can work quite well by implementing different blocking sequences. I have tried this, and it reveals lock-level bugs quite well if you know the potentially problematic sequences. Compared to many other kinds of concurrency testing, it is quite deterministic. However, this approach doesn't detect low-level race conditions very well. I usually just go for load testing to find those, but I guess that isn't exactly unit testing.
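A sketch of that blocking-mock idea (the service and method names are invented): the fake external call parks on a latch until the test releases it, so the test can force one thread to be inside the call before another thread proceeds:

    import java.util.concurrent.CountDownLatch;

    public class BlockingExternalService {

        final CountDownLatch entered = new CountDownLatch(1);
        final CountDownLatch mayProceed = new CountDownLatch(1);

        // stand-in for an external call the code under test makes
        public String fetch() throws InterruptedException {
            entered.countDown();   // tell the test: "a thread is now inside the call"
            mayProceed.await();    // park here until the test decides to let it continue
            return "result";
        }
    }

    // In the test: start thread A (which ends up calling fetch()), await entered,
    // then start thread B to exercise the code path you want to interleave, and
    // finally call mayProceed.countDown() to release thread A and assert on the outcome.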
I have seen concurrency-testing frameworks for .NET; I'd assume it's only a matter of time before someone writes one for Java (hopefully).
And don't forget good old code reading. One of the best ways to find concurrency bugs is just to read through the code once again, giving it your full concentration.
Perhaps the answer is that you shouldn't. In concurrent systems, there may not always be a single deterministic answer that is correct.
Take the example of people boarding a train and choosing a seat. You are going to end up with different results every time.
Awaitility is a useful framework when you need to deal with asynchronicity in your tests. It allows you to wait until some state somewhere in your system is updated. For example:
await().untilCall( to(myService).myMethod(), equalTo(3) );
or
await().until( fieldIn(myObject).ofType(int.class), greaterThan(1));
It also has Scala and Groovy support.