I'm trying to reason about how the JIT of Hotspot reasons. I'm mostly interested in the latest compilation stage (C2 compiler). Does the JIT in Java rely on assertions for optimisations? If that was the case, I could imagine that there are examples where code could run faster with assertions enabled.
For example, in a piece of code like this:
static int getSumOfFirstThree(int[] array) {
assert(array.length >= 3);
return array[0] + array[1] + array[2];
}
Will the JIT, when assertions are enabled, be clever enough to eliminate the bounds checks on the array accesses?
Alternatively, are there other cases that you can think of (practical or not) where assertions will actually improve the native code that the JIT will compile?
In this case, there are multiple bounds checks to be made, and it is possible the JIT can coalesce them so that only one check is performed; however, the assertion doesn't remove the need for that check.
Assertions can prevent optimisations such as inlining, because they make a method larger, and size is a factor in determining whether to inline a method. Typically inlining improves performance, but in some cases it doesn't, as larger generated code can make the L0 (µop) or L1 instruction caches less effective.
An example of where assertion can improve performance is something like this.
boolean assertionOn = false;
assert assertionOn = true;
if (assertionOn) {
assumeDataIsGood(); // due to checks elsewhere
} else {
expensiveCheckThatDataMightNotBeGood();
}
This is perhaps an anti-pattern for assertions, but it is, by intent, going to be cheaper with assertions on.
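A self-contained version of that idiom (class and method names are mine, for illustration):

```java
// Runnable sketch of the idiom above. The assignment inside the assert
// only executes when assertions are enabled (-ea), and `on = true`
// evaluates to true, so the assert itself never fails.
class AssertionStatusDemo {

    static boolean assertionsEnabled() {
        boolean on = false;
        assert on = true; // side effect runs only under -ea
        return on;
    }

    public static void main(String[] args) {
        if (assertionsEnabled()) {
            System.out.println("cheap path: data assumed good");
        } else {
            System.out.println("expensive path: re-checking data");
        }
    }
}
```

Run with `java -ea AssertionStatusDemo` to take the cheap path; without `-ea` the expensive check runs.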
In section 12.3.3, "Unrealistic Sampling of Code Paths", the book Java Concurrency in Practice says:
In some cases, the JVM may make optimizations based on assumptions that may only be true temporarily, and later back them out by invalidating the compiled code if they become untrue.
I cannot understand the above statement.
What are these JVM assumptions?
How does the JVM know whether the assumptions are true or untrue?
If the assumptions are untrue, does it influence the correctness of my data?
The statement that you quoted has a footnote which gives an example:
For example, the JVM can use monomorphic call transformation to convert a virtual method call to a direct method call if no classes currently loaded override that method, but it invalidates the compiled code if a class is subsequently loaded that overrides the method.
The details are very, very, very complex here. So the following is an extremely oversimplified example.
Imagine you have an interface:
interface Adder { int add(int x); }
The method is supposed to add a value to x, and return the result. Now imagine that there is a program that uses an implementation of this interface:
class OneAdder implements Adder {
public int add(int x) {
return x+1;
}
}
class Example {
void run() {
OneAdder a1 = new OneAdder();
int result = compute(a1);
System.out.println(result);
}
private int compute(Adder a) {
int sum = 0;
for (int i=0; i<100; i++) {
sum = a.add(sum);
}
return sum;
}
}
In this example, the JVM could do certain optimizations. A very low-level one is that it could avoid using a vtable for calling the add method, because there is only one implementation of this method in the given program. But it could even go further, and inline this only method, so that the compute method essentially becomes this:
private int compute(Adder a) {
int sum = 0;
for (int i=0; i<100; i++) {
sum += 1;
}
return sum;
}
and in principle, even this
private int compute(Adder a) {
return 100;
}
But the JVM can also load classes at runtime. So there may be a case where this optimization has already been done, and later, the JVM loads a class like this:
class TwoAdder implements Adder {
public int add(int x) {
return x+2;
}
}
Now, the optimization that has been done to the compute method may become "invalid", because it's not clear whether it is called with a OneAdder or a TwoAdder. In this case, the optimization has to be undone.
This should answer point 1 of your question.
Regarding 2.: The JVM keeps track of all the optimizations that have been done, of course. It knows that it has inlined the add method based on the assumption that there is only one implementation of this method. When it finds another implementation of this method, it has to undo the optimization.
Regarding 3.: The optimizations are done when the assumptions are true. When they become untrue, the optimization is undone. So this does not affect the correctness of your program.
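The scenario above can be put into a runnable sketch. You can't observe the deoptimization from plain Java code, but you can verify that correctness is preserved either way; to actually watch compilation and deoptimization events, run with the HotSpot flag -XX:+PrintCompilation. The warm-up loop count is an arbitrary choice to give the JIT a chance to compile compute().

```java
// compute() is first called only with OneAdder (a monomorphic call site the
// JIT may devirtualize and inline), and later with TwoAdder, which forces
// any such speculation to be backed out. The results stay correct.
interface Adder { int add(int x); }

class OneAdder implements Adder {
    public int add(int x) { return x + 1; }
}

class TwoAdder implements Adder {
    public int add(int x) { return x + 2; }
}

class DeoptDemo {
    static int compute(Adder a) {
        int sum = 0;
        for (int i = 0; i < 100; i++) {
            sum = a.add(sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        int last = 0;
        // Warm up: the call site only ever sees OneAdder.
        for (int i = 0; i < 20_000; i++) {
            last = compute(new OneAdder());
        }
        System.out.println(last);                    // 100
        // A second implementation appears: speculation must be undone.
        System.out.println(compute(new TwoAdder())); // 200
    }
}
```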
Update:
Again, the example above was very simplified, referring to the footnote that was given in the book. For further information about the optimization techniques of the JVM, you may refer to https://wiki.openjdk.java.net/display/HotSpot/PerformanceTechniques . Specifically, the speculative (profile-based) techniques can probably be considered to be mostly based on "assumptions" - namely, on assumptions that are made based on the profiling data that has been collected so far.
Taking the quoted text in context, this section of the book is actually talking about the importance of using realistic test data (inputs) when you do performance testing.
Your questions:
What are these JVM assumptions?
I think the text is talking about two things:
On the one hand, it seems to be talking about optimizing based on the measurement of code paths. For example whether the "then" or "else" branch of an if statement is more likely to be executed. This can indeed result in generation of different code and is susceptible to producing sub-optimal code if the initial measurements are incorrect.
On the other hand, it also seems to be talking about optimizations that may turn out to be invalid. For example, at a certain point in time, there may be only one implementation of a given interface method that has been loaded by the JVM. On seeing this, the optimizer may decide to simplify the calling sequence to avoid polymorphic method dispatching. (The term used in the book for this is "monomorphic call transformation".) A bit later, a second implementation may be loaded, causing the optimizer to back out that optimization.
The first of these cases only affects performance.
The second of these would affect correctness (as well as performance) if the optimizer didn't back out the optimization. But the optimizer does do that. So it only affects performance. (The methods containing the affected calls need to be re-optimized, and that affects overall performance.)
How does the JVM know whether the assumptions are true or untrue?
In the first case, it doesn't.
In the second case, the problem is noticed when the JVM loads the second implementation and sees a flag on (say) the interface method that says the optimizer has assumed it is effectively a final method. On seeing this, the loader triggers the "back out" before any damage is done.
If the assumptions are untrue, does it influence the correctness of my data?
No it doesn't. Not in either case.
But the takeaway from the section is that the nature of your test data can influence performance measurements. And it is not simply a matter of size. The test data also needs to cause the application to behave the same way (take similar code paths) as it would behave in "real life".
I know that in this code:
public static void main(String[] args) {myMethod();}
private static Object myMethod() {
Object o = new Object();
return o;
}
the garbage collector will destroy o after the execution of myMethod because the return value of myMethod is not assigned, and therefore there are no references to it. But what if the code is something like:
public static void main(String[] args) {myMethod();}
private static Object myMethod() {
int i = 5;
return i + 10;
}
Will the compiler even bother processing i + 10, seeing as the return value is not assigned?
And if i was not a simple primitive, but a larger object:
public static void main(String[] args) {myMethod();}
private static Object myMethod() {
return new LargeObject();
}
where LargeObject has an expensive constructor, will the compiler still allocate memory and call the constructor, in case it has any side effects?
This would be especially important if the return expression is complex, but has no side effects, such as:
public static void main(String[] args) {
List<Integer> list = new LinkedList<>();
getMiddle(list);
}
private static Object getMiddle(List<Integer> list) {
return list.get(list.size() / 2);
}
Calling this method in real life without using the return value would be fairly pointless, but it's for the sake of example.
My question is: Given these examples (object constructor, operation on primitive, method call with no side effects), can the compiler skip the return statement of a method if it sees that the value won't be assigned to anything?
I know I could come up with many tests for these problems, but I don't know if I would trust them. My understanding of code optimization and GC are fairly basic, but I think I know enough to say that the treatment of specific bits of code aren't necessarily generalizable. This is why I'm asking.
First, let's deal with a misconception that is apparent in your question, and some of the comments.
In a HotSpot (Oracle or OpenJDK) Java platform, there are actually two compilers that have to be considered:
The javac compiler translates Java source code to bytecodes. It does minimal optimization. In fact the only significant optimizations that it does are evaluation of compile-time-constant expressions (which is actually necessary for certain compile-time checks) and re-writing of String concatenation sequences.
You can easily see what optimizations are done ... using javap ... but it can also be misleading, because the heavy-duty optimization has not been done yet. Basically, the javap output is mostly unhelpful when it comes to optimization.
The JIT compiler does the heavy-weight optimization. It is invoked at runtime while your program is running.
It is not invoked immediately. Typically your bytecodes are interpreted for the first few times that any method is called. The JVM is gathering behavioral stats that will be used by the JIT compiler to optimize (!).
So, in your example, the main method is called once and myMethod is called once. The JIT compiler won't even run, so in fact the bytecodes will be interpreted. But that is cool. It would take orders of magnitude more time for the JIT compiler to optimize than the optimization would save.
But supposing the optimizer did run ...
The JIT compiler generally has a couple of strategies:
Within a method, it optimizes based on the information local to the method.
When a method is called, it looks to see if the called method can be inlined at the call site. After the inlining, the code can then be further optimized in its context.
So here's what is likely to happen.
When your myMethod() is optimized as a free-standing method, the unnecessary statements will not be optimized away, because they won't be unnecessary in all possible contexts.
When / if a method call to myMethod() is inlined (e.g. into the main(...) method), the optimizer will then determine that (for example) these statements
int i = 5;
return i + 10;
are unnecessary in this context, and optimize them away.
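The distinction can be sketched in code (my own example): after inlining, a pure computation whose result is ignored can be dropped, but a visible side effect cannot. Whether the elimination actually happens is up to the JIT; the observable behaviour is the same either way.

```java
// An ignored pure result may be eliminated after inlining; an observable
// side effect must survive regardless.
class DeadCodeDemo {
    static int counter = 0;

    static int pure() {
        int i = 5;
        return i + 10;          // no side effects: removable if result is unused
    }

    static int impure() {
        counter++;              // observable side effect: must be kept
        return counter;
    }

    public static void main(String[] args) {
        pure();                 // return value ignored
        impure();               // return value ignored, side effect remains
        System.out.println(counter); // prints 1
    }
}
```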
But bear in mind that JIT compilers are evolving all the time. So predicting exactly what optimizations will occur, and when, is next to impossible. And probably fruitless.
Advice:
It is worthwhile thinking about whether you are doing unnecessary calculations at the "gross" level. Choosing the correct algorithm or data structure is often critical.
At the fine grained level, it is generally not worth it. Let the JIT compiler deal with it.
UNLESS you have clear evidence that you need to optimize (i.e. a benchmark that is objectively too slow), and clear evidence there is a performance bottleneck at a particular point (e.g. profiling results).
Questions like "what will the compiler do?" about Java are a little naïve. First, there are two compilers and an interpreter involved. The static compiler does some simple optimization, like perhaps optimizing any arithmetic expression using effectively-final operands. It certainly compiles constants, literals, and constant expressions into bytecode literals. The real magic happens at runtime.
I see no reason why result calculation would be optimized away except if the return value is ignored. Ignoring a return value is rare and should be rarer.
At runtime much more information is available in context. For optimizations the runtime interpreter plus compiler dynamic duo can account for things like "Is this section of code even worth optimizing?" HotSpot and its ilk won't optimize away the return new Foo(); instantiation if the caller uses the return value. But they will perhaps do it differently, maybe throw the attributes on the stack, or even in registers, circumstances permitting. So while the object exists on the logical Java heap, it could exist elsewhere on the physical JVM components.
Who knows if specific optimizations will happen? No one. But they or something like them, or something even more magical, might happen. Likely the optimizations that HotSpot performs are different from and better than what we expect or imagine, when in its wisdom it decides to take the trouble to optimize.
Oh, and at runtime HotSpot might deoptimize code it previously optimized. This is to maintain the semantics of the Java code.
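What "throw the attributes on the stack, or even in registers" looks like can be sketched as follows (the Point class is mine, for illustration): this is the scalar-replacement result of escape analysis, where a non-escaping object's fields may live in registers with no heap allocation at all.

```java
// If the JIT proves `p` never escapes lengthSquared(), it can keep x and y
// in registers and skip the heap allocation entirely. The object still
// "exists" on the logical Java heap, just not on the physical one.
class ScalarReplacementDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static int lengthSquared(int x, int y) {
        Point p = new Point(x, y); // allocation may be elided by the JIT
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3, 4)); // prints 25
    }
}
```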
Which one would be faster and why?
public void increment() {
synchronized(this){
i++;
}
}
or
public synchronized void increment(){
i++;
}
Would method inlining improve the second option?
The difference is unlikely to matter or be visible, but the second example could be faster. Why? In Oracle/OpenJDK, the less bytecode a method uses, the more it can be optimised, e.g. inlined.
There are a number of thresholds for when a method can be optimised, and these thresholds are based on the number of bytes of bytecode. The second example uses less bytecode, so it is possible it could be optimised more.
One of these optimisations is escape analysis, which can eliminate the use of synchronized for a thread-local object. This would make it much faster. The default threshold for escape analysis is methods which are under 150 bytes of bytecode (after inlining). You could see cases where the first solution makes the method just over 150 bytes and the second just under.
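A sketch of the kind of case escape analysis helps with (method name is mine; whether the locks are actually elided depends on the JVM version and flags):

```java
// sb never escapes join(), so escape analysis can prove it is thread-local
// and the JIT may elide StringBuffer's internal synchronization (every
// StringBuffer method is synchronized).
class LockElisionDemo {
    static String join(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a);   // lock elidable: sb is provably thread-local
        sb.append(b);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join("hello, ", "world"));
    }
}
```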
NOTE:
don't write code on this basis; the difference is too trivial to matter. Code clarity is far, far more important.
if performance does matter, using an alternative like AtomicInteger is likely to be an order of magnitude faster. (consistently, not just in rare cases)
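For illustration, here is what the AtomicInteger alternative looks like next to the synchronized counter (a sketch, not a benchmark; both produce the same result, but AtomicInteger uses a lock-free compare-and-swap instead of a monitor):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Same observable result as the synchronized counter, without a monitor.
class CounterDemo {
    private int i;
    public synchronized void increment() { i++; }
    public synchronized int get() { return i; }

    public static void main(String[] args) {
        CounterDemo sync = new CounterDemo();
        AtomicInteger atomic = new AtomicInteger();
        for (int n = 0; n < 1_000; n++) {
            sync.increment();          // monitor enter/exit each call
            atomic.incrementAndGet();  // single CAS instruction, typically
        }
        System.out.println(sync.get() + " " + atomic.get()); // 1000 1000
    }
}
```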
BTW AFAIK the concurrency libraries use few assert statements, not because the checks aren't cheap when disabled, but because they still count towards the size of the bytecode, and can indirectly slow things down by making the code less optimal in some cases. (This claim of mine should have a reference, but I couldn't find one.)
Since JDK 7 I've been happily using the method it introduced to reject null values which are passed to a method which cannot accept them:
private void someMethod(SomeType pointer, SomeType anotherPointer) {
Objects.requireNonNull(pointer, "pointer cannot be null!");
Objects.requireNonNull(anotherPointer, "anotherPointer cannot be null!");
// Rest of method
}
I think this method makes for very tidy code which is easy to read, and I'm trying to encourage colleagues to use it. But one (particularly knowledgeable) colleague is resistant, and says that the old way is more efficient:
private void someMethod(SomeType pointer, SomeType anotherPointer) {
if (pointer == null) {
throw new NullPointerException("pointer cannot be null!");
}
if (anotherPointer == null) {
throw new NullPointerException("anotherPointer cannot be null!");
}
// Rest of method
}
He says that calling requireNonNull involves placing another method on the JVM call stack and will result in worse performance than a simple == null check.
So my question: is there any evidence of a performance penalty being incurred by using the Objects.requireNonNull methods?
Let's look at the implementation of requireNonNull in Oracle's JDK:
public static <T> T requireNonNull(T obj) {
if (obj == null)
throw new NullPointerException();
return obj;
}
So that's very simple. The JVM (Oracle's, anyway) includes an optimizing two-stage just-in-time compiler to convert bytecode to machine code. It will inline trivial methods like this if it can get better performance that way.
So no, not likely to be slower, not in any meaningful way, not anywhere that would matter.
So my question: is there any evidence of a performance penalty being incurred by using the Objects.requireNonNull methods?
The only evidence that would matter would be performance measurements of your codebase, or of code designed to be highly representative of it. You can test this with any decent performance testing tool, but unless your colleague can point to a real-world example of a performance problem in your codebase related to this method (rather than a synthetic benchmark), I'd tend to assume you and he/she have bigger fish to fry.
As a bit of an aside, I noticed your sample method is a private method. So only code your team is writing calls it directly. In those situations, you might look at whether you have a use case for assertions rather than runtime checks. Assertions have the advantage of not executing in "released" code at all, and thus being faster than either alternative in your question. Obviously there are places you need runtime checks, but those are usually at gatekeeping points, public methods and such. Just FWIW.
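A sketch of that assertion-based variant (the method and its String parameter are illustrative stand-ins for your private method and SomeType): with assertions disabled (the default), the check is simply not executed.

```java
// Assertion-based null check for an internal method: under -da (the
// default) the assert is skipped entirely, so it costs nothing in
// "released" code. Under -ea it fails fast with the given message.
class NullCheckDemo {
    static int someMethod(String pointer) {
        assert pointer != null : "pointer cannot be null!";
        return pointer.length();
    }

    public static void main(String[] args) {
        System.out.println(someMethod("hello")); // prints 5
    }
}
```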
Formally speaking, your colleague is right:
If someMethod() or the corresponding trace is not hot enough, the bytecode is interpreted, and an extra stack frame is created
If someMethod() is called at the 9th level of inlining depth from the hot spot, the requireNonNull() calls won't be inlined, because of the MaxInlineLevel JVM option
If the method is not inlined for any of the above reasons, the argument by T.J. Crowder comes into play, if you use concatenation for producing the error message
Even if requireNonNull() is inlined, the JVM still spends some time and space performing the check.
On the other hand, there is the FreqInlineSize JVM option, which prohibits inlining methods that are too big (in bytecode). A method's bytecode is counted by itself, without counting the size of the methods it calls. Thus, extracting pieces of code into independent methods can sometimes be useful; in the example with requireNonNull(), this extraction has already been done for you.
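The same extraction trick can be applied by hand (a sketch; the names are illustrative): keeping the rarely-taken throw path in its own method keeps the hot method's own bytecode count low, which is the quantity the inlining thresholds look at.

```java
// Extracting the cold throw path into a separate method keeps the hot
// method small, just as requireNonNull() does for the null check.
class ExtractColdPath {
    static int doubled(int[] data, int i) {
        if (i < 0 || i >= data.length) {
            throwOutOfRange(i); // cold path lives elsewhere
        }
        return data[i] * 2;
    }

    private static void throwOutOfRange(int i) {
        // Message construction costs bytecode too, but only here, not in
        // the hot method.
        throw new IllegalArgumentException("index out of range: " + i);
    }

    public static void main(String[] args) {
        System.out.println(doubled(new int[] {1, 2, 3}, 1)); // prints 4
    }
}
```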
If you want evidence ... then the way to get it is to write a micro-benchmark.
(I recommend looking at the Caliper project first! Or JMH ... per Boris's recommendation. Either way, don't try to write a micro-benchmark from scratch. There are too many ways to get it wrong.)
However, you can tell your colleague two things:
The JIT compiler does a good job of inlining small method calls, and it is likely that this will happen in this case.
If it didn't inline the call, the chances are that the difference in performance would only be 3 to 5 instructions, and it is highly unlikely that it would make a significant difference.
Yes, there is evidence that the difference between a manual null check and Objects.requireNonNull() is negligible. OpenJDK committer Aleksey Shipilev created benchmarking code that proves this while fixing JDK-8073479; here are his conclusion and performance numbers:
TL;DR: Fear not, my little friends, use Objects.requireNonNull.
Stop using these obfuscating Object.getClass() checks,
those rely on non-related intrinsic performance, potentially
not available everywhere.
Runs are done on i5-4210U, 1.7 GHz, Linux x86_64, JDK 8u40 EA.
The explanations are derived from studying the generated code
("-prof perfasm" is your friend here), the disassembly is skipped
for brevity.
Out of box, C2 compiled:
Benchmark Mode Cnt Score Error Units
NullChecks.branch avgt 25 0.588 ± 0.015 ns/op
NullChecks.objectGetClass avgt 25 0.594 ± 0.009 ns/op
NullChecks.objectsNonNull avgt 25 0.598 ± 0.014 ns/op
Object.getClass() is intrinsified.
Objects.requireNonNull is perfectly inlined.
where branch, objectGetClass and objectsNonNull are defined as follows:
@Benchmark
public void objectGetClass() {
o.getClass();
}
@Benchmark
public void objectsNonNull() {
Objects.requireNonNull(o);
}
@Benchmark
public void branch() {
if (o == null) {
throw new NullPointerException();
}
}
Your colleague is most likely wrong.
The JVM is very intelligent and will most likely inline the Objects.requireNonNull(...) method. Any performance difference is questionable; there are definitely much more significant optimizations to worry about than this one.
You should use the utility method from JDK.
Effective Java by Joshua Bloch
Item 67: Optimize judiciously
There are three aphorisms concerning optimization that everyone should know:
More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason—including blind stupidity.
—William A. Wulf [Wulf72]
We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil.
—Donald E. Knuth [Knuth74]
We follow two rules in the matter of optimization:
Rule 1. Don’t do it.
Rule 2 (for experts only). Don’t do it yet—that is, not until you
have a perfectly clear and unoptimized solution.
—M. A. Jackson [Jackson75]
Meh, No. But, yes.
No, the direct code is always better since the method stack does not need to be touched.
Yes, if the VM implementation has null-check skips or some optimized null-checks.
Meh, the method stack is so lightweight to modify and update that it hardly matters (yet it will consume some time).
As a general rule, readability and maintainability should trump optimization.
This rule safeguards against speculative optimization from people who think they know how a compiler works even though they have never even attempted to write one and they have never had a look inside one.
Your colleague is wrong unless they prove that the performance penalty is noticeable and untenable for users.
Objects.requireNonNull is also the better choice for code reuse.
In Oracle's JDK, requireNonNull is defined as:
public static <T> T requireNonNull(T obj) {
if (obj == null)
throw new NullPointerException();
return obj;
}
so it is the very same null check, already compiled to bytecode.
Consider the following method in Java:
public static boolean expensiveComputation() {
for (int i = 0; i < Integer.MAX_VALUE; ++i);
return false;
}
And the following main method:
public static void main(String[] args) {
boolean b = false;
if (expensiveComputation() && b) {
}
}
Logical conjunction (&&) is a commutative operation. So why doesn't the compiler optimize the if-statement code to the equivalent:
if (b && expensiveComputation()) {
}
which has the benefits of using short-circuit evaluation?
Moreover, does the compiler try to make other logic simplifications or permutations of booleans in order to generate faster code? If not, why? Surely some optimizations would be very difficult, but isn't my example simple? Calling a method should always be slower than reading a boolean, right?
Thank you in advance.
It doesn't do that because expensiveComputation() may have side effects which change the state of the program. This means that the order in which the expressions in the boolean statements are evaluated (expensiveComputation() and b) matters. You wouldn't want the compiler optimizing a bug into your compiled program, would you?
For example, what if the code was like this
public static boolean expensiveComputation() {
for (int i = 0; i < Integer.MAX_VALUE; ++i);
b = false;
return false;
}
public static boolean b = true;
public static void main(String[] args) {
if (expensiveComputation() || b) {
// do stuff
}
}
Here, if the compiler performed your optimization, then the //do stuff would run when you wouldn't expect it to by looking at the code (because the b, which is originally true, is evaluated first).
Because expensiveComputation() may have side-effects.
Since Java doesn't aim to be a functionally pure language, it doesn't inhibit programmers from writing methods that have side-effects. Thus there probably isn't a lot of value in the compiler analyzing for functional purity. And then, optimizations like you posit are unlikely to be very valuable in practice, as expensiveComputation() would usually be required to execute anyway, to get the side effects.
Of course, for a programmer, it's easy to put the b first if they expect it to be false and explicitly want to avoid the expensive computation.
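The hand-reordering looks like this (a sketch of the questioner's example, with a call counter added to make the skip observable):

```java
// The programmer can reorder the operands when they know there are no side
// effects to preserve; && then short-circuits past the expensive call.
class ShortCircuitDemo {
    static int calls = 0;

    static boolean expensiveComputation() {
        calls++;
        for (int i = 0; i < 1_000_000; ++i) { } // stand-in for real work
        return false;
    }

    public static void main(String[] args) {
        boolean b = false;
        if (b && expensiveComputation()) {
            // never reached
        }
        System.out.println(calls); // prints 0: the expensive call was skipped
    }
}
```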
Actually, some compilers can optimise programs like the one you suggested; they just have to make sure that the function has no side effects. GCC has a function attribute you can annotate a function with to declare that it has no side effects, which the compiler may then use when optimizing. Java may have something similar.
A classic example is
for(ii = 0; strlen(s) > ii; ii++) < do something >
which gets optimized to
n = strlen(s); for(ii = 0; n > ii; ii++) < do something >
by GCC with optimization level 2, at least on my machine.
The compiler will optimize this if you run the code often enough, probably by inlining the method and simplifying the resulting boolean expression (but most likely not by reordering the arguments of &&).
You can benchmark this by timing a loop of say a million iterations of this code repeatedly. The first iteration or two are much slower than the following.
The version of Java I am using short-circuits on a in the expression a && b, but not on b.
I.e. if a is false, b does not get evaluated; but if b was false, it did not skip evaluating a.
I found this out when I was implementing validation in a website form: I created messages to display on the web-page in a series of boolean methods.
I expected all of the incorrectly entered fields on the page to become highlighted but, because of Java's short-circuit evaluation, the code was only executed until the first incorrect field was discovered. After that, Java must have thought something like "false && anything is always false" and skipped the remaining validation methods.
I suppose, as a direct answer to your question: without optimisations like this, your program may run slower than it could; but if the compiler made them, someone else's program would completely break, because they assumed the non-optimised behaviour, like the side-effect issue mentioned in other answers.
Unfortunately, it's difficult to automate intelligent decisions like this, especially with imperative languages (C, C++, Java, Python, ... i.e. the normal languages).