In section 12.3.3, "Unrealistic Sampling of Code Paths", the book Java Concurrency in Practice says:
In some cases, the JVM
may make optimizations based on assumptions that may only be true temporarily, and later back them out by invalidating the compiled code if they become untrue
I cannot understand the above statement.
What are these JVM assumptions?
How does the JVM know whether the assumptions are true or untrue?
If the assumptions are untrue, does it influence the correctness of my data?
The statement that you quoted has a footnote which gives an example:
For example, the JVM can use monomorphic call transformation to convert a virtual method call to a direct method call if no classes currently loaded override that method, but it invalidates the compiled code if a class is subsequently loaded that overrides the method.
The details here are very, very, very complex, so the following is an extremely oversimplified example.
Imagine you have an interface:
interface Adder { int add(int x); }
The method is supposed to add a value to x and return the result. Now imagine that there is a program that uses an implementation of this interface:
class OneAdder implements Adder {
    @Override
    public int add(int x) {
        return x + 1;
    }
}
class Example {
    void run() {
        OneAdder a1 = new OneAdder();
        int result = compute(a1);
        System.out.println(result);
    }

    private int compute(Adder a) {
        int sum = 0;
        for (int i = 0; i < 100; i++) {
            sum = a.add(sum);
        }
        return sum;
    }
}
In this example, the JVM could do certain optimizations. A very low-level one is that it could avoid using a vtable for calling the add method, because there is only one implementation of this method in the given program. But it could even go further, and inline this only method, so that the compute method essentially becomes this:
private int compute(Adder a) {
    int sum = 0;
    for (int i = 0; i < 100; i++) {
        sum += 1;
    }
    return sum;
}
and in principle, even this
private int compute(Adder a) {
    return 100;
}
But the JVM can also load classes at runtime. So there may be a case where this optimization has already been done, and later, the JVM loads a class like this:
class TwoAdder implements Adder {
    @Override
    public int add(int x) {
        return x + 2;
    }
}
Now, the optimization that has been done to the compute method may become "invalid", because it's not clear whether it is called with a OneAdder or a TwoAdder. In this case, the optimization has to be undone.
This should answer question 1.
Regarding 2.: The JVM keeps track of all the optimizations that have been done, of course. It knows that it has inlined the add method based on the assumption that there is only one implementation of this method. When it finds another implementation of this method, it has to undo the optimization.
Regarding 3.: The optimizations are done when the assumptions are true. When they become untrue, the optimization is undone. So this does not affect the correctness of your program.
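If you want to see this happen on a real JVM, here is a rough, hand-rolled sketch (my own illustration, not from the book): warm up the call site with one implementation, then bring in a second one, and run it with HotSpot's diagnostic flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintCompilation to watch the speculative inlining and the later deoptimization. The class and method names are invented for this demo.
public class DeoptDemo {

    interface Adder { int add(int x); }
    static class OneAdder implements Adder { public int add(int x) { return x + 1; } }
    static class TwoAdder implements Adder { public int add(int x) { return x + 2; } }

    private static int compute(Adder a) {
        int sum = 0;
        for (int i = 0; i < 100; i++) {
            sum = a.add(sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        Adder one = new OneAdder();
        // Warm-up: only OneAdder is ever seen here, so the JIT may
        // speculatively devirtualize and inline the add() call.
        for (int i = 0; i < 1_000_000; i++) {
            total += compute(one);
        }
        // A second implementation appears; the optimistic compilation of
        // compute() may now be invalidated ("made not entrant") and redone.
        Adder two = new TwoAdder();
        for (int i = 0; i < 1_000_000; i++) {
            total += compute(two);
        }
        System.out.println(total);
    }
}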
Update:
Again, the example above was very simplified, referring to the footnote that was given in the book. For further information about the optimization techniques of the JVM, you may refer to https://wiki.openjdk.java.net/display/HotSpot/PerformanceTechniques . Specifically, the speculative (profile-based) techniques can probably be considered to be mostly based on "assumptions" - namely, on assumptions that are made based on the profiling data that has been collected so far.
Taking the quoted text in context, this section of the book is actually talking about the importance of using realistic test data (inputs) when you do performance testing.
Your questions:
What are these JVM assumptions?
I think the text is talking about two things:
On the one hand, it seems to be talking about optimizing based on the measurement of code paths. For example whether the "then" or "else" branch of an if statement is more likely to be executed. This can indeed result in generation of different code and is susceptible to producing sub-optimal code if the initial measurements are incorrect.
On the other hand, it also seems to be talking about optimizations that may turn out to be invalid. For example, at a certain point in time, there may be only one implementation of a given interface method that has been loaded by the JVM. On seeing this, the optimizer may decide to simplify the calling sequence to avoid polymorphic method dispatching. (The term used in the book for this is "monomorphic call transformation".) A bit later, a second implementation may be loaded, causing the optimizer to back out that optimization.
The first of these cases only affects performance.
The second of these would affect correctness (as well as performance) if the optimizer didn't back out the optimization. But the optimizer does do that. So it only affects performance. (The methods containing the affected calls need to be re-optimized, and that affects overall performance.)
How does the JVM know whether the assumptions are true or untrue?
In the first case, it doesn't.
In the second case, the problem is noticed when the JVM loads the second implementation and sees a flag on (say) the interface method that says the optimizer has assumed it is effectively a final method. On seeing this, the loader triggers the "back out" before any damage is done.
If the assumptions are untrue, does it influence the correctness of my data?
No it doesn't. Not in either case.
But the takeaway from the section is that the nature of your test data can influence performance measurements. And it is not simply a matter of size. The test data also needs to cause the application to behave the same way (take similar code paths) as it would behave in "real life".
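To make that concrete, here is a rough sketch (my own illustration, not from the book or the original answer) of how you could compare an unrealistically uniform input against a mixed one with the JMH benchmarking harness; all class and parameter names are invented.
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class CallSiteShapeBenchmark {

    interface Op { int apply(int x); }
    static final class Inc implements Op { public int apply(int x) { return x + 1; } }
    static final class Dec implements Op { public int apply(int x) { return x - 1; } }
    static final class Dbl implements Op { public int apply(int x) { return x * 2; } }

    // "mono" feeds the call site a single implementation (unrealistically
    // uniform test data); "mixed" feeds it several, as a real workload might.
    @Param({"mono", "mixed"})
    String shape;

    Op[] ops;

    @Setup
    public void setup() {
        ops = shape.equals("mono")
                ? new Op[] { new Inc(), new Inc(), new Inc() }
                : new Op[] { new Inc(), new Dec(), new Dbl() };
    }

    @Benchmark
    public int run() {
        int sum = 0;
        for (Op op : ops) {
            sum = op.apply(sum);
        }
        return sum;
    }
}
On a typical HotSpot JVM you would expect the "mixed" variant to be somewhat slower once the call site stops being monomorphic, which is exactly the kind of effect unrealistic test data can hide.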
Related
I know that in this code:
public static void main(String[] args) { myMethod(); }

private static Object myMethod() {
    Object o = new Object();
    return o;
}
the garbage collector will destroy o after the execution of myMethod because the return value of myMethod is not assigned, and therefore there are no references to it. But what if the code is something like:
public static void main(String[] args) { myMethod(); }

private static Object myMethod() {
    int i = 5;
    return i + 10;
}
Will the compiler even bother processing i + 10, seeing as the return value is not assigned?
And if i was not a simple primitive, but a larger object:
public static void main(String[] args) { myMethod(); }

private static Object myMethod() {
    return new LargeObject();
}
where LargeObject has an expensive constructor, will the compiler still allocate memory and call the constructor, in case it has any side effects?
This would be especially important if the return expression is complex, but has no side effects, such as:
public static void main(String[] args) {
    List<Integer> list = new LinkedList<>();
    getMiddle(list);
}

private static Object getMiddle(List<Integer> list) {
    return list.get(list.size() / 2);
}
Calling this method in real life without using the return value would be fairly pointless, but it's for the sake of example.
My question is: Given these examples (object constructor, operation on primitive, method call with no side effects), can the compiler skip the return statement of a method if it sees that the value won't be assigned to anything?
I know I could come up with many tests for these problems, but I don't know if I would trust them. My understanding of code optimization and GC is fairly basic, but I think I know enough to say that the treatment of specific bits of code isn't necessarily generalizable. This is why I'm asking.
First, let's deal with a misconception that is apparent in your question, and in some of the comments.
In a HotSpot (Oracle or OpenJDK) Java platform, there are actually two compilers that have to be considered:
The javac compiler translates Java source code to bytecodes. It does minimal optimization. In fact, the only significant optimizations that it does are evaluation of compile-time constant expressions (which is actually necessary for certain compile-time checks) and rewriting of String concatenation sequences.
You can easily see what optimizations are done ... using javap ... but it is also misleading, because the heavy-duty optimization has not been done yet. Basically, the javap output is mostly unhelpful when it comes to optimization.
The JIT compiler does the heavy-weight optimization. It is invoked at runtime while your program is running.
It is not invoked immediately. Typically your bytecodes are interpreted for the first few times that any method is called. The JVM is gathering behavioral stats that will be used by the JIT compiler to optimize (!).
So, in your example, the main method is called once and myMethod is called once. The JIT compiler won't even run, so in fact the bytecodes will be interpreted. But that is cool. It would take orders of magnitude more time for the JIT compiler to optimize the method than you would save by running the optimized code.
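As an aside, if you want to check when (or whether) a method actually gets JIT-compiled, HotSpot can tell you. A tiny, made-up demo (the exact output format varies between JVM versions):
// Run with: java -XX:+PrintCompilation CompileWatch
// Once the loop has run enough times, a line mentioning CompileWatch::hot
// should appear, showing that the method has been JIT-compiled. The
// single-shot myMethod() from the question would typically never show up.
public class CompileWatch {

    static int hot(int x) {
        return x + 10;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += hot(i);
        }
        System.out.println(sum);
    }
}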
But supposing the optimizer did run ...
The JIT compiler generally has a couple of strategies:
Within a method, it optimizes based on the information local to the method.
When a method is called, it looks to see if the called method can be inlined at the call site. After the inlining, the code can then be further optimized in its context.
So here's what is likely to happen.
When your myMethod() is optimized as a free-standing method, the unnecessary statements will not be optimized away, because they won't be unnecessary in all possible contexts.
When / if a call to myMethod() is inlined (e.g. into the main(...) method), the optimizer will then determine that (for example) these statements
int i = 5;
return i + 10;
are unnecessary in this context, and optimize them away.
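Conceptually (this is just an illustration, not literal JIT output), after inlining and dead-code elimination the caller might effectively reduce to:
public static void main(String[] args) {
    // int i = 5;          // inlined body of myMethod()
    // int tmp = i + 10;   // result is never used...
    //                     // ...so the whole computation is eliminated.
}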
But bear in mind that JIT compilers are evolving all the time. So predicting exactly what optimizations will occur, and when, is next to impossible. And probably fruitless.
Advice:
It is worthwhile thinking about whether you are doing unnecessary calculations at the "gross" level. Choosing the correct algorithm or data structure is often critical.
At the fine grained level, it is generally not worth it. Let the JIT compiler deal with it.
UNLESS you have clear evidence that you need to optimize (i.e. a benchmark that is objectively too slow), and clear evidence there is a performance bottleneck at a particular point (e.g. profiling results).
Questions like "what will the compiler do?" about Java are a little naïve. First, there are two compilers and an interpreter involved. The static compiler does some simple optimization, like perhaps optimizing any arithmetic expression using effectively final operands. It certainly compiles constants, literals, and constant expressions into bytecode literals. The real magic happens at runtime.
I see no reason why result calculation would be optimized away except if the return value is ignored. Ignoring a return value is rare and should be rarer.
At runtime much more information is available in context. For optimizations the runtime interpreter plus compiler dynamic duo can account for things like "Is this section of code even worth optimizing?" HotSpot and its ilk won't optimize away the return new Foo(); instantiation if the caller uses the return value. But they will perhaps do it differently, maybe throw the attributes on the stack, or even in registers, circumstances permitting. So while the object exists on the logical Java heap, it could exist elsewhere on the physical JVM components.
Who knows if specific optimizations will happen? No one. But they or something like them, or something even more magical, might happen. Likely the optimizations that HotSpot performs are different from and better than what we expect or imagine, when in its wisdom it decides to take the trouble to optimize.
Oh, and at runtime HotSpot might deoptimize code it previously optimized. This is to maintain the semantics of the Java code.
Let's say I have the following code:
private Rule getRuleFromResult(Fact result) {
    Rule output = null;
    for (int i = 0; i < rules.size(); i++) {
        if (rules.get(i).getRuleSize() == 1) { output = rules.get(i); return output; }
        if (rules.get(i).getResultFact().getFactName().equals(result.getFactName())) output = rules.get(i);
    }
    return output;
}
Is it better to leave it as it is or to change it as follows:
private Rule getRuleFromResult(Fact result) {
    Rule output = null;
    Rule current = null;
    for (int i = 0; i < rules.size(); i++) {
        current = rules.get(i);
        if (current.getRuleSize() == 1) { return current; }
        if (current.getResultFact().getFactName().equals(result.getFactName())) output = current;
    }
    return output;
}
When executing, the program goes through rules.get(i) each time as if it were the first time, and I think that in a much more complex case (say, as in the second if) this takes more time and slows down execution. Am I right?
Edit: To answer a few comments at once: I know that in this particular example the time gain will be tiny, but it was just to get the general idea across. I've noticed I tend to have very long chains like object.get().set().change().compareTo()... and many of them repeat. Over the whole codebase that time gain could be significant.
Your instinct is correct--saving intermediate results in a variable rather than re-invoking a method multiple times is faster. Often the performance difference will be too small to measure, but there's an even better reason to do this--clarity. By saving the value into a variable, you make it clear that you are intending to use the same value everywhere; if you re-invoke the method multiple times, it's unclear if you are doing so because you are expecting it to return different results on different invocations. (For instance, list.size() will return a different result if you've added items to list in between calls.) Additionally, using an intermediate variable gives you an opportunity to name the value, which can make the intention of the code clearer.
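As a purely illustrative sketch of that last point (the names are invented):
import java.util.List;

class LoopStyles {

    // Re-invoking size() and get(i) leaves the reader wondering whether the
    // values are expected to change between calls.
    static int sumRepeatedCalls(List<Integer> values) {
        int sum = 0;
        for (int i = 0; i < values.size(); i++) {
            sum += values.get(i);
        }
        return sum;
    }

    // Naming the intermediate values documents the intent: the size is fixed
    // and each element is read exactly once.
    static int sumNamedValues(List<Integer> values) {
        int sum = 0;
        int count = values.size();
        for (int i = 0; i < count; i++) {
            int value = values.get(i);
            sum += value;
        }
        return sum;
    }
}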
The only difference between the two versions is that in the first one you may call rules.get(i) more than once for the same i (whenever the rule size is not one).
So the second version is a little bit faster in general, but you will not feel any difference if the list is not big.
It depends on the type of data structure that the rules object is. If it is a linked list, then yes, the second one is much faster, because rules.get(i) has to walk the list each time. If it is a data structure with constant-time access (like an array-backed list), then it is about the same.
In general, yes, it's probably a tiny bit faster (nanoseconds, I guess) when called the first time. Later on it will probably be improved by the JIT compiler either way.
But what you are doing is so-called premature optimization. Usually you should not think about things that only provide an insignificant performance improvement.
What is more important is readability, so that you can maintain the code later on.
You could even do more premature optimization, like saving the length in a local variable, which the for-each loop effectively does internally. But again, in 99% of cases it doesn't make sense to do it.
While writing code, I sometimes run into the situation where I have to choose between creating a separate wrapper method (the advantage being that I can use my own shorter call later) or calling the more complex method that already exists each time (also fewer lines of code).
Here are the examples using different programming languages (Objective-C and Java) to explain the question.
Objective-C example:
- (double)maxValueFinder:(NSMutableArray *)data {
    double max = [[data valueForKeyPath:@"@max.intValue"] doubleValue];
    return max;
}
then later:
...
double max = [self maxValueFinder:data];
...
or just every time try to call:
...
double max = [[data valueForKeyPath:@"@max.intValue"] doubleValue];
...
Java example:
public static double maxFinder (ArrayList<Double> data) {
double maxValue = Collections.max(data);
return maxValue;
}
then later:
...
double max = maxFinder(data);
...
or just every time try to call:
...
double max = Collections.max(data);
...
or more complex case to make the point of my question sharper:
//using jsoup
public static Element getElement(Document content){
Element link = content.getElementsByTag("a").first();
return link;
}
or every time:
...
Element link = content.getElementsByTag("a").first();
...
Which approach costs fewer resources (performance, memory), or is it the same?
It absolutely doesn't matter. At least in your Java case you're uselessly recreating existing functionality, which is ridiculous.
You should first see if the functionality is contained in the standard library, then see if existing well known libraries have it, and only after that should you consider writing implementations yourself (especially for more complex functionality).
Performance has nothing to do with your question, except in the sense that the more time you spend on recreating existing functionality, the less time you have left for actual new code (therefore lowering your programming performance).
As for creating wrapper methods, that can be useful in some cases, especially if the actual method calls are often chained and you find yourself having more and more of those in the code. But there's a delicate difference between code clarity and writing excessive code.
public void parseHtml() {
parseFirstPart();
parseSecondPart();
parseThirdPart();
}
If we assume that each parse method only contains 1 or maybe 2 method calls, then adding these additional methods is most likely useless, since the same thing can be achieved with proper commenting. If the parse methods contain a lot of calls, it makes sense to extract methods out of them. There's no rule about it; it's a skill you learn while you program (and of course it depends a lot on what you view as beautiful code).
It's absolutely useless to recreate existing functionality, because the function is already implemented in the library.
If we are talking about performance, then in both cases you end up executing the same line:
double maxValue = Collections.max(data);
Performance does not matter here, because both cases run the same code.
Normally, Java optimizes virtual calls based on the number of implementations encountered at a given call site. This can be easily seen in the results of my benchmark, when you look at myCode, which is a trivial method returning a stored int. There's a trivial
static abstract class Base {
    abstract int myCode();
}
with a couple of identical implementations like
static class A extends Base {
    @Override int myCode() {
        return n;
    }

    @Override public int hashCode() {
        return n;
    }

    private final int n = nextInt();
}
With an increasing number of implementations, the timing of the method call grows from 0.4 ns through 1.2 ns for two implementations to 11.6 ns, and then grows slowly. When the JVM has seen multiple implementations, i.e., with preload=true, the timings differ slightly (because of an instanceof test needed).
So far it's all clear, however, the hashCode behaves rather differently. Especially, it's 8-10 times slower in three cases. Any idea why?
UPDATE
I was curious whether the poor hashCode performance could be helped by dispatching manually, and it can, by a lot.
A couple of branches did the job perfectly:
if (o instanceof A) {
    result += ((A) o).hashCode();
} else if (o instanceof B) {
    result += ((B) o).hashCode();
} else if (o instanceof C) {
    result += ((C) o).hashCode();
} else if (o instanceof D) {
    result += ((D) o).hashCode();
} else { // Actually impossible, but let's play it safe.
    result += o.hashCode();
}
Note that the compiler avoids such optimizations for more than two implementations, as most method calls are much more expensive than a simple field load and the gain would be small compared to the code bloat.
The original question "Why doesn't the JIT optimize hashCode like other methods?" remains, and hashCode2 proves that it indeed could.
UPDATE 2
It looks like bestsss is right, at least with this note
calling hashCode() of any class extending Base is the same as calling Object.hashCode(), and this is how it compiles in the bytecode; if you add an explicit hashCode in Base, that would limit the potential call targets when invoking Base.hashCode().
I'm not completely sure about what's going on, but declaring Base.hashCode() makes a hashCode competitive again.
UPDATE 3
OK, providing a concrete implementation of Base#hashCode helps, however, the JIT must know that it never gets called, as all subclasses defined their own (unless another subclass gets loaded, which can lead to a deoptimization, but this is nothing new for the JIT).
So it looks like a missed optimization chance #1.
Providing an abstract declaration of Base#hashCode works the same. This makes sense, as it ensures that no further lookup is needed, since each subclass must provide its own (they can't simply inherit it from their grandparent).
Still, for more than two implementations, myCode is so much faster that the compiler must be doing something suboptimal. Maybe a missed optimization chance #2?
hashCode is already defined in java.lang.Object, so declaring it in your own class doesn't do much by itself (it is still a defined method, but that alone makes no difference).
The JIT has several ways to optimize a call site (in this case hashCode()):
no overrides - static call (no virtual at all) - best case scenario with full optimizations
two types (a bimorphic call site) - ByteBuffer for instance: exact type check and then static dispatch. The type check is very simple, but depending on the usage it may or may not be predicted by the hardware.
inline caches - when a few different classes have been seen in the caller body, it's possible to keep them inlined too - that is, some methods might be inlined, and some may be called via virtual tables. The inline budget is not very high. This is exactly the case in the question - a different method not named hashCode() would get the inline caches, as there are only four implementations, instead of the v-table.
Adding more classes going through that caller body results in a real virtual call, as the compiler gives up.
The virtual calls are not inlined and require an indirection through the table of virtual methods and a virtually guaranteed cache miss. The lack of inlining also requires full function stubs with parameters passed through the stack. Overall, the real performance killer is the inability to inline and apply further optimizations.
Please note: calling hashCode() of any class extending Base is the same as calling Object.hashCode(), and this is how it compiles in the bytecode; if you add an explicit hashCode in Base, that would limit the potential call targets when invoking Base.hashCode().
Way too many classes (in the JDK itself) override hashCode(), so in cases of non-inlined HashMap-like structures the invocation is performed via the vtable - i.e. slowly.
As an extra bonus: while loading new classes, the JIT has to deoptimize existing call sites.
I may try to look up some sources, if anyone is interested in further reading.
This is a known performance issue:
https://bugs.openjdk.java.net/browse/JDK-8014447
It has been fixed in JDK 8.
I can confirm the findings. See these results (recompilations omitted):
$ /extra/JDK8u5/jdk1.8.0_05/bin/java Main
overCode : 14.135000000s
hashCode : 14.097000000s
$ /extra/JDK7u21/jdk1.7.0_21/bin/java Main
overCode : 14.282000000s
hashCode : 54.210000000s
$ /extra/JDK6u23/jdk1.6.0_23/bin/java Main
overCode : 14.415000000s
hashCode : 104.746000000s
The results are obtained by calling methods of class SubA extends Base repeatedly.
Method overCode() is identical to hashCode(), both of which just return an int field.
Now, the interesting part: If the following method is added to class Base
@Override
public int hashCode() {
    return super.hashCode();
}
execution times for hashCode aren't different from those for overCode any more.
Base.java:
public class Base {
    private int code;

    public Base(int x) {
        code = x;
    }

    public int overCode() {
        return code;
    }
}
SubA.java:
public class SubA extends Base {
    private int code;

    public SubA(int x) {
        super(2 * x);
        code = x;
    }

    @Override
    public int overCode() {
        return code;
    }

    @Override
    public int hashCode() {
        return super.hashCode();
    }
}
I was looking at your invariants for your test. It has scenario.vmSpec.options.hashCode set to 0. According to this slideshow (slide 37) that means Object.hashCode will use a random number generator. That might be why the JIT compiler is less interested in optimising calls to hashCode as it considers it likely it may have to resort to an expensive method call, which would offset any performance gains from avoiding a vtable lookup.
This may also be why setting Base to have its own hash code method improves performance as it prevents the possibility of falling through to Object.hashCode.
http://www.slideshare.net/DmitriyDumanskiy/jvm-performance-options-how-it-works
The semantics of hashCode() are more complex than regular methods, so the JVM and the JIT compiler must do more work when you call hashCode() than when you call a regular virtual method.
One specificity has a negative impact on performance: calling hashCode() on a null reference is valid and returns zero. This requires one more branch than a regular call, which in itself can explain the performance difference you have observed.
Note that this seems to be true only from Java 7 onwards, due to the introduction of Objects.hashCode(target), which has this semantic. It would be interesting to know on which version you tested this issue, and whether you would see the same on Java 6, for instance.
Another specificity has a positive impact on performance: if you do not provide your own hashCode() implementation, the JIT compiler will use an inlined hash code computation which is faster than a regular compiled Object.hashCode call.
E.
I'm extending and improving a Java application which also does long-running searches with a small DSL (in detail, it is used for model finding; yes, it's in general NP-complete).
During this search I want to show a small progress bar on the console. Because of the generic structure of the DSL I cannot calculate the overall search space size. Therefore I can only output the progress of the first "backtracking" statement.
Now the question:
I can use a flag for each backtracking statement to indicate that this statement should report the progress. When evaluating the statement I can check the flag with an if-statement:
public class EvalStatement {
    boolean reportProgress;

    public EvalStatement(boolean report) {
        reportProgress = report;
    }

    public void evaluate() {
        int progress = 0;
        while (someCondition) {
            // do something
            // maybe call other statement (tree structure)
            if (reportProgress) {
                // This is only executed by the root node, i.e.,
                // the condition is only true about 30 times, whereas
                // it is false millions or billions of times
                ++progress;
                reportProgress(progress);
            }
        }
    }
}
I can also use two different classes:
A class which does nothing
A subclass that is doing the output
This would look like this:
public class EvalStatement {
    private ProgressWriter out;

    public EvalStatement(boolean report) {
        if (report)
            out = new ProgressWriterOut();
        else
            out = ProgressWriter.instance;
    }

    public void evaluate() {
        while (someCondition) {
            // do something
            // maybe call other statement (tree structure)
            out.reportProgress();
        }
    }
}
public class ProgressWriter {
    public static ProgressWriter instance = new ProgressWriter();

    public void reportProgress() {}
}
public class ProgressWriterOut extends ProgressWriter {
    private int progress = 0;

    @Override
    public void reportProgress() {
        // This is only executed by the root node, i.e.,
        // it happens only about 30 times, whereas the no-op version
        // is called millions or billions of times
        ++progress;
        // Put the progress anywhere, e.g.,
        System.out.print('#');
    }
}
And now really the question(s):
Is the Java lookup of the method to call faster than the if statement?
In addition, would an interface and two independent classes be faster?
I know Log4J recommends putting an if-statement around log calls, but I think the main reason for that is the construction of the parameters, especially strings. I have only primitive types.
EDIT:
I clarified the code a little bit (what is called often... the usage of the singleton is irrelevant here).
Further, I made two long runs of the search in which the if-statement (respectively the overridden call) was hit 1,840,306,311 times, on a machine doing nothing else:
The if version took 10 h 6 min 13 s (50,343 "hits" per second)
The override version took 10 h 9 min 15 s (50,595 "hits" per second)
I would say this does not give a definitive answer, because the 0.5% difference is within the measuring tolerance.
My conclusion: they more or less behave the same, but the overriding approach could be faster in the long term, as guessed by Kane in the answers.
I think this is the textbook definition of over-optimization. You're not really even sure you have a performance problem. Unless you're making millions of calls across that section, it won't even show up in your hotspot reports if you profile it. If statements and method calls take on the order of nanoseconds to execute. So for there to be a difference between them, you are talking about saving 1-10 ns at most. For that to even be perceived by a human as slow, it needs to be on the order of 100 milliseconds, and that's if the user is even paying attention, like actively clicking, etc. If they're watching a progress bar they aren't even going to notice it.
Say we wanted to see if that added even 1 s of extra time, and you found one of those choices could save 10 ns (it's probably more like a savings of 1-4 ns). That would mean you'd need that section to be called 100,000,000 times in order to save 1 s. And I can guarantee that if you have 100 million calls being made, you'll find 10 other areas that are more expensive than the choice of if versus polymorphism there. Seems sort of silly to debate the merits of 10 ns on the off chance you might save 1 s, doesn't it?
I'd be more concerned about your usage of a singleton than performance.
I wouldn't worry about this - the cost is very small, output to the screen or computation would be much slower.
The only way to really answer this question is to try both and profile the code under normal circumstances. There are lots of variables.
That said, if I had to guess, I would say the following:
In general, an if statement compiles down to less bytecode than a method call, but with the JIT compiler optimizing, your method call may get inlined, leaving no call at all. Also, with branch prediction of the if-statement, the cost is minimal.
Again, in general, using the interfaces will be faster than testing if you should report every time the loop is run. Over the long run, the cost of loading two classes, testing once, and instantiating one, is going to be less than running a particular test eleventy bajillion times. Over the long term.
Again, the better way to do this would be to profile the code on real world examples both ways, maybe even report back your results. However, I have a hard time seeing this being the performance bottleneck for your application... your time is probably better spent optimizing elsewhere if speed is a concern.
Putting anything on the monitor is orders of magnitude slower than either choice. If you really got a performance problem there (which I doubt) you'd need to reduce the number of calls to print.
I would assume that the method lookup is faster than evaluating if(). In fact, the version with the if also needs a method lookup.
And if you really want to squeeze out every bit of performance, use private final methods in your ProgressWriters, as this can allow the JVM to inline the method, so there would be no method lookup, and not even a method call in the machine code derived from the bytecode once it is finally compiled.
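For example, a sketch of that suggestion (my own illustration; since the writer method here has to be overridable in the base class, the closest equivalent to "private final" is making the subclass and its override final, and whether the JIT actually inlines the call still depends on what it has profiled):
// Sketch only: marking the subclass final tells the JIT that
// ProgressWriterOut.reportProgress() can have no further overrides, which
// makes it a safe target for inlining once the receiver type is known.
public final class ProgressWriterOut extends ProgressWriter {
    private int progress = 0;

    @Override
    public void reportProgress() {
        ++progress;
        System.out.print('#');
    }
}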
But, probably, they are both rather close in performance. I would suggest to test/profile, and then concentrate on the real performance issues.