Implicit StringBuilder size - java

I almost always use StringBuilder(int) to avoid silly reallocations with the tiny default capacity. Does any javac implementation do anything special to use an appropriate capacity for implicit uses of StringBuilder in concatenation? Would "abcdefghijklmnopqrstuvwxyz" + numbers + "!.?:" use at least an initial capacity of 30 given the literal lengths?

Yes, it is going to be that smart -- smarter, actually -- in Java 9. (This is something I can loosely claim I was a part of -- I made the original suggestion to presize StringBuilders, and that inspired the better, more powerful approach they ended up with.)
http://openjdk.java.net/jeps/280 specifically mentions, among other issues, that "sometimes we need to presize the StringBuilder appropriately," and that's a specific improvement being made.
(And the moral of the story is, write your code as simply and readably as possible and let the compiler and JIT take care of the optimizations that are properly their job.)

This may depend on your compiler version, settings etc.
To test this, I created a simple method:
public static void main(String[] args) {
System.out.println("abcdefghijklmnopqrstuvwxyz" + args[0] + "!.?:");
}
And after compiling and then running it through a decompiler, we get the following result:
System.out.println((new StringBuilder()).append("abcdefghijklmnopqrstuvwxyz").append(args[0]).append("!.?:").toString());
So it would seem that, at least with some javac versions, it just uses the default.
Manually inspecting the bytecode indicates the same thing.

Related

Why does the Java Compiler not inline these calls in a single if clause?

I have been pulled into a performance investigation of a code that is similar to the below:
private void someMethod(String id) {
boolean isHidden = someList.contains(id);
boolean isDisabled = this.checkIfDisabled(id);
if (isHidden && isDisabled) {
// Do something here
}
}
When I started investigating it, I was hoping that the compiled version would look like this:
private void someMethod(String id) {
if (someList.contains(id) && this.checkIfDisabled(id)) {
// Do something here
}
}
However, to my surprise, the compiled version looks exactly like the first one, with the local variables, which causes the method in isDisabled to always be called, and that's where the performance problem is in.
My solution was to inline it myself, so the method now short circuits at isHidden, but it left me wondering: Why isn't the Java Compiler smart enough in this case to inline these calls for me? Does it really need to have the local variables in place?
Thank you :)
First: the java compiler (javac) does almost no optimizations, that job is almost entirely done by the JVM itself at runtime.
Second: optimizations like that can only be done when there is no observable difference in behaviour of the optimized code vs. the un-optimized code.
Since we don't know (and the compiler presumably also doesn't know) if checkIfDisabled has any observable side-effects, it has to assume that it might. Therefore even when the return value of that method is known to not be needed, the call to the method can't be optimized away.
There is, however an option for this kind of optimization to be done at runtime: If the body (or bodies, due to polymorphism) of the checkIfDisabled method is simple enough then it's quite possible that the runtime can actually optimize away that code, if it recognizes that the calls never have a side-effect (but I don't know if any JVM actually does this specific kind of optimization).
But that optimization is only possible at a point where there is definite information about what checkIfDisabled does. And due to the dynamic class-loading nature of Java that basically means it's almost never during compile time.
Generally speaking, while some minor optimizations could possibly be done during compile time, the range of possible optimizations is much larger at runtime (due to the much increased amount of information about the code available), so the Java designers decided to put basically all optimization effort into the runtime part of the system.
The most-obvious solution to this problem is simply to rewrite the code something like this:
if (someList.contains(id)) {
if (this.checkIfDisabled(id)) {
// do something here
If, in your human estimation of the problem, one test is likely to mean that the other test does not need to be performed at all, then simply "write it that way."
Java compiler optimizations are tricky. Most optimizations are done at runtime by the JIT compiler. There are several levels of optimizations, the maximum number of optimizations by default will be made after 5000 method invocations. But it is rather problematic to see which optimizations are applied, since JIT compile the code directly into the platform's native code

Is there difference in compilers - java

Is there any differences in code optimization done by same versions of:
Oracle Java compiler
Apache Java compiler
IBM Java compiler
OpenJDK Java compiler.
If there is what code would demonstrate different optimizations? Or are they using same compiler? If there is no known optimization differences then where could I find resources on how to test compilers for different optimizations?
Is there any differences in code optimization done by same versions of: Oracle Java compiler Apache Java compiler IBM Java compiler OpenJDK Java compiler.
While compiler can be very different, the javac does almost not optimisations. The main optimisation is constant inlining and this is specified in the JLS and thus standard (except for any bugs)
If there is what code would demonstrate different optimizations?
You can do this.
final String w = "world";
String a = "hello " + w;
String b = "hello world";
String c = w;
String d = "hello " + c;
System.out.prinlnt(a == b); // these are the same String
System.out.prinlnt(c == b); // these are NOT the same String
In the first case, the constant was inlined and the String concatenated at compile time. In the second case the concatenation was performed at runtime and a new String created.
Or are they using same compiler?
No, but 99% of optimisations are performed at runtime by the JIT so these are the same for a given version of JVM.
If there is no known optimization differences then where could I find resources on how to test compilers for different optimizations?
I would be surprised if there is one as this doesn't sound very useful. The problem is that the JIT optimises pre built templates of byte code and if you attempt to optimise the byte code you can end up confusing the JIT and having slower code. i.e. there is no way to evaluated an optimisation without considering the JVM it will be run on.
No, they do not use the same compiler. I can't comment much about the optimizations and stuffs, but here's an example how the compilers are different in their working.
public class Test {
public static void main(String[] args) {
int x = 1L; // <- this cannot compile
}
}
If you use the standard java compiler, it'll throw an compilation error and the class file won't be created.
But if you use the eclipse compiler for java ECJ, it'll not only throw the same compilation error, but will also create a class file(YES, a class file for an uncompilable code, which makes ECJ, I wouldn't say wrong, but a bit tricky), which looks something like this.
public static void main(String[] paramArrayOfString)
{
throw new Error("Unresolved compilation problem: \n\tType mismatch: cannot convert from long to int.\n");
}
Having said that, this is just between 2 compilers. Other compilers may have their own way of working.
P.S: I took this example from here.
The only compilers that I have spent a great deal of time with are javac (which, as others have pointed out, does very little in terms of eager optimization) and the Eclipse compiler.
While writing a Java decompiler, I have observed a few (often frustrating) differences in how Eclipse compiles code, but not many. Some of them could be considered optimizations. Among them:
The Eclipse compiler appears to perform at least some duplicate code analysis. If two (or more?) blocks of code both branch to separate but equivalent blocks of code, the equivalent target blocks may be flattened into a single block with multiple entry jumps. I have never seen javac perform this type of optimization; the equivalent blocks would always be emitted. All of the examples I can recall happened to occur in switch statements. This optimization reduces the method size (and therefore class file size), which may improve load and verification time. It may even result in improved performance in interpreted mode (particularly if the interpreter in question performs inlining), but I imagine such an improvement would be slight. I doubt it would make a difference once the method has been JIT compiled. It also makes decompilation more difficult (grrr).
Basic blocks are often emitted in a completely different order from javac. This may simply be a side effect of the compiler's internal design, or it may be that the compiler is trying to optimize the code layout to reduce the number of jumps. This is the sort of optimization I would normally leave to the JIT, and that philosophy seems to work fine for javac.

Why do StringBuilders pop up when debugging String concatenation?

I am aware that String are immutable, and on when to use a StringBuilder or a StringBuffer. I also read that the bytecode for these two snippets would end up being the same:
//Snippet 1
String variable = "text";
this.class.getResourceAsStream("string"+variable);
//Snippet 2
StringBuilder sb = new StringBuilder("string");
sb.append("text");
this.class.getResourceAsStream(sb.toString());
But I obviously have something wrong. When debugging through Snippet 1 in eclipse, I am actually taken to the StringBuilder constructor and to the append method. I suppose I'm missing details on how bytecode is interpreted and how the debugger refers back to the lines in the source code; if anyone could explain this a bit, I'd really appreciate it. Also, maybe you can point out what's JVM specific and what isn't (I'm for example running Oracle's v6), Thanks!
Why do StringBuilders pop up when debugging String concatenation?
Because string concatenation (via the '+' operator) is typically compiled to code that uses a StringBuffer or StringBuilder to do the concatenation. The JLS explicitly permits this behaviour.
"An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression." JLS 15.18.1.
(If your code is using a StringBuffer rather than a StringBuilder, it is probably because it was compiled using a really old Java compiler, or because you have specified a really old target JVM. The StringBuilder class is a relatively addition to Java. Older versions of the JLS used to mention StringBuffer instead of StringBuilder, IIRC.)
Also, maybe you can point out what's JVM specific and what isn't.
The bytecodes produced for "string" + variable" depend on how the Java compiler handles the concatenation. (In fact, all generated bytecodes are Java compiler dependent to some degree. The JLS and JVM specs do not dictate what bytecodes must be generated. The specifications are more about how the program should behave, and what individual bytecodes do.)
#supercat comments:
I wonder why string concatenation wouldn't use e.g. a String constructor overload which accepts two String objects, allocates a buffer of the proper combined size, and joins them? Or, when joining more strings, an overload which takes a String[]? Creating a String[] containing references to the strings to be joined should be no more expensive than creating a StringBuilder, and being able to create a perfect-sized backing store in one shot should be an easy performance win.
Maybe ... but I'd say probably not. This is a complicated area involving complicated trade-offs. The chosen implementation strategy for string concatenation has to work well across a wide range of different use-cases.
My understanding is that the original strategy was chosen after looking at a number of approaches, and doing some large-scale static code analysis and benchmarking to try to figure out which approach was best. I imagine they considered all of the alternatives that you proposed. (After all, they were / are smart people ...)
Having said that, the complete source code base for Java 6, 7 and 8 are available to you. That means that you could download it, and try some experiments of your own to see if your theories are right. If they are ... and you can gather solid evidence that they are ... then submit a patch to the OpenJDK team.
#StephenC I am still not convinced with the explanation. The compiler may do whatever optimization it wants to do but when you debug through the eclipse the source code view is hidden from compiler code and it should not jump one section of code to another code within the same source file.
The following description in the question suggests that the source code and byte code are not in sync. i.e., he is not running the latest code.
When debugging through Snippet 1 in eclipse, I am actually taken to the StringBuffer constructor and to the append method
and
how the debugger refers back to the lines in the source code

JDK compiler optimize use of anonymous classes with no instance variables?

I was curious, I see this kind of thing a lot:
Arrays.sort(array, new Comparator<Integer>() {
public int compare(Integer a, Integer b) {
return Math.abs(a) < Math.abs(b);
}
});
since the anonymous class created here has no instance variables, is the standard JDK compiler smart enough to only instantiate that anonymous class once and reuse it? Or is it advisable to instantiate that anonymous class in a static field and always pass the static Comparator object?
UPDATE: When I say "JDK compiler", I mean the JIT portion too. The above is also just an example. I was really curious if I should, as a best practice, create static fields for the above instead of inlining anonymous class instantiations. In some cases the performance/resource usage issue will be negligible. But other cases might not be...
javac will definitely not do such thing; that would violate the language semantics. JVM could optimize it in theory, but it's not that smart yet. A static one would be faster.
Ironically, such analysis and optimization would be easy for javac, and it can be done today, except it's forbidden to do so - the source says new, so javac must new.
Rumor has it that the coming lambda expression in java 8 will make an effort on this issue, in
Arrays.sort(array, (Integer a,Integer b) => Math.abs(a)<Math.abs(b) )
javac is required to analyze that the lambda is stateless, and a lazily created single instance is enough, and that's what it must compile the code into.
The answer is maybe.
javac will compile this into bytecode that is true to your code. Therefore, the class will be instantiate anew each time this code is called. However the JVM performs many sophisticated optimisations at runtime, and MIGHT - depending on many factors - perform some optimisations here.
In any case, unless you've noticed a real performance issue here, I'd recommend not trying to manually optimise.
You can use javap to check yourself; my suspicion is that it will create it on every call. Such optimizations are handled by hotspot but in this case since the anonymous only has methods (no fields to initialize) the overhead is very small.
The real question here is not what javac will do. As Steve McLeod says, javac will produce bytecode which matches your source code. The question is what hotspot will do at runtime. I imagine it will simple inline the whole lot (including the Math.abs() call).
The compiler does very little optimisation. The jit does most of it. It will optimise this code but it won't avoid creating Integer objects and unless you have an array of Integer this will be your bottle neck. So much so that any other optimisation it does won't matter.

Will the Java optimizer remove parameter construction for empty method calls?

Suppose I have code like :
log.info("Now the amount" + amount + " seems a bit high")
and I would replace the log method with a dummy implementation like :
class Logger {
...
public void info() {}
}
Will the optimizer inline it and will the dead code removal remove the parameter construction if no side effects are detected?
I would guess that the compiler (javac) will not, but that the just in time compiler very likely will.
For it to work however, it has to be able to deduct that whatever instructions you use to generate the parameter have no side effects. It should be able to do that for the special case of strings, but it might not for other methods.
To be sure, make a small benchmark that compares this with the code that Jesper suggested and see, whichever is faster or if they are equally fast.
Also see here: http://www.ibm.com/developerworks/java/library/j-jtp12214/index.html#3.0
The important answer is: it might do.
I don't think you should be relying on any particular behaviour of the optimiser. If your code runs acceptably fast without optimisation, then you may or may not get a performance boost, but the good news is your solution will be fine regardless. If performance is dreadful when non-optimised, it's not a good solution.
The problem with relying on the optimiser is that you're setting yourself up to conform to an invisible contract that you have no idea of. Hotspot will perform differently in other JDK/JRE versions, so there's no guarantee that just because it runs fine on your exact JVM, it'll run fine elsewhere. But beyond that, the exact optimisations that take place may depend on environmental issues such as the amount of free heap, the amount of cores on the machine, etc.
And even if you manage to confirm it works fine in your situation right now, you've just made your codebase incredibly unstable. I know for a fact that one of the optimisations/inlinings that Hotspot does depends on the number of subclasses loaded and used for a non-final class. If you start using another library/module which loads a second implementation of log - BANG, the optimisation gets unwound and performance is terrible again. And good luck working out how adding a third party library to your classpath toasts your app's internal performance...
Anyway, I don't think you're asking the real question. In the case you've described, the better solution is not to change the info method's implementation, but change the calls to no-ops (i.e. comment them out). If you want to do this across a whole slew of classes at once, at compile time, you can use a type of IFDEF like so:
public class Log
{
public static final boolean USE_INFO = true;
public void info()
{
...
}
...
}
and then in your class:
if (Log.USE_INFO)
{
log.info("Now the amount" + amount + " seems a bit high");
}
Now the compiler (javac, not Hotspot) can see that the boolean condition is constant and will elide it when compiling. If you set the boolean flag to false and recompile, then all of the info statements will be stripped from the file completely as javac can tell that they will never be called.
If you want to be able to enable/disable info logging at runtime, then instead of a constant boolean flag you'll need to use a method and here's there's nothing for it but to call the method every time. Depending on the implementation of the method, Hotspot may be able to optimise the check but as I mentioned above, don't count on it.

Categories