Today I was reading Antonio's blog about toString() performance, and it contains this paragraph:
What used to be considered evil yesterday (“do not concatenate Strings with + !!!”), has become cool and efficient! Today the JVM compiles the + symbol into a string builder (in most cases). So, do not hesitate, use it.
Now I am confused, because he says "Today the JVM compiles the + symbol into a string builder (in most cases)", but I have never heard of or seen (in code) anything like this before.
Could someone please give an example of where the JVM does this, and under what conditions it happens?
The rule
“do not concatenate Strings with + !!!“
is wrong, because it is incomplete and therefore misleading.
The rule is
do not concatenate Strings with + in a loop
and that rule still holds. The original rule was never meant to be applied outside of loops!
A simple loop
String s = "";
for (int i = 0; i < 10000; i++) { s += i; }
System.out.println(s);
is still much slower than
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10000; i++) { sb.append(i); }
System.out.println(sb.toString());
because the Java compiler has to translate the first loop into
String s = "";
for (int i = 0; i < 10000; i++) { s = new StringBuilder(s).append(i).toString(); }
System.out.println(s);
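A minimal sketch of the three variants side by side, assuming Java 5 or later (the class and method names are hypothetical). `desugaredLoop` mirrors the translation shown above; it approximates what the compiler emits and is not a decompiled listing. All three return the same string; the two slow forms copy the whole accumulated string on every iteration, which is what makes the loop quadratic.

```java
// Hypothetical demo class: three ways to build "0123...(n-1)".
class LoopConcat {
    // Naive version: each += creates a new String (and, before Java 9, a new StringBuilder).
    static String plusLoop(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // Roughly what javac (Java 5 to 8) generates for plusLoop:
    // a fresh builder per iteration, copying the accumulated string each time.
    static String desugaredLoop(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s = new StringBuilder(s).append(i).toString();
        }
        return s;
    }

    // One builder for the whole loop: appends into a growable buffer, copying only on resize.
    static String builderLoop(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 10000;
        String a = plusLoop(n);
        System.out.println(a.equals(desugaredLoop(n)) && a.equals(builderLoop(n))); // true
    }
}
```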
Also the claim
Today the JVM compiles the + symbol into a string builder (in most cases).
is misleading at the very least, because this translation was already done with Java 1.0 (ok, not with StringBuilder but with StringBuffer, because StringBuilder was only added in Java 5).
One could also argue that the claim
Today the JVM compiles the + symbol into a string builder (in most cases).
is simply wrong, because the compilation is not done by the JVM. It is done by the Java Compiler.
For the question: when does the Java compiler use StringBuilder.append() and when does it use some other mechanism?
The source code of the Java compiler (version 1.8) contains two places where String concatenation through the + operator is handled:
The first place is String constant folding (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/com/sun/tools/javac/comp/ConstFold.java?av=f#314). In this case the compiler can calculate the resulting string at compile time and works with that string.
The second place is where the compiler generates the code for assignment operations (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/com/sun/tools/javac/jvm/Gen.java?av=f#2056). In this case the compiler always emits code that creates a StringBuilder.
The conclusion is that for the Java compiler from OpenJDK (which means the compiler distributed by Oracle), the phrase "in most cases" means "always". (Though this could change with Java 9, and another Java compiler, such as the one included in Eclipse, might use some other mechanism.)
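The constant-folding case can be observed directly, because the JLS requires constant string expressions to be interned, while a runtime concatenation yields a newly created String. A small sketch with a hypothetical class name; the `==` comparisons are only there to expose object identity, not as a way to compare strings in real code.

```java
// Hypothetical demo class: constant folding vs. runtime concatenation.
class ConstantFolding {
    // A constant variable: final and initialized with a constant expression.
    static final String PREFIX = "Java";

    static boolean foldedIsInterned() {
        // Both operands are compile-time constants, so javac folds this into the
        // literal "JavaTutorial"; constant strings are interned, hence the same object.
        return (PREFIX + "Tutorial") == "JavaTutorial";
    }

    static boolean runtimeIsNewObject() {
        String prefix = "Java"; // not declared final, so not a constant variable
        // Evaluated at run time, which produces a newly created String object.
        return (prefix + "Tutorial") != "JavaTutorial";
    }

    public static void main(String[] args) {
        System.out.println(foldedIsInterned());   // true
        System.out.println(runtimeIsNewObject()); // true
    }
}
```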
Holger is right in his comment that in Java 9, + for String concatenation is going to change from a StringBuilder to a strategy chosen by the JRE via invokedynamic. There are six possible strategies for String concatenation in JDK 9:
private enum Strategy {
/**
* Bytecode generator, calling into {@link java.lang.StringBuilder}.
*/
BC_SB,
/**
* Bytecode generator, calling into {@link java.lang.StringBuilder};
* but trying to estimate the required storage.
*/
BC_SB_SIZED,
/**
* Bytecode generator, calling into {@link java.lang.StringBuilder};
* but computing the required storage exactly.
*/
BC_SB_SIZED_EXACT,
/**
* MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
* This strategy also tries to estimate the required storage.
*/
MH_SB_SIZED,
/**
* MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
* This strategy also estimate the required storage exactly.
*/
MH_SB_SIZED_EXACT,
/**
* MethodHandle-based generator, that constructs its own byte[] array from
* the arguments. It computes the required storage exactly.
*/
MH_INLINE_SIZED_EXACT
}
And the default one is not using a StringBuilder, it is MH_INLINE_SIZED_EXACT. It is actually pretty crazy how the implementation works, and it is trying to be highly optimized.
So no, as far as I can tell, the advice there is bad. By the way, this work in the JDK was mainly done by Aleksey Shipilev. He also contributed a big change to String internals in JDK 9: Strings are now backed by a byte[] instead of a char[]. This was needed because ISO_LATIN_1 Strings can be encoded with a single byte per character, which takes a lot less space.
The statement, in this exact form, is just wrong, and it fits the picture that the linked blog continues to write nonsense, like claiming that you have to wrap references with Objects.toString(…) to handle null, e.g. "att1='" + Objects.toString(att1) + '\'' instead of just "att1='" + att1 + '\''. There is no need to do that, and apparently the author never re-checked these claims.
The JVM is not responsible for compiling the + operator, as this operator is merely a source code artifact. It’s the compiler, e.g. javac, which is responsible, and while there is no guarantee about the compiled form, the Java Language Specification encourages compilers to use a builder:
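To make the invokedynamic route less abstract, here is a sketch that calls the JDK 9+ bootstrap method `java.lang.invoke.StringConcatFactory.makeConcatWithConstants` by hand. This is roughly what happens when the JVM links a concat call site that javac emitted for `"Hello " + name + ", you are #" + n`; which strategy ends up behind the returned handle is the JRE's choice. In the recipe string, the character U+0001 marks an argument slot. The class name is hypothetical, and calling the factory yourself is purely illustrative; normal code should just use `+`.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Hypothetical demo class; requires Java 9 or later.
class IndyConcatDemo {
    static String concat(String name, int n) {
        try {
            // In the recipe, the \u0001 character marks a slot for the next dynamic argument;
            // all other characters are copied as literal text.
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "concat", // the name is arbitrary for this bootstrap method
                    MethodType.methodType(String.class, String.class, int.class),
                    "Hello \u0001, you are #\u0001");
            // invokeExact requires the exact static types: (String, int) -> String
            return (String) site.getTarget().invokeExact(name, n);
        } catch (Throwable t) {
            // invokeExact is declared to throw Throwable; wrap it for this demo
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        String name = "Bob";
        int n = 3;
        String viaFactory = concat(name, n);
        String viaPlus = "Hello " + name + ", you are #" + n;
        System.out.println(viaFactory);                 // Hello Bob, you are #3
        System.out.println(viaFactory.equals(viaPlus)); // true
    }
}
```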
An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.
Note that even if a compiler does not perform this optimization, there still is no such thing as a + operator at the bytecode level, so the compiler has to pick an operation the JVM understands, e.g. String.concat, which might be even faster than using a StringBuilder when you’re concatenating exactly two strings.
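A sketch of the three operations a compiler could choose for `a + b` (hypothetical class name). For two non-null Strings they give the same result. One caveat worth knowing: `+` and `StringBuilder.append` turn `null` into "null", whereas `String.concat` throws a NullPointerException, so a compiler choosing `concat` would have to handle `null` itself.

```java
// Hypothetical demo class: three ways to join exactly two Strings.
class TwoStringConcat {
    static String viaPlus(String a, String b) {
        return a + b;
    }

    static String viaConcat(String a, String b) {
        return a.concat(b); // throws NullPointerException if a or b is null
    }

    static String viaBuilder(String a, String b) {
        return new StringBuilder().append(a).append(b).toString(); // null is appended as "null"
    }

    public static void main(String[] args) {
        System.out.println(viaPlus("Java", "Tutorial"));    // JavaTutorial
        System.out.println(viaConcat("Java", "Tutorial"));  // JavaTutorial
        System.out.println(viaBuilder("Java", "Tutorial")); // JavaTutorial
        System.out.println(viaPlus("Java", null));          // Javanull
    }
}
```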
Even assuming the worst compilation strategy for string concatenation (still being within the specification), it would be wrong to say to never concatenate strings with +, as when you are defining compile time constants, using + is the only choice, and, of course, a compile-time constant is usually more efficient than using a StringBuilder at runtime.
In practice, the + operator applied to non-constant strings was compiled to StringBuffer usage before Java 5 and to StringBuilder usage from Java 5 to Java 8. When the compiled code is identical to manual usage of StringBuffer or StringBuilder, respectively, there can’t be a performance difference.
The transition to Java 5, more than a decade ago, was the first time that string concatenation via + had a clear win over manual StringBuffer use: simply recompiling the concatenation code made it use the potentially faster StringBuilder internally, while code dealing with StringBuffer manually had to be rewritten to use StringBuilder, which was introduced in that version.
Likewise, Java 9 is going to compile the string concatenation using an invokedynamic instruction allowing the JRE to bind it to actual code doing the operation, including optimizations not possible in ordinary Java code. So only recompiling the string concatenation code is needed to get this feature, while there is no equivalent manual usage for it.
That said, while the premise is wrong (string concatenation never was considered evil), the advice is correct: do not hesitate to use it.
There are only a few cases where you really might improve performance by dealing with a buffer manually, i.e. when you need a large initial capacity or concatenate a lot within loops and that code has been identified as an actual performance bottleneck by a profiling tool…
When you concatenate strings using the + operator, the compiler translates the concatenation code to use StringBuffer for better performance. In order to improve performance, StringBuffer is the better choice.
The quickest way to concatenate two strings is the + operator:
String str = "Java";
str = str + "Tutorial";
The compiler translates this code into:
String str = "Java";
StringBuffer sb = new StringBuffer(str);
sb.append("Tutorial");
str = sb.toString();
So it is better to use StringBuffer or String.format for concatenation.
Using String.format
String s = String.format("%s %s", "Java", "Tutorial");
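For comparison, a sketch of the same concatenation written with String.format and with + (hypothetical class name). Both produce "Java Tutorial". String.format parses its pattern at run time on every call, so it is generally chosen for readability rather than raw speed.

```java
// Hypothetical demo class comparing String.format with +.
class FormatVsPlus {
    static String withFormat(String first, String second) {
        // The format pattern is parsed at run time on every call.
        return String.format("%s %s", first, second);
    }

    static String withPlus(String first, String second) {
        return first + " " + second;
    }

    public static void main(String[] args) {
        System.out.println(withFormat("Java", "Tutorial")); // Java Tutorial
        System.out.println(withPlus("Java", "Tutorial"));   // Java Tutorial
    }
}
```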
In the past, I have been led to believe that you should use StringBuilder and append(String) when building a string with variables, as opposed to string += split[i]. In what cases is this accurate? I ask because normally, if I were to write the following:
String[] split = args; // command line arguments or whatever
String myString = "";
for (int i = 0; i < split.length; i++) {
myString += split[i];
}
I am told by my IDE that it should be converted to use a StringBuilder instead. However, writing something like this:
StringBuilder build = new StringBuilder();
build.append("the ").append(build.toString()).append(" is bad").append(randomvar);
build.toString();
IntelliJ actually flags this use of StringBuilder as a performance issue, saying I should be using a String instead. The fact that it's listed as a performance issue would suggest it could actually cause problems, as opposed to just being a tiny bit slower.
I did notice that the first example is a loop and the second isn't. Is a StringBuilder recommended for lots of concatenations, while normal concatenation is better for non-looping situations? (This also means that in a loop the += operator would be used, whereas outside of a loop it could be "the " + build.toString() + " is bad" + randomvar. Is += the problem, as opposed to +?)
String concatenations are converted into calls to StringBuilder.append() behind the scenes.
String literal concatenations are (or at least can be) converted to individual String literals.
You're presumably using a String variable (not just two literals) inside the loop, so Java can't just replace that with a literal; it has to use a StringBuilder. That's why doing String concatenations in a loop should be done using a single StringBuilder, otherwise Java ends up creating another instance of StringBuilder every time the loop iterates.
On the other hand, something like this:
String animals = "cats " + "dogs " + "lizards ";
will be (or can be) replaced (by Java, not by you) with a single String literal, so using a StringBuilder is actually counter-productive.
Beginning in Java 1.5, the String + operator is translated into calls to StringBuilder.
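For the loop in the question, a sketch of three equivalent ways to write it (hypothetical class and method names; `String.join` needs Java 8 or later). The `+=` version creates a new String per iteration, while the other two build the result in one buffer.

```java
// Hypothetical demo class: the loop from the question, written three ways.
class JoinParts {
    // += in a loop: every iteration builds a brand-new String from the old one.
    static String plusEquals(String[] split) {
        String myString = "";
        for (int i = 0; i < split.length; i++) {
            myString += split[i];
        }
        return myString;
    }

    // One StringBuilder reused across all iterations.
    static String withBuilder(String[] split) {
        StringBuilder sb = new StringBuilder();
        for (String part : split) {
            sb.append(part);
        }
        return sb.toString();
    }

    // Java 8+: String.join with an empty delimiter does the same job.
    static String withJoin(String[] split) {
        return String.join("", split);
    }

    public static void main(String[] args) {
        String[] split = {"the ", "cat ", "is ", "bad"};
        System.out.println(plusEquals(split));  // the cat is bad
        System.out.println(withBuilder(split)); // the cat is bad
        System.out.println(withJoin(split));    // the cat is bad
    }
}
```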
In your example, the loop should be slower because the + operator creates a new StringBuilder instance each time through the loop.
The compiler will actually turn them both into the same form before compiling so neither will result in any performance difference. In this scenario you want to go with the shortest and most readable method available to you.
"An implementation may choose to perform conversion and concatenation
in one step to avoid creating and then discarding an intermediate
String object. To increase the performance of repeated string
concatenation, a Java compiler may use the StringBuffer class or a
similar technique to reduce the number of intermediate String objects
that are created by evaluation of an expression.
For primitive types, an implementation may also optimize away the
creation of a wrapper object by converting directly from a primitive
type to a string."
Source: http://docs.oracle.com/javase/specs/jls/se5.0/html/expressions.html#15.18.1.2
For small concatenations you can use the + operator without any issue. StringBuffer is indicated when you have large strings to concatenate; with this class you can save memory and processor time as well.
You can run a test concatenating 1 million words using the + operator, then run the same test using StringBuffer to see the difference yourself.
When should we use + for concatenation of strings, when is StringBuilder preferred, and when is it suitable to use concat?
I've heard StringBuilder is preferable for concatenation within loops. Why is it so?
Thanks.
Modern Java compilers convert your + operations into StringBuilder append calls. I mean to say, if you do str = str1 + str2 + str3, then the compiler will generate the following code:
StringBuilder sb = new StringBuilder();
str = sb.append(str1).append(str2).append(str3).toString();
You can decompile code using DJ or Cavaj to confirm this :)
So now it's more a matter of choice than of performance benefit whether to use + or StringBuilder :)
However, in a situation where the compiler does not do this for you (which may happen if you are using some private Java SDK), StringBuilder is surely the way to go, as you avoid lots of unnecessary String objects.
I tend to use StringBuilder on code paths where performance is a concern. Repeated string concatenation within a loop is often a good candidate.
The reason to prefer StringBuilder is that both + and concat create a new object every time you call them (provided the right hand side argument is not empty). This can quickly add up to a lot of objects, almost all of which are completely unnecessary.
As others have pointed out, when you use + multiple times within the same statement, the compiler can often optimize this for you. However, in my experience this argument doesn't apply when the concatenations happen in separate statements. It certainly doesn't help with loops.
Having said all this, I think top priority should be writing clear code. There are some great profiling tools available for Java (I use YourKit), which make it very easy to pinpoint performance bottlenecks and optimize just the bits where it matters.
P.S. I have never needed to use concat.
From Java/J2EE Job Interview Companion:
String
String is immutable: you can’t modify a String object but can replace it by creating a new instance. Creating a new instance is rather expensive.
//Inefficient version using immutable String
String output = "Some text";
int count = 100;
for (int i = 0; i < count; i++) {
output += i;
}
return output;
The above code would build 100 new String objects, of which 99 would be thrown away immediately. Creating new objects is not efficient.
StringBuffer/StringBuilder
StringBuffer is mutable: use StringBuffer or StringBuilder when you want to modify the contents. StringBuilder was added in Java 5 and it is identical in all respects to StringBuffer except that it is not synchronised, which makes it slightly faster at the cost of not being thread-safe.
//More efficient version using mutable StringBuffer
StringBuffer output = new StringBuffer(110);
output.append("Some text");
for (int i = 0; i < count; i++) {
output.append(i);
}
return output.toString();
The above code creates only two new objects, the StringBuffer and the final String that is returned. StringBuffer expands as needed, which is costly however, so it would be better to initialise the StringBuffer with the correct size from the start as shown.
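Sizing the buffer correctly can be done up front when the pieces are predictable. A sketch with hypothetical names, using StringBuilder since no thread safety is needed, that computes the exact length for the book's example before allocating. By that arithmetic, "Some text" (9 characters) plus the digits of 0 to 99 (10 + 180 = 190 characters) needs 199 characters, so the initial capacity of 110 above would still have to grow.

```java
// Hypothetical demo class: pre-sizing a StringBuilder exactly.
class PresizedBuilder {
    // Number of decimal digits in a non-negative int.
    static int digitCount(int value) {
        int digits = 1;
        while (value >= 10) {
            value /= 10;
            digits++;
        }
        return digits;
    }

    // Exact length of prefix followed by the decimal digits of 0..count-1.
    static int exactLength(String prefix, int count) {
        int length = prefix.length();
        for (int i = 0; i < count; i++) {
            length += digitCount(i);
        }
        return length;
    }

    static String build(String prefix, int count) {
        // Allocate once with the exact capacity, so append should never need to grow the buffer.
        StringBuilder output = new StringBuilder(exactLength(prefix, count));
        output.append(prefix);
        for (int i = 0; i < count; i++) {
            output.append(i);
        }
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(exactLength("Some text", 100));    // 199
        System.out.println(build("Some text", 100).length()); // 199
    }
}
```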
If all concatenated elements are constants (example : "these" + "are" + "constants"), then I'd prefer the +, because the compiler will inline the concatenation for you. Otherwise, using StringBuilder is the most effective way.
If you use + with non-constants, the compiler will internally use StringBuilder as well, but debugging becomes hell, because the code used is no longer identical to your source code.
My recommendation would be as follows:
+: Use when concatenating 2 or 3 Strings simply to keep your code brief and readable.
StringBuilder: Use when building up complex String output or where performance is a concern.
String.format: You didn't mention this in your question but it is my preferred method for creating Strings as it keeps the code the most readable / concise in my opinion and is particularly useful for log statements.
concat: I don't think I've ever had cause to use this.
Use StringBuilder if you do a lot of manipulation. Usually a loop is a pretty good indication of this.
The reason for this is that using normal concatenation produces lots of intermediate String objects that can't easily be "extended" (i.e. each concatenation operation produces a copy, requiring memory and CPU time to make). A StringBuilder, on the other hand, only needs to copy the data in some cases (inserting something in the middle, or having to resize because the result becomes too big), so it saves on those copy operations.
Using concat() has no real benefit over using + (it might be ever so slightly faster for a single +, but once you do a.concat(b).concat(c) it will actually be slower than a + b + c).
Use + for single statements and StringBuilder for multiple statements/ loops.
The performance gain from the compiler applies to concatenating constants.
The other uses are actually slower than using StringBuilder directly.
There is no problem with using "+", e.g. for creating a message for an Exception, because it does not happen often and the application is already somehow screwed at that moment. Avoid using "+" in loops.
For creating meaningful messages or other parametrized strings (XPath expressions, e.g.), use String.format; it is much more readable.
I suggest using concat for concatenating two strings and StringBuilder otherwise; see my explanation for concatenation operator (+) vs concat()
UPDATES: thanks a lot to Gabe and Glenn for the detailed explanation. The test was not written as a language comparison benchmark, just for my own study of VM optimization technologies.
I did a simple test to understand the performance of string concatenation in Java and Python.
The test targets the default immutable String object/type in both languages, so I don't use StringBuilder/StringBuffer in the Java test.
The test simply appends strings 100k times. Java takes ~32 seconds to finish, while Python only takes ~13 seconds for Unicode strings and 0.042 seconds for non-Unicode strings.
I'm a bit surprised by the results. I thought Java would be faster than Python. What optimization technology does Python leverage to achieve better performance? Or is the String object designed too heavyweight in Java?
OS: Ubuntu 10.04 x64
JDK: Sun 1.6.0_21
Python: 2.6.5
Java test did use -Xms1024m to minimize GC activities.
Java code:
public class StringConcateTest {
public static void test(int n) {
long start = System.currentTimeMillis();
String a = "";
for (int i = 0; i < n; i++) {
a = a.concat(String.valueOf(i));
}
long end = System.currentTimeMillis();
System.out.println(a.length() + ", time:" + (end - start));
}
public static void main(String[] args) {
for (int i = 0; i < 10; i++) {
test(1000 * 100);
}
}
}
Python code:
import time
def f(n):
start = time.time()
a = u'' #remove u to use non Unicode string
for i in xrange(n):
a = a + str(i)
print len(a), 'time', (time.time() - start)*1000.0
for j in xrange(10):
f(1000 * 100)
Gabe's answer is correct, but needs to be shown clearly rather than hypothesized.
CPython (and probably only CPython) does an in-place string append when it can. There are limitations on when it can do this.
First, it can't do it for interned strings. That's why you'll never see this if you test with a = "testing"; a = a + "testing", because assigning a string literal results in an interned string. You have to create the string dynamically, as this code does with str(12345). (This isn't much of a limitation; once you do an append this way once, the result is an uninterned string, so if you append string literals in a loop this will only happen the first time.)
Second, Python 2.x only does this for str, not unicode. Python 3.x does do this for Unicode strings. This is strange: it's a major performance difference--a difference in complexity. This discourages using Unicode strings in 2.x, when they should be encouraging it to help the transition to 3.x.
And finally, there can be no other references to the string.
>>> a = str(12345)
>>> id(a)
3082418720
>>> a += str(67890)
>>> id(a)
3082418720
This explains why the non-Unicode version is so much faster in your test than the Unicode version.
The actual code for this is string_concatenate in Python/ceval.c, and works for both s1 = s1 + s2 and s1 += s2. The function _PyString_Resize in Objects/stringobject.c also says explicitly: The following function breaks the notion that strings are immutable. See also http://bugs.python.org/issue980695.
My guess is that Python just does a realloc on the string rather than create a new one with a copy of the old one. Since realloc takes no time when there is enough empty space following the allocation, it is very fast.
So how come Python can call realloc and Java can't? Python's garbage collector uses reference counting so it can tell that nobody else is using the string and it won't matter if the string changes. Java's garbage collector doesn't maintain reference counts so it can't tell whether any other reference to the string is extant, meaning it has no choice but to create a whole new copy of the string on every concatenation.
EDIT: Although I don't know that Python actually does call realloc on a concat, here's the comment for _PyString_Resize in stringobject.c indicating why it can:
The following function breaks the notion that strings are immutable:
it changes the size of a string. We get away with this only if there
is only one module referencing the object. You can also think of it
as creating a new string object and destroying the old one, only
more efficiently. In any case, don't use this if the string may
already be known to some other part of the code...
I don't think your test means a lot, since Java and Python handle strings differently (I am no expert in Python, but I do know my way around Java). StringBuilders/Buffers exist for a reason in Java. The language designers didn't do any kind of more efficient memory management/manipulation exactly for this reason: there are other tools than the "String" object to do this kind of manipulation, and they expect you to use them when you code.
When you do things the way they are meant to be done in Java, you will be surprised how fast the platform is... But I have to admit that I have been pretty much impressed by the performance of some Python applications I have tried recently.
I do not know the answer for sure. But here are some thoughts. First, Java internally stores strings as char [] arrays containing the UTF-16 encoding of the string. This means that every character in the strings takes at least two bytes. So just in terms of raw storage, Java would have to copy around twice as much data as python strings. Python unicode strings are therefore the better test because they are similarly capable. Perhaps python stores unicode strings as UTF-8 encoded bytes. In that case, if all you are storing in these are ASCII characters, then again you'd have Java using twice as much space and therefore doing twice as much copying. To get a better comparison you should concatenate strings containing more interesting characters that require two or more bytes in their UTF-8 encoding.
I ran Java code with a StringBuilder in place of a String and saw an average finish time of 10ms (high 34ms, low 5ms).
As for the Python code, using "Method 6" here (found to be the fastest method), I was able to achieve an average of 84ms (high 91ms, low 81ms) using unicode strings. Using non-unicode strings reduced these numbers by ~25ms.
As such, it can be said based on these highly unscientific tests that using the fastest available method for string concatenation, Java is roughly an order of magnitude faster than Python.
But I still <3 Python ;)
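For reference, a sketch of the StringBuilder variant of the Java benchmark from the question (the class name is hypothetical). It switches to System.nanoTime, which is intended for measuring elapsed time, but it is still only a rough single-run measurement; a harness such as JMH is the usual tool for trustworthy numbers, and no timings are claimed here.

```java
// Hypothetical rewrite of StringConcateTest using a StringBuilder.
class StringBuilderConcatTest {
    // Original approach from the question: a new String on every iteration.
    static String withConcat(int n) {
        String a = "";
        for (int i = 0; i < n; i++) {
            a = a.concat(String.valueOf(i));
        }
        return a;
    }

    // StringBuilder approach: appends into one growable buffer.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            String a = withBuilder(1000 * 100);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(a.length() + ", time (ms): " + elapsedMs);
        }
    }
}
```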
In my project there are some code snippets which use StringBuffer objects, and a small part of it is as follows:
StringBuffer str = new StringBuffer();
str.append("new " + "String()");
So I was confused about the use of the append method and the + operator,
i.e. the code above could also be written as
str.append("new ").append("String()");
So are the two lines above the same? (Functionally yes, but) is there any particular reason to prefer one, i.e. performance or readability or ???
thanks.
In that case it's more efficient to use the first form - because the compiler will convert it to:
StringBuffer str = new StringBuffer();
str.append("new String()");
because it concatenates constants.
A few more general points though:
If either of those expressions wasn't a constant, you'd be better off (performance-wise) with the two calls to append, to avoid creating an intermediate string for no reason
If you're using a recent version of Java, StringBuilder is generally preferred
If you're immediately going to append a string (and you know what it is at construction time), you can pass it to the constructor
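The three points above can be sketched as follows (hypothetical class and method names, using StringBuilder as the second point suggests). All three methods return "new String()"; they differ only in how the pieces reach the buffer.

```java
// Hypothetical demo class for the append styles discussed above.
class AppendStyles {
    // Both operands are constants, so javac folds them into the single literal "new String()".
    static String constantOperands() {
        StringBuilder str = new StringBuilder();
        str.append("new " + "String()");
        return str.toString();
    }

    // With a non-constant operand, chained appends avoid building an intermediate String.
    static String chainedAppends(String type) {
        StringBuilder str = new StringBuilder();
        str.append("new ").append(type).append("()");
        return str.toString();
    }

    // The first piece is known up front, so it can go straight into the constructor.
    static String viaConstructor(String type) {
        return new StringBuilder("new ").append(type).append("()").toString();
    }

    public static void main(String[] args) {
        System.out.println(constantOperands());       // new String()
        System.out.println(chainedAppends("String")); // new String()
        System.out.println(viaConstructor("String")); // new String()
    }
}
```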
Actually, the bytecode compiler will replace all string concatenations which involve non-constants in a Java program with invocations of StringBuilder (StringBuffer before Java 5). That is
int userCount = 2;
System.out.println("You are the " + userCount + " user");
will be rewritten as
int userCount = 2;
System.out.println(new StringBuilder().append("You are the ").append(userCount).append(" user").toString());
That is at least what is observable when decompiling Java class files compiled with JDK 5 or 6. See this post.
The second form is most efficient in terms of performance because there is only one string object that is created and is appended to the stringbuffer.
The first form creates three string objects: 1) for "new ", 2) for "String()", 3) for the concatenated result of 1) and 2), and this third string object is appended to the string buffer.
Unless you are working with concurrent systems, use StringBuilder instead of StringBuffer. It's faster but not thread-safe :)
It also shares the same API, so it's more or less a straight find/replace.
Is conversion to String in Java using
"" + <int value>
bad practice? Does it have any drawbacks compared to String.valueOf(...)?
Code example:
int i = 25;
return "" + i;
vs:
int i = 25;
return String.valueOf(i);
Update: (from comment)
And what about Integer.toString(int i) compared to String.valueOf(...)?
I would always prefer the String.valueOf version: mostly because it shows what you're trying to do. The aim isn't string concatenation - it's conversion to a string, "the string value of i".
The first form may also be inefficient - depending on whether the compiler spots what you're doing. If it doesn't, it may be creating a new StringBuffer or StringBuilder and appending the value, then converting it to a string.
Funnily enough, I have an article about this very topic - written years and years ago; one of the first Java articles on my web site, IIRC.
There is also Integer.toString(int i), which gives you the option of getting the string as a hex value as well (by passing a second param of 16).
Edit: I just checked the source of the String class:
public static String valueOf(int i) {
return Integer.toString(i, 10);
}
And Integer class:
public static String toString(int i, int radix) {
if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX)
radix = 10;
/* Use the faster version */
if (radix == 10) {
return toString(i);
}
...
If you call String.valueOf(i), it calls Integer.toString(i, 10), which then calls Integer.toString(i).
So Integer.toString(i) should be very slightly faster than String.valueOf(i), since you'd be cutting out two function calls. (Although the first function call could be optimized away by the compiler.)
Of course, a readability argument could still be made for String.valueOf(), since it allows you to change the type of the argument (and even handles nulls!), and the performance difference is negligible.
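A sketch of the options discussed in this thread (hypothetical class and method names). All three conversions yield "25"; the radix overload gives "19" for base 16. The null handling mentioned above belongs to the `String.valueOf(Object)` overload: passing a variable of type Object that holds null returns "null", whereas writing `String.valueOf(null)` with a literal null selects the `char[]` overload and throws a NullPointerException.

```java
// Hypothetical demo class: ways to turn an int (or a possibly-null object) into a String.
class IntToStringDemo {
    static String viaValueOf(int i) {
        return String.valueOf(i);
    }

    static String viaToString(int i) {
        return Integer.toString(i);
    }

    static String viaConcat(int i) {
        return "" + i; // works, but reads as concatenation rather than conversion
    }

    static String hex(int i) {
        return Integer.toString(i, 16); // radix parameter
    }

    static String nullSafe(Object o) {
        return String.valueOf(o); // the Object overload returns "null" for null
    }

    public static void main(String[] args) {
        int i = 25;
        System.out.println(viaValueOf(i));  // 25
        System.out.println(viaToString(i)); // 25
        System.out.println(viaConcat(i));   // 25
        System.out.println(hex(i));         // 19
        System.out.println(nullSafe(null)); // null
    }
}
```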
Definitely use String.valueOf(i).
Although I'm not sure of the optimizations on the compiler side, worst case scenario if you use "" + :
"" creates a new empty string.
"" + creates a StringBuilder (Java 1.5-16)
"" is appended to the StringBuilder, then
In other words, there is a lot of overhead that occurs if you use string addition. This is why it is not recommended to use the + operator on strings in loops. In general, always use Boolean.valueOf, Integer.valueOf, String.valueOf, etc., when possible. You'll save both on memory and on overhead.
Regardless of any performance considerations I think the first variant is really ugly. IMHO it's a shame that this kind of "dynamic casting" is even possible in Java.
Yes, it is IMHO a bad practice.
It would require two memory allocations (unless the compiler and/or JIT optimize them). What's more, it makes it less evident what this code is trying to do.
Personally I dislike the style of "" + i, but that is really a preference/coding standards thing. Ideally the compiler would optimize those into equivalent code (although you would have to decompile to see if it actually does), but technically, without optimization, "" + i is more inefficient because it creates a StringBuilder object that wasn't needed.
Right off the bat, all I can think of is that in your first example more String objects will be created than in the second example (and an additional StringBuilder to actually perform the concatenation).
But what you are actually trying to do is create a String object from an int, not concatenate a String with an int, so go for the
String.valueOf(...);
option.
So yes, your first option is bad practice!
I wonder what is best for static final variables contributing to compile-time constants:
public static final int VIEW_TYPE_LABEL_FIELD = 1;
public static final int VIEW_TYPE_HEADER_FIELD = ;
...
List<String[]> listViewInfo = new ArrayList<>();
listViewInfo.add(new String[]{"Label/Field view", String.valueOf(VIEW_TYPE_LABEL_FIELD)});
listViewInfo.add(new String[]{"Header/Field view", "" + VIEW_TYPE_LABEL_FIELD});
The compiler can potentially replace the String expressions with a constant. Is one or the other more recognizable as a compile-time constant? Maybe easier for the ("" + ..) construct?
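On the last question: per the JLS, `"" + VIEW_TYPE_LABEL_FIELD` is a constant expression (a string concatenation of a literal and a constant variable), while `String.valueOf(...)` is a method call and never is. One way to check is to use the expression where the compiler demands a constant, such as a switch case label. A sketch with a hypothetical class name; `VIEW_TYPE_HEADER_FIELD = 2` is a hypothetical value, since the question leaves it blank.

```java
// Hypothetical demo class: "" + constant is itself a compile-time constant.
class ConstantLabels {
    static final int VIEW_TYPE_LABEL_FIELD = 1;
    static final int VIEW_TYPE_HEADER_FIELD = 2; // hypothetical value; blank in the question

    static String describe(String viewType) {
        switch (viewType) {
            case "" + VIEW_TYPE_LABEL_FIELD:  // constant expression, folds to "1"
                return "Label/Field view";
            case "" + VIEW_TYPE_HEADER_FIELD: // constant expression, folds to "2"
                return "Header/Field view";
            // case String.valueOf(VIEW_TYPE_LABEL_FIELD): // would not compile: not a constant expression
            default:
                return "unknown";
        }
    }

    // Constant string expressions are interned, so this identity check holds.
    static boolean isFolded() {
        return ("" + VIEW_TYPE_LABEL_FIELD) == "1";
    }

    public static void main(String[] args) {
        System.out.println(describe("1")); // Label/Field view
        System.out.println(describe("2")); // Header/Field view
        System.out.println(isFolded());    // true
    }
}
```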