String Concatenation and Autoboxing in Java

When you concatenate a String with a primitive such as int, does it autobox the value first?
For example:
String string = "Four" + 4;
How does it convert the value to a string in Java?

To see what the Java compiler produces, it is always useful to run javap -c and look at the actual bytecode.
For example, the following Java code:
String s1 = "Four" + 4;
int i = 4;
String s2 = "Four" + i;
would produce the following bytecode:
0: ldc #2; //String Four4
2: astore_1
3: iconst_4
4: istore_2
5: new #3; //class java/lang/StringBuilder
8: dup
9: invokespecial #4; //Method java/lang/StringBuilder."<init>":()V
12: ldc #5; //String Four
14: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
17: iload_2
18: invokevirtual #7; //Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
21: invokevirtual #8; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
24: astore_3
25: return
From this we can see:
In the case of "Four" + 4, the Java compiler (I was using JDK 6) was clever enough to deduce that this is a constant, so there is no computational effort at runtime: the string is concatenated at compile time.
In the case of "Four" + i, the equivalent code is new StringBuilder().append("Four").append(i).toString().
Autoboxing is not involved here, as there is a StringBuilder.append(int) method which, according to the docs, uses String.valueOf(int) to create the string representation of the integer.
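The equivalence described above can be sketched as follows. This is only an illustration: the class name ConcatWithInt and its method names are hypothetical and not part of the original answer. It compares "Four" + i with the explicit StringBuilder chain and with a direct String.valueOf(int) conversion; no Integer object appears anywhere in the source.

```java
// Hypothetical demo class; the class and method names are illustrative only.
final class ConcatWithInt {
    // Concatenation with the + operator and a non-constant int.
    static String viaPlus(int i) {
        return "Four" + i;
    }

    // The explicit form that the JDK 6 bytecode above corresponds to.
    static String viaBuilder(int i) {
        return new StringBuilder().append("Four").append(i).toString();
    }

    // Converting the int directly to text, without boxing it into an Integer.
    static String viaValueOf(int i) {
        return "Four".concat(String.valueOf(i));
    }

    public static void main(String[] args) {
        int i = 4;
        System.out.println(viaPlus(i).equals(viaBuilder(i)));
        System.out.println(viaPlus(i).equals(viaValueOf(i)));
    }
}
```

All three methods should produce the same text for the same int, which is consistent with the answer's point that the primitive is converted to characters directly rather than boxed first.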

The Java compiler actually creates a StringBuilder (1) and invokes the append() method. This can be seen in the bytecode:
22 invokespecial java.lang.StringBuilder(java.lang.String) [40]
...
29 invokevirtual java.lang.StringBuilder.append(int) : java.lang.StringBuilder [47]
32 invokevirtual java.lang.StringBuilder.toString() : java.lang.String [51]
Nevertheless, the behavior is identical to boxing and then invoking toString(): "Four" + new Integer(4).toString(), which I believe is what the language designers had in mind.
(1) To be exact, for the literal example the compiler already concatenates the string literal and the int literal into a single string literal, "Four4". You can see it in the following line of the bytecode:
0 ldc <String "Four4"> [19]

According to http://jcp.org/aboutJava/communityprocess/jsr/tiger/autoboxing.html, autoboxing is done on a primitive type whenever a reference type is needed (such as the Integer class in this case).
So the int will be converted into an Integer, then that Integer object's toString() method is called and its result is appended to the preceding string.

Related

Can we use the + sign to add a string literal in a StringBuffer?

StringBuffer sb = new StringBuffer();
sb.append("New "+"Delhi");
and the other is:
sb.append("New ").append("Delhi");
Both will print "New Delhi".
Which one is better, and why?
Sometimes, to save time, I use "+" instead of ".append".
sb.append("New "+"Delhi"):
public class Test {
public Test();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: new #2 // class java/lang/StringBuffer
3: dup
4: invokespecial #3 // Method java/lang/StringBuffer."<init>":()V
7: astore_1
8: aload_1
9: ldc #4 // String New Delhi
11: invokevirtual #5 // Method java/lang/StringBuffer.append:(Ljava/lang/String;)Ljava/lang/StringBuffer;
14: pop
15: return
}
sb.append("New ").append("Delhi"):
Compiled from "Test.java"
public class Test {
public Test();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: new #2 // class java/lang/StringBuffer
3: dup
4: invokespecial #3 // Method java/lang/StringBuffer."<init>":()V
7: astore_1
8: aload_1
9: ldc #4 // String New
11: invokevirtual #5 // Method java/lang/StringBuffer.append:(Ljava/lang/String;)Ljava/lang/StringBuffer;
14: ldc #6 // String Delhi
16: invokevirtual #5 // Method java/lang/StringBuffer.append:(Ljava/lang/String;)Ljava/lang/StringBuffer;
19: pop
20: return
}
As the bytecode above shows, for constant strings:
when using "+", javac concatenates the literals into a single String at compile time.
when using "append", javac emits two separate String constants and two append() calls.
So for constant strings, "+" is the better choice.
The "+" sign appends a string to the end of another string. As for your question: append() on a StringBuffer does not use "+" internally; it copies the characters of its argument into the buffer.
A string concatenation that is not a compile-time constant is converted into StringBuilder calls internally. For example,
"The answer is: " + value
is converted into:
new StringBuilder("The answer is: ").append(value).toString()
If any expression being concatenated is not constant, .append is the better approach.
So in your case it doesn't matter performance-wise which way you write it; '+' only improves the readability of your code.
Constant string concatenations are replaced at compile time.
You should use a StringBuilder/StringBuffer if you concatenate non-constant strings, e.g. variables, especially when you do the concatenations in a loop.
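As a rough sketch of the two situations discussed in these answers (the class name AppendStyles and its methods are hypothetical, chosen only for illustration): when both operands are literals, "+" inside append() costs nothing at run time, while for a non-constant operand chaining append() avoids building a temporary String first.

```java
// Hypothetical demo class; names are illustrative only.
final class AppendStyles {
    // Both operands are literals, so javac folds them into one constant
    // and a single append() call is emitted.
    static String constantPlus() {
        StringBuffer sb = new StringBuffer();
        sb.append("New " + "Delhi");
        return sb.toString();
    }

    // suffix is not a compile-time constant: chain append() calls instead of
    // building a temporary String with + inside the argument.
    static String chained(String suffix) {
        StringBuffer sb = new StringBuffer();
        sb.append("New ").append(suffix);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(constantPlus());
        System.out.println(chained("Delhi"));
    }
}
```

Both calls in main should print the same text; the difference is only in how much work is left for run time.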

JVM bug? Cached Object field value causes ArrayIndexOutOfBoundsException

This is kind of strange, but code speaks louder than words, so look at the test to see what I'm doing. In my current setup (Java 7 update 21 on Windows 64 bit) this test fails with ArrayIndexOutOfBoundsException, but when I replace the test method code with the commented-out code, it works. I wonder if there is any part of the Java specification that would explain why.
It seems to me, as "michael nesterenko" suggested, that the value of the array field is cached in the stack, before calling the method, and not updated on return from the call. I can't tell if it's a JVM bug or a documented "optimisation". No multi-threading or "magic" involved.
import java.util.Arrays;
import org.junit.Test;
public class TestAIOOB {
    private String[] array = new String[0];
    // Replaces the field with a copy that is one element longer and stores txt in the new slot.
    private int grow(final String txt) {
        final int index = array.length;
        array = Arrays.copyOf(array, index + 1);
        array[index] = txt;
        return index;
    }
    @Test
    public void testGrow() {
        //final int index = grow("test");
        //System.out.println(array[index]);
        System.out.println(array[grow("test")]);
    }
}
This is well defined by the Java Language Specification: to evaluate x[y], first x is evaluated, and then y is evaluated. In your case, x evaluates to a String[] with zero elements. Then, y modifies a member variable, and evaluates to 0. Trying to access the 0th element of the already-returned array fails. The fact that the member array changes has no bearing on the array lookup, because we're looking at the String[] that array referenced at the time we evaluated it.
This behavior is mandated by the JLS. Per 15.13.1, "An array access expression is evaluated using the following procedure: First, the array reference expression is evaluated. If this evaluation completes abruptly, then the array access completes abruptly for the same reason and the index expression is not evaluated. Otherwise, the index expression is evaluated. [...]".
Compare the compiled Java code by using javap -c TestAIOOB
Uncommented code (the array[grow("test")] version):
public void testGrow();
Code:
0: getstatic #6; //Field java/lang/System.out:Ljava/io/PrintStream;
3: aload_0
4: getfield #3; //Field array:[Ljava/lang/String;
7: aload_0
8: ldc #7; //String test
10: invokespecial #8; //Method grow:(Ljava/lang/String;)I
13: aaload
14: invokevirtual #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
17: return
Commented code (the version that stores the index in a local variable first):
public void testGrow();
Code:
0: aload_0
1: ldc #6; //String test
3: invokespecial #7; //Method grow:(Ljava/lang/String;)I
6: istore_1
7: getstatic #8; //Field java/lang/System.out:Ljava/io/PrintStream;
10: aload_0
11: getfield #3; //Field array:[Ljava/lang/String;
14: iload_1
15: aaload
16: invokevirtual #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
19: return
In the first, the getfield happens before the call to grow; in the second, it happens after.
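The evaluation order can also be reproduced without JUnit. The following is a minimal stand-alone sketch; the class name EvalOrderDemo and the two helper methods are hypothetical and only mirror the two variants of testGrow() from the question.

```java
import java.util.Arrays;

// Hypothetical stand-alone variant of the test above, without JUnit.
final class EvalOrderDemo {
    private String[] array = new String[0];

    // Same logic as grow() in the question.
    private int grow(String txt) {
        int index = array.length;
        array = Arrays.copyOf(array, index + 1);
        array[index] = txt;
        return index;
    }

    // The field is read BEFORE grow() runs, so the old, shorter array is indexed.
    boolean inlineAccessThrows(String txt) {
        try {
            String ignored = array[grow(txt)];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    // grow() runs first; the field is read afterwards and refers to the new array.
    String accessAfterGrow(String txt) {
        int index = grow(txt);
        return array[index];
    }

    public static void main(String[] args) {
        EvalOrderDemo demo = new EvalOrderDemo();
        System.out.println(demo.inlineAccessThrows("first"));
        System.out.println(demo.accessAfterGrow("second"));
    }
}
```

The inline form always indexes the array that the field referred to before grow() replaced it, which is one element too short for the returned index; storing the index in a local first avoids this.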

String object creation in Java

I have a doubt about String object creation in Java.
String s1 = "Hello"+"world";
String s2 = s1+"Java";
In this program, how many String objects will be created, and how? Please explain.
Thanks.
The answer is 3.
Two String objects will be created once per JVM start:
"Helloworld"
"Java"
Both will be interned, because they are constants (known at compile time).
They will be reused every time this code runs. The third String, "HelloworldJava", is created at run time: a StringBuilder concatenates the two Strings above and toString() produces the result. s1 refers to "Helloworld" and s2 to the newly created "HelloworldJava".
Here's the bytecode for the code:
0: ldc #37; //String Helloworld
2: astore_1
3: new #39; //class java/lang/StringBuilder
6: dup
7: aload_1
8: invokestatic #41; //Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
11: invokespecial #47; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
14: ldc #50; //String Java
16: invokevirtual #52; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
19: invokevirtual #56; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
22: astore_2
23: return
You can't really say how many Strings are created, since there are several differences due to the different implementations of the JVM.
As String is an immutable class, the naive answer is 5. But with some optimization (e.g. using a StringBuffer/StringBuilder) there would only be 2 Strings,
as the concatenations would be combined into append() calls.
Edit: As there are some different answers here, an explanation of why I said 5:
"Hello"
"world"
(s1) "Helloworld"
"Java"
(s2) "HelloworldJava"
If you look at the compiled code, you can easily guess:
String s1 = "Helloworld";
String s2 = (new StringBuilder(String.valueOf(s1))).append("Java").toString();
We can't accurately know by just looking at the source code, as many optimizations are done by the compiler before execution.
Here we see that 1 String object is created for s1, and another String object for s2. There are 2 string literals in the string pool: "Helloworld" and "Java".
If you decompile your Program.class, you will see the real code:
String s1 = "Helloworld";
String s2 = (new StringBuilder(String.valueOf(s1))).append("Java").toString();
It seems to be 10 objects, because inside each String there is a char[] value, which is a separate object, plus another char[] inside the StringBuilder.
The answer is 3.
You can view the disassembled result with:
javap -verbose YourClass
The Constant pool includes:
...
const #17 = Asciz Helloworld;
...
const #30 = Asciz Java;
...
This means the two strings ("Helloworld" and "Java") are compile-time constant expressions, which are interned into the constant pool automatically.
The code:
Code:
Stack=3, Locals=3, Args_size=1
0: ldc #16; //String Helloworld
2: astore_1
3: new #18; //class java/lang/StringBuilder
6: dup
7: aload_1
8: invokestatic #20; //Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
11: invokespecial #26; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
14: ldc #29; //String Java
16: invokevirtual #31; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
19: invokevirtual #35; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
22: astore_2
23: return
It indicates that s2 is created by StringBuilder.append() and toString().
To make this more interesting, javac can optimize the code with constant folding. Try to guess the number of strings created by the following code:
final String s1 = "Hello" + "world";
String s2 = s1 + "Java";
"final" means s1 is a constant, which lets javac compute the value of s2 at compile time and intern it. So the number of strings created here is 2.
Yup, there will be five String objects created. Strings are immutable; the steps are as follows:
1. First "Hello"
2. then another object "Hello World"
3. then another object "Java"
4. then s1+"Java"
and then finally s2 would be created.
Actually no String objects will be created, just two String literals.
When Strings are initialized the way you have done it, they are literals, not objects. If you wanted to create String objects, you would do the following:
String a = new String("abcd");

How many objects are created

I was having a discussion about usage of Strings and StringBuffers in Java. How many objects are created in each of these two examples?
Ex 1:
String s = "a";
s = s + "b";
s = s + "c";
Ex 2:
StringBuilder sb = new StringBuilder("a");
sb.append("b");
sb.append("c");
In my opinion, Ex 1 will create 5 and Ex 2 will create 4 objects.
I've used a memory profiler to get the exact counts.
On my machine, the first example creates 8 objects:
String s = "a";
s = s + "b";
s = s + "c";
two objects of type String;
two objects of type StringBuilder;
four objects of type char[].
On the other hand, the second example:
StringBuilder sb = new StringBuilder("a");
sb.append("b");
sb.append("c");
creates 2 objects:
one object of type StringBuilder;
one object of type char[].
This is using JDK 1.6u30.
P.S. To make the comparison fair, you probably ought to call sb.toString() at the end of the second example.
In terms of objects created:
Example 1 creates 8 objects:
String s = "a"; // No object created
s = s + "b"; // 1 StringBuilder/StringBuffer + 1 String + 2 char[] (1 for SB and 1 for String)
s = s + "c"; // 1 StringBuilder/StringBuffer + 1 String + 2 char[] (1 for SB and 1 for String)
Example 2 creates 2 objects:
StringBuffer sb = new StringBuffer("a"); // 1 StringBuffer + 1 char[] (in SB)
sb.append("b"); // 0
sb.append("c"); // 0
To be fair, I did not know that new char[] actually created an Object in Java (but I knew they were created). Thanks to aix for pointing that out.
You can determine the answer by analyzing the Java bytecode (use javap -c). Example 1 creates two StringBuilder objects (the new instructions at offsets 3 and 23) and two String objects (the toString() calls at offsets 19 and 39), while example 2 creates one StringBuilder object (the new instruction at offset 0).
Note that you must also take the char[] objects into account (since arrays are objects in Java). String and StringBuilder objects are both implemented using an underlying char[]. Thus, example 1 creates eight objects and example 2 creates two objects.
Example 1:
public static void main(java.lang.String[]);
Code:
0: ldc #2; //String a
2: astore_1
3: new #3; //class java/lang/StringBuilder
6: dup
7: invokespecial #4; //Method java/lang/StringBuilder."<init>":()V
10: aload_1
11: invokevirtual #5; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
14: ldc #6; //String b
16: invokevirtual #5; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
19: invokevirtual #7; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
22: astore_1
23: new #3; //class java/lang/StringBuilder
26: dup
27: invokespecial #4; //Method java/lang/StringBuilder."<init>":()V
30: aload_1
31: invokevirtual #5; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
34: ldc #8; //String c
36: invokevirtual #5; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
39: invokevirtual #7; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
42: astore_1
43: return
}
Example 2:
public static void main(java.lang.String[]);
Code:
0: new #2; //class java/lang/StringBuilder
3: dup
4: ldc #3; //String a
6: invokespecial #4; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
9: astore_1
10: aload_1
11: ldc #5; //String b
13: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
16: pop
17: aload_1
18: ldc #7; //String c
20: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
23: pop
24: return
}
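Read back into source form, the two bytecode listings above correspond roughly to the sketch below. The class name DesugaredConcat and the method names are hypothetical. This mirrors what the javac version used in this answer emitted; as far as I know, newer compilers (Java 9 and later) may compile + differently, so treat this as an illustration of the listings above rather than of every compiler.

```java
// Hypothetical demo class; names are illustrative only.
final class DesugaredConcat {
    // Example 1 as the bytecode above spells it out: every + creates a fresh
    // StringBuilder and a fresh String.
    static String example1() {
        String s = "a";
        s = new StringBuilder().append(s).append("b").toString();
        s = new StringBuilder().append(s).append("c").toString();
        return s;
    }

    // Example 2: one StringBuilder is reused for both appends.
    // toString() is called only so that the two results can be compared.
    static String example2() {
        StringBuilder sb = new StringBuilder("a");
        sb.append("b");
        sb.append("c");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(example1().equals(example2()));
    }
}
```

The two methods build the same text; they differ only in how many intermediate StringBuilder and String objects are allocated along the way.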
The answer is tied to specific implementations of the language (compiler and runtime libraries), and even to the presence or absence of specific optimization options. And, of course, to the version of the implementation (and, implicitly, the JLS it is compliant with). So it's better to speak in terms of minima and maxima. In fact, this exercise gives a better
For Ex1, the minimum number of objects is 1 (the compiler realizes that only constants are involved and produces code only for String s = "abc";). The maximum could be just about anything, depending on the implementation, but a reasonable estimate is 8 (also given in another answer as the number produced by a certain configuration).
For Ex2, the minimum number of objects is 2. The compiler has no way of knowing if we have replaced StringBuilder with a custom version with different semantics, so it will not optimize. The maximum could be around 6, for an extremely memory-conserving StringBuilder implementation that expands a backing char[] array one character at a time, but in most cases it will be 2 too.

What could be the right test case for evaluating assignment operators performance vs concrete operations?

Simply put, I am trying to figure out the fastest way to assign a value, like
somevar+=1;
or
somevar=somevar+1;
Some time ago, in situations with text instead of integers, I encountered a performance decrease using text+="sometext" instead of text.append("sometext") or text=text+"sometext"; the problem is that I can no longer find the source code where I noted my observations. So, theoretically, what's the fastest way?
The code runs in a fast loop, nearly real time.
If you have something like this:
Collection<String> strings = ...;
String all = "";
for (String s : strings) all += s;
... then it's equivalent to:
Collection<String> strings = ...;
String all = "";
for (String s : strings) all = new StringBuilder(all).append(s).toString();
Each iteration creates a new StringBuilder which is essentially a copy of all, appends a copy of s to it, and then copies the result of the concatenation to a new String. Obviously, using a single StringBuilder saves a lot of unnecessary allocations:
Collection<String> strings = ...;
StringBuilder sb = new StringBuilder();
for (String s : strings) sb.append(s);
String all = sb.toString();
As for x += y versus x = x + y, they compile to the same thing.
class Concat {
    public String concat1(String a, String b) {
        a += b;
        return a;
    }
    public String concat2(String a, String b) {
        a = a + b;
        return a;
    }
}
Compile it with javac, and then disassemble it with javap:
$ javap -c Concat
Compiled from "Concat.java"
class Concat extends java.lang.Object{
Concat();
Code:
0: aload_0
1: invokespecial #1; //Method java/lang/Object."<init>":()V
4: return
public java.lang.String concat1(java.lang.String, java.lang.String);
Code:
0: new #2; //class java/lang/StringBuilder
3: dup
4: invokespecial #3; //Method java/lang/StringBuilder."<init>":()V
7: aload_1
8: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
11: aload_2
12: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
15: invokevirtual #5; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
18: astore_1
19: aload_1
20: areturn
public java.lang.String concat2(java.lang.String, java.lang.String);
Code:
0: new #2; //class java/lang/StringBuilder
3: dup
4: invokespecial #3; //Method java/lang/StringBuilder."<init>":()V
7: aload_1
8: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
11: aload_2
12: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
15: invokevirtual #5; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
18: astore_1
19: aload_1
20: areturn
}
Personally, I'd favor += because with it, you make a clearer statement of intent: "I want to add the content of b to a". Any variations in performance between the two forms are with 100% certainty the result of something outside of your code (e.g., GC pauses, random cache misses or something like it).
Bad compilers might also have a slightly easier time optimizing the += form (which is irrelevant to you, since even if javac were crappy, HotSpot sure isn't).
Strings are immutable. Hence, concatenating 2 Strings causes a new String object to be created. Using StringBuilder will gain you a lot of performance.
However, primitive numeric types are not immutable and not heap allocated. Whatever you do with them tends to be really fast no matter what :)
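For the int case from the question, a small sketch (the class name CompoundAssign and its methods are hypothetical) shows that the two spellings are interchangeable as far as the result is concerned, including at the overflow boundary:

```java
// Hypothetical demo class; names are illustrative only.
final class CompoundAssign {
    // Compound assignment form.
    static int plusEquals(int somevar) {
        somevar += 1;
        return somevar;
    }

    // Explicit form.
    static int explicit(int somevar) {
        somevar = somevar + 1;
        return somevar;
    }

    public static void main(String[] args) {
        System.out.println(plusEquals(41) == explicit(41));
        // int arithmetic wraps around silently in both forms.
        System.out.println(plusEquals(Integer.MAX_VALUE) == explicit(Integer.MAX_VALUE));
    }
}
```

Since both forms operate on a local primitive and involve no allocation, any timing difference you measure between them is most likely noise from the test setup rather than from the operator itself.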
String 'arithmetic' internals are rather complicated, simply because such operations happen quite often and the compiler tries to optimize wherever it can.
First of all, append("something") is not a String method. It is a method from the StringBuffer or StringBuilder class.
The other expressions are quite equivalent. I'm pretty sure one can't look at the pattern and decide that one is faster in general.
In general, and because Strings are immutable, a concatenation will create a new String for each String literal, a new String for each intermediate result and one new String for the result. So this
String s = "one" + "two" + "three";
will require five String objects ("one", "two", "three", "onetwo", "onetwothree").
The first compiler optimization is 'interning' the literals. In short: "one", "two" and "three" are not created but reused.
A second compiler optimization is replacing the String addition with StringBuilder operations (if it makes sense). So the concatenation could be replaced by
StringBuilder sb = new StringBuilder("one");
sb.append("two");
sb.append("three");
String s = sb.toString();
(just to illustrate; a real compiler might not use this strategy for this example code, it might concatenate the literals during compilation already...)
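Whether a concatenation was folded at compile time can be observed through reference identity, because compile-time constant strings are interned while strings concatenated at run time are newly created objects. The sketch below uses a hypothetical class ConstantFolding; the == comparisons are deliberate and are not how strings should normally be compared.

```java
// Hypothetical demo class; names are illustrative only.
final class ConstantFolding {
    // All operands are literals: a constant expression, folded by javac.
    static boolean literalsAreFolded() {
        String s = "one" + "two" + "three";
        return s == "onetwothree"; // identity check on purpose
    }

    // middle is a parameter, so the concatenation happens at run time
    // and yields a newly created String.
    static boolean variableIsFolded(String middle) {
        String s = "one" + middle + "three";
        return s == "onetwothree"; // identity check on purpose
    }

    // A final local initialized with a constant expression is itself a constant,
    // so the second concatenation can be folded as well.
    static boolean finalLocalIsFolded() {
        final String s1 = "Hello" + "world";
        String s2 = s1 + "Java";
        return s2 == "HelloworldJava"; // identity check on purpose
    }

    public static void main(String[] args) {
        System.out.println(literalsAreFolded());
        System.out.println(variableIsFolded("two"));
        System.out.println(finalLocalIsFolded());
    }
}
```

The first and third checks are expected to report identity, the second is not, even though the text is equal in all three cases.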
