How to find all naive ("+" based) string concatenations in large Java codebase? - java

We have a huge code base and we suspect that there are quite a few "+" based string concats in the code that might benefit from the use of StringBuilder/StringBuffer. Is there an effective way or existing tools to search for these, especially in Eclipse?
A search by "+" isn't a good idea since there's a lot of math in the code, so this needs to be something that actually analyzes the code and types to figure out which additions involve strings.

I'm pretty sure FindBugs can detect these. If not, it's still extremely useful to have around.
Edit: It can indeed find concatenations in a loop, which is the only time it really makes a difference.
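To make the loop case concrete, here is a minimal sketch of the pattern such a check is aimed at, next to the StringBuilder alternative. The class and method names (ConcatInLoop, joinWithPlus, joinWithBuilder) are made up for illustration and do not come from FindBugs or from the question.

```java
public class ConcatInLoop {

    // The pattern in question: each pass copies everything accumulated so far.
    static String joinWithPlus(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result += i; // a new String is built from the old contents every iteration
        }
        return result;
    }

    // Same output, but one growing buffer is reused across iterations.
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithPlus(5));    // prints 01234
        System.out.println(joinWithBuilder(5)); // prints 01234
    }
}
```

Both methods return the same string; only the amount of intermediate copying differs, which is why the loop form is the one worth searching for.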

Just make sure you really understand where it's actually better to use StringBuilder. I'm not saying you don't know, but there are certainly plenty of people who would take code like this:
String foo = "Your age is: " + getAge();
and turn it into:
StringBuilder builder = new StringBuilder("Your age is: ");
builder.append(getAge());
String foo = builder.toString();
which is just a less readable version of the same thing. Often the naive solution is the best solution. Likewise some people worry about:
String x = "long line" +
"another long line";
when actually that concatenation is performed at compile-time.
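One way to convince yourself of the compile-time folding is a reference comparison against the equivalent single literal. This is only a small sketch; the class name ConstantFolding and its members are invented for the example.

```java
public class ConstantFolding {

    // Both operands are literals, so the compiler folds this into one constant.
    static final String FOLDED = "long line" +
            "another long line";

    // True when FOLDED is the very same interned object as the single literal.
    static boolean foldedIsSameObject() {
        return FOLDED == "long lineanother long line";
    }

    public static void main(String[] args) {
        System.out.println(foldedIsSameObject()); // prints true
    }
}
```

The == comparison is used here deliberately, to compare object identity rather than content.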
As nsander's quite rightly said, find out if you've got a problem first...

Why not use a profiler to find the "naive" string concatenations that actually matter? Only switch over to the more verbose StringBuffer if you actually need it.

Chances are you will make your performance worse and your code less readable. The compiler already makes this optimization, and unless you are in a loop, it will generally do a better job. Furthermore, in JDK 8 they may come out with StringUberBuilder, and all your code which uses StringBuilder will run slower, while the "+" concatenated strings will benefit from the new class.
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth

IntelliJ can find these using "structural search". You search for "$a + $b" and set the characteristics of both $a and $b as type java.lang.String.
However, if you have IntelliJ, it likely has a built in inspection that will do a better job of finding what you want anyway.

I suggest using a profiler. This is really a performance question and if you can't make the code show up with reasonable test data there is unlikely to be any value in changing it.

Jon Skeet (as always) and the others have already said all that is needed but I would really like to emphasize that maybe you are hunting for a non existing performance improvement...
Take a look at this code:
public class StringBuilding {
public static void main(String args[]) {
String a = "The first part";
String b = "The second part";
String res = a+b;
System.gc(); // Inserted to make it easier to see "before" and "after" below
res = new StringBuilder().append(a).append(b).toString();
}
}
If you compile it and disassemble it with javap, this is what you get.
public static void main(java.lang.String[]);
Code:
0: ldc #2; //String The first part
2: astore_1
3: ldc #3; //String The second part
5: astore_2
6: new #4; //class java/lang/StringBuilder
9: dup
10: invokespecial #5; //Method java/lang/StringBuilder."<init>":()V
13: aload_1
14: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
17: aload_2
18: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
21: invokevirtual #7; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
24: astore_3
25: invokestatic #8; //Method java/lang/System.gc:()V
28: new #4; //class java/lang/StringBuilder
31: dup
32: invokespecial #5; //Method java/lang/StringBuilder."<init>":()V
35: aload_1
36: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
39: aload_2
40: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
43: invokevirtual #7; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
46: astore_3
47: return
As you can see, 6-21 are pretty much identical to 28-43. Not much of an optimization, right?
Edit: The loop issue is valid though...

Instead of searching for a bare +, search for "+ and +" (a plus sign directly next to a quote); those will probably find the vast majority. Cases where you are concatenating multiple variables will be tougher.

If you have a huge code base you probably have lots of hotspots, which may or may not involve "+" concatenation. Just run your usual profiler, and fix the big ones, regardless of what kind of construct they are.
It would be an odd approach to fix just one class of (potential) bottleneck, rather than fixing the actual bottlenecks.

With PMD, you can write rules with XPath or using a Java syntax. It might be worth investigating whether it can match the string concatenation operator—it certainly seems within the purview of static analysis. This is such a vague idea, I'm going to make this "community wiki"; if anyone else wants to elaborate (or create their own answer along these lines), please do!

Forget it - your JVM most likely does it already - see the JLS, 15.18.1.2 Optimization of String Concatenation:
An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.

Related

Should if(a&&b) take more time than if(a) if(b)?

I used the following simple logic to answer a question like this one:
1: if(a) // 1 operation
2: if (b) // 1 operation
and
1: if(a && b) // 1, 1(&&), 1 => 3 operations.
So, 2 operations versus 3, but in the first example the compiler needs to call another instruction to be executed.
Is this logic correct?
Does it depend on the compiler?
Does executing an empty statement such as a lone ; cost any noticeable time?
This also discusses the same problem, but without considering this logic.
Please help us clarify this issue.
There are two methods to answer such a question precisely:
1.) Look at the IL code and/or the assembly code produced and count the CPU cycles needed to execute it (hint: this is not for beginners).
2.) Build a small test program that executes both variants a large number of times, use StopWatch() to create useful and readable timing output, and run it several times.
3.) Speculate about what you think the compiler's optimization step is able to do and what it will actually do, and argue with others for hours.
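For Java, the second method could be sketched roughly as follows, using System.nanoTime() in place of a stopwatch class. Everything here (the class IfTiming, its methods, the iteration count) is an assumption made for illustration, and hand-rolled timings like this are easily distorted by JIT warm-up, so a dedicated harness such as JMH is the safer choice for real measurements.

```java
public class IfTiming {

    // Variant 1: nested if statements.
    static int nested(boolean a, boolean b) {
        int hits = 0;
        if (a)
            if (b)
                hits++;
        return hits;
    }

    // Variant 2: a single if with &&.
    static int combined(boolean a, boolean b) {
        int hits = 0;
        if (a && b)
            hits++;
        return hits;
    }

    // Runs one variant many times and returns the elapsed nanoseconds.
    static long time(boolean useNested, int iterations) {
        long start = System.nanoTime();
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            boolean a = (i & 1) == 0;
            boolean b = (i & 2) == 0;
            hits += useNested ? nested(a, b) : combined(a, b);
        }
        long elapsed = System.nanoTime() - start;
        if (hits < 0) throw new IllegalStateException(); // keeps the result in use
        return elapsed;
    }

    public static void main(String[] args) {
        int iterations = 10_000_000;
        // Several rounds, so that early (unoptimized) runs can be discarded.
        for (int round = 0; round < 5; round++) {
            System.out.println("nested:   " + time(true, iterations) + " ns");
            System.out.println("combined: " + time(false, iterations) + " ns");
        }
    }
}
```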
I assumed the compiler would produce the same byte code for your two cases. So I tested this with two different source files:
public class Test1 {
public static void main(String[] args) {
if (args[0].equals("a"))
if (args[1].equals("b"))
System.out.println("Foo");
}
}
and...
public class Test2 {
public static void main(String[] args) {
if (args[0].equals("a") && args[1].equals("b"))
System.out.println("Foo");
}
}
Inspecting their byte code with javap -c Test1 etc., the results are identical:
public static void main(java.lang.String[]);
Code:
0: aload_0
1: iconst_0
2: aaload
3: ldc #2 // String a
5: invokevirtual #3 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
8: ifeq 30
11: aload_0
12: iconst_1
13: aaload
14: ldc #4 // String b
16: invokevirtual #3 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
19: ifeq 30
22: getstatic #5 // Field java/lang/System.out:Ljava/io/PrintStream;
25: ldc #6 // String Foo
27: invokevirtual #7 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
30: return
Consequently, the performance would be identical. Although I welcome comments if anyone can think of an example where different byte code is produced.
My results are using Oracle's javac from Java 1.7. Results could be different with other compilers, although I suspect they won't be for this case.
There are two ways to think about your question:
the Java language definition
The language defines that the second example uses short-circuit evaluation: if a is false, b is never evaluated. This matters when the conditions themselves contain code (method calls, for example), although the nested if statements of the first example skip b in exactly the same case.
the JVM optimization
The JVM runtime will remove dead code blocks if it can prove they have no side effects.
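The short-circuit behaviour is easy to observe by counting how often the second condition is evaluated. The sketch below uses invented names (ShortCircuit, b, nested, combined) and is only meant to show the evaluation rule, not to measure speed.

```java
public class ShortCircuit {

    static int calls = 0;

    // Stand-in for the second condition; counts how often it is evaluated.
    static boolean b() {
        calls++;
        return true;
    }

    // if (a) if (b): returns how many times b() ran.
    static int nested(boolean a) {
        calls = 0;
        if (a)
            if (b()) { }
        return calls;
    }

    // if (a && b): returns how many times b() ran.
    static int combined(boolean a) {
        calls = 0;
        if (a && b()) { }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(nested(false));   // prints 0
        System.out.println(combined(false)); // prints 0
        System.out.println(nested(true));    // prints 1
        System.out.println(combined(true));  // prints 1
    }
}
```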

What is the difference when concatenating a String as a variable with a Character vs concatenating with an other String?

When I see something (pseudo one-liner) like this:
str1 + "a" + str2
Is it much worse (or better/equal) than the following (pseudo 1-liner)?
str1 + 'a' + str2
Update: Better example (by @QPaysTaxes) to reduce confusion regarding my original example.
What I tried:
Various things over the past 10 years of programming Java, but I never managed to really see what's under the hood. For example, I would assume the second is slightly "faster/better" because no String object is created for the slash character and/or Java's garbage collector has less to handle.
I once prepared for the Java certifications and might have been able to argue this better back then, but it seems that even though Java is my daily business, the "theory" has to be kept up to date as well. I know, without any better explanation than my assumption, that indexOf('c') should be used rather than indexOf("c"), and I wondered whether the same holds for String concatenation.
I also googled a bit, but as my title might imply, I am not very good at describing what I am looking for without an example. I am sorry for this, and for the possibility that this handicap just produced a duplicate.
What I will try:
Based on the accepted answer to "String concatenation: concat() vs "+" operator", I hope to get a start on seeing what's under the hood, and one day to be able to argue about and answer such questions that thoroughly.
Let's have a look at the generated bytecode when concatenating a String with a Character:
String str1 = "a" + "test";
String str2 = 'a' + "test";
0: ldc #2 // String atest
2: astore_1
3: ldc #2 // String atest
5: astore_2
As you can see, there is no difference; the compiler converts both to the same bytecode.
Now let's have a look at the generated bytecode when concatenating a Character to a String variable.
String str1 = "a" + str3; //str3 is a String
String str2 = 'a' + str3;
7: invokespecial #4 // Method java/lang/StringBuilder."<init>":()V
10: ldc #5 // String a
12: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
15: aload_1
16: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
19: invokevirtual #7 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
22: astore_2
23: new #3 // class java/lang/StringBuilder
26: dup
27: invokespecial #4 // Method java/lang/StringBuilder."<init>":()V
30: bipush 97
32: invokevirtual #8 // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
35: aload_1
36: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
39: invokevirtual #7 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
As you can see, there is a small difference.
10: ldc #5 // String a
ldc pushes constant #index from the constant pool (a String, int or float) onto the stack, whereas bipush 97 pushes the character value directly.
Therefore, if you are concatenating directly with a variable, concatenating a Character skips the constant-pool load and calls append(char) instead of append(String); that is what is under the hood.
As for performance, this won't make any significant difference, since the JIT compiler optimizes away most of the temporary objects, unless you disable the JIT compiler when running your program with -Djava.compiler=NONE.
I prefer to use "a" instead of 'a' to make sure the result is a String.
Consider this:
public static void main(String... args) {
String s = "foo";
int i = 1;
Object arg = s + '/' + i;
log(arg);
}
private static void log(Object... args) {
MessageFormat format = new MessageFormat("bar {0}");
String message = format.format(args);
System.out.println(message); // or write to a log or something
}
Assume you decide you don’t need s in the message anymore and change the third line in the main method to:
Object arg = '/' + i;
Then arg will contain just a number, because char + int does not concatenate but adds the values.
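The pitfall can be reproduced in isolation. In this sketch the class name CharPlusInt and its helper methods are hypothetical; the point is only the static type of each expression.

```java
public class CharPlusInt {

    // A String operand on the left makes every + a concatenation.
    static Object withString(String s, int i) {
        return s + '/' + i;
    }

    // No String operand: char + int is numeric addition ('/' has the value 47).
    static Object charOnly(int i) {
        return '/' + i;
    }

    // Using "/" instead of '/' keeps the result a String.
    static Object stringOnly(int i) {
        return "/" + i;
    }

    public static void main(String[] args) {
        System.out.println(withString("foo", 1)); // prints foo/1
        System.out.println(charOnly(1));          // prints 48
        System.out.println(stringOnly(1));        // prints /1
    }
}
```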
If you construct a filename, you will surely use it afterwards. In most cases that involves access to physical media, which is orders of magnitude slower than anything you can do wrong when concatenating your Strings. So do what is maintainable and don't worry about performance in this particular case.
My advice when building filenames is to use the File class or Path that will automatically make sure to get path separators right.
EDIT: As you point out in your comment, your question is about the general case. Just look at the source. StringBuilder.append(String) ends up doing a System.arraycopy() in String.getChars() whilst StringBuilder.append(char) directly copies a single character. So in theory, StringBuilder.append(char) will be faster.
However, you'd have to benchmark this to see if it makes any difference in practice.
I'm not sure if either of the options is better in terms of performance, but I can think of another issue to consider, that would prefer the first snippet.
The compiler can better protect you against typos if you append primitives instead of the String representation of those primitives.
Consider:
String plus10 = "plus" + 10;
If you type by mistake
String plus10 = "plus" + 1O;
The compiler will give you an error.
If, on the other hand, you type
String plus10 = "plus" + "1O";
The compiler will have no problem with that.
The same goes for appending chars
String plus = "x" + '++' + "y";
will not compile while
String plus = "x" + "++" + "y";
will pass compilation.
Of course it would be better to use constants and not hard coded values (and to append to a StringBuilder instead of using String concatenation), but even for the constants I would prefer primitive types over Strings, as they give you one more level of protection against errors.
Looking at the source code often helps to understand what is happening.
String s = s1 + s2
is compiled to roughly:
String s = new StringBuilder().append(s1).append(s2).toString();
Now look into the source code for append(char) and append(string) of the class StringBuilder:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/AbstractStringBuilder.java#AbstractStringBuilder.append%28char%29
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/AbstractStringBuilder.java#AbstractStringBuilder.append%28java.lang.String%29
You will see that append(string) performs more checks to see if the string is null or empty. However, you probably will not notice any difference.
There is actually no significant difference in performance. On average, string concatenation will take the same time either way.
Internally, however, the Java compiler replaces the + operator with StringBuilder at compile time.
So when you use the + operator with a char, the compiler converts it into a StringBuilder and uses .append(char). The same happens with a String, except that it uses .append(String).
And as I mentioned above, there is no difference on average. A simple test will show that the time difference is close to 0. So this is really a matter of readability. From a readability perspective, if you are concatenating strings, it's better to keep the types the same and use String even for single characters, rather than char.
This is what's under the hood: String str = s1 + "/"; essentially creates 2 new separate String objects (str and new String("/")).
This is no problem for small software, but think about it memory-wise if you were to create 2 String objects (keep in mind: an object takes one reference on the stack plus its contents on the heap) for n > 500,000 database entries.
Using single quotes, like String str = s1 + '/', results in a different process entirely. '/' stands for the numeric ASCII value of the single character written between the quotes. This operation has constant (O(1)) runtime (think instant array access) and will naturally be faster than creating and referencing objects.
As lots of people have suggested already, using a StringBuilder object for String concatenation is much easier on memory than building strings with the + operator.

Does System.out.println() create a String If we concatenate?

I was asked this question today in an interview.
Can someone explain the right answer to me?
Here is the code.
String s1= "hellow";
String s2= "Hellow again";
System.out.println(s1+s2);
How many strings are created in the above code?
I think it will be 3. Any suggestions?
The string literals "hellow" and "Hellow again" are in the string pool.
Now when we concatenate with s1 + s2, what really happens is the following:
new StringBuilder().append(s1).append(s2).toString()
which itself creates a new String (see for yourself). So it depends what you mean by "create"; if you're asking how many String objects exist at the very end of the snippet, then the answer is 3. But note that the string produced by s1+s2 is not retained and is therefore eligible to be garbage collected after it is printed.
This answer is essentially an extension of arshajii's answer
The answer all depends on what your interviewer(s) meant by "create", and technically also depends on what other code is present.
If you disassemble the bytecode generated by just that snippet, you get this:
public static void main(java.lang.String[]);
Code:
0: ldc #2 // String hellow
2: astore_1
3: ldc #3 // String Hellow again
5: astore_2
6: getstatic #4 // Field java/lang/System.out:Ljava/io/PrintStream;
9: new #5 // class java/lang/StringBuilder
12: dup
13: invokespecial #6 // Method java/lang/StringBuilder."<init>":()V
16: aload_1
17: invokevirtual #7 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
20: aload_2
21: invokevirtual #7 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
24: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
27: invokevirtual #9 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
30: return
Because "hellow" and "Hellow again" are string literals in the source code, they get placed into the constant pool at compile time, and so are present at program startup. As you can see, the strings "hellow" and "Hellow again" are simply loaded (ldc == load constant). They are not created by the above code snippet. The only String that's created is the one from the StringBuilder.
Now, if you declare the variables final, you get this:
public static void main(java.lang.String[]);
Code:
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #3 // String hellowHellow again
5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
Based on this, you can also argue that no String objects are created by the above code snippet, as the compiler can optimize this statement. This is probably not the answer that the interviewers were looking for, since it depends on final being present.
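The effect of final can also be observed without javap, by comparing the result against the equivalent literal with ==. This is a sketch with invented names (HowManyStrings and its two methods); it compares object identity on purpose.

```java
public class HowManyStrings {

    // Non-final variables: s1 + s2 is evaluated at run time and yields a new String.
    static boolean runtimeConcatIsLiteral() {
        String s1 = "hellow";
        String s2 = "Hellow again";
        String joined = s1 + s2;
        return joined == "hellowHellow again";
    }

    // Final variables with constant initializers: s1 + s2 is a compile-time constant.
    static boolean constantConcatIsLiteral() {
        final String s1 = "hellow";
        final String s2 = "Hellow again";
        String joined = s1 + s2;
        return joined == "hellowHellow again";
    }

    public static void main(String[] args) {
        System.out.println(runtimeConcatIsLiteral());  // prints false
        System.out.println(constantConcatIsLiteral()); // prints true
    }
}
```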
Strings are immutable, so s1+s2 creates a new String instance. If you want to avoid this you should use a StringBuffer.
In Java, a method's arguments are evaluated before the method is called, so the String concatenation is evaluated first. Therefore the expression in parentheses first creates a new String, which is then written to standard output.
My answer: 3
The answer is implementation dependent, and it also depends on the context ... and what you mean by creation.
The execution of those lines of code creates one String object for the concatenation, and possibly others within the println call [1]. These creations would happen each time your application executes those lines.
At least a further two String objects will be created when the code is loaded and String objects are created for the String literals. However, that is a once off ...
[1] In some class libraries, println(s) is implemented as print(s + newline). Hence the println call may create another String object.
And the right answer for an interview question is probably not "3" but some (but not necessarily all) of the discussion above. They'd definitely want you to know that + creates a string; probably want some awareness that the constants s1 and s2 are created/exist as compile-time constants; and may be suitably impressed if you knew about the effect of adding final. Such questions are often not about getting the right answer, but thinking about them in the right way.
3 is the answer.
But use a StringBuilder (as StringBuffer is just a thread-safe version of StringBuilder, which is useless here).
You should use the StringBuilder.append method, instead of the + operator.

String valueOf vs concatenation with empty string

I am working in Java code optimization. I'm unclear about the difference between String.valueOf or the +"" sign:
int intVar = 1;
String strVar = intVar + "";
String strVar = String.valueOf(intVar);
What is the difference between line 2 and 3?
public void foo(){
int intVar = 5;
String strVar = intVar+"";
}
This approach uses StringBuilder to create resultant String
public void foo();
Code:
0: iconst_5
1: istore_1
2: new #2; //class java/lang/StringBuilder
5: dup
6: invokespecial #3; //Method java/lang/StringBuilder."<init>":()V
9: iload_1
10: invokevirtual #4; //Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
13: ldc #5; //String
15: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
18: invokevirtual #7; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
21: astore_2
22: return
public void bar(){
int intVar = 5;
String strVar = String.valueOf(intVar);
}
This approach invokes simply a static method of String to get the String version of int
public void bar();
Code:
0: iconst_5
1: istore_1
2: iload_1
3: invokestatic #8; //Method java/lang/String.valueOf:(I)Ljava/lang/String;
6: astore_2
7: return
which in turn calls Integer.toString()
Ask yourself the purpose of the code. Is it to:
Concatenate an empty string with a value
Convert a value to a string
It sounds much more like the latter to me... which is why I'd use String.valueOf. Whenever you can make your code read in the same way as you'd describe what you want to achieve, that's a good thing.
Note that this works for all types, and will return "null" when passed a null reference rather than throwing a NullPointerException. If you're using a class (not an int as in this example) and you want it to throw an exception if it's null (e.g. because that represents a bug), call toString on the reference instead.
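The difference in null handling can be sketched as follows. The class NullConversion and its three helpers are hypothetical names; note that the argument is typed as Object so that the valueOf(Object) overload is the one being called.

```java
public class NullConversion {

    // String.valueOf(Object) maps a null reference to the text "null".
    static String viaValueOf(Object o) {
        return String.valueOf(o);
    }

    // Concatenation applies the same rule to a null operand.
    static String viaConcat(Object o) {
        return "" + o;
    }

    // Calling toString() on the reference itself fails fast on null.
    static String viaToString(Object o) {
        return o.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaValueOf(null)); // prints null
        System.out.println(viaConcat(null));  // prints null
        try {
            viaToString(null);
        } catch (NullPointerException e) {
            System.out.println("toString() threw NullPointerException");
        }
    }
}
```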
Using String.valueOf(int), or better, Integer.toString(int), is relatively more efficient for the machine. However, unless performance is critical (in which case I wouldn't suggest you use either), ""+ x is a much more efficient use of your time. IMHO, this is usually more important. Sometimes massively more important.
In other words, ""+ wastes an object, but Integer.toString() creates several anyway. Either your time is more important or you want to avoid creating objects at all costs. You are highly unlikely to be in the position that creating several objects is fine, but creating one more is not.
I'd prefer valueOf(), because I think it's more readable and explicit.
Any concerns about performance are micro-optimizations that wouldn't be measurable. I wouldn't worry about them until I could take a measurement and see that they made a difference.
Well, if you look into the JRE source code, Integer.getChars(...) is the most vital method which actually does the conversion from integer to char[], but it's a package-private method.
So the question is how to get this method called with minimum overhead.
Following is an overview of the 3 approaches by tracing the calls to our target method, please look into the JRE source code to understand this better.
"" + intVar compiles to :
new StringBuilder() => StringBuilder.append(int) => Integer.getChars(...)
String.valueOf(intVar) => Integer.toString(intVar) => Integer.getChars(...)
Integer.toString(intVar) => Integer.getChars(...)
The first method unnecessarily creates one extra object i.e. the StringBuilder.
The second simply delegates to third method.
So you have the answer now.
PS: Various compile time and runtime optimizations come into play here. So actual performance benchmarks may say something else depending on different JVM implementations which we can't predict, so I generally prefer the approach which looks efficient by looking at the source code.
The first line is equivalent to
String strVal = String.valueOf(intVar) + "";
so that there is some extra (and pointless) work to do. Not sure if the compiler optimizes away concatenations with empty string literals. If it does not (and looking at @Jigar's answer it apparently does not), this will in turn become
String strVal = new StringBuilder().append(String.valueOf(intVar))
.append("").toString();
So you should really be using String.valueOf directly.
From the point of view of optimization, I will always prefer String.valueOf() between the two. The first one is just a hack that tricks the + operator into converting intVar to a String.
Even though answers here are correct in general, there's one point that is not mentioned.
"" + intVar has better performance compared to String.valueOf() or Integer.toString(). So, if performance is critical, it's better to use empty string concatenation.
See this talk by Aleksey Shipilëv. Or these slides of the same talk (slide #24)
Concatenating Strings and other variables actually uses String.valueOf() (and StringBuilder) underneath, so the compiler will hopefully discard the empty String and produce the same bytecodes in both cases.
String strVar1 = intVar+"";
String strVar2 = String.valueOf(intVar);
strVar1 is equivalent to strVar2, but using int + "" is not an elegant way to do it.
Using valueOf is more efficient.

+ operator for String in Java [duplicate]

This question already has answers here:
How does the String class override the + operator?
I saw this question a few minutes ago, and decided to take a look in the java String class to check if there was some overloading for the + operator.
I couldn't find anything, but I know I can do this
String ab = "ab";
String cd = "cd";
String both = ab + cd; //both = "abcd"
Where's that implemented?
From the Fine Manual:
The Java language provides special support for the string concatenation operator ( + ), and for conversion of other objects to strings. String concatenation is implemented through the StringBuilder(or StringBuffer) class and its append method. String conversions are implemented through the method toString, defined by Object and inherited by all classes in Java. For additional information on string concatenation and conversion, see Gosling, Joy, and Steele, The Java Language Specification.
See String Concatenation in the JLS.
The compiler treats your code as if you had written something like:
String both = new StringBuilder().append(ab).append(cd).toString();
Edit: Any reference? Well, if I compile and decompile the OP's code, I get this:
0: ldc #2; //String ab
2: astore_1
3: ldc #3; //String cd
5: astore_2
6: new #4; //class java/lang/StringBuilder
9: dup
10: invokespecial #5; //Method java/lang/StringBuilder."<init>":()V
13: aload_1
14: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
17: aload_2
18: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
21: invokevirtual #7; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
24: astore_3
25: return
So, it's like I said.
Most of the answers here are correct (it's handled by the compiler, + is converted to .append()...)
I wanted to add that everyone should take a look at the source code for String and append at some point, it's pretty impressive.
I believe it came down to something like:
"a"+"b"+"c"
=
new StringBuilder().append("a").append("b").append("c")
But then some magic happens. This turns into:
Create a string array of length 3
copy a into the first position.
copy b into the second
copy c into the third
Whereas most people believe that it will create a 2 character array with "ab", and then throw it away when it creates a three character array with "abc". It actually understands that it's being chained and does some manipulation outside what you would assume if these were simple library calls.
There is also a trick where if you have the string "abc" and you ask for a substring that turns out to be "bc", they CAN share the exact same underlying array. You'll notice that there is a start position, end position and "shared" flag.
In fact, if it's not shared, it's possible for it to extend the length of a string array and copy the new characters in when appending.
Now I'm just being confusing. Read the source code--it's fairly cool.
Very Late Edit:
The part about sharing the underlying array isn't quite true any more. They had to de-optimize String a little because people were downloading giant strings, taking a tiny sub-string and keeping it. This was holding the entire underlying array in storage, it couldn't be GC'd until all sub-references were dropped.
It is handled by the compiler.
This is special behavior documented in the language specification.
15.18.1 String Concatenation Operator +
If only one operand expression is of type String, then string conversion is performed on the other operand to produce a string at run time. The result is a reference to a String object (newly created, unless the expression is a compile-time constant expression (§15.28)) that is the concatenation of the two operand strings. The characters of the left-hand operand precede the characters of the right-hand operand in the newly created string. If an operand of type String is null, then the string "null" is used instead of that operand.
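The rules quoted above can be checked with a few lines of Java. This is a minimal sketch; the class ConcatRules and its concat helper are names chosen for the example.

```java
public class ConcatRules {

    // left is a String; right may be any object and is converted to a string.
    static String concat(String left, Object right) {
        return left + right;
    }

    public static void main(String[] args) {
        // Left-hand characters come first.
        System.out.println(concat("ab", "cd"));  // prints abcd
        // String conversion is applied to the non-String operand.
        System.out.println(concat("n=", 42));    // prints n=42
        // A null String operand is replaced by the text "null".
        System.out.println(concat(null, "cd"));  // prints nullcd
    }
}
```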
It's done at the language level. The Java Language Specification is very specific about what string addition must do.
String is treated as a built-in type, just like int, double, float, etc., at the compiler level. Essentially, all compilers have operator overloading for their built-in types; it is just not available to developers in Java (unlike in C++).
Interestingly enough: This question was logged as a bug: http://bugs.sun.com/view_bug.do?bug_id=4905919
