This question has been asked many times on StackOverflow but none of them were based on performance.
In Effective Java book it's given that
If String s = new String("stringette"); occurs in a loop or in a
frequently invoked method, millions of String instances can be created
needlessly.
The improved version is simply the following:
String s = "stringette"; This version uses a single String instance, rather than
creating a new one each time it is executed.
So, I tried both and found significant improvement in performance:
for (int j = 0; j < 1000; j++) {
String s = new String("hello World");
}
takes about 399 372 nanoseconds.
for (int j = 0; j < 1000; j++) {
String s = "hello World";
}
takes about 23 000 nanoseconds.
Why is there so much performance improvement? Is there any compiler optimization happening inside?
In the first case, a new object is being created in each iteration, in the second case, it's always the same object, being retrieved from the String constant pool.
In Java, when you do:
String bla = new String("xpto");
You force the creation of a new String object, this takes up some time and memory.
On the other hand, when you do:
String muchMuchFaster = "xpto"; //String literal!
The String will only be created the first time (a new object), and it'll be cached in the String constant pool, so every time you refer to it in it's literal form, you're getting the exact same object, which is amazingly fast.
Now you may ask... what if two different points in the code retrieve the same literal and change it, aren't there problems bound to happen?!
No, because Strings, in Java, as you may very well know, are immutable! So any operation that would mutate a String returns a new String, leaving any other references to the same literal happy on their way.
This is one of the advantages of immutable data structures, but that's another issue altogether, and I would write a couple of pages on the subject.
Edit
Just a clarification, the constant pool isn't exclusive to String types, you can read more about it here, or if you google for Java constant pool.
http://docs.oracle.com/javase/specs/jvms/se7/jvms7.pdf
Also, a little test you can do to drive the point home:
String a = new String("xpto");
String b = new String("xpto");
String c = "xpto";
String d = "xpto";
System.out.println(a == b);
System.out.println(a == c);
System.out.println(c == d);
With all this, you can probably figure out the results of these Sysouts:
false
false
true
Since c and d are the same object, the == comparison holds true.
The performance difference is in fact much greater: HotSpot has an easy time compiling the entire loop
for (int j = 0; j < 1000; j++)
{String s="hello World";}
out of existence so the runtime is a solid 0. This, however, happens only after the JIT compiler kicks in; that's what warmup is for, a mandatory procedure when microbenchmarking anything on the JVM.
This is the code I ran:
public static void timeLiteral() {
for (int j = 0; j < 1_000_000_000; j++)
{String s="hello World";}
}
public static void main(String... args) {
for (int i = 0; i < 10; i++) {
final long start = System.nanoTime();
timeLiteral();
System.out.println((System.nanoTime() - start) / 1000);
}
}
And this is a typical output:
1412
38
25
1
1
0
0
1
0
1
You can observe the JIT taking effect very soon.
Note that I don't iterate one thousand, but one billion times in the inner method.
as already have been answered the second retrieves the instance from the String pool (remember Strings are immutable).
Additionally you should check the intern() method which enables you to put new String() into a pool in case you do not know the constant value of the string in runtime: e.g:
String s = stringVar.intern();
or
new String(stringVar).intern();
I will add additional fact, you should know that additionally to the String object more info exist in the pool (the hashcode): this enables fast hashMap search by String in the relevant data Strtuctures (instead of recreating the hashcode each time)
The JVM maintains a pool of references to unique String objects that are literals. In your new String example you are wrapping the literals with an instance of each.
See http://www.precisejava.com/javaperf/j2se/StringAndStringBuffer.htm
Related
I am confused between two codes, why the second one I am going to give here is more efficient than the first one.
Both of the codes just reverse a String, but first code is slower than the other and I am not able to understand why.
The first code is:
String reverse1(String s) {
String answer = "";
for(int j = s.length() - 1; j >= 0; j--) {
answer += s.charAt(j);
}
return answer;
}
The second code is:
String reverse2(String s) {
char answer[] = new char[s.length()];
for(int j = s.length() - 1; j >= 0; j--) {
answer[s.length() - j - 1] = s.charAt(j);
}
return new String(answer);
}
And I'm not able to understand how the second code is more efficient than the first one, I'd appreciate any insight on this.
The first code declares
String answer;
Strings are immutable. Therefore, every append operation reallocates the entire string, copies it, then copies in the new character.
The second code declares
char answer[];
Arrays are mutable, so each iteration copies only a single character. The final string is created once, not once per iteration of the loop.
Your question is perhaps difficult to answer exactly, in part because the answer would depend on the actual implementation of the first version. This, in turn, would depend on what version of Java you are using, and what the compiler decided to do.
Assuming that the compiler keeps the first version verbatim as you wrote it, then yes, the first version might be more inefficient, because it would require allocating a new string for each step in the reversal process. The second version, on the contrary, just maintains a single array of characters.
However, if the compiler is smart enough to use a StringBuilder, then the answer changes. Consider the following first version:
String reverse1(String s) {
StringBuilder answer = new StringBuilder();
for (int j = s.length() - 1; j >= 0; j--)
answer.append(s.charAt(j));
return answer;
}
Under the hood, StringBuilder is implemented using a character array. So calling StringBuilder#append is somewhat similar to the second version, i.e. it just adds new characters to the end of the buffer.
So, if your first version executes using literal String, then it is more inefficient than the second version, but using StringBuilder it might be on par with the second version.
String is immutable. Whenever you do answer += s.charAt(j); it creates a new object. Try printing GC logs using -XX:+PrintGCDetails and see if the latency is caused by minor GC.
String object is immutable and every time you made an add operation you create another object, allocating space and so on, so it's quite inefficient when you need to concatenate many strings.
Your char array method fits your specific need well, but if you need more generic string concatenation support, you could consider StringBuilder
In this code you are creating a new String object in each loop iteration,because String is immutable class
String reverse1(String s) {
String answer = "";
for (int j = s.length() - 1; j >= 0; j--)
answer += s.charAt(j);
return answer;
}
In this code you have already allocated memory to char array,Your code will create only single String at last line, so it is more efficient
String reverse2(String s) {
char answer[] = new char[s.length()];
for (int j = s.length() - 1; j >= 0; j--)
answer[s.length() - j - 1] = s.charAt(j);
return new String(answer);
}
Why is the second code more efficient than the first one?
String is immuable, by answer += s.charAt(j); you are creating a new instance of String in each loop, which makes your code slow.
Instead of String, you are suggested to use StringBuilder in a single thread context, for both performance and readablity(might be a little slower than fix-sized char array but has a better readablity):
String reverse1(String s) {
StringBuilder answer = new StringBuilder("");
for (int j = s.length() - 1; j >= 0; j--)
answer.append(s.charAt(j));
return answer.toString();
}
The JVM treats strings as immutable. Hence, every time you append to the existing string, you are actually create a new string! This means that a new string object has to be created in heap for every loop iteration. Creating an object and maintaining its lifecycle has its overhead. Add to that the garbage collection of the discarded strings (the string created in the previous iteration won't have a reference to it in the next, and hence, it is collected by the JVM).
You should consider using a StringBuilder. I ran some tests and the time taken by the StringBuilder code is not much smaller than that of the fixed-length array.
There are some nuances to how the JVM treats strings. There are things like string interning that the JVM does so that it does not have to create a new object for multiple strings with the same content. You might want to look into that.
I was always told strings in java are immutable, unless your going to use the string builder class or string writter class.
Take a look at this practice question I found online,
Given a string and a non-negative int n, return a larger string that is n copies of the original string.
stringTimes("Hi", 2) → "HiHi"
stringTimes("Hi", 3) → "HiHiHi"
stringTimes("Hi", 1) → "Hi"
and the solution came out to be
Solution:
public String stringTimes(String str, int n) {
String result = "";
for (int i=0; i<n; i++) {
result = result + str; // could use += here
}
return result;
}
As you see in the solution our 3rd line assigns the string , then we change it in our for loop. This makes no sense to me! (I answered the question in another nooby way ) Once I saw this solution I knew I had to ask you guys.
Thoughts? I know im not that great at programming but I haven't seen this type of example here before, so I thought I'd share.
The trick to understanding what's going on is the line below:
result = result + str;
or its equivalent
result += str;
Java compiler performs a trick on this syntax - behind the scene, it generates the following code:
result = result.concat(str);
Variable result participates in this expression twice - once as the original string on which concat method is called, and once as the target of an assignment. The assignment does not mutate the original string, it replaces the entire String object with a new immutable one provided by concat.
Perhaps it would be easier to see if we introduce an additional String variable into the code:
String temp = result.concat(str);
result = temp;
Once the first line has executed, you have two String objects - temp, which is "HiHi", and result, which is still "Hi". When the second line is executed, result gets replaced with temp, acquiring a new value of "HiHi".
If you use Eclipse, you could make a breakpoint and run it step by step. You will find the id (find it in "Variables" View) of "result" changed every time after java did
result = result + str;
On the other hand, if you use StringBuffer like
StringBuffer result = new StringBuffer("");
for(int i = 0; i < n; i++){
result.append(str);
}
the id of result will not change.
String objects are indeed immutable. result is not a String, it is a reference to a String object. In each iteration, a new String object is created and assigned to the same reference. The old object with no reference is eventually destroyed by a garbage collector. For a simple example like this, it is a possible solution. However, creating a new String object in each iteration in a real-world application is not a smart idea.
I am try to execute below lines of code. My understanding was both both #1 and #2 should generate string in String pool and hence there should not be any difference in both the executions, but when I analysed the Heap dump, in case of intern() string were being generated in String pool(can be interpreted by limited number of string objects) but in case of #1 String are being generated on Heap(as large number of string objects are there in heap dump) and system is going out of memory faster than the previous case. Can somebody explain why it is so? I am using java 6 to execute below lines of code.
import java.util.LinkedList;
public class LotsOfStrings {
private static final LinkedList<String> LOTS_OF_STRINGS = new LinkedList<String>();
public static void main(String[] args) throws Exception {
int iteration = 0;
while (true) {
for (int i = 0; i < 100; i++) {
for (int j = 0; j < 1000; j++) {
String s= "String " + j;
LOTS_OF_STRINGS.add(s); // #1
//LOTS_OF_STRINGS.add(new String("String " + j).intern()); //#2
}
}
iteration++;
System.out.println("Survived Iteration: " + iteration);
Thread.sleep(100);
}
}
Heap dump object screenshot in case if intern
intern
Heap dump object screenshot in case of #1
string
If you create a String without interning it, it just goes to the heap. So there can be multiple copies of equal strings. If you intern the string, there will be only one string for each equality class.
Creating the string "String" + j multiple times for the same j is much more memory consuming without interning the strings.
Interning saves memory, but it also can slow down the program, because every string is held in some kind of HashSet and creating a string implies looking up if it already exists in that HashSet.
Note: Some strings are interned automatically, e.g. String literals in Source Code.
In the past, I've always used printf to format printing to the console but the assignment I currently have (creating an invoice report) wants us to use StringBuilder, but I have no idea how to do so without simply using " " for every gap needed. For example... I'm supposed to print this out
Invoice Customer Salesperson Subtotal Fees Taxes Discount Total
INV001 Company Eccleston, Chris $ 2357.60 $ 40.00 $ 190.19 $ -282.91 $ 2304.88
But I don't know how to get everything to line up using the StringBuilder. Any advice?
StringBuilder aims to reduce the overhead associated with creating strings.
As you may or may not know, strings are immutable. What this means that something like
String a = "foo";
String b = "bar";
String c = a + b;
String d = c + c;
creates a new string for each line. If all we are concerned about is the final string d, the line with string c is wasting space because it creates a new String object when we don't need it.
String builder simply delays actually building the String object until you call .toString(). At that point, it converts an internal char[] to an actual string.
Let's take another example.
String foo() {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++)
sb.append(i);
return sb.toString();
}
Here, we only create one string. StringBuilder will keep track of the chars you have added to your string in its internal char[] value. Note that value.length will generally be larger than the total chars you have added to your StringBuilder, but value might run out of room for what you're appending if the string you are building gets too big. When that happens, it'll resize, which just means replacing value with a larger char[], and copying over the old values to the new array, along with the chars of whatever you appended.
Finally, when you call sb.toString(), the StringBuilder will call a String constructor that takes an argument of a char[].
That means only one String object was created, and we only needed enough memory for our char[] and to resize it.
Compare with the following:
String foo() {
String toReturn = "";
for (int i = 0; i < 100; i++)
toReturn += "" + i;
toReturn;
}
Here, we have 101 string objects created (maybe more, I'm unsure). We only needed one though! This means that at every call, we're disposing the original string toReturn represented, and creating another string.
With a large string, especially, this is very expensive, because at every call you need to first acquire as much memory as the new string needs, and dispose of as much memory as the old string had. It's not a big deal when things are kept short, but when you're working with entire files this can easily become a problem.
In a nutshell: if you're working appending / removing information before finalizing an output: use a StringBuilder. If your strings are very short, I think it is OK to just concatenate normally for convenience, but this is up to you to define what "short" is.
I'm learning Java and am wondering what's the best way to modify strings here (both for performance and to learn the preferred method in Java). Assume you're looping through a string and checking each character/performing some action on that index in the string.
Do I use the StringBuilder class, or convert the string into a char array, make my modifications, and then convert the char array back to a string?
Example for StringBuilder:
StringBuilder newString = new StringBuilder(oldString);
for (int i = 0; i < oldString.length() ; i++) {
newString.setCharAt(i, 'X');
}
Example for char array conversion:
char[] newStringArray = oldString.toCharArray();
for (int i = 0; i < oldString.length() ; i++) {
myNameChars[i] = 'X';
}
myString = String.valueOf(newStringArray);
What are the pros/cons to each different way?
I take it that StringBuilder is going to be more efficient since the converting to a char array makes copies of the array each time you update an index.
I say do whatever is most readable/maintainable until you you know that String "modification" is slowing you down. To me, this is the most readable:
Sting s = "foo";
s += "bar";
s += "baz";
If that's too slow, I'd use a StringBuilder. You may want to compare this to StringBuffer. If performance matters and synchronization does not, StringBuilder should be faster. If sychronization is needed, then you should use StringBuffer.
Also it's important to know that these strings are not being modified. In java, Strings are immutable.
This is all context specific. If you optimize this code and it doesn't make a noticeable difference (and this is usually the case), then you just thought longer than you had to and you probably made your code more difficult to understand. Optimize when you need to, not because you can. And before you do that, make sure the code you're optimizing is the cause of your performance issue.
What are the pros/cons to each different way. I take it that StringBuilder is going to be more efficient since the convering to a char array makes copies of the array each time you update an index.
As written, the code in your second example will create just two arrays: one when you call toCharArray(), and another when you call String.valueOf() (String stores data in a char[] array). The element manipulations you are performing should not trigger any object allocations. There are no copies being made of the array when you read or write an element.
If you are going to be doing any sort of String manipulation, the recommended practice is to use a StringBuilder. If you are writing very performance-sensitive code, and your transformation does not alter the length of the string, then it might be worthwhile to manipulate the array directly. But since you are learning Java as a new language, I am going to guess that you are not working in high frequency trading or any other environment where latency is critical. Therefore, you are probably better off using a StringBuilder.
If you are performing any transformations that might yield a string of a different length than the original, you should almost certainly use a StringBuilder; it will resize its internal buffer as necessary.
On a related note, if you are doing simple string concatenation (e.g, s = "a" + someObject + "c"), the compiler will actually transform those operations into a chain of StringBuilder.append() calls, so you are free to use whichever you find more aesthetically pleasing. I personally prefer the + operator. However, if you are building up a string across multiple statements, you should create a single StringBuilder.
For example:
public String toString() {
return "{field1 =" + this.field1 +
", field2 =" + this.field2 +
...
", field50 =" + this.field50 + "}";
}
Here, we have a single, long expression involving many concatenations. You don't need to worry about hand-optimizing this, because the compiler will use a single StringBuilder and just call append() on it repeatedly.
String s = ...;
if (someCondition) {
s += someValue;
}
s += additionalValue;
return s;
Here, you'll end up with two StringBuilders being created under the covers, but unless this is an extremely hot code path in a latency-critical application, it's really not worth fretting about. Given similar code, but with many more separate concatenations, it might be worth optimizing. Same goes if you know the strings might be very large. But don't just guess--measure! Demonstrate that there's a performance problem before you try to fix it. (Note: this is just a general rule for "micro optimizations"; there's rarely a downside to explicitly using a StringBuilder. But don't assume it will make a measurable difference: if you're concerned about it, you should actually measure.)
String s = "";
for (final Object item : items) {
s += item + "\n";
}
Here, we're performing a separate concatenation operation on each loop iteration, which means a new StringBuilder will be allocated on each pass. In this case, it's probably worth using a single StringBuilder since you may not know how large the collection will be. I would consider this an exception to the "prove there's a performance problem before optimizing rule": if the operation has the potential to explode in complexity based on input, err on the side of caution.
Which option will perform the best is not an easy question.
I did a benchmark using Caliper:
RUNTIME (NS)
array 88
builder 126
builderTillEnd 76
concat 3435
Benchmarked methods:
public static String array(String input)
{
char[] result = input.toCharArray(); // COPYING
for (int i = 0; i < input.length(); i++)
{
result[i] = 'X';
}
return String.valueOf(result); // COPYING
}
public static String builder(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result.toString(); // COPYING
}
public static StringBuilder builderTillEnd(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result;
}
public static String concat(String input)
{
String result = "";
for (int i = 0; i < input.length(); i++)
{
result += 'X'; // terrible COPYING, COPYING, COPYING... same as:
// result = new StringBuilder(result).append('X').toString();
}
return result;
}
Remarks
If we want to modify a String, we have to do at least 1 copy of that input String, because Strings in Java are immutable.
java.lang.StringBuilder extends java.lang.AbstractStringBuilder. StringBuilder.setCharAt() is inherited from AbstractStringBuilder and looks like this:
public void setCharAt(int index, char ch) {
if ((index < 0) || (index >= count))
throw new StringIndexOutOfBoundsException(index);
value[index] = ch;
}
AbstractStringBuilder internally uses the simplest char array: char value[]. So, result[i] = 'X' is very similar to result.setCharAt(i, 'X'), however the second will call a polymorphic method (which probably gets inlined by JVM) and check bounds in if, so it will be a bit slower.
Conclusions
If you can operate on StringBuilder until the end (you don't need String back) - do it. It's the preferred way and also the fastest. Simply the best.
If you want String in the end and this is the bottleneck of your program, then you might consider using char array. In benchmark char array was ~25% faster than StringBuilder. Be sure to properly measure execution time of your program before and after optimization, because there is no guarantee about this 25%.
Never concatenate Strings in the loop with + or +=, unless you really know what you do. Usally it's better to use explicit StringBuilder and append().
I'd prefer to use StringBuilder class where original string is modified.
For String manipulation, I like StringUtil class. You'll need to get Apache commons dependency to use it