Why is StringBuilder much faster than String?

Why is StringBuilder much faster than string concatenation using the + operator, even though the + operator is internally implemented using either StringBuffer or StringBuilder?
public void shortConcatenation(){
    String character = ""; // the string that is grown with +=
    long startTime = System.currentTimeMillis();
    while (System.currentTimeMillis() - startTime <= 1000){
        character += "Y";
    }
    System.out.println("short: " + character.length());
}
//// using String builder
public void shortConcatenation2(){
    long startTime = System.currentTimeMillis();
    StringBuilder sb = new StringBuilder();
    while (System.currentTimeMillis() - startTime <= 1000){
        sb.append("Y");
    }
    System.out.println("string builder short: " + sb.length());
}
I know that there are a lot of similar questions posted here, but these don't really answer my question.

Do you understand how it works internally?
Every time you do stringA += stringB; a new string is created and assigned to stringA, so it consumes memory (a new string instance!) and time (copying the old string plus the new characters of the other string).
StringBuilder will use an array of characters internally and when you use the .append() method it will do several things:
It checks whether there is enough free space in the array for the string being appended.
After some further internal checks, it runs System.arraycopy to copy the characters of the string into the array.
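The two append steps above can be sketched with a toy buffer. This is only a simplified illustration, not the JDK source: the class name TinyBuilder, the initial capacity of 16 and the "double plus two" growth policy are assumptions made for the example.

```java
import java.util.Arrays;

// Toy append-only character buffer (hypothetical class, not the JDK implementation).
class TinyBuilder {
    private char[] value = new char[16]; // assumed initial capacity
    private int count = 0;               // number of characters actually in use

    TinyBuilder append(String str) {
        int len = str.length();
        // Step 1: check for free space and grow the array only when it is too small.
        if (count + len > value.length) {
            int newCapacity = Math.max(value.length * 2 + 2, count + len);
            value = Arrays.copyOf(value, newCapacity);
        }
        // Step 2: copy the new characters into the free tail of the array.
        str.getChars(0, len, value, count);
        count += len;
        return this;
    }

    int length() {
        return count;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }

    public static void main(String[] args) {
        TinyBuilder tb = new TinyBuilder();
        for (int i = 0; i < 40; i++) {
            tb.append("Y");
        }
        System.out.println(tb.length());
    }
}
```

Note that no new object is created per append here; a new array is only allocated on the rare occasions when the old one is full.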
Personally, I think the allocation of a new string every time (creating a new instance of string, put the string, etc.) could be very expensive in terms of memory and speed (in while/for, etc. especially).
In your example, using a StringBuilder is better, but if you need something simple, for example a toString() like this,
public String toString() {
return StringA + " - " + StringB;
}
it makes no difference (in fact, in this case it is better to avoid the StringBuilder overhead, which is useless here).

Strings in Java are immutable. This means that methods that operate on strings cannot ever change the value of a string. String concatenation using += works by allocating memory for an entirely new string that is the concatenation of the two previous ones, and replacing the reference with this new string. Each new concatenation requires the construction of an entirely new String object.
In contrast, the StringBuilder and StringBuffer classes are implemented as a mutable sequence of characters. This means that as you append new Strings or characters onto a StringBuilder, it simply updates its internal array to reflect the changes you've made. This means that new memory is only allocated when the string grows past the buffer already existing in a StringBuilder.
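To see how rarely a StringBuilder actually allocates, here is a rough sketch that watches StringBuilder.capacity() while appending. The helper name countResizes is made up for this example, and the exact growth policy is an implementation detail of the JDK, so only the general trend should be relied on.

```java
class CapacityDemo {
    // Counts how often the builder's capacity changes while n characters are appended.
    static int countResizes(int n) {
        StringBuilder sb = new StringBuilder(); // documented default capacity is 16
        int resizes = 0;
        int lastCapacity = sb.capacity();
        for (int i = 0; i < n; i++) {
            sb.append('Y');
            if (sb.capacity() != lastCapacity) {
                resizes++;
                lastCapacity = sb.capacity();
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println("resizes for 16 appends:   " + countResizes(16));
        System.out.println("resizes for 1000 appends: " + countResizes(1000));
    }
}
```

With += the same loop would create a new String on every single iteration, while the builder reallocates only a handful of times.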

I can list a very nice example for understanding the same (I mean I felt it's a nice example). Check the code here taken from a LeetCode problem: https://leetcode.com/problems/remove-outermost-parentheses/
1: Using String
public String removeOuterParentheses(String S) {
String a = "";
int num = 0;
for(int i=0; i < S.length()-1; i++) {
if(S.charAt(i) == '(' && num++ > 0) {
a += "(";
}
if(S.charAt(i) == ')' && num-- > 1) {
a += ")";
}
}
return a;
}
And now, using StringBuilder.
public String removeOuterParentheses(String S) {
StringBuilder sb = new StringBuilder();
int a = 0;
for(char ch : S.toCharArray()) {
if(ch == '(' && a++ > 0) sb.append('(');
if(ch == ')' && a-- > 1) sb.append(')');
}
return sb.toString();
}
The performance of the two differs by a huge margin. The first submission uses String, while the second uses StringBuilder.
As explained above, the theory is the same. String is immutable, i.e. its state cannot be changed. The String version is therefore expensive, because a new string is allocated whenever a concatenation function or "+" is used; it consumes a lot of heap and is slower in return. In comparison, StringBuilder is mutable: it only appends and does not put that extra load on memory.
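The difference can be estimated with a little arithmetic. The sketch below is a simplified cost model, not a measurement: it assumes that each += copies the whole old string plus one new character, and that the buffer doubles when it is full. The class and method names are invented for the example.

```java
class CopyCostModel {
    // Characters copied when a string is grown one character at a time with +=:
    // step k copies k characters, so the total is n * (n + 1) / 2.
    static long copiedByConcat(int n) {
        long total = 0;
        for (int len = 1; len <= n; len++) {
            total += len;
        }
        return total;
    }

    // Characters copied by a buffer that doubles when full (assumed policy):
    // each appended character is written once, plus one full copy per resize.
    static long copiedByDoublingBuffer(int n, int initialCapacity) {
        long total = 0;
        int capacity = initialCapacity;
        for (int len = 1; len <= n; len++) {
            if (len > capacity) {
                total += len - 1; // existing contents move to the larger array
                capacity *= 2;
            }
            total += 1;           // the newly appended character
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("+= copies:      " + copiedByConcat(n));
        System.out.println("buffer copies:  " + copiedByDoublingBuffer(n, 16));
    }
}
```

Under these assumptions the += version grows quadratically with the number of appended characters, while the buffer version stays roughly linear.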

Related

StringBuilder - Append 2 char's or 1 String

When working with a StringBuilder, I often append 2 char values to a StringBuilder using StringBuilder#append(char) twice, rather than StringBuilder#append(String).
I.e.:
StringBuilder builder = new StringBuilder();
builder.append(' ').append('t'); // would append(" t") work better here?
return builder.toString();
I would like to know:
Which approach is better performance-wise
Which approach is more common and why
I have already read through Using character instead of String for single-character values in StringBuffer append but it does not answer my question.
That question pertains to whether appending a single character (append('c')) is better than a single-character string (append("c")). I already understand why appending a single character is better than a single-character string, but I do not know whether appending a two-character string (append("ab")) is better than twice appending each character (append('a').append('b')).

In my testing, both of them seemed to take about the same time; however, appending a string might be slightly slower (maybe 10 or so nanoseconds).
That said, appending a string is much more popular, as it's easier to use and understand.

This one is really interesting to figure out.
As we all know, arrays are fast, and internally String uses a character array to store its values.
Internally, both methods call the super.append(XXX) method of the AbstractStringBuilder class.
Here is the code of append in AbstractStringBuilder for String and CharSequence:
public AbstractStringBuilder append(String str) {
if (str == null) str = "null";
int len = str.length();
ensureCapacityInternal(count + len);
str.getChars(0, len, value, count);
count += len;
return this;
}
public AbstractStringBuilder append(CharSequence s, int start, int end) {
if (s == null)
s = "null";
if ((start < 0) || (start > end) || (end > s.length()))
throw new IndexOutOfBoundsException(
"start " + start + ", end " + end + ", s.length() "
+ s.length());
int len = end - start;
ensureCapacityInternal(count + len);
for (int i = start, j = count; i < end; i++, j++)
value[j] = s.charAt(i);
count += len;
return this;
}
These are the methods that are called internally when you call append.
Both methods call ensureCapacityInternal to expand the array if needed, so we can leave that call aside.
Now, the main difference comes in the next line of code.
The method with a String argument calls getChars, which internally calls System.arraycopy. That is a native method, so its cost cannot be read off from the Java source; it depends on the OS/JVM.
The CharSequence method uses a for loop over the length of the input CharSequence:
for (int i = start, j = count; i < end; i++, j++)
value[j] = s.charAt(i);
i.e. its complexity depends on the length of the input.
The other posts I have read about System.arraycopy all say that it is more efficient than copying an array in a loop, and so does the book Effective Java.
So, in my opinion, if the input is short, use the CharSequence method. Why burden the JVM for a short String?
If you have a long string, such as a sentence, go for the method with a String argument. Also remember that space complexity increases in this case: String is immutable, so you are creating more Strings every time. String.valueOf() and (String) obj are examples.
Edited:
public AbstractStringBuilder append(char c) {
    ensureCapacityInternal(count + 1);
    value[count++] = c;
    return this;
}
This method is used when the argument is a char.
It seems to be faster than the others, because it is a plain assignment at index count++ of the char array. The only System.arraycopy involved is the one inside ensureCapacityInternal, which is common to all the other methods as well.
Hope this will help. :)
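For reference, here is a small sketch of how the calls from the question map onto the overloads discussed above. The class and method names are made up for the example; it only shows that all three variants produce the same content and says nothing about their relative speed, which would have to be measured.

```java
class AppendOverloads {
    // Uses append(char) twice.
    static String twoChars() {
        return new StringBuilder().append(' ').append('t').toString();
    }

    // Uses append(String) once.
    static String oneString() {
        return new StringBuilder().append(" t").toString();
    }

    // Uses append(CharSequence, int, int), the loop-based overload.
    static String charSequenceRange() {
        CharSequence cs = " t";
        return new StringBuilder().append(cs, 0, cs.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(twoChars().equals(oneString()));
        System.out.println(oneString().equals(charSequenceRange()));
    }
}
```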

Why does using arrays instead of string give less memory consumption and time executing?

You are given a string in the form of a char array. Modify it so that all the exclamation mark symbols '!' are shifted to the start of the array and all others stay in the same order. Please write a method with a single argument of type char[]. Focus on both the memory and the time consumption of the algorithm.
Feedback that I received: it was possible to work with arrays instead of strings. Where can I find info about memory?
public static String formatString(char[] chars) {
StringBuilder exclamationSymbols = new StringBuilder();
StringBuilder otherSymbols = new StringBuilder();
for (char c : chars) {
if (c == '!') {
exclamationSymbols.append(c);
} else {
otherSymbols.append(c);
}
}
return (exclamationSymbols.toString() + otherSymbols.toString());
}

You can do this faster using a char[] than a StringBuilder because:
a StringBuilder is just a wrapper around a char[], so there's no way it can be faster. The indirection means it will be slower.
you know exactly how long the result will be, so you can allocate the minimum-sized char[] that you'll need. A single StringBuilder can be pre-sized, but two StringBuilders cannot both be sized exactly, so you either have to over-allocate (e.g. make both the same length as chars) or rely on StringBuilder resizing itself internally (which is slower than not resizing, and uses more memory).
My idea would be to use two integer pointers to point to the next position that you'll write a char to in the string: one starts at the start of the array, the other starts at the end; as you work your way through the input, the two pointers will move closer together.
Once you've processed the entire input, the portion of the result array corresponding to the "end pointer" will be backwards, so reverse it.
You can do it like this:
char[] newChars = new char[chars.length];
int left = 0;
int right = chars.length;
for (char c : chars) {
if (c == '!') {
newChars[left++] = c;
} else {
newChars[--right] = c;
}
}
// Reverse the "otherSymbols".
for (int i = right, j = newChars.length - 1; i < j; ++i, --j) {
char tmp = newChars[i];
newChars[i] = newChars[j];
newChars[j] = tmp;
}
return new String(newChars);
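If the input array may be modified, a variant that needs no second array is also possible. The sketch below is not the approach from the answer above but an alternative: it scans from right to left, packs the non-'!' characters against the end in their original order, and then fills the front with '!'. The class and method names are hypothetical.

```java
class ExclamationShift {
    // Moves every '!' to the front of the array in place; other characters keep their order.
    static void shiftExclamations(char[] chars) {
        int write = chars.length - 1;
        // Walk from the right and pack all non-'!' characters against the end.
        for (int read = chars.length - 1; read >= 0; read--) {
            if (chars[read] != '!') {
                chars[write--] = chars[read];
            }
        }
        // Every position that is still free at the front belongs to a '!'.
        while (write >= 0) {
            chars[write--] = '!';
        }
    }

    public static void main(String[] args) {
        char[] input = "a!b!c".toCharArray();
        shiftExclamations(input);
        System.out.println(new String(input));
    }
}
```

Because all '!' characters are identical, overwriting the front of the array with them loses no information, so no reversal step is needed.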

Memory efficient method String suffix

I have a task which is to create a memory efficient method that takes a String consisting of numbers and removes any beginning zeros.
For instance "001112" becomes "1112".
public static String hej (String v)
{
StringBuilder h = new StringBuilder(v);
while(true)
{
if (h.charAt(0) == '0')
h.deleteCharAt(0);
else
break;
}
return h.toString();
}
This is my solution. Of course it does the job, but my question is: is it memory efficient to use the StringBuilder, or is it more efficient to use the String itself, for instance with v.substring()? I can't find much information about which is more efficient. If anyone has links to some documentation, please share them.
Cheers

Using the String.substring(int) method uses the least memory:
public static String hej(String input)
{
int i;
for(i = 0; i < input.length(); i++)
if(input.charAt(i) != '0')
break;
return input.substring(i);
}
Source code from String:
public String substring(int beginIndex) {
if (beginIndex < 0) {
throw new StringIndexOutOfBoundsException(beginIndex);
}
int subLen = value.length - beginIndex;
if (subLen < 0) {
throw new StringIndexOutOfBoundsException(subLen);
}
return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
}
This calls the String(char[], int, int) constructor
public String(char value[], int offset, int count) {
if (offset < 0) {
throw new StringIndexOutOfBoundsException(offset);
}
if (count <= 0) {
if (count < 0) {
throw new StringIndexOutOfBoundsException(count);
}
if (offset <= value.length) {
this.value = "".value;
return;
}
}
// Note: offset or count might be near -1>>>1.
if (offset > value.length - count) {
throw new StringIndexOutOfBoundsException(offset + count);
}
this.value = Arrays.copyOfRange(value, offset, offset+count);
}
Using a StringBuilder costs some extra memory, because the StringBuilder is created for the size of the input, while String.substring(int) uses only as much memory as is needed to represent the modified input.
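Here is a small side-by-side sketch of both variants with their edge cases, in case it helps. The class and method names are invented for the example. The StringBuilder variant has a length guard added, because calling charAt(0) on a builder that has become empty (input "000") throws an IndexOutOfBoundsException.

```java
class LeadingZeros {
    // substring-based variant: one scan and at most one new String.
    static String stripWithSubstring(String input) {
        int i = 0;
        while (i < input.length() && input.charAt(i) == '0') {
            i++;
        }
        return input.substring(i);
    }

    // StringBuilder-based variant with a guard against running off an empty builder.
    static String stripWithBuilder(String input) {
        StringBuilder h = new StringBuilder(input); // copies the whole input first
        while (h.length() > 0 && h.charAt(0) == '0') {
            h.deleteCharAt(0); // shifts all remaining characters left by one
        }
        return h.toString();   // copies again into the resulting String
    }

    public static void main(String[] args) {
        System.out.println(stripWithSubstring("001112"));
        System.out.println(stripWithBuilder("001112"));
    }
}
```

Besides the extra copies, each deleteCharAt(0) moves the rest of the buffer, so the builder variant also does more work per leading zero.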

If your string had n leading zeros, then using String instead of StringBuilder would consume n times more memory. String creates a new object in memory every time a char changes in it, so StringBuilder is the way to go.
Keep in mind
Every string builder has a capacity. As long as the length of the
character sequence contained in the string builder does not exceed the
capacity, it is not necessary to allocate a new internal buffer. If
the internal buffer overflows, it is automatically made larger.
Oracle Docs
So
String
String is immutable ( once created can not be changed )object . The
object created as a String is stored in the Constant String Pool .
Every immutable object in Java is thread safe ,that implies String is
also thread safe . String can not be used by two threads
simultaneously. String once assigned can not be changed.
String demo = " hello ";
// The above object is stored in the constant string pool and its value cannot be modified.
demo = "Bye";
// A new "Bye" string is created in the constant pool and referenced by the demo variable.
// The "hello" string still exists in the string constant pool and its value is not overridden, but we lost the reference to the "hello" string.
StringBuffer
StringBuffer is mutable means one can change the value of the object .
The object created through StringBuffer is stored in the heap .
StringBuffer has the same methods as the StringBuilder , but each
method in StringBuffer is synchronized that is StringBuffer is thread
safe .
Due to this it does not allow two threads to simultaneously access
the same method . Each method can be accessed by one thread at a time
.
But being thread safe has disadvantages too as the performance of the
StringBuffer hits due to thread safe property . Thus StringBuilder is
faster than the StringBuffer when calling the same methods of each
class.
StringBuffer value can be changed , it means it can be assigned to the
new value . Nowadays its a most common interview question ,the
differences between the above classes . String Buffer can be converted
to the string by using toString() method.
StringBuffer demo1 = new StringBuffer("Hello");
// The above object is stored in the heap and its value can be changed.
demo1 = new StringBuffer("Bye");
// The above statement is right, as it modifies the value, which is allowed in the StringBuffer.
Java Hungry
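The difference between rebinding a variable and mutating an object is easy to check with a second reference. A minimal sketch (the class and method names are made up): assigning a new value to a String variable leaves the old object untouched, whereas append on a StringBuffer changes the one object that both references share.

```java
class MutabilityDemo {
    // Rebinding a String variable does not change the object the alias points to.
    static String aliasAfterRebind() {
        String demo = "hello";
        String alias = demo; // second reference to the same String object
        demo = "Bye";        // demo now points elsewhere; "hello" itself is unchanged
        return alias;
    }

    // Appending to a StringBuffer mutates the shared object in place.
    static String aliasAfterAppend() {
        StringBuffer buffer = new StringBuffer("Hello");
        StringBuffer sameBuffer = buffer; // second reference to the same buffer
        buffer.append(" world");          // visible through both references
        return sameBuffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(aliasAfterRebind());
        System.out.println(aliasAfterAppend());
    }
}
```

Note that an assignment such as demo1 = new StringBuffer("Bye") is also just a rebinding; it is methods like append that actually modify a StringBuffer.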

Best way to modify an existing string? StringBuilder or convert to char array and back to string?

I'm learning Java and am wondering what's the best way to modify strings here (both for performance and to learn the preferred method in Java). Assume you're looping through a string and checking each character/performing some action on that index in the string.
Do I use the StringBuilder class, or convert the string into a char array, make my modifications, and then convert the char array back to a string?
Example for StringBuilder:
StringBuilder newString = new StringBuilder(oldString);
for (int i = 0; i < oldString.length() ; i++) {
newString.setCharAt(i, 'X');
}
Example for char array conversion:
char[] newStringArray = oldString.toCharArray();
for (int i = 0; i < oldString.length() ; i++) {
    newStringArray[i] = 'X';
}
String myString = String.valueOf(newStringArray);
What are the pros/cons to each different way?
I take it that StringBuilder is going to be more efficient since the converting to a char array makes copies of the array each time you update an index.

I say do whatever is most readable/maintainable until you know that String "modification" is slowing you down. To me, this is the most readable:
String s = "foo";
s += "bar";
s += "baz";
If that's too slow, I'd use a StringBuilder. You may want to compare this to StringBuffer. If performance matters and synchronization does not, StringBuilder should be faster. If synchronization is needed, then you should use StringBuffer.
Also it's important to know that these strings are not being modified. In java, Strings are immutable.
This is all context specific. If you optimize this code and it doesn't make a noticeable difference (and this is usually the case), then you just thought longer than you had to and you probably made your code more difficult to understand. Optimize when you need to, not because you can. And before you do that, make sure the code you're optimizing is the cause of your performance issue.

What are the pros/cons to each different way? I take it that StringBuilder is going to be more efficient since the converting to a char array makes copies of the array each time you update an index.
As written, the code in your second example will create just two arrays: one when you call toCharArray(), and another when you call String.valueOf() (String stores data in a char[] array). The element manipulations you are performing should not trigger any object allocations. There are no copies being made of the array when you read or write an element.
If you are going to be doing any sort of String manipulation, the recommended practice is to use a StringBuilder. If you are writing very performance-sensitive code, and your transformation does not alter the length of the string, then it might be worthwhile to manipulate the array directly. But since you are learning Java as a new language, I am going to guess that you are not working in high frequency trading or any other environment where latency is critical. Therefore, you are probably better off using a StringBuilder.
If you are performing any transformations that might yield a string of a different length than the original, you should almost certainly use a StringBuilder; it will resize its internal buffer as necessary.
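As an illustration of a transformation that changes the length, here is a minimal sketch that expands each tab into four spaces. The class name, method name and the choice of four spaces are arbitrary assumptions; the point is only that the builder grows as needed, which a fixed-size char[] obtained from toCharArray() cannot do.

```java
class TabExpander {
    // Replaces every tab with four spaces; the result may be longer than the input.
    static String expandTabs(String input) {
        StringBuilder sb = new StringBuilder(input.length()); // a lower bound, not exact
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\t') {
                sb.append("    "); // the builder resizes itself when it runs out of room
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandTabs("a\tb"));
    }
}
```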
On a related note, if you are doing simple string concatenation (e.g., s = "a" + someObject + "c"), the compiler will actually transform those operations into a chain of StringBuilder.append() calls (this describes javac up to Java 8; since Java 9 it emits an invokedynamic call to StringConcatFactory instead), so you are free to use whichever you find more aesthetically pleasing. I personally prefer the + operator. However, if you are building up a string across multiple statements, you should create a single StringBuilder.
For example:
public String toString() {
return "{field1 =" + this.field1 +
", field2 =" + this.field2 +
...
", field50 =" + this.field50 + "}";
}
Here, we have a single, long expression involving many concatenations. You don't need to worry about hand-optimizing this, because the compiler will use a single StringBuilder and just call append() on it repeatedly.
String s = ...;
if (someCondition) {
s += someValue;
}
s += additionalValue;
return s;
Here, you'll end up with two StringBuilders being created under the covers, but unless this is an extremely hot code path in a latency-critical application, it's really not worth fretting about. Given similar code, but with many more separate concatenations, it might be worth optimizing. Same goes if you know the strings might be very large. But don't just guess--measure! Demonstrate that there's a performance problem before you try to fix it. (Note: this is just a general rule for "micro optimizations"; there's rarely a downside to explicitly using a StringBuilder. But don't assume it will make a measurable difference: if you're concerned about it, you should actually measure.)
String s = "";
for (final Object item : items) {
s += item + "\n";
}
Here, we're performing a separate concatenation operation on each loop iteration, which means a new StringBuilder will be allocated on each pass. In this case, it's probably worth using a single StringBuilder since you may not know how large the collection will be. I would consider this an exception to the "prove there's a performance problem before optimizing rule": if the operation has the potential to explode in complexity based on input, err on the side of caution.
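The loop above could be rewritten with a single StringBuilder roughly like this. It is only a sketch; the class and method names are made up, and the items are passed in as a List for the sake of a self-contained example.

```java
import java.util.Arrays;
import java.util.List;

class JoinLines {
    // Appends every item followed by a newline, reusing one builder for the whole loop.
    static String joinLines(List<?> items) {
        StringBuilder sb = new StringBuilder();
        for (Object item : items) {
            sb.append(item).append('\n'); // append(Object) calls String.valueOf(item)
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(Arrays.asList("a", 1, 2.5)));
    }
}
```

Only one builder and one final String are allocated here, no matter how many items the collection holds.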

Which option will perform the best is not an easy question.
I did a benchmark using Caliper:
Method          Runtime (ns)
array           88
builder         126
builderTillEnd  76
concat          3435
Benchmarked methods:
public static String array(String input)
{
char[] result = input.toCharArray(); // COPYING
for (int i = 0; i < input.length(); i++)
{
result[i] = 'X';
}
return String.valueOf(result); // COPYING
}
public static String builder(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result.toString(); // COPYING
}
public static StringBuilder builderTillEnd(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result;
}
public static String concat(String input)
{
String result = "";
for (int i = 0; i < input.length(); i++)
{
result += 'X'; // terrible COPYING, COPYING, COPYING... same as:
// result = new StringBuilder(result).append('X').toString();
}
return result;
}
Remarks
If we want to modify a String, we have to do at least 1 copy of that input String, because Strings in Java are immutable.
java.lang.StringBuilder extends java.lang.AbstractStringBuilder. StringBuilder.setCharAt() is inherited from AbstractStringBuilder and looks like this:
public void setCharAt(int index, char ch) {
if ((index < 0) || (index >= count))
throw new StringIndexOutOfBoundsException(index);
value[index] = ch;
}
AbstractStringBuilder internally uses the simplest char array: char value[]. So, result[i] = 'X' is very similar to result.setCharAt(i, 'X'), however the second will call a polymorphic method (which probably gets inlined by JVM) and check bounds in if, so it will be a bit slower.
Conclusions
If you can operate on StringBuilder until the end (you don't need String back) - do it. It's the preferred way and also the fastest. Simply the best.
If you want String in the end and this is the bottleneck of your program, then you might consider using char array. In benchmark char array was ~25% faster than StringBuilder. Be sure to properly measure execution time of your program before and after optimization, because there is no guarantee about this 25%.
Never concatenate Strings in a loop with + or += unless you really know what you are doing. Usually it's better to use an explicit StringBuilder and append().

I'd prefer to use the StringBuilder class where the original string is modified.
For String manipulation, I like the StringUtils class. You'll need to add the Apache Commons Lang dependency to use it.

Most efficient way to fill a String with a specified length with a specified character?

Basically given an int, I need to generate a String with the same length containing only the specified character. Related question here, but it relates to C# and it does matter what's in the String.
This question, and my answer to it are why I am asking this one. I'm not sure what's the best way to go about it performance wise.
Example
Method signature:
String getPattern(int length, char character);
Usage:
//returns "zzzzzz"
getPattern(6, 'z');
What I've tried
String getPattern(int length, char character) {
String result = "";
for (int i = 0; i < length; i++) {
result += character;
}
return result;
}
Is this the best that I can do performance-wise?

You should use StringBuilder instead of concatenating chars this way. Use StringBuilder.append().
StringBuilder will give you better performance. The problem with concatenating the way you are doing it is that each time a new String is created (String is immutable), the old string is copied, the new character is appended, and the old String is thrown away. That is a lot of extra work, which over a period of time (like in a big for loop) will cause performance degradation.

StringUtils from commons-lang or Strings from guava are your friends. As already stated avoid String concatenations.
StringUtils.repeat("a", 3) // => "aaa"
Strings.repeat("hey", 3) // => "heyheyhey"
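If a third-party library is not an option, the JDK itself offers String.repeat(int) since Java 11. A minimal sketch, assuming Java 11 or newer (the class name is made up):

```java
class RepeatDemo {
    // Builds a string of the given length consisting only of the given character.
    // Requires Java 11+; repeat throws IllegalArgumentException for a negative count.
    static String getPattern(int length, char character) {
        return String.valueOf(character).repeat(length);
    }

    public static void main(String[] args) {
        System.out.println(getPattern(6, 'z'));
    }
}
```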

Use primitive char arrays & some standard util classes like Arrays
import java.util.Arrays;

public class Test {
static String getPattern(int length, char character) {
char[] cArray = new char[length];
Arrays.fill(cArray, character);
// return Arrays.toString(cArray);
return new String(cArray);
}
static String buildPattern(int length, char character) {
StringBuilder sb= new StringBuilder(length);
for (int i = 0; i < length; i++) {
sb.append(character);
}
return sb.toString();
}
public static void main(String args[]){
long time = System.currentTimeMillis();
getPattern(10000000,'c');
time = System.currentTimeMillis() - time;
System.out.println(time); //prints 93
time = System.currentTimeMillis();
buildPattern(10000000,'c');
time = System.currentTimeMillis() - time;
System.out.println(time); //prints 188
}
}
EDIT Arrays.toString() gave lower performance since it eventually used a StringBuilder, but the new String did the magic.

Yikes, no.
A String is immutable in java; you can't change it. When you say:
result += character;
You're creating a new String every time.
You want to use a StringBuilder and append to it, then return a String with its toString() method.

I think it would be more efficient to do it like following,
String getPattern(int length, char character)
{
char[] list = new char[length];
for(int i =0;i<length;i++)
{
list[i] = character;
}
return new String(list);
}

Concatenating Strings is never the most efficient option, since String is immutable. For better performance you should use StringBuilder and append():
String getPattern(int length, char character) {
StringBuilder sb = new StringBuilder(length);
for (int i = 0; i < length; i++) {
sb.append(character);
}
return sb.toString();
}

Performance-wise, I think you'd get better results by creating a small String and concatenating it (using StringBuilder, of course) until you reach the requested size: appending "zzz" to "zzz" probably performs better than appending 'z' three times (well, maybe not for such small numbers, but when you reach 100 or so chars, doing ten concatenations of 'z' followed by ten concatenations of "zzzzzzzzzz" is probably better than 100 concatenations of 'z').
Also, because you ask about GWT, results will vary a lot between DevMode (pure Java) and "production mode" (running in JS in the browser), and is likely to vary depending on the browser.
The only way to really know is to benchmark, everything else is pure speculation.
And possibly use deferred binding to use the most performing variant in each browser (that's exactly how StringBuilder is emulated in GWT).
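The chunked idea from this answer could look roughly like the sketch below. The chunk size of 10 is taken from the example in the answer; the class name is made up, and whether this is really faster than appending single chars is exactly what would need to be benchmarked.

```java
class ChunkedPattern {
    // Builds a small chunk first, then appends whole chunks plus a final partial chunk.
    static String getPattern(int length, char character) {
        if (length <= 0) {
            return "";
        }
        int chunkSize = Math.min(length, 10); // assumed chunk size
        StringBuilder chunkBuilder = new StringBuilder(chunkSize);
        for (int i = 0; i < chunkSize; i++) {
            chunkBuilder.append(character);
        }
        String chunk = chunkBuilder.toString();

        StringBuilder sb = new StringBuilder(length); // exact final size is known
        for (int i = 0; i < length / chunkSize; i++) {
            sb.append(chunk);
        }
        sb.append(chunk, 0, length % chunkSize); // remainder, possibly empty
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(getPattern(25, 'z').length());
    }
}
```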
