This code from effective java in Item 51: Beware the performance of string concatenation
public String statement() {
StringBuilder b = new StringBuilder(numItems() * LINE_WIDTH);
for (int i = 0; i < numItems(); i++)
b.append(lineForItem(i));
return b.toString();
}
can anyone explain to me what is LINE_WIDTH ? what its value ? in this case
Thank so much
I don't know the referenced code, however, from reading the code I would assume:
numItems() returns the number of items which are going to be placed into the string, and
LINE_WIDTH, is the approximate length (or possibly exact) of a line which will be appended to the string for each item.
The purpose of the code is to reserve enough space in advance of building the string, to prevent new space having to be reallocated during the process of building the string, thus saving in time during the process of building the string.
Related
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed last year.
Improve this question
I am trying to solve this problem and I worked out a solution but its too slow, this is my approach:
loop to add all strings in an array
loop to concatenate strings based on inputs
print out final concatenated string
Which works, but I need a faster approach. I am thinking of trying to determine the order of concatenation of each string, then just print strings based on the order. But I have no idea how to determine the order in the first place. Does anyone have any advice? Thank you!
[SOLVED] thank you guys for the help! I was able to pass the test cases.
Don’t implement this task literally.
The entire task is designed such that you end up with all N input strings being concatenated in one string, the order determined by the numbers read from System.in.
But the result is not the concatenated string. The result is the output you produce by printing this concatenated string. So you get the same result, i.e. the same output, when you just print the original strings (without separators or line breaks) in the correct order, as if they were concatenated.
Kattio io = new Kattio(System.in, System.out);
int numStrs = io.getInt();
final class StringNode {
final String s;
StringNode last = this, next;
StringNode(String s) {
this.s = s;
}
void append(StringNode s) {
last.next = s;
last = s.last;
}
}
StringNode[] array = new StringNode[numStrs];
for(int i = 0; i < numStrs; i++) array[i] = new StringNode(io.getWord());
int idx = 0;
for(int j = 0; j < numStrs - 1; j++) {
int a = io.getInt() - 1, b = io.getInt() - 1;
array[a].append(array[b]);
idx = a;
}
for(StringNode n = array[idx]; n != null; n = n.next) System.out.print(n.s);
System.out.println();
The main issue with performing the string concatenation literally, is the copying of the characters, potentially over and over again, depending on the given order. In the worst case, you get an order which let’s you copy again the same data you’ve just copied, for every step, ending up at O(N²) time complexity.
The approach above skips the entire task of creating new strings or copying characters, making each step as cheap as two variable assignments.
Note that even if you’re going back to implement the task literally, i.e. to produce a single String, you can do it by replacing the final printing loop by a loop which appends all strings to a single StringBuilder. This will still be a linear operation, as appending all strings in the already determined final order implies copying each string only once, instead of repeatedly.
But as long as the success is measured by the output written to System.out, you don’t need to construct a final String object.
Java's strings are immutable. This means that each concatenation results in a fully new copy where both a and b are copied.
You need to use Java's StringBuilder and convert the end result to a normal string. StringBuilders truely append one to the other without copying the first one. They work like a dynamic length array under the hood.
I was always told strings in java are immutable, unless your going to use the string builder class or string writter class.
Take a look at this practice question I found online,
Given a string and a non-negative int n, return a larger string that is n copies of the original string.
stringTimes("Hi", 2) → "HiHi"
stringTimes("Hi", 3) → "HiHiHi"
stringTimes("Hi", 1) → "Hi"
and the solution came out to be
Solution:
public String stringTimes(String str, int n) {
String result = "";
for (int i=0; i<n; i++) {
result = result + str; // could use += here
}
return result;
}
As you see in the solution our 3rd line assigns the string , then we change it in our for loop. This makes no sense to me! (I answered the question in another nooby way ) Once I saw this solution I knew I had to ask you guys.
Thoughts? I know im not that great at programming but I haven't seen this type of example here before, so I thought I'd share.
The trick to understanding what's going on is the line below:
result = result + str;
or its equivalent
result += str;
Java compiler performs a trick on this syntax - behind the scene, it generates the following code:
result = result.concat(str);
Variable result participates in this expression twice - once as the original string on which concat method is called, and once as the target of an assignment. The assignment does not mutate the original string, it replaces the entire String object with a new immutable one provided by concat.
Perhaps it would be easier to see if we introduce an additional String variable into the code:
String temp = result.concat(str);
result = temp;
Once the first line has executed, you have two String objects - temp, which is "HiHi", and result, which is still "Hi". When the second line is executed, result gets replaced with temp, acquiring a new value of "HiHi".
If you use Eclipse, you could make a breakpoint and run it step by step. You will find the id (find it in "Variables" View) of "result" changed every time after java did
result = result + str;
On the other hand, if you use StringBuffer like
StringBuffer result = new StringBuffer("");
for(int i = 0; i < n; i++){
result.append(str);
}
the id of result will not change.
String objects are indeed immutable. result is not a String, it is a reference to a String object. In each iteration, a new String object is created and assigned to the same reference. The old object with no reference is eventually destroyed by a garbage collector. For a simple example like this, it is a possible solution. However, creating a new String object in each iteration in a real-world application is not a smart idea.
Recently I've had an interview and I was asked a strange(at least for me) question:
I should write a method which would inverse a string.
public static String myReverse(String str){
...
}
The problem is that str is a very very huge object (2/3 of memory).
I suppose only one solution:
create a new String where I will store the result, then reverse 1/2 of the source String. After using reflection, clear the second half(already reversed) of the source string underlying array and then continue to reverse.
Am I right?
Any other solutions?
If you are using reflection anyway, you could access the underlying character array of the string and reverse it in place, by traversing from both ends and swapping the chars at each end.
public static String myReverse(String str){
char[] content;
//Fill content with reflection
for (int a = 0, b = content.length - 1; a < b; a++, b--) {
char temp = content[b];
content[b] = content[a];
content[a] = temp;
}
return str;
}
I unfortunately can't think of a way that doesn't use reflection.
A String is internally a 16-bit char array. If we know the character set to be ASCII, meaning each char maps to a single byte, we can encode the string to a 8-bit byte array at only 50% of the memory cost. This fully utilizes the available memory during the transition. Then we let go of the input string to reclaim 2/3 of the memory, reverse the byte array and reconstruct the string.
public static String myReverse(String str) {
byte[] bytes = str.getBytes("ASCII");
// memory at full capacity
str = null;
// memory at 1/3 capacity
for (int i = 0; i < bytes.length / 2; i++) {
byte tmp = bytes[i];
bytes[i] = bytes[bytes.length - i - 1];
bytes[bytes.length - i - 1] = tmp;
}
return new String(bytes, "ASCII");
}
This, of course, assumes you have a little extra memory available for temporary objects created by the encoding process, array headers, etc.
It's unlikely that you can do that without using tricks like reflection or assuming that the String is stored in an efficient way (for example knowing it to be only ASCII characters). The problems in your way is that in java Strings are immutable. The other is the likely implementation of garbage collection.
The problem with the likely implementation of garbage collection is that the memory is reclaimed after the object can no longer be accessed. This means that there would be a brief period where both the input and the output of a transformation would need to occupy memory.
For example one could try to reverse the string by successively build the result and cut down the original string:
rev = rev + orig.substring(0,1);
orig = orig.substring(1);
But this relies oth that the previos incarnation of rev or orig respectively is collected as the new incarnation of rev or orig is being created so that they never occupy up to 2/3 of the memory at the same time.
To be more general one would study such a process. During the process there would be a set of objects that evolve throughout the process, both the set it self and (some of) the objects. At the start the original string would be in the set and at the end the reversed string would be there. It's clear that due to information content the total size of the objects in the set can never be lower than the original. The crucial point here is that the original string have to be deleted at some point. Before that time at most 50% of the information may exist in the other objects. So we need a construct that would at the same time delete a String object as it retains more than half of the information therein.
Such a construct would need you basically to call a method to an object returning another object an in the process remove the object as the result is being constructed. It's unlikely that the implementation would work in that way.
Your approach seem to rely on that String are indeed mutable somehow, and then there would be no problem in just reversing the string in place without having to use a lot of memory. You don't need to copy out anything there, you can do the whole thing in place: swap the [j] and then [len-1-j] (for all j<(len-1)/2)
In the past, I've always used printf to format printing to the console but the assignment I currently have (creating an invoice report) wants us to use StringBuilder, but I have no idea how to do so without simply using " " for every gap needed. For example... I'm supposed to print this out
Invoice Customer Salesperson Subtotal Fees Taxes Discount Total
INV001 Company Eccleston, Chris $ 2357.60 $ 40.00 $ 190.19 $ -282.91 $ 2304.88
But I don't know how to get everything to line up using the StringBuilder. Any advice?
StringBuilder aims to reduce the overhead associated with creating strings.
As you may or may not know, strings are immutable. What this means that something like
String a = "foo";
String b = "bar";
String c = a + b;
String d = c + c;
creates a new string for each line. If all we are concerned about is the final string d, the line with string c is wasting space because it creates a new String object when we don't need it.
String builder simply delays actually building the String object until you call .toString(). At that point, it converts an internal char[] to an actual string.
Let's take another example.
String foo() {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++)
sb.append(i);
return sb.toString();
}
Here, we only create one string. StringBuilder will keep track of the chars you have added to your string in its internal char[] value. Note that value.length will generally be larger than the total chars you have added to your StringBuilder, but value might run out of room for what you're appending if the string you are building gets too big. When that happens, it'll resize, which just means replacing value with a larger char[], and copying over the old values to the new array, along with the chars of whatever you appended.
Finally, when you call sb.toString(), the StringBuilder will call a String constructor that takes an argument of a char[].
That means only one String object was created, and we only needed enough memory for our char[] and to resize it.
Compare with the following:
String foo() {
String toReturn = "";
for (int i = 0; i < 100; i++)
toReturn += "" + i;
toReturn;
}
Here, we have 101 string objects created (maybe more, I'm unsure). We only needed one though! This means that at every call, we're disposing the original string toReturn represented, and creating another string.
With a large string, especially, this is very expensive, because at every call you need to first acquire as much memory as the new string needs, and dispose of as much memory as the old string had. It's not a big deal when things are kept short, but when you're working with entire files this can easily become a problem.
In a nutshell: if you're working appending / removing information before finalizing an output: use a StringBuilder. If your strings are very short, I think it is OK to just concatenate normally for convenience, but this is up to you to define what "short" is.
I'm learning Java and am wondering what's the best way to modify strings here (both for performance and to learn the preferred method in Java). Assume you're looping through a string and checking each character/performing some action on that index in the string.
Do I use the StringBuilder class, or convert the string into a char array, make my modifications, and then convert the char array back to a string?
Example for StringBuilder:
StringBuilder newString = new StringBuilder(oldString);
for (int i = 0; i < oldString.length() ; i++) {
newString.setCharAt(i, 'X');
}
Example for char array conversion:
char[] newStringArray = oldString.toCharArray();
for (int i = 0; i < oldString.length() ; i++) {
myNameChars[i] = 'X';
}
myString = String.valueOf(newStringArray);
What are the pros/cons to each different way?
I take it that StringBuilder is going to be more efficient since the converting to a char array makes copies of the array each time you update an index.
I say do whatever is most readable/maintainable until you you know that String "modification" is slowing you down. To me, this is the most readable:
Sting s = "foo";
s += "bar";
s += "baz";
If that's too slow, I'd use a StringBuilder. You may want to compare this to StringBuffer. If performance matters and synchronization does not, StringBuilder should be faster. If sychronization is needed, then you should use StringBuffer.
Also it's important to know that these strings are not being modified. In java, Strings are immutable.
This is all context specific. If you optimize this code and it doesn't make a noticeable difference (and this is usually the case), then you just thought longer than you had to and you probably made your code more difficult to understand. Optimize when you need to, not because you can. And before you do that, make sure the code you're optimizing is the cause of your performance issue.
What are the pros/cons to each different way. I take it that StringBuilder is going to be more efficient since the convering to a char array makes copies of the array each time you update an index.
As written, the code in your second example will create just two arrays: one when you call toCharArray(), and another when you call String.valueOf() (String stores data in a char[] array). The element manipulations you are performing should not trigger any object allocations. There are no copies being made of the array when you read or write an element.
If you are going to be doing any sort of String manipulation, the recommended practice is to use a StringBuilder. If you are writing very performance-sensitive code, and your transformation does not alter the length of the string, then it might be worthwhile to manipulate the array directly. But since you are learning Java as a new language, I am going to guess that you are not working in high frequency trading or any other environment where latency is critical. Therefore, you are probably better off using a StringBuilder.
If you are performing any transformations that might yield a string of a different length than the original, you should almost certainly use a StringBuilder; it will resize its internal buffer as necessary.
On a related note, if you are doing simple string concatenation (e.g, s = "a" + someObject + "c"), the compiler will actually transform those operations into a chain of StringBuilder.append() calls, so you are free to use whichever you find more aesthetically pleasing. I personally prefer the + operator. However, if you are building up a string across multiple statements, you should create a single StringBuilder.
For example:
public String toString() {
return "{field1 =" + this.field1 +
", field2 =" + this.field2 +
...
", field50 =" + this.field50 + "}";
}
Here, we have a single, long expression involving many concatenations. You don't need to worry about hand-optimizing this, because the compiler will use a single StringBuilder and just call append() on it repeatedly.
String s = ...;
if (someCondition) {
s += someValue;
}
s += additionalValue;
return s;
Here, you'll end up with two StringBuilders being created under the covers, but unless this is an extremely hot code path in a latency-critical application, it's really not worth fretting about. Given similar code, but with many more separate concatenations, it might be worth optimizing. Same goes if you know the strings might be very large. But don't just guess--measure! Demonstrate that there's a performance problem before you try to fix it. (Note: this is just a general rule for "micro optimizations"; there's rarely a downside to explicitly using a StringBuilder. But don't assume it will make a measurable difference: if you're concerned about it, you should actually measure.)
String s = "";
for (final Object item : items) {
s += item + "\n";
}
Here, we're performing a separate concatenation operation on each loop iteration, which means a new StringBuilder will be allocated on each pass. In this case, it's probably worth using a single StringBuilder since you may not know how large the collection will be. I would consider this an exception to the "prove there's a performance problem before optimizing rule": if the operation has the potential to explode in complexity based on input, err on the side of caution.
Which option will perform the best is not an easy question.
I did a benchmark using Caliper:
RUNTIME (NS)
array 88
builder 126
builderTillEnd 76
concat 3435
Benchmarked methods:
public static String array(String input)
{
char[] result = input.toCharArray(); // COPYING
for (int i = 0; i < input.length(); i++)
{
result[i] = 'X';
}
return String.valueOf(result); // COPYING
}
public static String builder(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result.toString(); // COPYING
}
public static StringBuilder builderTillEnd(String input)
{
StringBuilder result = new StringBuilder(input); // COPYING
for (int i = 0; i < input.length(); i++)
{
result.setCharAt(i, 'X');
}
return result;
}
public static String concat(String input)
{
String result = "";
for (int i = 0; i < input.length(); i++)
{
result += 'X'; // terrible COPYING, COPYING, COPYING... same as:
// result = new StringBuilder(result).append('X').toString();
}
return result;
}
Remarks
If we want to modify a String, we have to do at least 1 copy of that input String, because Strings in Java are immutable.
java.lang.StringBuilder extends java.lang.AbstractStringBuilder. StringBuilder.setCharAt() is inherited from AbstractStringBuilder and looks like this:
public void setCharAt(int index, char ch) {
if ((index < 0) || (index >= count))
throw new StringIndexOutOfBoundsException(index);
value[index] = ch;
}
AbstractStringBuilder internally uses the simplest char array: char value[]. So, result[i] = 'X' is very similar to result.setCharAt(i, 'X'), however the second will call a polymorphic method (which probably gets inlined by JVM) and check bounds in if, so it will be a bit slower.
Conclusions
If you can operate on StringBuilder until the end (you don't need String back) - do it. It's the preferred way and also the fastest. Simply the best.
If you want String in the end and this is the bottleneck of your program, then you might consider using char array. In benchmark char array was ~25% faster than StringBuilder. Be sure to properly measure execution time of your program before and after optimization, because there is no guarantee about this 25%.
Never concatenate Strings in the loop with + or +=, unless you really know what you do. Usally it's better to use explicit StringBuilder and append().
I'd prefer to use StringBuilder class where original string is modified.
For String manipulation, I like StringUtil class. You'll need to get Apache commons dependency to use it