space complexity string builder - java

In the example below, if I take in a String s, would the space complexity be O(n) or O(1)? and if I were to append only vowels, would it still be O(n)?
String s = "dfgdfgdfga";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
    sb.append(s.charAt(i));
}
return sb.toString();

Space complexity boils down to: how much "memory" will you need to store things?
Your code copies the entire contents of String s into StringBuilder sb. In other words, it copies every char from s to sb.
That is O(n), where n is the length of s: the more characters you copy, the more memory you need. If you copy only a selection of the characters (for example, only the vowels), the worst case is still O(n), because the input could consist entirely of characters that you select.
O(1) means: constant requirements. Which is simply not possible (space wise) when talking about making a copy.

Each time you append to the StringBuilder, it checks whether its internal array is full and, if so, copies the contents into a new, larger array. So the space requirement grows linearly with the length of the String. Hence the space complexity is O(n).
It does not matter whether the letters are vowels, because the StringBuilder stores the characters themselves, not references to characters.
You might consider writing your own builder that stores references to Character objects, but storing references is more expensive than storing chars, and you cannot subclass StringBuilder to do it because StringBuilder is final.
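To make the two cases from the question concrete, here is a minimal sketch. The class and method names (CopySketch, copyAll, copyVowels) are made up for illustration. Both methods allocate a builder whose contents can grow up to the length of the input, which is why both are O(n) in the worst case.

```java
final class CopySketch {
    // Copies every character: the builder ends up holding n chars, so O(n) extra space.
    static String copyAll(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(s.charAt(i));
        }
        return sb.toString();
    }

    // Copies only vowels: the output may be shorter, but for an all-vowel
    // input it is as long as the input, so the worst case is still O(n).
    static String copyVowels(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ("aeiouAEIOU".indexOf(c) >= 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(copyAll("dfgdfgdfga"));    // dfgdfgdfga
        System.out.println(copyVowels("dfgdfgdfga")); // a
        System.out.println(copyVowels("aeiou"));      // aeiou
    }
}
```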

Related

What is the time complexity of this code execution?

I have to print out the number of occurrences of each character inside a string. I have used something like:
String str = "This is sample string";
HashSet<Character> hc = new HashSet<Character>();
for (int i = 0; i < str.length(); i++) {
    if (!Character.isSpaceChar(str.charAt(i)) && hc.add(str.charAt(i))) {
        int countMatches = StringUtils.countMatches(str, str.charAt(i));
        System.out.println(str.charAt(i) + " occurs at " + countMatches + " times");
    }
}
It is a kind of solution, but how do I analyze the time complexity? I am beginner so please guide me through the learning process.
First of all, if you are looking for a decent introduction to complexity analysis, the following one looks pretty good:
A Gentle Introduction to Algorithm Complexity Analysis by Dionysis Zindros.
I recommend that you read it all, carefully, and take the time to do the exercises embedded in the page.
The complexity of your code is not trivial.
On the face of it, the loop will execute N times, where N is the length of the input string. But then if we look at what the loop does, it can do one of three things:
if the character is a space, nothing else is done
if the character is not a space, it is added (or re-added) to the HashSet
if the character was newly added, countMatches is called.
The complexity of doing nothing is O(1).
The complexity of adding an entry to the set is O(1).
The complexity of calling countMatches is O(N), because it is looking at every character of the string.
Now, if we think about what the code is doing, we can easily identify the best and worst cases.
The best case occurs when all N characters of the string are spaces. This gives O(N) repetitions of an O(1) loop body, giving a best-case complexity of O(N).
The worst case occurs when all N characters are different. This gives O(N) repetitions of an O(N) loop body, giving a worst-case complexity of O(N^2). (You would think ... but read on!)
What about the average case? That is difficult if we don't know more about the nature of the input strings.
If the characters are randomly chosen, the probability of repeated characters is small, and the probability of space characters is small.
If the characters are alphabetic text, then spaces are more frequent, and so are repetitions. Indeed, for English text the characters are likely to be limited to upper and lowercase Latin letters (52) plus a handful of punctuation characters. So you might expect about 60 set entries for a long string, and performance that converges rapidly to O(N).
Finally, even the worst-case is not really O(N^2). A String is a sequence of char values, and Java char values are restricted to the range 0 to 65535. So after 2^16 distinct characters, all characters must repeat, and thus even the worst-case goes to O(N) as N goes to infinity.
(I did mention that this was non-trivial? 😀 )
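The bound above can be checked with a small sketch that counts how many full-string scans the question's loop would trigger, without actually doing the scans. The names ScanCount and fullScans are hypothetical. Since each scan is triggered by a character that was not seen before, the count can never exceed the number of distinct char values, however long the string is.

```java
import java.util.HashSet;
import java.util.Set;

final class ScanCount {
    // Returns how many times the question's loop would call countMatches:
    // once per distinct non-space character.
    static int fullScans(String str) {
        Set<Character> seen = new HashSet<>();
        int scans = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (!Character.isSpaceChar(c) && seen.add(c)) {
                scans++;
            }
        }
        return scans;
    }

    public static void main(String[] args) {
        System.out.println(fullScans("This is sample string")); // 13
        System.out.println(fullScans("     "));                 // 0
    }
}
```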
What you need to do here is reason how many steps have to be taken in relationship to the length of the String.
For every character in the String it has to call countMatches once. Every call of countMatches has to loop over every character of the String again to count them.
The other operations (determining the length of the String, adding to the HashSet, retrieving a character from a String by index, checking the whitespaceness, printing the answers) are assumed to be constant-time and do not matter.
The fact that some of the characters will be skipped (because they are whitespace or already in the HashSet) does not reduce the complexity for an unrestricted String. You can assume the worst case of all characters being different.
So that is O(n^2), where n is the length of the String.
You can improve it to O(n) by changing your HashSet to a HashMap of counters. Then you only need a single pass over the String instead of two nested passes.
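A minimal sketch of that single-pass idea follows. The class and method names are made up, and it uses a LinkedHashMap rather than a plain HashMap only so that characters are reported in order of first appearance; that choice is mine, not part of the suggestion above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CharCounter {
    // One pass over the string; each map update is expected O(1), so the total is O(n).
    static Map<Character, Integer> countChars(String str) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (!Character.isSpaceChar(c)) {
                counts.merge(c, 1, Integer::sum); // insert 1, or add 1 to the existing count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<Character, Integer> e : countChars("This is sample string").entrySet()) {
            System.out.println(e.getKey() + " occurs " + e.getValue() + " times");
        }
    }
}
```

This also removes the dependency on StringUtils.countMatches, since the counting happens in the same loop.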

Algorithm, Big O notation: Is this function O(n^2) ? or O(n)?

This is code from an algorithm book, "Data Structures and Algorithms in Java, 6th Edition" by Michael T. Goodrich, Roberto Tamassia, and Michael H. Goldwasser.
public static String repeat1(char c, int n)
{
    String answer = "";
    for (int j = 0; j < n; j++)
    {
        answer += c;
    }
    return answer;
}
According to the authors, the Big O notation of this algorithm is O(n^2) with reason:
"The command, answer += c, is shorthand for answer = (answer + c). This
command does not cause a new character to be added to the existing String
instance; instead it produces a new String with the desired sequence of
characters, and then it reassigns the variable, answer, to refer to that new
string. In terms of efficiency, the problem with this interpretation is that
the creation of a new string as a result of a concatenation, requires time
that is proportional to the length of the resulting string. The first time
through this loop, the result has length 1, the second time through the loop
the result has length 2, and so on, until we reach the final string of length
n."
However, I do not understand how this code can be O(n^2), as its number of primitive operations just doubles each iteration regardless of the value of n (excluding j < n and j++).
The statement answer += c requires two primitive operations each iteration regardless of the value n, therefore I think the equation for this function supposed to be 4n + 3. (Each loop operates j
Or, is the sentence,"In terms of efficiency, the problem with this interpretation is that the creation of a new string as a result of a concatenation, requires time that is proportional to the length of the resulting string.," just simply saying that creating a new string as a result of a concatenation requires proportional time to its length regardless of the number of primitive operations used in the function? So the number of primitive operations does not have big effects on the running time of the function because the built-in code for concatenated String assignment operator's running time runs in O(n^2).
How can this function be O(n^2)?
Thank you for your support.
During every iteration of the loop, the statement answer += c; must copy each and every character already in the string answer to a new string.
E.g. n = 5, c = '5'
First loop: answer is an empty string, but it must still create a new string. There is one operation to append the first '5', and answer is now "5".
Second loop: answer will now point to a new string, with the first '5' copied to a new string with another '5' appended, to make "55". Not only is a new String created, one character '5' is copied from the previous string and another '5' is appended. Two characters are written in total.
"n"th loop: answer will now point to a new string, with n - 1 '5' characters copied to a new string, and an additional '5' character appended, to make a string with n 5s in it.
The number of characters copied is 1 + 2 + ... + n = n(n + 1)/2. This is O(n^2).
The efficient way to construct strings like this in a loop in Java is to use a StringBuilder: a single mutable object that doesn't need to copy all the characters each time a character is appended. Using a StringBuilder has a cost of O(n).
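As a rough sketch of that StringBuilder version (the names RepeatSketch and repeat2 are chosen here for illustration and are not taken from the question), the loop appends into one mutable buffer and converts to a String once at the end. It assumes n is not negative.

```java
final class RepeatSketch {
    // Builds a string of n copies of c in O(n): each append writes one char
    // into the builder's buffer instead of copying the whole string so far.
    static String repeat2(char c, int n) {
        StringBuilder sb = new StringBuilder(n); // presized, so no regrowth is needed; assumes n >= 0
        for (int j = 0; j < n; j++) {
            sb.append(c);
        }
        return sb.toString(); // one final copy of n chars
    }

    public static void main(String[] args) {
        System.out.println(repeat2('5', 5)); // 55555
    }
}
```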
Strings are immutable in Java. I believe this terrible code is O(n^2) for that reason and only that reason. It has to construct a new String on each iteration. I'm unsure if String concatenation is truly linearly proportional to the number of characters (it seems like it should be a constant time operation since Strings have a known length). However if you take the author's word for it then iterating n times with each iteration taking a time proportional to n, you get n^2. StringBuilder would give you O(n).
I mostly agree with it being O(n^2) in practice, but consider:
Java is SMART. In many cases it uses StringBuilder instead of string for concatenation under the covers. You can't just assume it's going to copy the underlying array every time (although it almost certainly will in this case).
Java gets SMARTER all the time. There is no reason it couldn't optimize that entire loop based on StringBuilder since it can analyze all your code and figure out that you don't use it as a string inside that loop.
Further optimizations can happen--Strings currently use an array AND a length AND a shared flag (and maybe a start location so that splits wouldn't require copying, I forget, but they changed that split implementation anyway)--so appending into an oversized array and then returning a new string with a reference to the same underlying array but a higher end, without mutating the original string, is altogether possible (by design, they do stuff like this already to a degree)...
So I think the real question is, is it a great idea to calculate O() based on a particular implementation of a language-level construct?
And although I can't say for sure what the answer to that is, I can say it would be a REALLY BAD idea to optimize on the assumption that it was O(n^2) unless you absolutely needed it--you could take away java's ability to speed up your code later by hand optimizing today.
ps. this is from experience. I had to optimize some java code that was the UI for a spectrum analyzer. I saw all sorts of String+ operations and figured I'd clean them all up with .append(). It saved NO time because Java already optimizes String+ operations that are not in a loop.
The complexity becomes O(n^2) because the string grows by one character on each iteration, and creating a new string costs time proportional to its length. The loop runs n times, so the exact cost is (n * (n+1))/2, which is O(n^2)
For example,
For abcdefg
a // a string object of length 1 is created, so the cost is 1
ab // similarly, the cost is 2
abc // cost 3 here
abcd // 4 now
abcde // and so on
abcdef
abcdefg
Now, you see the total complexity is 1 + 2 + 3 + 4 + ... + n = (n * (n+1))/2. In big O notation it's O(n^2)
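The sum can be checked with a small cost model. This is only a sketch: it does not measure real time, it simply charges each newly created string one unit per character, which is the assumption the reasoning above relies on. The names CopyCost and charsWritten are hypothetical.

```java
final class CopyCost {
    // Charges k units for creating a string of length k and returns the total
    // for n concatenations, i.e. 1 + 2 + ... + n.
    static long charsWritten(int n) {
        long total = 0;
        String answer = "";
        for (int j = 0; j < n; j++) {
            answer += 'x';            // a new String of length j + 1 is created here
            total += answer.length(); // cost charged for building it
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(charsWritten(7));    // 28     = 7 * 8 / 2
        System.out.println(charsWritten(1000)); // 500500 = 1000 * 1001 / 2
    }
}
```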
Take the length of the string as n. Each time we add an element at the end, the existing string has to be traversed, which costs up to n, and the outer for loop runs n times, so the result is O(n^2).
That is because:
answer += c;
is a String concatenation. In Java, Strings are immutable.
This means the concatenated string is built by creating a copy of the original string and appending c to it. So a single concatenation is O(n) for a String of size n.
In the first iteration the length of answer is 0, in the second it is 1, in the third it is 2, and so on.
So the work done across all iterations is
1 + 2 + 3 + ... + n = O(n^2)
For string manipulation, StringBuilder is the preferred way: it appends a character in amortized O(1) time.

Difference between length() and capacity() methods in StringBuilder

I was trying to figure out when to use or why capacity() method is different from length() method of StringBuilder or StringBuffer classes.
I have searched on Stack Overflow and managed to come up with this answer, but I didn't understand its distinction with length() method. I have visited this website also but this helped me even less.
StringBuilder is for building up text. Internally, it uses an array of characters to hold the text you add to it. capacity is the size of the array. length is how much of that array is currently filled by text that should be used. So with:
StringBuilder sb = new StringBuilder(1000);
sb.append("testing");
capacity() is 1000 (there's room for 1000 characters before the internal array needs to be replaced with a larger one), and length() is 7 (there are seven meaningful characters in the array).
The capacity is important because if you try to add more text to the StringBuilder than it has capacity for, it has to allocate a new, larger buffer and copy the content to it, which has memory use and performance implications*. For instance, the default capacity of a StringBuilder created with the no-argument constructor is 16 characters (as stated in the constructor's Javadoc), so:
StringBuilder sb = new StringBuilder();
sb.append("Singing:");
sb.append("I am the very model of a modern Major General");
...creates a StringBuilder with a char[16], copies "Singing:" into that array, and then has to create a new array and copy the contents to it before it can add the second string, because it doesn't have enough room to add the second string.
* (whether either matters depends on what the code is doing)
The length of the string is always less than or equal to the capacity of the builder. The length is the actual size of the string stored in the builder, and the capacity is the maximum size that it can currently fit.
The builder’s capacity is automatically increased if more characters are added to exceed its capacity. Internally, a string builder is an array of characters, so the builder’s capacity is the size of the array. If the builder’s capacity is exceeded, the array is replaced by a new array. The new array size is 2 * (the previous array size + 1).
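One way to see this growth is to record the capacity every time it changes while appending to a default builder. The sketch below uses made-up names (GrowthTrace, capacitiesSeen). The growth rule is an implementation detail rather than a documented guarantee, so treat the printed sequence as an observation about your JVM; the formula quoted above would predict 16, 34, 70, 142 for the first few steps.

```java
import java.util.ArrayList;
import java.util.List;

final class GrowthTrace {
    // Appends one char at a time and records each distinct capacity observed.
    static List<Integer> capacitiesSeen(int appends) {
        StringBuilder sb = new StringBuilder(); // documented initial capacity: 16
        List<Integer> seen = new ArrayList<>();
        seen.add(sb.capacity());
        for (int i = 0; i < appends; i++) {
            sb.append('x');
            if (sb.capacity() != seen.get(seen.size() - 1)) {
                seen.add(sb.capacity()); // the internal array was replaced by a larger one
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(capacitiesSeen(100));
    }
}
```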
Since you are new to Java, I would suggest you this tip also regarding StringBuilder's efficiency:
You can use new StringBuilder(initialCapacity) to create a StringBuilder with a specified initial capacity. By carefully choosing the initial capacity, you can make your program more efficient. If the capacity is always larger than the actual length of the builder, the JVM will never need to reallocate memory for the builder. On the other hand, if the capacity is too large, you will waste memory space. You can use the trimToSize() method to reduce the capacity to the actual size.
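A short sketch of that tip, with hypothetical names (TrimDemo, capacitiesAroundTrim). Note that the Javadoc describes trimToSize() as an attempt to reduce storage, so the code only relies on the capacity not growing and still being at least the length afterwards.

```java
final class TrimDemo {
    // Returns { capacity before trim, capacity after trim, length }.
    static int[] capacitiesAroundTrim(int initialCapacity, String text) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        sb.append(text);
        int before = sb.capacity();
        sb.trimToSize(); // asks the builder to shrink its buffer toward length()
        int after = sb.capacity();
        return new int[] { before, after, sb.length() };
    }

    public static void main(String[] args) {
        int[] r = capacitiesAroundTrim(1000, "testing");
        System.out.println("before=" + r[0] + " after=" + r[1] + " length=" + r[2]);
    }
}
```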
I tried to explain it the best terms I could so I hope it was helpful.

Difference between multiple System.out.print() and concatenation

Basically, I was wondering which approach is better practice,
for (int i = 0; i < 10000; i++) {
    System.out.print("blah");
}
System.out.println("");
or
String over_9000_blahs = "";
for (int i = 0; i < 10000; i++) {
    over_9000_blahs += "blah";
}
System.out.println(over_9000_blahs);
or is there an even better way that I'm not aware of?
Since you are only writing to System.out, the first approach is better. BUT if performance is important to you, use the StringBuilder method below (System.out.println is synchronized and takes a lock on every call).
If you want to use the "big string" later or improve performance, it's cleaner to use StringBuilder.
In any case, the compiler translates String + into StringBuilder calls.
StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < 10000; i++) {
    stringBuilder.append("blah");
}
System.out.println(stringBuilder.toString());
You want to use StringBuilder if you're concatenating string in a (larger count) loop.
for (int i = 0; i < 10000; i++) {
    over_9000_blahs += "blah";
}
What this does is for each iteration:
Creates a new StringBuilder internally with internal char array large enough to accommodate the intermediate result (over_9000_blahs)
Copies the characters from over_9000_blahs into the internal array
Copies the characters from "blah"
Creates a new String, copying the characters from internal array again
So that is two copies of the increasingly long string per iteration - that means quadratic time complexity.
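Written out by hand, the steps above look roughly like the first method below. This is a sketch of what older javac versions generate for += on a String; since Java 9 the compiler emits an invokedynamic call instead, but each iteration still produces a new String that contains everything built so far. The class and method names are made up for illustration.

```java
final class DesugaredConcat {
    // Hand-expanded form of `result += "blah";` inside a loop:
    // a fresh builder and a fresh String on every iteration.
    static String concatLoop(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result = new StringBuilder()
                    .append(result)   // copies the whole string built so far
                    .append("blah")
                    .toString();      // copies it again into a new String
        }
        return result;
    }

    // One builder for the whole loop: each "blah" is copied into the buffer once.
    static String builderLoop(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("blah");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatLoop(3));  // blahblahblah
        System.out.println(builderLoop(3)); // blahblahblah
    }
}
```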
Since System.out.println() might be synchronized, there's a chance that calling it repeatedly will be slower than using StringBuilder (but my guess would be it won't be slower than concatenating the string in the loop using +=).
So the StringBuilder approach should be the best of the three.
By performance order:
StringBuilder - The fastest. Basically, it just adds the words to an array of characters. When the capacity is not enough, it grows the array (roughly doubling it), so this should happen only about log2(final length) times.
System.out.print - It performs badly compared to StringBuilder because we need to acquire the lock 10000 times. In addition, print creates new char[writeBufferSize] 10000 times, while in the StringBuilder option we do all that only once!
Concatenating strings. This creates many (and later also big) objects; from some iteration 'i' onward, memory management will hurt performance badly.
EDIT:
To be more accurate, because the question was about the difference between option 2 and option 3 and it is very clear why Stringbuilder is fast.
We can say that every iteration in the second approach takes K time, because the code is the same and the length of the string is the same for every iteration. At the end of execution, the second option will have taken 10000*K time for 10000 iterations. We can't say the same about the third approach, because the length of the string increases with each iteration, so the time for allocating the objects and garbage collecting them increases too. What I'm trying to say is that the execution time does not increase linearly in the third option.
So it is possible that for low NumberOfIterations we won't see the difference between the two last approaches. But we know that starting a specific NumberOfIterations the second option is always better than the third one.
In this case, I'd say the first one is better. Java uses StringBuilders for string concatenations to increase performance, but since Java doesn't know you are repeatedly doing concatenations with a loop like in the second case, the first case would be better.
If you want only to sysout your values - the result is same.
The second option will create many strings in memory, which the GC (Garbage Collector) will have to take care of. (In newer versions of Java, each concatenation is transformed behind the scenes into StringBuilder calls, although inside a loop a new intermediate String is still created on every iteration.)
If you want use your string later, after sysout, you should check StringBuilder class and append method:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10000; i++) {
    sb.append("blah");
}
System.out.println(sb);

Why do we need capacity in StringBuilder

As we know, StringBuilder has an attribute called capacity, which is always greater than or equal to the length of the StringBuilder object. However, what is capacity used for? It is expanded if the length would exceed the capacity. If it does something useful, can someone give an example?
You can use the initial capacity to save the need to re-size the StringBuilder while appending to it, which costs time.
If you know in advance how many characters will be appended to the StringBuilder and you specify that size when you create the StringBuilder, it will never have to be re-sized while it is being used.
If, on the other hand, you don't give an initial capacity, or give too small an initial capacity, then each time that capacity is reached, the storage of the StringBuilder has to be increased, which involves copying the data stored in the original storage to the larger storage.
The string builder has to store the string that is being built somewhere. It does so in an array of characters. The capacity is the length of this array. Once the array would overflow, a new (longer) array is allocated and contents are transferred to it. This makes the capacity rise.
If you are not concerned about performance, simply ignore the capacity. The capacity might get interesting once you are about to construct huge strings and know their size upfront. Then you can request a string builder with a capacity being equal to the expected size (or slightly larger if you are not sure about the size).
Example when building a string with a content size of 1 million:
StringBuilder sb = new StringBuilder(1000000);
for (int i = 0; i < 1000000; i++) {
    sb.append("x");
}
Initializing the string builder with one million will make it faster in comparison to a default string builder which has to copy its array repeatedly.
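To see the difference without timing anything, one can count how often the capacity changes during the appends. This is a sketch with hypothetical names (ResizeCount, resizes); the exact count for the default builder depends on the JVM's growth policy, so only the presized case has a fixed answer.

```java
final class ResizeCount {
    // Appends `appends` chars to the given builder and counts capacity changes,
    // each of which means a new array was allocated and the contents copied.
    static int resizes(StringBuilder sb, int appends) {
        int count = 0;
        int last = sb.capacity();
        for (int i = 0; i < appends; i++) {
            sb.append('x');
            if (sb.capacity() != last) {
                count++;
                last = sb.capacity();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(resizes(new StringBuilder(1000000), 1000000)); // 0
        System.out.println(resizes(new StringBuilder(), 1000000));        // some small positive number
    }
}
```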
StringBuilder is backed by an array of characters. The default capacity is 16, or 16 + the length of the String argument when the builder is constructed from a String. If you append to the StringBuilder and the characters do not fit in the array, the capacity has to be increased, which takes time. So, if you have some idea of the number of characters that you might have, initialize the capacity.
The answer is: performance. As the other answers already say, StringBuilder uses an internal array of some original size (capacity). Every time the string being built gets too large for the array to hold it, StringBuilder has to allocate a new, larger array, copy the data from the previous array to the new one, and leave the previous array to the garbage collector.
If you know beforehand what size the resulting string might be and pass that information to the constructor, StringBuilder can create a large enough array right away and thus can avoid the allocating and copying.
While the performance gain is negligible for small strings, it makes quite a difference if you build really large strings.
