As we know, StringBuilder has an attribute called capacity, which is always at least as large as the length of the StringBuilder object. But what is capacity used for? It is expanded if the length grows beyond the capacity. If it actually does something, can someone give an example?
You can use the initial capacity to save the need to re-size the StringBuilder while appending to it, which costs time.
If you know in advance how many characters will be appended to the StringBuilder and you specify that size when you create the StringBuilder, it will never have to be re-sized while it is being used.
If, on the other hand, you don't give an initial capacity, or give an initial capacity that is too small, then each time that capacity is reached, the storage of the StringBuilder has to be increased, which involves copying the data stored in the original storage to the larger storage.
The string builder has to store the string that is being built somewhere. It does so in an array of characters. The capacity is the length of this array. Once the array would overflow, a new (longer) array is allocated and contents are transferred to it. This makes the capacity rise.
If you are not concerned about performance, simply ignore the capacity. The capacity might get interesting once you are about to construct huge strings and know their size upfront. Then you can request a string builder with a capacity being equal to the expected size (or slightly larger if you are not sure about the size).
Example when building a string with a content size of 1 million:
StringBuilder sb = new StringBuilder(1000000);
for (int i = 0; i < 1000000; i++) {
    sb.append("x");
}
Initializing the string builder with a capacity of one million makes it faster than a default string builder, which has to copy its array repeatedly as it grows.
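To make the difference visible without a benchmark, here is a small sketch that counts how often capacity() changes while appending. The class and method names (ResizeCount, countResizes) are made up for illustration; only StringBuilder and its capacity()/append() methods come from the JDK.

```java
public class ResizeCount {
    // Counts how often capacity() changes while appending n single characters.
    // Each change means a new, larger array was allocated and the old content copied.
    static int countResizes(StringBuilder sb, int n) {
        int resizes = 0;
        int lastCapacity = sb.capacity();
        for (int i = 0; i < n; i++) {
            sb.append('x');
            if (sb.capacity() != lastCapacity) {
                resizes++;
                lastCapacity = sb.capacity();
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("pre-sized resizes: " + countResizes(new StringBuilder(n), n));
        System.out.println("default resizes:   " + countResizes(new StringBuilder(), n));
    }
}
```

The pre-sized builder should report no resizes at all, while the default one reports a handful; the exact count depends on the JDK's growth policy.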
StringBuilder is backed by an array of characters. The default capacity is 16; if you construct it from a String, the initial capacity is 16 + the length of that String. If you append to the StringBuilder and the characters no longer fit in the array, the capacity has to be increased, which takes time. So, if you have some idea of the number of characters you are going to need, set the initial capacity accordingly.
The answer is: performance. As the other answers already say, StringBuilder uses an internal array of some original size (capacity). Every time the string being built gets too large for the array to hold, StringBuilder has to allocate a new, larger array, copy the data from the previous array to the new one, and discard the previous array (leaving it to the garbage collector).
If you know beforehand what size the resulting string might be and pass that information to the constructor, StringBuilder can create a large enough array right away and thus can avoid the allocating and copying.
While the performance gain is negligible for small strings, it makes quite a difference if you build really large strings.
In the example below, if I take in a String s, would the space complexity be O(n) or O(1)? and if I were to append only vowels, would it still be O(n)?
String s = "dfgdfgdfga";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
    sb.append(s.charAt(i));
}
return sb.toString();
Space complexity boils down to: how much "memory" will you need to store things?
Your code basically intends to copy all contents of String s into StringBuilder sb. Again: copy all chars from s to sb.
Of course that boils down to O(n), with n representing the fact that you need more memory when you copy more characters. If you start making selections, you still have O(n).
O(1) means: constant requirements. Which is simply not possible (space wise) when talking about making a copy.
Each time you append to the StringBuilder, it checks whether the backing array is full and, if required, copies the contents of the original array to a new array of increased size. So the space requirement increases linearly with the length of the String. Hence the space complexity is O(n).
It does not matter if the letters are vowels, because the String Builder stores characters and not references to characters.
You might be interested in creating an implementation of StringBuilder that stores references to Character objects, but it is more expensive to store references than chars, and StringBuilder is final, so it cannot be subclassed.
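For the vowels-only variant of the question, a sketch like the following may help. The class name VowelCopy and the helper keepVowels are hypothetical, not from the original post. In the worst case every character is a vowel, so the builder still ends up holding n characters, which is why the bound stays O(n).

```java
public class VowelCopy {
    // Appends only the vowels of s; the result holds at most s.length() chars.
    static String keepVowels(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ("aeiouAEIOU".indexOf(c) >= 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepVowels("dfgdfgdfga")); // only one vowel in this input
        System.out.println(keepVowels("aeiou"));      // worst case: everything is kept
    }
}
```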
I was trying to figure out when to use the capacity() method of the StringBuilder or StringBuffer classes, and why it is different from the length() method.
I have searched on Stack Overflow and managed to come up with this answer, but I didn't understand how it differs from the length() method. I have also visited this website, but it helped me even less.
StringBuilder is for building up text. Internally, it uses an array of characters to hold the text you add to it. capacity is the size of the array. length is how much of that array is currently filled by text that should be used. So with:
StringBuilder sb = new StringBuilder(1000);
sb.append("testing");
capacity() is 1000 (there's room for 1000 characters before the internal array needs to be replaced with a larger one), and length() is 7 (there are seven meaningful characters in the array).
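If you want to check those two numbers yourself, something like this should do. The class name CapacityVsLength and the describe helper are just for illustration.

```java
public class CapacityVsLength {
    // Formats the two values side by side so the difference is easy to see.
    static String describe(StringBuilder sb) {
        return "length=" + sb.length() + ", capacity=" + sb.capacity();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(1000); // room for 1000 chars up front
        sb.append("testing");                       // only 7 of them are used
        System.out.println(describe(sb));
    }
}
```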
The capacity is important because if you try to add more text to the StringBuilder than it has capacity for, it has to allocate a new, larger buffer and copy the content to it, which has memory use and performance implications*. For instance, the default capacity of a StringBuilder is 16 characters (this is what the Javadoc for the no-argument constructor specifies), so:
StringBuilder sb = new StringBuilder();
sb.append("Singing:");
sb.append("I am the very model of a modern Major General");
...creates a StringBuilder with a char[16], copies "Singing:" into that array, and then has to create a new array and copy the contents to it before it can add the second string, because it doesn't have enough room to add the second string.
* (whether either matters depends on what the code is doing)
The length of the string is always less than or equal to the capacity of the builder. The length is the actual size of the string stored in the builder, and the capacity is the maximum size that it can currently fit.
The builder’s capacity is automatically increased if more characters are added to exceed its capacity. Internally, a string builder is an array of characters, so the builder’s capacity is the size of the array. If the builder’s capacity is exceeded, the array is replaced by a new array. The new array size is 2 * (the previous array size + 1).
Since you are new to Java, I would also suggest this tip regarding StringBuilder's efficiency:
You can use new StringBuilder(initialCapacity) to create a StringBuilder with a specified initial capacity. By carefully choosing the initial capacity, you can make your program more efficient. If the capacity is always larger than the actual length of the builder, the JVM will never need to reallocate memory for the builder. On the other hand, if the capacity is too large, you will waste memory space. You can use the trimToSize() method to reduce the capacity to the actual size.
I tried to explain it in the best terms I could, so I hope it was helpful.
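Here is a rough sketch of trimToSize() in action. The class and method names are invented for this example. Note that the Javadoc only says the buffer "may be resized", so the code below reports what happened rather than assuming an exact value.

```java
public class TrimDemo {
    // Returns { capacity before trim, capacity after trim, length }.
    static int[] capacityBeforeAndAfterTrim(int initialCapacity, String text) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        sb.append(text);
        int before = sb.capacity();
        sb.trimToSize(); // asks the builder to drop unused capacity
        return new int[] { before, sb.capacity(), sb.length() };
    }

    public static void main(String[] args) {
        int[] r = capacityBeforeAndAfterTrim(1000, "testing");
        System.out.println("before=" + r[0] + ", after=" + r[1] + ", length=" + r[2]);
    }
}
```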
I am reading in a lot of data from a file. There may be 100 different data objects with necessary headings, but there can be well over 300,000 values stored in each of these data objects. The values need to be stored in the same order that they are read in. This is the constructor for the data object:
public Data(String heading, ArrayList<Float> values) {
    this.heading = heading;
    this.values = values;
}
What would be the quickest way to store and retrieve these values sequentially in RAM?
Although in your comments you mention "quickness", without specifying what operation needs to be "quick", your main concern seems to be heap memory consumption.
Let's assume 100 groups of 300,000 numbers (you've used words like "may be" and "well over" but this will do as an example).
That's 30,000,000 numbers to store, plus 100 headings and some structural overhead for grouping.
A primitive Java float is 32 bits, that is 4 bytes. So at an absolute minimum, you're going to need 30,000,000 * 4 bytes == 120MB.
An array of primitives - float[30000000] - is just all the values concatenated into a contiguous chunk of memory, so will consume this theoretical minimum of 120MB -- plus a few bytes of once-per-array overhead that I won't go into detail about here.
A Java Float wrapper object takes at least 12 bytes (the exact size depends on the JVM and is often 16 once alignment is included). When you store an object (rather than a primitive) in an array, the reference itself is typically 4 bytes. So an array of Float - Float[30000000] - will consume at least 30,000,000 * (12 + 4) == 480MB.
So, you can cut your memory use to a quarter or less by using primitives rather than wrappers.
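The arithmetic above can be reproduced with a few lines. The per-object and per-reference sizes are passed in as parameters because they are assumptions about the JVM, not constants of the language; the class and method names are made up.

```java
public class MemoryEstimate {
    // A float is always 4 bytes (Float.BYTES), so this part is exact.
    static long primitiveArrayBytes(long count) {
        return count * Float.BYTES;
    }

    // objectBytes and referenceBytes are JVM-dependent assumptions.
    static long boxedArrayBytes(long count, int objectBytes, int referenceBytes) {
        return count * (objectBytes + referenceBytes);
    }

    public static void main(String[] args) {
        long n = 30_000_000L;
        System.out.println("float[]: " + primitiveArrayBytes(n) / 1_000_000 + " MB");
        System.out.println("Float[]: " + boxedArrayBytes(n, 12, 4) / 1_000_000 + " MB");
    }
}
```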
An ArrayList is quite a light wrapper around an array of Object and so has about the same memory costs. The once-per-list overheads are too small to have an impact compared to the elements, at these list sizes. But there are some caveats:
ArrayList can only store Objects, not primitives, so if you choose a List you're stuck with the 12-bytes-per-element overhead of Float.
There are some third-party libraries that provide lists of primitives - see: Create a List of primitive int?
The capacity of an ArrayList is dynamic, and to achieve this, if you grow the list to be bigger than its backing array, it will:
create a new array, 50% bigger than the old array
copy the contents of the old array into the new array (this sounds expensive, but hardware is very fast at doing this)
discard the old array
This means that if the backing array happens to have 30 million elements, and is full, ArrayList.add() will replace the array with one of 45 million elements, even if your List only needs 30,000,001.
You can avoid this if you know the needed capacity in advance, by providing the capacity in the constructor.
You can use ArrayList.trimToSize() to drop unneeded capacity and claw some memory back after you've filled the ArrayList.
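As a sketch of the points above: grownCapacity mirrors the "50% bigger" rule as described here (an assumption about the implementation, not something the ArrayList API exposes), and fill shows the pre-sizing constructor together with trimToSize(). All names other than ArrayList and its methods are hypothetical.

```java
import java.util.ArrayList;

public class ListCapacity {
    // The growth rule described above: old capacity plus half of it.
    static int grownCapacity(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    // Pre-sizes the list so no intermediate arrays are allocated while filling.
    static ArrayList<Float> fill(int n) {
        ArrayList<Float> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            values.add((float) i);
        }
        values.trimToSize(); // nothing to reclaim here, but harmless
        return values;
    }

    public static void main(String[] args) {
        System.out.println(grownCapacity(30_000_000));
        System.out.println(fill(1000).size());
    }
}
```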
If I was striving to use as little heap memory as possible, I would aim to store my lists of numbers as arrays of primitives:
class Data {
    String header;
    float[] values;
}
... and I would just put these into an ArrayList<Data>.
With this structure, you have O(1) access to arbitrary values, and you can use Arrays.binarySearch() (if the values are sorted) to find by value within a group.
If at all possible, I would find out the size of each group before reading the values, and initialise the array to the right size. If you can, make your input file format facilitate this:
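// readLine(), isHeader(), isValue(), parseValue(), ParsedHeader and Group are placeholders
String line;
while ((line = readLine()) != null) {
    if (isHeader(line)) {
        // The header announces the group size, so the array is allocated exactly once
        ParsedHeader header = new ParsedHeader(line);
        currentArray = new float[header.size()];
        arrayIndex = 0;
        currentGroup = new Group(header.name(), currentArray);
        groups.add(currentGroup);
    } else if (isValue(line)) {
        currentArray[arrayIndex++] = parseValue(line);
    }
}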
If you can't change the input format, consider making two passes through the file - once to discover group lengths, once again to fill your arrays.
If you have to consume the file in one pass, and the file format can't provide group lengths before groups, then you'll have to do something that allows a "list" to grow arbitrarily. There are several options:
Consume each group into an ArrayList<Float> - when the group is complete, convert it into a float[]:
float[] array = new float[list.size()];
int i = 0;
for (Float f : list) {
    array[i++] = f; // auto-unboxes Float to float, then advances the index
}
Use a third-party list-of-float library class
Copy the logic used by ArrayList to replace your array with a bigger one when needed -- http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/ArrayList.java#ArrayList.ensureCapacity%28int%29
Any number of approaches discussed in Computer Science textbooks, for example a linked list of arrays.
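For the "copy the logic used by ArrayList" option, a minimal sketch could look like this. GrowableFloats is a made-up class, and the growth factor (half the old size, plus one) is my choice for the example rather than a copy of the JDK source.

```java
import java.util.Arrays;

public class GrowableFloats {
    private float[] data = new float[16];
    private int size = 0;

    // Appends a value, replacing the backing array with a larger one when full.
    public void add(float value) {
        if (size == data.length) {
            int newCapacity = data.length + (data.length >> 1) + 1;
            data = Arrays.copyOf(data, newCapacity);
        }
        data[size++] = value;
    }

    public int size() {
        return size;
    }

    // Returns a right-sized copy, dropping the unused tail of the backing array.
    public float[] toArray() {
        return Arrays.copyOf(data, size);
    }

    public static void main(String[] args) {
        GrowableFloats g = new GrowableFloats();
        for (int i = 0; i < 100; i++) {
            g.add(i);
        }
        System.out.println(g.size() + " values, last = " + g.toArray()[99]);
    }
}
```

This keeps everything in primitives while reading in one pass; toArray() gives you the exact-sized float[] for the Data object once a group is complete.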
However none of this considers your reasons for slurping all these numbers into memory in the first place, nor whether this store meets your needs when it comes to processing the numbers.
You should step back and consider what your actual data processing requirement is, and whether slurping into memory is the best approach.
See whether you can do your processing by storing only a slice of data at a time, rather than storing the whole thing in memory. For example, to calculate max/min/mean, you don't need every number to be in memory -- you just need to keep a running maximum, minimum, sum and count.
Or, consider using a lightweight database library.
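To illustrate the idea of processing values as they stream past instead of storing them, here is a small sketch. RunningStats is a hypothetical class; it keeps four fields no matter how many values it sees.

```java
public class RunningStats {
    private long count = 0;
    private double sum = 0.0;
    private float min = Float.POSITIVE_INFINITY;
    private float max = Float.NEGATIVE_INFINITY;

    // Folds one value into the running aggregates; nothing else is retained.
    public void accept(float value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    public float min() { return min; }
    public float max() { return max; }

    // Mean of everything seen so far; NaN if no values were accepted.
    public double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (float v : new float[] { 1f, 2f, 3f }) {
            stats.accept(v);
        }
        System.out.println(stats.min() + " " + stats.max() + " " + stats.mean());
    }
}
```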
You could use a red-black BST, which is an efficient way to store and retrieve data. It relies on nodes that link to other nodes, so there's no limit to the size of the input, as long as you have enough memory for Java.
public class example {
    public static void main(String args[]) {
        StringBuffer s1 = new StringBuffer(10);
        s1.insert(0, "avaffffffffffffffffffffffffffffffffffffffffffvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv");
        System.out.println(s1);
    }
}
the output of this code is coming as avaffffffffffffffffffffffffffffffffffffffffffvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv.
What is the use of the parameter 10 in the StringBuffer constructor?
If 10 is the size of the buffer and 0 is the offset for the insert method, how do we get the whole string as output?
From JavaDoc:
A string buffer is like a String, but can be modified. At any
point in time it contains some particular sequence of characters, but
the length and content of the sequence can be changed through certain
method calls
10 is just initial capacity (continue reading JavaDoc):
Every string buffer has a capacity. As long as the length of the character sequence contained in the string buffer does not exceed the capacity, it is not necessary to allocate a new internal buffer array. If the internal buffer overflows, it is automatically made larger.
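You can watch this happen by printing the capacity before and after inserting something longer than 10 characters. The class name and the report helper are made up, and the inserted text is just a stand-in for the long string in the question.

```java
public class InitialCapacityDemo {
    static String report(StringBuffer sb) {
        return "length=" + sb.length() + ", capacity=" + sb.capacity();
    }

    public static void main(String[] args) {
        StringBuffer s1 = new StringBuffer(10);   // initial capacity, not a size limit
        System.out.println(report(s1));
        s1.insert(0, "a string longer than ten characters");
        System.out.println(report(s1));           // capacity has grown to fit the text
    }
}
```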
Read the docs:
capacity - the initial capacity.
So it is not the 'size'.
what is the use of parameter 10 in the StringBuffer class's method? if 10 is the size of Buffer and 0 is the offset of insert method then how will we get the whole string as an output?
The answer is that you can make the initial capacity smaller if you know that it is not likely to be used up. When a string buffer is created, memory has to be allocated. The default capacity is 16, but if you only want to use it for one character, you can specify that its initial capacity is 1; since it only resizes when you add more than one character, you avoid wasting memory.
The same applies to the constructor parameters of things like HashSet(n). It will resize if you add enough elements to it, but if you know roughly how many elements it is going to hold, you can save a little memory and the work of resizing by choosing its initial capacity accordingly.
When I run this code:
StringBuffer name = new StringBuffer("stackoverflow.com");
System.out.println("Length: " + name.length() + ", capacity: " + name.capacity());
it gives output:
Length: 17, capacity: 33
Obviously, length is the number of characters in the string, but I am not sure what capacity is.
Is that number of characters that StringBuffer can hold before reallocating space?
See: JavaSE 6 java.lang.StringBuffer capacity()
But your assumption is correct:
The capacity is the amount of storage available for newly inserted characters, beyond which an allocation will occur
It's the size of internal buffer. As Javadoc says:
Every string buffer has a capacity. As long as the length of the
character sequence contained in the string buffer does not exceed the
capacity, it is not necessary to allocate a new internal buffer array.
If the internal buffer overflows, it is automatically made larger.
Yes, you're correct, see the JavaDoc for more information:
As long as the length of the character sequence contained in the string buffer does not exceed the capacity, it is not necessary to allocate a new internal buffer array. If the internal buffer overflows, it is automatically made larger.
Internally, StringBuffer uses a char array to store characters. Capacity is the current size of that char array.
More INFO can be found from http://download.oracle.com/javase/6/docs/api/java/lang/StringBuffer.html
From http://download.oracle.com/javase/6/docs/api/java/lang/StringBuffer.html#capacity%28%29
public int capacity()
Returns the current capacity. The capacity is the amount of storage available for newly inserted characters, beyond which an allocation will occur.
Also from the same document
As of release JDK 5, this class has been supplemented with an equivalent class designed for use by a single thread, StringBuilder. The StringBuilder class should generally be used in preference to this one, as it supports all of the same operations but it is faster, as it performs no synchronization.
Yes, it's exactly that. You can think of StringBuffer as being a bit like a Vector<char> in that respect (except obviously you can't use char as a type argument in Java...)
Every string buffer has a capacity. As long as the length of the
character sequence contained in the string buffer does not exceed the
capacity, it is not necessary to allocate a new internal buffer array.
If the internal buffer overflows, it is automatically made larger.
From: http://download.oracle.com/javase/1.4.2/docs/api/java/lang/StringBuffer.html
StringBuffer has a char[] in which it keeps the strings that you append to it. The amount of memory currently allocated to that buffer is the capacity. The amount currently used is the length.
Taken from the official J2SE documentation
The capacity is the amount of storage available for newly inserted characters, beyond which an allocation will occur.
It's generally length + 16 when the buffer is constructed from a string, with 16 being the default minimum allocation. Once the number of characters exceeds the allocated capacity, StringBuffer increases its capacity; how much it grows by is an implementation detail that this documentation passage doesn't spell out.
"Every string buffer has a capacity. As long as the length of the character sequence contained in the string buffer does not exceed the capacity, it is not necessary to allocate a new internal buffer array. If the internal buffer overflows, it is automatically made larger."
http://download.oracle.com/javase/1.3/docs/api/java/lang/StringBuffer.html
- see capacity() and ensureCapacity()
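A quick sketch of ensureCapacity(), going by its Javadoc: if the current capacity is too small, the new capacity is the larger of the requested minimum and twice the old capacity plus 2. The wrapper class and method are hypothetical.

```java
public class EnsureCapacityDemo {
    // Starts from a default StringBuffer (capacity 16) and requests a minimum.
    static int capacityAfterEnsure(int minimumCapacity) {
        StringBuffer sb = new StringBuffer();
        sb.ensureCapacity(minimumCapacity);
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(capacityAfterEnsure(10));  // already large enough
        System.out.println(capacityAfterEnsure(17));  // grows by the doubling rule
        System.out.println(capacityAfterEnsure(100)); // grows to the requested minimum
    }
}
```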
Capacity is the amount of storage available for newly inserted characters. It is different from length(): length() returns the number of characters currently stored, whereas capacity() returns 16 for a StringBuffer created with the no-argument constructor. For a StringBuffer created from a String, the initial capacity is the number of characters + 16.
In this case, number of characters = 17
So, capacity = 17 + 16 = 33
Ivan, just read the documentation for capacity() - it directly answers your question...
The initial capacity of a StringBuffer/StringBuilder created with the no-argument constructor is 16.
The first time the length of your string exceeds 16,
the capacity increases to 34, i.e. (16 * 2) + 2, at least in the OpenJDK implementation.
In general, each expansion grows the capacity to (old capacity * 2) + 2; only when a single operation needs more room than that does the capacity become exactly equal to the required length.
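If you'd rather observe the growth than take anyone's word for it, this sketch records every distinct capacity seen while appending one character at a time. GrowthTrace and its method are made-up names. On the OpenJDK builds I'm aware of, I would expect a sequence starting 16, 34, 70, but treat that as an implementation detail rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.List;

public class GrowthTrace {
    // Appends `chars` characters one by one and records each new capacity value.
    static List<Integer> capacitiesWhileAppending(int chars) {
        StringBuilder sb = new StringBuilder();
        List<Integer> seen = new ArrayList<>();
        seen.add(sb.capacity());
        for (int i = 0; i < chars; i++) {
            sb.append('x');
            if (sb.capacity() != seen.get(seen.size() - 1)) {
                seen.add(sb.capacity());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(capacitiesWhileAppending(100));
    }
}
```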
This answer is late, but hopefully it helps someone.
When we use the default constructor of StringBuffer, the allocated capacity is 16:
StringBuffer name = new StringBuffer();
System.out.println("capacity: " + name.capacity()); /*Output - 16*/
But with the String-argument constructor of StringBuffer, the capacity is calculated as follows:
StringBuffer sb = new StringBuffer(x); // x is a String
Capacity = default StringBuffer capacity + x.length()
Solution:
StringBuffer name = new StringBuffer("stackoverflow.com");
System.out.println("Length: " + name.length() + ", capacity: " + name.capacity());
Capacity calculation: capacity = 16 + 17 = 33