I am trying to figure out how the capacity() method differs from the length() method of the StringBuilder and StringBuffer classes, and when I would use each.
I searched Stack Overflow and found this answer, but I still didn't understand how capacity() differs from length(). I also visited this website, but it helped me even less.
StringBuilder is for building up text. Internally, it uses an array of characters to hold the text you add to it. capacity is the size of the array. length is how much of that array is currently filled by text that should be used. So with:
StringBuilder sb = new StringBuilder(1000);
sb.append("testing");
capacity() is 1000 (there's room for 1000 characters before the internal array needs to be replaced with a larger one), and length() is 7 (there are seven meaningful characters in the array).
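To see the two numbers side by side, here is a minimal sketch that wraps the snippet above in a runnable class. The class name LengthVsCapacity and the helper build() are just names I made up for illustration.

```java
public class LengthVsCapacity {
    // Builds the same builder as in the snippet above (hypothetical helper).
    static StringBuilder build() {
        StringBuilder sb = new StringBuilder(1000);
        sb.append("testing");
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = build();
        // length(): number of characters actually stored
        System.out.println("length   = " + sb.length());
        // capacity(): size of the internal buffer
        System.out.println("capacity = " + sb.capacity());
    }
}
```

The length only changes when you add or remove text; the capacity only changes when the builder has to replace its buffer (or when you ask it to).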
The capacity is important because if you try to add more text to the StringBuilder than it has capacity for, it has to allocate a new, larger buffer and copy the content to it, which has memory use and performance implications*. For instance, the default capacity of a StringBuilder is 16 characters (this is stated in the Javadoc of the no-argument constructor), so:
StringBuilder sb = new StringBuilder();
sb.append("Singing:");
sb.append("I am the very model of a modern Major General");
...creates a StringBuilder with a char[16], copies "Singing:" into that array, and then has to create a new array and copy the contents to it before it can add the second string, because it doesn't have enough room to add the second string.
* (whether either matters depends on what the code is doing)
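If you want to watch the reallocation happen, a rough sketch like the following records the capacity before and after each append. The class and method names are made up for illustration, and the exact capacity after the second append is an implementation detail, so the code only prints it rather than assuming a value.

```java
public class DefaultCapacityDemo {
    // Returns the capacity initially, after the first append, and after the second.
    static int[] capacities() {
        StringBuilder sb = new StringBuilder();
        int initial = sb.capacity();
        sb.append("Singing:");                  // 8 characters, fits in the default buffer
        int afterFirst = sb.capacity();
        sb.append("I am the very model of a modern Major General"); // 45 more characters
        int afterSecond = sb.capacity();
        return new int[] { initial, afterFirst, afterSecond };
    }

    public static void main(String[] args) {
        int[] c = capacities();
        System.out.println("initial:             " + c[0]);
        System.out.println("after first append:  " + c[1]);
        System.out.println("after second append: " + c[2]);
    }
}
```

The first two numbers should be the same (no reallocation needed), and the third should be at least 53, the total length.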
The length of the string is always less than or equal to the capacity of the builder. The length is the actual size of the string stored in the builder, and the capacity is the maximum size that it can currently fit.
The builder’s capacity is automatically increased if more characters are added than it can currently hold. Internally, a string builder is an array of characters, so the builder’s capacity is the size of the array. If the builder’s capacity is exceeded, the array is replaced by a new, larger array. In common implementations the new array size is 2 * (the previous array size + 1), or the exact size required if that is larger.
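As a rough check of that growth rule, the sketch below appends one character at a time and records every distinct capacity it sees. GrowthDemo and observeCapacities are hypothetical names. On the OpenJDK builds I am aware of this shows 16, 34, 70, 142, which matches 2 * (previous + 1), but the rule is not part of the documented contract, so treat the exact numbers as an assumption.

```java
public class GrowthDemo {
    // Appends one character at a time and records each distinct capacity observed.
    static int[] observeCapacities(int count) {
        int[] seen = new int[count];
        StringBuilder sb = new StringBuilder();
        int found = 0;
        int last = -1;
        while (found < count) {
            if (sb.capacity() != last) {
                last = sb.capacity();
                seen[found++] = last;
            }
            sb.append('x');
        }
        return seen;
    }

    public static void main(String[] args) {
        for (int capacity : observeCapacities(4)) {
            System.out.println(capacity);
        }
    }
}
```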
Since you are new to Java, I would also suggest this tip regarding StringBuilder's efficiency:
You can use new StringBuilder(initialCapacity) to create a StringBuilder with a specified initial capacity. By carefully choosing the initial capacity, you can make your program more efficient. If the capacity is always at least as large as the actual length of the builder, the builder will never need to reallocate its internal array. On the other hand, if the capacity is too large, you will waste memory space. You can use the trimToSize() method to reduce the capacity to the actual size.
I tried to explain it in the best terms I could, so I hope it was helpful.
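Here is a small sketch of the over-allocate-then-trim idea. TrimDemo and run() are names I invented. Note that the Javadoc only says trimToSize() attempts to reduce storage, so the capacity afterwards is printed rather than assumed.

```java
public class TrimDemo {
    // Returns { capacity before trimming, capacity after trimming, length }.
    static int[] run() {
        StringBuilder sb = new StringBuilder(1000); // deliberately oversized
        sb.append("testing");
        int before = sb.capacity();
        sb.trimToSize();                            // ask the builder to release unused space
        int after = sb.capacity();
        return new int[] { before, after, sb.length() };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("capacity before trim: " + r[0]);
        System.out.println("capacity after trim:  " + r[1]);
        System.out.println("length:               " + r[2]);
    }
}
```

Trimming itself allocates a new, smaller array and copies the content, so it is only worth doing when the builder will live for a while.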
I have a use case where I need to store approximately 500 million key-value pairs in a single VM with 8 GB of memory. Both key and value are of type Long. The key is auto-incremented, starting from 1, 2, 3, and so on.
I build this Map[K-V] structure only once, at the start of the program, as an exclusive operation. Once it is built, it is used only for lookups; no updates or deletes are performed on it.
I have tried this with java.util.HashMap but, as expected, it consumes a lot of memory and the program fails with an OOM (heap usage exceeded) error.
I need some guidance on the following to help reduce the memory footprint. I am OK with some degradation in access performance.
What other alternatives (from the Java collections or other libraries) can be tried here?
What is the recommended way to measure the memory footprint of this Map, for comparison purposes?
Just use a long[] or long[][].
500 million ascending keys is less than 2^31. And if you go over 2^31, use a long[][] where the first dimension is small and the second one is large.
(When the key type is an integer, you only need a complicated "map" data structure if the key space is sparse.)
The space wastage in a 1D array is insignificant. Every Java array node has 12 byte header, and the node size is rounded up to a multiple of 8 bytes. So a 500 million entry long[] will take so close to 500 million x 8 bytes == 4 billion bytes that it doesn't matter.
However, a JVM typically cannot allocate a single object that takes up the entire available heap space. If virtual address space is at a premium, it would be advisable to use a 2-D array; e.g. new long[4][125_000_000]. This makes the lookups slightly more complicated, but you will most likely reduce the memory footprint by doing this.
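To give an idea of what the "slightly more complicated" lookup looks like, here is a minimal sketch of a chunked array for keys that start at 1. The class ChunkedLongArray and its methods are hypothetical, and the demo uses tiny sizes so it runs anywhere; for the real data you would use something like 500 million entries split into a handful of chunks.

```java
public class ChunkedLongArray {
    private final long[][] chunks;
    private final int chunkSize;

    public ChunkedLongArray(long totalSize, int chunkSize) {
        this.chunkSize = chunkSize;
        int chunkCount = (int) ((totalSize + chunkSize - 1) / chunkSize);
        chunks = new long[chunkCount][];
        long remaining = totalSize;
        for (int i = 0; i < chunkCount; i++) {
            // The last chunk may be shorter than the others.
            chunks[i] = new long[(int) Math.min(chunkSize, remaining)];
            remaining -= chunks[i].length;
        }
    }

    // Keys start at 1, so subtract 1 to get a zero-based position.
    public void put(long key, long value) {
        long pos = key - 1;
        chunks[(int) (pos / chunkSize)][(int) (pos % chunkSize)] = value;
    }

    public long get(long key) {
        long pos = key - 1;
        return chunks[(int) (pos / chunkSize)][(int) (pos % chunkSize)];
    }

    public static void main(String[] args) {
        ChunkedLongArray data = new ChunkedLongArray(10, 4); // chunks of 4, 4 and 2
        data.put(1, 11L);
        data.put(10, 99L);
        System.out.println(data.get(1) + " " + data.get(10));
    }
}
```

An out-of-range key simply fails with an ArrayIndexOutOfBoundsException here; whether you want an explicit check depends on how much you trust the callers.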
If you don't know beforehand the number of keys to expect, you could do the same thing with a combination of arrays and ArrayList objects. But an ArrayList has the problem that if you don't set an (accurate) capacity, the memory utilization is liable to be suboptimal. And if you populate an ArrayList by appending to it, the instantaneous memory demand for the append can be as much as 3 times the list's current space usage.
There is no reason for using a Map in your case.
If you just have a start index and further indices are just constant increments, just use a List:
import java.util.ArrayList;
import java.util.List;
List<Long> data = new ArrayList<>(510_000_000); // capacity should ideally never be reached; if it is, the array behind the ArrayList is reallocated (it grows by about 50%)
data.add(1337L); // insert as often as you want
long value = data.get(1 - 1); // your index starts at 1, so subtract one from it
If you don't even add more elements and know the size from the start, an array will be even better:
long[] data = new long[510_000_000]; // must be large enough; if it is not, you need to create a new array and copy all the data
int currentIndex = 0;
data[currentIndex++] = 1337L; // insert, as long as currentIndex is smaller than the array length
long value = data[1 - 1]; // your index starts at 1, so subtract one from it
Note that you should check the index (currentIndex) before inserting so that it is smaller than the array length.
When iterating, use currentIndex as the length instead of .length (after the inserts, currentIndex equals the number of elements stored).
Create an array with the size you need and, whenever you need to access it, use arr[i-1] (-1 because your indices start with 1 instead of zero).
If you "just" have 500 million entries, you will not reach the integer limit and a simple array will be fine.
If you need more entries and you have sufficient memory, use an array of arrays.
The memory footprint of using an array this big is the memory footprint of the data and a bit more.
However, if you don't know the size, you should use a higher length/capacity than you may need. If you use an ArrayList, the backing array grows by about half whenever the capacity is reached, and during the copy both the old and the new array are in memory at the same time.
A Map would need an object for each entry plus an array of buckets for all those objects, which would greatly increase the memory footprint. The growth in memory footprint with a HashMap is even worse than with ArrayLists, as the underlying array is reallocated even before the Map is completely filled up.
But consider saving it to the HDD/SSD if you need to store that much data. In most cases, this works much better. You can use RandomAccessFile in order to access the data on the HDD/SSD on any point.
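As a rough illustration of the file-based approach: since keys run 1, 2, 3, ... and every value is a fixed 8 bytes, the value for a key sits at byte offset (key - 1) * 8. The sketch below is only that, a sketch. FileBackedLongs and its methods are made-up names, it writes a tiny temp file, and a real loader for 500 million entries would want buffered writes and one long-lived file handle instead of reopening the file per lookup.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

public class FileBackedLongs {
    // Writes the values back to back, 8 bytes each.
    static void write(File file, long[] values) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            for (long v : values) {
                raf.writeLong(v);
            }
        }
    }

    // Keys start at 1, so the value for `key` is at offset (key - 1) * 8.
    static long read(File file, long key) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek((key - 1) * Long.BYTES);
            return raf.readLong();
        }
    }

    // Convenience for the demo: write to a temp file, read one key back, clean up.
    static long roundTrip(long[] values, long key) {
        try {
            File file = File.createTempFile("longs", ".bin");
            try {
                write(file, values);
                return read(file, key);
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new long[] { 10L, 20L, 30L }, 2));
    }
}
```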
So I have been assigned a project where I have an array, and as the user puts elements into this array it has to double in length once it gets full. We are not permitted to use array lists or anything in the collections interface. What I was trying to do was to make a new array once the old one was full, and then copy the values over to the new array. The problem is I don't know how many times I will have to make a new array, so I was wondering how to go about solving this.
Arrays are fixed length. If you want a data structure that has variable length, use an ArrayList.
Since the poster explicitly stated the requirements of the homework exclude the use of Collections an alternative approach would be to use System.arraycopy(). In such an approach you only maintain a single array and as you add items to it you use System.arraycopy() to copy the old array into a new larger array. System.arraycopy() is relatively fast and is actually how ArrayList expands its size.
If you are concerned about the cost of using System.arraycopy() you could use it only when your array is full and create a new array with more space. E.g.
Create an array of size 20. When it gets full, copy it into an array of size 40. When that gets full, copy it into an array of size 60...
Interestingly ArrayList increases its size by
int newCapacity = (oldCapacity * 3)/2 + 1;
when the old array is full. Presumably the writers of this put some considerable thought into how much to grow the array by when needed. It might be worth doing the same.
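For the homework as stated (double when full, no collections), a minimal sketch could look like this. GrowableIntArray is a made-up name, and I am assuming int elements and a starting size of 2 just to keep the demo short. You don't need to know how many times the array will be replaced: you keep a single field and point it at the new array each time.

```java
public class GrowableIntArray {
    private int[] items = new int[2]; // small starting size, chosen only for the demo
    private int size = 0;             // number of slots actually in use

    public void add(int value) {
        if (size == items.length) {
            // Full: allocate an array twice as long and copy the old contents over.
            int[] bigger = new int[items.length * 2];
            System.arraycopy(items, 0, bigger, 0, size);
            items = bigger; // the old array becomes garbage
        }
        items[size++] = value;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return items[index];
    }

    public int size() {
        return size;
    }

    public int capacity() {
        return items.length;
    }

    public static void main(String[] args) {
        GrowableIntArray a = new GrowableIntArray();
        for (int i = 0; i < 5; i++) {
            a.add(i * 10);
        }
        System.out.println("size=" + a.size() + " capacity=" + a.capacity());
    }
}
```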
As we know, there is an attribute in StringBuilder called capacity, and it is always at least as large as the length of the StringBuilder object. However, what is capacity used for? It is expanded if the length would become larger than the capacity. If it is good for something, can someone give an example?
You can use the initial capacity to save the need to re-size the StringBuilder while appending to it, which costs time.
If you know in advance how many characters will be appended to the StringBuilder and you specify that size when you create the StringBuilder, it will never have to be re-sized while it is being used.
If, on the other hand, you don't give an initial capacity, or give an initial capacity that is too small, then each time that capacity is reached, the storage of the StringBuilder has to be increased, which involves copying the data stored in the original storage to the larger storage.
The string builder has to store the string that is being built somewhere. It does so in an array of characters. The capacity is the length of this array. Once the array would overflow, a new (longer) array is allocated and contents are transferred to it. This makes the capacity rise.
If you are not concerned about performance, simply ignore the capacity. The capacity might get interesting once you are about to construct huge strings and know their size upfront. Then you can request a string builder with a capacity being equal to the expected size (or slightly larger if you are not sure about the size).
Example when building a string with a content size of 1 million:
StringBuilder sb = new StringBuilder(1000000);
for (int i = 0; i < 1000000; i++) {
    sb.append("x");
}
Initializing the string builder with one million will make it faster in comparison to a default string builder which has to copy its array repeatedly.
StringBuilder is backed by an array of characters. The default capacity is 16, or 16 + the length of the String argument if you construct it from a String. If you append to the StringBuilder and the characters do not fit in the array, then the capacity has to be increased, which takes time. So, if you have some idea about the number of characters that you might have, initialize the capacity.
The answer is: performance. As the other answers already say, StringBuilder uses an internal array of some original size (capacity). Every time the string being built gets too large for the array to hold it, StringBuilder has to allocate a new, larger array, copy the data from the previous array to the new one and discard the previous array (leaving it to the garbage collector).
If you know beforehand what size the resulting string might be and pass that information to the constructor, StringBuilder can create a large enough array right away and thus can avoid the allocating and copying.
While for small strings the performance gain is negligible, it makes quite a difference if you build really large strings.
public class example {
    public static void main(String[] args) {
        StringBuffer s1 = new StringBuffer(10);
        s1.insert(0, "avaffffffffffffffffffffffffffffffffffffffffffvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv");
        System.out.println(s1);
    }
}
the output of this code is coming as avaffffffffffffffffffffffffffffffffffffffffffvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv.
What is the use of the parameter 10 in the StringBuffer class's constructor?
If 10 is the size of the buffer and 0 is the offset of the insert method, then how do we get the whole string as output?
From JavaDoc:
A string buffer is like a String, but can be modified. At any
point in time it contains some particular sequence of characters, but
the length and content of the sequence can be changed through certain
method calls
10 is just initial capacity (continue reading JavaDoc):
Every string buffer has a capacity. As long as the length of the character sequence contained in the string buffer does not exceed the capacity, it is not necessary to allocate a new internal buffer array. If the internal buffer overflows, it is automatically made larger.
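You can watch the buffer grow past the initial 10 with a small experiment like the one below. InitialCapacityDemo and run() are names I made up, and the capacity after the insert is an implementation detail, so it is printed rather than assumed.

```java
public class InitialCapacityDemo {
    // Returns { capacity before insert, capacity after insert, length after insert }.
    static int[] run(String text) {
        StringBuffer s1 = new StringBuffer(10); // 10 is only the starting capacity
        int before = s1.capacity();
        s1.insert(0, text);                     // the buffer grows automatically if needed
        return new int[] { before, s1.capacity(), s1.length() };
    }

    public static void main(String[] args) {
        int[] r = run("abcdefghijklmnopqrstuvwxyz"); // 26 characters, more than 10
        System.out.println("capacity before: " + r[0]);
        System.out.println("capacity after:  " + r[1]);
        System.out.println("length after:    " + r[2]);
    }
}
```

Nothing is truncated: the length after the insert is the full length of the string, and the capacity has simply been raised to make room.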
Read the docs:
capacity - the initial capacity.
So it is not the 'size'.
what is the use of parameter 10 in the StringBuffer class's method? if 10 is the size of Buffer and 0 is the offset of insert method then how will we get the whole string as an output?
The answer is that you can make the initial capacity smaller if you know that it is not likely to be used up. When a string buffer is created, memory has to be allocated. The default capacity is 16, but if you only want to use it for one character, you can specify that its initial capacity is 1, and since it will only resize when you add more than one character to it, you can avoid wasting memory.
The same applies to the parameters in things like HashSet(n). It will resize if you add elements to it, but if you know how many elements it is going to have, you can save a little memory and the operations needed for resizing by choosing its initial capacity accordingly.
When I run this code:
StringBuffer name = new StringBuffer("stackoverflow.com");
System.out.println("Length: " + name.length() + ", capacity: " + name.capacity());
it gives output:
Length: 17, capacity: 33
Obviously length is the number of characters in the string, but I am not sure what capacity is.
Is that number of characters that StringBuffer can hold before reallocating space?
See: JavaSE 6 java.lang.StringBuffer capacity()
But your assumption is correct:
The capacity is the amount of storage available for newly inserted characters, beyond which an allocation will occur
It's the size of internal buffer. As Javadoc says:
Every string buffer has a capacity. As long as the length of the
character sequence contained in the string buffer does not exceed the
capacity, it is not necessary to allocate a new internal buffer array.
If the internal buffer overflows, it is automatically made larger.
Yes, you're correct, see the JavaDoc for more information:
As long as the length of the character sequence contained in the string buffer does not exceed the capacity, it is not necessary to allocate a new internal buffer array. If the internal buffer overflows, it is automatically made larger.
Internally StringBuffer uses a char array in order to store characters. Capacity is the current size of that char array.
More INFO can be found from http://download.oracle.com/javase/6/docs/api/java/lang/StringBuffer.html
From http://download.oracle.com/javase/6/docs/api/java/lang/StringBuffer.html#capacity%28%29
public int capacity()
Returns the current capacity. The capacity is the amount of storage available for newly inserted characters, beyond which an allocation will occur.
Also from the same document
As of release JDK 5, this class has been supplemented with an equivalent class designed for use by a single thread, StringBuilder. The StringBuilder class should generally be used in preference to this one, as it supports all of the same operations but it is faster, as it performs no synchronization.
Yes, it's exactly that. You can think of StringBuffer as being a bit like a Vector<char> in that respect (except obviously you can't use char as a type argument in Java...)
Every string buffer has a capacity. As long as the length of the
character sequence contained in the string buffer does not exceed the
capacity, it is not necessary to allocate a new internal buffer array.
If the internal buffer overflows, it is automatically made larger.
From: http://download.oracle.com/javase/1.4.2/docs/api/java/lang/StringBuffer.html
StringBuffer has a char[] in which it keeps the strings that you append to it. The amount of memory currently allocated to that buffer is the capacity. The amount currently used is the length.
Taken from the official J2SE documentation
The capacity is the amount of storage available for newly inserted characters, beyond which an allocation will occur.
When a buffer is created from a string, it's generally length + 16, which is the minimum allocation. Once the number of characters exceeds the allocated capacity, StringBuffer automatically increases its capacity, but the documentation does not specify by how much, so the exact amount is an implementation detail.
"Every string buffer has a capacity. As long as the length of the character sequence contained in the string buffer does not exceed the capacity, it is not necessary to allocate a new internal buffer array. If the internal buffer overflows, it is automatically made larger."
http://download.oracle.com/javase/1.3/docs/api/java/lang/StringBuffer.html
- see capacity() and ensureCapacity()
Capacity is the amount of storage available for newly inserted characters. It is different from length(). length() returns the total number of characters, whereas capacity() returns 16 by default for an empty StringBuffer. If the StringBuffer is constructed from a string, the capacity is the number of characters + 16.
In this case, number of characters = 17
So, capacity = 17 + 16 = 33
Ivan, just read the documentation for capacity() - it directly answers your question...
The initial capacity of the StringBuffer/StringBuilder class is 16.
The first time the length of your String becomes > 16, the capacity of the StringBuffer/StringBuilder increases to 34, i.e. (16 * 2) + 2.
In general, when the capacity is exceeded, the new capacity is (old capacity * 2) + 2. But when the required length is larger than that, the capacity becomes exactly equal to the required length. (This is how common OpenJDK implementations behave; the documentation does not guarantee it.)
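If you want to check these numbers on your own JVM, a sketch like this appends a given number of characters in a single call to a fresh builder and reports the resulting capacity. BigAppendDemo and capacityAfterAppending are hypothetical names. On OpenJDK I would expect 34 for 17 characters and 100 for 100 characters, but since that behaviour is not documented, the code prints the values instead of relying on them.

```java
import java.util.Arrays;

public class BigAppendDemo {
    // Capacity of a fresh builder after appending `count` characters in one call.
    static int capacityAfterAppending(int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, 'x');
        StringBuilder sb = new StringBuilder(); // starts with capacity 16
        sb.append(chars);
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println("10 chars:  " + capacityAfterAppending(10));  // fits, no growth
        System.out.println("17 chars:  " + capacityAfterAppending(17));  // just over 16
        System.out.println("100 chars: " + capacityAfterAppending(100)); // far over 2*16+2
    }
}
```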
It is rather late for an answer, but I hope this might help someone.
When we use the default constructor of StringBuffer, the capacity allocated is 16:
StringBuffer name = new StringBuffer();
System.out.println("capacity: " + name.capacity()); /*Output - 16*/
But in case of String argument Constructor of StringBuffer the capacity calculation is like below
StringBuffer sb = new StringBuffer(x); // where x is a String
Capacity = default StringBuffer capacity (16) + x.length()
Solution:
StringBuffer name = new StringBuffer("stackoverflow.com");
System.out.println("Length: " + name.length() + ", capacity: " + name.capacity());
Capacity Calculation: capacity = 16 + 17
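The calculation can be checked directly, since the Javadoc of the String constructor states that the initial capacity is 16 plus the length of the argument. The class name StringArgCapacity and its helper are just illustrative.

```java
public class StringArgCapacity {
    // Returns { length, capacity } of a StringBuffer constructed from the given text.
    static int[] measure(String text) {
        StringBuffer name = new StringBuffer(text);
        return new int[] { name.length(), name.capacity() };
    }

    public static void main(String[] args) {
        int[] r = measure("stackoverflow.com");
        // 17 characters, so the capacity is 16 + 17 = 33
        System.out.println("Length: " + r[0] + ", capacity: " + r[1]);
    }
}
```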