When I run this code:
StringBuffer name = new StringBuffer("stackoverflow.com");
System.out.println("Length: " + name.length() + ", capacity: " + name.capacity());
it gives output:
Length: 17, capacity: 33
Obviously, length is the number of characters in the string, but I am not sure what capacity is.
Is that the number of characters that the StringBuffer can hold before reallocating space?
See: JavaSE 6 java.lang.StringBuffer capacity()
But your assumption is correct:
The capacity is the amount of storage available for newly inserted characters, beyond which an allocation will occur
It's the size of internal buffer. As Javadoc says:
Every string buffer has a capacity. As long as the length of the
character sequence contained in the string buffer does not exceed the
capacity, it is not necessary to allocate a new internal buffer array.
If the internal buffer overflows, it is automatically made larger.
Yes, you're correct, see the JavaDoc for more information:
As long as the length of the character sequence contained in the string buffer does not exceed the capacity, it is not necessary to allocate a new internal buffer array. If the internal buffer overflows, it is automatically made larger.
Internally StringBuffer uses a char array in order to store characters. Capacity is the current size of that char array.
More INFO can be found from http://download.oracle.com/javase/6/docs/api/java/lang/StringBuffer.html
From http://download.oracle.com/javase/6/docs/api/java/lang/StringBuffer.html#capacity%28%29
public int capacity()
Returns the current capacity. The capacity is the amount of storage available for newly inserted characters, beyond which an allocation will occur.
Also from the same document
As of release JDK 5, this class has been supplemented with an equivalent class designed for use by a single thread, StringBuilder. The StringBuilder class should generally be used in preference to this one, as it supports all of the same operations but it is faster, as it performs no synchronization.
Yes, it's exactly that. You can think of StringBuffer as being a bit like a Vector<char> in that respect (except obviously you can't use char as a type argument in Java...)
Every string buffer has a capacity. As long as the length of the
character sequence contained in the string buffer does not exceed the
capacity, it is not necessary to allocate a new internal buffer array.
If the internal buffer overflows, it is automatically made larger.
From: http://download.oracle.com/javase/1.4.2/docs/api/java/lang/StringBuffer.html
StringBuffer has a char[] in which it keeps the strings that you append to it. The amount of memory currently allocated to that buffer is the capacity. The amount currently used is the length.
Taken from the official J2SE documentation
The capacity is the amount of storage available for newly inserted characters, beyond which an allocation will occur.
For a StringBuffer built from a String, the capacity starts at length + 16 (16 being the default capacity). Once the number of characters exceeds the capacity, StringBuffer allocates a larger array. In typical implementations the new capacity is twice the old capacity plus 2, or the required length if that is larger.
"Every string buffer has a capacity. As long as the length of the character sequence contained in the string buffer does not exceed the capacity, it is not necessary to allocate a new internal buffer array. If the internal buffer overflows, it is automatically made larger."
http://download.oracle.com/javase/1.3/docs/api/java/lang/StringBuffer.html
- see capacity() and ensureCapacity()
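To make the pointer to ensureCapacity() concrete, here is a minimal sketch. The class and method names are made up for illustration. The expected values follow the rule stated in the ensureCapacity() Javadoc: the new capacity is the larger of the requested minimum and twice the old capacity plus 2.

```java
class EnsureCapacityDemo {
    // Returns the capacity reported after asking an empty StringBuffer
    // to guarantee room for at least minimumCapacity characters.
    static int capacityAfterEnsure(int minimumCapacity) {
        StringBuffer sb = new StringBuffer();   // documented initial capacity: 16
        sb.ensureCapacity(minimumCapacity);
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(capacityAfterEnsure(10));   // request already fits, capacity stays 16
        System.out.println(capacityAfterEnsure(20));   // grows to 16 * 2 + 2 = 34
        System.out.println(capacityAfterEnsure(100));  // 100 is larger than 34, so 100
    }
}
```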
Capacity is the amount of storage available for characters before a new allocation is needed. It is different from length(): length() returns the number of characters currently stored. An empty StringBuffer created with the default constructor has a capacity of 16. A StringBuffer created from a String has a capacity of the number of characters + 16.
In this case, number of characters = 17
So, capacity = 17 + 16 = 33
Ivan, just read the documentation for capacity() - it directly answers your question...
The initial capacity of an empty StringBuffer/StringBuilder is 16.
The first time the length of its content becomes greater than 16, the capacity increases to 34, i.e. (16 * 2) + 2.
In general, when the content no longer fits, the new capacity is (old capacity * 2) + 2.
If even that is not enough for the text being added, the capacity becomes exactly equal to the required length.
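The growth steps described above can be observed directly. This is only a sketch with a made-up class name, and the printed capacities are what current OpenJDK builds produce; the exact growth policy on append is an implementation detail.

```java
import java.util.Arrays;

class GrowthDemo {
    // Records the capacity of a StringBuilder at three points:
    // when empty, after a small overflow, and after a large overflow.
    static int[] capacities() {
        StringBuilder sb = new StringBuilder();
        int initial = sb.capacity();               // 16

        sb.append("12345678901234567");            // 17 chars, one more than fits
        int afterSmallOverflow = sb.capacity();    // 16 * 2 + 2 = 34

        char[] filler = new char[100];
        Arrays.fill(filler, 'x');
        sb.append(filler);                         // length is now 117, above 34 * 2 + 2 = 70
        int afterLargeOverflow = sb.capacity();    // grows to the required length, 117

        return new int[] { initial, afterSmallOverflow, afterLargeOverflow };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(capacities()));
    }
}
```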
It is probably too late for this answer, but I hope it helps someone.
When we use the default constructor of StringBuffer, the allocated capacity is 16:
StringBuffer name = new StringBuffer();
System.out.println("capacity: " + name.capacity()); /*Output - 16*/
But with the String-argument constructor of StringBuffer, the capacity is calculated as below:
StringBuffer sb = new StringBuffer(x); // x is a String
capacity = 16 (default StringBuffer capacity) + x.length()
Solution:
StringBuffer name = new StringBuffer("stackoverflow.com");
System.out.println("Length: " + name.length() + ", capacity: " + name.capacity());
Capacity Calculation: capacity = 16 + 17
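A small runnable check of the two constructors, assuming nothing beyond the documented behaviour (16 for the no-argument constructor, 16 plus the argument's length for the String constructor). The class and method names are illustrative only.

```java
class InitialCapacityDemo {
    // Capacity of a StringBuffer created with the no-argument constructor.
    static int defaultCapacity() {
        return new StringBuffer().capacity();
    }

    // Capacity of a StringBuffer created from an existing String.
    static int capacityFor(String s) {
        return new StringBuffer(s).capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());                  // 16
        System.out.println(capacityFor("stackoverflow.com"));   // 16 + 17 = 33
    }
}
```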
Related
I've found a few other questions on SO that are close to what I need but I can't figure this out. I'm reading a text file line by line and getting an out of memory error. Here's the code:
System.out.println("Total memory before read: " + Runtime.getRuntime().totalMemory()/1000000 + "MB");
String wp_posts = new String();
try(Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)){
wp_posts = stream
.filter(line -> line.startsWith("INSERT INTO `wp_posts`"))
.collect(StringBuilder::new, StringBuilder::append,
StringBuilder::append)
.toString();
} catch (Exception e1) {
System.out.println(e1.getMessage());
e1.printStackTrace();
}
try {
System.out.println("wp_posts Mega bytes: " + wp_posts.getBytes("UTF-8").length/1000000);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
System.out.println("Total memory after read: " + Runtime.getRuntime().totalMemory()/1000000 + "MB");
Output is like (when run in an environment with more memory):
Total memory before read: 255MB
wp_posts Mega bytes: 18
Total memory after read: 1035MB
Note that in my production environment, I cannot increase the memory heap.
I've tried explicitly closing the stream, doing a gc, and putting stream in parallel mode (consumed more memory).
My questions are:
Is this amount of memory usage expected?
Is there a way to use less memory?
Your problem is in collect(StringBuilder::new, StringBuilder::append, StringBuilder::append). When you add something to the StringBuilder and its internal array is too small, it allocates an array of roughly double the size and copies the existing content into it.
Use new StringBuilder(int capacity) to presize the internal array.
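One way to apply that suggestion inside collect is to pass a supplier that creates a presized builder. This is a sketch under assumptions: the class name, method name and the expectedChars parameter are invented here, and you would still need some estimate of the total size (the sum of the line lengths, or the file size as a rough upper bound). It is meant for sequential streams; in a parallel stream every supplier call would allocate a full-size builder.

```java
import java.util.Arrays;
import java.util.List;

class PresizedCollect {
    // Collects all lines into a StringBuilder whose capacity is chosen up front,
    // so the builder does not need to grow while the lines are appended.
    static StringBuilder collectInto(List<String> lines, int expectedChars) {
        return lines.stream()
                .collect(() -> new StringBuilder(expectedChars),
                         (sb, line) -> sb.append(line),
                         (left, right) -> left.append(right));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "bc", "def");
        int expected = lines.stream().mapToInt(String::length).sum();
        StringBuilder result = collectInto(lines, expected);
        System.out.println(result + " capacity=" + result.capacity());
    }
}
```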
The second problem is that you have a big file, but you put the result into a single StringBuilder. This looks strange to me: it is effectively the same as reading the whole file into a String without using a Stream.
Your Runtime.totalMemory() calculation is pointless if you are allowing JVM to resize the heap. Java will allocate heap memory as needed as long as it doesn't exceed -Xmx value. Since JVM is smart it won't allocate heap memory by 1 byte at a time because it would be very expensive. Instead JVM will request a larger amount of memory at a time (actual value is platform and JVM implementation specific).
Your code is currently loading the content of the file into memory, so there will be objects created on the heap. Because of that, the JVM will most likely request memory from the OS and you will observe an increased Runtime.totalMemory() value.
Try running your program with a strictly sized heap, e.g. by adding the -Xms300m -Xmx300m options. If you don't get an OutOfMemoryError, decrease the heap until you do. However, you also need to pay attention to GC cycles; these things go hand in hand and are a trade-off.
Alternatively you can create a heap dump after the file is processed and then explore the data with MemoryAnalyzer.
The way you calculated memory is incorrect due to the following reasons:
You have taken the total memory (not the used memory). JVM allocates memory lazily and when it does, it does it in chunks. So, when it needs an additional 1 byte memory, it may allocate 1MB memory (provided the total memory does not exceed the configured max heap size). Thus a good portion of allocated heap memory may remain unused. Therefore, you need to calculate the used memory: Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()
A good portion of the memory you see with the above formula may be ready for garbage collection. The JVM would definitely do a garbage collection before reporting OutOfMemory. Therefore, to get an idea, you should call System.gc() before calculating used memory. Of course, you don't call gc in production, and calling gc does not guarantee that the JVM will actually trigger a garbage collection. But for testing purposes, I think it works well.
You got the OutOfMemory error while the stream processing was in progress. At that time the String had not been formed yet and the StringBuilder was still strongly referenced. You should call the capacity() method of StringBuilder to get the actual number of char elements in the array within the StringBuilder, and then multiply it by 2 to get the number of bytes, because Java (before the compact strings introduced in Java 9) stores each char as a 2-byte UTF-16 code unit, even for ASCII characters.
Finally, the way your code is written (i.e. not specifying a big enough size for the StringBuilder initially), every time your StringBuilder runs out of space, it doubles the size of the internal array by creating a new array and copying the content. This means that, for a moment, about three times the size of the actual String is allocated (the old array plus the new one of twice the size). You cannot measure this from outside, because it happens within the StringBuilder class, and when control returns from the StringBuilder class the old array is ready for garbage collection. So there is a high chance that when you get the OutOfMemory error, you get it at the point where StringBuilder tries to allocate the doubled array, or more specifically in the Arrays.copyOf method.
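The used-memory calculation from the points above can be sketched like this. The class and method names are made up, and the numbers printed are only indicative, since System.gc() is a hint and the JVM is free to ignore it.

```java
class UsedMemory {
    // Heap currently in use: what the JVM has reserved minus what is still free inside it.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.gc();                                   // only a request, for testing purposes
        long before = usedBytes();

        StringBuilder sb = new StringBuilder(10_000_000);  // reserve room for 10 million chars
        sb.append("some data");

        long after = usedBytes();
        System.out.println("Used before: " + before / 1_000_000 + "MB");
        System.out.println("Used after:  " + after / 1_000_000 + "MB");
        System.out.println("Builder capacity: " + sb.capacity() + " chars");
    }
}
```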
How much memory is expected to be consumed by your program as is? (A rough estimate)
Let's consider the program which is similar to yours.
public static void main(String[] arg) {
// Initialize the arraylist to emulate a
// file with 32 lines each containing
// 1000 ASCII characters
List<String> strList = new ArrayList<String>(32);
for (Integer i = 0; i < 32; i++) {
strList.add(String.format("%01000d", i));
}
StringBuilder str = new StringBuilder();
strList.stream().map(element -> {
// Print the number of char
// reserved by the StringBuilder
System.out.print(str.capacity() + ", ");
return element;
}).collect(() -> {
return str;
}, (response, element) -> {
response.append(element);
}, (response, element) -> {
response.append(element);
}).toString();
}
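Here, just before every append, I'm printing the capacity of the StringBuilder (the map step runs before the element is appended, which is why the first value is the default 16).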
The output of the program is as follows:
16, 1000, 2002, 4006, 4006, 8014, 8014, 8014, 8014,
16030, 16030, 16030, 16030, 16030, 16030, 16030, 16030,
32062, 32062, 32062, 32062, 32062, 32062, 32062, 32062,
32062, 32062, 32062, 32062, 32062, 32062, 32062,
If your file has "n" lines (where n is a power of 2) and each line has an average of "m" ASCII characters, the capacity of the StringBuilder at the end of the program execution will be approximately (n * m + 2 ^ (a + 1)), where 2 ^ a = n.
E.g. if your file has 256 lines and an average of 1500 ASCII characters per line, the total capacity of the StringBuilder at the end of program will be: (256 * 1500 + 2 ^ 9) = 384512 characters.
Assuming you have only ASCII characters in your file, each character will occupy 2 bytes in UTF-16 representation. Additionally, every time the StringBuilder array runs out of space, a new, bigger array about twice the size of the original is created (see the capacity growth numbers above) and the content of the old array is copied to the new array. The old array is then left for garbage collection. Therefore, if you add another 2 ^ (a + 1) or 2 ^ 9 characters, the StringBuilder would create a new array for holding (n * m + 2 ^ (a + 1)) * 2 + 2 characters and start copying the content of the old array into the new array. Thus, there will be two big arrays within the StringBuilder while the copying goes on.
Thus the total memory will be: 384512 * 2 + (384512 * 2 + 2) * 2 = 2,307,076 bytes = 2.2 MB (approx.) to hold only 0.7 MB of data.
I have ignored the other memory consuming items like array header, object header, references etc. as those will be negligible or constant compared to the array size.
So, in conclusion, 256 lines with 1500 characters each consume 2.2 MB (approx.) to hold only 0.7 MB of data (one third of that).
If you had initialized the StringBuilder with the size 384,512 at the beginning, you could have accommodated the same number of characters in one third of the memory, and there would also have been much less work for the CPU in terms of array copying and garbage collection.
What you may consider doing instead
Finally, for this kind of problem, you may want to work in chunks: write the content of your StringBuilder to a file or database as soon as it has processed, say, 1000 records, clear the StringBuilder, and start over for the next batch of records. That way you never hold more than about 1000 records' worth of data in memory.
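A minimal sketch of that chunked approach, assuming the records are plain strings and the destination is any java.io.Writer (a FileWriter in practice, a StringWriter in the demo). The class name, method name and batchSize parameter are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;

class ChunkedWriter {
    // Buffers records in one reusable StringBuilder and writes the buffer out
    // every batchSize records. Returns how many writes were performed.
    static int writeInBatches(Iterable<String> records, Writer out, int batchSize) {
        StringBuilder buffer = new StringBuilder();
        int inBatch = 0;
        int writes = 0;
        try {
            for (String record : records) {
                buffer.append(record).append('\n');
                if (++inBatch == batchSize) {
                    out.append(buffer);
                    buffer.setLength(0);   // empties the builder but keeps its array for reuse
                    inBatch = 0;
                    writes++;
                }
            }
            if (buffer.length() > 0) {     // write the last, partial batch
                out.append(buffer);
                writes++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writes;
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        int writes = writeInBatches(Arrays.asList("a", "b", "c"), out, 2);
        System.out.println(writes + " writes, content: " + out);
    }
}
```

Because setLength(0) does not shrink the internal array, the builder's memory stays bounded by the largest batch rather than by the whole file.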
I was trying to figure out when to use or why capacity() method is different from length() method of StringBuilder or StringBuffer classes.
I have searched on Stack Overflow and managed to come up with this answer, but I didn't understand its distinction with length() method. I have visited this website also but this helped me even less.
StringBuilder is for building up text. Internally, it uses an array of characters to hold the text you add to it. capacity is the size of the array. length is how much of that array is currently filled by text that should be used. So with:
StringBuilder sb = new StringBuilder(1000);
sb.append("testing");
capacity() is 1000 (there's room for 1000 characters before the internal array needs to be replaced with a larger one), and length() is 7 (there are seven meaningful characters in the array).
The capacity is important because if you try to add more text to the StringBuilder than it has capacity for, it has to allocate a new, larger buffer and copy the content to it, which has memory use and performance implications*. For instance, the default capacity of a StringBuilder is 16 characters (this is stated in the Javadoc of the no-argument constructor), so:
StringBuilder sb = new StringBuilder();
sb.append("Singing:");
sb.append("I am the very model of a modern Major General");
...creates a StringBuilder with a char[16], copies "Singing:" into that array, and then has to create a new array and copy the contents to it before it can add the second string, because it doesn't have enough room to add the second string.
* (whether either matters depends on what the code is doing)
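Both snippets above can be checked with a few lines. This is only a sketch with invented names; the second case reports the capacity it sees rather than assuming a particular growth formula.

```java
class LengthVsCapacity {
    // Appends each text to a builder created with the given capacity and
    // returns { length, capacity } afterwards.
    static int[] lengthAndCapacity(int initialCapacity, String... texts) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (String text : texts) {
            sb.append(text);
        }
        return new int[] { sb.length(), sb.capacity() };
    }

    public static void main(String[] args) {
        int[] roomy = lengthAndCapacity(1000, "testing");
        System.out.println("length=" + roomy[0] + ", capacity=" + roomy[1]);  // length=7, capacity=1000

        // Starts at the default 16; the second string does not fit, so the array is replaced.
        int[] grown = lengthAndCapacity(16, "Singing:",
                "I am the very model of a modern Major General");
        System.out.println("length=" + grown[0] + ", capacity=" + grown[1]);
    }
}
```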
The length of the string is always less than or equal to the capacity of the builder. The length is the actual size of the string stored in the builder, and the capacity is the maximum size that it can currently fit.
The builder’s capacity is automatically increased if more characters are added to exceed its capacity. Internally, a string builder is an array of characters, so the builder’s capacity is the size of the array. If the builder’s capacity is exceeded, the array is replaced by a new array. The new array size is 2 * (the previous array size + 1).
Since you are new to Java, I would suggest you this tip also regarding StringBuilder's efficiency:
You can use new StringBuilder(initialCapacity) to create a StringBuilder with a specified initial capacity. By carefully choosing the initial capacity, you can make your program more efficient. If the capacity is always at least as large as the actual length of the builder, the JVM will never need to reallocate memory for the builder. On the other hand, if the capacity is too large, you will waste memory space. You can use the trimToSize() method to reduce the capacity to the actual size.
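Here is a short illustration of trimToSize(), with made-up class and method names. The Javadoc only says the call attempts to reduce storage, so the sketch prints the capacity it observes instead of promising an exact value.

```java
class TrimDemo {
    // Returns { capacity before trimToSize, capacity after trimToSize, length }.
    static int[] capacitiesAroundTrim() {
        StringBuilder sb = new StringBuilder(1000);   // deliberately oversized
        sb.append("testing");
        int before = sb.capacity();
        sb.trimToSize();                              // asks the builder to release unused space
        int after = sb.capacity();
        return new int[] { before, after, sb.length() };
    }

    public static void main(String[] args) {
        int[] r = capacitiesAroundTrim();
        System.out.println("before=" + r[0] + ", after=" + r[1] + ", length=" + r[2]);
    }
}
```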
I tried to explain it the best terms I could so I hope it was helpful.
As we know, there is an attribute in StringBuilder called capacity, and it is always at least as large as the length of the StringBuilder object. However, what is capacity used for? It will be expanded if the length would become larger than the capacity. If it does something, can someone give an example?
You can use the initial capacity to save the need to re-size the StringBuilder while appending to it, which costs time.
If you know in advance how many characters will be appended to the StringBuilder and you specify that size when you create it, it will never have to be re-sized while it is being used.
If, on the other hand, you don't give an initial capacity, or give too small an initial capacity, then each time that capacity is reached the storage of the StringBuilder has to be increased, which involves copying the data from the original storage to the larger storage.
The string builder has to store the string that is being built somewhere. It does so in an array of characters. The capacity is the length of this array. Once the array would overflow, a new (longer) array is allocated and contents are transferred to it. This makes the capacity rise.
If you are not concerned about performance, simply ignore the capacity. The capacity might get interesting once you are about to construct huge strings and know their size upfront. Then you can request a string builder with a capacity being equal to the expected size (or slightly larger if you are not sure about the size).
Example when building a string with a content size of 1 million:
StringBuilder sb = new StringBuilder(1000000);
for(int i = 0; i < 1000000; i++){
sb.append("x");
}
Initializing the string builder with one million will make it faster in comparison to a default string builder which has to copy its array repeatedly.
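To see the difference without a benchmark, you can count how often the capacity changes while appending. This is a rough sketch with invented names; it counts reallocations rather than measuring time.

```java
class ReallocationCount {
    // Appends the given number of characters one by one and counts how many
    // times the reported capacity changes, i.e. how often the array was replaced.
    static int countCapacityChanges(StringBuilder sb, int chars) {
        int changes = 0;
        int capacity = sb.capacity();
        for (int i = 0; i < chars; i++) {
            sb.append('x');
            if (sb.capacity() != capacity) {
                capacity = sb.capacity();
                changes++;
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        System.out.println("presized: " + countCapacityChanges(new StringBuilder(1000000), 1000000));
        System.out.println("default:  " + countCapacityChanges(new StringBuilder(), 1000000));
    }
}
```

The presized builder reports zero changes. The default one is reallocated repeatedly, and each reallocation copies everything appended so far.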
StringBuilder is backed by an array of characters. The default capacity is 16, or 16 + the length of the String argument when you construct it from a String. If you append to the StringBuilder and the characters do not fit in the array, the capacity has to be increased, which takes time. So, if you have some idea about the number of characters that you might have, initialize the capacity.
The answer is: performance. As the other answers already say, StringBuilder uses an internal array of some original size (capacity). Every time the string being built gets too large for the array to hold it, StringBuilder has to allocate a new, larger array, copy the data from the previous array to the new one, and leave the previous array for garbage collection.
If you know beforehand what size the resulting string might be and pass that information to the constructor, StringBuilder can create a large enough array right away and thus can avoid the allocating and copying.
While for small strings the performance gain is negligible, it makes quite a difference if you build really large strings.
public class example{
public static void main(String args[]) {
StringBuffer s1 = new StringBuffer(10);
s1.insert(0,avaffffffffffffffffffffffffffffffffffffffffffvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv");
System.out.println(s1);
}
}
the output of this code is coming as avaffffffffffffffffffffffffffffffffffffffffffvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv.
what is the use of parameter 10 in the StringBuffer class's method?
if 10 is the size of Buffer and 0 is the offset of insert method then how will we get the whole string as an output?
From JavaDoc:
A string buffer is like a String, but can be modified. At any
point in time it contains some particular sequence of characters, but
the length and content of the sequence can be changed through certain
method calls
10 is just initial capacity (continue reading JavaDoc):
Every string buffer has a capacity. As long as the length of the character sequence contained in the string buffer does not exceed the capacity, it is not necessary to allocate a new internal buffer array. If the internal buffer overflows, it is automatically made larger.
Read the docs:
capacity - the initial capacity.
So it is not the 'size'.
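A small sketch showing that the constructor argument is only a starting capacity and not a limit. The class name, method name and sample text are made up for illustration.

```java
class InitialCapacityIsNotALimit {
    // Returns { capacity before insert, length after insert, capacity after insert }.
    static int[] insertInto(int initialCapacity, String text) {
        StringBuffer s1 = new StringBuffer(initialCapacity);
        int capacityBefore = s1.capacity();   // equals initialCapacity
        s1.insert(0, text);                   // the buffer grows as needed to hold the text
        return new int[] { capacityBefore, s1.length(), s1.capacity() };
    }

    public static void main(String[] args) {
        int[] r = insertInto(10, "a string much longer than ten characters");
        System.out.println("capacity before=" + r[0]
                + ", length after=" + r[1]
                + ", capacity after=" + r[2]);
    }
}
```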
what is the use of parameter 10 in the StringBuffer class's method? if 10 is the size of Buffer and 0 is the offset of insert method then how will we get the whole string as an output?
The answer is that you can make the initial capacity smaller if you know that it is not likely to be used up. When a string buffer is created, memory has to be allocated. The default size is 16, but if you only want to use it for one character, you can specify that its initial capacity is 1, and since it will only resize when you add more than one character to it, you can avoid wasting memory.
The same applies to the parameters in things like HashSet(n). It will resize if you add elements to it, but if you know roughly how many elements it is going to have, you can save a little memory and the operations needed for resizing by specifying an initial capacity.
Is it possible for a string to hold around 250000 ( a round figure) lines of data in a text file ?
You should be able to get a String of length Integer.MAX_VALUE (always 2147483647, i.e. 2^31 - 1, by the Java specification; this is the maximum size of an array, which the String class uses for internal storage) or half your maximum heap size (since each character is two bytes), whichever is smaller.
Strings are backed by char[] arrays, so they are limited to the size of an array structure. An array can hold Integer.MAX_VALUE elements and it also depends on how much memory you've allocated to the JVM. So, in theory a String can be upwards of two billion characters
Since the maximum length of a String is Integer.MAX_VALUE, i.e. 2^31 - 1 characters (roughly 2 billion), a String can hold the contents of about 1400 Kindle e-books.