StringBuilder capacity() - java

I noticed that the capacity() method returns the StringBuilder's capacity without any obvious
logic: sometimes its value equals the string length, other times it's greater.
Is there a formula that explains the logic behind it?

When you append to the StringBuilder, the following logic happens:
if (newCount > value.length) {
expandCapacity(newCount);
}
where newCount is the number of characters needed, and value.length is the current size of the buffer.
expandCapacity simply increases the size of the backing char[]
The ensureCapacity() method is the public way to call expandCapacity(), and its docs say:
Ensures that the capacity is at least equal to the specified minimum. If the current capacity is less than the argument, then a new internal array is allocated with greater capacity. The new capacity is the larger of:
The minimumCapacity argument.
Twice the old capacity, plus 2.
If the minimumCapacity argument is nonpositive, this method takes no action and simply returns.
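The documented rule can be sketched as a small pure function. This is only an illustration of the Javadoc wording quoted above, not the JDK's actual code: the names CapacityRule and expandedCapacity are made up for this example, and integer overflow handling is left out.

```java
final class CapacityRule {
    // Hypothetical helper mirroring the ensureCapacity() Javadoc:
    // returns the capacity after a request for at least minimumCapacity chars.
    static int expandedCapacity(int oldCapacity, int minimumCapacity) {
        if (minimumCapacity <= oldCapacity) {
            return oldCapacity;             // buffer is already big enough
        }
        int doubled = oldCapacity * 2 + 2;  // "twice the old capacity, plus 2"
        return Math.max(doubled, minimumCapacity);
    }

    public static void main(String[] args) {
        System.out.println(expandedCapacity(16, 17)); // 34
        System.out.println(expandedCapacity(16, 35)); // 35
        System.out.println(expandedCapacity(16, 10)); // 16
    }
}
```

So a request that is only slightly over the current capacity roughly doubles the buffer, while a very large request gets exactly what it asked for.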

I will try to explain this with some example.
public class StringBuilderDemo {
public static void main(String[] args) {
StringBuilder sb = new StringBuilder();
System.out.println(sb.length());
System.out.println(sb.capacity());
}
}
length() - the length of the character sequence in the builder.
Since this StringBuilder doesn't contain any content, its length will be 0.
capacity() - the number of character spaces that have been allocated.
When you construct a StringBuilder with empty content, by default it takes the initial size as length + 16, which is 0 + 16, so capacity() returns 16 here.
Note: The capacity, which is returned by the capacity() method, is always greater than or equal to the length (usually greater than) and will automatically expand as necessary to accommodate additions to the string builder.
The logic behind the capacity() method:
If you don't initialize the StringBuilder with any content, the default capacity is 16 characters.
If you initialize the StringBuilder with some content, the capacity will be the content length + 16.
When you add new content to the StringBuilder object and the current capacity is not sufficient to hold the new value, it grows to (previous array capacity + 1) * 2, or to the required length if that is larger.
This analysis is taken from the actual StringBuilder.java code.
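A quick way to check the initial capacities yourself. This is a minimal sketch; the class name InitialCapacityDemo is made up. The first two values follow the rules above. The third case, the int constructor, is not covered by those rules: according to the StringBuilder Javadoc it uses the requested capacity as is.

```java
final class InitialCapacityDemo {
    public static void main(String[] args) {
        // No-argument constructor: documented initial capacity of 16.
        System.out.println(new StringBuilder().capacity());        // 16
        // String constructor: 16 plus the length of the argument.
        System.out.println(new StringBuilder("hello").capacity()); // 21
        // int constructor: exactly the requested capacity.
        System.out.println(new StringBuilder(100).capacity());     // 100
    }
}
```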

This function does something different than you expect - it gives you the max number of chars this StringBuilder instance memory can hold at this time.

Here's the logic:
If you create a new instance of the StringBuilder class with the no-argument constructor, like so: new StringBuilder(); the default capacity is 16.
A constructor can also take either an int or a String.
For the String constructor, the initial capacity is calculated like this
int newCapacity = string.length() + 16;
For the int constructor, the capacity is exactly the value specified
int newCapacity = intSpecified;
If a new String is appended to the StringBuilder and the new length is greater than the current capacity, then the capacity is calculated like this (or set to the new length, if that is larger):
int newCapacity = (oldCapacity + 1) * 2;

EDIT: Apologies - the below is information on .NET's StringBuilder, and is not strictly relevant to the original question.
http://johnnycoder.com/blog/2009/01/05/stringbuilder-required-capacity-algorithm/
StringBuilder allocates space for substrings you might add to it (much like List creates space the array it wraps). If you want the actual length of the string, use StringBuilder.Length.

From the API:
Every string builder has a capacity. As long as the length of the character sequence contained in the string builder does not exceed the capacity, it is not necessary to allocate a new internal buffer. If the internal buffer overflows, it is automatically made larger.
Whenever you append something, there is a check to make sure that the updated StringBuilder won't exceed its capacity, and if it does, the internal storage of the StringBuilder is resized:
int len = str.length();
int newCount = count + len;
if (newCount > value.length)
expandCapacity(newCount);
When data is added to it that exceeds its capacity it is re-sized according to the following formula:
void expandCapacity(int minimumCapacity) {
int newCapacity = (value.length + 1) * 2;
if (newCapacity < 0) {
newCapacity = Integer.MAX_VALUE;
} else if (minimumCapacity > newCapacity) {
newCapacity = minimumCapacity;
}
value = Arrays.copyOf(value, newCapacity);
}
See the src.zip file that comes with the JDK for more information. (Above snippets taken from the 1.6 JDK)

You can go inside the JDK code and see how it works. It is based on a char array, new char[capacity], similar to how ArrayList works (see "When to use LinkedList over ArrayList?"). Both use arrays to be 'hardware efficient': the trick is to allocate a large chunk of memory and work in it until it is full, then allocate the next, bigger chunk to continue (expand/grow).

in Java 1.8
public AbstractStringBuilder append(String str) {
if (str == null)
return appendNull();
int len = str.length();
ensureCapacityInternal(count + len);
str.getChars(0, len, value, count);
count += len;
return this;
}
private void ensureCapacityInternal(int minimumCapacity) {
// overflow-conscious code
if (minimumCapacity - value.length > 0) {
value = Arrays.copyOf(value,
newCapacity(minimumCapacity));
}
}
for example :
StringBuilder str = new StringBuilder();
System.out.println(str.capacity()); //16
str.append("123456789012345");
System.out.println(str.capacity()); //16
str.append("12345678901234567890");
System.out.println(str.capacity()); // 15 + 20 = 35

Related

Memory size of a Java 32-bit system int[] array

In Java, the memory occupied by an int[] array of size n equals (4 + n) * 4 bytes.
This can be verified in practice with the code below:
public class test {
public static void main(String[] args) {
long size = memoryUsed();
int[] array = new int[2000];
size = memoryUsed() - size;
if (size == 0)
throw new AssertionError("You need to run this with -XX:-UseTLAB for accurate accounting");
System.out.printf("int[2000] used %,d bytes%n", size);
}
public static long memoryUsed() {
return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
}
}
What's interesting is the number 4 in parentheses. The first 4 bytes hold the array reference and the second 4 the array length, so what takes up the remaining 8 bytes?
The first 4 bytes hold the array reference and the second 4 the array length, so what takes up the remaining 8 bytes?
Normal object overhead - typically a few bytes indicating the type of the object, and a few bytes associated with the monitor for the object. This is not array-specific at all - you'll see it for all objects.
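To make the arithmetic in the question concrete, here is the asker's formula evaluated for the int[2000] case. It only restates the (4 + n) * 4 formula from the question; the names IntArrayFootprint and estimatedBytes are made up, and the figure measured on a real JVM depends on the VM, its settings and object alignment.

```java
final class IntArrayFootprint {
    // The question's formula: (4 + n) * 4 bytes, i.e. 16 bytes of
    // overhead plus 4 bytes per int element.
    static long estimatedBytes(int n) {
        return (4L + n) * 4L;
    }

    public static void main(String[] args) {
        System.out.println(estimatedBytes(2000)); // 8016
    }
}
```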

Improvement of Algorithm: Counting set bits in Byte-Arrays

We store knowledge in byte arrays as bits. Counting the number of set bits is pretty slow. Any suggestion to improve the algorithm is welcome:
public static int countSetBits(byte[] array) {
int setBits = 0;
if (array != null) {
for (int byteIndex = 0; byteIndex < array.length; byteIndex++) {
for (int bitIndex = 0; bitIndex < 7; bitIndex++) {
if (getBit(bitIndex, array[byteIndex])) {
setBits++;
}
}
}
}
return setBits;
}
public static boolean getBit(int index, final byte b) {
byte t = setBit(index, (byte) 0);
return (b & t) > 0;
}
public static byte setBit(int index, final byte b) {
return (byte) ((1 << index) | b);
}
To count the bits of a byte array of length of 156'564 takes 300 ms, that's too much!
Try Integer.bitCount to obtain the number of bits set in each byte. It will be more efficient if you can switch from a byte array to an int array. If this is not possible, you could also construct a look-up table for all 256 bytes to quickly look up the count rather than iterating over individual bits.
And if it's always the whole array's count you're interested in, you could wrap the array in a class that stores the count in a separate integer whenever the array changes. (edit: Or, indeed, as noted in comments, use java.util.BitSet.)
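A minimal sketch of the look-up table idea, assuming the array stays a byte[]. The names BitCountTable and BITS are made up for this example. Note that it counts all 8 bits of every byte, unlike the loop in the question, which stops at bitIndex < 7.

```java
final class BitCountTable {
    // BITS[v] holds the number of set bits in the 8-bit value v (0..255).
    private static final byte[] BITS = new byte[256];

    static {
        for (int v = 1; v < 256; v++) {
            // bits(v) = bits(v / 2) + lowest bit of v
            BITS[v] = (byte) (BITS[v >> 1] + (v & 1));
        }
    }

    static int countSetBits(byte[] array) {
        if (array == null) {
            return 0;
        }
        int setBits = 0;
        for (byte b : array) {
            setBits += BITS[b & 0xFF]; // mask so the index is 0..255
        }
        return setBits;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x01, (byte) 0xFF, (byte) 0x80};
        System.out.println(countSetBits(data)); // 10
    }
}
```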
I would use the same global loop but instead of looping inside each byte I would simply use a (precomputed) array of size 256 mapping bytes to their bit count. That would probably be very efficient.
If you need even more speed, then you should separately maintain the count and increment it and decrement it when setting bits (but that would mean a big additional burden on those operations so I'm not sure it's applicable for you).
Another solution would be based on BitSet implementation : it uses an array of long (and not bytes) and here's how it counts :
int sum = 0;
for (int i = 0; i < wordsInUse; i++)
sum += Long.bitCount(words[i]);
return sum;
I would use:
byte[] yourByteArray = ...
BitSet bitset = BitSet.valueOf(yourByteArray); // java.util.BitSet
int setBits = bitset.cardinality();
I don't know if it's faster, but I think it will be faster than what you have. Let me know?
Your method would look like
public static int countSetBits(byte[] array) {
return BitSet.valueOf(array).cardinality();
}
You say:
We store knowledge in byte arrays as bits.
I would recommend to use a BitSet for that. It gives you convenient methods, and you seem to be interested in bits, not bytes, so it is a much more appropriate data type compared to a byte[]. (Internally it uses a long[]).
By far the fastest way is to count the set bits in "parallel". The method is known as computing the Hamming weight,
and it is implemented in Integer.bitCount(int i) as far as I know.
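For illustration, here is the classic SWAR ("SIMD within a register") population count that this kind of parallel counting refers to. It is a sketch of the well-known technique, not a copy of the JDK source. The names SwarBitCount and popCount are made up, and the main method simply prints its result next to Integer.bitCount for comparison.

```java
final class SwarBitCount {
    // Sums neighbouring bit fields in parallel: 1-bit fields into 2-bit
    // sums, then 4-bit sums, then 8-bit sums, then folds the four bytes.
    static int popCount(int i) {
        i = i - ((i >>> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
        i = (i + (i >>> 4)) & 0x0f0f0f0f;
        i = i + (i >>> 8);
        i = i + (i >>> 16);
        return i & 0x3f; // the result is at most 32, so 6 bits suffice
    }

    static int countSetBits(byte[] array) {
        int setBits = 0;
        for (byte b : array) {
            setBits += popCount(b & 0xFF); // mask off the sign extension
        }
        return setBits;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 0xFF, 0x12345678, -1, Integer.MIN_VALUE};
        for (int s : samples) {
            System.out.println(popCount(s) + " " + Integer.bitCount(s));
        }
    }
}
```

In practice you would just call Integer.bitCount or Long.bitCount instead of writing this by hand.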
As per my understanding,
1 Byte = 8 Bits
So if Byte Array size = n , then isn't total number of bits = n*8 ?
Please correct me if my understanding is wrong
Thanks
Vinod

Program layout - creating aptly sized arrays through System.in.read

The following is obviously very impractical but my lecturer insists on teaching us a very fundamental understanding of programming. The exercise he gave us goes like this:
Using only System.in.read, int, char, and loops, create a method that reads
user input from the command line and returns a char[] that's exactly as big as the amount
of characters that were entered. Do not use System.arraycopy() or other library methods.
I'm clueless. Since there seems to be no way of buffering System.in.read input, the array would have to be perfectly sized before any chars are parsed. How in the world is this supposed to work?
create a method that reads user input from the command line and returns a char[]
On second thought, I assume that you are supposed to do your own input buffering by growing a char[] array yourself. That should be the reason why System.arraycopy() is mentioned.
Growing an array works like
create a new array that is 1 item longer than the existing one.
for each character in the old array
copy the character from the old to the new array, keeping the position
replace the old array with grown array.
If you combine that with a loop that reads all characters from the inputstream you get about the following and should be done with your assignment.
start with array of length 0
while character available from inputstream
grow the array one larger
put the character from inputstream into the last slot of the array
return array
It is even possible to do it without loops and growing arrays. Just creating a new array of the correct size once.
private static char[] readToCharArray(int length) throws IOException {
int read = System.in.read();
char[] result;
if (read == -1 || read == '\r' || read == '\n' ) {
result = new char[length];
} else {
result = readToCharArray(length + 1);
result[length] = (char) read;
}
return result;
}
char[] myArray = readToCharArray(0);
What about a manual array copy? The text doesn't say anything about that. If that is allowed, you could do something like this:
private static char[] readInput() throws IOException {
System.out.println("type something terminated with '|'");
char[] input = new char[0];
int count = 0;
int read;
for (; ; ) {
read = System.in.read();
if (read == '|') {
break;
} else {
char[] tmp = new char[input.length + 1];
for (int i = 0; i < input.length; i++) {
tmp[i] = input[i];
}
input = tmp;
}
input[count] = (char) read;
count++;
}
return input;
}
You could also check for read == -1 instead of read == '|', but the way end-of-input is signalled differs from system to system. Instead of copying the char[] on every iteration, you could also do it every x iterations and then create an array of the correct size at the end. You could also use a while loop...
But it would be definitely more fun to just return an empty array of the correct size as zapl suggested :)
I'm going to assume that your lecturer meant:
The char[] should contain the characters that were read from System.in (not just be the right size)
"System.in.read" refers only to InputStream#read() and not to the other overloaded read methods on InputStream, so you're constrained to reading one character at a time.
You should look at the way ArrayList is implemented. It is backed by an array, yet the list is arbitrarily resizable. When the size of the list exceeds the array size, ArrayList creates a new array that is larger, and then copies the contents of the old array into it. Here are some relevant excerpts from ArrayList:
/**
* Appends the specified element to the end of this list.
*
* @param e element to be appended to this list
* @return <tt>true</tt> (as specified by {@link Collection#add})
*/
public boolean add(E e) {
ensureCapacityInternal(size + 1); // Increments modCount!!
elementData[size++] = e;
return true;
}
private void ensureCapacityInternal(int minCapacity) {
modCount++;
// overflow-conscious code
if (minCapacity - elementData.length > 0)
grow(minCapacity);
}
/**
* Increases the capacity to ensure that it can hold at least the
* number of elements specified by the minimum capacity argument.
*
* @param minCapacity the desired minimum capacity
*/
private void grow(int minCapacity) {
// overflow-conscious code
int oldCapacity = elementData.length;
int newCapacity = oldCapacity + (oldCapacity >> 1);
if (newCapacity - minCapacity < 0)
newCapacity = minCapacity;
if (newCapacity - MAX_ARRAY_SIZE > 0)
newCapacity = hugeCapacity(minCapacity);
// minCapacity is usually close to size, so this is a win:
elementData = Arrays.copyOf(elementData, newCapacity);
}
Since you can't use System.arraycopy(), you'll need to write your own method to do that. That's just a for loop.
This isn't actually all that inefficient. As the javadoc describes, ArrayList#add(E) runs in amortized constant time.
If you follow the ArrayList strategy exactly, then your resulting array will be larger than it needs to be, so at the end you'll need one more array resize to truncate it to exactly the input size. Alternatively, you could just grow the array by 1 every time you read a character, but the running time will be quadratic (n^2) rather than linear (n) in the input length.
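Put together, the grow-then-truncate strategy might look roughly like this. It is a sketch under a few assumptions: the method takes an InputStream parameter so it can be tried without a console (for the exercise you would pass System.in and call read() on it directly), the starting size of 16 and the doubling factor are arbitrary choices, and the names ReadChars and readLine are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadChars {
    // Reads one line from the stream, one character at a time.
    static char[] readLine(InputStream in) throws IOException {
        char[] buffer = new char[16];
        int count = 0;
        int read;
        while ((read = in.read()) != -1 && read != '\n' && read != '\r') {
            if (count == buffer.length) {
                // Buffer is full: double it and copy the contents by hand.
                char[] bigger = new char[buffer.length * 2];
                for (int i = 0; i < count; i++) {
                    bigger[i] = buffer[i];
                }
                buffer = bigger;
            }
            buffer[count++] = (char) read;
        }
        // Final resize so the result is exactly as long as the input.
        char[] result = new char[count];
        for (int i = 0; i < count; i++) {
            result[i] = buffer[i];
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Sample input standing in for System.in.
        InputStream sample = new ByteArrayInputStream("hello world\n".getBytes());
        System.out.println(readLine(sample).length); // 11
    }
}
```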

Why does the capacity change to 112 in the following example?

In the following code...
StringBuffer buf = new StringBuffer("Is is a far, far better thing that i do");
System.out.println("buf = "+ buf);
System.out.println("buf.length() = " + buf.length());
System.out.println("buf.capacity() = " + buf.capacity());
buf.setLength(60);
System.out.println("buf = "+ buf);
System.out.println("buf.length() = " + buf.length());
System.out.println("buf.capacity() = " + buf.capacity());
buf.setLength(30);
System.out.println("buf = "+ buf);
System.out.println("buf.length() = " + buf.length());
System.out.println("buf.capacity() = " + buf.capacity());
... the output is:
buf = Is is a far, far better thing that i do
buf.length() = 39
buf.capacity() = 55
buf = Is is a far, far better thing that i do
buf.length() = 60
buf.capacity() = 112
buf = Is is a far, far better thing
buf.length() = 30
buf.capacity() = 112
Consider how StringBuffer is typically used. When the String we need to store in a StringBuffer exceeds the current capacity, the current capacity is increased. If the algorithm only increased the capacity to the required amount, then StringBuffer would be very inefficient.
For example:
buf.append(someText);
buf.append(someMoreText);
buf.append(Another100Chars);
might require that the capacity be increased three times in a row. Every time the capacity is increased, the underlying data structure (an array) needs to be re-allocated in memory, which involves allocating more RAM from the heap, copying the existing data, and then eventually garbage collecting the previously allocated memory. To reduce the frequency of this happening, StringBuffer will double its capacity when needed. The algorithm moves the capacity from n to 2n+2. Here is the source code from AbstractStringBuilder where this method is implemented:
/**
* This implements the expansion semantics of ensureCapacity with no
* size check or synchronization.
*/
void expandCapacity(int minimumCapacity) {
int newCapacity = value.length * 2 + 2;
if (newCapacity - minimumCapacity < 0)
newCapacity = minimumCapacity;
if (newCapacity < 0) {
if (minimumCapacity < 0) // overflow
throw new OutOfMemoryError();
newCapacity = Integer.MAX_VALUE;
}
value = Arrays.copyOf(value, newCapacity);
}
Every time you append to a StringBuffer or call setLength, this method is called:
public synchronized void ensureCapacity(int minimumCapacity) {
if (minimumCapacity > value.length) {
expandCapacity(minimumCapacity);
}
}
StringBuffer calls the method expandCapacity at several points. If it didn't oversize the capacity, it would have to allocate a new array every time you change the StringBuffer's value. So this is a kind of performance optimization.
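To get a feel for how much the oversizing saves, here is a small simulation that counts reallocations when appending single characters. It does not touch a real StringBuffer. It just models the two policies (grow to exactly what is needed vs. grow to 2n+2), and the names GrowthCost and reallocations are made up for this sketch.

```java
final class GrowthCost {
    // Counts how many times the backing array would be reallocated when
    // appending `appends` single characters, starting at `initialCapacity`.
    static int reallocations(int initialCapacity, int appends, boolean oversize) {
        int capacity = initialCapacity;
        int count = 0;
        int reallocs = 0;
        for (int i = 0; i < appends; i++) {
            count++;
            if (count > capacity) {
                // Either 2n+2 (but at least what is needed) or an exact fit.
                capacity = oversize ? Math.max(capacity * 2 + 2, count) : count;
                reallocs++;
            }
        }
        return reallocs;
    }

    public static void main(String[] args) {
        System.out.println(reallocations(16, 1000, false)); // 984
        System.out.println(reallocations(16, 1000, true));  // 6
    }
}
```

With exact-fit growth every append past the first 16 characters reallocates and copies the array. With 2n+2 growth the capacity goes 16, 34, 70, 142, 286, 574, 1150, which is only six reallocations.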
From the manual:
ensureCapacity
public void ensureCapacity(int minimumCapacity)
Ensures that the capacity is at least equal to the specified minimum.
If the current capacity is less than the argument, then a new internal
array is allocated with greater capacity. The new capacity is the
larger of:
* The minimumCapacity argument.
* Twice the old capacity, plus 2.
If the minimumCapacity argument is nonpositive, this method takes no
action and simply returns.
Parameters:
minimumCapacity - the minimum desired capacity.
A call to setLength(60) will cause ensureCapacity(60) to be called [1].
ensureCapacity relies on "array doubling" which means that it will (at least) double the capacity each time it needs to be increased. The precise definition is documented in the Java Doc for ensureCapacity:
Ensures that the capacity is at least equal to the specified minimum. If the current capacity is less than the argument, then a new internal array is allocated with greater capacity. The new capacity is the larger of:
The minimumCapacity argument.
Twice the old capacity, plus 2.
If the minimumCapacity argument is nonpositive, this method takes no action and simply returns.
In your particular case, the second expression (twice the old capacity, plus 2) is larger than the requested capacity, so this will be used. Since 2*55 + 2 equals 112, that's what the new capacity will be.
Related question:
Why is vector array doubled?
[1] Actually, it will call expandCapacity, but that behaves the same as ensureCapacity.
This is a case of "read the free manual". From the Javadoc for StringBuffer -
public StringBuffer(String str)
Constructs a string buffer initialized to the contents of the specified string. The
initial capacity of the string buffer is 16 plus the length of the string argument.
which explains why it's initially 55. Then
public void ensureCapacity(int minimumCapacity)
Ensures that the capacity is at least equal to the specified minimum.
If the current capacity is less than the argument, then a new internal
array is allocated with greater capacity. The new capacity is the
larger of:
•The minimumCapacity argument.
•Twice the old capacity, plus 2.
If the minimumCapacity argument is
nonpositive, this method takes no action and simply returns.
explains why it changes to 112.
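The two Javadoc rules can be checked directly against a real StringBuffer. This is a minimal sketch that calls ensureCapacity, the documented method, rather than setLength; the class and method names (CapacityTo112, capacityAfter) are made up.

```java
final class CapacityTo112 {
    // Builds a buffer from text, asks for at least minimumCapacity,
    // and reports the resulting capacity.
    static int capacityAfter(String text, int minimumCapacity) {
        StringBuffer buf = new StringBuffer(text); // capacity = text.length() + 16
        buf.ensureCapacity(minimumCapacity);
        return buf.capacity();
    }

    public static void main(String[] args) {
        String text = "Is is a far, far better thing that i do"; // 39 chars
        System.out.println(new StringBuffer(text).capacity()); // 39 + 16 = 55
        System.out.println(capacityAfter(text, 60));   // max(60, 2 * 55 + 2) = 112
        System.out.println(capacityAfter(text, 200));  // max(200, 112) = 200
    }
}
```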
public synchronized void setLength(int newLength) {
super.setLength(newLength);
}
in super:
public void setLength(int newLength) {
if (newLength < 0)
throw new StringIndexOutOfBoundsException(newLength);
ensureCapacityInternal(newLength);
....
Then:
private void ensureCapacityInternal(int minimumCapacity) {
// overflow-conscious code
if (minimumCapacity - value.length > 0)
expandCapacity(minimumCapacity);
....
And finally:
void expandCapacity(int minimumCapacity) {
int newCapacity = value.length * 2 + 2;
....

Performance wise, is it better to call the length of a given array every time or store the length in a variable and call that variable every time?

I call on the length of a given array a lot and I was wondering if it is better to keep calling it numerous times (50+ times currently, but it keeps growing) or is it better to just store the length in an integer and use that integer every time.
If I am unclear in what I am saying, consider the following:
I have a String array:
String[] str = new String[500]; //The length is actually dynamic, not static
Of course, I put some values into it, but I call on the length of the string all the time throughout my application:
int a = str.length;
int b = str.length;
int c = str.length;
int d = str.length;
int e = str.length;
and so on...so is it better to do this: (performance wise, don't care about memory as much)
int length = str.length;
int a = length;
int b = length;
int c = length;
int d = length;
int e = length;
Thanks.
There is no difference. You don't call a method when accessing the length of the array. You just read an internal field. Even if you called a method, the difference would be negligible with current JVMs.
array.length is accessing a field in the array object, so it should not have any impact on performance.
Which do you feel would make the code clearer?
The JVM is likely to optimise the code to do the same thing. Even if it didn't the difference is likely to be less than 1 nano-second.
