What happens when HashMap or HashSet maximum capacity is reached? - java

Just a few minutes back I answered a question asking about the "Maximum possible size of HashMap in Java". As I have always read, HashMap is a growable data structure. Its size is only limited by the JVM memory size. Hence I thought that there is no hard limit to its size and answered accordingly. (The same is applicable to HashSet as well.)
But someone corrected me saying that since the size() method of HashMap returns an int, there is a limit on its size. A perfectly correct point. I tried to test it on my local machine but failed: inserting more than 2,147,483,647 integers into the HashMap needs more than 8GB of memory, which I don't have.
My questions were:
What happens when we try to insert 2,147,483,647 + 1 element in the
HashMap/HashSet?
Is there an error thrown?
If yes, which error? If not what happens to the HashMap/HashSet, its already
existing elements and the new element?
If someone is blessed with access to a machine with say 16GB memory, you can try it out practically. :)

The underlying capacity of the array has to be a power of 2 (and is limited to 2^30). When this size is reached, the load factor is effectively ignored and the array stops growing.
At this point the rate of collisions increases.
Given that hashCode() only has 32 bits, it wouldn't make sense to grow much bigger than this in any case.
/**
 * Rehashes the contents of this map into a new array with a
 * larger capacity. This method is called automatically when the
 * number of keys in this map reaches its threshold.
 *
 * If current capacity is MAXIMUM_CAPACITY, this method does not
 * resize the map, but sets threshold to Integer.MAX_VALUE.
 * This has the effect of preventing future calls.
 *
 * @param newCapacity the new capacity, MUST be a power of two;
 *        must be greater than current capacity unless current
 *        capacity is MAXIMUM_CAPACITY (in which case value
 *        is irrelevant).
 */
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}
When the size exceeds Integer.MAX_VALUE it overflows.
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}
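As a side note (not from the HashMap source): Java int arithmetic wraps around silently, so a size counter incremented past Integer.MAX_VALUE simply becomes negative instead of triggering an error. A minimal sketch of that wrap-around:
public class IntOverflowDemo {
    public static void main(String[] args) {
        int size = Integer.MAX_VALUE;  // 2,147,483,647
        size++;                        // wraps around silently, no exception
        System.out.println(size);      // prints -2147483648 (Integer.MIN_VALUE)
    }
}
So, assuming you ever had enough memory to get that far, no error would be thrown by the counter itself; size() would just start reporting a meaningless negative value.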

Related

Java HashMap size allocated

Java Hash Map has a size() method,
which reflects how many elements are set in the Hash Map.
I am interested to know what is the actual size of the Hash Map.
I tried different methods but can't find the correct one.
I set the initial capacity to 16:
HashMap hm = new HashMap(16);
for (int i = 0; i < 100; ++i) {
    System.out.println(hm.size());
    UUID uuid = UUID.randomUUID();
    hm.put(uuid, null);
}
When I add values, this size can increase; how can I check the size that is actually allocated?
what is the actual size of the Hash Map
I'm assuming you are asking about the capacity. The capacity is the length of the array holding the buckets of the HashMap. The initial capacity is 16 by default.
The capacity method is not public, but you can calculate the current capacity based on the current size, the initial capacity and the load factor.
If you use the defaults (for example, when you create the HashMap with the parameter-less constructor), the initial capacity is 16, and the default load factor is 0.75. This means the capacity will be doubled to 32 once the size reaches 16 * 0.75 == 12. It will be doubled to 64 once the size reaches 32 * 0.75 == 24.
If you pass different initial capacity and/or load factor to the constructor, the calculation will be affected accordingly.
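A minimal sketch of that calculation (this helper is just for illustration; it ignores Java 8's lazy table allocation and assumes plain power-of-two doubling as described above):
// Estimates the current table capacity from the map's size, assuming the
// capacity doubles whenever size reaches capacity * loadFactor.
static int estimatedCapacity(int size, int initialCapacity, float loadFactor) {
    int capacity = initialCapacity;
    while (size >= (int) (capacity * loadFactor)) {
        capacity *= 2;
    }
    return capacity;
}
For example, estimatedCapacity(12, 16, 0.75f) returns 32 and estimatedCapacity(24, 16, 0.75f) returns 64, matching the doubling points above.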
You can use reflection to check the actual allocated size (the bucket array length) of the map. Note that on recent JDK versions such deep reflection into java.util may be blocked unless you run with --add-opens java.base/java.util=ALL-UNNAMED.
import java.lang.reflect.Field;
import java.util.HashMap;

public class HashMapCapacityDemo {
    public static void main(String[] args) throws Exception {
        HashMap<String, Integer> m = new HashMap<>();
        m.put("Abhi", 101);
        m.put("John", 102);
        System.out.println(m.size());      // prints 2
        // Read the package-private "table" field that backs the map
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(m);
        System.out.println(table.length);  // prints 16 (the current capacity)
    }
}

Difference between Stack.capacity() and Stack.size()

I am currently doing a check with the Stack<E> class to see if it's full. However, List does not have an isFull() implementation, so I am wondering whether comparing capacity() with size() is the right check. According to the docs, size() returns the number of components in this vector, and capacity() returns the current capacity of the vector. If I understand correctly, are they the same? And if so, how do I go about checking if my Stack<E> is full?
stack.size() - gives the current size, i.e., the total number of elements pushed onto the stack.
stack.capacity() - gives the current capacity, i.e., the length of the internal array (10, 20, etc.). As soon as you push an element that no longer fits (for example, the 11th element when the capacity is 10), the capacity is doubled.
Internally, Stack extends Vector, and Vector is a dynamically growing array.
Also, for a Stack you can't set the capacityIncrement factor manually; the stack manages it internally.
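A small sketch illustrating the difference (the class name here is just for the example):
import java.util.Stack;

public class StackCapacityDemo {
    public static void main(String[] args) {
        Stack<Integer> stack = new Stack<>();
        System.out.println(stack.size());      // 0
        System.out.println(stack.capacity());  // 10 (Vector's default capacity)

        for (int i = 0; i < 11; i++) {
            stack.push(i);
        }
        System.out.println(stack.size());      // 11
        System.out.println(stack.capacity());  // 20 (doubled on the 11th push)
    }
}
Because the capacity grows automatically like this, there is no practical notion of a "full" stack; checking size() == capacity() only tells you that the next push will trigger a resize.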
The Stack data structure in Java represents a last-in-first-out (LIFO) stack of objects. It extends class Vector with five operations:
push
pop
peek at the item at the top of the stack
check whether the stack is empty, and
search for an item in the stack
The Stack class declaration looks roughly like this:
public class Stack<E> extends Vector<E> {
}
When the stack is created it contains no items. Coming to stack capacity and size:
Size - Number of elements a stack contains at present
Capacity - Number of elements it is capable of holding
The Push operation is implemented as follows
public E push(E item) {
    addElement(item);
    return item;
}
The addElement method belongs to the Vector class and inserts a new element into the Vector:
public synchronized void addElement(E obj) {
    modCount++;
    ensureCapacityHelper(elementCount + 1);
    elementData[elementCount++] = obj;
}
ensureCapacityHelper checks whether the underlying array can hold one more element. If it does not have enough space, the Vector grows:
private void ensureCapacityHelper(int minCapacity) {
    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}
/**
 * The maximum size of array to allocate.
 * Some VMs reserve some header words in an array.
 * Attempts to allocate larger arrays may result in
 * OutOfMemoryError: Requested array size exceeds VM limit
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + ((capacityIncrement > 0) ?
                                     capacityIncrement : oldCapacity);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    elementData = Arrays.copyOf(elementData, newCapacity);
}
Arrays.copyOf allocates a new array of length newCapacity and copies the data from the old array into it (internally using the native System.arraycopy).
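A tiny illustration of what Arrays.copyOf does with the extra capacity:
import java.util.Arrays;

public class CopyOfDemo {
    public static void main(String[] args) {
        int[] old = {1, 2, 3};
        // Allocates a larger array, copies the existing elements into it,
        // and leaves the remaining slots at their default value (0 here).
        int[] grown = Arrays.copyOf(old, 6);
        System.out.println(Arrays.toString(grown)); // [1, 2, 3, 0, 0, 0]
    }
}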
The size is the current number of elements in the stack.
The capacity is an internal detail that tells you how many items fit in the backing array. However, this is not really relevant, as the Vector expands automatically when the capacity is reached.

How to increase ArrayList size to 100% just like Vector

While changing some code based on SonarQube suggestions, I came across the following:
Automatic Increase in Capacity: A Vector defaults to doubling the size of its array, while an ArrayList increases its array size by 50% when you insert an element that doesn't fit.
Now I am wondering whether, if I replace the Vector with an ArrayList, there is a chance that the normal execution of the code will fail.
Note that the existing Vector is not doing any thread-safety-critical work.
Questions:
Is ArrayList able to resize just like Vector?
Is it safe to replace the Vector with ArrayList in any condition, synchronization aside?
Is there any exact replacement for Vector (thread-safety not expected)?
Please feel free to update the question or ask anything.
The differences between Vector and ArrayList are as follows:
Vector is synchronized while ArrayList is not, so Vector is thread safe.
Vector is slower because it is thread safe; in comparison, ArrayList is faster because it is non-synchronized.
By default, a Vector grows by doubling the size of its array, while an ArrayList increases its array size by 50% when it needs to grow.
ArrayList:
/**
 * Increases the capacity to ensure that it can hold at least the
 * number of elements specified by the minimum capacity argument.
 *
 * @param minCapacity the desired minimum capacity
 */
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1); // 50%
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}
Vector:
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + ((capacityIncrement > 0) ?
                                     capacityIncrement : oldCapacity); // default 100%
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    elementData = Arrays.copyOf(elementData, newCapacity);
}
ArrayList does not let you define the increment size. Vector does, via capacityIncrement:
/**
 * The amount by which the capacity of the vector is automatically
 * incremented when its size becomes greater than its capacity. If
 * the capacity increment is less than or equal to zero, the capacity
 * of the vector is doubled each time it needs to grow.
 *
 * @serial
 */
protected int capacityIncrement;
Based on the above:
ArrayList cannot resize exactly like Vector (it grows by 50% rather than 100%).
ArrayList is not thread safe, so you cannot directly replace Vector with ArrayList in multi-threaded code.
You can mostly replace Vector with ArrayList in single-threaded code, because the declarations of Vector and ArrayList are essentially the same:
public class Vector<E>
    extends AbstractList<E>
    implements List<E>, RandomAccess, Cloneable, java.io.Serializable

public class ArrayList<E>
    extends AbstractList<E>
    implements List<E>, RandomAccess, Cloneable, java.io.Serializable
I don’t see a problem. The exact performance metrics of Vector and ArrayList are not the same, but for most practical purposes this is not important. The ArrayList will extend whenever needed, and more often than Vector (if you don’t tell it the needed capacity beforehand). Go ahead.
For your questions: 1. Yes 2. Yes 3. No
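If the different growth rate ever matters for your workload, a hedged sketch of the usual mitigation is to pre-size the ArrayList (the numbers here are just an example):
import java.util.ArrayList;

public class PreSizedListDemo {
    public static void main(String[] args) {
        int expectedSize = 10_000; // hypothetical expected element count

        // Pre-sizing avoids the repeated 50% grow steps, so the difference
        // between ArrayList's and Vector's growth rates never comes into play.
        ArrayList<String> list = new ArrayList<>(expectedSize);
        // list.ensureCapacity(expectedSize) can also be called later if needed.

        for (int i = 0; i < expectedSize; i++) {
            list.add("item-" + i);
        }
        System.out.println(list.size()); // 10000
    }
}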

Java Hashmap Internal

I have a few doubts about the Java HashMap class. It is my understanding that
transient Entry[] table;
the table array is going to hold the data based on the value of hashCode(). I need to know when this array gets initialized. Is the array length based on the capacity we define during the HashMap's initialization or the default capacity of 16 if it is not defined when calling the constructor?
How is the hashcode scaled to the array index? For example, if the hashcode has a huge value, how it is scaled to array index like 10, 20?
I have read that when the threshold value is reached, rehashing will occur. For example, in the default case, when 16 is the capacity and 0.75 is the load factor, then the threshold value is 16*0.75=12. Once the 12 items are added rehashing will occur and capacity will increase. Does this mean that the table array size gets increased?
Since your post has many questions, I'm going to enumerate them as part of my answer. Also, please note that I'm going off HashMap's source code for Java 1.8 b132 for my answers.
Q: I need to know when this array gets initialized.
A: The table array only gets initialized when data is first entered into the map (e.g. a put() method call). It does not happen as part of the instantiation of the map, itself, unless the copy constructor is called, or the map is being deserialized into an object.
Q: Is the array length based on the capacity we define during the HashMap's initialization or the default capacity of 16 if it is not defined when calling the constructor?
A: Correct, the table array's length is based on the initial capacity you pass to the constructor. When the initial capacity is not specified and the default constructor is called, the default capacity is used.
Q: How is the hashcode scaled to the array index?
A: For the actual code that does this, see the implementation of the putVal() method. Basically, the code takes the (potentially very large) hash value and performs a bitwise AND with the last element index of the table. That effectively distributes the key/value pairs across the table array. For example, if the hash value is 333 (101001101 in base 2) and the table array size is 32 (100000), the last element index would be 31 (11111). Thus the index chosen would be 11111 & 101001101 == 01101 == 13.
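A minimal standalone sketch of that index calculation (not the actual HashMap source):
public class BucketIndexDemo {
    public static void main(String[] args) {
        int hash = 333;            // 101001101 in binary
        int tableLength = 32;      // table sizes are powers of two

        // (tableLength - 1) is all ones in binary (11111), so the AND keeps
        // only the low bits of the hash, equivalent to hash % tableLength.
        int index = (tableLength - 1) & hash;

        System.out.println(index); // 13
    }
}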
Q: I have read that when the threshold value is reached, rehashing will occur. ... Does this mean that the table array size gets increased?
A: More or less, yes. When the threshold is exceeded, the table is resized. Note that resizing doesn't modify the existing table array. Rather, a new table array is created with twice the capacity of the old one. For details, see the implementation of the resize() method.
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int)(capacity * loadFactor);
    table = new Entry[capacity];
    init();
}
The code block above shows how and when the table array gets allocated.
Once rehashing occurs, the existing table array's size is not increased (an array's length is fixed once it is created); instead, a new array with the updated capacity is created each time:
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}

Why does the capacity change to 112 in the following example?

In the following code...
StringBuffer buf = new StringBuffer("Is is a far, far better thing that i do");
System.out.println("buf = "+ buf);
System.out.println("buf.length() = " + buf.length());
System.out.println("buf.capacity() = " + buf.capacity());
buf.setLength(60);
System.out.println("buf = "+ buf);
System.out.println("buf.length() = " + buf.length());
System.out.println("buf.capacity() = " + buf.capacity());
buf.setLength(30);
System.out.println("buf = "+ buf);
System.out.println("buf.length() = " + buf.length());
System.out.println("buf.capacity() = " + buf.capacity());
... the output is:
buf = Is is a far, far better thing that i do
buf.length() = 39
buf.capacity() = 55
buf = Is is a far, far better thing that i do
buf.length() = 60
buf.capacity() = 112
buf = Is is a far, far better thing
buf.length() = 30
buf.capacity() = 112
Consider how StringBuffer is typically used. When the String we need to store in a StringBuffer exceeds the current capacity, the current capacity is increased. If the algorithm only increased the capacity to the required amount, then StringBuffer would be very inefficient.
For example:
buf.append(someText);
buf.append(someMoreText);
buf.append(Another100Chars);
might require that the capacity be increased three times in a row. Every time the capacity is increased, the underlying data structure (an array) needs to be re-allocated in memory, which involves allocating more RAM from the heap, copying the existing data, and then eventually garbage collecting the previously allocated memory. To reduce the frequency of this happening, StringBuffer will double its capacity when needed. The algorithm moves the capacity from n to 2n+2. Here is the source code from AbstractStringBuilder where this method is implemented:
/**
 * This implements the expansion semantics of ensureCapacity with no
 * size check or synchronization.
 */
void expandCapacity(int minimumCapacity) {
    int newCapacity = value.length * 2 + 2;
    if (newCapacity - minimumCapacity < 0)
        newCapacity = minimumCapacity;
    if (newCapacity < 0) {
        if (minimumCapacity < 0) // overflow
            throw new OutOfMemoryError();
        newCapacity = Integer.MAX_VALUE;
    }
    value = Arrays.copyOf(value, newCapacity);
}
Every time you append to a StringBuffer or call setLength, this method is called:
public synchronized void ensureCapacity(int minimumCapacity) {
    if (minimumCapacity > value.length) {
        expandCapacity(minimumCapacity);
    }
}
StringBuffer calls the expandCapacity method at several points. If it didn't over-allocate, it would have to allocate a new array every time the StringBuffer's value changed. So this is a performance optimization.
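A small sketch showing the 2n+2 growth in action when appending (the numbers assume the default no-arg StringBuffer constructor and the documented growth rule quoted below):
public class StringBufferGrowthDemo {
    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer();    // default capacity is 16
        System.out.println(sb.capacity());       // 16

        sb.append("12345678901234567");          // 17 chars, exceeds 16
        System.out.println(sb.capacity());       // 34, i.e. 16 * 2 + 2

        sb.append("xxxxxxxxxxxxxxxxxxxx");       // 20 more chars, 37 > 34
        System.out.println(sb.capacity());       // 70, i.e. 34 * 2 + 2
    }
}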
From the manual:
ensureCapacity
public void ensureCapacity(int minimumCapacity)
Ensures that the capacity is at least equal to the specified minimum.
If the current capacity is less than the argument, then a new internal
array is allocated with greater capacity. The new capacity is the
larger of:
* The minimumCapacity argument.
* Twice the old capacity, plus 2.
If the minimumCapacity argument is nonpositive, this method takes no
action and simply returns.
Parameters:
minimumCapacity - the minimum desired capacity.
A call to setLength(60) will cause ensureCapacity(60) to be called [1].
ensureCapacity relies on "array doubling" which means that it will (at least) double the capacity each time it needs to be increased. The precise definition is documented in the Java Doc for ensureCapacity:
Ensures that the capacity is at least equal to the specified minimum. If the current capacity is less than the argument, then a new internal array is allocated with greater capacity. The new capacity is the larger of:
The minimumCapacity argument.
Twice the old capacity, plus 2.
If the minimumCapacity argument is nonpositive, this method takes no action and simply returns.
In your particular case, the second expression (twice the old capacity, plus 2) is larger than the requested capacity, so that's what will be used. Since 2*55 + 2 equals 112, that's what the new capacity will be.
Related question:
Why is vector array doubled?
[1] Actually, it will call expandCapacity, but that behaves the same as ensureCapacity.
This is a case of "read the free manual". From the Javadoc for StringBuffer -
public StringBuffer(String str)
Constructs a string buffer initialized to the contents of the specified string. The
initial capacity of the string buffer is 16 plus the length of the string argument.
which explains why it's initially 55. Then
public void ensureCapacity(int minimumCapacity)
Ensures that the capacity is at least equal to the specified minimum.
If the current capacity is less than the argument, then a new internal
array is allocated with greater capacity. The new capacity is the
larger of:
•The minimumCapacity argument.
•Twice the old capacity, plus 2.
If the minimumCapacity argument is
nonpositive, this method takes no action and simply returns.
explains why it changes to 112.
public synchronized void setLength(int newLength) {
    super.setLength(newLength);
}
In the superclass (AbstractStringBuilder):
public void setLength(int newLength) {
    if (newLength < 0)
        throw new StringIndexOutOfBoundsException(newLength);
    ensureCapacityInternal(newLength);
    ....
Then:
private void ensureCapacityInternal(int minimumCapacity) {
    // overflow-conscious code
    if (minimumCapacity - value.length > 0)
        expandCapacity(minimumCapacity);
    ....
And finally:
void expandCapacity(int minimumCapacity) {
    int newCapacity = value.length * 2 + 2;
    ....
