Java Hashmap Internal - java

I have a few questions about the Java HashMap class. It is my understanding that
transient Entry[] table;
the table array is going to hold the data based on the value of hashCode(). I need to know when this array gets initialized. Is the array length based on the capacity we define during the HashMap's initialization or the default capacity of 16 if it is not defined when calling the constructor?
How is the hashcode scaled to the array index? For example, if the hashcode has a huge value, how is it scaled to an array index like 10 or 20?
I have read that when the threshold value is reached, rehashing will occur. For example, in the default case, when 16 is the capacity and 0.75 is the load factor, then the threshold value is 16*0.75=12. Once the 12 items are added rehashing will occur and capacity will increase. Does this mean that the table array size gets increased?

Since your post has many questions, I'm going to enumerate them as part of my answer. Also, please note that I'm going off HashMap's source code for Java 1.8 b132 for my answers.
Q: I need to know when this array gets initialized.
A: The table array only gets initialized when data is first entered into the map (e.g. a put() method call). It does not happen as part of the instantiation of the map, itself, unless the copy constructor is called, or the map is being deserialized into an object.
Q: Is the array length based on the capacity we define during the HashMap's initialization or the default capacity of 16 if it is not defined when calling the constructor?
A: Correct, the table array's length is based on the initial capacity you pass to the constructor, rounded up to the next power of two. When the initial capacity is not specified and the default constructor is called, the default capacity of 16 is used.
Q: How is the hashcode scaled to the array index?
A: For the actual code that does this, see the implementation of the putVal() method. Basically, the code takes the (potentially very large) hash value and performs a bitwise AND with the index of the table's last element. Because the table length is always a power of two, this keeps only the low-order bits of the hash, so the result is always a valid position within the table array. For example, if the hash value is 333 (101001101 in base 2) and the table array size is 32 (100000), the last element index would be 31 (11111). Thus the index chosen would be 11111 & 101001101 == 01101 == 13.
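As a minimal sketch of the masking step described above: the method names spread and indexFor are illustrative choices for this example, and the spreading step is written from the Java 8 source as I understand it, so treat it as an approximation rather than the exact JDK code.

```java
final class BucketIndexDemo {
    // Mixes the upper 16 bits of the hash code into the lower 16 bits,
    // so that small tables still see some influence from the high bits.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // Maps a hash to a bucket index. Assumes tableLength is a power of two,
    // so (tableLength - 1) is a mask of low-order one bits.
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        // The worked example from the answer: 333 & 31
        System.out.println(indexFor(333, 32)); // prints 13

        // Even a negative hash code lands inside the table bounds.
        int index = indexFor(spread(-12345), 16);
        System.out.println(index >= 0 && index < 16); // prints true
    }
}
```

The mask only works because the table length is a power of two; with any other length, some buckets would never be selected.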
Q: I have read that when the threshold value is reached, rehashing will occur. ... Does this mean that the table array size gets increased?
A: More or less, yes. When the threshold is exceeded, the table is resized. Note that resizing doesn't modify the existing table array. Rather, a new table array is created with twice the capacity of the old one, and the entries are moved into it. For details, see the implementation of the resize() method.

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;
    this.loadFactor = loadFactor;
    threshold = (int)(capacity * loadFactor);
    table = new Entry[capacity];
    init();
}
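In this (older) version of the source, the constructor above shows how and when the table is created: it is allocated right away, with the capacity rounded up to a power of two.
Rehashing doesn't enlarge the existing table array, because a Java array's length is fixed once it is created. Instead, a new array with the updated size is created each time: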
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}

Related

Java HashMap size allocated

Java HashMap has a size() method,
which reflects how many elements are stored in the HashMap.
I am interested to know what the actual allocated size of the HashMap is.
I tried different methods but can't find the correct one.
I set the initial capacity to 16:
HashMap hm = new HashMap(16);
for (int i = 0; i < 100; ++i) {
    System.out.println(hm.size());
    UUID uuid = UUID.randomUUID();
    hm.put(uuid, null);
}
When I add values, this size can increase. How can I check the size that is actually allocated?
what is the actual size of the Hash Map
I'm assuming you are asking about the capacity. The capacity is the length of the array holding the buckets of the HashMap. The initial capacity is 16 by default.
The capacity method is not public, but you can calculate the current capacity based on the current size, the initial capacity and the load factor.
If you use the defaults (for example, when you create the HashMap with the parameter-less constructor), the initial capacity is 16, and the default load factor is 0.75. This means the capacity will be doubled to 32 once the size exceeds 16 * 0.75 == 12, that is, when the 13th entry is added. It will be doubled to 64 once the size exceeds 32 * 0.75 == 24.
If you pass different initial capacity and/or load factor to the constructor, the calculation will be affected accordingly.
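The calculation described above can be sketched as follows. This is only an estimate under stated assumptions: the map is filled one entry at a time with put(), nothing is removed, and the table doubles whenever the size exceeds capacity * loadFactor. The class and method names are made up for this example.

```java
final class CapacityEstimate {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= n, capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOfTwo(int n) {
        int target = Math.min(n, MAXIMUM_CAPACITY);
        int capacity = 1;
        while (capacity < target) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Estimates the table length of a map that currently holds `size` entries.
    static int estimateCapacity(int size, int initialCapacity, float loadFactor) {
        int capacity = roundUpToPowerOfTwo(initialCapacity);
        // Double while the size is above the threshold for this capacity.
        while (size > (int) (capacity * loadFactor) && capacity < MAXIMUM_CAPACITY) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(estimateCapacity(12, 16, 0.75f));  // prints 16
        System.out.println(estimateCapacity(13, 16, 0.75f));  // prints 32
        System.out.println(estimateCapacity(100, 16, 0.75f)); // prints 256
    }
}
```

Bulk operations such as putAll() or the copy constructor may size the table differently, so the result should be treated as a guide rather than a guarantee.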
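You can use reflection to check the actual allocated size (number of buckets) of the map. Note that on recent JDKs with strong encapsulation, this may require opening java.util to your code (for example with --add-opens java.base/java.util=ALL-UNNAMED):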
HashMap<String, Integer> m = new HashMap<>();
m.put("Abhi", 101);
m.put("John", 102);
System.out.println(m.size()); // This will print 2
Field tableField = HashMap.class.getDeclaredField("table");
tableField.setAccessible(true);
Object[] table = (Object[]) tableField.get(m);
System.out.println(table.length); // This will print 16

Load factor of Arraylist and Vector?

Hi, I was trying to find the load factor of ArrayList and Vector, but I was not able to find it. I know the load factor of HashMap and other Maps is 0.75. Can anyone help me find out how to check the load factor of Vector and ArrayList?
ArrayList:
Initial Capacity:10
Load Factor:1 (when the list is full)
Growth Rate: current_size + current_size/2
Vector:
Initial Capacity:10
Load Factor:1 (when the list is full)
Growth Rate:
current_size * 2 (if capacityIncrement is not defined)
current_size + capacityIncrement (if capacityIncrement is defined during vector initialization)
I assume you would like to know how ArrayList and Vector increase their size.
For ArrayList, every time you add an element, it checks whether the backing array needs to be enlarged. If so, its capacity generally grows as follows:
newCapacity = oldCapacity + (oldCapacity >> 1);
Some special cases behave differently, for example when a large number of elements is added at once. Please refer to the grow(int minCapacity) function in the java.util.ArrayList source code.
Regarding Vector, generally, its size will grow with:
newCapacity = oldCapacity + ((capacityIncrement > 0) ?
capacityIncrement : oldCapacity);
For some special cases, please refer to grow(int minCapacity) in java.util.Vector.
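To make the two growth rules concrete, here is a small sketch that applies the formulas quoted above repeatedly. It ignores the special cases handled by grow(int minCapacity), and the class and method names are invented for this example.

```java
final class ListGrowthDemo {
    // ArrayList rule quoted above: grow by roughly 50%.
    static int arrayListNext(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    // Vector rule quoted above: add capacityIncrement if positive, else double.
    static int vectorNext(int oldCapacity, int capacityIncrement) {
        return oldCapacity + ((capacityIncrement > 0) ? capacityIncrement : oldCapacity);
    }

    public static void main(String[] args) {
        int arrayListCapacity = 10;
        int vectorCapacity = 10;
        for (int step = 1; step <= 4; step++) {
            arrayListCapacity = arrayListNext(arrayListCapacity);
            vectorCapacity = vectorNext(vectorCapacity, 0);
            // ArrayList: 15, 22, 33, 49   Vector: 20, 40, 80, 160
            System.out.println(arrayListCapacity + " " + vectorCapacity);
        }
    }
}
```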
ArrayList al = new ArrayList();
for(int i=0; i<=10; i++){
al.add(i+1);
}
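The default capacity is 10.
In the above example we add 11 elements, so the ArrayList has to grow. In older JDK versions (Java 6, for example), the new capacity was computed as
int newCapacity = (oldCapacity*3)/2+1
(10*3)/2+1 = 16
Newer versions use oldCapacity + (oldCapacity >> 1), which gives 15 here.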

Queue has more items than I put in it

In my Java program, I initialized a Queue with all numbers from 0 to 999.
emptyFrames = new PriorityQueue<Integer>();
for (int i = 0; i < 1000; i++) {
emptyFrames.add(i);
}
System.out.println("Debug");
However, when I go in to debug, there are 1155 items in the Queue.
Why is this happening?
The indices greater than 1000 are related to the queue's capacity, rather than its size.
Internally, PriorityQueue is backed by an array of objects. When adding objects to a queue with a full backing array, the queue will expand the array by a moderate amount by calling grow, so that it will have internal space (capacity) available for future add calls. This avoids the queue having to expand its array every time add is called, which would be horribly inefficient.
private void grow(int minCapacity) {
    int oldCapacity = queue.length;
    // Double size if small; else grow by 50%
    int newCapacity = oldCapacity + ((oldCapacity < 64) ?
                                     (oldCapacity + 2) :
                                     (oldCapacity >> 1));
    // overflow-conscious code
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    queue = Arrays.copyOf(queue, newCapacity);
}
Code retrieved from Docjar.
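As a rough check of the growth rule quoted above, the following sketch replays it starting from PriorityQueue's default initial capacity of 11 until the array can hold 1000 elements. It leaves out the overflow handling, and the class and method names are made up for illustration.

```java
final class PriorityQueueGrowthDemo {
    // Same rule as in grow(): roughly double while small, else grow by 50%.
    static int nextCapacity(int oldCapacity) {
        return oldCapacity + ((oldCapacity < 64)
                ? (oldCapacity + 2)
                : (oldCapacity >> 1));
    }

    // Capacity of the backing array after adding `elements` items one by one.
    static int capacityAfterAdds(int initialCapacity, int elements) {
        int capacity = initialCapacity;
        while (capacity < elements) {
            capacity = nextCapacity(capacity);
        }
        return capacity;
    }

    public static void main(String[] args) {
        // 11 -> 24 -> 50 -> 102 -> 153 -> 229 -> 343 -> 514 -> 771 -> 1156
        System.out.println(capacityAfterAdds(11, 1000)); // prints 1156
    }
}
```

A backing array of length 1156 has a last index of 1155, which would be consistent with the number seen in the debugger.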
The PriorityQueue internally resizes itself according to its capacity to hold more elements. This is a common feature of collections.
From the Javadoc:
A priority queue is unbounded, but has an internal capacity governing the size of an array used to store the elements on the queue. It is always at least as large as the queue size. As elements are added to a priority queue, its capacity grows automatically. The details of the growth policy are not specified.
You're looking at two different pieces of information.
First, the formal size of your queue is 1,000 - there are only 1,000 elements in it. You can verify this with emptyFrames.size().
Second, it appears that Eclipse is showing you the backing array, which is not a good reflection of the total number of elements currently present in the queue. That array's size will fluctuate based on its internal resizing rules.
In this scenario, the backing array isn't something you should trust; only inspect the size() of the collection instead.

What happens when HashMap or HashSet maximum capacity is reached?

Just a few minutes back I answered a question asking about the "Maximum possible size of HashMap in Java". As I have always read, HashMap is a growable data structure. Its size is only limited by the JVM memory size. Hence I thought that there is no hard limit to its size and answered accordingly. (The same is applicable to HashSet as well.)
But someone corrected me, saying that since the size() method of HashMap returns an int, there is a limit on its size. A perfectly correct point. I just tried to test it on my local machine but failed: I need more than 8GB of memory to insert more than 2,147,483,647 integers into the HashMap, which I don't have.
My questions were:
What happens when we try to insert 2,147,483,647 + 1 element in the
HashMap/HashSet?
Is there an error thrown?
If yes, which error? If not what happens to the HashMap/HashSet, its already
existing elements and the new element?
If someone is blessed with access to a machine with say 16GB memory, you can try it out practically. :)
The underlying capacity of the array has to be a power of 2 (and is limited to 2^30). When this size is reached, the load factor is effectively ignored and the array stops growing.
At this point the rate of collisions increases.
Given that hashCode() only has 32 bits, it wouldn't make sense to grow much bigger than this in any case.
/**
 * Rehashes the contents of this map into a new array with a
 * larger capacity. This method is called automatically when the
 * number of keys in this map reaches its threshold.
 *
 * If current capacity is MAXIMUM_CAPACITY, this method does not
 * resize the map, but sets threshold to Integer.MAX_VALUE.
 * This has the effect of preventing future calls.
 *
 * @param newCapacity the new capacity, MUST be a power of two;
 *        must be greater than current capacity unless current
 *        capacity is MAXIMUM_CAPACITY (in which case value
 *        is irrelevant).
 */
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}
When the number of entries exceeds Integer.MAX_VALUE, the int size field overflows; no error is thrown:
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}
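The overflow mentioned above is ordinary int wraparound. A minimal sketch, assuming only that size is a plain int counter and that the maximum capacity is 2^30 as stated earlier (the class and method names are hypothetical):

```java
final class SizeOverflowDemo {
    // Same value as the limit described above: 2^30.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Stands in for the size++ step on an int counter.
    static int increment(int size) {
        return size + 1;
    }

    // Doubling an int capacity, as resize(2 * table.length) would.
    static int doubled(int capacity) {
        return 2 * capacity;
    }

    public static void main(String[] args) {
        // One past Integer.MAX_VALUE silently wraps to Integer.MIN_VALUE.
        System.out.println(increment(Integer.MAX_VALUE) == Integer.MIN_VALUE); // prints true

        // Doubling 2^30 does not fit in an int either, which is one reason
        // the table stops growing at MAXIMUM_CAPACITY.
        System.out.println(doubled(MAXIMUM_CAPACITY) < 0); // prints true
    }
}
```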

Why does this code throw an IndexOutOfBoundsException - aka, what's up with ensureCapacity()?

Consider the following two snippets of code:
int index = 676;
List<String> strings = new ArrayList<String>();
strings.add(index, "foo");
and
int index = 676;
ArrayList<String> strings = new ArrayList<String>();
strings.ensureCapacity(index);
strings.add(index, "foo");
In the first case, I'm not surprised to see an IndexOutOfBoundsException. According to the API, add(int index, E element) will throw an IndexOutOfBoundsException "if the index is out of range (index < 0 || index > size())". The size of strings is 0 before any elements have been added, so index will definitely be larger than the ArrayList's size.
However, in the second case, I would expect the call to ensureCapacity to grow strings such that the call to add would correctly insert the string "foo" at index 676 - but it doesn't.
Why not?
What should I do so that add(index, "foo") works for index > strings.size()?
The capacity of the underlying array in an ArrayList is distinct from the higher-level List API methods (add, remove, etc.), and only speaks to the size of the backing array. If you want to allow adding elements beyond the list bounds, you'll need to code that yourself (or find a collection that does it for you) in a utility class, populating nulls, empty objects, or whatever your application expects between the new index and the old size.
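One possible shape for such a utility is sketched below. The helper name setWithPadding is hypothetical, and padding with null is just one choice; your application may want a different filler value.

```java
import java.util.ArrayList;
import java.util.List;

final class PaddedListDemo {
    // Grows the list with nulls until `index` is valid, then stores the value.
    // A negative index still fails with IndexOutOfBoundsException from set().
    static <T> void setWithPadding(List<T> list, int index, T value) {
        while (list.size() <= index) {
            list.add(null);
        }
        list.set(index, value);
    }

    public static void main(String[] args) {
        ArrayList<String> strings = new ArrayList<>();

        // ensureCapacity only resizes the backing array; size() is unchanged.
        strings.ensureCapacity(677);
        System.out.println(strings.size()); // prints 0

        setWithPadding(strings, 676, "foo");
        System.out.println(strings.size());   // prints 677
        System.out.println(strings.get(676)); // prints foo
    }
}
```

If most positions would stay empty, a Map keyed by index may be a better fit than a padded list.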
ArrayList.ensureCapacity() does not change the actual size of the list (which is returned by size()). Rather, it reallocates the internal buffer so that the list will not need to reallocate again while growing to this size (when you call list.add(object)).
/**
* Increases the capacity of this <tt>ArrayList</tt> instance, if
* necessary, to ensure that it can hold at least the number of elements
* specified by the minimum capacity argument.
*/
Taking a wild guess, I think what you're looking for is
Integer index = Integer.valueOf(676);
Map<Integer,String> strings = new HashMap<Integer,String>();
strings.put(index, "foo");
Your length is 676, but you have to remember that indices are zero-based, so in reality the highest valid index would be index - 1.
