Hashcode for NULL key in HashMap

I was just reading about the difference between the HashMap and Hashtable classes in Java. One difference I found is that the former allows a null key and the latter does not.
As far as the working of HashMap is concerned, I know that it calls the hashCode method on the key to find the bucket in which that key-value pair is to be placed. Here comes my question:
How is the hash code for a null key computed, or is there a default hash code for a null key (if so, please specify the value)?

from HashMap:
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
...
and if you look further you will see that null always goes to bin 0

From the source code of HashMap, a null key is handled differently. No hash code is generated for null; the entry is stored at index 0 of the internal array with hash value 0. Note that the hash code of an empty string is also 0 (if the keys are strings), but the two cannot be mixed up, because the null entry is found by checking e.key == null rather than by its hash.
/**
* Offloaded version of put for null keys
*/
private V putForNullKey(V value) {
for (Entry<K,V> e = table[0]; e != null; e = e.next) {
if (e.key == null) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(0, null, value, 0);
return null;
}

If you read description of static int hash(int h) method in HashMap you will find that null keys have index 0.

A map cannot have many null keys, only one. Putting a value under the null key again replaces the value already stored for it.

It clearly states what happens when you do a put with a key which was already in the map. The specific case of key == null behaves in the same way: you can't have two different mappings for the null key (just like you can't for any other key). It's not a special case, for the context of your question.

Internally, HashMap has a null check for the key. If the key is null, the hash is 0; otherwise it is the hash value of the key.
Hashtable, on the other hand, has no null check and calls the hashCode method directly on the key,
which is why Hashtable does not accept null.
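To see the difference described in these answers, here is a minimal sketch. The class and method names (NullKeyDemo and friends) are made up for illustration; only the java.util calls are real.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Stores two values under the null key of a HashMap and reads the result back.
    static String hashMapNullKey() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // replaces "first": there is only one null key
        return map.get(null) + " (size " + map.size() + ")";
    }

    // Returns true if Hashtable rejects a null key with a NullPointerException.
    static boolean hashtableRejectsNullKey() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "value");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(hashMapNullKey());          // second (size 1)
        System.out.println(hashtableRejectsNullKey()); // true
    }
}
```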

Related

Why java's HashMap recheck hashcode inside bucket

When HashMap is searching for a key, it uses the key's hash code in 2 places:
to choose bucket
to find entry inside bucket (openjdk7 HashMap get method source)
public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}
Why does HashMap check the hash code inside the bucket? Why is it not enough to check only reference and object equality inside the bucket?

Comparing hash codes (that were already computed, so no need to call the hashCode() method again), which is int comparison, will often be cheaper than calling equals.
Since a bucket may contain keys having different hash codes (for example, in a HashMap with 16 buckets, hash codes 1 and 17 will be mapped to the same bucket), comparing hash codes first may save the need to run equals (when the hash codes are not equal to each other).
This is similar to the optimization that checks reference equality ((k = e.key) == key) before calling equals.
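A rough way to observe this is to count equals calls. The sketch below assumes Java 8 or later, where hash codes 1 and 17 land in the same bucket of a 16-bucket table; the Key class and the counter are hypothetical helpers, not part of HashMap.

```java
import java.util.HashMap;
import java.util.Map;

class HashCheckDemo {
    static int equalsCalls = 0;

    // Hypothetical key type with a caller-chosen hash code and a counting equals().
    static final class Key {
        final int id;
        Key(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    // Counts equals() calls made while looking up a key whose bucket
    // also holds a key with a different hash code.
    static int equalsCallsDuringLookup() {
        Map<Key, String> map = new HashMap<>(16);
        map.put(new Key(1), "one");
        map.put(new Key(17), "seventeen");
        equalsCalls = 0;
        map.get(new Key(17)); // a distinct but equal probe object
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println(equalsCallsDuringLookup()); // 1
    }
}
```

The entry with hash 1 is skipped by the int comparison alone, so equals runs only once, against the entry whose stored hash matches.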

Java HashMap resizing

Let's assume we have some code
class WrongHashCode{
public int code=0;
@Override
public int hashCode(){
return code;
}
}
public class Rehashing {
public static void main(String[] args) {
//Initial capacity is 2 and load factor 75%
HashMap<WrongHashCode,String> hashMap=new HashMap<>(2,0.75f);
WrongHashCode wrongHashCode=new WrongHashCode();
//put object to be lost
hashMap.put(wrongHashCode,"Test1");
//Change hashcode of same Key object
wrongHashCode.code++;
//Resizing hashMap involved 'cause load factor barrier
hashMap.put(wrongHashCode,"Test2");
//Always 2
System.out.println("Keys count " + hashMap.keySet().size());
}
}
So, my question is why after resizing hashMap (that, as far, as I understand involves rehashing keys), we still have 2 keys in keySet instead of 1 (since key object is same for both existing KV pairs) ?

So, my question is why after resizing hashMap (that, as far, as I understand involves rehashing keys)
It actually does not involve rehashing keys, at least not in the HashMap code except in certain circumstances (see below). It involves repositioning them in the map buckets. Inside of HashMap is an Entry class which has the following fields:
final K key;
V value;
Entry<K,V> next;
int hash;
The hash field is the stored hashcode for the key that is calculated when the put(...) call is made. This means that if you change the hashcode in your object it will not affect the entry in the HashMap unless you re-put it into the map. Of course, if you change the hashcode for a key you won't even be able to find it in the HashMap, because it has a different hashcode than the hash stored in the entry.
we still have 2 keys in keySet instead of 1 (since key object is same for both existing KV pairs) ?
So even though you've changed the hash for the single object, it is in the map with 2 entries with different hash fields in it.
All that said, there is code inside of HashMap which may rehash the keys when a HashMap is resized – see the package protected HashMap.transfer(...) method in jdk 7 (at least). This is why the hash field above is not final. It is only used however when initHashSeedAsNeeded(...) returns true to use "alternative hashing". The following sets the threshold of number of entries where the alt-hashing is enabled:
-Djdk.map.althashing.threshold=1
With this set on the VM, I'm actually able to get the hashcode() to be called again when the resizing happens but I'm not able to get the 2nd put(...) to be seen as an overwrite. Part of the problem is that the HashMap.hash(...) method is doing an XOR with the internal hashseed which is changed when the resizing happens, but after the put(...) records the new hash code for the incoming entry.

The HashMap actually caches the hashCode for each key (as a key's hashCode may be expensive to compute). So, although you changed the hashCode for an existing key, the Entry to which it is linked in the HashMap still has the old code (and hence gets put in the "wrong" bucket after resize).
You can see this for yourself in the jvm code for HashMap.resize() (or a little easier to see in the java 6 code HashMap.transfer()).

I can't tell why two of the answers rely on HashMap.transfer for their examples, when that method is not present in Java 8 at all. So I will provide my small input with Java 8 in mind.
Entries in a HashMap are indeed re-hashed, but not in the sense you might think. A re-hash here basically means re-computing the hash code that you already provided via Key#hashCode; there is a method for that:
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
So when you compute your hashcode, HashMap will basically say "I don't trust you enough", re-hash your hashcode and potentially spread the bits better (it's actually an XOR of the upper 16 bits and the lower 16 bits).
On the other hand, when a HashMap is re-sized, the number of bins/buckets is doubled; and because the number of bins is always a power of two, an entry from a current bin will either stay in the same bucket or move to the bucket at an offset equal to the old number of bins. You can find a bit more detail on how this is done in this question.
So once a re-size happens, there is no extra re-hashing; one more bit is taken into consideration, and thus an entry might move or stay where it is. And Gray's answer is correct in this sense: each Entry has the hash field, which is computed only once, the first time you put that Entry.
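The "one more bit" behaviour can be sketched with plain arithmetic. This is not the HashMap source: spread mirrors the hash(Object) method quoted above, while indexFor and the class name are illustrative helpers that assume the usual hash & (length - 1) indexing.

```java
class ResizeIndexDemo {
    // Same bit-spreading step as the Java 8 hash(Object) method, applied to an int.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // All four hash codes share bucket 1 while the table has 16 buckets.
        for (int hashCode : new int[] {1, 17, 33, 49}) {
            int hash = spread(hashCode);
            System.out.println(hashCode + ": " + indexFor(hash, oldCap)
                    + " -> " + indexFor(hash, 2 * oldCap));
        }
        // 1 and 33 stay at index 1; 17 and 49 move to index 1 + 16 = 17.
    }
}
```

The stored hash never changes here; only the mask gets one bit wider, which is why no call to hashCode is needed during the resize.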

I can't find it clearly documented, but changing a key value in a way that changes its hashCode() typically breaks a HashMap.
HashMap divides entries amongst b buckets. You can imagine key with hash h is assigned to bucket h%b.
When it receives a new entry it works out which bucket it belongs to, then checks whether an equal key already exists in that bucket. It finally adds the entry to the bucket, replacing any matched key.
By changing the hash code, the object wrongHashCode will be (typically, and here actually) directed to another bucket the second time around, and its first entry won't be found or removed.
In short, changing the hash of an already inserted key breaks the HashMap, and what you get after that is unpredictable but may result in (a) not finding a key or (b) finding two or more equal keys.
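Case (a) is easy to reproduce. A small sketch, assuming the default capacity of 16; MutableKey is a hypothetical class along the lines of WrongHashCode from the question.

```java
import java.util.HashMap;
import java.util.Map;

class MutatedKeyDemo {
    // Hypothetical key whose hash code can be changed after insertion.
    static final class MutableKey {
        int code;
        @Override public int hashCode() { return code; }
    }

    // Looks up the very same key object before, during and after a hash code change.
    static boolean[] lookupBeforeAndAfterMutation() {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey();
        map.put(key, "value");
        boolean before = map.containsKey(key);   // stored under hash 0
        key.code = 5;
        boolean during = map.containsKey(key);   // searched under hash 5
        key.code = 0;
        boolean restored = map.containsKey(key); // hash matches the stored one again
        return new boolean[] {before, during, restored};
    }

    public static void main(String[] args) {
        boolean[] r = lookupBeforeAndAfterMutation();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true false true
    }
}
```

Even though it is the same object reference, the lookup goes to a different bucket while the hash code is changed, so the entry is not found.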

Because HashMap stores the elements in an internal table and incrementing the code does not affect that table:
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
And addEntry
void addEntry(int hash, K key, V value, int bucketIndex) {
Entry<K,V> e = table[bucketIndex];
table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
if (size++ >= threshold)
resize(2 * table.length);
}
As you can see table[bucketIndex] = new Entry (hash, ...) so although you increment the code, it won't be reflected here.
Try making the field code to be Integer and see what happens?

HashMap with byte array key and String value - containsKey() function doesn't work

I'm using a HashMap with a byte[] key and a String value. But I realized that even when I put the same object (same byte array and same string value) using
myList.put(TheSameByteArray, TheSameStringValue)
into the HashMap, the table still inserts a new object with a different HashMapEntry. As a result, containsKey() does not work.
Can someone explain this to me? How can I fix it? Thanks. (Android Java)
@Override public boolean containsKey(Object key) {
if (key == null) {
return entryForNullKey != null;
}
int hash = Collections.secondaryHash(key);
HashMapEntry<K, V>[] tab = table;
for (HashMapEntry<K, V> e = tab[hash & (tab.length - 1)];
e != null; e = e.next) {
K eKey = e.key;
if (eKey == key || (e.hash == hash && key.equals(eKey))) {
return true;
}
}
return false;
}

A byte[] (or any array) can't work properly as a key in a HashMap, since arrays don't override equals or hashCode, so two arrays will be considered equal only if they refer to the same object.
You'll have to wrap your byte[] in some custom class that overrides hashCode and equals, and use that custom class as the key to your HashMap.
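A minimal sketch of such a wrapper follows. ByteArrayKey is a made-up name; the comparison relies on Arrays.equals and Arrays.hashCode from java.util.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class ByteArrayKeyDemo {
    // Hypothetical wrapper giving a byte[] content-based equals() and hashCode().
    static final class ByteArrayKey {
        private final byte[] bytes;

        ByteArrayKey(byte[] bytes) {
            this.bytes = bytes.clone(); // defensive copy so the key cannot change later
        }

        @Override public boolean equals(Object o) {
            return o instanceof ByteArrayKey
                    && Arrays.equals(bytes, ((ByteArrayKey) o).bytes);
        }

        @Override public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    // A second array with the same content is NOT found when raw arrays are keys.
    static boolean rawArrayFound() {
        Map<byte[], String> map = new HashMap<>();
        map.put(new byte[] {1, 2, 3}, "value");
        return map.containsKey(new byte[] {1, 2, 3});
    }

    // With the wrapper, equal content means equal keys.
    static boolean wrappedArrayFound() {
        Map<ByteArrayKey, String> map = new HashMap<>();
        map.put(new ByteArrayKey(new byte[] {1, 2, 3}), "value");
        return map.containsKey(new ByteArrayKey(new byte[] {1, 2, 3}));
    }

    public static void main(String[] args) {
        System.out.println(rawArrayFound());     // false
        System.out.println(wrappedArrayFound()); // true
    }
}
```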

Adding to Eran's clear answer: since byte[] (or any array) doesn't override hashCode and equals (it uses the default methods of the Object class), you can always wrap it in a String object, which takes a byte[] as a constructor argument. Not only do Strings make good keys in a Map, they are immutable too (the operations in a hash-based map are faster).
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#String(byte[])

NOTE: This is an extremely hack-y way of making an array or a string, a key in a HashMap, without overriding either the equals() or the hashCode() methods.
I'll include the answer in a generic way, so readers can get the idea and implement as per their requirements.
Say, I have two numbers, n and r. I want a key-value pair with [n,r] as the key, and (n+r) as the value.
Map<List<Integer>, Integer> map = new HashMap<List<Integer>, Integer>();
List<Integer> key = Arrays.asList(n, r);
if( map.containsKey(key) )
return map.get(key);
What if the map did not contain the key?
map.put(Collections.unmodifiableList(Arrays.asList(n, r)), (n+r));
The unmodifiable part (without going into any further depth), ensures that the key cannot change the hash code.
Now, map.containsKey(key) will be true.
Note: This is not a good way to do it. It is just a workaround.

You can use ByteBuffer, which is a wrapper for a byte[] array with content-based equals, hashCode and compareTo.
Referring to the answer at https://stackoverflow.com/a/14087243/4019660
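A short sketch of the ByteBuffer approach (the class name is made up). Keep in mind that a ByteBuffer's equals and hashCode depend on its remaining content, so the buffer and its backing array should not be modified while it is used as a key.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

class ByteBufferKeyDemo {
    // Two separately wrapped arrays with the same content are equal keys.
    static boolean wrappedContentFound() {
        Map<ByteBuffer, String> map = new HashMap<>();
        map.put(ByteBuffer.wrap(new byte[] {1, 2, 3}), "value");
        return map.containsKey(ByteBuffer.wrap(new byte[] {1, 2, 3}));
    }

    public static void main(String[] args) {
        System.out.println(wrappedContentFound()); // true
    }
}
```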

Array out of bound exception?

Consider the following code:
Map<Integer, Material> TestMap= new HashMap<Integer, Material>();
if (TestMap.get(index)!= null) {
index++;
}
What will happen if TestMap.get(index) is null? Some say it will exit the code, some say it will throw ArrayIndexOutOfBoundsException.

Your code won't throw an ArrayIndexOutOfBoundsException, for it's a HashMap. As per the documentation of the get() method of HashMap:
public V get(Object key)
Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.
So if the HashMap does not contain a value for index, it will return null, no problem there.
EDIT
If your HashMap was a List, and you added 10 elements, and you called get(25) on the List, then you'd get an IndexOutOfBoundsException.
So to summarize
if (TestMap.get(index)!= null) {
index++;
}
This piece of code doesn't throw any exceptions (except when TestMap is null). If there's no value for the given key, get() returns null, and since you perform a null check, the if is not entered when the key doesn't exist in the map.
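The contrast between the two get() methods can be checked directly. A minimal sketch with made-up names, using ArrayList as the List implementation; ArrayList.get reports a bad index with an IndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MissingKeyDemo {
    // Map.get on a missing key returns null instead of throwing.
    static boolean mapGetReturnsNull() {
        Map<Integer, String> map = new HashMap<>();
        return map.get(25) == null;
    }

    // List.get with an index past the end throws.
    static boolean listGetThrows() {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add("item " + i);
        }
        try {
            list.get(25);
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mapGetReturnsNull()); // true
        System.out.println(listGetThrows());     // true
    }
}
```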

How does HashSet not allow duplicates?

I was going through the add method of HashSet. It is mentioned that
If this set already contains the element, the call leaves the set unchanged and returns false.
But the add method is internally saving the values in HashMap
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
The put method of HashMap states that
Associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced.
So if the put method of HashMap replaces the old value, how does the HashSet add method leave the set unchanged in case of duplicate elements?

PRESENT is just a dummy value -- the set doesn't really care what it is. What the set does care about is the map's keys. So the logic goes like this:
Set.add(a):
map.put(a, PRESENT) // so far, this is just what you said
the key "a" is in the map, so...
keep the "a" key, but map its value to the PRESENT we just passed in
also, return the old value (which we'll call OLD)
look at the return value: it's OLD, != null. So return false.
Now, the fact that OLD == PRESENT doesn't matter -- and note that Map.put doesn't change the key, just the value mapped to that key. Since the map's keys are what the Set really cares about, the Set is unchanged.
In fact, there has been some change to the underlying structures of the Set -- it replaced a mapping of (a, OLD) with (a, PRESENT). But that's not observable from outside the Set's implementation. (And as it happens, that change isn't even a real change, since OLD == PRESENT).
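The logic above can be checked from the outside with two equal but distinct objects. A small sketch; the class name is made up.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetAddDemo {
    // Adds two equal but distinct String objects and reports what the set kept.
    static String addDuplicate() {
        Set<String> set = new HashSet<>();
        String first = new String("a");
        String second = new String("a"); // equal to first, but a different object
        boolean addedFirst = set.add(first);
        boolean addedSecond = set.add(second);
        // put() replaces only the value, so the original key object stays in the map.
        boolean keptFirst = set.iterator().next() == first;
        return addedFirst + " " + addedSecond + " " + set.size() + " " + keptFirst;
    }

    public static void main(String[] args) {
        System.out.println(addDuplicate()); // true false 1 true
    }
}
```

The second add returns false because put returned the old value (PRESENT) rather than null, and the element that remains in the set is the first object, not the second.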

The answer that you may be looking comes down to the fact that the backing hashmap maps the elements of the set to the value PRESENT which is defined in HashSet.java as follows:
private static final Object PRESENT = new Object();
In the source code for HashMap.put we have:
386 public V put(K key, V value) {
387 if (key == null)
388 return putForNullKey(value);
389 int hash = hash(key.hashCode());
390 int i = indexFor(hash, table.length);
391 for (Entry<K,V> e = table[i]; e != null; e = e.next) {
392 Object k;
393 if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
394 V oldValue = e.value;
395 e.value = value;
396 e.recordAccess(this);
397 return oldValue;
398 }
399 }
400
401 modCount++;
402 addEntry(hash, key, value, i);
403 return null;
404 }
Because the key in question already exists, we will take the early return on line 397. But you might think a change is being made to the map on line 395, in which it appears that we are changing the value of a map entry. However, the value of value is PRESENT, and because PRESENT is static and final there is only one such instance; so the assignment e.value = value actually doesn't change the map, and therefore the set, at all!
Update:
Once a HashSet is initialized:
- All the items in it are stored as keys in a HashMap
- All the values in that HashMap are ONLY ONE object, PRESENT, which is a static field in HashSet

As you can see the HashSet.add method adds the element to the HashMap.put as a key not as a value. Value is replaced in the HashMap not the key.

See HashMap#put:
Associates the specified value with the specified key in this map. If
the map previously contained a mapping for the key, the old value is
replaced.
It replaces the value mapped to the key and keeps the existing key; this way, no duplicates will be in the HashSet.

public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
e is the key, so if e is already present, put will not return null. Hence add will return false.
JavaDoc for put :
the previous value associated with key, or null if there was no mapping for key. (A null return can also indicate that the map previously associated null with key.)

From javadocs for HashMap.put(),
"Associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced."
Thus the map value will be replaced, (which is a constant static field in HashSet class, and thus the same instance is replaced), and the map key is kept untouched (which, in fact IS the Set collection item)
