in my project I use HashMap in order to store some data, and I've recently discovered that when I mutate the keys of the HashMap, some unexpected wrong results may happen. For Example:
HashMap<ArrayList,Integer> a = new HashMap<>();
ArrayList list1 = new ArrayList<>();
a.put(list1, 1);
System.out.println(a.containsKey(new ArrayList<>())); // true
list1.add(5);
ArrayList list2 = new ArrayList<>();
list2.add(5);
System.out.println(a.containsKey(list2)); // false
Note that both a.keySet().iterator().next().hashCode() == list2.hashCode() and a.keySet().iterator().next().equals(list2) are true.
I cannot understand why it happens, referring to the fact that the two objects are equal and have the same hash-code. Do anyone know what is the cause of that, and if there is any other similar structure that allows mutation of the keys? Thanks.
Mutable keys are always a problem. Keys are to be considered mutable if the mutation could change their hashcode and/or the result of equals(). That being said, lists often generate their hashcodes and check equality based on their elements so they almost never are good candidates for map keys.
What is the problem in your example? When the key is added it is an empty list and thus produces a different hashcode than when it contains an element. Hence even though the hashcode of the key and list2 are the same after changing the key list you'll not find the element. Why? Simply because the map looks in the wrong bucket.
Example (simplified):
Let's start with a few assumptions:
an empty list returns a hashcode of 0
if the list contains the element 5 it returns the hashcode 5
our map has 16 buckets (default)
the bucket index is determined by hashcode % 16 (the number of our buckets)
If you now add the empty list it gets inserted into bucket 0 due to its hashcode.
When you do the lookup with list1 it will look in bucket 5 due to the hashcode of 5. Since that bucket is empty nothing will be found.
The problem is that your key list changes its hashcode and thus should be put into a different bucket but the map doesn't know this should happen (and doing so would probably cause a bunch of other problems).
According to the javadocs for Map:
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map. A special case of this prohibition is that
it is not permissible for a map to contain itself as a key. While it
is permissible for a map to contain itself as a value, extreme caution
is advised: the equals and hashCode methods are no longer well defined
on such a map.
Your lists are the keys and you're changing them. It would not be a problem if the contents of the list were not what determine the values for hash code and what is equal, however that is not your case. If you think about it, it doesn't make much sense to change the key of a map. The key is what identifies the value, and if that key changes, all bets are off.
The map inserts the value given the hash code upon insertion. When you search for it later, it uses the hash code of the parameter to determine if it is a hit. I think you'd find that had you inserted list1 with the value already inserted that you would see "true" printed out since list2.hashCode() would produce the same hash code as list1 when it was inserted.
That's because a HashMap uses the hashCode() Method of Object in combination with equals(Object obj) to check if this map contains an object.
See:
ArrayList<Integer> a = new ArrayList<>();
a.add(1);
System.out.println(a.hashCode());
a.add(2);
System.out.println(a.hashCode());
This example shows, that the hashCode of your ArrayList has changed.
You should never use a mutable object as a key in your hashmap.
So what basically going on when u put the list1 as key in line 3 is that the map calculates its hashCode which it would later compare in containsKey(someKey) .
but when u mutated the list1 in line 5 its hashCode is essentially changed.
so if u now do
System.out.println(a.containsKey(list1));
after line 5 it would say false
and if u do System.out.println(a.get(list1));
it would say null as its comparing two different hashCodes
Probably you didn't override equals() and hashCode() methods.
Related
I'm a bit confused about the internal implementation of HashSet and HashMap in java.
This is my understanding, so please correct me if I'm wrong:
Neither HashSet or HashMap allow duplicate elements.
HashSet is backed by a HashMap, so in a HashSet when we call .add(element), we are calling the hashCode() method on the element and internally doing a put(k,v) to the internal HashMap, where the key is the hashCode and the value is the actual object. So if we try to add the same object to the Set, it will see that the hashCode is already there, and then replace the old value by the new one.
But then, this seems inconsistent to me when I read how a HashMap works when storing our own objects as keys in a HashMap.
In this case we must override the hashCode() and equals() methods and make them consistent between each other, because, if we find keys with the same hashCode, they will go to the same bucket, and then to distinguish between all the entries with the same hashCode we have to iterate over the list of entries to call the method equals() on each key and find a match.
So in this case, we allow to have the same hashCode and we create a bucket containing a list for all the objects with the same hashCode, however using a HashSet, if we find already a hashCode, we replace the old value by the new value.
I'm a bit confused, could someone clarify this to me please?
You are correct regarding the behavior of HashMap, but you are wrong about the implementation of HashSet.
HashSet is backed by a HashMap internally, but the element you are adding to the HashSet is used as the key in the backing HashMap. For the value, a dummy value is used. Therefore the HashSet's contains(element) simply calls the backing HashMap's containsKey(element).
The value we insert in HashMap acts as a Key to the map object and for its value, java uses a constant variable.So in the key-value pair, all the keys will have the same value.
you can refer to this link
https://www.geeksforgeeks.org/hashset-in-java/
Hash Map:-Basically Hash map working as key and value ,if we want to store data as key and value pair then we will go to the hash map, basically when we insert data by using hash map basically internally it will follow 3 think,
1.hashcode
2..equale
3.==
when we insert the data in hash map it will store the data in bucket(fast in) by using hash code , if there is 2 data store in the same bocket then key collision will happen to resolve this key collision we use (==) method, always == method check the reference of the object, if both object hashcode is same then first one replace to second one if the hashcode is not same then hashing Collision will happen to resolve this hashing collision we will use (.equal) method .equal method basically it will check the content , if both the content is same then it will return true other wise it will return false, so in the hash map it will check is the content is same ? if the content is same then first one replace to the second one if both content is different the it will create another one object in the bocket and store the data
Hash Set:- Basically Hash Set is use to store bunch of object at a time ,internally hash set also use hash map only , when we insert somethink by using add method internally it will call put method and it will store data in the hashmap key bcz hash map key always unique and duplicate are not allowed that's way hashset also unique and duplicate are not allowed and if we entered duplicate also in hashst it will not through any exception first one will replace to the second one and in the value it will store constant data "PRESENT".
You can observe that internal hashmap object contains the element of hashset as keys and constant “PRESENT” as their value.
Where present is constant which is defined as
private static final Object present = new Object()
I am looking for a hash function for an integer array containing about 17 integers each. There are about 1000 items in the HashMap and I want the computation to be as fast as possible.
I am now kind of confused by so many hash functions to choose and I notice that most of them are designed for strings with different characters. So is there a hash function designed for strings with only numbers and quick to run?
Thanks for your patience!
You did not specify any requirements (except speed of calculation), but take a look at java.util.Arrays#hashCode. It should be fast, too, just iterating once over the array and combining the elements in an int calculation.
Returns a hash code based on the contents of the specified array. For any two non-null int arrays a and b such that Arrays.equals(a, b), it is also the case that Arrays.hashCode(a) == Arrays.hashCode(b).
The value returned by this method is the same value that would be obtained by invoking the hashCode method on a List containing a sequence of Integer instances representing the elements of a in the same order. If a is null, this method returns 0.
And the hashmap accepts an array of integer as the key.
Actually, no!
You could technically use int[] as a key in a HashMap in Java (you can use any kind of Object), but that won't work well, as arrays don't define a useful hashCode method (or a useful equals method). So the key will use object identity. Two arrays with identical content will be considered to be different from each-other.
You could use List<Integer>, which does implement hashCode and equals. But keep in mind that you must not mutate the list after setting it as a key. That would break the hashtable.
hashmap functions can be found in
https://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html
Creating a hashmap is easy.. it goes as
HashMap<Object, Integer> map = new HashMap<Object, Integer>();
I have been reading/researching the reason why HashMapis faster than HashSet.
I am not quite understanding the following statements:
HashMap is faster than HashSet because the values are associated to a unique key.
In HashSet, member object is used for calculating hashcode value which can be same for two objects so equals() method is used to check for equality. If it returns false, that means the two objects are different. In HashMap, the hashcode value is calculated using the key object.
The HashMap hashcode value is calculated using the key object. Here, the member object is used to calculate the hashcode, which can be the same for two objects, so equals() method is used to check for equality. If it returns false, that means the two objects are different.
To conclude my question:
I thought HashMap and HashSet calculate the hashcode in the same way. Why are they different?
Can you provide a concrete example how HashSet and HashMap calculating the hashcode differently?
I know what a "key object" is, but what does it mean by "member object"?
HashMap can do the same things as HashSet, and faster. Why do we need HashSet? Example:
HashMap <Object1, Boolean>= new HashMap<Object1, boolean>();
map.put("obj1",true); => exist
map.get("obj1"); =>if null = not exist, else exist
Performance:
If you look at the source code of HashSet (at least JDK 6, 7 and 8), it uses HashMap internally, so it basically does exactly what you are doing with sample code.
So, if you need a Set implementation, you use HashSet, if you need a Map - HashMap. Code using HashMap instead of HashSet will have exactly the same performance as using HashSet directly.
Choosing the right collection
Map - maps keys to values (associative array) - http://en.wikipedia.org/wiki/Associative_array.
Set - a collection that contains no duplicate elements - http://en.wikipedia.org/wiki/Set_(computer_science).
If the only thing you need your collection for is to check if an element is present in there - use Set. Your code will be cleaner and more understandable to others.
If you need to store some data for your elements - use Map.
None of these answers really explain why HashMap is faster than HashSet. They both have to calculate the hashcode, but think about the nature of the key of a HashMap - it is typically a simple String or even a number. Calculating the hashcode of that is much faster than the default hashcode calculation of an entire object. If the key of the HashMap was the same object as that stored in a HashSet, there would be no real difference in performance. The difference comes in the what sort of object is the HashMap's key.
Under what circumstances given a correct implementation of hashCode and equals() can the following code return false?
myLinkedHashMap.containsKey(myLinkedHashMap.keySet().iterator().next())
Most likely scenario I can think of would be even though hashCode is "deterministic", it may be based on mutable fields. If you change the fields used to compute hashCode after it's put in the Map, then you won't be able to find it anymore.
Edit: should clarify you 'usually' won't be able to find it anymore. Occasionally it will still work since two numbers can still rehash into the same bucket. This, of course, only adds to the confusion when it happens!
Every hash algorithm I have seen is "deterministic", in that for a given set of input values, you get the same hash value.
If the hash code is computed based on mutable properties of the object, the hash code will change after it's in the hash map if any of those mutable properties are changed.
It's not clear what you mean by "deterministic", but any hash-changing mutation to the key after it's been inserted into the hash map could easily have that effect.
import java.util.*;
public class Test {
public static void main(String[] args) {
List<String> strings = new ArrayList<String>();
Map<List<String>, String> map = new LinkedHashMap<List<String>, String>();
map.put(strings, "");
System.out.println(map.containsKey(map.keySet().iterator().next())); // true
strings.add("Foo");
System.out.println(map.containsKey(map.keySet().iterator().next())); // false
}
}
The hash code of ArrayList<T> is deterministic, but that doesn't mean it won't change if the contents of the list changes.
If the hashCode() is based on instance attributes that are mutable and those attributes are changed after the insertion, the hashCode() call during the iteration will return something different. And the equals() should be based on these same attributes, it will be expected to fail as well.
When another thread has removed all the next items from an Map in the middle of an iteration, there will be no more next().
I would not use the hashCode() values as keys, I would you the objects themselves.
If your hashCode and equals don't agree with one another, this could return false. For example, if the equals method always returns false, this will return false, since there isn't any object that would ever compare equal to the keys in the map.
Hope this helps!
You might want to check hasNext() first.
You could remove the first key in another thread between getting the first keys and calling containsKey.
I am new to Java, I was working with Map class and its derivatives.
I was just wondering about how elements are found inside them. Is only a pointer/reference check performed?
Let's say I have a TreeMap<MyObject, Integer>. If I have an object x i would like you to search an integer v such that its key is "equal" to x even if they are 2 separate instances of the class MyObject, hence 2 different pointers.
Is there any method (of an interface/superclass too) which can it do such operation?
Thanks in advance.
All the methods that involve comparisons in Map and its implementations make use of the 'equals' method for the objects. If you attempt to add a key+value to a Map which already contains aentry with a key that would compare equals to it, then the new key+value replaces the old one.
See the documentation:
For example, the specification for the containsKey(Object key) method says: "returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k))."
The implementation may not execute any equals comparison if it can determine that the keys are 'unequal' through some other means, such as comparing hashcodes.
In your example, you would do
TreeMap<MyObject, Integer> tree = ...
Integer i = tree.get(x);
The get(x) will iterate over your keys() and returning the integer value for the key matching aKey.equals(x).
In most cases, Maps are backed by a hash table, and are very similar to a HashMap. TreeMap gives a bit more information with each node by having pointers up and down the tree, but lookups are still done via hashes (I believe)