Adding arrays with same values to HashSet results in duplicate items - java

I'm trying to create a set of arrays of ints, the thing is that if I try to do:
HashSet<int[]> s = new HashSet<int[]>();
int a1[] = {1,2,3};
int a2[] = {1,2,3};
s.add(a1);
s.add(a2);
System.out.println(s.size());
Then s has two objects, but there should be only one.
Note: it makes no difference if I use HashSet<Integer[]> instead; the set still ends up with two elements.
Now, if I try the same thing with an ArrayList<Integer>, something like:
HashSet<ArrayList<Integer>> s = new HashSet<ArrayList<Integer>>();
ArrayList<Integer> a1 = new ArrayList<Integer>();
ArrayList<Integer> a2 = new ArrayList<Integer>();
a1.add(1);
a1.add(2);
a1.add(3);
a2.add(1);
a2.add(2);
a2.add(3);
s.add(a1);
s.add(a2);
System.out.println(s.size());
Then s has one object.
I thought of a way to avoid the problem in the first snippet: store the hash code of each array in a HashSet, as follows:
int a1[] = {0,10083,10084,1,0,1,10083,0,0,0,0};
int a2[] = {1 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,1 ,0,2112};
HashSet<Integer> s= new HashSet<Integer>();//hashcodes of each array
s.add(Arrays.hashCode(a1));
s.add(Arrays.hashCode(a2));
System.out.println(Arrays.hashCode(a1));
System.out.println(Arrays.hashCode(a2));
System.out.println(s.size());
It works for the first case (1,2,3), but it breaks when two different arrays have the same hash code, so I would have to handle collisions myself. At that point I am effectively implementing a HashSet by hand.
With HashSet<ArrayList<Integer>> it works perfectly. I suppose Java handles collisions in that case.
My question is: why doesn't Java support a HashSet<int[]> or HashSet<Integer[]> in the same way, given that Arrays.hashCode(...) produces the same hash codes as ArrayList<Integer> and is just as easy to call?
And finally, if I want a HashSet<int[]> (or HashSet<Integer[]>), do I have to implement it myself, or is there a better way?
Thanks.
UPDATE: OK, I think I have finally come to a complete answer. As @ZiyaoWei and @user1676075 commented, it doesn't work because equals returns false and the hash codes are different. But why doesn't Java override these methods (with Arrays.equals() and Arrays.hashCode()) so that something like HashSet<int[]> just works? The answer is that an array is a mutable object, and a hash code should not depend on mutable values (each element of an array is a mutable value) if it is to honor the general contract of hashCode. See: Mutable objects and hashCode
Here are nice explanations of using mutable fields in hashCode (http://blog.mgm-tp.com/2012/03/hashset-java-puzzler/) and of mutable keys in hash maps (Are mutable hashmap keys a dangerous practice?).
My answer is: if you want the behavior of a HashSet<int[]>, create a class that holds an array and, if you want hashCode and equals to depend on the values, override equals() and hashCode() using Arrays.equals() and Arrays.hashCode(). If you don't want to violate the contract, make the array field final and never modify its elements once the object is in the set; final alone only fixes the reference, not the contents.
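A minimal sketch of such a wrapper follows. The names IntArrayKey and IntArrayKeyDemo are hypothetical, and the defensive copy in the constructor is an assumption added here so that later changes to the caller's array cannot alter the hash code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class IntArrayKeyDemo {
    // Hypothetical wrapper: value-based equals/hashCode over a private copy of the array.
    static final class IntArrayKey {
        private final int[] values;

        IntArrayKey(int[] values) {
            // Defensive copy, so the caller cannot change the contents afterwards.
            this.values = Arrays.copyOf(values, values.length);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof IntArrayKey
                    && Arrays.equals(values, ((IntArrayKey) other).values);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(values);
        }
    }

    // Adds both arrays to a set through the wrapper and reports the resulting size.
    static int distinctCount(int[] first, int[] second) {
        Set<IntArrayKey> set = new HashSet<>();
        set.add(new IntArrayKey(first));
        set.add(new IntArrayKey(second));
        return set.size();
    }

    public static void main(String[] args) {
        // Same contents: stored once.
        System.out.println(distinctCount(new int[]{1, 2, 3}, new int[]{1, 2, 3}));
        // Different contents: stored twice, whatever their hash codes are.
        System.out.println(distinctCount(
                new int[]{0, 10083, 10084, 1, 0, 1, 10083, 0, 0, 0, 0},
                new int[]{1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2112}));
    }
}
```

Because the HashSet falls back to equals() whenever two hash codes match, arrays with different contents stay distinct without any manual collision handling.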
Thanks everyone!

It has nothing to do with collision at the end of the day:
a1.equals(a2) == false
Since they are not equal, a Set will treat them as different.
Note that arrays in Java do not override the equals method from Object.
And since add in Set is defined as
More formally, adds the specified element e to this set if the set contains no element e2 such that (e==null ? e2==null : e.equals(e2))
it seems to be impossible to properly implement a Set that meets your requirement (comparing elements with Arrays.equals) without violating some contracts.

The reason the HashSet<ArrayList<Integer>> works is that the HashSet uses .equals() to decide whether you're inserting the same object twice. In the case of List, two lists of the same base type (e.g. ArrayList) with the same content, in the same order, compare as equal. Thus you're telling the HashSet to insert an equal object twice, and it keeps only a single instance.
That is not the case when you try the same operation with an array. See this post: equals vs Arrays.equals in Java for more details about array comparisons in Java. When you insert two arrays, the default .equals() tests whether they are the same object, which they are not. Thus the second array is treated as a new element.
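The difference between the three comparisons can be seen in a small sketch. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayEqualityDemo {
    // Object.equals as inherited by arrays: true only for the very same array object.
    static boolean identityEquals(int[] x, int[] y) {
        return x.equals(y);
    }

    // Element-by-element comparison from java.util.Arrays.
    static boolean contentEquals(int[] x, int[] y) {
        return Arrays.equals(x, y);
    }

    // List.equals compares contents in order, which is what HashSet relies on.
    static boolean listEquals(int[] x, int[] y) {
        return toList(x).equals(toList(y));
    }

    private static List<Integer> toList(int[] values) {
        List<Integer> list = new ArrayList<>();
        for (int v : values) {
            list.add(v);
        }
        return list;
    }

    public static void main(String[] args) {
        int[] a1 = {1, 2, 3};
        int[] a2 = {1, 2, 3};
        System.out.println(identityEquals(a1, a2)); // false
        System.out.println(contentEquals(a1, a2));  // true
        System.out.println(listEquals(a1, a2));     // true
    }
}
```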

Related

How the ArrayList objects are stored inside a HashSet in Java?

Today I was working on a question that used code similar to the code below.
I was amazed by the result. I thought every HashSet stores the hash of an object, so the answer would be 2. However, the answer is 1.
Could anyone explain what actually happens internally when I store ArrayList objects in a HashSet, and why the answer is 1 instead of 2?
import java.io.*;
import java.util.*;
class Code {
public static void main (String[] args) {
HashSet<ArrayList<Integer>> set=new HashSet<>();
ArrayList<Integer> list1=new ArrayList<>();
ArrayList<Integer> list2=new ArrayList<>();
list1.add(1);
list1.add(2);
list2.add(1);
list2.add(2);
set.add(list1);
set.add(list2);
System.out.println(set.size()); // 1
}
}
Two instances of List are considered "equal" if they have the same elements in the same order. So that means list1 and list2 are "equal". By the general contract of the hashCode method they must also have the same hash code.
HashSet does not store duplicate items: if you give it two items that are equal it stores only the first one. So here it's storing list1 only.
The answer is 1 because both Lists contain the same elements. The hash code of an ArrayList is a function of the hash codes of all elements in the list. In your case, both lists contain the same elements which means they correspond to the same hash code.
HashSet implements the Set interface, backed by a hash table. Any implementation of Set simply discards duplicate elements. Since list1 and list2 are equal, set will discard list2 when you try to insert it while set already holds list1. Thus, the size of set remains 1.
Here both lists are equal, so by the contract their hash codes are also the same. A HashSet locates elements by hash code and doesn't contain duplicates, so list2 is not added (list1 stays in the set) and hence the size is 1.
It follows its default behaviour: it first checks whether there is an existing entry (using hashCode() and equals()); if one is found, the set is left unchanged, and if not, the new element is inserted.
Note that the hashCode() and equals() calls are made on the element itself, in this case the ArrayList object (ArrayList in turn inherits these methods from AbstractList).
PS : It appears HashSet is implemented internally as a HashMap !
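Whether the set keeps the first list or swaps in the second can be checked directly. The sketch below uses made-up names and relies only on the return value of add(), which the HashSet documentation specifies as false when the element is already present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashSetDuplicateDemo {
    // Returns what add() reports for the second, equal list.
    static boolean secondAddSucceeds() {
        Set<List<Integer>> set = new HashSet<>();
        set.add(new ArrayList<>(Arrays.asList(1, 2)));
        return set.add(new ArrayList<>(Arrays.asList(1, 2)));
    }

    // Returns true if the instance held by the set is still the first list.
    static boolean keepsFirstInstance() {
        List<Integer> list1 = new ArrayList<>(Arrays.asList(1, 2));
        List<Integer> list2 = new ArrayList<>(Arrays.asList(1, 2));
        Set<List<Integer>> set = new HashSet<>();
        set.add(list1);
        set.add(list2);
        // Reference comparison on purpose: which object is actually stored?
        return set.size() == 1 && set.iterator().next() == list1;
    }

    public static void main(String[] args) {
        System.out.println(secondAddSucceeds());  // false
        System.out.println(keepsFirstInstance()); // true
    }
}
```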

Mutation of the keys in HashMap causes wrong results

In my project I use a HashMap to store some data, and I've recently discovered that mutating the keys of the HashMap can produce unexpected, wrong results. For example:
HashMap<ArrayList,Integer> a = new HashMap<>();
ArrayList list1 = new ArrayList<>();
a.put(list1, 1);
System.out.println(a.containsKey(new ArrayList<>())); // true
list1.add(5);
ArrayList list2 = new ArrayList<>();
list2.add(5);
System.out.println(a.containsKey(list2)); // false
Note that both a.keySet().iterator().next().hashCode() == list2.hashCode() and a.keySet().iterator().next().equals(list2) are true.
I cannot understand why this happens, given that the two objects are equal and have the same hash code. Does anyone know the cause, and is there a similar structure that allows keys to be mutated? Thanks.
Mutable keys are always a problem. Keys are to be considered mutable if the mutation could change their hashcode and/or the result of equals(). That being said, lists often generate their hashcodes and check equality based on their elements so they almost never are good candidates for map keys.
What is the problem in your example? When the key is added it is an empty list and thus produces a different hashcode than when it contains an element. Hence even though the hashcode of the key and list2 are the same after changing the key list you'll not find the element. Why? Simply because the map looks in the wrong bucket.
Example (simplified):
Let's start with a few assumptions:
an empty list returns a hashcode of 0
if the list contains the element 5 it returns the hashcode 5
our map has 16 buckets (default)
the bucket index is determined by hashcode % 16 (the number of our buckets)
If you now add the empty list it gets inserted into bucket 0 due to its hashcode.
When you do the lookup with list1 it will look in bucket 5 due to the hashcode of 5. Since that bucket is empty nothing will be found.
The problem is that your key list changes its hashcode and thus should be put into a different bucket but the map doesn't know this should happen (and doing so would probably cause a bunch of other problems).
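The effect can be reproduced with a short sketch. The names are hypothetical, and the real hash codes differ from the simplified 0 and 5 assumed above (an empty ArrayList hashes to 1, and a list containing only 5 hashes to 36), but the mechanism is the same. The second method shows one possible workaround, assuming you control every mutation: remove the key, change it, then put it back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutatedKeyDemo {
    // Mutates the key while it is inside the map, then looks up an equal list.
    static boolean foundAfterMutation() {
        Map<List<Integer>, Integer> map = new HashMap<>();
        List<Integer> key = new ArrayList<>();
        map.put(key, 1); // filed under the hash code of the empty list
        key.add(5);      // the hash code changes, but the entry is not moved

        List<Integer> probe = new ArrayList<>();
        probe.add(5);
        return map.containsKey(probe);
    }

    // Removes the entry before mutating the key and re-inserts it afterwards.
    static boolean foundWhenReinserted() {
        Map<List<Integer>, Integer> map = new HashMap<>();
        List<Integer> key = new ArrayList<>();
        map.put(key, 1);

        Integer value = map.remove(key); // remove while the hash still matches
        key.add(5);
        map.put(key, value);             // filed under the new hash code

        List<Integer> probe = new ArrayList<>();
        probe.add(5);
        return map.containsKey(probe);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation());  // false with the standard HashMap
        System.out.println(foundWhenReinserted()); // true
    }
}
```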
According to the javadocs for Map:
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map. A special case of this prohibition is that
it is not permissible for a map to contain itself as a key. While it
is permissible for a map to contain itself as a value, extreme caution
is advised: the equals and hashCode methods are no longer well defined
on such a map.
Your lists are the keys and you're changing them. It would not be a problem if the contents of the list were not what determine the values for hash code and what is equal, however that is not your case. If you think about it, it doesn't make much sense to change the key of a map. The key is what identifies the value, and if that key changes, all bets are off.
The map places the entry according to the key's hash code at insertion time. When you search for it later, it uses the hash code of the argument to determine whether there is a hit. I think you'd find that if you had put list1 into the map after adding the element to it, you would see "true" printed, since list2.hashCode() would then equal the hash code list1 had when it was inserted.
That's because a HashMap uses the hashCode() Method of Object in combination with equals(Object obj) to check if this map contains an object.
See:
ArrayList<Integer> a = new ArrayList<>();
a.add(1);
System.out.println(a.hashCode());
a.add(2);
System.out.println(a.hashCode());
This example shows that the hashCode of your ArrayList has changed.
You should never use a mutable object as a key in your HashMap.
What basically happens when you put list1 in as the key on line 3 is that the map computes its hashCode, which it later compares against in containsKey(someKey).
But when you mutate list1 on line 5, its hashCode changes.
So if you now do
System.out.println(a.containsKey(list1));
after line 5, it will print false,
and if you do System.out.println(a.get(list1));
it will print null, because two different hash codes are being compared.
Probably you didn't override equals() and hashCode() methods.

finding a hash function for long integer array

I am looking for a hash function for integer arrays containing about 17 integers each. There are about 1000 items in the HashMap and I want the computation to be as fast as possible.
I am now kind of confused by so many hash functions to choose from, and I notice that most of them are designed for strings with different characters. So is there a hash function designed for strings with only numbers that is quick to run?
Thanks for your patience!
You did not specify any requirements (except speed of calculation), but take a look at java.util.Arrays#hashCode. It should be fast, too, just iterating once over the array and combining the elements in an int calculation.
Returns a hash code based on the contents of the specified array. For any two non-null int arrays a and b such that Arrays.equals(a, b), it is also the case that Arrays.hashCode(a) == Arrays.hashCode(b).
The value returned by this method is the same value that would be obtained by invoking the hashCode method on a List containing a sequence of Integer instances representing the elements of a in the same order. If a is null, this method returns 0.
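The documented equivalence with List.hashCode can be checked with a small sketch (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArraysHashCodeDemo {
    // Compares Arrays.hashCode(int[]) with the hash code of the equivalent List<Integer>.
    static boolean sameAsListHash(int[] values) {
        List<Integer> boxed = new ArrayList<>();
        for (int v : values) {
            boxed.add(v);
        }
        return Arrays.hashCode(values) == boxed.hashCode();
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        // ((31 * 1 + 1) * 31 + 2) * 31 + 3 = 30817
        System.out.println(Arrays.hashCode(a));           // 30817
        System.out.println(sameAsListHash(a));            // true
        System.out.println(Arrays.hashCode((int[]) null)); // 0
    }
}
```

The computation is a single pass with one multiply and one add per element, so for arrays of about 17 ints the cost is small.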
And the hashmap accepts an array of integer as the key.
Actually, no!
You could technically use int[] as a key in a HashMap in Java (you can use any kind of Object), but that won't work well, as arrays don't define a useful hashCode method (or a useful equals method). So the key will use object identity. Two arrays with identical content will be considered to be different from each-other.
You could use List<Integer>, which does implement hashCode and equals. But keep in mind that you must not mutate the list after setting it as a key. That would break the hashtable.
hashmap functions can be found in
https://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html
Creating a hashmap is easy.. it goes as
HashMap<Object, Integer> map = new HashMap<Object, Integer>();

Use map to check whether contains the array. But does not work

I have a question about Java maps. I use a map with arrays as keys, and I want to check whether the map contains the array I want, but it does not work. Is there any way to check whether the map contains the array I want?
import java.util.*;
public class testContainKey{
static Map<int[],Integer> map = new HashMap<int[], Integer>();
public static void main(String args[]){
int[] initial={1,2,3,4,5,6,7,8,9};
int[] goal = {1,2,3,4,5,6,7,8,9};
map.put(goal,0);
if(map.containsKey(initial)){
System.out.println("OK");
}
else{
System.out.println("Not works");
}
}
}
This is not going to work: Map is based on hash code and equality checks; arrays do not pay attention to their elements when calculating their hash code. That's why the two arrays that you tried to use as keys are considered different.
You can define a class ArrayKey, put an array into it in a constructor, and define equals and hashCode that use array elements.
You are using an array as the key of the map, which is hard to control because, AFAIK, you cannot modify its equals() and hashCode().
When you call Map.containsKey(), it is using the array's .equals(), which compares the 2 objects. Since initial and goal are 2 different arrays, initial.equals(goal) will be false, always, even though the contents of the array are the same.
Something you can do is extend a Map implementation and override containsKey() to check for int[] keys, comparing each of the elements to determine equality.
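A related alternative, not proposed in the answers here, is to skip subclassing and use a TreeMap with a comparator that compares array contents. This is only a sketch with made-up names; note that such a map is ordered by the comparator rather than by equals(), so it does not strictly follow the general Map contract, and lookups cost O(log n) comparisons instead of a hash lookup.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class ArrayKeyTreeMapDemo {
    // Lexicographic comparison of two int arrays; on a common prefix the shorter one sorts first.
    static final Comparator<int[]> BY_CONTENT = (x, y) -> {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(x[i], y[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    };

    // Stores one array as a key and looks up another array by content.
    static boolean containsByContent(int[] stored, int[] probe) {
        Map<int[], Integer> map = new TreeMap<>(BY_CONTENT);
        map.put(stored, 0);
        return map.containsKey(probe);
    }

    public static void main(String[] args) {
        int[] initial = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] goal = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(containsByContent(goal, initial) ? "OK" : "Not works");
    }
}
```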

Java HashMap with Int Array

I am using this code to check that array is present in the HashMap:
public class Test {
public static void main(String[] arg) {
HashMap<int[], String> map = new HashMap<int[], String>();
map.put(new int[]{1, 2}, "sun");
System.out.println(map.containsKey((new int[]{1, 2})));
}
}
But this prints False. How can I check that array is present in the HashMap?
The problem is because the two int[] aren't equal.
System.out.println(
(new int[] { 1, 2 }).equals(new int[] { 1, 2 })
); // prints "false"
Map and the other Java Collections Framework classes define their interfaces in terms of equals. From the Map API:
Many methods in Collections Framework interfaces are defined in terms of the equals method. For example, the specification for the containsKey(Object key) method says: "returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k))."
Note that they don't have to be the same object; they just have to be equal. Arrays in Java extend Object, whose default implementation of equals returns true only on object identity; that is why the above snippet prints false.
You can solve your problem in one of many ways:
Define your own wrapper class for arrays whose equals uses java.util.Arrays equals/deepEquals method.
And don't forget that when you @Override equals(Object), you must also @Override hashCode
Use something like List<Integer> that does define equals in terms of the values they contain
Or, if you can work with reference equality for equals, you can just stick with what you have. Just as you shouldn't expect the above snippet to ever print true, you shouldn't expect to find your arrays by their values alone; you must hang on to and use the original references every time.
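As a sketch of the List<Integer> option, the int[] can be boxed into a list before it is used as a key. The helper name key and the choice to wrap the list as unmodifiable are assumptions made here, the latter so the key cannot be changed after insertion.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ListKeyDemo {
    // Hypothetical helper: boxes the ints into an unmodifiable List<Integer>.
    static List<Integer> key(int... values) {
        return Collections.unmodifiableList(
                Arrays.stream(values).boxed().collect(Collectors.toList()));
    }

    // Stores a value under one list and fetches it with a separately built, equal list.
    static String lookup() {
        Map<List<Integer>, String> map = new HashMap<>();
        map.put(key(1, 2), "sun");
        return map.get(key(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookup());                                      // sun
        System.out.println(lookupContains(new int[]{1, 2}, new int[]{2, 1})); // false
    }

    // Reports whether a map keyed by 'stored' contains a key equal to 'probe'.
    static boolean lookupContains(int[] stored, int[] probe) {
        Map<List<Integer>, String> map = new HashMap<>();
        map.put(key(stored), "sun");
        return map.containsKey(key(probe));
    }
}
```

Boxing every element has a cost, so for large or performance-sensitive keys a wrapper class around the primitive array may be the better fit.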
See also:
Overriding equals and hashCode in Java
How to ensure hashCode() is consistent with equals()?
Understanding the workings of equals and hashCode in a HashMap
API
Object.equals and Object.hashCode
It's essential for a Java programmer to be aware of these contracts and how to make them work with/for the rest of the system
You are comparing two different references.
Something like this will work:
public class Test {
public static void main(String[] arg)
{
HashMap<int[],String> map= new HashMap<int[],String>();
int[] a = new int[]{1,2};
map.put(a, "sun");
System.out.println(map.containsKey(a));
}
}
Since a is the same reference, you will receive true as expected. If your application has no option of passing references to do the comparison, I would make a new object type which contains the int[] and override the equals() method (don't forget to override hashCode() at the same time), so that will reflect in the containsKey() call.
I would use a different approach. As mentioned before, the problem is with arrays equality, which is based on reference equality and makes your map useless for your needs.
Another potential problem, assuming that you use ArrayList instead, is consistency: if you change a list after it has been added to the map, you will corrupt the hash map, since the position of the list will no longer reflect its hash code.
In order to solve these two problems, I would use some kind of immutable list. You may want to make an immutable wrapper on int array for example, and implement equals() and hashCode() yourself.
I think the problem is your array is doing an '==' comparison, i.e. it's checking the reference. When you do containsKey(new int[] { ... }), it's creating a new object and thus the reference is not the same.
If you change the array type to something like ArrayList<Integer> that should work, however I would tend to avoid using Lists as map keys as this is not going to be very efficient.
The hashCode() implementation for arrays is derived from Object.hashCode(), so it depends on the memory location of the array. Since the two arrays are instantiated separately, they have different memory locations and thus different hashcodes. If you made one array it would work:
int[] arr = {1, 2};
map.put(arr, "sun");
System.out.println(map.containsKey(arr));
You've got two different objects that happen to contain the same values, because you've called new twice.
One approach you might use is to create a "holder" class of your own, and define that class's equals and hash methods.
Are you sure you don't want to map Strings to arrays instead of the other way around?
Anyway, to answer your question, the problem is that you are creating a new array when you call containsKey(). This returns false because you have two separately newed arrays that happen to have the same elements and dimension. See Yuval's answer to see the correct way of checking if an array is contained as a key.
An alternative, more advanced, approach is to create your own class that wraps an array and overrides equals() and hashCode() so that two arrays with the same dimension and elements are equal and have equal hash codes.
The two instances of int[] are different and not equal.
A nice approach would be to convert the int array to String using Arrays.toString(arr):
HashMap<String, String> h = new HashMap<>();
int[] a = new int[]{1, 2};
h.put(Arrays.toString(a), "sun");
h.get(Arrays.toString(new int[]{1, 2})); // returns sun
