Java - Do two HashSets change similarly overtime? - java

I already looked into different questions but these usually ask about consistency or ordering, while I am interested into ordering of two HashSets containing the same elements at the same time.
I want to create a HashSet of HashSets containing integers. Over time I will put HashSets of size 3 in this bigger HashSet and I will want to see if a newly created HashSet is already contained within the bigger HashSet.
Now my question is will it always find duplicates or can the ordering of two HashSets with the same elements be different?
I am conflicted as they use the same hashcode() function but does that mean they will always be the same?
HashSet<HashSet<Integer>> test = new HashSet<>();
HashSet<Integer> one = new HashSet<>();
one.add(1);
one.add(2);
one.add(5);
test.add(one);
HashSet<Integer> two = new HashSet<>();
two.add(5);
two.add(1);
two.add(2);
//Some other stuff that runs over time
System.out.println(test.contains(two));
Above code tries to illustrate what I mean, does this always return true? (Keep in mind I might initialise another HashSet with the same elements and try the contains again)

Yes, the above always returns true. Sets have no order, and when you test whether two Sets are equal to each other, you are checking that they have the same elements. Order has no meaning.
To elaborate, test.contains(two) will return true, if an only if test contains an element having the same hashCode() as two which is equal to two (according to the equals method).
Two sets s1 and s2 that have the same elements have the same hashCode() and s1.equals(s2) returns true.
This is required by the contract of equals and hashCode of the Set interface:
equals
Compares the specified object with this set for equality. Returns true if the specified object is also a set, the two sets have the same size, and every member of the specified set is contained in this set (or equivalently, every member of this set is contained in the specified set). This definition ensures that the equals method works properly across different implementations of the set interface.
hashCode
Returns the hash code value for this set. The hash code of a set is defined to be the sum of the hash codes of the elements in the set, where the hash code of a null element is defined to be zero. This ensures that s1.equals(s2) implies that s1.hashCode()==s2.hashCode() for any two sets s1 and s2, as required by the general contract of Object.hashCode.
As you can see, one and two don't even have to use the same implementation of the Set interface in order for test.contains(two) to return true. They just have to contain the same elements.

The key property of sets is about uniqueness of keys.
By "default", insertion order doesn't matter at all.
A linked LinkedHashSet guarantees to you that when iterating, you get the elements always in the same order (the one used for inserting them). But even then, when comparing such sets, it is still only about their content, not that insertion order part.
In other words: no matter what (default) implementation of the Set interface you are using, you should always see consistent behavior. Of course you free to implement your own Set and to violate that contract, but well, violating contracts leads to violated contracts, aka bugs.

You can look for yourself, this is open source code:
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof Set))
return false;
Collection<?> c = (Collection<?>) o;
if (c.size() != size())
return false;
try {
return containsAll(c);
} catch (ClassCastException unused) {
return false;
} catch (NullPointerException unused) {
return false;
}
}
public int hashCode() {
int h = 0;
Iterator<E> i = iterator();
while (i.hasNext()) {
E obj = i.next();
if (obj != null)
h += obj.hashCode();
}
return h;
}
You can easily see that the hashcode will be the sum of the hashcode of the elements, so it is not affected by any order and that equals use containsAll(...) so also here the order doesn't matter.

Related

Why does super.hashCode give different results on objects from the same Class?

I have a class DebugTo where if I have two equal instances el1, el2 a HashSet of el1 will not regard el2 as contained.
import java.util.Objects;
public class DebugTo {
public String foo;
public DebugTo(String foo) {
this.foo = foo;
}
#Override
public int hashCode() {
System.out.println(super.hashCode());
return Objects.hash(super.hashCode(), foo);
}
#Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
DebugTo that = (DebugTo) o;
return Objects.equals(foo, that.foo);
}
}
var el1 = new DebugTo("a");
var el2 = new DebugTo("a");
System.out.println("Objects.equals(el1, el2): " + Objects.equals(el1, el2));
System.out.println("Objects.equals(el2, el1): " + Objects.equals(el2, el1));
System.out.println("el1.hashCode(): " + el1.hashCode());
System.out.println("el2.hashCode(): " + el2.hashCode());
Objects.equals(el1, el2): true
Objects.equals(el2, el1): true
1205483858
el1.hashCode(): -1284705008
1373949107
el2.hashCode(): -357249585
From my analysis I have gathered that:
HashSet::contains calls hashCode not equals (relying on the Objects.equals(a, b) => a.hashSet() == b.hashSet())
super.hashCode() gives a different value both times.
Why does super.hashCode() give different results for el1 and el2? since they are of the same class, they have the same super class and so I expect super.hashCode() to give the same result for both.
The hashCode method was probably autogenerated by eclipse. If not answered above, why is super.hashCode used wrong here?
Because the default implementations of the equals and hashCode methods (which go hand in hand - you always override both or neither) treat any 2 different instances as not equal to each other. If you want different behaviour, you override equals and hashCode, and do not invoke super.equals / super.hashCode, or there'd be no point.
HashSets work as follows: They use .hashCode() to know which 'bucket' to put the object into, and if 2 objects end up in the same bucket, equals is used only on those very few objects to double check.
In other words, these are the rules:
If a.equals(b), then b.equals(a) must be true.
a.equals(a) must always be true.
If a.equals(b) and b.equals(c), a.equals(c) must be true.
If a.equals(b), a.hashCode() == b.hashCode() must be true.
The reverse of 4 does not hold: If a.hashCode() == b.hashCode(), that doesn't mean a.equals(b), and hashset does not require it.
Therefore, return 1; is a legal implementation of hashCode.
If a class has really bad hashcode spread (such as the idiotic but legal option listed in bullet 6), then the performance of hashset will be very bad. e.g. set.containsKey(k) which ordinarily takes constant time, will take linear time instead if your objects are all not-equal but have the same hashCode. Hence, do try to ensure hashcodes are as different as they can be.
HashSet and HashMap require stable objects, meaning, their behaviour when calling hashCode and equals cannot change over time.
From the above it naturally follows that overriding equals and not hashCode or vice versa is necessarily broken.
Breaking any of the above rules does not, generally, result in a compiler error. It often doesn't even result in an exception. But instead it results in bizarre behaviour with hashsets and hashmaps: You put an k/v pair in the map, and then immediately ask for the value back and you get null back instead of what you put in, or something completely different. Just an example.
NB: One weird effect of all this is that you cannot add equality-affecting state to subclasses, unless you apply a caveat that most classes including all classes in the core libraries don't apply.
Imagine as an example that we invent the notion of a 'coloured' arraylist. You could have a red '["Hello", "World"]' list, and a blue one:
class ColoredArrayList extends ArrayList {
Color color;
public ColoredArrayList(Color c) {
this.color = color;
}
}
You'd probably want an empty red list to not equal an empty blue one. However, that is impossible if you intend to follow the rules. That's because the equals/hashCode impl of ArrayList itself considers any other list equal to itself if it has the same items in the same order. Therefore:
List<String> a = new ArrayList<String>();
ColoredList<String> b = new ColoredList<String>(Color.RED);
a.equals(b); // this is true, and you can't change that!
Therefore, b.equals(a) must also be true (your impl of equals has to say that an empty red list is equal to an empty plain arraylist), and given that an empty arraylist is also equal to an empty blue one, given that a.equals(b) and b.equals(c) implies that a.equals(c), a red empty list has to be equal to a blue empty list.
There is an easy solution for this that brings in new problems, and a hard solution that is objectively better.
The easy solution is to define that you can't be equal to anything except exact instances of yourself, as in, any subclass is insta-disqualified. Imagine ArrayList's equals method returns false if you call it with an instance of a subclass of ArrayList. Then you could make your colored list just fine. But, this isn't necessarily great, for example, you probably want an empty LinkedList and an empty ArrayList to be equal.
The harder solution is to introduce a second method, canEqual, and call it. You override canEqual to return 'if other is instanceof the nearest class in my hierarchy that introduces equality-relevant state'. Thus, your ColoredList should have #Override public boolean canEqual(Object other) { return other instanceof ColoredList; }.
The problem is, all classes need to have that and use it, or it's not going to work, and ArrayList does not have it. And you can't change that.
Project Lombok can generate this for you if you prefer. It's not particularly common; I'd only use it if you really know you need it.

Duplicate elements being added in a set after I change the value after insertion

I have a set with two distinct objects added to it. After insertion, I change one of the objects in such a way that both the objects are equal (as verified by the overridden equals method in object class). At this point in time I have two duplicate elements in a set. Now I try to add these two duplicate objects in a new set and I am still able to add them even though the equals method returns true for them. Below is the code for the same. Can someone please tell me what exactly am I missing?
public class BasicSetImpl{
public int num; String entry;
public BasicSetImpl(int num, String entry){
this.num = num;
this.entry = entry;
}
#Override
public int hashCode() {
return Objects.hash(entry, num);
}
#Override
public boolean equals(Object obj) {
BasicSetImpl newObj = (BasicSetImpl)obj;
if (this.num == newObj.num)
return true;
else
return false;
}
public static void main(String[] args){
Set<BasicSetImpl> set = new HashSet<>();
BasicSetImpl k1 = new BasicSetImpl(1, "One");
BasicSetImpl k2 = new BasicSetImpl(2, "Two");
set.add(k1);
set.add(k2);
k2.num = 1;
System.out.println(k1.equals(k2)); //This line returns True
Set<BasicSetImpl> newSet = new HashSet<>();
newSet.add(k1);
newSet.add(k2);
//Set.size here is two
The hash based collection, in this case the HashSet uses the Object hashCode method to calculate the hash value as a function of contents of the object. Since you consider both the entry and num to determine the hashcode value of the object, these two objects has distinct hashcodes since they are different from entry. Thus they belong in two different hash buckets and never be recognized as the same.
However, if you set the entry as follows
k2.entry = "One";
then both k1 and k2 has the same hashcode value. Then they both belong to the same hash bucket and according to the equals method two objects are the same. Therefore, now the duplicates are ignored.
But, there's a problem lurking here. Ideally, equal objects must have equal hash codes. But, in your case it is not. Two objects are equal if they have the same num value, but they may produce different hashcode values. So the correct solution would be to just change your hashCode as follows.
#Override
public int hashCode() {
return Integer.hashCode(num);
}
Now, it should behave the way you would expect without the hack we added by setting k2.entry = "One";
hashCode and equals methods must be consistent.
From documentation
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
In your case despite they are equal their hashCodes are different.
When adding elements HashSet check hashe of the added object and only if there is existing object in set with the same hash, HashSet checks if these objects with same hashes are equal.

demonstrate that equals goes hand in hand with compareTo

It's written in all decent java courses, that if you implement the Comparable interface, you should (in most cases) also override the equals method to match its behavior.
Unfortunately, in my current organization people try to convince me to do exactly the opposite. I am looking for the most convincing code example to show them all the evil that will happen.
I think you can beat them by showing the Comparable javadoc that says:
It is strongly recommended (though not required) that natural
orderings be consistent with equals. This is so because sorted sets
(and sorted maps) without explicit comparators behave "strangely" when
they are used with elements (or keys) whose natural ordering is
inconsistent with equals. In particular, such a sorted set (or sorted
map) violates the general contract for set (or map), which is defined
in terms of the equals method.
For example, if one adds two keys a and b such that (!a.equals(b) &&
a.compareTo(b) == 0) to a sorted set that does not use an explicit
comparator, the second add operation returns false (and the size of
the sorted set does not increase) because a and b are equivalent from
the sorted set's perspective.
So especially with SortedSet (and SortedMap) if the compareTo method returns 0, it assumes it as equal and doesn't add that element second time even the the equals method returns false, and causes confusion as specified in the SortedSet javadoc
Note that the ordering maintained by a sorted set (whether or not an
explicit comparator is provided) must be consistent with equals if the
sorted set is to correctly implement the Set interface. (See the
Comparable interface or Comparator interface for a precise definition
of consistent with equals.) This is so because the Set interface is
defined in terms of the equals operation, but a sorted set performs
all element comparisons using its compareTo (or compare) method, so
two elements that are deemed equal by this method are, from the
standpoint of the sorted set, equal. The behavior of a sorted set is
well-defined even if its ordering is inconsistent with equals; it just
fails to obey the general contract of the Set interface.
If you don't override the equals method, it inherits its behaviour from the Object class.
This method returns true if and only if the specified object is not null and refers to the same instance.
Suppose the following class:
class VeryStupid implements Comparable
{
public int x;
#Override
public int compareTo(VeryStupid o)
{
if (o != null)
return (x - o.x);
else
return (1);
}
}
We create 2 instances:
VeryStupid one = new VeryStupid();
VeryStupid two = new VeryStupid();
one.x = 3;
two.x = 3;
The call to one.compareTo(two) returns 0 indicating the instances are equal but the call to one.equals(two) returns false indicating they're not equal.
This is inconsistent.
Consistency of compareTo and equals is not required but strongly recommended.
I'll give it a shot with this example:
private static class Foo implements Comparable<Foo> {
#Override
public boolean equals(Object _other) {
System.out.println("equals");
return super.equals(_other);
}
#Override
public int compareTo(Foo _other) {
System.out.println("compareTo");
return 0;
}
}
public static void main (String[] args) {
Foo a, b;
a = new Foo();
b = new Foo();
a.compareTo(b); // prints 'compareTo', returns 0 => equal
a.equals(b); // just prints 'equals', returns false => not equal
}
You can see that your (maybe very important and complicated) comparission code is ignored when you use the default equals-method.
the method int compareTo(T o) allow you know if the T o is (in some way) superior or inferior of this, so it allow you to order a list of T o.
In the scenario of int compareTo(T o) you have to do :
is o InstanceOfThis ? => true/false ;
is o EqualOfThis ? => true/false ;
is o SuperiorOfThis ? => true/false ;
is o InferiorOfThis ? true/false ;
So you see you have the equality test, and the best way to not implement the equality two times is to put it in the boolean equals(Object obj) method.

Understanding HashSet

Well here is my question, Can "HashSet Objects" have elements duplicated??
If I read the Set Interface definition, I see:
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.
And now we are going to write a simple example:
Define class A:
public class A {
#Override
public boolean equals(Object obj) {
return true;
}
}
Now execute this code;
Set<A> set = new HashSet<A>();
set.add(new A());
set.add(new A());
System.out.println(set.toString());
And this is the result:
[com.maths.graphs.A#b9e9a3, com.maths.graphs.A#18806f7]
Why a class what implements Set Interface like HashSet contains elements duplicated?
Thanks!!
You have broken the equals-hashcode contract.
If you override the equals method you must also override the hashCode() method such that:
Two objects which are equal give the same hash, and preferably unequal
objects are highly likely to give different hashcodes
This is important because many objects (unsurprisingly including the HashSet) use the hashcode as a quick, efficient early step to eliminate unequal objects. This is what has happened here since the hashcodes of the different As will be different as they are still using the implementation of .hashCode() provided within object.
If you were to create the class A as follows it would not allow more than 1 A in the set
public class A {
#Override
public boolean equals(Object obj) {
return true;
}
#Override
public int hashCode() {
int hash = 1; //any number since in this case all objects of class A are equal to everything
return hash;
}
}
From the javadoc
public int hashCode()
Returns a hash code value for the object. This method is supported for
the benefit of hash tables such as those provided by HashMap.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. This integer need not remain
consistent from one execution of an application to another execution
of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results.
However, the programmer should be aware that producing distinct
integer results for unequal objects may improve the performance of
hash tables.
Most IDEs will object if you do not include an overriding HashCode method when overiding the equals method and can generate a hashCode method for you.
Notes
Strictly speaking my hashCode() method doesn't completely satisfy the contract. Since A#equals(Object obj) equals anything including objects which are not of type A it is impossible to fully satisfy the contract. Ideally the equals method would be changed to the following as well to cover all bases
#Override
public boolean equals(Object obj) {
if (obj instanceof A){
return true;
}else{
return false;
}
}
Here the HashSet does not have duplicates, as the two add methods add new objects in the HashSet and these are different Objects. The reason that the hash codes for the two elements of the set are different for this reason. Try changing the code to:
Set<A> set = new HashSet<A>();
A a = new A();
set.add(a);
set.add(a);
System.out.println(set.toString());
and you will see that there is only one value in the set.
Or just add the following in you code and check
#Override
public int hashCode() {
return 31;
}
You have violated the hashCode() method contract i.e for same key it should return same hashcode() every time

In Java, why must equals() and hashCode() be consistent?

If I override either method on a class, it must make sure that if A.equals(B) == true then A.hashCode() == B.hashCode must also be true.
Can someone show me a simple example where if this is violated, it'll cause a problem? I think it has something to do with if you use that class as the type of keys to Hashmap?
Sure:
public class Test {
private final int m, n;
public Test(int m, int n) {
this.m = m;
this.n = n;
}
public int hashCode() { return n * m; }
public boolean equals(Object ob) {
if (ob.getClass() != Test.class) return false;
Test other = (Test)ob;
return m == other.m;
}
}
with:
Set<Test> set = new HashSet<Test>();
set.put(new Test(3,4));
boolean b = set.contains(new Test(3, 10)); // false
Technically that should be true because m == 3 in both cases.
In general a HashMap works like this: it has a variable number of what are commonly called "buckets". The number of buckets can change over time (as entries are added and removed) but it is always a power of 2.
Let's say a given HashMap has 16 buckets. When you call put() to add an entry, the hashCode() of the key is calculated and then a mask is taken depending on the size of the buckets. If you (bitwise) AND the hashCode() with 15 (0x0F) you will get the last 4 bits, equaling a number between 0 and 15 inclusive:
int factor = 4;
int buckets = 1 << (factor-1) - 1; // 16
int mask = buckets - 1; // 15
int code = key.hashCode();
int dest = code & mask; // a number from 0 to 15 inclusive
Now if there is already an entry in that bucket you have what's called a collision. There are multiple ways of dealing with this but the one used by HashMap (and is probably the most common overall) is bucketing. All the entries with the same masked hashCode are put in a list of some kind.
So to find if a given key is in the map already:
Calculate the masked hash code;
Find the appropriate bucket;
If it's empty, key not found;
If is isn't empty, loop through all entries in the bucket checking equals().
Looking through a bucket is a linear (O(n)) operation but it's on a small subset. The hashcode bucket determination is essentially constant (O(1)). If buckets are sufficiently small then access to a HashMap is usually described as "near O(1)".
You can make a couple of observations about this.
Firstly, if you have a bunch of objects that all return 42 as their hash code a HashMap will still work but it will operate as an expensive list. Access will be O(n) (as everything will be in the same bucket regardless of the number of buckets). I've actually been asked this in an interview.
Secondly, returning to your original point, if two objects are equal (meaning a.equals(b) == b.equals(a) == true) but have different hash codes then the HashMap will go looking in (probably) the wrong bucket resulting in unpredictable and undefined behaviour.
This is discussed in the Item 8: Always override hashCode when you override equals of Joshua Bloch's Effective Java:
A common source of bugs is the failure to override the hashCode method. You must
override hashCode in every class that overrides equals. Failure to do so will
result in a violation of the general contract for Object.hashCode, which will pre-
vent your class from functioning properly in conjunction with all hash-based collec-
tions, including HashMap, HashSet, and Hashtable.
Here is the contract, copied from the
java.lang.Object specification:
Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The key provision that is violated when you fail to override hashCode is
the second one: Equal objects must have equal hash codes. Two distinct
instances may be logically equal according to the class’s equals method, but to
the Object class’s hashCode method, they’re just two objects with nothing much
in common. Therefore object’s hashCode method returns two seemingly random
numbers instead of two equal numbers as required by the contract.
For example, consider the following simplistic PhoneNumber class, whose
equals method is constructed according to the recipe in Item 7:
public final class PhoneNumber {
private final short areaCode;
private final short exchange;
private final short extension;
public PhoneNumber(int areaCode, int exchange,
int extension) {
rangeCheck(areaCode, 999, "area code");
rangeCheck(exchange, 999, "exchange");
rangeCheck(extension, 9999, "extension");
this.areaCode = (short) areaCode;
this.exchange = (short) exchange;
this.extension = (short) extension;
}
private static void rangeCheck(int arg, int max,
String name) {
if (arg < 0 || arg > max)
throw new IllegalArgumentException(name +": " + arg);
}
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof PhoneNumber))
return false;
PhoneNumber pn = (PhoneNumber)o;
return pn.extension == extension &&
pn.exchange == exchange &&
pn.areaCode == areaCode;
}
// No hashCode method!
... // Remainder omitted
}
Suppose you attempt to use this class
with a HashMap:
Map m = new HashMap();
m.put(new PhoneNumber(408, 867, 5309), "Jenny");
At this point, you might expect
m.get(new PhoneNumber(408 , 867,
5309)) to return "Jenny", but it
returns null. Notice that two PhoneNumber instances are
involved: One is used for insertion
into the HashMap, and a second, equal,
instance is used for (attempted)
retrieval. The PhoneNumber class’s
failure to override hashCode causes
the two equal instances to have
unequal hash codes, in violation of
the hashCode contract. Therefore the
get method looks for the phone number
in a different hash bucket from the
one in which it was stored by the put
method. Fixing this problem is as
simple as providing a proper hashCode
method for the PhoneNumber class.
[...]
See the Chapter 3 for the full content.
Containers like HashSet rely on the hash function to determine where to put it, and where to get it from when asked for it. If A.equals(B), then a HashSet is expecting A to be in the same place as B. If you put A in with value V, and look up B, you should expect to get V back (since you've said A.equals(B)). But if A.hashcode() != B.hashcode(), then the hashset may not find where you put it.
Here's a little example:
Set<Foo> myFoos = new HashSet<Foo>();
Foo firstFoo = new Foo(123,"Alpha");
myFoos.add(firstFoo);
// later in the processing you get another Foo from somewhere
Foo someFoo = //use imagination here...;
// maybe you get it from a database... and it's equal to Foo(123,"Alpha)
if (myFoos.contains(someFoo)) {
// maybe you win a million bucks.
}
So, imagine that the hashCode that gets created for firstFoo is 99999 and it winds up at a specific spot in the myFoos HashSet. Later when you get the someFoo and you look for it in the myFoos HashSet, it needs to generate the same hashCode so you can find it.
It's exactly because of hash tables.
Because of the possibility of hash code collisions, hash tables need to check identity as well, otherwise the table can't determine if it found the object it was looking for, or one with the same hash code. So every get() in a hash table calls key.equals(potentialMatch) before returning a value.
If equals() and hashCode() are inconsistent you can get very inconsistent behavior. Say for two objects, a and b, a.equals(b) returns true, but a.hashCode() != b.hashCode(). Insert a and a HashSet will return false for .contains(b), but a List created from that set will return true (because the list doesn't use hash codes).
HashSet set = new HashSet();
set.add(a);
set.contains(b); // false
new ArrayList(set).contains(b); // true
Obviously, that could be bad.
The idea behind this is that two objects are "equal" if all of their fields have equal values. If all of fields have equal values, the two objects should have the same hash value.

Categories