Overriding the hashCode() and equals() methods in Java - java

As the heading suggests, my question has something to do with overriding the hashCode() and equals() method. However, this is also not completely true, I just did not know another way to summarize my question.
I have an object Label that contains multiple components, one of which is a List that contains multiple objects of type Node. An example of Label would be: [(n1, n2, n3), (n4, n5)]. I want to store all unique Label objects generated in a LinkedHashSet. However, this is not working as expected. Suppose that the LinkedHashSet currently contains the Label described above, and that we now generated a new Label called other which turned out to contain the same nodes as the already added label, thus also [(n1, n2, n3), (n4, n5)]. Since it has the same list of nodes, the other components in Label are also identical. I won't explain here why, just assume that it is, because that is the case. However, when checking if the LinkedHashSet already contains the Label it returns false, since the objects have different object ID's.
One approach would be to loop over the LinkedHashSet and compare the new label with every label already in it, but that would be very expensive in terms of running time, so I am looking for a cheaper option. Any suggestion is welcome!
Another approach would be to adapt the equals() method, but that did not work out, since I would also have to adapt the hashCode() method and I do not know how to change it to get this working.

Implementing equals(...) and hashCode(...) is the right approach here. Usually people let their IDE generate these methods. With java.util.Objects (Java 7+), the result looks like this:
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Label label = (Label) o;
    return Objects.equals(nodes, label.nodes);
}
@Override
public int hashCode() {
    return Objects.hashCode(nodes);
}
If you have multiple fields, this becomes:
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Label label = (Label) o;
    return Objects.equals(nodes, label.nodes) &&
        Objects.equals(other, label.other);
}
@Override
public int hashCode() {
    return Objects.hash(nodes, other);
}
However, this will only work if Node also implements equals(...) and hashCode(...).
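A minimal, self-contained sketch of this approach. The Node class with a single id field, the nested-list shape of Label, and the LabelDemo driver are assumptions made for illustration; the question does not show the real classes.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical Node: identified by one int id (an assumption for this sketch).
final class Node {
    private final int id;

    Node(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return id == ((Node) o).id;
    }

    @Override
    public int hashCode() { return Integer.hashCode(id); }
}

// Hypothetical Label: equality is defined by its groups of nodes only.
final class Label {
    private final List<List<Node>> nodes;

    Label(List<List<Node>> nodes) { this.nodes = nodes; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(nodes, ((Label) o).nodes);
    }

    @Override
    public int hashCode() { return Objects.hashCode(nodes); }
}

class LabelDemo {
    // Builds the label [(n1, n2, n3), (n4, n5)] from fresh Node instances.
    static Label sample() {
        return new Label(Arrays.asList(
            Arrays.asList(new Node(1), new Node(2), new Node(3)),
            Arrays.asList(new Node(4), new Node(5))));
    }

    static int uniqueCount() {
        Set<Label> labels = new LinkedHashSet<>();
        labels.add(sample());
        labels.add(sample()); // a different instance with the same nodes
        return labels.size();
    }

    public static void main(String[] args) {
        System.out.println(uniqueCount()); // prints 1
    }
}
```

Because List.equals and List.hashCode are element-based, delegating to the list is enough once Node itself has value-based equality.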

You definitely don't want to "for-loop over the LinkedHashSet and compare..." Detecting duplicates is what the LinkedHashSet is supposed to do. In order to do that, it needs appropriate implementations of equals and hashCode.
You probably understand that you need to implement equals such that it returns true when a Label is "equal to" another Label, however you define that. If two labels are equal when they have the same nodes, then yeah, you pretty much have to look at all the nodes and check that they're the same.
That leaves hashCode. You must implement hashCode to be consistent with equals; that is, if two labels are equal, they must have the same hash code. That's because LinkedHashSet uses the hash code to determine the bucket in which a Label resides, and then uses equals to compare the new Label with the ones that already exist in that bucket. If two labels are equal but generate different hash codes, LinkedHashSet won't be able to detect them as duplicates.
The simplest thing to do would be to incorporate the hash codes of all the nodes into the hash code of the Label. Something like:
@Override
public int hashCode() {
    int hc = 1;
    for (Node n : allMyNodes) {
        hc = hc * 31 + n.hashCode();
    }
    return hc;
}
If there are lots of nodes and it's unlikely that two labels will share the same node unless they're "equal," you could just use the hash code of the first node, instead of rummaging through them all.
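The loop shown above follows the same recipe that the java.util.List.hashCode() contract specifies (start at 1, multiply by 31, add each element's hash code), so for a list of non-null nodes you could delegate to the list instead of writing the loop. A small check, with strings standing in for Node objects (an assumption for illustration):

```java
import java.util.Arrays;
import java.util.List;

class ListHashDemo {
    // The hand-written loop, applied to any list of non-null elements.
    static int manualHash(List<?> items) {
        int hc = 1;
        for (Object item : items) {
            hc = hc * 31 + item.hashCode();
        }
        return hc;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("n1", "n2", "n3");
        System.out.println(manualHash(nodes) == nodes.hashCode()); // prints true
    }
}
```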

Use @EqualsAndHashCode from Lombok on your Node (or Label) class.
Then you can add nodes to your LinkedHashSet.

Related

Why does super.hashCode give different results on objects from the same Class?

I have a class DebugTo where if I have two equal instances el1, el2 a HashSet of el1 will not regard el2 as contained.
import java.util.Objects;
public class DebugTo {
    public String foo;
    public DebugTo(String foo) {
        this.foo = foo;
    }
    @Override
    public int hashCode() {
        System.out.println(super.hashCode());
        return Objects.hash(super.hashCode(), foo);
    }
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        DebugTo that = (DebugTo) o;
        return Objects.equals(foo, that.foo);
    }
}
var el1 = new DebugTo("a");
var el2 = new DebugTo("a");
System.out.println("Objects.equals(el1, el2): " + Objects.equals(el1, el2));
System.out.println("Objects.equals(el2, el1): " + Objects.equals(el2, el1));
System.out.println("el1.hashCode(): " + el1.hashCode());
System.out.println("el2.hashCode(): " + el2.hashCode());
Objects.equals(el1, el2): true
Objects.equals(el2, el1): true
1205483858
el1.hashCode(): -1284705008
1373949107
el2.hashCode(): -357249585
From my analysis I have gathered that:
HashSet::contains calls hashCode, not equals (relying on Objects.equals(a, b) => a.hashCode() == b.hashCode())
super.hashCode() gives a different value both times.
Why does super.hashCode() give different results for el1 and el2? Since they are of the same class, they have the same superclass, so I expect super.hashCode() to give the same result for both.
The hashCode method was probably autogenerated by Eclipse. If not answered above, why is super.hashCode used wrong here?
Because the default implementations of the equals and hashCode methods (which go hand in hand - you always override both or neither) treat any 2 different instances as not equal to each other. If you want different behaviour, you override equals and hashCode, and do not invoke super.equals / super.hashCode, or there'd be no point.
HashSets work as follows: They use .hashCode() to know which 'bucket' to put the object into, and if 2 objects end up in the same bucket, equals is used only on those very few objects to double check.
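A sketch of how the class from the question could look once super.hashCode() is dropped, so that only the field compared in equals feeds the hash. The name DebugToFixed and the demo method are hypothetical.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class DebugToFixed {
    public final String foo;

    DebugToFixed(String foo) { this.foo = foo; }

    @Override
    public int hashCode() {
        // Only the state that equals() looks at; no super.hashCode().
        return Objects.hash(foo);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(foo, ((DebugToFixed) o).foo);
    }

    // Adds one instance and looks up a second, equal instance.
    static boolean demo() {
        Set<DebugToFixed> set = new HashSet<>();
        set.add(new DebugToFixed("a"));
        return set.contains(new DebugToFixed("a"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```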
In other words, these are the rules:
1. If a.equals(b), then b.equals(a) must be true.
2. a.equals(a) must always be true.
3. If a.equals(b) and b.equals(c), a.equals(c) must be true.
4. If a.equals(b), a.hashCode() == b.hashCode() must be true.
5. The reverse of 4 does not hold: If a.hashCode() == b.hashCode(), that doesn't mean a.equals(b), and HashSet does not require it.
6. Therefore, return 1; is a legal implementation of hashCode.
7. If a class has really bad hashcode spread (such as the idiotic but legal option listed in bullet 6), then the performance of HashSet will be very bad, e.g. set.contains(k), which ordinarily takes constant time, will take linear time instead if your objects are all not-equal but have the same hashCode. Hence, do try to ensure hashcodes are as different as they can be.
8. HashSet and HashMap require stable objects, meaning their behaviour when calling hashCode and equals cannot change over time.
From the above it naturally follows that overriding equals and not hashCode or vice versa is necessarily broken.
Breaking any of the above rules does not, generally, result in a compiler error. It often doesn't even result in an exception. Instead, it results in bizarre behaviour with hashsets and hashmaps: you put a k/v pair in the map, then immediately ask for the value back, and you get null instead of what you put in, or something completely different. Just an example.
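To make the "stable objects" rule concrete, here is a small sketch with a hypothetical MutableKey class whose hash code depends on a mutable field. After the field changes, the set looks for the object under its new hash code and, with the OpenJDK HashSet implementation, no longer finds it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutableKey {
    String name; // deliberately mutable, to show the problem

    MutableKey(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && Objects.equals(name, ((MutableKey) o).name);
    }

    @Override
    public int hashCode() { return Objects.hashCode(name); }

    // Adds a key, mutates it, then asks the set for that very same instance.
    static boolean foundAfterMutation() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey("a");
        set.add(key);
        key.name = "b"; // the hash code changes while the object sits in the set
        return set.contains(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // prints false on OpenJDK
    }
}
```

No exception is thrown and nothing fails to compile; the element is simply "lost" even though it is still stored in the set.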
NB: One weird effect of all this is that you cannot add equality-affecting state to subclasses, unless you apply a caveat that most classes including all classes in the core libraries don't apply.
Imagine as an example that we invent the notion of a 'coloured' arraylist. You could have a red '["Hello", "World"]' list, and a blue one:
class ColoredList<E> extends ArrayList<E> {
    Color color;
    public ColoredList(Color c) {
        this.color = c;
    }
}
You'd probably want an empty red list to not equal an empty blue one. However, that is impossible if you intend to follow the rules. That's because the equals/hashCode impl of ArrayList itself considers any other list equal to itself if it has the same items in the same order. Therefore:
List<String> a = new ArrayList<String>();
ColoredList<String> b = new ColoredList<String>(Color.RED);
a.equals(b); // this is true, and you can't change that!
Therefore, b.equals(a) must also be true (your impl of equals has to say that an empty red list is equal to an empty plain arraylist), and given that an empty arraylist is also equal to an empty blue one, given that a.equals(b) and b.equals(c) implies that a.equals(c), a red empty list has to be equal to a blue empty list.
There is an easy solution for this that brings in new problems, and a hard solution that is objectively better.
The easy solution is to define that you can't be equal to anything except exact instances of yourself, as in, any subclass is insta-disqualified. Imagine ArrayList's equals method returns false if you call it with an instance of a subclass of ArrayList. Then you could make your colored list just fine. But, this isn't necessarily great, for example, you probably want an empty LinkedList and an empty ArrayList to be equal.
The harder solution is to introduce a second method, canEqual, and call it. You override canEqual to return 'if other is instanceof the nearest class in my hierarchy that introduces equality-relevant state'. Thus, your ColoredList should have @Override public boolean canEqual(Object other) { return other instanceof ColoredList; }.
The problem is, all classes need to have that and use it, or it's not going to work, and ArrayList does not have it. And you can't change that.
Project Lombok can generate this for you if you prefer. It's not particularly common; I'd only use it if you really know you need it.
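A minimal sketch of the canEqual pattern, using a hypothetical Point2D / ColoredPoint pair instead of ArrayList (which, as noted, cannot be retrofitted). All class and method names here are assumptions for illustration.

```java
class Point2D {
    final int x, y;

    Point2D(int x, int y) { this.x = x; this.y = y; }

    // Subclasses that add equality-relevant state override this.
    public boolean canEqual(Object other) { return other instanceof Point2D; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point2D)) return false;
        Point2D that = (Point2D) o;
        // Ask the OTHER object whether it is willing to be equal to us.
        return that.canEqual(this) && x == that.x && y == that.y;
    }

    @Override
    public int hashCode() { return 31 * x + y; }
}

class ColoredPoint extends Point2D {
    final String color;

    ColoredPoint(int x, int y, String color) { super(x, y); this.color = color; }

    @Override
    public boolean canEqual(Object other) { return other instanceof ColoredPoint; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ColoredPoint)) return false;
        ColoredPoint that = (ColoredPoint) o;
        return that.canEqual(this) && super.equals(that) && color.equals(that.color);
    }

    @Override
    public int hashCode() { return 31 * super.hashCode() + color.hashCode(); }
}

class CanEqualDemo {
    public static void main(String[] args) {
        Point2D plain = new Point2D(1, 2);
        Point2D red = new ColoredPoint(1, 2, "red");
        Point2D blue = new ColoredPoint(1, 2, "blue");
        System.out.println(plain.equals(red)); // false: red refuses a plain point
        System.out.println(red.equals(plain)); // false: symmetric with the line above
        System.out.println(red.equals(blue));  // false: colors differ
        System.out.println(red.equals(new ColoredPoint(1, 2, "red"))); // true
    }
}
```

A subclass that adds no equality-relevant state can simply inherit canEqual and equals, and it will still compare equal to plain Point2D instances.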

Uses of hashcode in Java apart from hashing collections [duplicate]

In Java, obj.hashCode() returns some value. What is the use of this hash code in programming?
hashCode() is used for bucketing in hash-based implementations like HashMap, Hashtable, HashSet, etc.
The value returned by hashCode() is used to compute the bucket in which an element of the set/map is stored. You can think of that bucket as the address of the element inside the set/map.
When you do contains() it will take the hash code of the element, then look for the bucket where hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), then it uses the equals() method to evaluate if the objects are equal, and then decide if contains() is true or false, or decide if element could be added in the set or not.
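A short illustration of the last point: the strings "Aa" and "BB" are a well-known pair with the same hash code, so they land in the same bucket, and only equals() tells them apart. The CollisionDemo class is a hypothetical name for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

class CollisionDemo {
    // Both strings hash to the same int, but they are not equal.
    static boolean sameHash() {
        return "Aa".hashCode() == "BB".hashCode();
    }

    static boolean containsOther() {
        Set<String> set = new HashSet<>();
        set.add("Aa");
        // Same bucket as "Aa", but equals() rejects the match.
        return set.contains("BB");
    }

    public static void main(String[] args) {
        System.out.println(sameHash());      // prints true
        System.out.println(containsOther()); // prints false
    }
}
```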
From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
hashCode() is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Classes like HashMap, Hashtable, HashSet, etc. that need to store objects will use the hash code modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode), and that, of course, needs to be solved carefully.
The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
This answer is from the official Java SE 8 tutorial documentation.
A hashcode is a number generated from any object.
This is what allows objects to be stored/retrieved quickly in a Hashtable.
Imagine the following simple example:
On the table in front of you, you have nine boxes, each marked with a number 1 to 9. You also have a pile of wildly different objects to store in these boxes, but once they are in there you need to be able to find them as quickly as possible.
What you need is a way of instantly deciding which box you have put each object in. It works like an index: you decide to find the cabbage, so you look up which box the cabbage is in, then go straight to that box to get it.
Now imagine that you don't want to bother with the index, you want to be able to find out immediately from the object which box it lives in.
In the example, let's use a really simple way of doing this - the number of letters in the name of the object. So the cabbage goes in box 7, the pea goes in box 3, the rocket in box 6, the banjo in box 5 and so on.
What about the rhinoceros, though? It has 10 characters, so we'll change our algorithm a little and "wrap around" so that 10-letter objects go in box 1, 11 letters in box 2 and so on. That should cover any object.
Sometimes a box will have more than one object in it, but if you are looking for a rocket, it's still much quicker to compare a peanut and a rocket, than to check a whole pile of cabbages, peas, banjos, and rhinoceroses.
That's a hash code: a way of getting a number from an object so it can be stored in a Hashtable. In Java, a hash code can be any integer, and each object type is responsible for generating its own. Look up the "hashCode" method of Object.
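The box analogy can be written down directly. This is only a toy sketch of the scheme described above (name length, wrapped around nine boxes); the BoxDemo class and the boxFor method are hypothetical, and the method assumes a non-empty name.

```java
class BoxDemo {
    static final int BOXES = 9;

    // Toy "hash": the number of letters, wrapped into the range 1..9.
    static int boxFor(String name) {
        return (name.length() - 1) % BOXES + 1;
    }

    public static void main(String[] args) {
        System.out.println(boxFor("cabbage"));    // prints 7
        System.out.println(boxFor("pea"));        // prints 3
        System.out.println(boxFor("rocket"));     // prints 6
        System.out.println(boxFor("banjo"));      // prints 5
        System.out.println(boxFor("rhinoceros")); // prints 1 (10 letters wrap around)
        System.out.println(boxFor("peanut"));     // prints 6 (shares a box with the rocket)
    }
}
```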
Source - here
Although the hash code has nothing to do with your business logic, you have to take care of it in most cases, because when your object is put into a hash-based container (HashSet, HashMap...), the container uses the element's hash code to store and retrieve it.
hashCode() returns an integer code for an object. Unless a class overrides it, the JVM generates this code, and it is usually, but not necessarily, different for different objects.
We use hashCode() in hashing-based data structures such as Hashtable, HashMap, etc.
The advantage of hashCode() is that it makes searching fast: the hash code narrows the search down to the few objects that share it.
But we can't say hashCode() is the address of an object. It is just a code that the JVM derives for the object, and two different objects may share it.
That is why hashing is such a popular technique for fast lookups.
One of the uses of hashCode() is building a caching mechanism.
Look at this example:
class Point
{
    public int x, y;
    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
    @Override
    public boolean equals(Object o)
    {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Point point = (Point) o;
        if (x != point.x) return false;
        return y == point.y;
    }
    @Override
    public int hashCode()
    {
        int result = x;
        result = 31 * result + y;
        return result;
    }
}
class Line
{
    public Point start, end;
    public Line(Point start, Point end)
    {
        this.start = start;
        this.end = end;
    }
    @Override
    public boolean equals(Object o)
    {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Line line = (Line) o;
        if (!start.equals(line.start)) return false;
        return end.equals(line.end);
    }
    @Override
    public int hashCode()
    {
        int result = start.hashCode();
        result = 31 * result + end.hashCode();
        return result;
    }
}
class LineToPointAdapter implements Iterable<Point>
{
    private static int count = 0;
    private static Map<Integer, List<Point>> cache = new HashMap<>();
    private int hash;
    public LineToPointAdapter(Line line)
    {
        hash = line.hashCode();
        if (cache.get(hash) != null) return; // we already have it
        System.out.println(
            String.format("%d: Generating points for line [%d,%d]-[%d,%d] (no caching)",
                ++count, line.start.x, line.start.y, line.end.x, line.end.y));
    }

Is it a bad practice to return super.equals and super.hashcode in a class?

Let's say I have this class:
public class Person {
    private Integer idFromDatabase;
    private String name;
    //Getters and setters
}
The field idFromDatabase is the attribute that should be verified in equals and used to create the hashCode. But sometimes I am working with a list of people in memory that have not yet been stored in the database, so idFromDatabase is null for all objects, which would cause hashCode to return the same value for every object.
I solved this issue by adding the following to the equals and hashCode methods:
if(idFromDatabase == null) return super.equals(o);
and
if(idFromDatabase == null) return super.hashCode();
It worked, but is it safe? Can I do it for every class that relies on a database field for equality check?
if(idFromDatabase == null) return super.equals(o); is incorrect as super's equals (if implemented correctly) does a getClass() check, which will of course be different, thus super.equals will always be false.
As already noted by @Jeroen Vannevel, if you are likely to end up with having 2 or more objects not stored in database holding the exact same information, then this technique will not help you in identifying this.
@Solver is also quite true in that a subclass is meant to have different behavior than its superclass, so you shouldn't return that they're equal.
However, in your particular example, you are just extending the Object class, so your assumption that it is safe is true (if we exclude the possibility of having 2 not-yet-persisted same Persons in memory).
Object provides the most basic equals method:
For any non-null reference values x and y, this method returns true
if and only if x and y refer to the same object
(x == y has the value true).
The hashCode method of Object:
As much as is reasonably practical, [...] does return distinct integers for distinct objects
These definitions make it clear that if you're only extending Object, then this technique is safe.
From your description I'm inferring that when comparing two Person objects:
If both have an ID, they are equal if they have the same ID, even if they have different names.
Otherwise, they are only equal if they are the same instance.
If that's correct, then:
@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (this.idFromDatabase == null)
        return false;
    if (! (obj instanceof Person))
        return false;
    Person that = (Person)obj;
    if (that.idFromDatabase == null)
        return false;
    return this.idFromDatabase.equals(that.idFromDatabase);
}
@Override
public int hashCode() {
    // Use super.hashCode to distribute objects without an idFromDatabase
    return (this.idFromDatabase != null ? this.idFromDatabase.hashCode() : super.hashCode());
}
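A compact, runnable sketch of these rules, assuming a Person class with just the two fields from the question; the constructor and the small driver are hypothetical additions.

```java
class Person {
    private final Integer idFromDatabase; // null until the object is persisted
    private final String name;

    Person(Integer idFromDatabase, String name) {
        this.idFromDatabase = idFromDatabase;
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Person)) return false;
        Person that = (Person) obj;
        // Equal only when both sides have an id and the ids match.
        return idFromDatabase != null && idFromDatabase.equals(that.idFromDatabase);
    }

    @Override
    public int hashCode() {
        // Identity hash for unsaved objects, id-based hash for saved ones.
        return idFromDatabase != null ? idFromDatabase.hashCode() : super.hashCode();
    }
}

class PersonEqualityDemo {
    public static void main(String[] args) {
        Person unsaved = new Person(null, "Ann");
        System.out.println(unsaved.equals(new Person(null, "Ann")));           // false
        System.out.println(unsaved.equals(unsaved));                           // true
        System.out.println(new Person(7, "Ann").equals(new Person(7, "Bob"))); // true
    }
}
```

One caveat with any scheme like this: if the id is assigned while the object already sits in a HashSet or HashMap, its hash code changes and the collection can no longer find it, so such objects should be removed before saving and re-added afterwards.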
There are a few problems with your reasoning.
equals and hashCode are not subtype-friendly, so it doesn't make sense to start thinking about super calls.
super is Object anyway, so its equals and hashCode are useless in this context.
What if you have two Person objects referring to the same person, but only one is stored in the database? Are they the same or different?
One universal solution is to make two classes:
Person, which stores a 'local' person and doesn't contain idFromDatabase,
StoredPerson, which contains idFromDatabase and a Person (or all fields of Person, but this is harder to maintain).
This way, at least equals and hashCode are well-defined and well-behaved at all times.
Implementation and usage
If you use any kind of Set/Map to store people, you now have two of them. When you save new Persons to database, you remove them from the 'local' Set/Map, wrap them in StoredPerson, and put them in the 'database' Set/Map.
If you want a searchable list of all people, combine the Persons from both datasets into one list. When you find a Person you're interested in and want to retrieve the idFromDatabase, if any, you'd do well to have prepared a map from Person to StoredPerson beforehand.
Thus you need at least:
Set<Person> localPeople = new HashSet<>();
Map<Person, StoredPerson> storedPeople = new HashMap<>();
and something like this:
void savePerson(Person person) {
    synchronized (lockToPreserveInvariants) {
        int id = db.insert(person);
        StoredPerson sp = new StoredPerson(id, person);
        localPeople.remove(person);
        storedPeople.put(person, sp);
    }
}
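A self-contained sketch of the two-class design. The choice of name as the only Person field, the PersonRegistry wrapper, and the in-memory counter that stands in for db.insert are all assumptions for illustration; a real application would use its own fields and its own database call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// A 'local' person: no database id, equality based on its own data.
final class Person {
    final String name;

    Person(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Person && Objects.equals(name, ((Person) o).name);
    }

    @Override
    public int hashCode() { return Objects.hashCode(name); }
}

// A persisted person: equality based on the database id only.
final class StoredPerson {
    final int idFromDatabase;
    final Person person;

    StoredPerson(int idFromDatabase, Person person) {
        this.idFromDatabase = idFromDatabase;
        this.person = person;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof StoredPerson && idFromDatabase == ((StoredPerson) o).idFromDatabase;
    }

    @Override
    public int hashCode() { return Integer.hashCode(idFromDatabase); }
}

class PersonRegistry {
    private final Set<Person> localPeople = new HashSet<>();
    private final Map<Person, StoredPerson> storedPeople = new HashMap<>();
    private int nextId = 1; // hypothetical stand-in for a database-generated key

    synchronized void addLocal(Person person) { localPeople.add(person); }

    // Moves a person from the 'local' set to the 'stored' map in one step.
    synchronized StoredPerson savePerson(Person person) {
        int id = nextId++; // a real implementation would call the database here
        StoredPerson sp = new StoredPerson(id, person);
        localPeople.remove(person);
        storedPeople.put(person, sp);
        return sp;
    }

    synchronized int localCount() { return localPeople.size(); }

    synchronized int storedCount() { return storedPeople.size(); }
}
```

Neither class ever changes its hash code after construction, which is what keeps both collections consistent.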

how can I implement the union and intersection of set theory with sets of objects

Hi, I've seen this post on how to implement union and intersection when you have two sets of data that are strings. How can I do the same when my sets contain objects, and I want the union based on only one property of each object?
I want to override them somehow so that an object is not added if there's already an object in my set that has the same value in a selected property. If I'm not clear enough, tell me so I can write an example.
I think the best way to do this is to use a ConcurrentMap.
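// Union: entries from retainedSet win; entries from onlyIfNewSet are added only if their key is new.
ConcurrentMap<String, MyType> union = new ConcurrentHashMap<>();
for (MyType mt : retainedSet) union.put(mt.getKey(), mt);
for (MyType mt : onlyIfNewSet) union.putIfAbsent(mt.getKey(), mt);
// Intersection: start from retainedSet only, then keep the keys that also occur in onlyIfNewSet.
ConcurrentMap<String, MyType> intersection = new ConcurrentHashMap<>();
for (MyType mt : retainedSet) intersection.put(mt.getKey(), mt);
Set<String> toKeep = new HashSet<>();
for (MyType mt : onlyIfNewSet) toKeep.add(mt.getKey());
intersection.keySet().retainAll(toKeep);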
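A runnable sketch of the key-based idea, assuming a hypothetical MyType with a String key and an int payload. It uses a LinkedHashMap rather than a ConcurrentMap, purely to get a predictable iteration order in a single-threaded example; union and intersection are computed in separate maps.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical element type: 'key' is the property that decides duplicates.
class MyType {
    private final String key;
    final int value;

    MyType(String key, int value) { this.key = key; this.value = value; }

    String getKey() { return key; }
}

class KeySetOps {
    // Union by key: entries from 'retained' win over entries from 'onlyIfNew'.
    static Map<String, MyType> union(Collection<MyType> retained, Collection<MyType> onlyIfNew) {
        Map<String, MyType> map = new LinkedHashMap<>();
        for (MyType mt : retained) map.put(mt.getKey(), mt);
        for (MyType mt : onlyIfNew) map.putIfAbsent(mt.getKey(), mt);
        return map;
    }

    // Intersection by key: entries of 'retained' whose key also occurs in 'other'.
    static Map<String, MyType> intersection(Collection<MyType> retained, Collection<MyType> other) {
        Map<String, MyType> map = new LinkedHashMap<>();
        for (MyType mt : retained) map.put(mt.getKey(), mt);
        Set<String> toKeep = new HashSet<>();
        for (MyType mt : other) toKeep.add(mt.getKey());
        map.keySet().retainAll(toKeep);
        return map;
    }

    public static void main(String[] args) {
        List<MyType> first = Arrays.asList(new MyType("a", 1), new MyType("b", 2));
        List<MyType> second = Arrays.asList(new MyType("b", 3), new MyType("c", 4));
        System.out.println(union(first, second).keySet());        // prints [a, b, c]
        System.out.println(intersection(first, second).keySet()); // prints [b]
    }
}
```

This route leaves equals and hashCode of the element class untouched, which helps when different operations need to deduplicate on different properties.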
Google Guava has a Sets class which contains these methods and many more.
As in this answer, use the Collection methods retainAll(Collection) for intersection and addAll(Collection) for union.
Since those methods rely on equals (and, for hash-based sets, on hashCode), you also have to override equals and hashCode in your class and base the comparison on that one property.
In case it's a simple comparison, here's an example (generated by IntelliJ IDEA):
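public class Main {
    private Integer age;
    ...
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Main main = (Main) o;
        if (age != null ? !age.equals(main.age) : main.age != null) return false;
        return true;
    }
    // Keep hashCode consistent with equals: hash the same single field.
    @Override
    public int hashCode() {
        return age != null ? age.hashCode() : 0;
    }
}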
Intersection uses contains, which uses equals (and, for hash-based sets, hashCode). You should implement equals() and hashCode() on the class whose objects you want to intersect.
For a HashSet, addAll() adds the elements one by one through add(), so it relies on the same methods to determine whether an object is already in the set.
If you want to compare only by one field, your equals() and hashCode() should only use that field.
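A small sketch of this equals-based route, assuming a hypothetical Member class that is compared by age only. Union and intersection then fall out of addAll and retainAll on copies of the sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical element type: two members count as "the same" if their ages match.
class Member {
    final String name;
    final Integer age;

    Member(String name, Integer age) { this.name = name; this.age = age; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(age, ((Member) o).age);
    }

    @Override
    public int hashCode() { return Objects.hashCode(age); }
}

class AgeSetOps {
    static Set<Member> union(Set<Member> a, Set<Member> b) {
        Set<Member> result = new HashSet<>(a); // copy, so the inputs stay untouched
        result.addAll(b);
        return result;
    }

    static Set<Member> intersection(Set<Member> a, Set<Member> b) {
        Set<Member> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Member> a = new HashSet<>(Arrays.asList(new Member("Ann", 20), new Member("Bob", 30)));
        Set<Member> b = new HashSet<>(Arrays.asList(new Member("Cid", 30), new Member("Dee", 40)));
        System.out.println(union(a, b).size());        // prints 3
        System.out.println(intersection(a, b).size()); // prints 1
    }
}
```

The trade-off is that this definition of equality now applies everywhere the class is used, not just in these two operations.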

