Java Set collection - override equals method - java

Is there any way to override the equals method used by a Set datatype? I wrote a custom equals method for a class called Fee. Now I have a LinkedList of Fee and I want to ensure that there are no duplicate entries. Thus I am considering using a Set instead of a LinkedList, but the criteria for deciding if two fees are equal reside in the overridden equals method in the Fee class.
If using a LinkedList, I will have to iterate over every list item and call the overridden equals method in the Fee class with each of the remaining entries as a parameter. That sounds like a lot of processing and will add to the computational complexity.
Can I use Set with an overridden equals method? Should I?

As Jeff Foster said:
The Set.equals() method is only used to compare two sets for equality.
You can use a Set to get rid of the duplicate entries, but beware: HashSet doesn't rely on the equals() method of its elements alone to determine equality.
A HashSet is backed by an internal HashMap whose keys are the set's elements. It uses each element's hashCode() to pick a bucket and then equals() to compare against the elements already in that bucket.
One way to solve the issue is to override hashCode() in the class that you put in the Set, so that it is consistent with your equals() criteria.
For example:
class Fee {
    String name;

    @Override
    public boolean equals(Object o) {
        // Cast o to Fee first; the cast has to be parenthesized before member access.
        return (o instanceof Fee) && ((Fee) o).name.equals(this.name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

You can and should use a Set to hold an object type with an overridden equals method, but you may need to override hashCode() too. Equal objects must have equal hash codes.
For example:
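public class Fee {
    public String fi;
    public String fo;

    @Override
    public int hashCode() {
        return fi.hashCode() ^ fo.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Fee)) {
            return false;
        }
        Fee other = (Fee) obj; // the cast is required to reach the fields
        return fi.equals(other.fi) && fo.equals(other.fo);
    }
}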
(With null checks as necessary, of course.)
Sets often use hashCode() to optimize performance, and will misbehave if your hashCode method is broken. For example, HashSet uses an internal HashMap.
If you check the source code of HashMap, you'll see it depends on both the hashCode() and the equals() methods of the elements to determine equality:
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
If the hash is not generated correctly, your equals method may never get called.
To make your set faster, you should generate distinct hash codes for objects that are not equal, wherever possible.
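To tie this back to the original question, here is a minimal sketch of removing duplicates from a LinkedList by copying it into a Set. The Fee class, its single name field, and the dedupe helper are assumptions made for illustration; LinkedHashSet is used only so that the original order is kept.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class FeeDedupDemo {
    // Hypothetical Fee: two fees are equal when their names match.
    static final class Fee {
        final String name;

        Fee(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Fee && Objects.equals(name, ((Fee) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name); // built from the same field as equals()
        }
    }

    // Copies the list into a set; elements that are equal per equals()/hashCode() collapse.
    static Set<Fee> dedupe(List<Fee> fees) {
        return new LinkedHashSet<>(fees); // LinkedHashSet keeps insertion order
    }

    public static void main(String[] args) {
        List<Fee> fees = new LinkedList<>(Arrays.asList(
                new Fee("late"), new Fee("setup"), new Fee("late")));
        System.out.println(dedupe(fees).size()); // prints 2
    }
}
```

No explicit loop over the list is needed; the set performs the duplicate check on each insertion.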

Set uses the equals method of the object added to the set. The JavaDoc states
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.
The Set.equals() method is only used to compare two sets for equality. It's never used as part of adding items to or removing items from the set.

One solution would be to use a TreeSet with a Comparator.
From the documentation:
TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal.
This approach would be much faster than using a LinkedList (O(log n) per lookup instead of O(n)), but a bit slower than a HashSet (O(log n) versus O(1) on average).
It's worth noting that one side effect of using a TreeSet is that your set is sorted.
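As a rough sketch of the TreeSet approach, the uniqueness rule can live entirely in a Comparator, so the element class needs neither equals() nor hashCode(). The Fee fields (name, cents) and the comparator name are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetDedupDemo {
    // Hypothetical Fee with no equals()/hashCode() overrides at all.
    static final class Fee {
        final String name;
        final int cents;

        Fee(String name, int cents) {
            this.name = name;
            this.cents = cents;
        }
    }

    // Two fees count as duplicates when both name and amount match.
    static final Comparator<Fee> BY_NAME_THEN_AMOUNT =
            Comparator.comparing((Fee f) -> f.name).thenComparingInt(f -> f.cents);

    static Set<Fee> dedupe(List<Fee> fees) {
        Set<Fee> unique = new TreeSet<>(BY_NAME_THEN_AMOUNT);
        unique.addAll(fees); // elements for which compare() returns 0 are treated as duplicates
        return unique;
    }

    public static void main(String[] args) {
        List<Fee> fees = Arrays.asList(
                new Fee("late", 500), new Fee("late", 500), new Fee("setup", 500));
        System.out.println(dedupe(fees).size()); // prints 2
    }
}
```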

There are PredicatedList and PredicatedSet in Apache Commons Collections.

Related

Why do two different HashSets with the same data have the same HashCode?

I recently ran across a problem on leetcode which I solved with a nested hashset. This is the problem, if you're interested: https://leetcode.com/problems/group-anagrams/.
My intuition was to add all of the letters of each word into a hashset, then put that hashset into another hashset. At each iteration, I would check if the hashset already existed, and if it did, add to the existing hashset.
Oddly enough, that seems to work. Why do 2 hashsets share the same hashcode if they are different objects? Would something like if(set1.hashCode() == set2.hashCode()) doStuff() be valid code?
This is expected. HashSet extends AbstractSet. The hashCode() method in AbstractSet says:
Returns the hash code value for this set. The hash code of a set is defined to be the sum of the hash codes of the elements in the set, where the hash code of a null element is defined to be zero. This ensures that s1.equals(s2) implies that s1.hashCode()==s2.hashCode() for any two sets s1 and s2, as required by the general contract of Object.hashCode.
This implementation iterates over the set, calling the hashCode method on each element in the set, and adding up the results.
Here's the code from AbstractSet:
public int hashCode() {
int h = 0;
Iterator<E> i = iterator();
while (i.hasNext()) {
E obj = i.next();
if (obj != null)
h += obj.hashCode();
}
return h;
}
Why do 2 hashsets share the same hashcode if they are different objects?
With HashSet, the hashCode is calculated using the contents of the set. Since it's just numeric addition, the order of addition doesn't matter – just add them all up. So it makes sense that you have two sets, each containing objects which are equivalent (and thus should have matching hashCode() values), and then the sum of hashCodes within each set is the same.
Would something like if(set1.hashCode() == set2.hashCode()) doStuff() be valid code?
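Sure, it compiles and runs. Keep in mind, though, that equal hash codes do not guarantee equal sets, because different contents can add up to the same sum.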
EDIT: The best way of comparing two sets for equality is to use equals(). In the case of AbstractSet, calling set1.equals(set2) would result in individual calls to equals() at the level of the objects within the set (as well as some other checks).
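A small sketch of these points, using Integer elements because Integer.hashCode() is simply the int value. The setOf helper is made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetHashCodeDemo {
    // Hypothetical helper that builds a HashSet from the given values.
    static Set<Integer> setOf(Integer... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        Set<Integer> a = setOf(1, 2);
        Set<Integer> b = setOf(2, 1); // same contents, different instance
        Set<Integer> c = setOf(3);    // different contents, same sum of element hashes

        System.out.println(a != b && a.equals(b));        // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.hashCode() == c.hashCode()); // true, since 1 + 2 == 3
        System.out.println(a.equals(c));                  // false
    }
}
```

The third line is why a hash code comparison on its own is not a reliable equality test.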
Why do two different HashSets with the same data have the same HashCode?
Actually this is needed to fulfill another need that is specified in Java.
The equals method of Set is overridden so that a.equals(b) returns true if:
a is of type Set and b is of type Set.
both a and b have exactly the same size.
a contains all elements of b.
b contains all elements of a.
Since the default equals (which compares only the memory reference to be the same) is overridden for Set, according to java guidelines the hashCode method has to be overridden as well. So, this custom implementation of hashCode is provided in order to match with the custom implementation of equals.
In order to see why it is necessary to override hashCode method when the equals method is overridden, you can take a look at this previous answer of mine.
Why do 2 hashsets share the same hashcode if they are different objects?
Because as explained above this is needed so that Set can have the custom functionality for equals that it currently has.
If you want to just check if a and b are different instances of set you can still check this with operators == and !=.
a == b -> true means a and b point to the same instance of Set in memory
a != b -> true means a and b point to different instances of Set in memory

Is it necessary to override equals and hashCode methods in a class if I use the objects of class to insert in a TreeSet only? [duplicate]

I have a quick question about TreeSet collections and hashCode methods. I have a TreeSet and I'm adding objects to it, before I add an object, I check to see if it exists in the TreeSet using the contains method.
I have 2 distinct objects, each of which produce a distinct hashCode using my implementation of the hashCode method, example below:
public int hashCode()
{
int hash = 7;
hash = hash * 31 + anAttribute.hashCode();
hash = hash * 31 + anotherAttribute.hashCode();
hash = hash * 31 + yetAnotherAttribute.hashCode();
return hash;
}
The hashCodes for a particular run are: 76126352 and 76126353 (the objects only differ by one digit in one attribute).
The contains method is returning true for these objects, even though the hashCodes are different. Any ideas why? This is really confusing and help would really be appreciated.
TreeSet does not use hashCode at all. It uses either compareTo or the Comparator you passed to the constructor. This is used by methods like contains to find objects in the set.
So the answer to your question is that your compareTo method or your Comparator are defined so that the two objects in question are considered equal.
From the javadocs:
a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal.
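The behaviour from the question can be reproduced with a short sketch. The Item class is hypothetical: its compareTo looks at only one field while its hashCode uses both, which is one way to end up in the situation described.

```java
import java.util.TreeSet;

class TreeSetContainsDemo {
    // Hypothetical class: compareTo looks only at 'group', hashCode uses both fields.
    static final class Item implements Comparable<Item> {
        final String group;
        final int serial;

        Item(String group, int serial) {
            this.group = group;
            this.serial = serial;
        }

        @Override
        public int compareTo(Item other) {
            return group.compareTo(other.group);
        }

        @Override
        public int hashCode() {
            return 31 * group.hashCode() + serial;
        }
    }

    // Stores one item, then asks the set about another one.
    static boolean containsAfterAdding(Item stored, Item probe) {
        TreeSet<Item> set = new TreeSet<>();
        set.add(stored);
        return set.contains(probe); // decided by compareTo, not by hashCode
    }

    public static void main(String[] args) {
        Item a = new Item("A", 1);
        Item b = new Item("A", 2);
        System.out.println(a.hashCode() != b.hashCode()); // true
        System.out.println(containsAfterAdding(a, b));    // true
    }
}
```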
From the Javadoc:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
This means that the two objects, which produce different hash codes, are not equal according to equals().
You need to read Joshua Bloch's "Effective Java" chapter 3. It explains the equals contract and how to properly override equals, hashCode, and compareTo.
You don't need to check whether the object is contained, because add() basically performs the same operation (i.e. searching for the proper position) on its way to the insertion point. If the object can't be inserted (i.e., the object is already contained), add() returns false.
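A minimal sketch of relying on the return value of add() instead of calling contains() first. The addIfAbsent helper name is an assumption for this example.

```java
import java.util.Set;
import java.util.TreeSet;

class AddReturnValueDemo {
    // Returns true only when the value was not already present in the set.
    static boolean addIfAbsent(Set<String> set, String value) {
        return set.add(value); // no separate contains() call needed
    }

    public static void main(String[] args) {
        Set<String> names = new TreeSet<>();
        System.out.println(addIfAbsent(names, "x")); // true
        System.out.println(addIfAbsent(names, "x")); // false
        System.out.println(names.size());            // 1
    }
}
```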

Set that only needs equals

I'm curious, is there any Set that only requires .equals() to determine the uniqueness?
When looking at Set classes from java.util, I can only find HashSet, which needs .hashCode(), and TreeSet (or generally SortedSet), which requires a Comparator. I cannot find any class that uses only .equals().
If I have an .equals() method, isn't it sufficient to use it to determine object uniqueness, and so to have a Set implementation that only needs .equals()? Or did I miss some reason why .equals() is not sufficient to determine object uniqueness in a Set implementation?
Note that I am aware of Java practice that if we override .equals(), we should override .hashCode() as well to maintain contract defined in Object.
On its own, the equals method is perfectly sufficient to implement a set correctly, but not to implement it efficiently.
The point of a hash code or a comparator is that they provide ways to arrange objects in some ordered structure (a hash table or a tree) which allows for fast finding of objects. If you have only the equals method for comparing pairs of objects, you can't arrange the objects in any meaningful or clever order; you have only a loose jumble of objects.
For example, with only the equals method, ensuring that objects in a set are unique requires comparing each added object to every other object in the jumble. Adding n objects requires n * (n - 1) / 2 comparisons. For 5 objects that's 10 comparisons, which is fine, but for 1,000 objects that's 499,500 comparisons. It scales terribly.
Because it would not give scalable performance, no such set implementation is in the standard library.
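The comparison counts above can be checked with a small sketch that counts equals() calls while building an equals-only collection. The Counted class and the static counter are hypothetical instrumentation, and the count assumes all added objects are distinct.

```java
import java.util.ArrayList;
import java.util.List;

class EqualsOnlyCostDemo {
    static long equalsCalls = 0;

    // Hypothetical element type that counts how often equals() runs.
    static final class Counted {
        final int id;

        Counted(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Counted && ((Counted) o).id == id;
        }

        @Override
        public int hashCode() {
            return id;
        }
    }

    // Adds n distinct elements, checking each one against all earlier ones.
    static long comparisonsFor(int n) {
        equalsCalls = 0;
        List<Counted> jumble = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Counted c = new Counted(i);
            if (!jumble.contains(c)) { // linear scan using equals()
                jumble.add(c);
            }
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println(comparisonsFor(5));    // 10
        System.out.println(comparisonsFor(1000)); // 499500
    }
}
```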
If you don't care about hash table performance, this is a minimal implementation of the hashCode method which works for any class:
@Override
public int hashCode() {
    return 0; // or any other constant
}
Although it is required that equal objects have equal hash codes, it is never required for correctness that unequal objects have unequal hash codes, so returning a constant is legal. If you put these objects in a HashSet or use them as HashMap keys, they will end up in a jumble in a single hash table bucket. Performance will be bad, but it will work correctly.
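As a quick illustration that a constant hash code is slow but still correct, here is a sketch with a hypothetical Tag class whose hashCode() always returns 0.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {
    // Hypothetical element: meaningful equals(), constant hashCode().
    static final class Tag {
        final String label;

        Tag(String label) {
            this.label = label;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Tag && ((Tag) o).label.equals(label);
        }

        @Override
        public int hashCode() {
            return 0; // every Tag lands in the same bucket
        }
    }

    // Counts how many distinct tags a HashSet keeps for the given labels.
    static int uniqueCount(String... labels) {
        Set<Tag> set = new HashSet<>();
        for (String label : labels) {
            set.add(new Tag(label));
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(uniqueCount("a", "b", "a")); // prints 2
    }
}
```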
Also, for what it's worth, a minimal working Set implementation which only ever uses the equals method would be:
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;

public class ArraySet<E> extends AbstractSet<E> {
    private final ArrayList<E> list = new ArrayList<>();

    @Override
    public boolean add(E e) {
        // list.contains() calls equals() against each stored element
        if (!list.contains(e)) {
            list.add(e);
            return true;
        }
        return false;
    }

    @Override
    public Iterator<E> iterator() {
        return list.iterator();
    }

    @Override
    public int size() {
        return list.size();
    }
}
The set stores objects in an ArrayList, and uses list.contains to call equals on objects. Inherited methods from AbstractSet and AbstractCollection provide the bulk of the functionality of the Set interface; for example its remove method gets implemented via the list iterator's remove method. Each operation to add or remove an object or test an object's membership does a comparison against every object in the set, so it scales terribly, but works correctly.
Is this useful? Maybe, in certain special cases. For sets that are known to be very tiny, the performance might be fine, and if you have millions of these sets, this could save memory compared to a HashSet.
In general, though, it is better to write meaningful hash code methods and comparators, so you can have sets and maps that scale efficiently.
You should always override hashCode() when you override equals(). The contract for Object clearly specifies that two equal objects have identical hash codes, and a surprising number of data structures and algorithms depend on this behavior. It's not difficult to add a hashCode(), and if you skip it now, you'll eventually get hard-to-diagnose bugs when your objects start getting put in hash-based structures.
It would mathematically make sense to have a set that requires nothing but .equals().
But such an implementation would be so slow (linear time for every operation) that the standard collections instead expect you to supply a hint, namely a hash code or an ordering.
Anyway, if there is really no way you can write a hashCode(), just make it always return 0 and you will have a structure that is as slow as the one you hoped for!

Do I need a equals and Hashcode method if my class implements comparable in Java?

I found this comment on can StringBuffer objects be keys in TreeSet in Java?
"There are 2 identifying strategies used with Maps in Java (more-or-less).
Hashing: An input "Foo" is converted into a best-as-possible attempt to generate a number that uniquely accesses an index into an array. (Purists, please don't abuse me, I am intentionally simplifying). This index is where your value is stored. There is the likely possibility that "Foo" and "Bar" actually generate the same index value meaning they would both be mapped to the same array position. Obviously this can't work and so that's where the "equals()" method comes in; it is used to disambiguate
Comparison: By using a comparative method you don't need this extra disambiguation step because comparison NEVER produces this collision in the first place. The only key that "Foo" is equal to is "Foo". A really good idea though is if you can is to define "equals()" as compareTo() == 0; for consistency sake. Not a requirement."
my question is as follows:
if my class implements comparable, then does it mean I dont have to override equals and hashcode method for using my objects as keys in Hash collections. eg
class Person implements Comparable<Person> {
int id;
String name;
public Person(int id, String name) {
this.id=id;
this.name=name;
}
public int compareTo(Person other) {
return this.id-other.id;
}
}
Now, can I use my Person objects in Hashable collections?
The text you quoted is talking about TreeSet. A TreeSet is a tree in which each node's position is determined by comparing its value with the values already in the tree.
A Hashtable stores key/value pairs in a hash table. When using a Hashtable, you specify an object that is used as a key, and the value that you want linked to that key. The key is then hashed, and the resulting hash code is used as the index at which the value is stored within the table.
The difference between a Hashtable and a TreeSet is that a TreeSet doesn't need hashCode(); it only needs to know whether to go left or right in the tree, and for that compareTo() is enough.
In a Hashtable, compareTo() will not suffice, because it is built differently: each object gets to its cell by being hashed, not by being compared with the items already in the collection.
So the answer is no, you can't use Person in a Hashtable with just compareTo(). You must override hashCode() and equals() for that.
I also suggest you read this article on hash tables.
HashTable does use equals and hashCode. Every class has those methods. If you don't implement them, you inherit them.
Whether you need to implement them depends on whether the inherited version is suitable for your purposes. In particular, since Person has no specified superclass, it inherits the Object methods. That means a Person object is equal only to itself.
Do you need two distinct Person objects to be treated as being equal as HashTable keys?
if my class implements comparable, then does it mean I dont have to override equals and hashcode method for using my objects as keys in Hash collections. eg
No, you still need to implement equals() and hashCode(). The methods perform very different functions and cannot be replaced by compareTo().
equals() returns a boolean based on equality of the object. By default (as inherited from Object) this is identity equality and not field equality. This can be very different from the fields used to compare an object in compareTo(...), although if it makes sense for the entity, the equals() method can be:
@Override
public boolean equals(Object obj) {
    if (obj == null || obj.getClass() != getClass()) {
        return false;
    } else {
        return compareTo((Person) obj) == 0;
    }
}
hashCode() returns an integer value for the instance which is used in hash tables to calculate the bucket it should be placed in. There is no equivalent way to get this value out of compareTo(...).
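Putting the pieces together, here is one possible sketch of the question's Person class with equals() and hashCode() made consistent with compareTo(), so that it behaves in hash-based collections. Treating id as the only identifying field is an assumption taken from the question's compareTo, and Integer.compare replaces the subtraction to avoid overflow.

```java
import java.util.HashSet;
import java.util.Set;

class PersonHashDemo {
    static final class Person implements Comparable<Person> {
        final int id;
        final String name;

        Person(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public int compareTo(Person other) {
            return Integer.compare(id, other.id); // overflow-safe, unlike id - other.id
        }

        @Override
        public boolean equals(Object obj) {
            if (obj == null || obj.getClass() != getClass()) {
                return false;
            }
            return compareTo((Person) obj) == 0;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id); // uses the same field as equals()
        }
    }

    // Counts how many distinct people a HashSet keeps.
    static int distinctCount(Person... people) {
        Set<Person> set = new HashSet<>();
        for (Person p : people) {
            set.add(p);
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(
                new Person(1, "Ann"), new Person(1, "Ann"), new Person(2, "Bob"))); // prints 2
    }
}
```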
TreeSet needs Comparable (or a Comparator) to decide whether to place a value to the left or right in the tree. HashMap needs the equals() and hashCode() methods, which are available from the Object class, but you have to override them for your purpose.
If a class implements Comparable, that would suggest that instances of the class represent values of some sort; generally, when classes encapsulate values it will be possible for there to exist two distinct instances which hold the same value and should consequently be considered equivalent. Since the only way for distinct object instances to be considered equivalent is for them to override equals and hashCode, that would imply that things which implement Comparable should override equals and hashCode unless the encapsulated values upon which compareTo operates will be globally unique (implying that distinct instances should never be considered equivalent).
As a simple example, suppose a class includes a CreationRank field of type long; every time an instance is created, that member is set to a value fetched from a singleton AtomicLong, and compareTo uses that field to rank objects in the order of creation. No two distinct instances of the class will ever report the same CreationRank; consequently, the only way x.equals(y) should ever be true is if x and y refer to the same object instance, which is exactly the way the default equals and hashCode work.
BTW, having x.compareTo(y) return zero should generally imply that x.equals(y) will return true, and vice versa, but there are some cases where x.equals(y) may be false but x.compareTo(y) should nonetheless return zero. This may be the case when an object encapsulates some properties that can be ranked and others that cannot. Consider, for example, a hypothetical FutureAction type which encapsulates a DateTime and an implementation of a DoSomething interface. Such things could be ranked based upon the encapsulated date and time, but there may be no sensible way to rank two items which have the same date and time but different actions. Having equals report false while compareTo reports zero would make more sense than pretending that the clearly-non-equivalent items should be called "equal".
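A well-known case of this in the standard library is BigDecimal, where compareTo ignores scale but equals does not. The sketch below shows how that difference surfaces in the two kinds of set; the class and method names are made up for the example.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

class CompareVsEqualsDemo {
    // 1.0 and 1.00 compare as equal (compareTo == 0) but are not equals().
    static final List<BigDecimal> VALUES =
            Arrays.asList(new BigDecimal("1.0"), new BigDecimal("1.00"));

    static int hashSetSize() {
        return new HashSet<>(VALUES).size(); // uses equals()/hashCode()
    }

    static int treeSetSize() {
        return new TreeSet<>(VALUES).size(); // uses compareTo()
    }

    public static void main(String[] args) {
        System.out.println(hashSetSize()); // prints 2
        System.out.println(treeSetSize()); // prints 1
    }
}
```

So a class whose compareTo and equals disagree is legal, but the same elements then behave differently depending on which collection holds them.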

Does Hashcode equality imply reference based equality?

I read that when we override the equals() method in Java we also have to override the hashCode() method, and that (logically) equal objects should have equal hash codes. But doesn't that imply reference-based equality? Here is my code for the overridden equals() method. How should I override the hashCode() method for it?
@Override
public boolean equals(Object o)
{
    if (!(o instanceof dummy))
        return false;
    dummy p = (dummy) o;
    return (p.getName() == this.getName() && p.getId() == this.getId() && p.getPassword() == this.getPassword());
}
I am just trying to learn how it works, so there are only three fields (name, id and password), and I am only comparing two objects that I define in main(). I also need to know whether it is always necessary to override the hashCode() method along with the equals() method.
Hashcode equality does not imply anything. However, hashcode inequality should imply that equals will yield false, and any two items that are equal should always have the same hashcode.
For this reason, it is always wise to override hashcode with equals, because a number of data structures rely on it.
Even though failure to override hashCode() will only break usage of your class in HashSet, HashMap, and other hashCode dependent structures, you should still override hashCode() to maintain the contract described by Object.
The general strategy of most hashCode() implementations is to combine the hash codes of the fields used to determine equality. In your case, a reasonable hashCode() may look something like this:
public int hashCode(){
return this.getName().hashCode() ^ this.getId() ^ this.getPassword().hashCode();
}
You need to override hashCode() when you override equals(). Merely using equals() is not enough to require you to override hashCode().
In your code, you aren't actually comparing your fields' values. Use equals() instead of == to make your implementation of equals() correct.
return (p.getName().equals(this.getName()) && ...
(Note that the above code can cause null reference exceptions if getName() returns null: you may want to use a utility class as described here)
And yes hashCode() would be called when you use some hashing data structure like HashMap,HashSet
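One way to address both the == problem and the null concern mentioned above is java.util.Objects, which offers null-safe equals and hash helpers. This is only a sketch: the Account class is a hypothetical stand-in for the question's dummy class, and id is assumed to be an int.

```java
import java.util.Objects;

class NullSafeEqualsDemo {
    // Hypothetical stand-in for the question's 'dummy' class.
    static final class Account {
        final String name;
        final int id;
        final String password;

        Account(String name, int id, String password) {
            this.name = name;
            this.id = id;
            this.password = password;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Account)) {
                return false;
            }
            Account p = (Account) o;
            // Objects.equals compares by value and tolerates null on either side.
            return id == p.id
                    && Objects.equals(name, p.name)
                    && Objects.equals(password, p.password);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, id, password); // same fields as equals()
        }
    }

    public static void main(String[] args) {
        Account a = new Account(new String("bob"), 1, null);
        Account b = new Account("bob", 1, null);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```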
You must override hashCode() in every class that overrides equals(). Failure to do so will result in a violation of the general contract for Object.hashCode(), which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.
from Effective Java, by Joshua Bloch
Also See
overriding-equals-and-hashcode-in-java
hashcode-and-equals
Nice article on equals() & hashCode()
The idea with hashCode() is that it is a compact numeric summary of your object; it does not have to be unique, but equal objects must produce the same value. Data structures that hold objects use hash codes to determine where to place objects. In Java, a HashSet for example uses the hash code of an object to determine which bucket that object lies in, and then for all objects in that bucket, it uses equals() to determine whether it is a match.
If you don't override hashCode(), but do override equals(), then you will get to a point where you consider 2 objects to be equal, but Java collections don't see it the same way. This will lead to a lot of strange behaviour.
