Implementing `hashCode()` for very simple classes

I have a very simple class with only one field member (e.g. String). Is it OK to implement hashCode() to simply return fieldMember.hashCode()? Or should I manipulate the field's hash code somehow? Also, if I should manipulate it, why is that?

If fieldMember is a pretty good way to uniquely identify the object, I would say yes.

Joshua Bloch lays out how to properly override equals and hashCode in "Effective Java" Chapter 3.

Multiplying, adding, or xor-ing things will not make it more unique. Mathematically, you'd be applying constant functions to a single variable, which does not increase the number of possible values of the variable.
That sort of technique is useful for combining multiple hashcodes and still keeping the risk of collisions relatively small; it has no bearing whatever on a single hashcode.

Yeah, that's pretty standard. And if the class reflects a database row, I just return the primary key.

There are only two real requirements for hashCode: one, that equals instances have equal hash codes, and two, that hashCode runs reasonably fast. The first requirement is the most important one in practice; without it, you could put something into a collection but not find it there. The second is simply a performance issue.
If the hash code algorithm of your field meets both requirements, then it also works for your class, provided your class's equals() depends solely on whether that field is equal.
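As a minimal sketch of that delegation (the class name Tag and its field label are hypothetical, and the field is assumed to be non-null), both methods can simply forward to the single field:

```java
import java.util.HashSet;
import java.util.Set;

class TagDemo {
    public static void main(String[] args) {
        Set<Tag> tags = new HashSet<>();
        tags.add(new Tag("java"));
        // A different but equal instance is found, because equal labels hash alike.
        System.out.println(tags.contains(new Tag("java"))); // true
    }
}

// Hypothetical single-field class; the label is assumed to be non-null.
final class Tag {
    private final String label;

    Tag(String label) {
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Tag)) return false;
        return label.equals(((Tag) o).label);
    }

    // Delegates to the field: equal labels always produce equal hash codes.
    @Override
    public int hashCode() {
        return label.hashCode();
    }
}
```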

If the class of 'fieldMember' already implements hashCode() (as String does), you can simply delegate to it. If 'fieldMember' is an instance of a custom class, you must implement hashCode() correctly in that class yourself. Read the java.lang.Object API documentation as a guideline for implementing hashCode().

Ya. It is good programming practice. I normally use:
return var ^ 1;

Usually, unless you are using this object as the key for a *HashMap or an element in a *HashSet, hashCode() doesn't need to be overridden.

As someone else mentioned, you should follow the advice in Effective Java. If you override the hashCode() method, you should also be overriding the equals() method. Furthermore, the two methods should be consistent.
To simplify writing good equals() and hashCode() methods, I use EqualsBuilder and HashCodeBuilder from Apache Commons Lang
Here are examples:
public boolean equals(Object o) {
    if (this == o) {
        return true;
    }
    if (o == null || getClass() != o.getClass()) {
        return false;
    }
    User other = (User) o;
    return new EqualsBuilder()
            .append(this.getUniqueId(), other.getUniqueId())
            .isEquals();
}
public int hashCode() {
    return new HashCodeBuilder()
            .append(this.getUniqueId())
            .toHashCode();
}

Related

Should I override hashCode() of Collections?

Given that I have some class with various fields in it:
class MyClass {
    private String s;
    private MySecondClass c;
    private Collection<someInterface> coll;
    // ...
    @Override public int hashCode() {
        // ????
    }
}
and of that, I do have various objects which I'd like to store in a HashMap. For that, I need to have the hashCode() of MyClass.
I'll have to go into all fields and respective parent classes recursively to make sure they all implement hashCode() properly, because otherwise hashCode() of MyClass might not take into consideration some values. Is this right?
What do I do with that Collection? Can I always rely on its hashCode() method? Will it take into consideration all child values that might exist in my someInterface object?
I OPENED A SECOND QUESTION regarding the actual problem of uniquely IDing an object here: How do I generate an (almost) unique hash ID for objects?
Clarification:
Is there anything more or less unique in your class? The String s? Then use only that as the hash code.
MyClass hashCode() of two objects should definitely differ if any of the values in the coll of one of the objects is changed. hashCode() should only return the same value if all fields of two objects store the same values, recursively. Basically, there is some time-consuming calculation going on with a MyClass object. I want to save this time if the calculation has already been done with the exact same values some time ago. For this purpose, I'd like to look up in a HashMap whether the result is already available.
Would you be using MyClass in a HashMap as the key or as the value? If the key, you have to override both equals() and hashCode()
Thus, I'm using the hashCode OF MyClass as the key in a HashMap. The value (calculation result) will be something different, like an Integer (simplified).
What do you think equality should mean for multiple collections? Should it depend on element ordering? Should it only depend on the absolute elements that are present?
Wouldn't that depend on the kind of Collection that is stored in coll? Though I guess ordering not really important, no
The response you get from this site is gorgeous. Thank you all
@AlexWien that depends on whether that collection's items are part of the class's definition of equivalence or not.
Yes, yes they are.

I'll have to go into all fields and respective parent classes recursively to make sure they all implement hashCode() properly, because otherwise hashCode() of MyClass might not take into consideration some values. Is this right?
That's correct. It's not as onerous as it sounds because the rule of thumb is that you only need to override hashCode() if you override equals(). You don't have to worry about classes that use the default equals(); the default hashCode() will suffice for them.
Also, for your class, you only need to hash the fields that you compare in your equals() method. If one of those fields is a unique identifier, for instance, you could get away with just checking that field in equals() and hashing it in hashCode().
All of this is predicated upon you also overriding equals(). If you haven't overridden that, don't bother with hashCode() either.
What do I do with that Collection? Can I always rely on its hashCode() method? Will it take into consideration all child values that might exist in my someInterface object?
Yes, you can rely on any collection type in the Java standard library to implement hashCode() correctly. And yes, any List or Set will take into account its contents (it will mix together the items' hash codes).
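A small sketch of what that means in practice, using only standard library collections (the element values are arbitrary): equal lists report equal hash codes, list hashing is order-sensitive, and set hashing is not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollectionHashDemo {
    public static void main(String[] args) {
        List<String> a = new ArrayList<>(Arrays.asList("x", "y"));
        List<String> b = new ArrayList<>(Arrays.asList("x", "y"));
        // Equal lists must report equal hash codes.
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true

        // List hashing depends on element order; Set hashing does not.
        List<String> reversed = Arrays.asList("y", "x");
        System.out.println(a.hashCode() == reversed.hashCode()); // false
        Set<String> s1 = new HashSet<>(a);
        Set<String> s2 = new HashSet<>(reversed);
        System.out.println(s1.hashCode() == s2.hashCode()); // true
    }
}
```

Note that the collection only mixes together whatever its elements return from hashCode(). If an element class does not override hashCode(), the identity-based default is what gets mixed in.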

So you want to compute, from the contents of your object, a unique key that you can look up in a HashMap to check whether the "heavy" calculation that you don't want to do twice has already been done for a given deep combination of field values.
Using hashCode alone:
I believe hashCode is not the appropriate thing to use in the scenario you are describing.
hashCode should always be used in association with equals(). It's part of its contract, and it's an important part, because hashCode() returns an integer, and although one may try to make hashCode() as well-distributed as possible, it is not going to be unique for every possible object of the same class, except for very specific cases (It's easy for Integer, Byte and Character, for example...).
If you want to see for yourself, try generating strings of up to 4 letters (lower and upper case), and see how many of them have identical hash codes.
HashMap therefore uses both the hashCode() and equals() method when it looks for things in the hash table. There will be elements that have the same hashCode() and you can only tell if it's the same element or not by testing all of them using equals() against your class.
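A short illustration of such a collision with two-letter strings: "Aa" and "BB" are different strings that share a hash code, yet a HashMap keeps them apart because it also calls equals().

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    public static void main(String[] args) {
        // Two different strings with the same hash code.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112

        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        // Both keys land in the same bucket; equals() tells them apart.
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 1 2
    }
}
```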
Using hashCode and equals together
In this approach, you use the object itself as the key in the hash map, and give it an appropriate equals method.
To implement the equals method you need to go deeply into all your fields. All of their classes must have equals() that matches what you think of as equal for the sake of your big calculation. Special care needs to be taken when your objects implement an interface. If the calculation is based on calls to that interface, and different objects that implement the interface return the same value in those calls, then they should implement equals in a way that reflects that.
And their hashCode is supposed to match the equals - when the values are equal, the hashCode must be equal.
You then build your equals and hashCode based on all those items. You may use Objects.equals(Object, Object) and Objects.hash(Object...) to save yourself a lot of boilerplate code.
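A minimal sketch of that approach with the java.util.Objects helpers, assuming a hypothetical class Job with a name and a list of inputs (neither name comes from the question):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class JobDemo {
    public static void main(String[] args) {
        Job first = new Job("sum", Arrays.asList("a", "b"));
        Job second = new Job("sum", Arrays.asList("a", "b"));
        System.out.println(first.equals(second));                  // true
        System.out.println(first.hashCode() == second.hashCode()); // true
    }
}

// Hypothetical value class: equality covers every field, including the list contents.
final class Job {
    private final String name;
    private final List<String> inputs;

    Job(String name, List<String> inputs) {
        this.name = name;
        this.inputs = new ArrayList<>(inputs); // private copy, so later changes by the caller do not leak in
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Job)) return false;
        Job other = (Job) o;
        // Objects.equals is null-safe and delegates to each field's own equals().
        return Objects.equals(name, other.name) && Objects.equals(inputs, other.inputs);
    }

    @Override
    public int hashCode() {
        // Objects.hash combines the hash codes of exactly the fields used in equals().
        return Objects.hash(name, inputs);
    }
}
```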
But is this a good approach?
While you can cache the result of hashCode() in the object and re-use it without calculation as long as you don't mutate it, you can't do that for equals. This means that calculation of equals is going to be lengthy.
So depending on how many times the equals() method is going to be called for each object, this is going to be exacerbated.
If, for example, you are going to have 30 objects in the hashMap, but 300,000 objects are going to come along and be compared to them only to realize that they are equal to them, you'll be making 300,000 heavy comparisons.
If you're only going to have very few instances in which an object is going to have the same hashCode or fall in the same bucket in the HashMap, requiring comparison, then going the equals() way may work well.
If you decide to go this way, you'll need to remember:
If the object is a key in a HashMap, it should not be mutated as long as it's there. If you need to mutate it, you may need to make a deep copy of it and keep the copy in the hash map. Deep copying again requires consideration of all the objects and interfaces inside to see if they are copyable at all.
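The following sketch shows the effect of mutating a key, using a plain List<String> as the key for illustration; the method names are made up for this example. The second method stores an unmodifiable copy instead, which is one simple form of the copy suggested above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutatedKeyDemo {
    // Returns whether the key is still found after it was mutated while in the map.
    static boolean foundAfterMutation() {
        Map<List<String>, String> cache = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        cache.put(key, "result");
        key.add("b"); // changes key.hashCode() while the key sits in the map
        return cache.containsKey(key);
    }

    // Same scenario, but the map stores an unmodifiable snapshot of the key.
    static boolean foundWithSnapshot() {
        Map<List<String>, String> cache = new HashMap<>();
        List<String> working = new ArrayList<>();
        working.add("a");
        cache.put(Collections.unmodifiableList(new ArrayList<>(working)), "result");
        working.add("b"); // the stored snapshot is unaffected
        return cache.containsKey(Arrays.asList("a"));
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false: the entry is stored under the old hash
        System.out.println(foundWithSnapshot());  // true
    }
}
```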
Creating a unique key for each object
Back to your original idea, we have established that hashCode is not a good candidate for a key in a hash map. A better candidate for that would be a hash function such as md5 or sha1 (or more advanced hashes, like sha256, but you don't need cryptographic strength in your case), where collisions are a lot rarer than a mere int. You could take all the values in your class, transform them into a byte array, hash it with such a hash function, and take its hexadecimal string value as your map key.
Naturally, this is not a trivial calculation. So you need to think if it's really saving you much time over the calculation you are trying to avoid. It is probably going to be faster than repeatedly calling equals() to compare objects, as you do it only once per instance, with the values it had at the time of the "big calculation".
For a given instance, you could cache the result and not calculate it again unless you mutate the object. Or you could just calculate it again only just before doing the "big calculation".
However, you'll need the "cooperation" of all the objects you have inside your class. That is, they will all need to be reasonably convertible into a byte array in such a way that two equivalent objects produce the same bytes (including the same issue with the interface objects that I mentioned above).
You should also beware of situations in which you have, for example, two strings "AB" and "CD" which will give you the same result as "A" and "BCD", and then you'll end up with the same hash for two different objects.
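One way to avoid that ambiguity, sketched here under the assumption that every field can be rendered as a string, is to prefix each field with its byte length before feeding it to the digest. The helper name digestKey is hypothetical; MessageDigest and SHA-256 are part of the standard library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestKeyDemo {
    // Hypothetical helper: hashes the fields in order, each prefixed with its byte length,
    // and returns the digest as a lowercase hexadecimal string.
    static String digestKey(String... fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String field : fields) {
                byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
                md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                md.update(bytes);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected: SHA-256 is a required algorithm on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Without the length prefix, both calls would hash the same bytes "ABCD".
        System.out.println(digestKey("AB", "CD").equals(digestKey("A", "BCD")));  // false
        System.out.println(digestKey("AB", "CD").equals(digestKey("AB", "CD"))); // true
    }
}
```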

For future readers.
Yes, equals and hashCode go hand in hand.
Below shows a typical implementation using a helper library, but it really shows the "hand in hand" nature. And the helper library from apache keeps things simpler IMHO:
@Override
public boolean equals(Object o) {
    if (this == o) {
        return true;
    }
    if (o == null || getClass() != o.getClass()) {
        return false;
    }
    MyCustomObject castInput = (MyCustomObject) o;
    boolean returnValue = new org.apache.commons.lang3.builder.EqualsBuilder()
            .append(this.getPropertyOne(), castInput.getPropertyOne())
            .append(this.getPropertyTwo(), castInput.getPropertyTwo())
            .append(this.getPropertyThree(), castInput.getPropertyThree())
            .append(this.getPropertyN(), castInput.getPropertyN())
            .isEquals();
    return returnValue;
}
@Override
public int hashCode() {
    return new org.apache.commons.lang3.builder.HashCodeBuilder(17, 37)
            .append(this.getPropertyOne())
            .append(this.getPropertyTwo())
            .append(this.getPropertyThree())
            .append(this.getPropertyN())
            .toHashCode();
}
The values 17 and 37 are arbitrary; you can pick your own, as long as both are odd numbers (HashCodeBuilder rejects even values).

From your clarifications:
You want to store MyClass in a HashMap as key.
This means the hashCode() is not allowed to change after adding the object.
So if your collections may change after object instantiation, they should not be part of the hashcode().
From http://docs.oracle.com/javase/8/docs/api/java/util/Map.html
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map.
For 20-100 objects, it is not worth taking the risk of an inconsistent hashCode() or equals() implementation.
There is no need to override hashCode() and equals() in your case.
If you don't override them, Java uses the object's identity for equals() and hashCode() (and that works, especially because you stated that you don't need an equals() that considers the values of the object's fields).
When using the default implementation, you are on the safe side.
A mistake such as using a custom hashCode() for a HashMap key when the hash code changes after insertion (because you made the collections' hashCode() part of your object's hash code) may result in an extremely hard-to-find bug.
If you need to find out whether the heavy calculation is finished, I would not abuse equals(). Just write a separate method objectStateValue() that also calls hashCode() on the collection. This then does not interfere with the object's hashCode() and equals().
public int objectStateValue() {
    // TODO make sure the fields are not null;
    return 31 * s.hashCode() + coll.hashCode();
}
Another, simpler possibility: the code that does the time-consuming calculation can increment a calculationCounter as soon as the calculation is done. You then just check whether or not the counter has changed. This is much cheaper and simpler.

Overriding hashCode() when overriding equals() [duplicate]

Possible Duplicate:
In Java, why must equals() and hashCode() be consistent?
I read that one should always override hashCode() when overriding equals().
Can anyone give a practical example of why it might be wrong otherwise?
i.e. problems that might arise when overriding equals() but not hashCode().
Is it necessary to write a robust hashCode() function whenever we override equals()? Or is a trivial implementation enough?
For example,
Is a poor implementation such as the one below good enough to satisfy the contract between equals() & hashCode()?
public int hashCode() {
    return 91;
}

Both equals and hashCode are based on the principle of an object's uniqueness. If equals returns true, the hash codes of both objects must be the same; otherwise hash-based structures and algorithms could have undefined results.
Think of a hash-based structure such as a HashMap. It first uses hashCode, not equals, to locate the bucket in which to look for the key, so if equal objects return different hash codes, the key usually cannot be found. Also, a poor implementation of hashCode will create collisions (multiple objects with the same hash code, which one is the "correct" one?) that affect performance.
IMHO, overriding equals OR hashCode (instead of overriding both) should be considered a code smell or, at least, a potential source of bugs. That is, unless you're 100% sure it won't affect your code sooner or later (when are we so sure anyway?).
Note: There are various libraries that provide support for this by having equals and hashCode builders, like Apache Commons with HashCodeBuilder and EqualsBuilder.

equals() and hashCode() are used conjunctively in certain collections, such as HashSet and HashMap, so you have to make sure that if you use these collections, you override hashCode according to the contract.
If you don't override hashCode at all, then you'll have problems with HashSet and HashMap. In particular, two objects that are "equal" may be put in different hash buckets, so a lookup with one of them will not find the other.
If you do override hashCode, but do so poorly, then you'll have performance issues. In the worst case (for example, a constant hash code), all your entries for HashSet and HashMap will be put into the same bucket, and you'll lose the O(1) performance and have O(n) instead. This is because the data structure essentially becomes a linearly-checked linked list.
As for breaking programs outside of these conditions, it's not likely, but you never know when an API (especially in 3rd-party libraries) is going to depend on this contract. The contract is upheld for objects that don't implement either of them, so it's conceivable that a library may depend on this somewhere without using hash buckets.
In any case, implementing a good hashCode is easy, especially if you're using an IDE. Eclipse and NetBeans both have the ability to generate equals and hashCode for you in a way that all contracts are followed, including the symmetry rule of equals (the assertion that a.equals(b) == b.equals(a)). All you need to do is select the fields you want to be included and go.

Here's some code that illustrates a bug you can introduce by not implementing hashCode(): Set.contains() will first check the hashCode() of an object, and then check .equals(). So, if you don't implement both, .contains() will not behave in an intuitive way:
import java.util.Set;
import com.google.common.collect.Sets; // Sets.newHashSet comes from Guava

public class ContainsProblem {
    // define a class that implements equals, without implementing hashCode
    class Car {
        private String name;
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Car)) return false;
            Car car = (Car) o;
            if (name != null ? !name.equals(car.name) : car.name != null) return false;
            return true;
        }
        public String getName() {return name;}
        public Car(String name) { this.name = name;}
    }
    public static void main(String[] args) {
        ContainsProblem oc = new ContainsProblem();
        ContainsProblem.Car ford = oc.new Car("ford");
        ContainsProblem.Car chevy = oc.new Car("chevy");
        ContainsProblem.Car anotherFord = oc.new Car("ford");
        Set<Car> cars = Sets.newHashSet(ford, chevy);
        // if the set of cars contains a ford, a ford is equal to another ford, shouldn't
        // the set return the same thing for both fords? without hashCode(), it won't:
        if (cars.contains(ford) && ford.equals(anotherFord) && !cars.contains(anotherFord)) {
            System.out.println("oh noes, why don't we have a ford? isn't this a bug?");
        }
    }
}

Your trivial implementation is correct, but would kill the performance of hash-based collections.
The default implementation (provided by Object) would break the contract if two different instances of your class compared equal.
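To make the first point concrete, here is a small sketch with a hypothetical class Id whose hashCode() always returns 91, as in the question. The HashSet still gives correct answers; the cost is that every element shares one bucket, so lookups have to fall back on equals() comparisons within that bucket.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {
    // Builds a set of n distinct ids, all of which share the same hash code.
    static Set<Id> build(int n) {
        Set<Id> ids = new HashSet<>();
        for (int i = 0; i < n; i++) {
            ids.add(new Id(i));
        }
        return ids;
    }

    public static void main(String[] args) {
        Set<Id> ids = build(1000);
        System.out.println(ids.size());               // 1000
        System.out.println(ids.contains(new Id(42))); // true
    }
}

// Hypothetical class with a contract-satisfying but useless hash code.
final class Id {
    private final int value;

    Id(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Id && ((Id) o).value == value;
    }

    @Override
    public int hashCode() {
        return 91; // legal: equal objects trivially get equal hash codes
    }
}
```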

I suggest reading Joshua Bloch's "Effective Java" Chapter 3 "Methods Common to All Objects". Nobody can explain it better than him. He led the design and implementation of numerous Java platform features.

Does List.retainAll() use HashMap internally?

I am purposefully violating the hashCode contract that says that if we override equals() in our class, we must override hashCode() as well, and I am making sure that no Hash related data structures (like HashMap, HashSet, etc) are using it. The problem is that I fear methods like removeAll() and containsAll() of Lists might use HashMaps internally, and in that case, since I am not overriding hashCode() in my classes, their functionality might break.
Can anyone please confirm whether my doubt is valid? The classes contain a lot of fields that are being used for equality comparison, and I will have to come up with an efficient technique to get a hashCode using all of them. I really don't require them in any hash-related operations, and as such, I am trying to avoid implementing hashCode().

From AbstractCollection.retainAll()
* <p>This implementation iterates over this collection, checking each
* element returned by the iterator in turn to see if it's contained
* in the specified collection. If it's not so contained, it's removed
* from this collection with the iterator's <tt>remove</tt> method.
public boolean retainAll(Collection<?> c) {
    boolean modified = false;
    Iterator<E> e = iterator();
    while (e.hasNext()) {
        if (!c.contains(e.next())) {
            e.remove();
            modified = true;
        }
    }
    return modified;
}
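Since the work is delegated to c.contains(), whether hashCode() gets involved depends on the collection passed as the argument. The sketch below (Item is a hypothetical class that overrides equals() only) contrasts a List argument with a HashSet argument:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

class RetainAllDemo {
    // Starts from the items "a" and "b" and keeps only those contained in 'keep'.
    static List<Item> retained(Collection<Item> keep) {
        List<Item> items = new ArrayList<>(Arrays.asList(new Item("a"), new Item("b")));
        items.retainAll(keep);
        return items;
    }

    public static void main(String[] args) {
        List<Item> keep = Arrays.asList(new Item("a"));
        // List.contains compares with equals() only, so "a" survives.
        System.out.println(retained(keep).size()); // 1
        // HashSet.contains looks up by hashCode() first; with the identity-based default,
        // the equal item is almost certainly not found and everything is removed.
        System.out.println(retained(new HashSet<>(keep)).size());
    }
}

// Hypothetical class that overrides equals() but deliberately not hashCode().
final class Item {
    private final String name;

    Item(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Item && ((Item) o).name.equals(name);
    }
}
```

So the lists themselves do not hash anything, but a caller who passes a hash-based collection as the argument brings hashCode() back into play.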

As for
I will have to come up with an efficient technique to get a hashCode using all of them
You don't need to use all of the fields used by equals in your hashCode implementation:
It is not required that if two objects are unequal according to the equals method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Therefore, your hashCode implementation could be very simple and still obey the contract:
public int hashCode() {
    return 1;
}
This will ensure that hash-based data structures still work (albeit at degraded performance). If you add logging to your hashCode implementation, then you could even check if it is ever called.

I think a simple way to test if hashCode() is being used anywhere is to override hashCode() for your class, make it print a statement to the console (or a file if you prefer) and then return some arbitrary value (it won't matter since you said you don't want to use any hash-based classes anyway).
However, I think the best would be to just override it; I'm sure some IDEs can even do it for you (Eclipse can, for example). If you never expect it to get called, it can't hurt.

Does hashCode equality imply reference-based equality?

I read that to use the equals() method in Java we also have to override the hashCode() method, and that (logically) equal objects should have equal hash codes, but doesn't that imply reference-based equality? Here is my code for the overridden equals() method; how should I override the hashCode() method for this:
@Override
public boolean equals(Object o)
{
    if (!(o instanceof dummy))
        return false;
    dummy p = (dummy) o;
    return (p.getName() == this.getName() && p.getId() == this.getId() && p.getPassword() == this.getPassword());
}
I am just trying to learn how it works, so there are only three fields, namely name, id and password, and I am just trying to compare two objects that I define in main(), that's all! I also need to know whether it is always necessary to override the hashCode() method along with the equals() method.

Hashcode equality does not imply anything. However, hashcode inequality should imply that equals will yield false, and any two items that are equal should always have the same hashcode.
For this reason, it is always wise to override hashcode with equals, because a number of data structures rely on it.

Even though failure to override hashCode() will only break usage of your class in HashSet, HashMap, and other hashCode dependent structures, you should still override hashCode() to maintain the contract described by Object.
The general strategy of most hashCode() implementations is to combine the hash codes of the fields used to determine equality. In your case, a reasonable hashCode() may look something like this:
public int hashCode() {
    return this.getName().hashCode() ^ this.getId() ^ this.getPassword().hashCode();
}

You need to override hashCode() when you override equals(). Merely using equals() is not enough to require you to override hashCode().

In your code, you aren't actually comparing your fields' values. Use equals() instead of == to make your implementation of equals() correct.
return (p.getName().equals(this.getName()) && ...
(Note that the above code can throw a NullPointerException if getName() returns null: you may want to use a utility class as described here)
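A brief sketch of the difference, using java.util.Objects.equals as one null-safe standard library option (the sample strings are arbitrary):

```java
import java.util.Objects;

class StringCompareDemo {
    public static void main(String[] args) {
        String a = new String("secret");
        String b = new String("secret");
        System.out.println(a == b);                     // false: two distinct objects
        System.out.println(a.equals(b));                // true: same characters
        System.out.println(Objects.equals(a, b));       // true
        // Objects.equals does not throw when either argument is null.
        System.out.println(Objects.equals(null, b));    // false
        System.out.println(Objects.equals(null, null)); // true
    }
}
```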
And yes, hashCode() is called when you use a hash-based data structure such as HashMap or HashSet.
You must override hashCode() in every
class that overrides equals(). Failure
to do so will result in a violation of
the general contract for
Object.hashCode(), which will prevent
your class from functioning properly
in conjunction with all hash-based
collections, including HashMap,
HashSet, and Hashtable.
from Effective Java, by Joshua Bloch
Also See
overriding-equals-and-hashcode-in-java
hashcode-and-equals
Nice article on equals() & hashCode()

The idea with hashCode() is that it is a compact numeric representation of your object in a given space; it is not guaranteed to be unique. Data structures that hold objects use hash codes to determine where to place objects. In Java, a HashSet for example uses the hash code of an object to determine which bucket that object lies in, and then for all objects in that bucket, it uses equals() to determine whether it is a match.
If you don't override hashCode(), but do override equals(), then you will get to a point where you consider 2 objects to be equal, but Java collections don't see it the same way. This will lead to a lot of strange behaviour.

Can you explain this Java hash map key collision?

I have a HashMap that is used in the following way:
HashMap<SomeInterface, UniqueObject> m_map;
UniqueObject getUniqueObject(SomeInterface keyObject)
{
    if (m_map.containsKey(keyObject))
    {
        return m_map.get(keyObject);
    }
    else
    {
        return makeUniqueObjectFor(keyObject);
    }
}
My issue is that I'm seeing multiple objects of different classes matching the same key on m_map.containsKey(keyObject).
So here are my questions:
Is this possible? The Map interface says it uses equals() to compare if the key is not null. I haven't overridden equals() in any of my SomeInterface classes. Does this mean the equals method can be wrong?
If the above is true, how do I get HashMap to only return true on equals() if they are in fact the same object and not a copy? Is this possible by saying if (object1 == object2)? I was told early on in Java development that I should avoid doing that, but I never found out when it should be used.
Thanks in advance. :)

I strongly suspect you've misdiagnosed the issue. If you aren't overriding equals anywhere (and you're not subclassing anything else that overrides equals) then you should indeed have "identity" behaviour.
I would be shocked to hear that this was not the case, to be honest.
If you can produce a short but complete program which demonstrates the problem, that would make it easier to look into - but for the moment, I'd definitely double-check your suspicions about seeing different objects being treated as equal keys.

The default implementation of equals() is done in java.lang.Object:
public boolean equals(Object obj) {
    return (this == obj);
}
The other method, hashCode(), by default returns a value derived from the object's identity. That is, both are identity-based by default: equals() returns true only for the same object, and hashCode() is typically different for distinct objects.
This is exactly what can create duplicate entries. You can create 2 instances of your class. From your point of view they are equal because they contain identical data, but they are different objects. So, if you use these objects as map keys, you produce 2 entries. If you want to avoid this, implement equals and hashCode for your class.
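A small sketch of both situations (PlainPoint and ValuePoint are hypothetical classes made up for this illustration): without the overrides, two instances with the same data become two map entries; with them, the second put replaces the first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class DuplicateKeyDemo {
    // Keys without equals/hashCode: identical data, yet two separate entries.
    static int plainEntries() {
        Map<PlainPoint, String> map = new HashMap<>();
        map.put(new PlainPoint(1, 2), "first");
        map.put(new PlainPoint(1, 2), "second");
        return map.size();
    }

    // Keys with value-based equals/hashCode: the second put overwrites the first.
    static int valueEntries() {
        Map<ValuePoint, String> map = new HashMap<>();
        map.put(new ValuePoint(1, 2), "first");
        map.put(new ValuePoint(1, 2), "second");
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(plainEntries()); // 2
        System.out.println(valueEntries()); // 1
    }
}

// Inherits identity-based equals() and hashCode() from Object.
final class PlainPoint {
    final int x, y;
    PlainPoint(int x, int y) { this.x = x; this.y = y; }
}

// Overrides both methods consistently.
final class ValuePoint {
    final int x, y;
    ValuePoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ValuePoint)) return false;
        ValuePoint p = (ValuePoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```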
Such an implementation is sometimes very verbose. HashCodeBuilder and EqualsBuilder from Apache Commons Lang (formerly part of the Jakarta project) may help you. Here is an example:
@Override
public int hashCode() {
    return HashCodeBuilder.reflectionHashCode(this);
}
@Override
public boolean equals(Object other) {
    return EqualsBuilder.reflectionEquals(this, other);
}
@Override
public String toString() {
    return ToStringBuilder.reflectionToString(this);
}

You need to ensure that your .equals() and your .hashCode() methods are implemented for all objects that you want to store in the HashMap. To not have that invites all sorts of problems.

You must implement the equals() and hashCode() methods of the objects that you use as the keys in the HashMap.
Note that HashMap not only uses equals(), it also uses hashCode(). Your hashCode() method must be implemented correctly to match the implementation of the equals() method. If the implementations of these methods don't match, you can get unpredictable problems.
See the description of equals() and hashCode() in the API documentation of class Object for the detailed requirements.

FYI, you can have IDEs such as Eclipse generate the hashCode & equals methods for you. They'll probably do a better job than if you try to hand-code them yourself.
