Can you explain this Java hash map key collision? - java

I have a HashMap that is used in the following way:
    HashMap<SomeInterface, UniqueObject> m_map;
    UniqueObject getUniqueObject(SomeInterface keyObject)
    {
        if (m_map.containsKey(keyObject))
        {
            return m_map.get(keyObject);
        }
        else
        {
            return makeUniqueObjectFor(keyObject);
        }
    }
My issue is that I'm seeing multiple objects of different classes matching the same key on m_map.containsKey(keyObject).
So here are my questions:
Is this possible? The Map interface says it uses equals() to compare if the key is not null. I haven't overridden equals() in any of my SomeInterface classes. Does this mean the equals method can be wrong?
If the above is true, how do I get HashMap to only return true on equals() if they are in fact the same object and not a copy? Is this possible by saying if (object1 == object2)? I was told early on in Java development that I should avoid doing that, but I never found out when it should be used.
Thanks in advance. :)

I strongly suspect you've misdiagnosed the issue. If you aren't overriding equals anywhere (and you're not subclassing anything else that overrides equals) then you should indeed have "identity" behaviour.
I would be shocked to hear that this was not the case, to be honest.
If you can produce a short but complete program which demonstrates the problem, that would make it easier to look into - but for the moment, I'd definitely double-check your suspicions about seeing different objects being treated as equal keys.
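If reference identity really is the comparison you want, regardless of how any key class defines equals(), one option is java.util.IdentityHashMap, which compares keys with == instead of equals(). The sketch below is only an illustration: SomeInterface, UniqueObject and makeUniqueObjectFor are stand-ins for the names in the question, and their bodies are assumptions.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class UniqueObjectRegistry {
    // Hypothetical stand-ins for the question's types.
    interface SomeInterface {}
    static final class UniqueObject {}

    // IdentityHashMap compares keys with ==, never with equals().
    private final Map<SomeInterface, UniqueObject> map = new IdentityHashMap<>();

    UniqueObject getUniqueObject(SomeInterface keyObject) {
        // Creates and stores the value on first use, then returns the stored one.
        return map.computeIfAbsent(keyObject, k -> makeUniqueObjectFor(k));
    }

    // Assumed behaviour: simply builds a fresh object for the key.
    private UniqueObject makeUniqueObjectFor(SomeInterface keyObject) {
        return new UniqueObject();
    }

    public static void main(String[] args) {
        UniqueObjectRegistry registry = new UniqueObjectRegistry();
        SomeInterface a = new SomeInterface() {};
        SomeInterface b = new SomeInterface() {};
        System.out.println(registry.getUniqueObject(a) == registry.getUniqueObject(a)); // true
        System.out.println(registry.getUniqueObject(a) == registry.getUniqueObject(b)); // false
    }
}
```

With a plain HashMap and key classes that do not override equals(), the behaviour should be the same; IdentityHashMap only matters if some key class does override it.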

The default implementation of equals() is done in java.lang.Object:
public boolean equals(Object obj) {
return (this == obj);
}
The other method, hashCode(), by default returns a value that is typically derived from the object's identity. Both are therefore identity-based by default: equals() returns true only for the same object, and hashCode() is usually (though not guaranteed to be) different for distinct objects.
This is exactly what can create multiple entries. You can create two instances of your class. From your point of view they are equal because they contain identical data, but they are different objects. So if you use these objects as map keys, you produce two entries. If you want to avoid this, implement equals() and hashCode() for your class.
Such implementations can be very verbose. HashCodeBuilder and EqualsBuilder from Apache Commons Lang (formerly part of the Jakarta project) may help you. Here is an example:
    @Override
    public int hashCode() {
        return HashCodeBuilder.reflectionHashCode(this);
    }
    @Override
    public boolean equals(Object other) {
        return EqualsBuilder.reflectionEquals(this, other);
    }
    @Override
    public String toString() {
        return ToStringBuilder.reflectionToString(this);
    }
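The difference between the default behaviour and value-based equality can be seen in a small, self-contained sketch. PlainKey and ValueKey are hypothetical classes invented for this illustration: the first inherits equals() and hashCode() from Object, the second overrides both using java.util.Objects instead of the Commons builders.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class KeyEqualityDemo {
    // Hypothetical key that inherits identity-based equals() and hashCode() from Object.
    static final class PlainKey {
        final String name;
        PlainKey(String name) { this.name = name; }
    }

    // Hypothetical key that defines equality by its data.
    static final class ValueKey {
        final String name;
        ValueKey(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValueKey)) return false;
            return Objects.equals(name, ((ValueKey) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    static int plainEntries() {
        Map<PlainKey, String> map = new HashMap<>();
        map.put(new PlainKey("a"), "first");
        map.put(new PlainKey("a"), "second"); // different instance, so a second entry
        return map.size();
    }

    static int valueEntries() {
        Map<ValueKey, String> map = new HashMap<>();
        map.put(new ValueKey("a"), "first");
        map.put(new ValueKey("a"), "second"); // equal key, so the value is replaced
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(plainEntries()); // 2
        System.out.println(valueEntries()); // 1
    }
}
```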

You need to ensure that your .equals() and your .hashCode() methods are implemented for all objects that you want to store in the HashMap. Not having them invites all sorts of problems.

You must implement the equals() and hashCode() methods of the objects that you use as the keys in the HashMap.
Note that HashMap not only uses equals(), it also uses hashCode(). Your hashCode() method must be implemented correctly to match the implementation of the equals() method. If the implementations of these methods don't match, you can get unpredictable problems.
See the description of equals() and hashCode() in the API documentation of class Object for the detailed requirements.

FYI you can have IDEs such as Eclipse generate the hashCode & equals methods for you. They'll probably do a better job than if you try to hand-code them yourself.

Related

Overriding hashCode() when overriding equals() [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
In Java, why must equals() and hashCode() be consistent?
I read that one should always override hashCode() when overriding equals().
Can anyone give a practical example of why it might be wrong otherwise?
i.e. problems that might arise when overriding equals() but not hashCode().
Is it necessary to write a robust hashCode() function whenever we override equals()? Or is a trivial implementation enough?
For example,
is a poor implementation such as the one below good enough to satisfy the contract between equals() & hashCode()?
    public int hashCode() {
        return 91;
    }
Both equals and hashCode are based on the principle of object uniqueness. If equals returns true, the hash codes of both objects must be the same; otherwise hash-based structures and algorithms could produce undefined results.
Think of a hash-based structure such as a HashMap. hashCode, not equals, is invoked first to locate the bucket for the key, so if equal keys have different hash codes it becomes impossible in most cases to find the key. Also, a poor implementation of hashCode will create collisions (multiple objects with the same hash code, which then have to be told apart with equals) that affect performance.
IMHO, overriding equals OR hashCode (instead of overriding both) should be considered a code smell or, at least, a potential source of bugs. That is, unless you're 100% sure it won't affect your code sooner or later (when are we so sure anyway?).
Note: There are various libraries that provide support for this by having equals and hashCode builders, like Apache Commons with HashCodeBuilder and EqualsBuilder.
equals() and hashCode() are used together in certain collections, such as HashSet and HashMap, so you have to make sure that if you use these collections, you override hashCode according to the contract.
If you don't override hashCode at all, then you'll have problems with HashSet and HashMap. In particular, two objects that are "equal" may be put in different hash buckets, so a lookup with one will not find the other.
If you do override hashCode, but do so poorly, then you'll have performance issues. In the worst case (for example, a constant hash code) all your entries for HashSet and HashMap will be put into the same bucket, and you'll lose the O(1) performance and have O(n) instead. This is because the data structure essentially becomes a linearly-checked linked list.
As for breaking programs outside of these conditions, it's not likely, but you never know when an API (especially in 3rd-party libraries) is going to depend on this contract. The contract is upheld for objects that don't implement either of them, so it's conceivable that a library may depend on this somewhere without using hash buckets.
In any case, implementing a good hashCode is easy, especially if you're using an IDE. Eclipse and NetBeans both have the ability to generate equals and hashCode for you in a way that all contracts are followed, including the symmetry rule of equals (the assertion that a.equals(b) == b.equals(a)). All you need to do is select the fields you want to be included and go.
Here's some code that illustrates a bug you can introduce by not implementing hashCode(): Set.contains() will first check the hashCode() of an object, and then check .equals(). So, if you don't implement both, .contains() will not behave in an intuitive way:
    import com.google.common.collect.Sets; // Guava
    import java.util.Set;

    public class ContainsProblem {
        // define a class that implements equals, without implementing hashCode
        class Car {
            private String name;
            @Override
            public boolean equals(Object o) {
                if (this == o) return true;
                if (!(o instanceof Car)) return false;
                Car car = (Car) o;
                if (name != null ? !name.equals(car.name) : car.name != null) return false;
                return true;
            }
            public String getName() { return name; }
            public Car(String name) { this.name = name; }
        }
        public static void main(String[] args) {
            ContainsProblem oc = new ContainsProblem();
            ContainsProblem.Car ford = oc.new Car("ford");
            ContainsProblem.Car chevy = oc.new Car("chevy");
            ContainsProblem.Car anotherFord = oc.new Car("ford");
            Set<Car> cars = Sets.newHashSet(ford, chevy);
            // if the set of cars contains a ford, a ford is equal to another ford, shouldn't
            // the set return the same thing for both fords? without hashCode(), it won't:
            if (cars.contains(ford) && ford.equals(anotherFord) && !cars.contains(anotherFord)) {
                System.out.println("oh noes, why don't we have a ford? isn't this a bug?");
            }
        }
    }
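One way to make that surprising check go away is to give the key class a hashCode() that agrees with equals(). The sketch below is a variation on the example, not the original author's code: it uses a static nested Car and java.util.HashSet instead of Guava so that it is self-contained.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class ContainsFixed {
    static final class Car {
        private final String name;
        Car(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Car)) return false;
            return Objects.equals(name, ((Car) o).name);
        }

        // Derived from the same field as equals(), so equal cars share a hash code.
        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    static boolean containsEqualCar() {
        Set<Car> cars = new HashSet<>(Arrays.asList(new Car("ford"), new Car("chevy")));
        // A different instance with the same name is now found.
        return cars.contains(new Car("ford"));
    }

    public static void main(String[] args) {
        System.out.println(containsEqualCar()); // true
    }
}
```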
Your trivial implementation is correct, but would kill the performance of hash-based collections.
The default implementation (provided by Object) would break the contract if two different instances of your class compared equal.
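To make the "correct but slow" point concrete, here is a small sketch with a constant hashCode(). Id and the equalsCalls counter are hypothetical names used only for this illustration; the counter is there so you can observe how often equals() has to be consulted when every element collides.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {
    // Counts equals() calls so the cost of collisions is visible.
    static int equalsCalls = 0;

    static final class Id {
        final int value;
        Id(int value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Id && ((Id) o).value == value;
        }

        // Satisfies the contract, but every instance lands in the same bucket.
        @Override
        public int hashCode() {
            return 91;
        }
    }

    // Fills a set with n distinct ids and checks that lookups are still correct.
    static boolean lookupsStillWork(int n) {
        Set<Id> ids = new HashSet<>();
        for (int i = 0; i < n; i++) {
            ids.add(new Id(i));
        }
        return ids.size() == n && ids.contains(new Id(n - 1)) && !ids.contains(new Id(n));
    }

    public static void main(String[] args) {
        System.out.println(lookupsStillWork(200)); // true
        System.out.println("equals() calls: " + equalsCalls);
    }
}
```

The exact number of equals() calls depends on the HashMap implementation in your JDK, so it is printed rather than stated here.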
I suggest reading Chapter 3, "Methods Common to All Objects", of Joshua Bloch's "Effective Java". Nobody can explain it better than him. He led the design and implementation of numerous Java platform features.

Does singleton mean hashCode always returns the same value?

I have two objects, o1 and o2 from the same class.
If o1.hashCode() == o2.hashCode(), can I tell they are the same object?
Besides o1 == o2, is there any other way to tell that it is a singleton?
If you have a single instance of the class, the == and the equals comparison will always return true.
However, the hashcode can be equal for different objects, so an equality is not guaranteed just by having equal hashcodes.
Here is a nice explanation of the hashcode and equals contracts.
Checking the equality is not sufficient to be sure that you have a singleton, only that the instances are considered equal.
If you want to have a single instance of a java class, it may be better to make use of static members and methods.
Here, several approaches to singletons are demonstrated.
EDIT: as emory pointed out - you could in fact override equals to return something random and thus violate the required reflexivity (x.equals(x) == true). As you cannot override operators in java, == is the only reliable way to determine identical objects.
No, different objects can have the same hashCode():
"hypoplankton".hashCode()
"unheavenly" .hashCode()
both return the same 427589249 hash value, while they are clearly not equal.
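A shorter pair that is easy to verify by hand is "Aa" and "BB". The sketch below relies only on the documented String.hashCode() formula; the class and method names are made up for the illustration.

```java
class HashCollisionDemo {
    // True when two strings share a hash code without being equal.
    static boolean sameHashButNotEqual(String a, String b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        // String.hashCode() is specified as s[0]*31^(n-1) + ... + s[n-1],
        // so 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        System.out.println(sameHashButNotEqual("Aa", "BB")); // true
    }
}
```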
Your question (from the title) seems to be "will hashCode() always return the same value for the same object"... the answer is no.
The implementation is free to return anything, although to honor the contract it should return the same value for the same object. For example, this compiles, although it is a poor implementation that violates the consistency requirement:
    @Override
    public int hashCode() {
        return (int) (Math.random() * Integer.MAX_VALUE);
    }
The general contract for hashCode is described below (Effective Java, page 92). The 3rd item means that the hashCode() results do not need to be unique when called on unequal objects.
1. Within a single execution of a program, the result of hashCode() must not change, as long as the information used in equals() comparisons is not modified
2. If equals() returns true when called with two objects, calling hashCode() on each of those objects must return the same result
3. If equals() returns false when called with two objects, calling hashCode() on each of those objects does not have to return a different result

Does List.retainAll() use HashMap internally?

I am purposefully violating the hashCode contract that says that if we override equals() in our class, we must override hashCode() as well, and I am making sure that no Hash related data structures (like HashMap, HashSet, etc) are using it. The problem is that I fear methods like removeAll() and containsAll() of Lists might use HashMaps internally, and in that case, since I am not overriding hashCode() in my classes, their functionality might break.
Can anyone please confirm whether my doubt is valid? The classes contain a lot of fields that are being used for equality comparison, and I would have to come up with an efficient technique to get a hashCode using all of them. I really don't need them in any hash-related operations, and as such, I am trying to avoid implementing hashCode().
From AbstractCollection.retainAll()
* <p>This implementation iterates over this collection, checking each
* element returned by the iterator in turn to see if it's contained
* in the specified collection. If it's not so contained, it's removed
* from this collection with the iterator's <tt>remove</tt> method.
    public boolean retainAll(Collection<?> c) {
        boolean modified = false;
        Iterator<E> e = iterator();
        while (e.hasNext()) {
            if (!c.contains(e.next())) {
                e.remove();
                modified = true;
            }
        }
        return modified;
    }
As for
I will have to come up with an efficient technique to get a hashCode using all of them
You don't need to use all of the fields used by equals in your hashCode implementation:
It is not required that if two objects are unequal according to the equals method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Therefore, your hashCode implementation could be very simple and still obey the contract:
    public int hashCode() {
        return 1;
    }
This will ensure that hash-based data structures still work (albeit at degraded performance). If you add logging to your hashCode implementation, then you could even check if it is ever called.
I think a simple way to test if hashCode() is being used anywhere is to override hashCode() for your class, make it print a statement to the console (or a file if you prefer) and then return some random value (won't matter since you said you don't want to use any hash-based classes anyway).
However, I think the best approach would be to just override it; I'm sure some IDEs can even do it for you (Eclipse can, for example). If you never expect it to get called, it can't hurt.
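The probing idea can be sketched as follows. Because retainAll() delegates to contains() on the argument collection, whether hashCode() gets called depends on what kind of collection you pass in. Item and the hashCodeCalls counter are hypothetical names for this illustration, and the counter replaces the console logging suggested above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

class RetainAllProbe {
    static int hashCodeCalls = 0;

    // Hypothetical class: equality is by id, but hashCode() keeps the identity default.
    static final class Item {
        final String id;
        Item(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            hashCodeCalls++;          // probe only
            return super.hashCode();  // still the default from Object
        }
    }

    // Returns how many hashCode() calls a single retainAll() triggered.
    static int callsDuringRetainAll(Collection<Item> argument) {
        List<Item> list = new ArrayList<>(Arrays.asList(new Item("a"), new Item("b")));
        hashCodeCalls = 0;
        list.retainAll(argument);
        return hashCodeCalls;
    }

    public static void main(String[] args) {
        List<Item> other = Arrays.asList(new Item("a"));
        // ArrayList.contains() compares with equals() only.
        System.out.println(callsDuringRetainAll(new ArrayList<>(other))); // 0
        // HashSet.contains() has to hash the element first.
        System.out.println(callsDuringRetainAll(new HashSet<>(other)) > 0); // true
    }
}
```

So a List is not the risk by itself; the risk is any caller handing one of your lists a hash-based collection as the argument.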

Does hashCode equality imply reference-based equality?

I read that to use the equals() method in Java we also have to override the hashCode() method, and that (logically) equal objects should have equal hash codes. But doesn't that imply reference-based equality? Here is my code for the overridden equals() method; how should I override the hashCode() method for it?
    @Override
    public boolean equals(Object o)
    {
        if (!(o instanceof dummy))
            return false;
        dummy p = (dummy) o;
        return (p.getName() == this.getName() && p.getId() == this.getId() && p.getPassword() == this.getPassword());
    }
I am just trying to learn how it works, so there are only three fields, namely name, id and password, and I am just trying to compare two objects that I define in main(), that's all. I also need to know whether it is always necessary to override the hashCode() method along with the equals() method.
Hashcode equality does not imply anything. However, hashcode inequality should imply that equals will yield false, and any two items that are equal should always have the same hashcode.
For this reason, it is always wise to override hashcode with equals, because a number of data structures rely on it.
Even though failure to override hashCode() will only break usage of your class in HashSet, HashMap, and other hashCode dependent structures, you should still override hashCode() to maintain the contract described by Object.
The general strategy of most hashCode() implementations is to combine the hash codes of the fields used to determine equality. In your case, a reasonable hashCode() may look something like this:
    public int hashCode() {
        return this.getName().hashCode() ^ this.getId() ^ this.getPassword().hashCode();
    }
You need to override hashCode() when you override equals(). Merely using equals() is not enough to require you to override hashCode().
In your code, you aren't actually comparing your fields' values. Use equals() instead of == to make your implementation of equals() correct.
return (p.getName().equals(this.getName()) && ...
(Note that the above code can cause null reference exceptions if getName() returns null: you may want to use a utility class as described here)
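A null-safe version of both methods might look like the sketch below. It assumes that name and password are Strings and id is an int, which the question does not state, and it renames the class to Dummy to follow Java naming conventions. Objects.equals and Objects.hash are standard helpers in java.util.Objects (Java 7 and later).

```java
import java.util.Objects;

final class Dummy {
    // Field types are assumptions; the question only shows the getters.
    private final String name;
    private final int id;
    private final String password;

    Dummy(String name, int id, String password) {
        this.name = name;
        this.id = id;
        this.password = password;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Dummy)) return false;
        Dummy p = (Dummy) o;
        // Objects.equals compares values and tolerates nulls on either side.
        return id == p.id
                && Objects.equals(name, p.name)
                && Objects.equals(password, p.password);
    }

    @Override
    public int hashCode() {
        // Built from the same three fields that equals() compares.
        return Objects.hash(name, id, password);
    }

    public static void main(String[] args) {
        Dummy a = new Dummy(new String("bob"), 1, null);
        Dummy b = new Dummy(new String("bob"), 1, null);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```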
And yes, hashCode() would be called when you use a hashing data structure such as HashMap or HashSet.
You must override hashCode() in every class that overrides equals(). Failure to do so will result in a violation of the general contract for Object.hashCode(), which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.
from Effective Java, by Joshua Bloch
Also See
overriding-equals-and-hashcode-in-java
hashcode-and-equals
Nice article on equals() & hashCode()
The idea with hashCode() is that it is a compact (though not necessarily unique) representation of your object in a given space. Data structures that hold objects use hash codes to determine where to place objects. In Java, a HashSet for example uses the hash code of an object to determine which bucket that object lies in, and then for all objects in that bucket, it uses equals() to determine whether it is a match.
If you don't override hashCode(), but do override equals(), then you will get to a point where you consider 2 objects to be equal, but Java collections don't see it the same way. This will lead to a lot of strange behaviour.
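The bucket-then-equals lookup can be sketched with a toy hash set. This is a simplified illustration, not how java.util.HashSet is actually implemented (the real class resizes, spreads the hash bits, and handles null); ToyHashSet and its methods are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class ToyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Step 1: the hash code alone decides which bucket to look in.
    private List<E> bucketFor(Object o) {
        int index = Math.floorMod(o.hashCode(), buckets.size());
        return buckets.get(index);
    }

    // Adds the element unless an equal one is already present.
    boolean add(E element) {
        if (contains(element)) return false;
        bucketFor(element).add(element);
        return true;
    }

    // Step 2: equals() is only consulted for elements in that one bucket.
    boolean contains(Object o) {
        for (E candidate : bucketFor(o)) {
            if (candidate.equals(o)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ToyHashSet<String> set = new ToyHashSet<>(16);
        set.add("ford");
        System.out.println(set.contains(new String("ford"))); // true
        System.out.println(set.contains("chevy"));            // false
    }
}
```

If two equal objects returned different hash codes, step 1 could send the lookup to a different bucket and step 2 would never see the stored element.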

Implementing `hashCode()` for very simple classes

I have a very simple class with only one field member (e.g. String). Is it OK to implement hashCode() to simply return fieldMember.hashCode()? Or should I manipulate the field's hash code somehow? Also, if I should manipulate it, why is that?
If fieldMember is a pretty good way to uniquely identify the object, I would say yes.
Joshua Bloch lays out how to properly override equals and hashCode in "Effective Java" Chapter 3.
Multiplying, adding, or xor-ing things will not make it more unique. Mathematically, you'd be applying constant functions to a single variable, which does not increase the number of possible values of the variable.
That sort of technique is useful for combining multiple hashcodes and still keeping the risk of collisions relatively small; it has no bearing whatever on a single hashcode.
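As a sketch of that distinction: a one-field class can delegate directly to its field, while a multi-field class needs some way to combine hash codes. Label and Point are hypothetical classes for this illustration; the multiplier 31 is the conventional choice that String.hashCode() and Arrays.hashCode() also use.

```java
import java.util.Objects;

class HashCombiningDemo {
    // One field: delegating to the field's hash code is enough.
    static final class Label {
        final String text;
        Label(String text) { this.text = text; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Label && Objects.equals(text, ((Label) o).text);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(text); // text.hashCode(), or 0 when text is null
        }
    }

    // Two fields: combine them so that (1, 2) and (2, 1) differ.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y; // order-sensitive, unlike x ^ y
        }
    }

    public static void main(String[] args) {
        System.out.println(new Label("a").hashCode() == "a".hashCode()); // true
        System.out.println(new Point(1, 2).hashCode()); // 33
        System.out.println(new Point(2, 1).hashCode()); // 63
    }
}
```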
Yeah, that's pretty standard. And if the class reflects a database row, I just return the primary key.
There are only two real requirements for hashCode: one, that equal instances have equal hash codes, and two, that hashCode runs reasonably fast. The first requirement is the most important one in practice; without it, you could put something into a collection but not find it there. The second is simply a performance issue.
If the hash code algorithm of your field meets the above, then it also works for your class, provided that your class's equals() depends solely on whether those fields are equal.
If the class of the 'fieldMember' variable already implements 'hashCode', then you can call it directly from your containing class. If 'fieldMember' is an instance of a custom class, then you must implement 'hashCode' correctly for that class yourself. Read the java.lang.Object API documentation as a guideline for implementing 'hashCode'.
Ya. It is good programming practice. I normally use:
return var ^ 1;
Usually, unless you are using this object as the key for a *HashMap or an element in a *HashSet, hashCode() doesn't need to be overridden.
As someone else mentioned, you should follow the advice in Effective Java. If you override the hashCode() method, you should also be overriding the equals() method. Furthermore, the two methods should be consistent.
To simplify writing good equals() and hashCode() methods, I use EqualsBuilder and HashCodeBuilder from Apache Commons Lang.
Here are examples:
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        User other = (User) o;
        return new EqualsBuilder()
            .append(this.getUniqueId(), other.getUniqueId())
            .isEquals();
    }
    public int hashCode() {
        return new HashCodeBuilder()
            .append(this.getUniqueId())
            .toHashCode();
    }
