Hashset allows duplicates? - java

This question surely isn't a new one, but I didn't find any helpful answer anywhere.
As you can see in the code below, the equals and hashcode methods are overriden, but it still allows duplicates. The Hashcode has been generated automatically by Netbeans.
#Override
public boolean equals(Object o)
{
TaskDetails other = (TaskDetails) o;
if ( (id_subtask == other.id_subtask)
&& ((date.compareTo(other.date)) == 0) )
{
System.err.println("Duplicate Entry"+id_subtask+" + "+other.id_subtask);
return true;
}
else
{
System.out.println("Good!" +id_subtask+" + "+other.id_subtask);
return false;
}
}
#Override
public int hashCode() {
int hash = 7;
hash = 71 * hash + this.id_subtask;
hash = 71 * hash + this.id_team_member;
hash = 71 * hash + Float.floatToIntBits(this.nb_hours);
hash = 71 * hash + (this.date != null ? this.date.hashCode() : 0);
hash = 71 * hash + (this.comment != null ? this.comment.hashCode() : 0);
hash = 71 * hash + (this.subtask_name != null ? this.subtask_name.hashCode() : 0);
System.out.println("Hash : "+hash + "Subtask : " + id_subtask);
return hash;
}
This the code used to add an entry into the hashset :
TaskDetails newTaskDetails = new TaskDetails
(
s.getId_subtask(),
mus.teamMember.getId_team_member(),
f,
mysqlFormat.format(caldate),
c.substring(0, Math.min(c.length(), 100)),
s.getName_subtask()
);
allTasks.add(newTaskDetails);
(allTasks being the Hashset)
This code is used in function A and B.
If only function A is executed, it works fine. If function B is executed after function A (so the code above is executed twice), then the hashset suddenly accepts duplicates, even though system.err is triggered saying there is a duplicate entry?
Is there a flaw in the code, or am I just missing something?
Thanks for the help!

you are using 2 fields to consider 2 objects to be "equal", but you are using more than 2 fields to construct the hashcode. your hashCode() method cannot be more specific than your equals() method. as a good rule of thumb, your hashCode() method should not use any fields that your equals() method does not use (it can use fewer however). to put it more technically, if 2 objects are "equal" they must have the same hashcode (the reverse is not required).

You are violating the consistency requirement between hashCode() and equals(). If two objects are equal according to equals(), they must also have the same hash. Because your equals only considers two fields, and hashCode considers more, this requirement is not met.

Your problem is that the implementation of hashCode() does not match equals(). Both methods must use the same attributes of your object.
It's likely in your implementation that the hashCode() is different even if equals() evaluates to true. In this case (different hashCodes) the objects are different for the HashMap.
Please correct your implementations to use the same attributes. Then the error should vanish.

From the javadoc of Object
If two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of the two objects must produce the
same integer result.
Your hashcodes are different for two objects that are equal according to the equals(Object) method, so the other code HashSet is going to make the wrong assumptions, and return the wrong results.
Some code is written in a manner that they depend on other objects honoring "contracts". Your class doesn't honor the Object contract, so nothing in collections can be assumed to work, as collections requires that the Object contracts not be broken.

This is a duplicate question, see my previous answer.
The behaviour where a java.util.HashSet allows duplicates is caused when the hash code of the objects in the java.util.HashSet can change.
This typically happens when an object's hash code is constructed from mutable fields.

Related

Garbage Collection for String Class in Java

In this code I have declared a Initialized a String variable and then printed its hashcode, then reinitialized it to another value and then invoked the Garbage Collector to clear the dereferenced objects.
But when I reinitialize the String variable to its original value and print the hashcode, the same hashcode is getting printed. How?
public class TestGarbage1 {
public static void main(String args[]) {
String m = "JAVA";
System.out.println(m.hashCode());
m = "java";
System.gc();
System.out.println(m.hashCode());
m = "JAVA";
System.out.println(m.hashCode());
}
}
Hash code relates to object equality, not identity.
a.equals(b) implies a.hashCode() == b.hashCode()
(Provided the two methods have been implemented consistently)
Even if a gc were actually taking place here (and you weren't simply referencing strings in the constant pool), you wouldn't expect two string instances with the same sequence of chars not to be equal - hence, their hash codes will also be the same.
String a = new String("whatever");
String b = new String(a);
System.out.println(a == b); // false, they are not the same instance
System.out.println(a.equals(b)); // true, they represent the same string
System.out.println(a.hashCode() == b.hashCode()); // true, they represent the same string
I think you are misunderstanding something about how hashcodes work. Without going in to too much detail, in Java, hashcodes are used for many things. One example is used to find an item in a Hash datastructure like HashMap or HashSet.
A hash of the same value should always return the same hash. In this case, a hash of "JAVA" should never change because then it will break the agreement set forth in Java.
I think it's too complicated to go about how hashcodes for String are calculated. You can read more about it here. I can give you an example though.
Let's say you have a class Fruit and it has fields like shape, color and weight.
You must implement equals AND hashcode for this class. It is very important to do both because otherwise you are breaking the way Hashmap work. Let's say you make this for your hashCode() method.
#Override
public int hashCode() {
int hash = 1;
hash = hash * 17 + this.color;
hash = hash * 31 + this.shape.hashCode();
hash = hash * 31 + this.weight;
return hash;
}
This will generate the same hash value EVERY TIME for the two Fruit instances that are equal. That is exactly what you would want.
Really quick, how would this be actually used in a HashMap? Let's say you want to see if you have foo = new Fruit(); HashMap first calculates foo.hashCode(). It checks to see if there is anything in the bucket for that hashCode. If there is then it will use the equals() method until it returns true. It must do this because there might be hashcode collisions. And that's why it is important why equals and hashCode should be implemented together.

Combining Hash of String and Hash of Long

I have the following java class:
public class Person{
String name; //a unique name
Long DoB; //a unique time
.
.
.
#Override
public int hashCode(){
return name.hashCode() + DoB.hashCode();
}
}
Is my hashCode method correct (i.e. would it return a unique number of all combinations.
I have a feeling I'm missing something here.
You could let java.util.Arrays do it for you:
return Arrays.hashCode(new Object[]{ name, DoB });
You might also want to use something more fluent and more NPE-bulletproof like Google Guava:
#Override
public int hashCode(){
return Objects.hashCode(name, DoB);
}
#Override
public boolean equals(Object o) {
if ( this == o ) {
return true;
}
if ( o == null || o.getClass() != Person.class ) {
return false;
}
final Person that = (Person) o;
return Objects.equal(name, that.name) && Objects.equal(DoB, that.DoB);
}
Edit:
IntelliJ IDEA and Eclipse can generate more efficient hashCode() and equals().
Aside for the obvious, which is, you might want to implement the equals method as well...
Summing two hash codes has the very small risk of overflowing int
The sum itself seems like a bit of a weak methodology to provide unique hash codes. I would instead try some bitwise manipulation and use a seed.
See Bloch's Effective Java #9.
But you should start with an initial value (so that subsequent zero values are significant), and combine the fields that apply to the result along with a multiplier so that order is significant (so that similar classes will have much different hashes.)
Also, you will have to treat things like long fields and Strings a little different. e.g., for longs:
(int) (field ^ (field>>>32))
So, this means something like:
#Override public int hashCode() {
int result = 17;
result += name.hashCode() == null ? 0 : name.hashCode();
result = 31 * result + (int) (DoB ^ (DoB >>> 32));
return result;
}
31 is slightly magic, but odd primes can make it easier for the compiler to optimize the math to shift-subtraction. (Or you can do the shift-subtraction yourself, but why not let the compiler do it.)
usually a hashcode is build like so:
#Override
public int hashCode(){
return name.hashCode() ^ DoB.hashCode();
}
but the important thing to remember when doing a hashcode method is the use of it. the use of hashcode method is to put different object in different buckets in a hashtable or other collection using hashcode. as such, it's impotent to have a method that gives different answers to different objects at a low run time but doesn't have to be different for every item, though it's better that way.
This hash is used by other code when storing or manipulating the
instance – the values are intended to be evenly distributed for varied
inputs in order to use in clustering. This property is important to
the performance of hash tables and other data structures that store
objects in groups ("buckets") based on their computed hash values
and
The general contract for overridden implementations of this method is
that they behave in a way consistent with the same object's equals()
method: that a given object must consistently report the same hash
value (unless it is changed so that the new version is no longer
considered "equal" to the old), and that two objects which equals()
says are equal must report the same hash value.
Your hash code implementation is fine and correct. It could be better if you follow any of the suggestions other people have made, but it satisfies the contract for hashCode, and collisions aren't particularly likely, though they could be made less likely.

default implementation of hashcode returns different values for objects constructed the same way

Here I am writing one sample code:
public class Test {
private int i;
private int j;
public Test() {
// TODO Auto-generated constructor stub
}
public Test(int i, int j)
{
this.i=i;
this.j=j;
}
}
now I am creating two objects as bellow:
Test t1= new Test(4,5);
Test t2 = new Test(4,5);
But when i am printing t1.hashcode() and t2.hashcode() they are giving different values.
But as per java's general contact they should return same value.
In fact, when i am doing same thing with String or Integer they are returning same hashcode(). Can anyone please explain why hashcode is different for t1 and t2 object?
But as per java's general contact they should return same value.
Java's equals-hashCode contract requires that if two objects are equal by Object.equals, they must have the same hashcode from Object.hashCode. But the default implementation of Object.equals is reference equality, and therefore two instances are the same if and only if they are the same instance.
Therefore, in particular, your two instances t1 and t2 are in fact not equal because you have not overridden Object.equals. They are not equal as references, and therefore not equal per Object.equals, and therefore it is acceptable for hashCode to possibly return different values. In fact, the contract explicitly says the following:
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.
Thus, we do not have a violation of the equals-hashCode contract here.
So, for your objects, if you want different instances to be equal per a logical definition of equality, you need to override Object.equals:
#Override
public boolean equals(Object obj) {
if (obj == null) {
return false;
if (this == obj) {
return true;
}
if (!(obj instanceof Test)) {
return false;
}
Test other = (Test)obj;
return this.i == other.i && this.j == other.j;
}
And the equals-hashCode contract requires that you override Object.hashCode too or you'll run into some nasty bugs:
#Override
public int hashCode() {
int hash = 17;
hash = 31 * hash + this.i;
hash = 31 * hash + this.j;
return hash;
}
What does the contract say:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
Let's see if we have satisfied this requirement here. If x and y are instances of Test and satisfy x.equals(y) is true, we have that x.i == y.i and x.j == y.j. Then, clearly, if we invoke x.hashCode() and y.hashCode() we have the invariant that at each line of execution in Test.hashCode we will have hash holding the same value. Clearly this is true on the first line since hash will be 17 in both cases. It will hold on the second line since this.i will return the same value whether this == x or this == y because x.i equals y.i. Finally, on the penultimate line, we will still have hash being equal across both invocations because x.j equals y.j is true as well.
Note that there is one last piece of the contract that we haven't discussed yet. This is the requirement that hashCode return a consistent value during a single execution of a Java application:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
The necessity of this is obvious. If you change the return value from hashCode during a single execution of the same application, you could lose your objects in hashtable-like data structures that use hashCode to keep track of objects. In particular, this is why mutating objects that are keys in hashtable-like data structures is pure evil; don't do it. I would go so far as to argue that they should be immutable objects.
In fact, when i am doing same thing with String or Integer they are returning same hashcode().
They've both overridden Object.equals and Object.hashCode.
You have not overridden the equals method in your class so the default one will be used that belongs to Object class.
Object class methods simply checks for the references whether they are referring to the same object or not.
Test t1 = new Test(4,5);
Test t2 = new Test(4,5);
are two different objects, if you don't override the equals method here, they will be equal if and only if you do
Test t2 = t1;
As you are creating two different objects here, hashcode which are NOT equal because they don't refer to the same object, hashcodes must be differnt
Remember
If two objects are equal, then their hashcode MUST be equal
But if hashcodes are equal, then its not necessary that objects should be equal
This is because of the default implementation of equals and hashCode in Java.
The JVM has no way of knowing how you decide that two objects are the same. What it does is use memory references. So, by default, the equals and hashCode methods compare memory references; i.e. two different objects are never .equals.
If you want to override this behaviour (and it is recommended you do so if you wish to use Collections for example) then all you need to do is implement your own equals and hashCode methods.
The problem is that t1 and t2 are not the same object they are different object. All objects created with new are different objects. And the default hashCode() implementation usually returns different hash codes for different objects. See Object.hashCode API.

Creating a hashCode() Method - Java

I'm having some trouble writing a hashCode() method for a class I created. This class is meant to be used inside a TreeSet, and as such, it implements Comparable. The class has the following variables:
public class Node implements Comparable<Node> {
Matrix matrix;
int[] coordinates= new int[2];
Node father;
int depth;
int cost;
Here's the implementation of the compareTo() method. I want the TreeSet to organize these Node structures by their cost, therefore, compareTo() returns the result of a simple subtraction.
public int compareTo(Node nodeToCompare) {
return this.cost - nodeToCompare.cost;
}
I also implemented an equals() method.
public boolean equals(Object objectToCompare) {
if(objectToCompare== this) {return true;}
if(objectToCompare== null || objectToCompare.getClass()!= this.getClass()) {return false;}
Node objectNode= (Node) objectToCompare;
return this.father.equals(objectNode.father) &&
this.depth== objectNode.depth &&
this.cost== objectNode.cost &&
this.matrix.equals(objectNode.matrix) &&
Arrays.equals(this.coordinates, objectNode.coordinates);
}
Having said all of that, I have a few questions:
Since I implemented a new equals() method, should I implement a new hashCode() method?
How can I go about implementing a new hashCode method() with those variables? (Note that the variable matrix of the type Matrix has a hashCode() method implemented)
That's all!
Your compareTo method is not consistent with your equals method: your compareTo method says that two instances are equivalent if they have the same cost — such that a TreeSet can only ever contain at most one instance with a given cost — but your equals method says that they're only equivalent if they have the same cost and are the same in various other ways.
So, assuming that your equals method is correct:
you need to fix your compareTo method to be consistent with it.
you need to create a hashCode method that is consistent with it. I recommend using the same sort of logic as is used by java.util.List.hashCode(), which is a straightforward and effective way to assemble the hash-codes of component objects in a specific order; basically you would write something like: int hashCode = 1;
hashCode = 31 * hashCode + (father == null ? 0 : father.hashCode());
hashCode = 31 * hashCode + depth;
hashCode = 31 * hashCode + cost;
hashCode = 31 * hashCode + matrix.hashCode();
hashCode = 31 * hashCode + java.util.Arrays.hashCode(coordinates);
return hashCode;
Intellij IDEA can do this as a ' right-click' feature. Just seeing it done correctly will teach you alot.
And you should override both in any case.
The contract for the hashCode method states that if two objects are equal, then calling hashCode() should give you the same integer result. The opposite does not have to be true, i.e. if two hashCodes are the same the objects don't have to equal each other.
Looking at your equals method (which needs variable translation btw), you can add the hashCodes of all the internal member variables that need to be equals for your equals method to give true. e.g.
public int hashCode() {
return this.matrix.hashCode() +
this.coordinates[0] +
this.coordinates[1] +
this.father.hashCode() +
this.depth + this.cost;
}
The above assumes that matrix and father are never nulls, you need to make sure that you check for nulls if that's not the case.
If you feel more adventurous you can multiply a few of the above with a prime to ensure you don't get hashCode collisions for different data (this will help improve performance if you are using your class in hashTables and hashMaps). If you need to cater for nulls, the above method can be written a bit better like this:
public int hashCode() {
return ((this.matrix == null) ? 0 : this.matrix.hashCode()) +
17 * this.coordinates[0] +
this.coordinates[1] +
((this.father == null) ? 0 : this.father.hashCode()) +
31 * this.depth + 19 * this.cost;
}
If your collection is small you can return constant from hashCode method. It use for quick finding. hashCodes is like the boxes, which keep elements. Rules are:
Equal elements must be in same box (have same hashCode) - surely;
Not equal elements can be either in same or in different boxes.
Then you return constant, you obey these 2 rules, but it can significantly decrease perfomance on not small lists (because JVM will look for in all elements, and not in elements in the same box only). But return constant is the bad approach.
PS: Sorry for my writing. English is not my native language.
PPS: usualy you have to implement hashCode method in the same way as equals (use same elements)

In Java, why must equals() and hashCode() be consistent?

If I override either method on a class, it must make sure that if A.equals(B) == true then A.hashCode() == B.hashCode must also be true.
Can someone show me a simple example where if this is violated, it'll cause a problem? I think it has something to do with if you use that class as the type of keys to Hashmap?
Sure:
public class Test {
private final int m, n;
public Test(int m, int n) {
this.m = m;
this.n = n;
}
public int hashCode() { return n * m; }
public boolean equals(Object ob) {
if (ob.getClass() != Test.class) return false;
Test other = (Test)ob;
return m == other.m;
}
}
with:
Set<Test> set = new HashSet<Test>();
set.put(new Test(3,4));
boolean b = set.contains(new Test(3, 10)); // false
Technically that should be true because m == 3 in both cases.
In general a HashMap works like this: it has a variable number of what are commonly called "buckets". The number of buckets can change over time (as entries are added and removed) but it is always a power of 2.
Let's say a given HashMap has 16 buckets. When you call put() to add an entry, the hashCode() of the key is calculated and then a mask is taken depending on the size of the buckets. If you (bitwise) AND the hashCode() with 15 (0x0F) you will get the last 4 bits, equaling a number between 0 and 15 inclusive:
int factor = 4;
int buckets = 1 << (factor-1) - 1; // 16
int mask = buckets - 1; // 15
int code = key.hashCode();
int dest = code & mask; // a number from 0 to 15 inclusive
Now if there is already an entry in that bucket you have what's called a collision. There are multiple ways of dealing with this but the one used by HashMap (and is probably the most common overall) is bucketing. All the entries with the same masked hashCode are put in a list of some kind.
So to find if a given key is in the map already:
Calculate the masked hash code;
Find the appropriate bucket;
If it's empty, key not found;
If is isn't empty, loop through all entries in the bucket checking equals().
Looking through a bucket is a linear (O(n)) operation but it's on a small subset. The hashcode bucket determination is essentially constant (O(1)). If buckets are sufficiently small then access to a HashMap is usually described as "near O(1)".
You can make a couple of observations about this.
Firstly, if you have a bunch of objects that all return 42 as their hash code a HashMap will still work but it will operate as an expensive list. Access will be O(n) (as everything will be in the same bucket regardless of the number of buckets). I've actually been asked this in an interview.
Secondly, returning to your original point, if two objects are equal (meaning a.equals(b) == b.equals(a) == true) but have different hash codes then the HashMap will go looking in (probably) the wrong bucket resulting in unpredictable and undefined behaviour.
This is discussed in the Item 8: Always override hashCode when you override equals of Joshua Bloch's Effective Java:
A common source of bugs is the failure to override the hashCode method. You must
override hashCode in every class that overrides equals. Failure to do so will
result in a violation of the general contract for Object.hashCode, which will pre-
vent your class from functioning properly in conjunction with all hash-based collec-
tions, including HashMap, HashSet, and Hashtable.
Here is the contract, copied from the
java.lang.Object specification:
Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The key provision that is violated when you fail to override hashCode is
the second one: Equal objects must have equal hash codes. Two distinct
instances may be logically equal according to the class’s equals method, but to
the Object class’s hashCode method, they’re just two objects with nothing much
in common. Therefore object’s hashCode method returns two seemingly random
numbers instead of two equal numbers as required by the contract.
For example, consider the following simplistic PhoneNumber class, whose
equals method is constructed according to the recipe in Item 7:
public final class PhoneNumber {
private final short areaCode;
private final short exchange;
private final short extension;
public PhoneNumber(int areaCode, int exchange,
int extension) {
rangeCheck(areaCode, 999, "area code");
rangeCheck(exchange, 999, "exchange");
rangeCheck(extension, 9999, "extension");
this.areaCode = (short) areaCode;
this.exchange = (short) exchange;
this.extension = (short) extension;
}
private static void rangeCheck(int arg, int max,
String name) {
if (arg < 0 || arg > max)
throw new IllegalArgumentException(name +": " + arg);
}
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof PhoneNumber))
return false;
PhoneNumber pn = (PhoneNumber)o;
return pn.extension == extension &&
pn.exchange == exchange &&
pn.areaCode == areaCode;
}
// No hashCode method!
... // Remainder omitted
}
Suppose you attempt to use this class
with a HashMap:
Map m = new HashMap();
m.put(new PhoneNumber(408, 867, 5309), "Jenny");
At this point, you might expect
m.get(new PhoneNumber(408 , 867,
5309)) to return "Jenny", but it
returns null. Notice that two PhoneNumber instances are
involved: One is used for insertion
into the HashMap, and a second, equal,
instance is used for (attempted)
retrieval. The PhoneNumber class’s
failure to override hashCode causes
the two equal instances to have
unequal hash codes, in violation of
the hashCode contract. Therefore the
get method looks for the phone number
in a different hash bucket from the
one in which it was stored by the put
method. Fixing this problem is as
simple as providing a proper hashCode
method for the PhoneNumber class.
[...]
See the Chapter 3 for the full content.
Containers like HashSet rely on the hash function to determine where to put it, and where to get it from when asked for it. If A.equals(B), then a HashSet is expecting A to be in the same place as B. If you put A in with value V, and look up B, you should expect to get V back (since you've said A.equals(B)). But if A.hashcode() != B.hashcode(), then the hashset may not find where you put it.
Here's a little example:
Set<Foo> myFoos = new HashSet<Foo>();
Foo firstFoo = new Foo(123,"Alpha");
myFoos.add(firstFoo);
// later in the processing you get another Foo from somewhere
Foo someFoo = //use imagination here...;
// maybe you get it from a database... and it's equal to Foo(123,"Alpha)
if (myFoos.contains(someFoo)) {
// maybe you win a million bucks.
}
So, imagine that the hashCode that gets created for firstFoo is 99999 and it winds up at a specific spot in the myFoos HashSet. Later when you get the someFoo and you look for it in the myFoos HashSet, it needs to generate the same hashCode so you can find it.
It's exactly because of hash tables.
Because of the possibility of hash code collisions, hash tables need to check identity as well, otherwise the table can't determine if it found the object it was looking for, or one with the same hash code. So every get() in a hash table calls key.equals(potentialMatch) before returning a value.
If equals() and hashCode() are inconsistent you can get very inconsistent behavior. Say for two objects, a and b, a.equals(b) returns true, but a.hashCode() != b.hashCode(). Insert a and a HashSet will return false for .contains(b), but a List created from that set will return true (because the list doesn't use hash codes).
HashSet set = new HashSet();
set.add(a);
set.contains(b); // false
new ArrayList(set).contains(b); // true
Obviously, that could be bad.
The idea behind this is that two objects are "equal" if all of their fields have equal values. If all of fields have equal values, the two objects should have the same hash value.

Categories