I have an object in a LinkedHashSet that implements equals, hashCode and compareTo (in a superclass), but when I try to remove that exact object from the set with set.remove(obj), the remove method returns false and the object remains in the set. Is the implementation of LinkedHashSet supposed to call the equals() method of its objects? Because it doesn't. Could this be a Java bug? I'm running 1.6.0_25.
My guess would be that your object's hashCode() implementation is returning a different value than when you added the object to the set.
LinkedHashSet works fine for me:
import java.util.*;
public class Test {
public static void main( String[] args ) {
LinkedHashSet<String> lhs = new LinkedHashSet<String>();
String s = "hi";
lhs.add( s );
System.out.println( lhs );
lhs.remove( s );
System.out.println( lhs );
}
}
Perhaps you're passing in a reference to a different object to the remove method? Are you sure you didn't change the reference in any way?
Also make sure that hashCode() returns the same value when you insert it as when you are trying to remove it.
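A minimal sketch of how a changing hashCode() can make remove fail. MutableKey and StaleHashDemo are hypothetical names used only for illustration; they are not the asker's classes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class StaleHashDemo {
    // Adds a key, mutates it while it is in the set, then tries to remove it.
    static boolean removeAfterMutation() {
        Set<MutableKey> set = new LinkedHashSet<MutableKey>();
        MutableKey key = new MutableKey("a");
        set.add(key);              // stored under the hash code of "a"
        key.setName("b");          // hashCode() now returns the hash code of "b"
        return set.remove(key);    // lookup uses the new hash code
    }

    public static void main(String[] args) {
        System.out.println(removeAfterMutation());   // prints "false"
    }
}

// Hypothetical class: equals() and hashCode() both depend on a mutable field.
class MutableKey {
    private String name;

    MutableKey(String name) { this.name = name; }

    void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).name.equals(name);
    }

    @Override
    public int hashCode() { return name.hashCode(); }
}
```

The same object reference is passed to add and remove, yet the removal fails because the set searches under a hash code that no longer matches the one recorded at insertion.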
The chances of this being a bug in LinkedHashSet are infinitesimally small. You should dismiss this as a plausible explanation of your problem.
Assuming that this is a bug in your code, then it could be due to a number of things. For instance:
Your equals and hashCode methods are returning contradictory answers for the object.
Your equals or hashCode methods depend on mutable fields and those fields are being changed while the object is in the set. (For instance, if the hashcode value changes, the object is likely to be on the wrong hash chain, causing the remove method to not find it.)
You have declared the equals method as an overload, not an override of equals(Object). (That could explain why your equals is not being called ... assuming that your assertion is factually correct.)
The object you are trying to remove is (in reality) not the one you inserted.
Something else has already removed the object.
You are running a different version of some class that does not match the source code you have been examining.
Now, I know that you have dismissed some of these explanations. But that may have been premature. Review the evidence that you based that dismissal on.
Another approach you could use is to use a Java debugger to forensically examine the data structures (e.g. the innards of the LinkedHashSet) and single-step the code where the deletion is supposed to be happening.
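The overload case from the list above is easy to miss, so here is a small sketch of it. Point and OverloadDemo are hypothetical names; the only assumption is a class that declares equals(Point) instead of equals(Object).

```java
import java.util.LinkedHashSet;
import java.util.Set;

class OverloadDemo {
    // Hypothetical class whose equals takes Point, not Object: an overload.
    static class Point {
        final int x;

        Point(int x) { this.x = x; }

        // NOT an override of equals(Object); collections never call this.
        public boolean equals(Point other) {
            return other != null && other.x == x;
        }

        @Override
        public int hashCode() { return x; }
    }

    // Reports whether the set recognises a second, equal-looking Point.
    static boolean setFindsEqualPoint() {
        Set<Point> set = new LinkedHashSet<Point>();
        set.add(new Point(1));
        return set.contains(new Point(1));   // falls back to Object.equals, i.e. ==
    }

    public static void main(String[] args) {
        System.out.println(setFindsEqualPoint());   // prints "false"
    }
}
```

Putting @Override on the equals(Point) method would turn this mistake into a compile error, which is a cheap way to rule the cause out.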
I'm trying to understand java.util.Collection and java.util.Map a little deeper, but I have some doubts about HashSet functionality:
In the documentation, it says: This class implements the Set interface, backed by a hash table (actually a HashMap instance). Ok, so I can see that a HashSet always has a hash table working in the background. A hash table is a structure that asks for a key and a value every time you want to add a new element to it. Then the key and the value are stored in a bucket chosen from the key's hashCode. If the hash codes of two keys are the same, both entries go into the same bucket, using a linked list. Please correct me if I said something wrong.
So, my question is: if a HashSet always has a hash table acting in the background, then every time we add a new element using the HashSet.add() method, the HashSet should add it to its internal hash table. But the hash table asks for a key and a value, so what key does it use? Does it just use the value we're trying to add as the key as well, and then take its hashCode? Please correct me if I said something wrong about the HashSet implementation.
Another question that I have is: in general, which classes use the hashCode() method of a Java object? I'm asking because the documentation says that every time we override the equals() method we need to override the hashCode() method. Ok, that makes sense, but my doubt is whether it's just a recommendation to keep everything 'nice and perfect' (so to speak), or whether it's really necessary because a lot of the standard Java classes constantly use the hashCode() method of your objects. As I see it, the only classes using this method are the ones related to Collections. Thank you very much, guys.
If you look at the actual Java source of HashSet you can see what it does:
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
...
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
So the element you are adding is the key in the backing HashMap, with a dummy object as the value. This dummy value is never actually used by the HashSet.
Your second question regarding overriding equals and hashCode:
It is really necessary to always override both if you want to override either one. This is because the contract for hashCode says equal objects must have the same hash code. The default implementation of hashCode typically gives a different value for each instance.
Therefore, if you override equals() but not hashCode(), this could happen:
object1.equals(object2); // true
mySet.add(object1);
mySet.contains(object2); // false, but would be true if we had overridden hashCode() too
Since contains uses the hash code to find the bucket to search in, we might get a different bucket back and not find the equal object.
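A runnable version of that scenario, as a sketch only. Bad and Good are hypothetical classes; the assumption is that they differ solely in whether hashCode() is overridden.

```java
import java.util.HashSet;
import java.util.Set;

class EqualsWithoutHashCodeDemo {
    // Hypothetical class: equals() is overridden, hashCode() is inherited from Object.
    static class Bad {
        final String id;
        Bad(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Bad && ((Bad) o).id.equals(id);
        }
    }

    // Same equals(), plus a hashCode() derived from the same field.
    static class Good {
        final String id;
        Good(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Good && ((Good) o).id.equals(id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }

    static boolean badLookup() {
        Set<Bad> set = new HashSet<Bad>();
        set.add(new Bad("k"));
        return set.contains(new Bad("k"));
    }

    static boolean goodLookup() {
        Set<Good> set = new HashSet<Good>();
        set.add(new Good("k"));
        return set.contains(new Good("k"));
    }

    public static void main(String[] args) {
        System.out.println(new Bad("k").equals(new Bad("k")));  // true
        System.out.println(badLookup());   // almost always false: identity hash codes rarely match
        System.out.println(goodLookup());  // true
    }
}
```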
If you look at the source for HashSet (the source comes with the JDK and is very informative), you will see that it creates an object to use as the value:
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
Each value that is added to the HashSet is used as a key to the backing HashMap with this PRESENT object as the value.
Regarding overriding equals() whenever you override hashCode() (and vice versa), it is very important that these two methods return consistent results. That is, they should agree with one another. For more details, see the book Effective Java by Josh Bloch.
I am having trouble with HashMaps. Currently, my HashMap maps KeySignature keys to values of an enum called Names, i.e. HashMap<KeySignature, Names>. Each Names constant stores a KeySignature, e.g. C_FLAT_MAJOR(new KeySignature(7, Accidental.FLAT, Scale.MAJOR)). To get the enum version of a given KeySignature, I've created the HashMap explained above:
private static final HashMap<KeySignature, Names> lookup = new HashMap<KeySignature, Names>();
static {
for (Names name : Names.values()){
lookup.put(new KeySignature(name.getKeySig()), name);
}
}
So, when I need to check what is the Enum version of a KeySignature, I call a method, located in the KeySignature class:
public Names getCommonName() {
return Names.lookup.get(this);
}
However, the value returned is always null.
I cannot figure out what is causing this, but it seems as if the HashMap.get() method is comparing the key and the argument by reference rather than by value. Do I have to override the equals and hashCode methods of KeySignature, or am I looking in entirely the wrong direction?
The answer is yes.
If you are going to create instances of KeySignature on the fly, then the equals method needs to compare them "by value". The default implementation of equals simply tests whether the objects are ==. So, to get the HashMap to work, you need to override the default equals AND hashCode methods.
The other alternative would be to replace your code that creates new instances of KeySignature with alternative code that looks up an existing KeySignature instance for the given combination of note, Accidental and Scale.
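A sketch of the first option, overriding both methods. The field names (count, accidental, scale), the enum constants other than FLAT and MAJOR, and the String map value are assumptions made for illustration; the real KeySignature class was not posted.

```java
import java.util.HashMap;
import java.util.Map;

class KeySignatureDemo {
    enum Accidental { FLAT, SHARP, NATURAL }   // constants assumed
    enum Scale { MAJOR, MINOR }                // constants assumed

    // Hypothetical value class; fields are assumed to be non-null.
    static final class KeySignature {
        private final int count;
        private final Accidental accidental;
        private final Scale scale;

        KeySignature(int count, Accidental accidental, Scale scale) {
            this.count = count;
            this.accidental = accidental;
            this.scale = scale;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof KeySignature)) return false;
            KeySignature other = (KeySignature) o;
            return count == other.count
                    && accidental == other.accidental
                    && scale == other.scale;
        }

        @Override
        public int hashCode() {
            // Built from exactly the fields that equals() compares.
            int result = count;
            result = 31 * result + accidental.hashCode();
            result = 31 * result + scale.hashCode();
            return result;
        }
    }

    // Stores one key, then looks it up with a separate but equal instance.
    static String lookup() {
        Map<KeySignature, String> lookup = new HashMap<KeySignature, String>();
        lookup.put(new KeySignature(7, Accidental.FLAT, Scale.MAJOR), "C_FLAT_MAJOR");
        return lookup.get(new KeySignature(7, Accidental.FLAT, Scale.MAJOR));
    }

    public static void main(String[] args) {
        System.out.println(lookup());   // prints "C_FLAT_MAJOR"
    }
}
```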
I want to handle a set of objects of a class (MyClass) in a HashSet. When I try to add an object that already exists (relying on equals and hashCode of MyClass), the add method returns false. Is there a way/method to get back the actual object that already exists?
Please give me any advice on handling that collection of objects so that I can get the existing object back when add returns false.
Just check if the HashSet contains your object:
if (hashSet.contains(obj)) {
doWhateverWith(obj);
}
Short of iterating over the set, no, there is no way to get the existing member of the set that is equal to the value just added. The best way to do that would be to write a set wrapper around HashMap that maps each added value to itself.
If equals(..) returns true, then the objects are the same, so you can use the one you are trying to add to the set.
Why would you let it return the object which you're trying to add? You already have it there!
Just do something like:
if (!set.add(item)) {
// It already contains the item.
doSomethingWith(item);
}
If that does not achieve the desired result, then it simply means that the item's equals() is poorly implemented.
One possible way would be:
MyClass[] myArray = mySet.toArray(new MyClass[mySet.size()]);
List<MyClass> myList = Arrays.asList(myArray);
// indexOf uses equals(); it returns -1 if no equal element is present
MyClass existing = myList.get(myList.indexOf(myObject));
But of course as some people pointed out if it failed to get inserted, then that element is the element you are looking for, unless of course that you want what is stored in that memory location, and not what the equals and hashCode tells you, i.e. not the logically equal object, but the == object.
When using a HashSet, no, as far as I know you can't do that except by iterating over the whole thing and calling equals() on each one. You could, however, use a HashMap and just map every object to itself. Then call put(), which will return the previously mapped value, if any.
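A minimal sketch of the map-to-itself idea. CanonicalSet and addOrGet are hypothetical names. This version calls get() before put() so that the originally stored instance stays in the map, and it assumes elements are never null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a set-like wrapper that maps every element to itself.
class CanonicalSet<E> {
    private final Map<E, E> map = new HashMap<E, E>();

    // Returns the stored element equal to e, or stores e and returns it.
    E addOrGet(E e) {
        E existing = map.get(e);
        if (existing != null) {
            return existing;
        }
        map.put(e, e);
        return e;
    }

    int size() { return map.size(); }

    public static void main(String[] args) {
        CanonicalSet<String> set = new CanonicalSet<String>();
        String first = new String("key");
        String second = new String("key");   // equal to first, but a different object
        System.out.println(set.addOrGet(first) == first);    // true
        System.out.println(set.addOrGet(second) == first);   // true: the stored instance comes back
    }
}
```

The lookup stays O(1) on average, unlike iterating over a HashSet and calling equals() on each element.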
http://download.oracle.com/javase/6/docs/api/java/util/HashSet.html#contains(java.lang.Object)
You can use that method to check if you like; but the thing you have to keep in mind is that you already have the object that exists.
java.util.Set implementations remove duplicate elements.
How are duplicate elements deleted internally in a java.util.Set?
Actually AFAIK from the sources most Set implementations in java don't even check if the element is already contained.
They just always execute the add() on their internal structure which holds the set elements and let that object handle the duplication case.
e.g. HashSet calls put(K,V) on the internal HashMap, which inserts a new entry, or, if an equal key is already present, keeps the existing key and only replaces its value.
Reading a little into your question I'm guessing that you're seeing strange behaviour with a java.util.HashSet (typically what everyone uses by default).
Contrary to the contract of java.util.Set, it is possible to get the same object in a java.util.HashSet twice like this:
import java.util.HashSet;
import java.util.Set;
public class SetTest
{
public static void main(String[] args)
{
MyClass myObject = new MyClass(1, "testing 1 2 3");
Set<MyClass> set = new HashSet<MyClass>();
set.add(myObject);
myObject.setHashCode(2);
set.add(myObject);
System.out.println(set.size()); // this will print 2.
}
private static class MyClass
{
private int hashCode;
private String otherField;
public MyClass(int hashCode, String otherField)
{
this.hashCode = hashCode;
this.otherField = otherField;
}
public void setHashCode(int hashCode)
{
this.hashCode = hashCode;
}
public boolean equals(Object obj)
{
return obj != null && obj.getClass().equals(getClass()) && ((MyClass)obj).otherField.equals(otherField);
}
public int hashCode()
{
return hashCode;
}
}
}
After the pointer from @jitter and a look at the source you can see why this would happen.
As @jitter says, java.util.HashSet uses a java.util.HashMap internally. When the hash changes between the first and second add, a different bucket is used in the java.util.HashMap and the object ends up in the set twice.
The code sample may look a little contrived, but I've seen this happen in the wild with domain classes where the hash is created from mutable fields and the equals method hasn't been kept in sync with those fields.
An easy way to find this out is to look in the source for the code you are interested in.
Each JDK has a src.zip included which contains the source code for the public classes so you can just locate the source for HashSet and have a look :) I often use Eclipse for this. Start it, create a new Java project, set the JVM to be an installed JDK (if not you are using the system default JRE which doesn't have src.zip), and Ctrl-Shift-T to go to HashSet.
Reading your question in more detail:
You can't add duplicates (or do you mean addAll?). From the Javadoc for Set.add():
Adds the specified element to this set if it is not already present (optional operation). More formally, adds the specified element e to this set if the set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false. In combination with the restriction on constructors, this ensures that sets never contain duplicate elements.
Adds the specified element to the set if it is not already present.
If the set already contains the element, the call leaves the set unchanged and returns false. In combination with the restriction on constructors, this ensures that sets never contain duplicate elements.
First off, set doesn't "Delete" duplicates, it doesn't allow entering duplicates in the first place.
Let me walk you through the implementation of set.add(e) method.
set.add(e) returns boolean stating whether e has been added in the set or not.
Let's take this simple code for example:
We will get x as true and y as false.
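A minimal snippet consistent with that description, as a sketch: the variable names x and y come from the sentence above, while the HashSet<Integer> and the element 1 are assumed from the set.add(1) calls discussed further down.

```java
import java.util.HashSet;
import java.util.Set;

class AddTwiceDemo {
    // Adds the same value twice and returns both results of add().
    static boolean[] addTwice() {
        Set<Integer> set = new HashSet<Integer>();
        boolean x = set.add(1);   // 1 was not present, so it is added
        boolean y = set.add(1);   // 1 is already present, the set is unchanged
        return new boolean[] { x, y };
    }

    public static void main(String[] args) {
        boolean[] result = addTwice();
        System.out.println(result[0] + " " + result[1]);   // prints "true false"
    }
}
```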
Let us see what add() actually does:
So, HashSet basically uses a HashMap internally, and passes the element as the key (with a shared placeholder object called PRESENT as the value).
This map.put(k,v) either returns a null, if the key never existed, or it would return the old value which the key had.
Therefore while doing set.add(1) for the first time, we get null in response of map.put(1,PRESENT), and that's why we get true.
And when we call it the second time we don't get null in response to map.put(1,PRESENT) and hence the set.add(1) returns false.
(You can dig deeper into the put method, which internally calls putVal and uses hash to identify if a key is already existing, depending on which it returns a null or old Value.)
And since we are using HashMap internally, which uses hash to find uniqueness of a key, we would never end up having same element twice in a HashSet.
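The put() behaviour described above can be sketched with a plain HashMap. PutReturnDemo and its add helper are hypothetical; they only mirror the shape of map.put(e, PRESENT) == null and are not the JDK's actual code.

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    // Stand-in for the dummy value that HashSet stores against every key.
    static final Object PRESENT = new Object();

    // True when put() reports that there was no previous mapping for e.
    static boolean add(Map<Integer, Object> map, Integer e) {
        return map.put(e, PRESENT) == null;
    }

    public static void main(String[] args) {
        Map<Integer, Object> map = new HashMap<Integer, Object>();
        System.out.println(add(map, 1));   // true: put() returned null
        System.out.println(add(map, 1));   // false: put() returned the old value, PRESENT
        System.out.println(map.size());    // 1
    }
}
```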
I'm having problems with Iterator.remove() called on a HashSet.
I have a Set of timestamped objects. Before adding a new item to the Set, I loop through the set, identify an old version of that data object and remove it (before adding the new object). The timestamp is included in hashCode() and equals(), but not in equalsData().
for (Iterator<DataResult> i = allResults.iterator(); i.hasNext();)
{
DataResult oldData = i.next();
if (data.equalsData(oldData))
{
i.remove();
break;
}
}
allResults.add(data);
The odd thing is that i.remove() silently fails (no exception) for some of the items in the set. I've verified
The line i.remove() is actually called. I can call it from the debugger directly at the breakpoint in Eclipse and it still fails to change the state of Set
DataResult is an immutable object so it can't have changed after being added to the set originally.
The equals and hashCode() methods use @Override to ensure they are the correct methods. Unit tests verify these work.
This also fails if I just use a for statement and Set.remove instead. (e.g. loop through the items, find the item in the list, then call Set.remove(oldData) after the loop).
I've tested in JDK 5 and JDK 6.
I thought I must be missing something basic, but after spending some significant time on this my colleague and I are stumped. Any suggestions for things to check?
EDIT:
There have been questions - is DataResult truly immutable. Yes. There are no setters. And when the Date object is retrieved (which is a mutable object), it is done by creating a copy.
public Date getEntryTime()
{
return DateUtil.copyDate(entryTime);
}
public static Date copyDate(Date date)
{
return (date == null) ? null : new Date(date.getTime());
}
FURTHER EDIT (some time later):
For the record -- DataResult was not immutable! It referenced an object which had a hashcode which changed when persisted to the database (bad practice, I know). It turned out that if a DataResult was created with a transient subobject, and the subobject was persisted, the DataResult hashcode was changed.
Very subtle -- I looked at this many times and didn't notice the lack of immutability.
I was very curious about this one still, and wrote the following test:
import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;
import java.util.Set;
public class HashCodeTest {
private int hashCode = 0;
@Override public int hashCode() {
return hashCode++;
}
public static void main(String[] args) {
Set<HashCodeTest> set = new HashSet<HashCodeTest>();
set.add(new HashCodeTest());
System.out.println(set.size());
for (Iterator<HashCodeTest> iter = set.iterator();
iter.hasNext();) {
iter.next();
iter.remove();
}
System.out.println(set.size());
}
}
which results in:
1
1
If the hashCode() value of an object has changed since it was added to the HashSet, it seems to render the object unremovable.
I'm not sure if that's the problem you're running into, but it's something to look into if you decide to re-visit this.
Under the covers, HashSet uses HashMap, which calls HashMap.removeEntryForKey(Object) when either HashSet.remove(Object) or Iterator.remove() is called. This method uses both hashCode() and equals() to validate that it is removing the proper object from the collection.
If both Iterator.remove() and HashSet.remove(Object) are not working, then something is definitely wrong with your equals() or hashCode() methods. Posting the code for these would be helpful in diagnosis of your issue.
Are you absolutely certain that DataResult is immutable? What is the type of the timestamp? If it's a java.util.Date are you making copies of it when you're initializing the DataResult? Keep in mind that java.util.Date is mutable.
For instance:
Date timestamp = new Date();
DataResult d = new DataResult(timestamp);
System.out.println(d.getTimestamp());
timestamp.setTime(System.currentTimeMillis());
System.out.println(d.getTimestamp());
Would print two different times.
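For comparison, a sketch of a DataResult that would not show this behaviour because it copies the Date on the way in and on the way out. This class is hypothetical and reduced to a single field; it is not the asker's code.

```java
import java.util.Date;

class DefensiveCopyDemo {
    // Hypothetical, reduced DataResult that never shares its Date with callers.
    static final class DataResult {
        private final Date timestamp;

        DataResult(Date timestamp) {
            this.timestamp = new Date(timestamp.getTime());   // copy in
        }

        Date getTimestamp() {
            return new Date(timestamp.getTime());             // copy out
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof DataResult
                    && ((DataResult) o).timestamp.equals(timestamp);
        }

        @Override
        public int hashCode() { return timestamp.hashCode(); }
    }

    // Checks that mutating the caller's Date leaves the DataResult untouched.
    static boolean survivesCallerMutation() {
        Date timestamp = new Date(1000L);
        DataResult d = new DataResult(timestamp);
        int before = d.hashCode();
        timestamp.setTime(2000L);   // caller changes its own Date afterwards
        return d.hashCode() == before && d.getTimestamp().getTime() == 1000L;
    }

    public static void main(String[] args) {
        System.out.println(survivesCallerMutation());   // prints "true"
    }
}
```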
It would also help if you could post some source code.
Be careful with any Java collection that looks up its elements by hash code when the element type's hash code depends on mutable state. An example:
HashSet<HashSet<?>> or HashSet<AbstractSet<?>> or a HashMap variant:
HashSet retrieves an item by its hashCode, but its item type
is a HashSet, and HashSet.hashCode depends on its items' state.
Code for that matter:
HashSet<HashSet<String>> coll = new HashSet<HashSet<String>>();
HashSet<String> set1 = new HashSet<String>();
set1.add("1");
coll.add(set1);
System.out.println(set1.hashCode()); // prints some value X
set1.add("2");
System.out.println(set1.hashCode()); // prints a different value Y
coll.remove(set1); // returns false: the removal fails silently
The reason is that HashSet's remove method delegates to a HashMap, which locates keys by hashCode, while AbstractSet's hashCode is dynamic and depends on the set's own mutable contents.
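One way around this, sketched under the assumption that you control every place where the inner set is modified: remove the element while its hash code still matches, mutate it, then add it back. NestedSetDemo is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;

class NestedSetDemo {
    // Removes the inner set before mutating it, then re-inserts it.
    static boolean removeMutateReAdd() {
        Set<Set<String>> coll = new HashSet<Set<String>>();
        Set<String> set1 = new HashSet<String>();
        set1.add("1");
        coll.add(set1);

        coll.remove(set1);   // take it out while its hash code still matches
        set1.add("2");       // mutate outside the collection
        coll.add(set1);      // re-insert under the new hash code

        return coll.remove(set1);   // the element can be found again
    }

    public static void main(String[] args) {
        System.out.println(removeMutateReAdd());   // prints "true"
    }
}
```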
Thanks for all the help. I suspect the problem must be with equals() and hashCode() as suggested by spencerk. I did check those in my debugger and with unit tests, but I've got to be missing something.
I ended up doing a workaround-- copying all the items except one to a new Set. For kicks, I used Apache Commons CollectionUtils.
Set<DataResult> tempResults = new HashSet<DataResult>();
CollectionUtils.select(allResults,
new Predicate()
{
public boolean evaluate(Object oldData)
{
return !data.equalsData((DataResult) oldData);
}
}
, tempResults);
allResults = tempResults;
I'm going to stop here-- too much work to simplify down to a simple test case. But the help is much appreciated.
It's almost certainly the case the hashcodes don't match for the old and new data that are "equals()". I've run into this kind of thing before and you essentially end up spewing hashcodes for every object and the string representation and trying to figure out why the mismatch is happening.
If you're comparing items pre/post database, sometimes it loses the nanoseconds (depending on your DB column type) which can cause hashcodes to change.
Have you tried something like
boolean removed = allResults.remove(oldData);
if (!removed) // COMPLAIN BITTERLY!
In other words, remove the object from the Set and break the loop. That won't cause the Iterator to complain. I don't think this is a long term solution but would probably give you some information about the hashCode, equals and equalsData methods
The Java HashSet has an issue in "remove()" method. Check the link below. I switched to TreeSet and it works fine. But I need the O(1) time complexity.
https://bugs.openjdk.java.net/browse/JDK-8154740
If there are two entries with the same data, only one of them is replaced... have you accounted for that? And just in case, have you tried another collection data structure that doesn't use a hashcode, say a List?
I'm not up to speed on my Java, but I know that you can't remove an item from a collection when you are iterating over that collection in .NET, although .NET will throw an exception if it catches this. Could this be the problem?