I know Set uses implementation of Map<K,V>, where Set elements are keys. But what happens with values? They use private static final Object PRESENT = new Object() as constant value for each key.
Cool. But why? That means for each key we will store value we will never use, just so we can reuse implementation of Map? Why? Couldn't they just make Key implementation? And is that constant ever used or it just 'sits' there?
As mentioned in implementation, if you can see add method of HashSet returning boolean.
The add method calls put(key, PRESENT) on the internal HashMap. The remove method calls remove(key) on the internal HashMap, but it must return a boolean indicating whether the key was present. If null were stored as the value, then the HashSet would need to call containsKey first, then remove, to determine if the key was present -- additional overhead. Here, there is only the memory overhead of one Object, which is quite minimal.
public boolean add(E e)
{
return map.put(e, PRESENT)==null;
}
For each potential 'place' that can hold an entry in the HashSet, it must be possible to tell whether that place is occupied or not.
If you want to support a key value of null, as indeed HashSet does, then you can't use key=null as the "not used" indicator. Since you need to use key.equals() to find an object in the set, you can't have a special key object that might accidentally be equal to (not identical to) an actual key.
Thus you need a separate flag apart from the key to say whether the 'place' is occupied. This might as well be an object reference as anything else; then you can reuse the Map implementation.
Related
I'm trying to understand java.util.Collection and java.util.Map a little deeper but I have some doubts about HashSet funcionality:
In the documentation, it says: This class implements the Set interface, backed by a hash table (actually a HashMap instance). Ok, so I can see that a HashSet always has a Hashtable working in background. A hashtable is a struct that asks for a key and a value everytime you want to add a new element to it. Then, the value and the key are stored in a bucket based on the key hashCode. If the hashcodes of two keys are the same, they add both key values to the same bucket, using a linkedlist. Please, correct me if I said something wrong.
So, my question is: If a HashSet always has a Hashtable acting in background, then everytime we add a new element to the HashSet using HashSet.add() method, the HashSet should add it to its internal Hashtable. But, the Hashtable asks for a value and a key, so what key does it use? Does it just uses the value we're trying to add also as a key and then take its hashCode? Please, correct me if I said something wrong about HashSet implementation.
Another question that I have is: In general, what classes can use the hashCode() method of an java object? I'm asking this because, in the documentation, it says that everytime we override equals() method we need to override hashCode() method. Ok, it really makes sense, but my doubt is if it's just a recommendation we should do to keep everything 'nice and perfect' (putting in this way), or if it's really necessary, because maybe a lot of Java defaults classes will constantly uses hashCode() method of your objects. In my vision, I can't see other classes using this method instead of those classes related to Collections. Thank you very much guys
If you look at the actual javacode of HashSet you can see what it does:
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
...
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
So the element you are adding is the Key in the backing hashmap with a dummy value as the value. this dummy value is never actually used by the hashSet.
Your second question regarding overriding equals and hashcode:
It is really necessary to always override both if you want to override either one. This is because the contract for hashCode says equal objects must have the same hashcode. the default implementation of hashcode will give different values for each instance.
Therefore, if you override equals() but not hashcode() This could happen
object1.equals(object2) //true
MySet.add(object1);
MySet.contains(object2); //false but should be true if we overrode hashcode()
Since contains will use hashcode to find the bucket to search in we might get a different bucket back and not find the equal object.
If you look at the source for HashSet (the source comes with the JDK and is very informative), you will see that it creates an object to use as the value:
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
Each value that is added to the HashSet is used as a key to the backing HashMap with this PRESENT object as the value.
Regarding overriding equals() whenever you override hashCode() (and vice versa), it is very important that these two methods return consistent results. That is, they should agree with one another. For more details, see the book Effective Java by Josh Bloch.
I am trying to perform bi directional A star search, where in I encountered this issue.
Suppose I have an object A in a hashset H. Suppose there is another object B, such that A.equals(B) is true (and their hash values are also the same), though A and B point to different objects. Now, when I check if object B is in hashset H, it returns true, as expected. However, suppose now, I want to access some attribute in object A based on this, I then need to access object A. Note that accessing the same attribute in B will not work since they are equal only under the equals method, but do not point to the same object. How can I achieve this?
One way is to use a Hashmap, such that the value type is the same as the key type, and every time I store some key in the hashmap, I store along with it the same object as its value. But this incurs extra memory overhead of storing the value, when what I really need is a copy of the key itself. Is there any other way to achieve this?
I don't believe there's any way of doing this efficiently using HashSet<E>. (You could list all the keys and check them for equality, of course.)
However, although the HashMap approach would be irritating, it wouldn't actually take any more memory. HashSet is implemented using HashMap (at least in the stock JDK 7) so there's a full map entry for each set entry anyway... and no extra memory is taken to store the value, because they'd both just be references to the same object.
One way is to use a Hashmap, such that the value type is the same as the key type, and every time I store some key in the hashmap, I store along with it the same object as its value. But this incurs extra memory overhead of storing the value, when what I really need is a copy of the key itself.
In actual fact, the implementation of HashSet in (at least) the Oracle Java SE Library in Java 7 has a HashMap inside it. So your concern about the extra memory usage of HashMap is unwarranted.
Here is a link to the source code: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/util/HashSet.java/
Incidentally, the internal map is declared as HashMap<E, Object> rather than HashMap<E, E>. Thought exercise for the reader: why did they do that?
The only other reasonable alternative (using HashSet) would be to iterate over the set elements, testing each one to see if it is equal to the one you are looking for. That is obviously very (time) inefficient.
You could iterate over all values in the HashSet and check the equals method of B. If the iterator returns an object which is equal to B you have found A.
I'm profiling some old java code and it appears that my caching of values using a static HashMap and a access method does not work.
Caching code (a bit abstracted):
static HashMap<Key, Value> cache = new HashMap<Key, Value>();
public static Value getValue(Key key){
System.out.println("cache size="+ cache.size());
if (cache.containsKey(key)) {
System.out.println("cache hit");
return cache.get(key);
} else {
System.out.println("no cache hit");
Value value = calcValue();
cache.put(key, value);
return value;
}
}
Profiling code:
for (int i = 0; i < 100; i++)
{
getValue(new Key());
}
Result output:
cache size=0
no cache hit
(..)
cache size=99
no cache hit
It looked like a standard error in Key's hashing code or equals code.
However:
new Key().hashcode == new Key().hashcode // TRUE
new Key().equals(new Key()) // TRUE
What's especially weird is that cache.put(key, value) just adds another value to the hashmap, instead of replacing the current one.
So, I don't really get what's going on here. Am I doing something wrong?
edit
Ok, I see that in the real code the Key gets used in other methods and changes, which therefore get's reflected in the hashCode of the object in the HashMap. Could that be the cause of this behaviour, that it goes missing?
On a proper #Override of equals/hashCode
I'm not convinced that you #Override (you are using the annotation, right?) hashCode/equals properly. If you didn't use #Override, you may have defined int hashcode(), or boolean equals(Key), neither of which would do what is required.
On key mutation
If you are mutating the keys of the map, then yes, trouble will ensue. From the documentation:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.
Here's an example:
Map<List<Integer>,String> map =
new HashMap<List<Integer>,String>();
List<Integer> theOneKey = new ArrayList<Integer>();
map.put(theOneKey, "theOneValue");
System.out.println(map.containsKey(theOneKey)); // prints "true"
theOneKey.add(42);
System.out.println(map.containsKey(theOneKey)); // prints "false"
By the way, prefer interfaces to implementation classes in type declarations. Here's a quote from Effective Java 2nd Edition: Item 52: Refer objects by their interfaces
[...] you should favor the use of interfaces rather than classes to refer to objects. If appropriate interface types exist, then parameters, return values, variables, and fields should all be declared using interface types.
In this case, if at all possible, you should declare cache as simply a Map instead of a HashMap.
I'd recommend double and triple checking the equals and hashCode methods. Note that it's hashCode, not hashcode.
Looking at the (abstracted) code, everything seems to be in order. It may be that the actual code is not like your redacted version, and that this is more a reflection of how you expect the code to work and not what is happening in practice!
If you can post the code, please do that. In the meantime, here are some pointers to try:
After adding a Key, use exactly the same Key instance again, and verify that it produces a cache hit.
In your test, verify the hashcodes are equal, and that the objects are equal.
Is the Map implementation really a HashMap? WeakHashMap will behave in the way you describe once the keys are no longer reachable.
I'm not sure what your Key class is, but (abstractly similarly to you) what I'd do for a simple check is:
Key k1 = new Key();
Key k2 = new Key();
System.out.println("k1 hash:" + k1.hashcode);
System.out.println("k2 hash:" + k2.hashcode);
System.out.println("ks equal:" + k1.equals(k2));
getValue(k1);
getValue(k2);
if this code shows the anomaly -- same hashcode, equal keys, yet no cache yet -- then there's cause to worry (or, better, debug your Key class;-). The way you're testing, with new Keys all the time, might produce keys that don't necessarily behave the same way.
Am I correct in assuming that if you have an object that is contained inside a Java Set<> (or as a key in a Map<> for that matter), any fields that are used to determine identity or relation (via hashCode(), equals(), compareTo() etc.) cannot be changed without causing unspecified behavior for operations on the collection? (edit: as alluded to in this other question)
(In other words, these fields should either be immutable, or you should require the object to be removed from the collection, then changed, then reinserted.)
The reason I ask is that I was reading the Hibernate Annotations reference guide and it has an example where there is a HashSet<Toy> but the Toy class has fields name and serial that are mutable and are also used in the hashCode() calculation... a red flag went off in my head and I just wanted to make sure I understood the implications of it.
The javadoc for Set says
Note: Great care must be exercised if
mutable objects are used as set
elements. The behavior of a set is not
specified if the value of an object is
changed in a manner that affects
equals comparisons while the object is
an element in the set. A special case
of this prohibition is that it is not
permissible for a set to contain
itself as an element.
This simply means you can use mutable objects in a set, and even change them. You just should make sure the change doesn't impact the way the Set finds the items. For HashSet, that would require not changing the fields used for calculating hashCode().
That is correct, it can cause some problems locating the map entry. Officially the behavior is undefined, so if you add it to a hashset or as a key in a hashmap, you should not be changing it.
Yes, that will cause bad things to happen.
// Given that the Toy class has a mutable field called 'name' which is used
// in equals() and hashCode():
Set<Toy> toys = new HashSet<Toy>();
Toy toy = new Toy("Fire engine", ToyType.WHEELED_VEHICLE, Color.RED);
toys.add(toy);
System.out.println(toys.contains(toy)); // true
toy.setName("Fast truck");
System.out.println(toys.contains(toy)); // false
In a HashSet/HashMap, you could mutate a contained object to change the results of compareTo() operation -- relative comparison isn't used to locate objects. But it'd be fatal inside a TreeSet/TreeMap.
You can also mutate objects that are inside an IdentityHashMap -- nothing other than object identity is used to locate contents.
Even though you can do these things with these qualifications, they make your code more fragile. What if someone wants to change to a TreeSet later, or add that mutable field to the hashCode/equality test?
Note that I'm not actually doing anything with a database here, so ORM tools are probably not what I'm looking for.
I want to have some containers that each hold a number of objects, with all objects in one container being of the same class. The container should show some of the behaviour of a database table, namely:
allow one of the object's fields to be used as a unique key, i. e. other objects that have the same value in that field are not added to the container.
upon accepting a new object, the container should issue a numeric id that is returned to the caller of the insertion method.
Instead of throwing an error when a "duplicate entry" is being requested, the container should just skip insertion and return the key of the already existing object.
Now, I would write a generic container class that accepts objects which implement an interface to get the value of the key field and use a HashMap keyed with those values as the actual storage class. Is there a better approach using existing built-in classes? I was looking through HashSet and the like, but they didn't seem to fit.
None of the Collections classes will do what you need. You'll have to write your own!
P.S. You'll also need to decide whether your class will be thread-safe or not.
P.P.S. ConcurrentHashMap is close, but not exactly the same. If you can subclass or wrap it or wrap the objects that enter your map such that you're relying only on that class for thread-safety, you'll have an efficient and thread-safe implementation.
You can simulate this behavior with a HashSet. If the objects you're adding to the collection have a field that you can use as a unique ID, just have that field returned by the object's hashCode() method (or use a calculated hash code value, either way should work).
HashSet won't throw an error when you add a duplicate entry, it just returns false. You could wrap (or extend) HashSet so that your add method returns the unique ID that you want as a return value.
I was thinking you could do it with ArrayList, using the current location in the array as the "id", but that doesn't prevent you from making an insert at an existing location, plus when you insert at that location, it will move everything up. But you might base your own class on ArrayList, returning the current value of .size() after a .add.
Is there a reason why the object's hash code couldn't be used as a "numeric id"?
If not, then all you'd need to do is wrap the call into a ConcurrentHashMap, return the object's hashCode and use the putIfAbsent(K key, V value) method to ensure you don't add duplicates.
putIfAbsent also returns the existing value, so you could get its hashCode to return to your user.
See ConcurrentHashMap