HashMap key problems

I'm profiling some old Java code, and it appears that my caching of values using a static HashMap and an access method does not work.
Caching code (a bit abstracted):
static HashMap<Key, Value> cache = new HashMap<Key, Value>();
public static Value getValue(Key key){
System.out.println("cache size="+ cache.size());
if (cache.containsKey(key)) {
System.out.println("cache hit");
return cache.get(key);
} else {
System.out.println("no cache hit");
Value value = calcValue();
cache.put(key, value);
return value;
}
}
Profiling code:
for (int i = 0; i < 100; i++)
{
getValue(new Key());
}
Result output:
cache size=0
no cache hit
(..)
cache size=99
no cache hit
It looked like a standard error in Key's hashing code or equals code.
However:
new Key().hashcode == new Key().hashcode // TRUE
new Key().equals(new Key()) // TRUE
What's especially weird is that cache.put(key, value) just adds another value to the hashmap, instead of replacing the current one.
So, I don't really get what's going on here. Am I doing something wrong?
edit
OK, I see that in the real code the Key gets used in other methods and changes, and that change is reflected in the hashCode of the object in the HashMap. Could that be the cause of this behaviour, that it goes missing?

On a proper @Override of equals/hashCode
I'm not convinced that you @Override (you are using the annotation, right?) hashCode/equals properly. If you didn't use @Override, you may have defined int hashcode(), or boolean equals(Key), neither of which would do what is required.
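To make that pitfall concrete, here is a minimal sketch. The classes OverloadedKey and OverriddenKey are hypothetical and are not taken from the question; they only illustrate how an overloaded equals(Key) and a misspelled hashcode() can make a direct equals call look correct while HashMap lookups still miss.

```java
import java.util.HashMap;
import java.util.Map;

public class OverrideDemo {
    // Hypothetical key that overloads equals and misspells hashCode.
    static final class OverloadedKey {
        final int id;
        OverloadedKey(int id) { this.id = id; }
        // An overload, not an override: the parameter is OverloadedKey, not Object.
        public boolean equals(OverloadedKey other) { return other != null && other.id == id; }
        // A new method, not an override: note the lower-case 'c'.
        public int hashcode() { return id; }
    }

    // Hypothetical key that overrides both methods; @Override lets the compiler check this.
    static final class OverriddenKey {
        final int id;
        OverriddenKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof OverriddenKey && ((OverriddenKey) o).id == id;
        }
        @Override public int hashCode() { return id; }
    }

    static boolean overloadedLookup() {
        Map<OverloadedKey, String> cache = new HashMap<OverloadedKey, String>();
        cache.put(new OverloadedKey(1), "value");
        return cache.containsKey(new OverloadedKey(1));
    }

    static boolean overriddenLookup() {
        Map<OverriddenKey, String> cache = new HashMap<OverriddenKey, String>();
        cache.put(new OverriddenKey(1), "value");
        return cache.containsKey(new OverriddenKey(1));
    }

    public static void main(String[] args) {
        // The direct call resolves to the overload at compile time, so it looks fine.
        System.out.println(new OverloadedKey(1).equals(new OverloadedKey(1))); // true
        System.out.println(overloadedLookup()); // false
        System.out.println(overriddenLookup()); // true
    }
}
```

Adding @Override to the two methods of OverloadedKey would turn both mistakes into compile errors, which is the reason for asking about the annotation.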
On key mutation
If you are mutating the keys of the map, then yes, trouble will ensue. From the documentation:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.
Here's an example:
Map<List<Integer>,String> map =
new HashMap<List<Integer>,String>();
List<Integer> theOneKey = new ArrayList<Integer>();
map.put(theOneKey, "theOneValue");
System.out.println(map.containsKey(theOneKey)); // prints "true"
theOneKey.add(42);
System.out.println(map.containsKey(theOneKey)); // prints "false"
By the way, prefer interfaces to implementation classes in type declarations. Here's a quote from Effective Java 2nd Edition, Item 52: Refer to objects by their interfaces
[...] you should favor the use of interfaces rather than classes to refer to objects. If appropriate interface types exist, then parameters, return values, variables, and fields should all be declared using interface types.
In this case, if at all possible, you should declare cache as simply a Map instead of a HashMap.

I'd recommend double and triple checking the equals and hashCode methods. Note that it's hashCode, not hashcode.

Looking at the (abstracted) code, everything seems to be in order. It may be that the actual code is not like your redacted version, and that this is more a reflection of how you expect the code to work and not what is happening in practice!
If you can post the code, please do that. In the meantime, here are some pointers to try:
After adding a Key, use exactly the same Key instance again, and verify that it produces a cache hit.
In your test, verify the hashcodes are equal, and that the objects are equal.
Is the Map implementation really a HashMap? WeakHashMap will behave in the way you describe once the keys are no longer reachable.

I'm not sure what your Key class is, but (abstractly similarly to you) what I'd do for a simple check is:
Key k1 = new Key();
Key k2 = new Key();
System.out.println("k1 hash:" + k1.hashCode());
System.out.println("k2 hash:" + k2.hashCode());
System.out.println("ks equal:" + k1.equals(k2));
getValue(k1);
getValue(k2);
If this code shows the anomaly -- same hashcode, equal keys, yet no cache hit -- then there's cause to worry (or, better, debug your Key class;-). The way you're testing, with new Keys all the time, might produce keys that don't necessarily behave the same way.

Related

Why we need to override hashCode and equals?

By default, hashCode and equals work fine.
I have used objects with hash tables like HashMap without overriding these methods, and it was fine. For example:
import java.util.HashMap;
import java.util.Map;
public class Main {
    public static void main(String[] args) throws Exception {
        Map<Object, String> map = new HashMap<>();
        Object key = new Main();
        map.put(key, "2");
        Object key2 = new Main();
        map.put(key2, "3");
        System.out.println(map.get(key));  // prints "2"
        System.out.println(map.get(key2)); // prints "3"
    }
}
This code works fine. By default, hashCode returns the memory address of the object, and equals checks whether two references point to the same object. So what is the problem with using the default implementation of these methods?

Note this example from an old pdf I have:
This code
import java.util.HashSet;
import java.util.Set;
public class Name {
    private String first, last;
    public Name(String first, String last) {
        this.first = first;
        this.last = last;
    }
    // equals is overridden, but hashCode is not
    public boolean equals(Object o) {
        if (!(o instanceof Name)) return false;
        Name n = (Name) o;
        return n.first.equals(first) && n.last.equals(last);
    }
    public static void main(String[] args) {
        Set<Name> s = new HashSet<Name>();
        s.add(new Name("Donald", "Duck"));
        System.out.println(
            s.contains(new Name("Donald", "Duck")));
    }
}
...will not always give the same result because, as stated in the PDF:
Donald is in the set, but the set can’t find him. The Name class violates the hashCode contract
Because equality is defined by the two strings that make up the object, the hash code should also be computed from those two strings.
To fix this code we should add a hashCode method:
public int hashCode() {
return 31 * first.hashCode() + last.hashCode();
}
This question in the pdf ends saying that we should
override hashCode when overriding equals

In your example, whenever you want to retrieve something from your HashMap, you need to have key and key2, because their equals() is the same as object identity. This makes the HashMap completely useless, because you cannot retrieve anything from it without having these two keys. Passing the keys around doesn't make sense, because you could just as well pass the values around; it would be equally awkward.
Now try to imagine some use case, where a HashMap actually makes sense. For example, suppose that you get String-valued requests from the outside, and want to return, say, ip-addresses. The keys that come from the outside obviously cannot be the same as the keys you used to set up your map. Therefore you need some methods that compare requests from the outside to the keys you used during the initialization phase. This is exactly what equals is good for: it defines an equivalence relation on objects that are not identical in the sense of being represented by the same bits in physical memory. hashCode is a coarser version of equals, which is necessary to retrieve values from HashMaps quickly.

Your example is not very useful as it would be simpler to have simple variables. i.e. the only way to lookup the value in the map is to hold the original key. In which case, you may as well just hold the value and not have a Map in the first place.
If instead you want to be able to create a new key which is considered equivalent to a key used previously, you have to provide how equivalence is determined.
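As a rough illustration of the difference between identity and equivalence, the sketch below looks up a key that arrives "from the outside" as a distinct object. The class name EquivalenceDemo, the host name, and the address are placeholders chosen for this example. HashMap compares keys with equals/hashCode, while IdentityHashMap compares them with ==, which is the behaviour you get from the default Object methods.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class EquivalenceDemo {
    // Registers a host during "setup", then looks it up with a key that is a
    // distinct String object containing the same characters.
    static String resolve(Map<String, String> table) {
        table.put("example.org", "192.0.2.1");       // placeholder host and address
        String request = new String("example.org");  // deliberately a new object
        return table.get(request);
    }

    public static void main(String[] args) {
        System.out.println(resolve(new HashMap<String, String>()));         // 192.0.2.1
        System.out.println(resolve(new IdentityHashMap<String, String>())); // null
    }
}
```

String already defines equals and hashCode in terms of its characters, so the HashMap lookup succeeds; a key class of your own has to provide the same thing itself.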

Given that most objects are never asked for their identity hash code, the system does not keep for most objects any information that would be sufficient to establish a permanent identity. Instead, Java uses two bits in the object header to distinguish three states:
The identity hashcode for the object has never been queried.
The identity hashcode has been queried, but the object has not been moved by the GC since then.
The identity hashcode has been queried, and the object has been moved since then.
For objects in the first state, asking for the identity hash code will change the object to the second state and process it as a second-state object.
For objects in the second state, including those which had moments before been in the first state, the identity hash code will be formed from the address.
When an object in the second state is moved by the GC, the GC will allocate an extra 32 bits to the object, which will be used to hold a hash-code derived from its original address. The object will then be assigned to the third state.
Subsequent requests for the hash code from a state-3 object will use that value that was stored when it was moved.
At times when the system knows that no objects within a certain address range are in state 2, it may change the formula used to compute hash codes from addresses in that range.
Although at any given time there may only be one object at any given address, it is entirely possible that an object might be asked for its identity hash code and later moved, and that another object might be placed at either the same address as the first one, or an address that would hash to the same value (the system might change the formula used to compute hash values to avoid duplication, but would be unable to eliminate it).
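Whatever the JVM does internally, the observable guarantee is that an object's identity hash code does not change during its lifetime. The sketch below (class and method names are made up for this example) contrasts that with a content-based hashCode, using System.identityHashCode and an ArrayList. It does not demonstrate the state machine described above, only the contract that such a scheme has to preserve.

```java
import java.util.ArrayList;
import java.util.List;

public class IdentityHashDemo {
    static boolean identityHashSurvivesMutation() {
        List<Integer> list = new ArrayList<Integer>();
        int identityBefore = System.identityHashCode(list);
        int contentBefore = list.hashCode();   // content-based: 1 for an empty list

        list.add(42);                          // content hash becomes 31 * 1 + 42 = 73
        System.gc();                           // only a hint; the object may or may not move

        return identityBefore == System.identityHashCode(list)
                && contentBefore != list.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(identityHashSurvivesMutation()); // true
    }
}
```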

Java Modifying key object inside map

I am having a problem with a Java Map. I put an object into the map as a key. Then I modify the key, and the map no longer considers the object to be one of its keys, even though the key stored inside the map has been modified accordingly.
I am working with the object CoreLabel from StanfordNLP but it applies to a general case I guess.
Map <CoreLabel, String> myMap = new HashMap...
CoreLabel key = someCreatedCoreLabel
myMap.put(key, someString)
myMap.get(key) != null ----> TRUE
key.setValue("someValue");
myMap.get(key) != null ----> FALSE
I hope I was clear enough. The question is why is the last statement false? I am not a very experienced programmer but I would expect it to be true. Maybe has something to do with the CoreLabel object?
I checked whether .equals() still holds, and it actually does:
for(CoreLabel token: myMap.keySet()) {
if(key.equals(token))
System.out.println("OK");
}

This is explicitly documented in the Map Javadoc as dangerous and unlikely to work:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map. A special case of this prohibition is that it is not permissible for a map to contain itself as a key. While it is permissible for a map to contain itself as a value, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a map.

The problem is that in modifying the value of the key, now the hash code of the key has changed as well. A HashMap will first use the hash code of the key to determine if it exists. The modified hash code didn't exist in the map, so it didn't even get to try the step of using the equals method. That's why it's a bad idea to change your key objects while they're in a HashMap.
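If a key really has to change while it is in the map, one possible workaround is to take the entry out, mutate the key, and put it back so that it is stored under its new hash code. The following is only a sketch: mutateKey is a hypothetical helper, and a List<String> stands in for CoreLabel, whose hashCode behaviour is not shown in the question. It uses a lambda, so it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class RekeyDemo {
    // Hypothetical helper: applies a mutation to a key without stranding its entry.
    static <K, V> void mutateKey(Map<K, V> map, K key, Consumer<K> mutation) {
        boolean present = map.containsKey(key);
        V value = map.remove(key);      // found under the old hash code
        mutation.accept(key);           // the hash code may change here
        if (present) {
            map.put(key, value);        // stored again under the new hash code
        }
    }

    public static void main(String[] args) {
        Map<List<String>, String> myMap = new HashMap<List<String>, String>();
        List<String> key = new ArrayList<String>();
        key.add("token");
        myMap.put(key, "someString");

        mutateKey(myMap, key, k -> k.add("someValue"));
        System.out.println(myMap.get(key)); // someString
    }
}
```

Using an immutable key, or a key whose hashCode ignores the mutable fields, avoids the need for this kind of bookkeeping altogether.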

How could a LinkedHashMap fail to find an entry produced by an iterator?

Under what circumstances given a correct implementation of hashCode and equals() can the following code return false?
myLinkedHashMap.containsKey(myLinkedHashMap.keySet().iterator().next())

Most likely scenario I can think of would be even though hashCode is "deterministic", it may be based on mutable fields. If you change the fields used to compute hashCode after it's put in the Map, then you won't be able to find it anymore.
Edit: should clarify you 'usually' won't be able to find it anymore. Occasionally it will still work since two numbers can still rehash into the same bucket. This, of course, only adds to the confusion when it happens!

Every hash algorithm I have seen is "deterministic", in that for a given set of input values, you get the same hash value.
If the hash code is computed based on mutable properties of the object, the hash code will change after it's in the hash map if any of those mutable properties are changed.

It's not clear what you mean by "deterministic", but any hash-changing mutation to the key after it's been inserted into the hash map could easily have that effect.
import java.util.*;
public class Test {
public static void main(String[] args) {
List<String> strings = new ArrayList<String>();
Map<List<String>, String> map = new LinkedHashMap<List<String>, String>();
map.put(strings, "");
System.out.println(map.containsKey(map.keySet().iterator().next())); // true
strings.add("Foo");
System.out.println(map.containsKey(map.keySet().iterator().next())); // false
}
}
The hash code of ArrayList<T> is deterministic, but that doesn't mean it won't change if the contents of the list changes.

If hashCode() is based on mutable instance attributes and those attributes are changed after insertion, the hashCode() computed at lookup time will differ from the one used at insertion. Since equals() should be based on the same attributes, it can be expected to fail as well.
If another thread has removed all the remaining items from the Map in the middle of an iteration, there will be no more next().
I would not use the hashCode() values as keys; I would use the objects themselves.

If your hashCode and equals don't agree with one another, this could return false. For example, if the equals method always returns false, this will return false, since there isn't any object that would ever compare equal to the keys in the map.
Hope this helps!

You might want to check hasNext() first.

You could remove the first key in another thread between getting the first keys and calling containsKey.

Get values for keys within a range in Java

Suppose I have a map in Java which looks like this:
{
39:"39 to 41",
41:"41 to 43",
43:"43 to 45",
45:">=45"
}
If the keys are in sorted order (using either a TreeMap or a LinkedHashMap), and I try to look up a value that is >=39 and <41, then I should get the String "39 to 41". How do I do this efficiently?

It looks like you want more than a SortedMap; you want a NavigableMap! Specifically you can use the floorKey operation.
Here's an example:
NavigableMap<Integer,String> map =
new TreeMap<Integer, String>();
map.put(0, "Kid");
map.put(11, "Teens");
map.put(20, "Twenties");
map.put(30, "Thirties");
map.put(40, "Forties");
map.put(50, "Senior");
map.put(100, "OMG OMG OMG!");
System.out.println(map.get(map.floorKey(13))); // Teens
System.out.println(map.get(map.floorKey(29))); // Twenties
System.out.println(map.get(map.floorKey(30))); // Thirties
System.out.println(map.floorEntry(42).getValue()); // Forties
System.out.println(map.get(map.floorKey(666))); // OMG OMG OMG!
Note that there are also ceilingKey, lowerKey, and higherKey, as well as …Entry variants of the …Key operations, which return a Map.Entry<K,V> instead of just the K.

Try Java 6's java.util.NavigableMap: http://download.oracle.com/javase/6/docs/api/java/util/NavigableMap.html
In particular, use floorKey/floorEntry.
For example, floorKey(40) should return 39, and floorEntry(40) would return the entry holding the value you are looking for.
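Applied to the map from the question, that could look roughly like the sketch below. The class name RangeLabelDemo and the method labelFor are invented for this example. floorEntry returns null when the argument is below the lowest key, so that case is handled explicitly.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RangeLabelDemo {
    private static final NavigableMap<Integer, String> RANGES = new TreeMap<Integer, String>();
    static {
        RANGES.put(39, "39 to 41");
        RANGES.put(41, "41 to 43");
        RANGES.put(43, "43 to 45");
        RANGES.put(45, ">=45");
    }

    // Returns the label of the range containing value, or null below the lowest bound.
    static String labelFor(int value) {
        Map.Entry<Integer, String> entry = RANGES.floorEntry(value);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        System.out.println(labelFor(40)); // 39 to 41
        System.out.println(labelFor(41)); // 41 to 43
        System.out.println(labelFor(99)); // >=45
        System.out.println(labelFor(38)); // null
    }
}
```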

With a sorted map, you could do something like that:
SortedMap<Integer,String> head = map.headMap(value+1);
if (head.isEmpty()) {
return null;
} else {
return head.get(head.lastKey());
}

I'm not sure that's going to be easy. One suggestion would be to "fill in the gaps", ie put in a value 40->"39 to 41" etc etc. I suppose that will only be possible if you know the whole range of numbers possible in the map.
Or maybe something that overrides the get to check to see if the value is in the map, and expanding out until it finds something. I'm not sure that's going to be possible in its current guise, as you'd have to end up parsing the value strings.

You can recursively look for lower boundary.
public String descriptionFor(int value) {
String description = map.get(value);
return description == null ? descriptionFor(value - 1) : description;
}
You will need to have a minimum boundary.

You'd have to implement such a map yourself, I believe. You're right that it would have to be sorted; the implementation of get would have to iterate through the keys until it finds the largest key that is less than or equal to the argument.
If you subclass TreeMap it would initially appear that you can get this working via simply overriding the get() method. However, to maintain as much of the Map contract as possible you'll have to override other methods for consistency.
And what about e.g. containsKey()? Does your map contain a mapping for 40? If you return false, then a client can decide not to call get() based on this information; for this reason (and the formal definition) you have to return true. But then it makes it hard to determine whether the map "really contains" a given mapping, for instance if you're looking to do something such as update without overwriting anything that already exists.
The remove() method might be tricky too. From my reading of the interface,
// Calling map.remove "Removes the mapping for a key from this map if it is present."
map.remove(x);
// Now that the mapping is removed, I believe the following must hold
assert map.get(x) == null;
assert !map.containsKey(x);
Acting consistently here would be very tricky. If you have a mapping from 35-40 for example, and you call remove(38), then as I understand it you'd have to return null for any subsequent gets for the key 38, but return the aforementioned mapping for keys 35-37 or 39-40.
So while you can make a start on this by overriding TreeMap, perhaps the whole concept of Map is not quite what you want here. Unless you need this behaviour to slot into existing methods that take Map, it might be easier to create it yourself as a distinct class since it's not quite a Map, the way you're defining it.
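One way such a distinct class might look is sketched below. RangeTable and its method names are invented for this example; it wraps a TreeMap rather than extending it, so it makes no promises about the Map contract. Keeping lookup separate from hasBoundaryAt sidesteps the containsKey ambiguity discussed above.

```java
import java.util.Map;
import java.util.TreeMap;

public class RangeTable<V> {
    private final TreeMap<Integer, V> lowerBounds = new TreeMap<Integer, V>();

    // Associates value with every point from lowerBound up to the next boundary.
    public void putFrom(int lowerBound, V value) {
        lowerBounds.put(lowerBound, value);
    }

    // Returns the value of the range containing point, or null below the lowest boundary.
    public V lookup(int point) {
        Map.Entry<Integer, V> entry = lowerBounds.floorEntry(point);
        return entry == null ? null : entry.getValue();
    }

    // True only if a range starts exactly at point.
    public boolean hasBoundaryAt(int point) {
        return lowerBounds.containsKey(point);
    }

    public static void main(String[] args) {
        RangeTable<String> table = new RangeTable<String>();
        table.putFrom(35, "35 to 40");
        System.out.println(table.lookup(38));        // 35 to 40
        System.out.println(table.hasBoundaryAt(38)); // false
        System.out.println(table.lookup(34));        // null
    }
}
```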

HashSet.remove() and Iterator.remove() not working

I'm having problems with Iterator.remove() called on a HashSet.
I have a Set of time-stamped objects. Before adding a new item to the Set, I loop through the set, identify an old version of that data object and remove it (before adding the new object). The timestamp is included in hashCode and equals(), but not equalsData().
for (Iterator<DataResult> i = allResults.iterator(); i.hasNext();)
{
DataResult oldData = i.next();
if (data.equalsData(oldData))
{
i.remove();
break;
}
}
allResults.add(data);
The odd thing is that i.remove() silently fails (no exception) for some of the items in the set. I've verified:
The line i.remove() is actually called. I can call it from the debugger directly at the breakpoint in Eclipse and it still fails to change the state of Set
DataResult is an immutable object so it can't have changed after being added to the set originally.
The equals and hashCode() methods use @Override to ensure they are the correct methods. Unit tests verify these work.
This also fails if I just use a for statement and Set.remove instead. (e.g. loop through the items, find the item in the list, then call Set.remove(oldData) after the loop).
I've tested in JDK 5 and JDK 6.
I thought I must be missing something basic, but after spending some significant time on this my colleague and I are stumped. Any suggestions for things to check?
EDIT:
There have been questions - is DataResult truly immutable. Yes. There are no setters. And when the Date object is retrieved (which is a mutable object), it is done by creating a copy.
public Date getEntryTime()
{
return DateUtil.copyDate(entryTime);
}
public static Date copyDate(Date date)
{
return (date == null) ? null : new Date(date.getTime());
}
FURTHER EDIT (some time later):
For the record -- DataResult was not immutable! It referenced an object which had a hashcode which changed when persisted to the database (bad practice, I know). It turned out that if a DataResult was created with a transient subobject, and the subobject was persisted, the DataResult hashcode was changed.
Very subtle -- I looked at this many times and didn't notice the lack of immutability.

I was very curious about this one still, and wrote the following test:
import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;
import java.util.Set;
public class HashCodeTest {
private int hashCode = 0;
@Override public int hashCode() {
return hashCode ++;
}
public static void main(String[] args) {
Set<HashCodeTest> set = new HashSet<HashCodeTest>();
set.add(new HashCodeTest());
System.out.println(set.size());
for (Iterator<HashCodeTest> iter = set.iterator();
iter.hasNext();) {
iter.next();
iter.remove();
}
System.out.println(set.size());
}
}
which results in:
1
1
If the hashCode() value of an object has changed since it was added to the HashSet, it seems to render the object unremovable.
I'm not sure if that's the problem you're running into, but it's something to look into if you decide to re-visit this.

Under the covers, HashSet uses HashMap, which calls HashMap.removeEntryForKey(Object) when either HashSet.remove(Object) or Iterator.remove() is called. This method uses both hashCode() and equals() to validate that it is removing the proper object from the collection.
If both Iterator.remove() and HashSet.remove(Object) are not working, then something is definitely wrong with your equals() or hashCode() methods. Posting the code for these would be helpful in diagnosis of your issue.

Are you absolutely certain that DataResult is immutable? What is the type of the timestamp? If it's a java.util.Date are you making copies of it when you're initializing the DataResult? Keep in mind that java.util.Date is mutable.
For instance:
Date timestamp = new Date();
DataResult d = new DataResult(timestamp);
System.out.println(d.getTimestamp());
timestamp.setTime(System.currentTimeMillis());
System.out.println(d.getTimestamp());
Would print two different times.
It would also help if you could post some source code.
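For comparison, here is a sketch of the defensive copying being asked about. Stamped is a hypothetical stand-in for DataResult, reduced to its timestamp; the point is that the Date is copied both in the constructor and in the getter, so neither the caller's original Date nor a returned one can change the stored state or the hash code.

```java
import java.util.Date;

public class DefensiveCopyDemo {
    // Hypothetical stand-in for DataResult, reduced to its timestamp.
    static final class Stamped {
        private final Date entryTime;

        Stamped(Date entryTime) {
            this.entryTime = new Date(entryTime.getTime()); // copy on the way in
        }

        Date getEntryTime() {
            return new Date(entryTime.getTime());           // copy on the way out
        }

        @Override public boolean equals(Object o) {
            return o instanceof Stamped && ((Stamped) o).entryTime.equals(entryTime);
        }

        @Override public int hashCode() {
            return entryTime.hashCode();
        }
    }

    static boolean hashCodeIsStable() {
        Date callerDate = new Date(0L);
        Stamped stamped = new Stamped(callerDate);
        int before = stamped.hashCode();

        callerDate.setTime(1000L);              // mutate the caller's instance
        stamped.getEntryTime().setTime(2000L);  // mutate a returned copy

        return before == stamped.hashCode()
                && stamped.getEntryTime().getTime() == 0L;
    }

    public static void main(String[] args) {
        System.out.println(hashCodeIsStable()); // true
    }
}
```

Copying only in the getter, as in the question's edit, still leaves the constructor path open if the caller keeps a reference to the Date it passed in.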

You should all be careful of any Java Collection that fetches its children by hashcode, in the case that its child type's hashcode depends on its mutable state. An example:
HashSet<HashSet<?>> or HashSet<AbstractSet<?>> or HashMap variant:
HashSet retrieves an item by its hashCode, but its item type
is a HashSet, and hashSet.hashCode depends on its item's state.
Code for that matter:
HashSet<HashSet<String>> coll = new HashSet<HashSet<String>>();
HashSet<String> set1 = new HashSet<String>();
set1.add("1");
coll.add(set1);
print(set1.hashCode()); //---> will output X
set1.add("2");
print(set1.hashCode()); //---> will output Y
coll.remove(set1); // WILL FAIL TO REMOVE (SILENTLY)
The reason is that HashSet's remove method uses a HashMap, which locates keys by hashCode, while AbstractSet's hashCode is dynamic and depends on the set's own mutable contents.
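If a collection has already ended up in this state, one way to recover (a sketch, not the only option) is to copy it into a fresh HashSet, which re-hashes every element using its current hashCode. The class name RehashDemo is invented for this example.

```java
import java.util.HashSet;
import java.util.Set;

public class RehashDemo {
    // Returns { found before rebuild, found after rebuild, removed after rebuild }.
    static boolean[] run() {
        Set<Set<String>> coll = new HashSet<Set<String>>();
        Set<String> set1 = new HashSet<String>();
        set1.add("1");
        coll.add(set1);
        set1.add("2");                                   // hash code changes while stored

        boolean foundStale = coll.contains(set1);        // searched under the new hash code
        Set<Set<String>> rebuilt = new HashSet<Set<String>>(coll); // re-hashes every element
        boolean foundRebuilt = rebuilt.contains(set1);
        boolean removed = rebuilt.remove(set1);
        return new boolean[] { foundStale, foundRebuilt, removed };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // false true true
    }
}
```

The copy costs O(n), so it is a repair step rather than something to do on every removal; not mutating elements while they are stored remains the real fix.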

Thanks for all the help. I suspect the problem must be with equals() and hashCode() as suggested by spencerk. I did check those in my debugger and with unit tests, but I've got to be missing something.
I ended up doing a workaround-- copying all the items except one to a new Set. For kicks, I used Apache Commons CollectionUtils.
Set<DataResult> tempResults = new HashSet<DataResult>();
CollectionUtils.select(allResults,
new Predicate()
{
public boolean evaluate(Object oldData)
{
return !data.equalsData((DataResult) oldData);
}
}
, tempResults);
allResults = tempResults;
I'm going to stop here-- too much work to simplify down to a simple test case. But the help is much appreciated.

It's almost certainly the case the hashcodes don't match for the old and new data that are "equals()". I've run into this kind of thing before and you essentially end up spewing hashcodes for every object and the string representation and trying to figure out why the mismatch is happening.
If you're comparing items pre/post database, sometimes it loses the nanoseconds (depending on your DB column type) which can cause hashcodes to change.
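The effect of such precision loss can be sketched without a database. The example below assumes, purely for illustration, a column that keeps only milliseconds, and models the round trip with Instant.truncatedTo (java.time, so Java 8 or later, which is newer than the JDK 5/6 mentioned in the question). The class and method names are made up.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class PrecisionLossDemo {
    static boolean survivesRoundTrip(Instant original) {
        // Assumption: the column keeps milliseconds only, modelled here by truncation.
        Instant reloaded = original.truncatedTo(ChronoUnit.MILLIS);
        return original.equals(reloaded);
    }

    public static void main(String[] args) {
        System.out.println(survivesRoundTrip(Instant.ofEpochSecond(0, 123456789))); // false
        System.out.println(survivesRoundTrip(Instant.ofEpochSecond(0, 123000000))); // true
    }
}
```

Once the reloaded timestamp is no longer equal to the original, any object whose equals and hashCode include it will stop matching the copy that was stored before the round trip.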

Have you tried something like
boolean removed = allResults.remove(oldData);
if (!removed) // COMPLAIN BITTERLY!
In other words, remove the object from the Set and break the loop. That won't cause the Iterator to complain. I don't think this is a long term solution but would probably give you some information about the hashCode, equals and equalsData methods

The Java HashSet has an issue in "remove()" method. Check the link below. I switched to TreeSet and it works fine. But I need the O(1) time complexity.
https://bugs.openjdk.java.net/browse/JDK-8154740

If there are two entries with the same data, only one of them is replaced... have you accounted for that? And just in case, have you tried another collection data structure that doesn't use a hashcode, say a List?

I'm not up to speed on my Java, but I know that you can't remove an item from a collection when you are iterating over that collection in .NET, although .NET will throw an exception if it catches this. Could this be the problem?
