I have following scenario (modified one than actual business purpose).
I have a program which predicts how much calories a person will
loose for the next 13 weeks based on certain attributes.
I want to cache this result in the database so that i don't call the
prediction again for the same combination.
I have class person
class Person { int personId; String weekStartDate; }
I have HashMap<List<Person>, Integer> - The key is 13 weeks data of a person and the value is the prediction
I will keep the hashvalue in the database for caching purpose
Is there a better way to handle above scenario? Any design pattern to support such scenarios
Depends: the implementation of hashCode() uses the elements of your list. So adding elements later on changes the result of that operation:
public int hashCode() {
int hashCode = 1;
for (E e : this)
hashCode = 31*hashCode + (e==null ? 0 : e.hashCode());
return hashCode;
}
Maps aren't build for keys that can change their hash values! And of course, it doesn't really make sense to implement that method differently.
So: it can work when your lists are all immutable, meaning that neither the list nor any of its members is modified after the list was used as key. But there is a certain risk: if you forget about that contract later on, and these lists see modifications, then you will run into interesting issues.
This works because the hashcode of the standard List implementations is computed with the hashcodes of the contents. You need to make sure, however, to also implement hashCode and equals in the Person class, otherwise you will get the same problem this guy had. See also my answer on that question.
I would suggest you define a class (say Data) and use it as a key in your hashmap. Override equals/hashcode accordingly with knowledge of data over weeks.
Related
I am confused regarding to the internal working of HashMap and as far as I understood, HashMap uses hashCode() and equals() methods of the given object that is stored in map (that's why the implementation of these methods are so important). On the other hand, the sources that I followed explains that as if HashMap uses its internal hashCode-equals methods to compare key values. Which one is right? Could you pls clarify me? I know the scenario of put / remove, but just need the place of the hashCode-equals methods. Are they in HashMap or Object?
In HashMap, Keys are hashed and compared, not the values. So, if you want to use your own objects as a Key, you need to implement hashCode and equals methods. e.g. You may want to store Student and corresponding Marksheet in a HashMap then for using Student objects as Key, override hashCode and equals method in that class.
If you just want to use String as key, then HashMap would call hashCode and equals method of String class
It uses it on the Object to facilitate key lookup. Here is an example.
Imagine you had 1000 files of individuals in a box and you had the ID number of one you had to find. So you need to go thru the box and search thru 1000 files to find the correct one.
Now imagine that you had two boxes. And all the even id's were in one box marked even and all the odd ones in a box marked odd. By looking at the ID you can pick the proper box (even or odd). So statistically you would only have to search thru about 500 files.
So the hashCode generates "boxes" based on the key.
And the search for the exact key within a box uses equals.
In my example, hashCode() could return 0 or 1 via something like the following:
#Override
public int hashCode() {
return id % 2; // returns 0 or 1 for even or odd box
}
Ideally a hashCode that generates many boxes (within reason) is best as the boxes become smaller (holds fewer objects). And a hashCode of a fixed value will work but it reduces the map to a single box with no benefit of improved performance.
Given that I some class with various fields in it:
class MyClass {
private String s;
private MySecondClass c;
private Collection<someInterface> coll;
// ...
#Override public int hashCode() {
// ????
}
}
and of that, I do have various objects which I'd like to store in a HashMap. For that, I need to have the hashCode() of MyClass.
I'll have to go into all fields and respective parent classes recursively to make sure they all implement hashCode() properly, because otherwise hashCode() of MyClass might not take into consideration some values. Is this right?
What do I do with that Collection? Can I always rely on its hashCode() method? Will it take into consideration all child values that might exist in my someInterface object?
I OPENED A SECOND QUESTION regarding the actual problem of uniquely IDing an object here: How do I generate an (almost) unique hash ID for objects?
Clarification:
is there anything more or less unqiue in your class? The String s? Then only use that as hashcode.
MyClass hashCode() of two objects should definitely differ, if any of the values in the coll of one of the objects is changed. HashCode should only return the same value if all fields of two objects store the same values, resursively. Basically, there is some time-consuming calculation going on on a MyClass object. I want to spare this time, if the calculation had already been done with the exact same values some time ago. For this purpose, I'd like to look up in a HashMap, if the result is available already.
Would you be using MyClass in a HashMap as the key or as the value? If the key, you have to override both equals() and hashCode()
Thus, I'm using the hashCode OF MyClass as the key in a HashMap. The value (calculation result) will be something different, like an Integer (simplified).
What do you think equality should mean for multiple collections? Should it depend on element ordering? Should it only depend on the absolute elements that are present?
Wouldn't that depend on the kind of Collection that is stored in coll? Though I guess ordering not really important, no
The response you get from this site is gorgeous. Thank you all
#AlexWien that depends on whether that collection's items are part of the class's definition of equivalence or not.
Yes, yes they are.
I'll have to go into all fields and respective parent classes recursively to make sure they all implement hashCode() properly, because otherwise hashCode() of MyClass might not take into consideration some values. Is this right?
That's correct. It's not as onerous as it sounds because the rule of thumb is that you only need to override hashCode() if you override equals(). You don't have to worry about classes that use the default equals(); the default hashCode() will suffice for them.
Also, for your class, you only need to hash the fields that you compare in your equals() method. If one of those fields is a unique identifier, for instance, you could get away with just checking that field in equals() and hashing it in hashCode().
All of this is predicated upon you also overriding equals(). If you haven't overridden that, don't bother with hashCode() either.
What do I do with that Collection? Can I always rely on its hashCode() method? Will it take into consideration all child values that might exist in my someInterface object?
Yes, you can rely on any collection type in the Java standard library to implement hashCode() correctly. And yes, any List or Set will take into account its contents (it will mix together the items' hash codes).
So you want to do a calculation on the contents of your object that will give you a unique key you'll be able to check in a HashMap whether the "heavy" calculation that you don't want to do twice has already been done for a given deep combination of fields.
Using hashCode alone:
I believe hashCode is not the appropriate thing to use in the scenario you are describing.
hashCode should always be used in association with equals(). It's part of its contract, and it's an important part, because hashCode() returns an integer, and although one may try to make hashCode() as well-distributed as possible, it is not going to be unique for every possible object of the same class, except for very specific cases (It's easy for Integer, Byte and Character, for example...).
If you want to see for yourself, try generating strings of up to 4 letters (lower and upper case), and see how many of them have identical hash codes.
HashMap therefore uses both the hashCode() and equals() method when it looks for things in the hash table. There will be elements that have the same hashCode() and you can only tell if it's the same element or not by testing all of them using equals() against your class.
Using hashCode and equals together
In this approach, you use the object itself as the key in the hash map, and give it an appropriate equals method.
To implement the equals method you need to go deeply into all your fields. All of their classes must have equals() that matches what you think of as equal for the sake of your big calculation. Special care needs to be be taken when your objects implement an interface. If the calculation is based on calls to that interface, and different objects that implement the interface return the same value in those calls, then they should implement equals in a way that reflects that.
And their hashCode is supposed to match the equals - when the values are equal, the hashCode must be equal.
You then build your equals and hashCode based on all those items. You may use Objects.equals(Object, Object) and Objects.hashCode( Object...) to save yourself a lot of boilerplate code.
But is this a good approach?
While you can cache the result of hashCode() in the object and re-use it without calculation as long as you don't mutate it, you can't do that for equals. This means that calculation of equals is going to be lengthy.
So depending on how many times the equals() method is going to be called for each object, this is going to be exacerbated.
If, for example, you are going to have 30 objects in the hashMap, but 300,000 objects are going to come along and be compared to them only to realize that they are equal to them, you'll be making 300,000 heavy comparisons.
If you're only going to have very few instances in which an object is going to have the same hashCode or fall in the same bucket in the HashMap, requiring comparison, then going the equals() way may work well.
If you decide to go this way, you'll need to remember:
If the object is a key in a HashMap, it should not be mutated as long as it's there. If you need to mutate it, you may need to make a deep copy of it and keep the copy in the hash map. Deep copying again requires consideration of all the objects and interfaces inside to see if they are copyable at all.
Creating a unique key for each object
Back to your original idea, we have established that hashCode is not a good candidate for a key in a hash map. A better candidate for that would be a hash function such as md5 or sha1 (or more advanced hashes, like sha256, but you don't need cryptographic strength in your case), where collisions are a lot rarer than a mere int. You could take all the values in your class, transform them into a byte array, hash it with such a hash function, and take its hexadecimal string value as your map key.
Naturally, this is not a trivial calculation. So you need to think if it's really saving you much time over the calculation you are trying to avoid. It is probably going to be faster than repeatedly calling equals() to compare objects, as you do it only once per instance, with the values it had at the time of the "big calculation".
For a given instance, you could cache the result and not calculate it again unless you mutate the object. Or you could just calculate it again only just before doing the "big calculation".
However, you'll need the "cooperation" of all the objects you have inside your class. That is, they will all need to be reasonably convertible into a byte array in such a way that two equivalent objects produce the same bytes (including the same issue with the interface objects that I mentioned above).
You should also beware of situations in which you have, for example, two strings "AB" and "CD" which will give you the same result as "A" and "BCD", and then you'll end up with the same hash for two different objects.
For future readers.
Yes, equals and hashCode go hand in hand.
Below shows a typical implementation using a helper library, but it really shows the "hand in hand" nature. And the helper library from apache keeps things simpler IMHO:
#Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
MyCustomObject castInput = (MyCustomObject) o;
boolean returnValue = new org.apache.commons.lang3.builder.EqualsBuilder()
.append(this.getPropertyOne(), castInput.getPropertyOne())
.append(this.getPropertyTwo(), castInput.getPropertyTwo())
.append(this.getPropertyThree(), castInput.getPropertyThree())
.append(this.getPropertyN(), castInput.getPropertyN())
.isEquals();
return returnValue;
}
#Override
public int hashCode() {
return new org.apache.commons.lang3.builder.HashCodeBuilder(17, 37)
.append(this.getPropertyOne())
.append(this.getPropertyTwo())
.append(this.getPropertyThree())
.append(this.getPropertyN())
.toHashCode();
}
17, 37 .. those you can pick your own values.
From your clarifications:
You want to store MyClass in an HashMap as key.
This means the hashCode() is not allowed to change after adding the object.
So if your collections may change after object instantiation, they should not be part of the hashcode().
From http://docs.oracle.com/javase/8/docs/api/java/util/Map.html
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map.
For 20-100 objects it is not worth that you enter the risk of an inconsistent hash() or equals() implementation.
There is no need to override hahsCode() and equals() in your case.
If you don't overide it, java takes the unique object identity for equals and hashcode() (and that works, epsecially because you stated that you don't need an equals() considering the values of the object fields).
When using the default implementation, you are on the safe side.
Making an error like using a custom hashcode() as key in the HashMap when the hashcode changes after insertion, because you used the hashcode() of the collections as part of your object hashcode may result in an extremly hard to find bug.
If you need to find out whether the heavy calculation is finished, I would not absue equals(). Just write an own method objectStateValue() and call hashcode() on the collection, too. This then does not interfere with the objects hashcode and equals().
public int objectStateValue() {
// TODO make sure the fields are not null;
return 31 * s.hashCode() + coll.hashCode();
}
Another simpler possibility: The code that does the time consuming calculation can raise an calculationCounter by one as soon as the calculation is ready. You then just check whether or not the counter has changed. this is much cheaper and simpler.
Hi I'm wondering if it is possible to access the contents of a HashSet directly if you have the Hashcode for the object you're looking for, sort of like using the HashCode as a key in a HashMap.
I imagine it might work something sort of like this:
MyObject object1 = new MyObject(1);
Set<MyObject> MyHashSet = new HashSet<MyObject>();
MyHashSet.add(object1)
int hash = object1.getHashCode
MyObject object2 = MyHashSet[hash]???
Thanks!
edit: Thanks for the answers. Okay I understand that I might be pushing the contract of HashSet a bit, but for this particular project equality is solely determined by the hashcode and I know for sure that there will be only one object per hashcode/hashbucket. The reason I was pretty reluctant to use a HashMap is because I would need to convert the primitive ints I'm mapping with to Integer objects as a HashMap only takes in objects as keys, and I'm also worried that this might affect performance. Is there anything else I could do to implement something similar with?
The common implementation of HashSet is backed (rather lazily) by a HashMap so your effort to avoid HashMap is probably defeated.
On the basis that premature optimization is the root of all evil, I suggest you use a HashMap initially and if the boxing/unboxing overhead of int to and from Integer really is a problem you'll have to implement (or find) a handcrafted HashSet using primitive ints for comparison.
The standard Java library really doesn't want to concern itself with boxing/unboxing costs.
The whole language sold that performance issue for a considerable gain in simplicity long ago.
Notice that these days (since 2004!) the language automatically boxes and unboxes which reveals a "you don't need to be worrying about this" policy. In most cases it's right.
I don't know how 'richly' featured your HashKeyedSet needs to be but a basic hash-table is really not too hard.
HashSet is internally backed by a HashMap, which is unavailable through the public API unfortunately for this question. However, we can use reflection to gain access to the internal map and then find a key with an identical hashCode:
private static <E> E getFromHashCode(final int hashcode, HashSet<E> set) throws Exception {
// reflection stuff
Field field = set.getClass().getDeclaredField("map");
field.setAccessible(true);
// get the internal map
#SuppressWarnings("unchecked")
Map<E, Object> interalMap = (Map<E, Object>) (field.get(set));
// attempt to find a key with an identical hashcode
for (E elem : interalMap.keySet()) {
if (elem.hashCode() == hashcode) return elem;
}
return null;
}
Used in an example:
HashSet<String> set = new HashSet<>();
set.add("foo"); set.add("bar"); set.add("qux");
int hashcode = "qux".hashCode();
System.out.println(getFromHashCode(hashcode, set));
Output:
qux
This is not possible as HashSet is an object and there is no public API as such. Also multiple objects can have the same hashcode but the objects can be different.
Finally only arrays can be accessed using myArray[<index>] syntax.
You can easily write code that will directly access the internal data structures of the HashSet implementation using reflection. Of course, your code will depend on the implementation details of the particular JVM you are coding to. You also will be subject to the constraints of the SecurityManager (if any).
A typical implementation of HashSet uses a HashMap as its internal data structure. The HashMap has an array, which is indexed by the key's hashcode mapped to an index in the array. The hashcode mapping function is available by calling non-public methods in the implementation - you will have to read the source code and figure it out. Once you get to the right bucket, you will just need to find (using equals) the right entry in the bucket.
The best look-up structure is a HashTable. It provides constant access on average (linear in worst case).
This depends on the hash function. Ok.
My question is the following. Assuming a good implementation of a HashTable e.g. HashMap is there a best practice concerning the keys passed in the map?I mean it is recommended that the key must be an immutable object but I was wondering if there are other recommendations.
Example the size of the key? For example in a good hashmap (in the way described above) if we used String as keys, won't the "bottleneck" be in the string comparison for equals (trying to find the key)? So should the keys be kept small? Or are there objects that should not be used as keys? E.g. a URL? In such cases how can you choose what to use as a key?
The best performing key for an HashMap is probably an Integer, where hashCode() and equals() are implemented as:
public int hashCode() {
return value;
}
public boolean equals(Object obj) {
if (obj instanceof Integer) {
return value == ((Integer)obj).intValue();
}
return false;
}
Said that, the purpose of an HashMap is to map some object (value) to some others (key). The fact that a hash function is used to address the (value) objects is to provide fast, constant-time access.
it is recommended that the key must be an immutable object but I was wondering if there are other recommendations.
The recommendation is to Map objects to what you need: don't think what is faster; but think what is the best for your business logic to address the objects to retrieve.
The important requirement is that the key object must be immutable, because if you change the key object after storing it in the Map it may be not possible to retrieve the associated value later.
The key word in HashMap is Map. Your object should just map. If you sacrifice the mapping task optimizing the key, you are defeating the purpose of the Map - without probably achieving any performance boost.
I 100% agree with the first two comments in your question:
the major constraint is that it has to be the thing that you want to base the lookup on ;)
– Oli Charlesworth
The general rule is to use as the key whatever you need to look up with.
– Louis Wasserman
Remember the two rules for optimization:
Don't.
(for experts only) don't yet.
The third rule is: profile before to optimize.
You should use whatever key you want to use to lookup things in the data structure, it's typically a domain-specific constraint. With that said, keep in mind that both hashCode() and equals() will be used in finding a key in the table.
hashCode() is used to find the position of the key, while equals() is used to determine if the key you are searching for is actually the key that we just found using hashCode().
For example, consider two keys a and b that have the same hash code in a table using separate chaining. Then a search for a would require testing if a.equals(key) for potentially both a and b in the table once we find the index of the list containing a and b from hashCode().
it is recommended that the key must be an immutable object but I was wondering if there are other recommendations.
The key of the value should be final.
Most times a field of the object is used as key. If that field changes then the map cannot find it:
void foo(Employee e) {
map.put(e.getId(), e);
String newId = e.getId() + "new";
e.setId(newId);
Employee e2 = e.get(newId);
// e != e2 !
}
So Employee should not have a setId() method at all, but that is difficult because when you are writing Employee you don't know what it will be keyed by.
I digged up the implementation. I had an assumption that the effectiveness of the hashCode() method will be the key factor.
When I looked into the HashMap() and the Hashtable() implementation, I found that the implementation is quite similar (with one exception). Both are using and storing an internal hash code for all entries, so that's a good point that hashCode() is not so heavily influencing the performance.
Both are having a number of buckets, where the values are stored. It is important balance between the number of buckets (say n), and the average number of keys within a bucket (say k). The bucket is found in O(1) time, the content of the bucket is iterated in O(k) size, but the more bucket we have, the more memory will be allocated. Also, if many buckets are empty, it means that the hashCode() method for the key class does not the hashcode wide enough.
The algorithm works like this:
Take the `hashCode()` of the Key (and make a slight bijective transformation on it)
Find the appropriate bucket
Loop through the content of the bucket (which is some kind of LinkedList)
Make the comparison of the keys as follows:
1. Compare the hashcodes
(it is calculated in the first step, and stored for the entry)
2. Examine if key `==` the stored key (still no call)
(this step is missing from Hashtable)
3. Compare the keys by `key.equals(storedKey)`
To summarize:
hashCode() is called once per call (this is a must, you cannot do
without it)
equals() is called if the hashCode is not so well spread, and two keys happen to have the same hashcode
The same algorithm is for get() and put() (because in put() case you can set the value for an existing key). So, the most important thing is how the hashCode() method was implemented. That is the most frequently called method.
Two strategies are: make it fast and make it effective (well-spread). The JDK developers made efforts to make it both, but it's not always possible to have them both.
Numeric types are good
Object (and non-overriden classes) are good (hashCode() is native), except that you cannot specify an own equals()
String is not good, iterates through the characters, but caches after that (see my comment below)
Any class with synchronized hashCode() is not good
Any class that has an iteration is not good
Classes that have hashcode cache are a bit better (depends on the usage)
Comment on the String: To make it fast, in the first versions of JDK the String hash code calculation was made for the first 32 characters only. But the hashcode it produced was not well spread, so they decided to take all the characters into the hashcode.
I've been working all day and I somehow can't get this probably easy task figured out - probably a lack of coffee...
I have a synchronizedList where some Objects are being stored. Those objects have a field which is something like an ID. These objects carry information about a user and his current state (simplified).
The point is, that I only want one object for each user. So when the state of this user changes, I'd like to remove the "old" entry and store a new one in the List.
protected static class Objects{
...
long time;
Object ID;
...
}
...
if (Objects.contains(ID)) {
Objects.remove(ID);
Objects.add(newObject);
} else {
Objects.add(newObject);
}
Obviously this is not the way to go but should illustrate what I'm looking for...
Maybe the data structure is not the best for this purpose but any help is welcome!
EDIT:
Added some information...
A Set does not really seem to fit my purpose. The Objects store some other fields besides the ID which change all the time. The purpose is, that the list will somehow represent the latest activities of a user. I only need to track the last state and only keep that object which describes this situation.
I think I will try out re-arranging my code with a Map and see if that works...
You could use a HashMap (or LinkedHashMap/TreeMap if order is important) with a key of ID and a value of Objects. With generics that would be HashMap<Object, Objects>();
Then you can use
if (map.containsKey(ID)) {
map.remove(ID);
}
map.put(newID, newObject);
Alternatively, you could continue to use a List, but we can't just modify the collection while iterating, so instead we can use an iterator to remove the existing item, and then add the new item outside the loop (now that you're sure the old item is gone):
List<Objects> syncList = ...
for (Iterator<Objects> iterator = syncList.iterator(); iterator.hasNext();) {
Objects current = iterator.next();
if (current.getID().equals(ID)) {
iterator.remove();
}
}
syncList.add(newObject);
And you can't use a Set to have only the first one stored ?
because it basically is precisely what you require.
You could use a HashSet to store the objects and then override the hashCode method in the class that the HashSet will contain to return the hashcode of your identifying field.
A Map is easiest, but a Set reflects your logic better. In that case I'd advice a Set.
There are 2 ways to use a set, depending on the equals and hashCode of your data object.
If YourObject already uses the ID object to determine equals (and hashCode obeys the contract) you can use any Set you want, a HashSet is probably best then.
If YourObjects business logic requires a different equals, taking into account multiple fields beside the ID field, then a custom comparator should be used. A TreeSet is a Set which can use such a Comparator.
An example:
Comparator<MyObject> comp = new Comparator<MyObject>{
public int compare(MyObject o1, MyObject o2) {
// NOTE this compare is not very good as it obeys the contract but
// is not consistent with equals. compare() == 0 -> equals() != true here
// Better to use some more fields
return o1.getId().hashCode() < o2.getId().hashCode();
}
public boolean equals(Object other) {
return 01.getId().equals(o2.getId());
}
}
Set<MyObject> myObjects = new TreeSet(comp);
EDIT
I have updated the code above to reflect that id is not an int, as suggested by the question.
My first option would be a HashSet, this would require that you override the hashCode and equals methods (don't forget: if you override one, override consistently the other !) so that objects with the same ID field are considered equal.
But this might break something if this assumption is NOT to be made in other parts of your application. In that case you might opt for using a HashMap (with the ID as key) or implement your own MyHashSet class (backed by such a HashMap).