Accessing hidden getEntry(Object key) in HashMap - java

I have a similar problem to the one discussed here, but with a more practical use case.
For example, I have a Map<String, Integer> and a function which is given a key and, if the mapped integer value is negative, puts null into the map:
Map<String, Integer> map = new HashMap<String, Integer>();
public void nullifyIfNegative(String key) {
Integer value = map.get(key);
if (value != null && value.intValue() < 0) {
map.put(key, null);
}
}
In this case, the lookup (and hence the hashCode calculation for the key) is done twice: once for the lookup and once for the replacement. It would be nice to have another method (one that already exists inside HashMap) that makes this more efficient:
public void nullifyIfNegative(String key) {
Map.Entry<String, Integer> entry = map.getEntry(key);
if (entry != null && entry.getValue().intValue() < 0) {
entry.setValue(null);
}
}
The same applies when the map values are immutable objects that you want to manipulate:
Map<String, String>: I want to append something to the string value.
Map<String, int[]>: I want to insert a number into the array.
So the case is quite common. Solutions that might work, but not for me:
Reflection. It works, but I cannot sacrifice performance just for this nice feature.
Use org.apache.commons.collections.map.AbstractHashedMap (it has at least a protected getEntry() method), but unfortunately, commons-collections does not support generics.
Use generic commons-collections, but this library (AFAIK) is out of date (not in sync with the latest library version from Apache) and, critically, is not available in the central Maven repository.
Use value wrappers, which means making values mutable (e.g. mutable integers such as org.apache.commons.lang.mutable.MutableInt, or collections instead of arrays). This solution leads to memory overhead, which I would like to avoid.
Try to extend java.util.HashMap with a custom class implementation (which would have to live in the java.util package) and put it in the endorsed folder (because java.lang.ClassLoader will refuse to load it in Class<?> defineClass(String name, byte[] b, int off, int len), see sources). However, I don't want to patch the JDK, and the list of packages that can be endorsed does not seem to include java.util.
A similar request has already been raised on the sun.com bug tracker, but I would like to know the community's opinion and what the way out could be, keeping maximum memory and performance efficiency in mind.
If you agree that this is nice and beneficial functionality, please vote for this bug!

As a logical matter, you're right that the single getEntry would save you a hash lookup. As a practical matter, unless you have a specific use case where you have reason to be concerned about the performance hit (which seems pretty unlikely: hash lookup is common, O(1), and well optimized), what you're worrying about is probably negligible.
Why don't you write a test? Create a HashMap with a few tens of millions of objects, or whatever is an order of magnitude greater than what your application is likely to create, and average the time of a get() over a million or so iterations (hint: it's going to be a very small number).
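The test suggested above could look roughly like this. It is a minimal sketch assuming String keys; the class GetTiming and its method are hypothetical names. Naive System.nanoTime() loops are skewed by JIT warm-up and GC, so for trustworthy numbers a harness such as JMH (the OpenJDK Java Microbenchmark Harness) is the better tool.

```java
import java.util.HashMap;
import java.util.Map;

// Crude timing sketch, not a rigorous benchmark: JIT warm-up, GC and CPU
// frequency scaling all skew naive System.nanoTime() measurements.
class GetTiming {
    static volatile long blackhole; // consuming results stops the JIT from dropping the loop

    static double averageGetNanos(int size, int lookups) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            map.put("key" + i, i);
        }
        // Build the lookup keys up front so string concatenation is not timed.
        String[] keys = new String[lookups];
        for (int i = 0; i < lookups; i++) {
            keys[i] = "key" + (i % size);
        }
        long sink = 0;
        long start = System.nanoTime();
        for (String k : keys) {
            sink += map.get(k); // every key is present, so unboxing is safe
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / lookups;
    }

    public static void main(String[] args) {
        averageGetNanos(100_000, 1_000_000); // warm-up run
        // Scale the size toward what your application will actually hold.
        System.out.printf("average get(): %.1f ns%n", averageGetNanos(1_000_000, 1_000_000));
    }
}
```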
A bigger issue with what you're doing is synchronization. If you're doing conditional alterations on a map you could run into issues, even if you're using a synchronized map, because you'd have to hold a lock spanning both the get() and put() operations.
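One way to hold a single lock across both calls, assuming the map is wrapped with Collections.synchronizedMap: that wrapper synchronizes every method on itself, so an explicit synchronized block on the same object makes the get()/put() pair atomic with respect to all other calls made through the wrapper. SyncNullify is a hypothetical name; nullifyIfNegative mirrors the question's method.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SyncNullify {
    // The synchronizedMap wrapper uses itself as the monitor for every call.
    private final Map<String, Integer> map =
            Collections.synchronizedMap(new HashMap<String, Integer>());

    void put(String key, Integer value) {
        map.put(key, value);
    }

    Integer get(String key) {
        return map.get(key);
    }

    boolean containsKey(String key) {
        return map.containsKey(key);
    }

    public void nullifyIfNegative(String key) {
        // Holding the wrapper's monitor: no other thread can modify the map
        // through the wrapper between the get() and the put().
        synchronized (map) {
            Integer value = map.get(key);
            if (value != null && value < 0) {
                map.put(key, null);
            }
        }
    }
}
```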

Not pretty, but you could use a lightweight object to hold a reference to the actual value and avoid a second lookup.
HashMap<String, String[]> map = ...;
// append value to the current value of key
String key = "key";
String value = "value";
// I use an array to hold a reference - even uglier than the whole idea itself ;)
String[] ref = new String[1]; // lightweight object
String[] prev = map.put(key, ref);
ref[0] = (prev != null) ? prev[0] + value : value;
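A runnable version of the array-holder trick, under the same assumption as the snippet above (a HashMap<String, String[]> whose values are one-element arrays). HolderAppend is a hypothetical name. append() touches the map only once per call because put() hands back the previous holder, whose content is then folded into the new one.

```java
import java.util.HashMap;
import java.util.Map;

class HolderAppend {
    // Each value is a one-element String[] acting as a mutable reference cell.
    private final Map<String, String[]> map = new HashMap<>();

    // One map operation per append: put() installs the new cell and returns
    // the old one, so no separate get() is needed.
    void append(String key, String value) {
        String[] ref = new String[1];
        String[] prev = map.put(key, ref);
        ref[0] = (prev != null) ? prev[0] + value : value;
    }

    String get(String key) {
        String[] ref = map.get(key);
        return (ref != null) ? ref[0] : null;
    }
}
```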
I wouldn't worry about hash lookup performance too much though (Steve B's answer is pretty good in pointing out why). Especially with String keys, I wouldn't worry too much about hashCode() as its result is cached. You could worry about equals() though as it might be called more than once per lookup. But for short strings (which are often used as keys) this is negligible too.

There is no performance gain from this proposal, because the average-case performance of a Map is O(1). But enabling access to the raw Entry would raise another problem: it would become possible to change the key in an entry (even if only via reflection) and thereby break the ordering of the internal array, since the entry would no longer sit in the bucket its new key hashes to.

Related

Java: one of many keys map

I have a Map which may contain one of the following keys:
Map<String, String> map = getMap();
Now I want to check whether one of several keys is set. My current approach is to chain multiple map.getOrDefault(...) calls:
Address address = new Address();
address.setStreet(map.getOrDefault("STORE_STREET",
        map.getOrDefault("OFFICE_STREET", ...)));
Or I check for each key whether it exists in the map:
if(map.containsKey("STORE_STREET")){
address.setStreet(map.get("STORE_STREET"));
}else if(map.containsKey("OFFICE_STREET")){
address.setStreet(map.get("OFFICE_STREET"));
}
Is there any way to make this easier/better to read? Unfortunately the map is given as such.
Normally, getOrDefault would be the way to go, but if you have multiple alternative keys, this not only hurts readability but also turns the performance advantage into the opposite. With code like:
address.setStreet(map.getOrDefault("STORE_STREET", map.getOrDefault("OFFICE_STREET", ...)));
You are looking up the alternative keys first, to get the fall-back value, before even looking whether the primary key (or a key with a higher precedence) is present.
One solution would be
Stream.of("STORE_STREET", "OFFICE_STREET", ...)
.map(map::get)
.filter(Objects::nonNull)
.findFirst()
.ifPresent(address::setStreet);
When this is executed only once, its performance might be lower than that of a simple loop due to the higher initialization overhead, but then the difference is irrelevant anyway. For frequent execution, there will be no significant difference, so you should decide based on readability (which is subjective, of course).
String[] keys = {"STORE_STREET", "OFFICE_STREET", ...};
for (String k : keys)
{
if (map.containsKey(k))
return map.get(k);
}
return ""; // or throw an exception
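Both answers can be folded into a small helper that consults the keys in priority order and stops at the first hit. This is a sketch: KeyFallback and firstValue are hypothetical names, and it treats a key mapped to null the same as an absent key, which differs slightly from the containsKey loop.

```java
import java.util.Map;
import java.util.Optional;

class KeyFallback {
    // Returns the value of the first key (in priority order) that is present.
    // Lower-priority keys are only consulted if no earlier key matched,
    // unlike nested getOrDefault calls, which evaluate the fallbacks first.
    static Optional<String> firstValue(Map<String, String> map, String... keys) {
        for (String k : keys) {
            String v = map.get(k); // one lookup per key; a null value counts as absent
            if (v != null) {
                return Optional.of(v);
            }
        }
        return Optional.empty();
    }
}
```

At the call site this reads as KeyFallback.firstValue(map, "STORE_STREET", "OFFICE_STREET").ifPresent(address::setStreet);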

How to make an efficient hashCode?

I have three hashCode methods as follows, I prioritised them based on their efficiency. I am wondering if there is any other way to make a more efficient hashCode method.
1) public int hashCode() { //terrible
return 5;
}
2) public int hashCode() { //a bit less terrible
return name.length();
}
3) public int hashCode() { //better
final int prime = 31;
int result = 1;
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
There is no surefire way to guarantee that your hashcode function is optimal because it is measured by two different metrics.
Efficiency - How quick it is to calculate.
Collisions - What is the chance of collision.
Your three methods:
1) maximises efficiency at the expense of collisions.
2) finds a spot somewhere in the middle, but is still not good.
3) is the least efficient but the best at avoiding collisions, though still not necessarily the best.
You have to find the balance yourself.
Sometimes it is obvious when there is a very efficient method that never collides (e.g. the ordinal of an enum).
Sometimes memoising the values is a good solution: this way even a very inefficient method can be mitigated because it is only ever calculated once. There is an obvious memory cost to this, which also must be balanced.
Sometimes the overall functionality of your code contributes to your choice. Say you want to put File objects in a HashMap. A number of options are clear:
Use the hashcode of the file name.
Use the hashcode of the file path.
Use a crc of the contents of the file.
Use the hashcode of the SHA1 digest of the contents of the file.
Why collisions are bad
One of the main uses of hashcode is when inserting objects into a HashMap. The algorithm requests a hash code from the object and uses that to decide which bucket to put the object in. If the hash collides with another object there will be another object in that bucket, in which case the bucket will have to grow which costs time. If all hashes are unique then the map will be one item per bucket and thus maximally efficient.
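To make the bucket idea concrete, the sketch below mimics how OpenJDK's HashMap (Java 8 and later) turns a hash code into a bucket index: it XORs the upper 16 bits into the lower ones and masks with the table length minus one. This is an implementation detail rather than part of the Map contract, and BucketSpread is a hypothetical name. With the question's "return 5" version, every key lands in the same bucket.

```java
import java.util.HashSet;
import java.util.Set;

class BucketSpread {
    // Bucket selection as done by OpenJDK 8+ HashMap; tableLength must be a power of two.
    static int bucketIndex(int hashCode, int tableLength) {
        int h = hashCode ^ (hashCode >>> 16); // fold high bits into the low bits
        return h & (tableLength - 1);
    }

    // Counts how many distinct buckets the given hash codes occupy.
    static int distinctBuckets(int[] hashCodes, int tableLength) {
        Set<Integer> used = new HashSet<>();
        for (int hc : hashCodes) {
            used.add(bucketIndex(hc, tableLength));
        }
        return used.size();
    }

    public static void main(String[] args) {
        String[] names = {"alice", "bob", "carol", "dave", "eve"};
        int[] constant = new int[names.length]; // hashCode() { return 5; }
        int[] byName = new int[names.length];   // hashCode() based on name.hashCode()
        for (int i = 0; i < names.length; i++) {
            constant[i] = 5;
            byName[i] = names[i].hashCode();
        }
        System.out.println(distinctBuckets(constant, 16)); // 1: every key shares one bucket
        System.out.println(distinctBuckets(byName, 16));
    }
}
```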
See the excellent Wikipedia article on hash tables for a deeper discussion of how HashMap works.
I prioritised them based on their efficiency
Your list is sorted by ascending efficiency, if by "efficiency" you mean the performance of your application as opposed to the latency of the hashCode method in isolation. A hashcode with bad dispersion will result in a linear or near-linear search through a linked list inside HashMap, completely annulling the advantages of a hashtable.
Especially note that, on today's architectures, computation is much cheaper than pointer dereference, and it comes at a fixed low cost. A single cache miss is worth a thousand simple arithmetic operations and each pointer dereference is a potential cache miss.
In addition to the valuable answers so far, I'd like to add some other methods to consider:
3a):
public int hashCode() {
return Objects.hashCode(name);
}
Not many pros/cons in terms of performance, but a bit more concise.
4.) You should either provide more information about the class you are talking about, or reconsider your design. If you use a class as the key of a hash map and the only property of this class is a String, you might as well use the String directly. So option 4 is:
// Changing this...
Map<Key, Value> map;
map.put(key, value);
Value value = map.get(key);
// ... to this:
Map<String, Value> map;
map.put(key.getName(), value);
Value value = map.get(key.getName());
(And if this is not possible, because the "name" of a Key might change after it has been created, you're in bigger trouble anyhow - see the next point)
5.) Maybe you can precompute the hash code. In fact, this is also done in the java.lang.String class:
public final class String
implements java.io.Serializable, Comparable<String>, CharSequence {
...
/** Cache the hash code for the string */
private int hash; // Default to 0
But of course, this only makes sense for immutable classes. You should be aware of the fact that using mutable classes as keys of a Map is "dangerous" and may lead to consistency errors, and should only be done when you're absolutely sure that the instances that are used as keys won't change.
So if you want to use your class as the keys, and maybe your class even has more fields than just a single one, then you could store the hash code as a field:
class Key
{
private final String name;
... // Other fields...
private final int hashCode;
Key(String name, ...)
{
this.name = name;
... // Other fields
// Pre-compute and store the hash code:
this.hashCode = computeHashCode();
}
private int computeHashCode()
{
int result = 31;
result = 31 * result + Objects.hashCode(name);
result = 31 * result + ... // Other fields
return result;
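}
@Override
public int hashCode()
{
return hashCode; // Return the pre-computed hash code
}
// equals(Object) must also be overridden, comparing name and the other fields
}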
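A complete, compilable variant of the Key class above, assuming a single extra int field (the hypothetical id) stands in for the "other fields". It also overrides equals(), which any class used as a HashMap key needs alongside hashCode().

```java
import java.util.Objects;

// Immutable key whose hash code is computed once, in the constructor.
final class CachedKey {
    private final String name;
    private final int id;        // hypothetical stand-in for "other fields"
    private final int hashCode;

    CachedKey(String name, int id) {
        this.name = name;
        this.id = id;
        this.hashCode = computeHashCode(); // all fields are assigned at this point
    }

    private int computeHashCode() {
        int result = 31;
        result = 31 * result + Objects.hashCode(name);
        result = 31 * result + id;
        return result;
    }

    @Override
    public int hashCode() {
        return hashCode; // no recomputation on each HashMap lookup
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedKey)) return false;
        CachedKey other = (CachedKey) o;
        // Comparing the cached hashes first gives a cheap early exit.
        return hashCode == other.hashCode && id == other.id && Objects.equals(name, other.name);
    }
}
```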
My answer is going a different path - basically it is not answer, but a question: why do you worry about the performance of hashCode()?
Did you do exhaustive profiling of your application and find that there is a performance problem originating from that one method on some of your objects?
If the answer to that question is "no" ... then why do you think you need to worry about this one method? Why do you think that the default generated by Eclipse, probably used billions of times each day, isn't good enough for you?
See here for explanations of why it is, in general, a very bad idea to waste one's time with such questions.
Yes, there are better alternatives.
xxHash or MurmurHash3 are general-purpose hashing algorithms that are both faster and better in quality.

Accessing a HashSet using the HashCode directly? (Java)

Hi I'm wondering if it is possible to access the contents of a HashSet directly if you have the Hashcode for the object you're looking for, sort of like using the HashCode as a key in a HashMap.
I imagine it might work something sort of like this:
MyObject object1 = new MyObject(1);
Set<MyObject> MyHashSet = new HashSet<MyObject>();
MyHashSet.add(object1);
int hash = object1.hashCode();
MyObject object2 = MyHashSet[hash]; // ??? not valid Java, but this is the kind of lookup I mean
Thanks!
edit: Thanks for the answers. Okay I understand that I might be pushing the contract of HashSet a bit, but for this particular project equality is solely determined by the hashcode and I know for sure that there will be only one object per hashcode/hashbucket. The reason I was pretty reluctant to use a HashMap is because I would need to convert the primitive ints I'm mapping with to Integer objects as a HashMap only takes in objects as keys, and I'm also worried that this might affect performance. Is there anything else I could do to implement something similar with?
The common implementation of HashSet is backed (rather lazily) by a HashMap so your effort to avoid HashMap is probably defeated.
On the basis that premature optimization is the root of all evil, I suggest you use a HashMap initially and if the boxing/unboxing overhead of int to and from Integer really is a problem you'll have to implement (or find) a handcrafted HashSet using primitive ints for comparison.
The standard Java library really doesn't want to concern itself with boxing/unboxing costs.
The whole language traded away that performance cost for a considerable gain in simplicity long ago.
Notice that these days (since 2004!) the language automatically boxes and unboxes which reveals a "you don't need to be worrying about this" policy. In most cases it's right.
I don't know how 'richly' featured your HashKeyedSet needs to be but a basic hash-table is really not too hard.
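A sketch of such a basic hash table, keyed by primitive int so no Integer objects are created. It uses open addressing with linear probing; IntKeyedMap is a hypothetical name, and the sketch deliberately omits removal, iteration and thread safety.

```java
// Minimal open-addressing hash table keyed by primitive int (no Integer boxing).
// Sketch only: no removal, no iteration, not thread-safe, null values rejected.
class IntKeyedMap<V> {
    private int[] keys = new int[16];          // capacity stays a power of two
    private Object[] values = new Object[16];  // values[i] == null marks an empty slot
    private int size;

    private static int slot(int key, int mask) {
        int h = key * 0x9E3779B9;              // multiplicative mixing spreads clustered keys
        return (h ^ (h >>> 16)) & mask;
    }

    // Associates a non-null value with key; returns the previous value or null.
    @SuppressWarnings("unchecked")
    V put(int key, V value) {
        if (value == null) throw new IllegalArgumentException("null values not supported");
        if (size * 2 >= keys.length) grow();   // keep the load factor at most 0.5
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (values[i] != null) {
            if (keys[i] == key) {
                V old = (V) values[i];
                values[i] = value;
                return old;
            }
            i = (i + 1) & mask;                // linear probing
        }
        keys[i] = key;
        values[i] = value;
        size++;
        return null;
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (values[i] != null) {
            if (keys[i] == key) return (V) values[i];
            i = (i + 1) & mask;
        }
        return null;
    }

    int size() {
        return size;
    }

    // Doubles the table and re-inserts every entry at its new slot.
    @SuppressWarnings("unchecked")
    private void grow() {
        int[] oldKeys = keys;
        Object[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new Object[oldValues.length * 2];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldValues[j] != null) put(oldKeys[j], (V) oldValues[j]);
        }
    }
}
```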
HashSet is internally backed by a HashMap, which is unavailable through the public API unfortunately for this question. However, we can use reflection to gain access to the internal map and then find a key with an identical hashCode:
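import java.lang.reflect.Field;
import java.util.HashSet;
import java.util.Map;

private static <E> E getFromHashCode(final int hashcode, HashSet<E> set) throws Exception {
// reflection stuff: "map" is a private field of the JDK's HashSet implementation.
// Looking it up on HashSet.class also works for subclasses such as LinkedHashSet.
// On Java 16+ this needs --add-opens java.base/java.util=ALL-UNNAMED, otherwise
// setAccessible throws InaccessibleObjectException (Java 9 to 15 only warn by default).
Field field = HashSet.class.getDeclaredField("map");
field.setAccessible(true);
// get the internal map
@SuppressWarnings("unchecked")
Map<E, Object> internalMap = (Map<E, Object>) field.get(set);
// attempt to find a key with an identical hashcode (a linear scan over all keys)
for (E elem : internalMap.keySet()) {
if (elem.hashCode() == hashcode) return elem;
}
return null;
}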
Used in an example:
HashSet<String> set = new HashSet<>();
set.add("foo"); set.add("bar"); set.add("qux");
int hashcode = "qux".hashCode();
System.out.println(getFromHashCode(hashcode, set));
Output:
qux
This is not possible as HashSet is an object and there is no public API as such. Also multiple objects can have the same hashcode but the objects can be different.
Finally only arrays can be accessed using myArray[<index>] syntax.
You can easily write code that will directly access the internal data structures of the HashSet implementation using reflection. Of course, your code will depend on the implementation details of the particular JVM you are coding to. You also will be subject to the constraints of the SecurityManager (if any).
A typical implementation of HashSet uses a HashMap as its internal data structure. The HashMap has an array, which is indexed by the key's hashcode mapped to an index in the array. The hashcode mapping function is available by calling non-public methods in the implementation - you will have to read the source code and figure it out. Once you get to the right bucket, you will just need to find (using equals) the right entry in the bucket.

Data lookup method for small data set with Java?

We have to look up some data based on three input data fields. The lookup has to be fast. There are only about 20 possible lookup combinations. We've implemented this using a static HashMap instance where we create a key by concatenating the three data fields. Is there a better way to do this, or is this the way to go? Code is below.
Update: I'm not implying that this code is slow. Just curious if there is a better way to do this. I thought there might be a more elegant solution but I'm happy to keep this in place if there are no compelling alternatives!
We create a class-level static HashMap instance:
private static final Map<String, String> map = new HashMap<>();
How we load data into memory:
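// DataRow is a hypothetical name for the (unspecified) type of the loaded objects
private void load(Iterator<DataRow> iterator) {
while (iterator.hasNext()) {
DataRow o = iterator.next();
String key = o.getField1() + "-" + o.getField2() + "-" + o.getField3();
map.put(key, o.getData());
}
}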
And how we look up the data based on the three fields:
private String getData(String f1, String f2, String f3) {
String key = f1 + "-" + f2 + "-" + f3;
return map.get(key);
}
Well, the question to ask yourself is of course "is it fast enough?" Because unless your application needs to be speedier and this is the bottleneck, it really doesn't matter. What you've got is already reasonably efficient.
That being said, if you want to squeeze every bit of speed possible out of this routine (without rewriting it in assembly language ;-) you might consider using an array instead of a HashMap, since there are only a small, limited number of keys. You'd have to develop some sort of hash function that hashes each object to a unique number between 0 and 19 (or however many elements you actually have). You may also be able to optimize the implementation of that hash function, although I couldn't tell you how exactly to do that without knowing the details of the objects you're working with.
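One way to get such a collision-free index, assuming each of the three fields can only take a few known values: model each field as an enum and combine the ordinals into a mixed-radix number. The enums Region, Channel and Tier are hypothetical stand-ins, since the question doesn't say what the fields contain, and this ordinal encoding is a swapped-in technique rather than the answer's own hash function.

```java
// Hypothetical value sets for the three lookup fields.
enum Region { NORTH, SOUTH }
enum Channel { WEB, STORE }
enum Tier { BASIC, GOLD, PLATINUM }

class ArrayLookup {
    private static final int CHANNELS = Channel.values().length;
    private static final int TIERS = Tier.values().length;

    // One slot per combination (2 * 2 * 3 = 12 here); no hashing, no collisions.
    private final String[] table = new String[Region.values().length * CHANNELS * TIERS];

    // Mixed-radix encoding of the three ordinals: unique for every combination.
    private static int index(Region r, Channel c, Tier t) {
        return (r.ordinal() * CHANNELS + c.ordinal()) * TIERS + t.ordinal();
    }

    void put(Region r, Channel c, Tier t, String data) {
        table[index(r, c, t)] = data;
    }

    String get(Region r, Channel c, Tier t) {
        return table[index(r, c, t)];
    }
}
```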
You could create a special key object having three String fields to avoid building up the key string:
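import java.util.Objects;

final class MapKey {
public final String k1;
public final String k2;
public final String k3;
public MapKey(String k1, String k2, String k3) {
this.k1 = k1; this.k2 = k2; this.k3 = k3;
}
// DataRow is a hypothetical name for the type of the loaded objects
public MapKey(DataRow o) {
this(o.getField1(), o.getField2(), o.getField3());
}
@Override
public int hashCode() {
return k1.hashCode(); // if k1 is likely to be the same, also add hashes from k2 and k3
}
// equals() is required too: without it, HashMap compares keys by identity
@Override
public boolean equals(Object obj) {
if (this == obj) return true;
if (!(obj instanceof MapKey)) return false;
MapKey other = (MapKey) obj;
return Objects.equals(k1, other.k1) && Objects.equals(k2, other.k2) && Objects.equals(k3, other.k3);
}
}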
In your case I would keep using the implementation you outlined. For a large list of constant keys mapping to constant data, you could use Minimal Perfect Hashing. As it is not trivial to code this, and I am not sure about existing libraries, you have to consider the implementation cost before using this.
I think your approach is pretty fast. Any gains by implementing your own hashing algorithm would be very small, especially compared to the effort required.
One remark about your key format. You had better make sure that your separator cannot occur in the fields' toString() values, otherwise you might get key collisions:
field1="a-", field2="b-", field3="c" -> key="a--b--c"
field1="a", field2="-b", field3="-c" -> key="a--b--c"
Concatenating strings is a bad idea for creating a key. My main objection is that it is unclear. But in practice a significant proportion of implementations have bugs, notably that the separator can actually occur in the strings. In terms of performance, I have seen a program speed up ten percent simply by changing the key from a string hack to a meaningful key object. (If you really must be lazy about code, you can use Arrays.asList to make the key - see the List.equals API doc.)
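The separator collision described above, and the Arrays.asList alternative, can be checked directly. CompositeKeys is a hypothetical name. A list returned by Arrays.asList still allows set(), so such keys must not be modified after they have been inserted into a map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CompositeKeys {
    // Naive string key: different field triples can produce the same string.
    static String stringKey(String f1, String f2, String f3) {
        return f1 + "-" + f2 + "-" + f3;
    }

    // List key: List.equals/hashCode compare element by element,
    // so no separator is needed and no ambiguity can arise.
    static List<String> listKey(String f1, String f2, String f3) {
        return Arrays.asList(f1, f2, f3);
    }

    public static void main(String[] args) {
        System.out.println(stringKey("a-", "b-", "c").equals(stringKey("a", "-b", "-c"))); // true: collision
        System.out.println(listKey("a-", "b-", "c").equals(listKey("a", "-b", "-c")));     // false

        Map<List<String>, String> map = new HashMap<>();
        map.put(listKey("a-", "b-", "c"), "first");
        map.put(listKey("a", "-b", "-c"), "second");
        System.out.println(map.size()); // 2
    }
}
```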
Since you only have 20 combinations it might be feasible to handcraft a "give me the index 1..20 of this combination" based on knowing the characteristics of each combination.
Are you in a position to list the exact list of combinations?
Another way to get this done is to create an Object to handle your key, in which you can override equals() (and hashCode()) to do a test against an incoming key, testing field1, field2 and field3 in turn.
EDIT (in response to comment):
As the value returned from hashCode() is used by your Map to put your keys into buckets (from which it then will test equals), the value could theoretically be the same for all keys. I wouldn't suggest doing that, however, as you would not reap the benefits of HashMap's performance. You would essentially be iterating over all of your items in a bucket and testing equals().
One approach you could take would be to delegate the call to hashCode() to one of the values in your key container. You could always return the hashCode from field3, for example. In this case, you will distribute your keys to potentially as many buckets as there are distinct values for field3. Once your HashMap finds the bucket, it will still need to iterate over the items in the bucket to test the result of equals() until it finds a match.
Another option would be the sum of the values returned by hashCode() on all of your fields. As just discussed, this value does not need to be unique. Furthermore, the potential for collisions, and therefore larger buckets, is much smaller. With that in mind, your lookups on the HashMap should be quicker.
EDIT2:
The question of a good hash code for this key has been answered in a separate question here.

What's the quickest way to remove an element from a Map by value in Java?

Currently I'm using:
DomainObj valueToRemove = new DomainObj();
String removalKey = null;
for (Map.Entry<String, DomainObj> entry : map.entrySet()) {
if (valueToRemove.equals(entry.getValue())) {
removalKey = entry.getKey();
break;
}
}
if (removalKey != null) {
map.remove(removalKey);
}
The correct and fast one-liner would actually be:
while (map.values().remove(valueObject));
Kind of strange that most examples above assume the valueObject to be unique.
Here's the one-line solution:
map.values().remove(valueToRemove);
That's probably faster than defining your own iterator, since the JDK collection code has been significantly optimized.
As others have mentioned, a bimap will have faster value removes, though it requires more memory and takes longer to populate. Also, a bimap only works when the values are unique, which may or may not be the case in your code.
Without using a bidirectional map (commons-collections and google collections have them), you're stuck with iterating over the Map.
map.values().removeAll(Collections.singleton(null));
Following How to filter "Null" values from HashMap<String, String>?, in Java 8 we can do the following:
map.values().removeIf(valueToRemove::equals);
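The one-liners above differ in how many matches they remove. A small comparison, assuming a HashMap<String, String> in which one value occurs twice; RemoveByValue is a hypothetical name. Every variant is a linear scan of the values, and the while loop variant additionally restarts the scan after each removal, so removeIf is the simpler choice when all duplicates must go.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class RemoveByValue {
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("a", "x");
        map.put("b", "y");
        map.put("c", "x");
        return map;
    }

    public static void main(String[] args) {
        // values().remove(v): removes ONE mapping whose value equals v
        // (which one depends on HashMap iteration order).
        Map<String, String> m1 = sample();
        m1.values().remove("x");
        System.out.println(m1.size()); // 2

        // removeIf (Java 8+): removes EVERY mapping whose value equals v, in one pass.
        Map<String, String> m2 = sample();
        m2.values().removeIf("x"::equals);
        System.out.println(m2); // {b=y}

        // removeAll with a singleton collection: also removes every match.
        Map<String, String> m3 = sample();
        m3.values().removeAll(Collections.singleton("x"));
        System.out.println(m3); // {b=y}
    }
}
```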
If you don't have a reverse map, I'd go for an iterator.
DomainObj valueToRemove = new DomainObj();
for (
Iterator<Map.Entry<String, DomainObj>> iter = map.entrySet().iterator();
iter.hasNext();
) {
Map.Entry<String, DomainObj> entry = iter.next();
if (valueToRemove.equals(entry.getValue())) {
iter.remove();
break; // if only want to remove first match.
}
}
You could always use the values collection, since any changes made to that collection will result in the change being reflected in the map. So if you were to call Map.values().remove(valueToRemove) that should work - though I'm not sure if you'll see performance better than what you have with that loop. One idea would be to extend or override the map class such that the backing collection then is always sorted by value - that would enable you to do a binary search on the value which may be faster.
Edit: This is essentially the same as Alcon's answer except I don't think his will work since the entrySet is still going to be ordered by key - in which case you can't call .remove() with the value.
This is also assuming that the value is supposed to be unique or that you would want to remove any duplicates from the Map as well.
I would use this:
Map<Integer, String> x = new HashMap<>();
x.put(1, "value1");
x.put(2, "value2");
x.put(3, "value3");
x.put(4, "value4");
x.put(5, "value5");
x.put(6, "value6");
x.values().remove("value4");
edit:
because objects are referenced by "pointer" not by value.
If you have no way to figure out the key from the DomainObj, then I don't see how you can improve on that. There's no built in method to get the key from the value, so you have to iterate through the map.
If this is something you're doing all the time, you might maintain two maps (string->DomainObj and DomainObj->Key).
Like most of the other posters have said, it's generally an O(N) operation because you're going to have to look through the whole list of hashtable values regardless. @tackline has the right solution for keeping the memory usage at O(1) (I gave him an up-vote for that).
Your other option is to sacrifice memory space for the sake of speed. If your map is reasonably sized, you could store two maps in parallel.
If you have a Map<String, DomainObj>, then maintain a Map<DomainObj, String> in parallel to it. When you insert/remove on one map, do it on the other too. Granted, this is uglier because you're wasting space and you'll have to make sure the "hashCode" method of DomainObj is written properly, but your removal time drops from O(N) to O(1) because you can look up the key/object mapping in constant time in either direction.
Not generally the best solution, but if your number one concern is speed, I think this is probably as fast as you're gonna get.
====================
Addendum: This is essentially what @msaeed suggested, just without the third-party library.
A shorter usage of iterator is to use a values() iterator.
DomainObj valueToRemove = new DomainObj();
for (Iterator<DomainObj> it = map.values().iterator(); it.hasNext();) {
if (valueToRemove.equals(it.next())) {
it.remove();
break;
}
}
We know this situation arises rarely, but when it does, a bidirectional map is extremely helpful. I'd prefer BidiMap from org.apache.commons.collections.
I don't think this will happen only once in the lifetime of your app.
So what I would do is delegate to another object the responsibility of maintaining a reference to the objects added to that map.
So the next time you need to remove it, you use that "reverse map" ...
class MapHolder {
private final Map<String, DomainObj> originalMap = new HashMap<>();
private final Map<DomainObj, String> reverseMap = new HashMap<>();
public void remove( DomainObj value ) {
if ( reverseMap.containsKey( value ) ) {
originalMap.remove( reverseMap.get( value ) );
reverseMap.remove( value );
}
}
}
This is much much faster than iterating.
Obviously you need to keep them synchronized. But that should not be hard if you refactor your code so that one object is responsible for the state of the map.
Remember that in OOP we have objects that have state and behavior. If your data is passed around in variables all over the place, you are creating unnecessary dependencies between objects.
Yes, it will take you some time to correct the code, but the time spent correcting it will save you a lot of headaches in the future. Think about it.
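A fuller, generic variant of the MapHolder idea, assuming values are unique and implement equals()/hashCode() properly. ReverseIndexedMap is a hypothetical name; its put() keeps both maps consistent when a key or a value is reassigned. Not thread-safe as written.

```java
import java.util.HashMap;
import java.util.Map;

// Forward and reverse maps kept in sync so removal by value is O(1) on average.
class ReverseIndexedMap<K, V> {
    private final Map<K, V> originalMap = new HashMap<>();
    private final Map<V, K> reverseMap = new HashMap<>();

    void put(K key, V value) {
        V oldValue = originalMap.put(key, value);
        if (oldValue != null) {
            reverseMap.remove(oldValue);     // key was remapped: drop the stale reverse entry
        }
        K oldKey = reverseMap.put(value, key);
        if (oldKey != null && !oldKey.equals(key)) {
            originalMap.remove(oldKey);      // value moved to a new key: keep values unique
        }
    }

    V get(K key) {
        return originalMap.get(key);
    }

    void removeValue(V value) {
        K key = reverseMap.remove(value);    // one lookup instead of scanning all entries
        if (key != null) {
            originalMap.remove(key);
        }
    }

    int size() {
        return originalMap.size();
    }
}
```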
