Confused about HashMap in Java8 - java

I have a HashMap that takes a String as key and a HashSet as value.
I am trying to update the map and add values to it.
I cannot figure out which of the following methods to use:
map.putIfAbsent(str.substring(i,j),new HashSet<String>()).add(str);
//this method gives nullpointerexception
map.computeIfPresent(str.substring(i,j),(k,v)->v).add(str);
In the output I can see the same key being added twice with an initial value and updated value.
Someone please tell me how to use these methods.

The preferable way to do it is with Map#computeIfAbsent. This way a new HashSet is not created unnecessarily, and it will return the value afterwards.
map.computeIfAbsent(str.substring(i, j), k -> new HashSet<>()).add(str);
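To make the difference concrete, here is a minimal sketch of why chaining add onto putIfAbsent fails while computeIfAbsent works. The class name PutVsCompute, the helper addTo and the sample strings are made up for illustration; only the Map methods come from the JDK.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PutVsCompute {
    /** Adds word under key; the HashSet is created only when the key is missing. */
    static void addTo(Map<String, Set<String>> map, String key, String word) {
        map.computeIfAbsent(key, k -> new HashSet<>()).add(word);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> map = new HashMap<>();

        // putIfAbsent returns the PREVIOUS value, which is null on the first insert.
        // Chaining .add(...) onto that null is what throws the NullPointerException.
        Set<String> previous = map.putIfAbsent("ab", new HashSet<>());
        System.out.println(previous); // null

        // computeIfAbsent returns the set that is in the map afterwards, old or new.
        addTo(map, "ab", "abc");
        addTo(map, "ab", "abd");
        addTo(map, "bc", "abc");
        System.out.println(map.get("ab").size()); // 2
        System.out.println(map.get("bc").size()); // 1
    }
}
```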

There is no reason to choose between putIfAbsent and computeIfPresent. Most notably, computeIfPresent is entirely inappropriate: as its name suggests, it only computes a new value when there is already an old one, and (k,v)->v even makes this computation a no-op.
There are several options
containsKey, put and get. This is the most popular pre-Java 8 one, though it's the least efficient of this list, as it incorporates up to three hash lookups for the same key
String key=str.substring(i, j);
if(!map.containsKey(key))
map.put(key, new HashSet<>());
map.get(key).add(str);
get and put. Better than the first one, though it still may incorporate two lookups. For ordinary Maps, this was the best choice before Java 8:
String key=str.substring(i, j);
Set<String> set=map.get(key);
if(set==null)
map.put(key, set=new HashSet<>());
set.add(str);
putIfAbsent. Before Java 8, this option was only available to ConcurrentMaps.
String key=str.substring(i, j);
Set<String> set=new HashSet<>(), old=map.putIfAbsent(key, set);
(old!=null? old: set).add(str);
This needs only one hash lookup, but requires the unconditional creation of a new HashSet, even if we don’t need it. Here, it might be worth performing a get first to defer the creation, especially when using a ConcurrentMap, as the get can be performed lock-free and may make the subsequent, more expensive putIfAbsent unnecessary.
On the other hand, it must be emphasized that this construct is not thread-safe, as the manipulation of the value Set is not guarded by anything.
computeIfAbsent. This Java 8 method allows the most concise and most efficient operation:
map.computeIfAbsent(str.substring(i, j), k -> new HashSet<>()).add(str);
This will only evaluate the function if there is no old value. Unlike putIfAbsent, this method returns the new value if there was no old value; in other words, it returns the right Set in either case, so we can directly add to it. Still, the add operation is performed outside the Map operation, so there’s no thread safety, even if the Map is thread safe. But for ordinary Maps, i.e. if thread safety is not a concern, this is the most efficient variant.
compute. This Java 8 method will always evaluate the function and can be used in two ways. The first one
map.compute(str.substring(i, j), (k,v) -> v==null? new HashSet<>(): v).add(str);
is just a more verbose variant of computeIfAbsent. The second
map.compute(str.substring(i, j), (k,v) -> {
if(v==null) v=new HashSet<>();
v.add(str);
return v;
});
will perform the Set update under the Map’s thread safety policy, so in case of ConcurrentHashMap, this will be a thread safe update, so using compute instead of computeIfAbsent has a valid use case when thread safety is a concern.
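As a rough illustration of that last variant, the sketch below has several threads add to the same plain HashSet through ConcurrentHashMap.compute. The class name AtomicSetUpdate, the helper fill, the key "key" and the thread counts are assumptions chosen for the demo, not part of the original question.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class AtomicSetUpdate {
    /** Starts the given number of threads; each adds perThread distinct integers under one key. */
    static Map<String, Set<Integer>> fill(int threads, int perThread) {
        Map<String, Set<Integer>> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    final int value = offset + i;
                    // ConcurrentHashMap runs the whole function atomically for this key,
                    // so the unsynchronized HashSet is never touched by two threads at once.
                    map.compute("key", (k, v) -> {
                        if (v == null) v = new HashSet<>();
                        v.add(value);
                        return v;
                    });
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join(); // join also makes the writes visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 1000).get("key").size()); // 4000
    }
}
```

Note that this only protects updates made inside compute. Reading or iterating the Set from another thread while updates are still running would need its own safeguard, for example a concurrent Set implementation.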

Related

Java: one of many keys map

I got a Map, which may contain one of the following Keys
Map<String, String> map = getMap();
I now want to check whether any of several keys is set. My current approach is to chain multiple map.getOrDefault(...) calls
Address address = new Address();
address.setStreet(map.getOrDefault("STORE_STREET"
, map.getOrDefault("OFFICE_STREET", ...));
or check for each key if it exists in the map.
if(map.containsKey("STORE_STREET")){
address.setStreet(map.get("STORE_STREET"));
}else if(map.containsKey("OFFICE_STREET")){
address.setStreet(map.get("OFFICE_STREET"));
}
Is there any way to make this easier/better to read? Unfortunately the map is given as such.
Normally, getOrDefault would be the way to go, but if you have multiple alternative keys, this not only hurts readability, but also turns the performance advantage into the opposite. With code like:
address.setStreet(map.getOrDefault("STORE_STREET", map.getOrDefault("OFFICE_STREET", ...));
You are looking up the alternative keys first, to get the fall-back value, before even looking whether the primary key (or a key with a higher precedence) is present.
One solution would be
Stream.of("STORE_STREET", "OFFICE_STREET", ...)
.map(map::get)
.filter(Objects::nonNull)
.findFirst()
.ifPresent(address::setStreet);
When executing this a single time, its performance might be less than a simple loop, due to the higher initialization overhead, however, the performance difference would be irrelevant then. For frequent execution, there will be no significant difference, so you should decide based on the readability (which is subjective, of course).
String []keys = {"STORE_STREET", "OFFICE_STREET", ...};
for (String k : keys)
{
if (map.containsKey(k))
return map.get(k);
}
return ""; // or throw an exception
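Either variant can be wrapped in a small helper so the call site stays readable. The following is a sketch only: the class FirstPresentKey, the method firstPresent and the sample addresses are hypothetical, and it treats a key mapped to null the same as a missing key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Stream;

public class FirstPresentKey {
    /** Returns the value of the first key, in priority order, that has a non-null mapping. */
    static Optional<String> firstPresent(Map<String, String> map, String... keys) {
        return Stream.of(keys)
                .map(map::get)            // lazily looks up one key at a time
                .filter(Objects::nonNull) // skips missing keys
                .findFirst();             // stops at the first hit
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("OFFICE_STREET", "2 Office Rd"); // sample data, not from the question

        // STORE_STREET is missing, so the lookup falls through to OFFICE_STREET.
        System.out.println(firstPresent(map, "STORE_STREET", "OFFICE_STREET").orElse(""));
        // prints: 2 Office Rd
    }
}
```

At the call site this becomes firstPresent(map, "STORE_STREET", "OFFICE_STREET").ifPresent(address::setStreet), and unlike the nested getOrDefault calls, lower-priority keys are only looked up when the higher-priority ones are missing.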

Not thread safe methods of ConcurrentSkipListMap in Java

In my Java project I need to use a TreeMap in a multithreaded way. I found that ConcurrentSkipListMap is what I need, but some of its methods are not thread safe. One of them is containsKey(Object key). What is the typical solution for using such methods in a multithreaded way? In my program I need to put a key without replacing an existing one, and if that is impossible, keep trying other keys until I get a unique one. What construct should I use instead of containsKey, given that I can't lose information?
If you are worried about containsKey results going stale before you can act on them, or about this warning in the javadoc:
Additionally, the bulk operations putAll, equals, toArray, containsValue, and clear are not guaranteed to be performed atomically. For example, an iterator operating concurrently with a putAll operation might view only some of the added elements.
there are methods defined on ConcurrentSkipListMap that you can use instead. For instance, see putIfAbsent:
If the specified key is not already associated with a value, associate it with the given value. This is equivalent to
if (!map.containsKey(key))
return map.put(key, value);
else
return map.get(key);
except that the action is performed atomically.
Also see the methods remove and replace.
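For the "keep trying until the key is unique" requirement from the question, a retry loop around putIfAbsent could look like the sketch below. The class UniqueKeyInsert, the method putUnderFreeKey and the "increment the key" strategy are assumptions for illustration; the real rule for picking the next candidate key depends on the application.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class UniqueKeyInsert {
    /** Inserts value under the first free key at or above startKey and returns the key used. */
    static long putUnderFreeKey(ConcurrentNavigableMap<Long, String> map, long startKey, String value) {
        long key = startKey;
        // putIfAbsent returns null only if THIS call created the mapping, so no
        // separate containsKey check is needed and existing entries are never replaced.
        while (map.putIfAbsent(key, value) != null) {
            key++; // hypothetical strategy for choosing the next candidate key
        }
        return key;
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<Long, String> map = new ConcurrentSkipListMap<>();
        System.out.println(putUnderFreeKey(map, 10L, "first"));  // 10
        System.out.println(putUnderFreeKey(map, 10L, "second")); // 11
        System.out.println(map.get(10L));                        // first
    }
}
```

This relies on ConcurrentSkipListMap not permitting null values, which makes a null return from putIfAbsent unambiguous.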

HashMap key problems

I'm profiling some old Java code and it appears that my caching of values using a static HashMap and an access method does not work.
Caching code (a bit abstracted):
static HashMap<Key, Value> cache = new HashMap<Key, Value>();
public static Value getValue(Key key){
System.out.println("cache size="+ cache.size());
if (cache.containsKey(key)) {
System.out.println("cache hit");
return cache.get(key);
} else {
System.out.println("no cache hit");
Value value = calcValue();
cache.put(key, value);
return value;
}
}
Profiling code:
for (int i = 0; i < 100; i++)
{
getValue(new Key());
}
Result output:
cache size=0
no cache hit
(..)
cache size=99
no cache hit
It looked like a standard error in Key's hashing code or equals code.
However:
new Key().hashcode == new Key().hashcode // TRUE
new Key().equals(new Key()) // TRUE
What's especially weird is that cache.put(key, value) just adds another value to the hashmap, instead of replacing the current one.
So, I don't really get what's going on here. Am I doing something wrong?
edit
Ok, I see that in the real code the Key gets used in other methods and changes, which therefore gets reflected in the hashCode of the object in the HashMap. Could that be the cause of this behaviour, that it goes missing?
On a proper @Override of equals/hashCode
I'm not convinced that you @Override (you are using the annotation, right?) hashCode/equals properly. If you didn't use @Override, you may have defined int hashcode(), or boolean equals(Key), neither of which would do what is required.
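The following sketch shows how such a mistake produces exactly the symptoms from the question: the keys look equal when compared directly, yet the HashMap keeps adding entries. BrokenKey, GoodKey and sizeAfterTwoPuts are made-up names; the asker's real Key class is not shown, so this is only one possible cause.

```java
import java.util.HashMap;
import java.util.Map;

public class OverrideDemo {
    /** Looks fine at a glance, but overrides nothing that HashMap calls. */
    static class BrokenKey {
        final int id;
        BrokenKey(int id) { this.id = id; }
        public int hashcode() { return id; } // wrong name: lower-case c, so not hashCode()
        public boolean equals(BrokenKey other) { // overload of equals, not an override
            return other != null && other.id == id;
        }
    }

    /** Same idea with real overrides; the annotation lets the compiler catch typos. */
    static class GoodKey {
        final int id;
        GoodKey(int id) { this.id = id; }
        @Override public int hashCode() { return Integer.hashCode(id); }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id == id;
        }
    }

    /** Puts both keys into a fresh HashMap and reports how many entries result. */
    static int sizeAfterTwoPuts(Object k1, Object k2) {
        Map<Object, String> map = new HashMap<>();
        map.put(k1, "first");
        map.put(k2, "second");
        return map.size();
    }

    public static void main(String[] args) {
        // A direct call picks the equals(BrokenKey) overload, so this looks correct...
        System.out.println(new BrokenKey(1).equals(new BrokenKey(1)));           // true
        // ...but HashMap calls equals(Object) and hashCode(), which are still Object's.
        System.out.println(sizeAfterTwoPuts(new BrokenKey(1), new BrokenKey(1))); // 2
        System.out.println(sizeAfterTwoPuts(new GoodKey(1), new GoodKey(1)));     // 1
    }
}
```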
On key mutation
If you are mutating the keys of the map, then yes, trouble will ensue. From the documentation:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.
Here's an example:
Map<List<Integer>,String> map =
new HashMap<List<Integer>,String>();
List<Integer> theOneKey = new ArrayList<Integer>();
map.put(theOneKey, "theOneValue");
System.out.println(map.containsKey(theOneKey)); // prints "true"
theOneKey.add(42);
System.out.println(map.containsKey(theOneKey)); // prints "false"
By the way, prefer interfaces to implementation classes in type declarations. Here's a quote from Effective Java 2nd Edition, Item 52: Refer to objects by their interfaces
[...] you should favor the use of interfaces rather than classes to refer to objects. If appropriate interface types exist, then parameters, return values, variables, and fields should all be declared using interface types.
In this case, if at all possible, you should declare cache as simply a Map instead of a HashMap.
I'd recommend double and triple checking the equals and hashCode methods. Note that it's hashCode, not hashcode.
Looking at the (abstracted) code, everything seems to be in order. It may be that the actual code is not like your redacted version, and that this is more a reflection of how you expect the code to work and not what is happening in practice!
If you can post the code, please do that. In the meantime, here are some pointers to try:
After adding a Key, use exactly the same Key instance again, and verify that it produces a cache hit.
In your test, verify the hashcodes are equal, and that the objects are equal.
Is the Map implementation really a HashMap? WeakHashMap will behave in the way you describe once the keys are no longer reachable.
I'm not sure what your Key class is, but (abstractly similarly to you) what I'd do for a simple check is:
Key k1 = new Key();
Key k2 = new Key();
System.out.println("k1 hash:" + k1.hashCode());
System.out.println("k2 hash:" + k2.hashCode());
System.out.println("ks equal:" + k1.equals(k2));
getValue(k1);
getValue(k2);
If this code shows the anomaly (same hash code, equal keys, yet no cache hit) then there's cause to worry (or, better, debug your Key class;-). The way you're testing, with new Keys all the time, might produce keys that don't necessarily behave the same way.

What's the quickest way to remove an element from a Map by value in Java?

What's the quickest way to remove an element from a Map by value in Java?
Currently I'm using:
DomainObj valueToRemove = new DomainObj();
String removalKey = null;
for (Map.Entry<String, DomainObj> entry : map.entrySet()) {
if (valueToRemove.equals(entry.getValue())) {
removalKey = entry.getKey();
break;
}
}
if (removalKey != null) {
map.remove(removalKey);
}
The correct and fast one-liner would actually be:
while (map.values().remove(valueObject));
Kind of strange that most examples above assume the valueObject to be unique.
Here's the one-line solution:
map.values().remove(valueToRemove);
That's probably faster than defining your own iterator, since the JDK collection code has been significantly optimized.
As others have mentioned, a bimap will have faster value removes, though it requires more memory and takes longer to populate. Also, a bimap only works when the values are unique, which may or may not be the case in your code.
Without using a Bi-directional map (commons-collections and google collections have them), you're stuck with iterating the Map
map.values().removeAll(Collections.singleton(null));
With reference to How to filter "Null" values from HashMap<String, String>?, we can do the following in Java 8:
map.values().removeIf(valueToRemove::equals);
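The value-based one-liners in these answers differ in how many entries they remove, which matters when values are not unique. The sketch below contrasts them on a small LinkedHashMap (used only so the printed order is predictable); the class RemoveByValue, its helper methods and the sample data are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RemoveByValue {
    /** Sample map in which the value "x" occurs twice. */
    static Map<String, String> sample() {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("a", "x");
        map.put("b", "y");
        map.put("c", "x");
        return map;
    }

    /** Removes at most one entry whose value equals the argument. */
    static <K, V> void removeFirstMatch(Map<K, V> map, V value) {
        map.values().remove(value);
    }

    /** Removes every entry whose value equals the argument (value must not be null). */
    static <K, V> void removeEveryMatch(Map<K, V> map, V value) {
        map.values().removeIf(value::equals);
    }

    public static void main(String[] args) {
        Map<String, String> first = sample();
        removeFirstMatch(first, "x");
        System.out.println(first.keySet()); // [b, c]

        Map<String, String> every = sample();
        removeEveryMatch(every, "x");
        System.out.println(every.keySet()); // [b]
    }
}
```

Both calls still scan the values linearly; they are shorter than the hand-written loop, not asymptotically faster.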
If you don't have a reverse map, I'd go for an iterator.
DomainObj valueToRemove = new DomainObj();
for (
Iterator<Map.Entry<String, DomainObj>> iter = map.entrySet().iterator();
iter.hasNext();
) {
Map.Entry<String, DomainObj> entry = iter.next();
if (valueToRemove.equals(entry.getValue())) {
iter.remove();
break; // if only want to remove first match.
}
}
You could always use the values collection, since any changes made to that collection will result in the change being reflected in the map. So if you were to call Map.values().remove(valueToRemove) that should work - though I'm not sure if you'll see performance better than what you have with that loop. One idea would be to extend or override the map class such that the backing collection then is always sorted by value - that would enable you to do a binary search on the value which may be faster.
Edit: This is essentially the same as Alcon's answer except I don't think his will work since the entrySet is still going to be ordered by key - in which case you can't call .remove() with the value.
This is also assuming that the value is supposed to be unique or that you would want to remove any duplicates from the Map as well.
i would use this
Map x = new HashMap();
x.put(1, "value1");
x.put(2, "value2");
x.put(3, "value3");
x.put(4, "value4");
x.put(5, "value5");
x.put(6, "value6");
x.values().remove("value4");
edit:
because objects are referenced by "pointer" not by value.
If you have no way to figure out the key from the DomainObj, then I don't see how you can improve on that. There's no built in method to get the key from the value, so you have to iterate through the map.
If this is something you're doing all the time, you might maintain two maps (string->DomainObj and DomainObj->Key).
Like most of the other posters have said, it's generally an O(N) operation because you're going to have to look through the whole list of hashtable values regardless. @tackline has the right solution for keeping the memory usage at O(1) (I gave him an up-vote for that).
Your other option is to sacrifice memory space for the sake of speed. If your map is reasonably sized, you could store two maps in parallel.
If you have a Map<String, DomainObj> then maintain a Map<DomainObj, String> in parallel to it. When you insert/remove on one map, do it on the other also. Granted this is uglier because you're wasting space and you'll have to make sure the "hashCode" method of DomainObj is written properly, but your removal time drops from O(N) to O(1) because you can look up the key/object mapping in constant time in either direction.
Not generally the best solution, but if your number one concern is speed, I think this is probably as fast as you're gonna get.
====================
Addendum: This is essentially what @msaeed suggested, just sans the third-party library.
A shorter usage of iterator is to use a values() iterator.
DomainObj valueToRemove = new DomainObj();
for (Iterator<DomainObj> it = map.values().iterator(); it.hasNext();) {
if (valueToRemove.equals(it.next())) {
it.remove();
break;
}
}
This situation arises rarely, but when it does a bidirectional map is extremely helpful. I'd prefer BidiMap from org.apache.commons.collections.
I don't think this will happen only once in the lifetime of your app.
So what I would do is delegate to another object the responsibility of maintaining a reference to the objects added to that map.
So the next time you need to remove it, you use that "reverse map" ...
class MapHolder {
private Map<String, DomainObj> originalMap;
private Map<DomainObj,String> reverseMap;
public void remove( DomainObj value ) {
if ( reverseMap.containsKey( value ) ) {
originalMap.remove( reverseMap.get( value ) );
reverseMap.remove( value );
}
}
}
This is much much faster than iterating.
Obviously you need to keep them synchronized. But it should not be that hard if you refactor your code to have one object being responsible for the state of the map.
Remember that in OOP we have objects that have a state and behavior. If your data is passed around in variables all over the place, you are creating unnecessary dependencies between objects.
Yes, it will take you some time to correct the code, but the time spent correcting it will save you a lot of headaches in the future. Think about it.
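A fuller sketch of such a holder, with the insert path that keeps both maps in step, might look like this. The class TwoWayMap and its method names are hypothetical, and it assumes values are unique and non-null with a consistent equals/hashCode.

```java
import java.util.HashMap;
import java.util.Map;

public class TwoWayMap<K, V> {
    private final Map<K, V> forward = new HashMap<>();
    private final Map<V, K> reverse = new HashMap<>();

    /** Stores the pair; a value that already exists under another key evicts that older key. */
    public void put(K key, V value) {
        V oldValue = forward.put(key, value);
        if (oldValue != null) {
            reverse.remove(oldValue); // the key no longer points at its old value
        }
        K oldKey = reverse.put(value, key);
        if (oldKey != null && !oldKey.equals(key)) {
            forward.remove(oldKey);   // keep values unique across keys
        }
    }

    public V get(K key) {
        return forward.get(key);
    }

    /** Removes the entry holding the value without iterating; returns its key, or null if absent. */
    public K removeValue(V value) {
        K key = reverse.remove(value);
        if (key != null) {
            forward.remove(key);
        }
        return key;
    }

    public int size() {
        return forward.size();
    }

    public static void main(String[] args) {
        TwoWayMap<String, String> map = new TwoWayMap<>();
        map.put("k1", "v1");
        map.put("k2", "v2");
        System.out.println(map.removeValue("v1")); // k1
        System.out.println(map.size());            // 1
    }
}
```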

What basic operations on a Map are permitted while iterating over it?

Say I am iterating over a Map in Java... I am unclear about what I can do to that Map while in the process of iterating over it. I guess I am mostly confused by this warning in the Javadoc for the Iterator interface remove method:
[...] The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.
I know for sure that I can invoke the remove method without any issues. But while iterating over the Map collection, can I:
Change the value associated with a key with the Map class put method (put with an existing key)?
Add a new entry with the Map class put method (put with a new key)?
Remove an entry with the Map class remove method?
My guess is that I can probably safely do #1 (put to an existing key) but not safely do #2 or #3.
Thanks in advance for any clarification on this.
You can use Iterator.remove(), and if using an entrySet iterator (of Map.Entry's) you can use Map.Entry.setValue(). Anything else and all bets are off - you should not change the map directly, and some maps will not permit either or both of the aforementioned methods.
Specifically, your (1), (2) and (3) are not permitted.
You might get away with setting an existing key's value through the Map object, but the Set.iterator() documentation specifically precludes that and it will be implementation specific:
If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation, or through the setValue operation on a map entry returned by the iterator) the results of the iteration are undefined. (emphasis added)
If you take a look at the HashMap class, you'll see a field called 'modCount'. This is how the map knows when it's been modified during iteration. Any method that increments modCount when you're iterating will cause it to throw a ConcurrentModificationException.
That said, you CAN put a value into a map if the key already exists, effectively updating the entry with the new value:
Map<String, Object> test = new HashMap<String, Object>();
test.put("test", 1);
for(String key : test.keySet())
{
test.put(key, 2); // this works!
}
System.out.println(test); // will print "{test=2}"
When you ask if you can perform these operations 'safely,' you don't have to worry too much because HashMap is designed to throw that ConcurrentModificationException as soon as it runs into a problem like this. These operations will fail fast; they won't leave the map in a bad state.
There is no global answer. The Map interface leaves the choice to the implementation. Unfortunately, I think that all the implementations in the JDK use fail-fast iterators (here is the definition of fail-fast, as stated in the HashMap Javadoc):
The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
In general, if you want to change the Map while iterating over it, you should use one of the iterator's methods. I have not actually tested to see if #1 will work, but the others definitely will not.
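The three cases can be checked with a small experiment. The sketch below applies to java.util.HashMap specifically; other Map implementations may behave differently, and the class ModifyWhileIterating with its helper methods is made up for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class ModifyWhileIterating {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    /** Case #1: overwriting existing keys is not a structural change for HashMap. */
    static boolean putExistingKeyFails() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put(key, 0);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Case #2: adding a new key is a structural change; the next call to next() fails fast. */
    static boolean putNewKeyFails() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put(key + "-copy", 0);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** The supported route: Iterator.remove() and Map.Entry.setValue(). */
    static Map<String, Integer> dropOddAndScaleEven() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (entry.getValue() % 2 == 1) {
                it.remove();
            } else {
                entry.setValue(entry.getValue() * 10);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(putExistingKeyFails());   // false
        System.out.println(putNewKeyFails());        // true
        System.out.println(dropOddAndScaleEven());   // {b=20}
    }
}
```

That case #1 passes here is an implementation detail of HashMap, so the setValue route remains the safer choice when the Map type is not under your control.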
