WeakHashMap and Concurrent Modification


I'm reading the Javadoc for WeakHashMap and I get the basic concept.
Because of the GC thread(s) acting in the background, you can get 'unusual behavior', such as a ConcurrentModificationException when iterating.
The thing I don't get is this: if the default implementation is not synchronized and does not use locks in any way, how come there is no possibility of getting an inconsistent state?
Say you have two threads: a GC thread deleting some key at a certain index and, at the same time and at the same index, a user thread inserting a key-value pair into the array.
To me, if there is no synchronization, there is a high risk of ending up with an inconsistent hash map.
Even worse, doing something like this might be really dangerous, because v might actually be null:
if (map.containsKey(k)) {
    V v = map.get(k);
}
Am I missing something?

The inconsistent state issues you mention do not arise because the GC does not actively restructure WeakHashMaps. When the garbage collector frees the referent of a weak reference, the corresponding entry is not physically removed from the map; the entry merely becomes stale, with no key. At some later point, the entry may be physically removed during some other operation on the map, but the GC won't take on that responsibility.
You can see one Java version's implementation of this design on grepcode.
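A heavily simplified sketch of that design may help. This is not the JDK's code: `TinyWeakMap`, `find` and `expungeStaleEntries` are hypothetical names for this example (OpenJDK's WeakHashMap does have a private method called expungeStaleEntries, but it works on a real hash table). To stay short, the sketch compares keys by identity and uses a linear scan. What it shows is the division of labor: the GC only clears each WeakReference and enqueues it on a ReferenceQueue, and the map's own methods remove the dead entries later.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified weak-key map (NOT the JDK implementation).
// Like WeakHashMap it is not thread-safe; unlike it, keys compare by identity.
final class TinyWeakMap<K, V> {
    // The GC enqueues a WeakReference here after clearing it.
    private final ReferenceQueue<K> queue = new ReferenceQueue<>();
    // WeakReference does not override equals/hashCode, so this HashMap
    // looks up the wrapper objects by identity.
    private final Map<WeakReference<K>, V> entries = new HashMap<>();

    void put(K key, V value) {
        Objects.requireNonNull(key, "key");
        expungeStaleEntries();
        WeakReference<K> ref = find(key);
        if (ref == null) {
            ref = new WeakReference<>(key, queue);
        }
        entries.put(ref, value);
    }

    V get(K key) {
        Objects.requireNonNull(key, "key");
        expungeStaleEntries();
        WeakReference<K> ref = find(key);
        return ref == null ? null : entries.get(ref);
    }

    int size() {
        expungeStaleEntries();
        return entries.size();
    }

    // Linear scan to keep the sketch short; a real map would hash the key.
    private WeakReference<K> find(K key) {
        for (WeakReference<K> ref : entries.keySet()) {
            if (ref.get() == key) {
                return ref;
            }
        }
        return null;
    }

    // The map, not the GC, removes entries whose keys have been cleared,
    // and only while one of the map's own methods is running.
    private void expungeStaleEntries() {
        Reference<? extends K> stale;
        while ((stale = queue.poll()) != null) {
            entries.remove(stale);
        }
    }
}
```

Because the GC never touches `entries` directly, it never races with a user thread on a table slot; what a user can observe is an entry disappearing between two of its own calls.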

What you're describing is what the documentation explicitly states:
Because the garbage collector may discard keys at any time, a WeakHashMap may behave as though an unknown thread is silently removing entries.
The only mistake you're making is the assumption that you can protect the state by synchronizing. That doesn't work because the GC does not take part in your synchronization. To quote the documentation:
In particular, even if you synchronize on a WeakHashMap instance and invoke none of its mutator methods, it is possible for the size method to return smaller values over time, for the isEmpty method to return false and then true, for the containsKey method to return true and later false for a given key, for the get method to return a value for a given key but later return null, for the put method to return null and the remove method to return false for a key that previously appeared to be in the map, and for successive examinations of the key set, the value collection, and the entry set to yield successively smaller numbers of elements.

Referring to
even if you synchronize on a WeakHashMap [...] it is possible for the size method to return smaller values over time
the javadoc sufficiently explains to me that there is a possibility for an inconsistent state and that it is completely independent from synchronization.
A few examples later, the given example is referred to, too:
for the containsKey method to return true and later false for a given key
So basically, one should never rely on the state of a WeakHashMap, but use it as atomically as possible. The given example should therefore be rephrased as
V v = map.get(k);
if (v != null) {
    // use v
}
or
Optional.ofNullable(map.get(k)).ifPresent(v -> { /* use v */ });

This class is intended primarily for use with key objects whose equals methods test for object identity using the == operator. Once such a key is discarded it can never be recreated, so it is impossible to do a lookup of that key in a WeakHashMap at some later time and be surprised that its entry has been removed.
So if one uses WeakHashMap for objects whose equals() is based on an identity check, all is fine. The first case you mentioned (a GC thread deleting a key at the same index where a user thread is inserting a key-value pair) is impossible, because as long as the user thread keeps a reference to the key object, it cannot be discarded by the GC.
The same holds for the second example:
if (map.containsKey(k)) {
    V v = map.get(k);
}
You keep the reference k, so the corresponding object is reachable and cannot be discarded.
But:
This class will work perfectly well with key objects whose equals methods are not based upon object identity, such as String instances. With such recreatable key objects, however, the automatic removal of WeakHashMap entries whose keys have been discarded may prove to be confusing.
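A small sketch of both situations, assuming the caller passes in a WeakHashMap; `StrongKeyDemo` and its methods are hypothetical. The first result is guaranteed, because the key stays strongly reachable the whole time. The second depends on GC timing and may come out either way, which is exactly the "confusing" case the Javadoc warns about.

```java
import java.util.Map;

// Hypothetical demo; intended to be called with a WeakHashMap.
final class StrongKeyDemo {
    // `key` is still used after System.gc(), so it is strongly reachable
    // throughout and its entry cannot become stale: this always returns true.
    static boolean survivesGc(Map<Object, String> map, Object key) {
        map.put(key, "value");
        System.gc(); // only a hint to the JVM
        return map.containsKey(key) && "value".equals(map.get(key));
    }

    // The only strong reference to the inserted key is dropped immediately.
    // A lookup with an *equal* String may succeed or fail, depending on
    // whether the GC has cleared the original key object yet.
    static boolean lookupWithRecreatedKey(Map<Object, String> map) {
        map.put(new String("config"), "value");
        System.gc();
        return map.containsKey("config");
    }
}
```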

Related

How to lock on key in a ConcurrentHashMap

I am caching an object, which is created by a thread, into a map. The creation of the object is expensive, so I don't want multiple threads creating the object just because the put() hasn't returned yet. Once a thread starts creating the object for a key, other threads shouldn't try to create it, even if the put is not yet complete. Will using computeIfAbsent() work to acquire a 'lock' on that particular key? If not, is there another way to achieve this?

> Will using computeIfAbsent() work to acquire a 'lock' on that particular key?
Yes; per the Javadoc for ConcurrentHashMap.computeIfAbsent(...):
The entire method invocation is performed atomically, so the function is applied at most once per key.
That's really the whole point of the method.
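As a minimal sketch (the `ExpensiveCache` class and its creation counter are hypothetical, the counter only makes the "at most once" behavior observable), a cache built on computeIfAbsent could look like this. The field is declared as ConcurrentHashMap on purpose: the default ConcurrentMap.computeIfAbsent implementation may call the mapping function more than once under contention, so the at-most-once promise comes from ConcurrentHashMap's override.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache for values that are expensive to create.
final class ExpensiveCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> factory;
    private final AtomicInteger creations = new AtomicInteger(); // demonstration only

    ExpensiveCache(Function<K, V> factory) {
        this.factory = factory;
    }

    V get(K key) {
        // ConcurrentHashMap.computeIfAbsent runs the function at most once per key;
        // concurrent callers for the same key wait and then receive that value.
        return cache.computeIfAbsent(key, k -> {
            creations.incrementAndGet();
            return factory.apply(k);
        });
    }

    int creations() {
        return creations.get();
    }
}
```

Keep the factory short and do not update the same map from inside it; the ConcurrentHashMap Javadoc warns that other updates may block while the computation is in progress.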
However, to be clear, the lock is not completely specific to that one key. In Java 7 and earlier, ConcurrentHashMap split the map into multiple segments, with one lock per segment; since Java 8, it locks the individual hash bin that the key falls into, and a bin can be shared by several keys. This allows a great deal of concurrency, and is usually the most efficient approach; but you should be aware that it means that some threads might block on your object creation even if they're not actually touching the same key.
If this is a problem for you, then another approach is to use something like ConcurrentHashMap<K, AtomicReference<V>> to decouple adding the map entry from populating the map entry. (AtomicReference<V> doesn't have a computeIfAbsent method, but at that point you can just use normal double-checked locking with a combination of get() and synchronized.)
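A sketch of the AtomicReference decoupling described above, assuming the factory never returns null; `HolderCache` is a hypothetical name. computeIfAbsent only creates an empty holder, so the map's internal lock is held very briefly, and the expensive creation runs under a lock on that key's holder, using double-checked locking. This is safe here because AtomicReference.get() and set() have volatile semantics.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical cache: map entries are cheap holders, populated separately.
final class HolderCache<K, V> {
    private final ConcurrentHashMap<K, AtomicReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> factory; // assumed never to return null

    HolderCache(Function<K, V> factory) {
        this.factory = factory;
    }

    V get(K key) {
        // Cheap and quick: only an empty holder is created under the map's lock.
        AtomicReference<V> holder = map.computeIfAbsent(key, k -> new AtomicReference<>());
        V value = holder.get();          // fast path: already populated
        if (value == null) {
            synchronized (holder) {      // blocks only callers of this key
                value = holder.get();    // re-check under the lock
                if (value == null) {
                    value = factory.apply(key);
                    holder.set(value);
                }
            }
        }
        return value;
    }
}
```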

It took some research, but what we were probably after is the Java ConcurrentHashMap equivalent of .NET's TryAdd method, which in the Java world is:
putIfAbsent
public V putIfAbsent(K key, V value);
If the specified key is not already associated with a value, associate it with the given value. This is equivalent to:
if (!map.containsKey(key))
    return map.put(key, value);
else
    return map.get(key);
except that the action is performed atomically.
I knew an atomic add operation had to exist; it just was not easy to find (which is odd, because it's about the very first thing anyone would need to call).
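One caveat for the caching question above: putIfAbsent takes an already-built value, so the expensive object is constructed even when the key is already present (the extra copy is simply discarded), whereas computeIfAbsent only runs its function when the key is absent. A small sketch with a hypothetical `PutIfAbsentDemo` class that counts constructions:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo comparing eager putIfAbsent with lazy computeIfAbsent.
final class PutIfAbsentDemo {
    private final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // Stand-in for an expensive constructor; counts how often it runs.
    private String expensive(String key) {
        created.incrementAndGet();
        return "value-for-" + key;
    }

    String viaPutIfAbsent(String key) {
        String candidate = expensive(key);                 // built even if key is present
        String existing = map.putIfAbsent(key, candidate); // null means we inserted
        return existing != null ? existing : candidate;
    }

    String viaComputeIfAbsent(String key) {
        return map.computeIfAbsent(key, this::expensive);  // built only if absent
    }

    int createdCount() {
        return created.get();
    }
}
```

Calling viaPutIfAbsent("k") twice and then viaComputeIfAbsent("k") constructs the value twice in total, and all three calls return the same instance.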

What is the default behavior of clear in java concurrent hashmap

Internally, does it lock all the rows and mark each key for deletion, so that if another thread wants to access a key that is about to be deleted, it gets the right behavior?
Or do we have to synchronize the clear() call ourselves?
synchronized (this) {
myMap.clear();
}
For example, consider the following
myMap = {1: 1, 2: 1, 3: 1}
//Thread1
myMap.clear()
//Thread2
myMap.compute(1, (k, v) -> v == null ? 0 : k + 1)
What happens when thread1 executes first and thread2 wants to access key1, but key1 has not been deleted yet?
Is the result
{}
or
{1: 0}

The behavior of clear() in that context is unspecified [1]. The javadoc states:
"For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries."
Looking at the Java 7 source code, clear() is implemented by clearing each of the segments one at a time (Java 8 and later lock and clear one hash bin at a time instead). Each segment is locked while it is being cleared, but there is no lock on the entire map. This means that another thread may add entries to a segment that has just been cleared, before the overall clear() call returns.
So, in practice, either of the results / behaviors you propose is possible, depending on the size of maps, the distribution of keys between segments, the version of Java you are using, and ... timing.
Internally does it lock all the rows and mark each key to be deleted?
No. Each segment is locked (one at a time) while entries in the segment are removed. (This is done to avoid memory anomalies which might corrupt the segments' hash chains, etcetera)
Regarding this:
synchronized (this) {
myMap.clear();
}
That will not block other threads from inserting elements while the clear() is in progress. It will just stop two threads (executing the same code) from clearing at the same time.
If you want to guarantee that clear() clears the map, you would need to wrap the map with Collections.synchronizedMap and use that wrapper consistently. In practice, that defeats the purpose of using ConcurrentHashMap.
Follow-up question:
So potentially there could be an infinite loop of clearing, right? If another thread keeps adding elements, the size of the map is always > 0, so the thread that is trying to clear will keep on running.
Nope. There will be no infinite loop. The clear() method is not looking at the size of the map.
What will actually happen is that the clear() call will return and the map won't necessarily be empty.
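A small experiment, with a hypothetical `ClearDemo` class, illustrates this: each clear() call returns even while a writer thread keeps inserting, and the map may already be non-empty right after it returns. How often that happens depends on timing, so the returned count means nothing beyond being somewhere between 0 and the number of rounds. Once the writer has stopped, a final clear() does leave the map empty.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo: clear() racing with a writer thread.
final class ClearDemo {
    static int clearWhileWriting(int rounds) throws InterruptedException {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        AtomicBoolean running = new AtomicBoolean(true);
        Thread writer = new Thread(() -> {
            int i = 0;
            while (running.get()) {
                map.put(i++ % 1_000, 1);
            }
        });
        writer.start();

        int nonEmptyAfterClear = 0;
        for (int r = 0; r < rounds; r++) {
            map.clear();                  // one pass over the table; always returns
            if (!map.isEmpty()) {
                nonEmptyAfterClear++;     // the writer got in during or after the pass
            }
        }

        running.set(false);
        writer.join();
        map.clear();                      // no concurrent writers any more
        if (!map.isEmpty()) {
            throw new IllegalStateException("map should be empty now");
        }
        return nonEmptyAfterClear;        // timing-dependent
    }
}
```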
[1] On careful rereading, I've realized that the quoted javadoc doesn't directly answer the question. In fact, if you look at the "contract" in the Map.clear() javadoc, it states that the map will be empty after the call returns. This is implicitly contradicted by the ConcurrentHashMap javadoc, and explicitly contradicted by what the code actually does.

Considering two points from the official documentation together gives the idea:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove)
For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries.
So for your question you will get the value from the map if the key has not been deleted yet. However, you should not rely on the internal synchronization details of the ConcurrentHashMap and should base your code only on the thread safety guarantees provided by the class.

can hashmap have duplicate keys in multithreading environment [duplicate]

Suppose we do not use Collections.synchronizedMap() and we are in a multi-threaded environment.
I know about race conditions, resizing issues, etc.
My question is: can there be a case where two threads, Ta and Tb, have the same object and both try to put it into a map?
Can there ever be two entries? If not, how is that prevented? Is there some small time gap between two put calls made by two different threads running at the same time?
As I understand it, Ta and Tb will both check before putting, so can duplicate keys occur here?
Assume that we have overridden hashCode and equals properly.

The Javadoc for HashMap states:
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method.
So the docs say that you must synchronize access somehow, but do not say what will happen if you do not. That means that the behaviour when you do this is undefined -- all bets are off.
You can look at the source code for HashMap yourself. The heart of put is:
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        e.recordAccess(this);
        return oldValue;
    }
}
modCount++;
addEntry(hash, key, value, i);
return null;
(Edit - this is the implementation in Java 6. Java 8's is dramatically different -- which reinforces the point)
We can speculate about the outcome if two threads attempt this simultaneously -- but it is pretty difficult to reason about. Sometimes it will result in two entries with the same key, sometimes it won't. It depends on timing.
TreeMap's put() is completely different of course, and its quirks when abused in this way will be different.
Any such behaviour is a quirk of the implementation, and the implementation may change in future without warning, because we are talking about undefined behaviour. The implementation makes no promises to you that it won't:
silently drop entries
go into an infinite loop
throw a NullPointerException
claim huge amounts of memory
corrupt the store so that entries with other keys are lost
make previously removed entries reappear
create entries containing garbage from heap memory
etc.
The docs do state that a modification from elsewhere, while an Iterator is working on the object, will cause the Iterator to throw a ConcurrentModificationException (on a best-effort basis) -- but this is a different concern from synchronization, and could still happen if you used a SynchronizedMap.
In summary, don't do it.
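Rather than reasoning about what an unsynchronized HashMap might do, the safe alternative is easy to sketch: with a ConcurrentHashMap (or a properly synchronized map), threads inserting equal keys at the same time still produce exactly one entry per key. `EqualKeyInserts` is a hypothetical name; each thread builds its own equal-but-distinct String keys, so the map must rely on hashCode/equals rather than identity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo: several threads insert equal keys at the same time.
final class EqualKeyInserts {
    static int insertFromThreads(int threadCount) throws InterruptedException {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    // Each thread builds its own, equal-but-distinct String objects.
                    map.put("key-" + i, id);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        return map.size(); // always 100: one entry per distinct key
    }
}
```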

Is ConcurrentHashMap totally safe?

This is a passage from the Javadoc for ConcurrentHashMap. It says retrieval operations generally do not block, so they may overlap with update operations. Does this mean the get() method is not thread-safe?
"However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset."

The get() method is thread-safe, and the other users gave you useful answers regarding this particular issue.
However, although ConcurrentHashMap is a thread-safe drop-in replacement for HashMap, it is important to realize that if you are doing multiple operations you may have to change your code significantly. For example, take this code:
if (!map.containsKey(key))
    return map.put(key, value);
else
    return map.get(key);
In a multi-thread environment, this is a race condition. You have to use the ConcurrentHashMap.putIfAbsent(K key, V value) and pay attention to the return value, which tells you if the put operation was successful or not. Read the docs for more details.
Answering a comment that asks why this is a race condition:
Imagine there are two threads A, B that are going to put two different values in the map, v1 and v2 respectively, having the same key. The key is initially not present in the map. They interleave in this way:
Thread A calls containsKey and finds out that the key is not present, but is immediately suspended.
Thread B calls containsKey and finds out that the key is not present, and has the time to insert its value v2.
Thread A resumes and inserts v1, "peacefully" overwriting (since put is threadsafe) the value inserted by thread B.
Now thread B "thinks" it has successfully inserted its very own value v2, but the map contains v1. This is a real disaster, because thread B may call v2.updateSomething() and will "think" that the consumers of the map (e.g. other threads) have access to that object and will see that possibly important update (like: "this visitor IP address is trying to perform a DoS, refuse all requests from now on"). Instead, the object will soon be garbage collected and lost.
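The fix described above, sketched with a hypothetical helper `putOrGetExisting`: use the return value of putIfAbsent to find out which value actually ended up in the map, and continue with that one instead of your own candidate.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper around ConcurrentHashMap.putIfAbsent.
final class CanonicalValue {
    // Returns the value that is in the map after the call: either our own
    // candidate (we won the race) or the one another thread inserted first.
    static <K, V> V putOrGetExisting(ConcurrentHashMap<K, V> map, K key, V candidate) {
        V existing = map.putIfAbsent(key, candidate);
        return existing != null ? existing : candidate;
    }
}
```

With this helper, threads A and B from the scenario both end up holding the same object, so an update made through either reference is the one other threads find in the map.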

It is thread-safe. However, the way it is being thread-safe may not be what you expect. There are some "hints" you can see from:
This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details
To know the whole story in a more complete picture, you need to be aware of the ConcurrentMap interface.
The original Map provides some very basic read/update methods. Even if I write a thread-safe implementation of Map, there are lots of cases where people cannot use my Map without considering my synchronization mechanism. This is a typical example:
if (!threadSafeMap.containsKey(key)) {
threadSafeMap.put(key, value);
}
This piece of code is not thread-safe, even though the map itself is. Two threads calling containsKey() at the same time could both find that there is no such key, and both would then insert into the Map.
In order to fix the problem, we need to add extra synchronization explicitly. Assuming the thread-safety of my Map is achieved with the synchronized keyword, you would need to do:
synchronized(threadSafeMap) {
if (!threadSafeMap.containsKey(key)) {
threadSafeMap.put(key, value);
}
}
Such extra code needs you to know about the "synchronization details" of the map. In the above example, we need to know that the synchronization is achieved by "synchronized".
The ConcurrentMap interface takes this one step further. It defines some common "complex" actions that involve multiple accesses to the map. For example, the above example is exposed as putIfAbsent(). With these "complex" actions, users of ConcurrentMap (in most cases) don't need to synchronize actions that involve multiple accesses to the map. Hence, the implementation of Map can use a more complicated synchronization mechanism for better performance. ConcurrentHashMap is a good example: thread-safety is in fact maintained by keeping separate locks for different partitions of the map. It is thread-safe because concurrent access to the map will not corrupt the internal data structure or cause updates to be lost unexpectedly, etc.
With all the above in mind, the meaning of Javadoc will be clearer:
"Retrieval operations (including get) generally do not block" because ConcurrentHashMap does not use "synchronized" for its thread-safety. The logic of get itself takes care of thread-safety, and if you look further in the Javadoc:
The table is internally partitioned to try to permit the indicated number of concurrent updates without contention
Not only is retrieval non-blocking; even updates can happen concurrently. However, non-blocking/concurrent updates do not mean that it is thread-UNsafe. It simply means that it uses some means other than plain "synchronized" for thread-safety.
However, as the internal synchronization mechanism is not exposed, if you want to do some complicated actions other than those provided by ConcurrentMap, you may need to consider changing your logic, or consider not using ConcurrentHashMap. For example:
// only remove if both key1 and key2 exists
if (map.containsKey(key1) && map.containsKey(key2)) {
map.remove(key1);
map.remove(key2);
}
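One way to make the "remove both or neither" action atomic is to give up ConcurrentHashMap for that map and use Collections.synchronizedMap, whose wrapper object is the lock that its Javadoc tells clients to synchronize on. This is a sketch with a hypothetical `PairRemoval` class, and it is only correct if every access to the map goes through the same wrapper.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: a compound "check both, remove both" action.
final class PairRemoval {
    // All access goes through this wrapper, which locks on itself.
    private final Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());

    void put(String key, int value) {
        map.put(key, value);
    }

    // Removes both keys only if both are present: all or nothing.
    boolean removeBoth(String key1, String key2) {
        synchronized (map) { // same lock the wrapper's own methods use
            if (map.containsKey(key1) && map.containsKey(key2)) {
                map.remove(key1);
                map.remove(key2);
                return true;
            }
            return false;
        }
    }

    int size() {
        return map.size();
    }
}
```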

ConcurrentHashMap.get() is thread-safe, in the sense that
It will not throw any exception, including ConcurrentModificationException.
It will return a result that was true at some (recent) time in the past. This means that two back-to-back calls to get can return different results. Of course, this is true of any other Map as well.

HashMap is divided into "buckets" based on hashCode. ConcurrentHashMap uses this fact: its synchronization mechanism is based on locking buckets rather than the entire Map. This way several threads can simultaneously write to different buckets (one thread can write to one bucket at a time).
Reading from ConcurrentHashMap almost never uses synchronization. In older implementations (Java 5/6), synchronization is used only when get() sees a null value while fetching the value for a key. Since ConcurrentHashMap can't store null values (yes, aside from keys, values can't be null either), seeing null while reading suggests that the read happened in the middle of another thread initializing the map entry (key-value pair): the key was assigned, but the value not yet, so it still holds the default null.
In such a case the reading thread needs to wait until the entry has been written fully.
So the result of get() will be based on the current state of the map. If you read the value of a key that is in the middle of being updated, you will likely get the old value, since the write hasn't finished yet.

get() in ConcurrentHashMap is thread-safe because it reads the value, which is volatile. And in cases when the value of a key is null, the get() method (in the Java 5/6 implementation) waits until it gets the lock and then reads the updated value.
Note that put() does not deliberately set a value to null while updating: a null value can only be observed if a new entry's initialization is reordered with its insertion into the table, which the memory model allows (the JDK 6 source comments say this is not known to ever occur). get() treats that null as a signal to re-read the value under the segment lock.

It just means that when one thread is updating and one thread is reading there is no guarantee that the one that called the ConcurrentHashMap method first, in time, will have their operation occur first.
Think about an update on the item telling where Bob is. If one thread asks where Bob is at about the same time that another thread updates to say he came 'inside', you can't predict whether the reader thread will get Bob's status as 'inside' or 'outside'. Even if the update thread calls the method first, the reader thread might get the 'outside' status.
The threads will not cause each other problems. The code is thread-safe.
One thread won't go into an infinite loop or start generating weird NullPointerExceptions or get "itside" with half of the old status and half of the new.
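The Bob example as a tiny program; `WhereIsBob` is a hypothetical name. The reader may see either status depending on scheduling, but it always gets one of the two complete values.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo: a reader racing with a writer on the same key.
final class WhereIsBob {
    static String raceOnce() throws InterruptedException {
        ConcurrentHashMap<String, String> status = new ConcurrentHashMap<>();
        status.put("Bob", "outside");
        String[] seen = new String[1];

        Thread writer = new Thread(() -> status.put("Bob", "inside"));
        Thread reader = new Thread(() -> seen[0] = status.get("Bob"));
        writer.start();
        reader.start();
        writer.join();
        reader.join();

        return seen[0]; // "inside" or "outside", depending on timing
    }
}
```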

Does re-putting an object into a ConcurrentHashMap cause a "happens-before" memory relation?

I'm working with existing code that has an object store in the form of a ConcurrentHashMap. Within the map are stored mutable objects, used by multiple threads. By design, no two threads try to modify an object at once. My concern is regarding the visibility of the modifications between the threads.
Currently the objects' code has synchronization on the "setters" (guarded by the object itself). There is no synchronization on the "getters" nor are the members volatile. This, to me, would mean that visibility is not guaranteed. However, when an object is modified it is re-put back into the map (the put() method is called again, same key). Does this mean that when another thread pulls the object out of the map, it will see the modifications?
I've researched this here on Stack Overflow, in JCIP, and in the package description for java.util.concurrent. I think I've basically confused myself... but the final straw that caused me to ask this question was this statement in the package description:
Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
In relation to my question, do "actions" include the modifications to the objects stored in the map before the re-put()? If all this does result in visibility across threads, is this an efficient approach? I'm relatively new to threads and would appreciate your comments.
Edit:
Thank you all for your responses! This was my first question on Stack Overflow and it has been very helpful to me.
I have to go with ptomli's answer because I think it most clearly addressed my confusion. To wit, establishing a "happens-before" relation doesn't necessarily affect modification visibility in this case. My "title question" is poorly constructed regarding my actual question described in the text. ptomli's answer now jibes with what I read in JCIP: "To ensure all threads see the most up-to-date values of shared mutable variables, the reading and writing threads must synchronize on a common lock" (page 37). Re-putting the object back into the map doesn't provide this common lock for the modification to the inserted object's members.
I appreciate all the tips for change (immutable objects, etc), and I wholeheartedly concur. But for this case, as I mentioned there is no concurrent modification because of careful thread handling. One thread modifies an object, and another thread later reads the object (with the CHM being the object conveyer). I think the CHM is insufficient to ensure that the later executing thread will see the modifications from the first given the situation I provided. However, I think many of you correctly answered the title question.

You call concurrHashMap.put after each write to an object. However, you did not specify that you also call concurrHashMap.get before each read. This is necessary.
This is true of all forms of synchronization: you need to have some "checkpoints" in both threads. Synchronizing only one thread is useless.
I haven't checked the source code of ConcurrentHashMap to make sure that put and get establish a happens-before relationship, but it is only logical that they should.
There is still an issue with your method however, even if you use both put and get. The problem happens when you modify an object and it is used (in an inconsistent state) by the other thread before it is put. It's a subtle problem because you might think the old value would be read since it hasn't been put yet and it would not cause a problem. The problem is that when you don't synchronize, you are not guaranteed to get a consistent older object, but rather the behavior is undefined. The JVM can update whatever part of the object in the other threads, at any time. It's only when using some explicit synchronization that you are sure you are updating the values in a consistent way across threads.
What you could do:
(1) synchronize all accesses (getters and setters) to your objects everywhere in the code. Be careful with the setters: make sure that you can't set the object in an inconsistent state. For example, when setting first and last name, having two synchronized setters is not sufficient: you must get the object lock for both operations together.
or
(2) when you put an object in the map, put a deep copy instead of the object itself. That way the other threads will never read an object in an inconsistent state.
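A sketch of option (1), using a hypothetical `Person` class: both fields are written in one synchronized method, and reads return both fields from one synchronized method, so a reader can never pair a new first name with an old last name.

```java
// Hypothetical mutable class guarded by its own monitor.
final class Person {
    private String first;
    private String last;

    Person(String first, String last) {
        setName(first, last);
    }

    // Both fields change in one critical section.
    synchronized void setName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    // Both fields are read in one critical section, under the same lock
    // as the writer, so the reader sees a consistent, up-to-date pair.
    synchronized String fullName() {
        return first + " " + last;
    }
}
```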
EDIT:
I just noticed
Currently the objects' code has synchronization on the "setters" (guarded by the object itself). There is no synchronization on the "getters" nor are the members volatile.
This is not good. As I said above synchronizing on only one thread is no synchronization at all. You might synchronize on all your writer threads, but who cares since the readers won't get the right values.

I think this has already been said across a few answers, but to sum it up:
If your code goes
CHM#get
call various setters
CHM#put
then the "happens-before" provided by the put will guarantee that all the mutate calls are executed before the put. This means that any subsequent get will be guaranteed to see those changes.
Your problem is that the actual state of the object will not be deterministic because if the actual flow of events is
thread 1: CHM#get
thread 1: call setter
thread 2: CHM#get
thread 1: call setter
thread 1: call setter
thread 1: CHM#put
then there is no guarantee over what the state of the object will be in thread 2. It might see the object with the value provided by the first setter or it might not.
The immutable copy would be the best approach, as then only completely consistent objects are published. Making the various setters synchronized (or the underlying references volatile) still doesn't let you publish consistent state; it just means that each getter call will see the latest value of its own field.
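A sketch of the immutable-copy approach, with hypothetical `Name` and `NameStore` classes: instances have only final fields, and a "modification" builds a new instance and replaces the mapping with computeIfPresent, so readers only ever get a complete old object or a complete new one.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical value type: all fields final, never modified after construction.
final class Name {
    final String first;
    final String last;

    Name(String first, String last) {
        this.first = first;
        this.last = last;
    }

    Name withLast(String newLast) {
        return new Name(first, newLast); // new instance; this one is untouched
    }
}

final class NameStore {
    private final ConcurrentHashMap<Integer, Name> store = new ConcurrentHashMap<>();

    void set(int id, Name name) {
        store.put(id, name);
    }

    // "Modify" by publishing a replacement; earlier readers keep their old copy.
    void renameLast(int id, String newLast) {
        store.computeIfPresent(id, (k, old) -> old.withLast(newLast));
    }

    Name get(int id) {
        return store.get(id);
    }
}
```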

I think your question relates more to the objects you're storing in the map, and how they react to concurrent access, than the concurrent map itself.
If the instances you're storing in the map have synchronized mutators, but not synchronized accessors, then I don't see how they can be thread safe as described.
Take the Map out of the equation and determine if the instances you're storing are thread safe by themselves.
However, when an object is modified it is re-put back into the map (the put() method is called again, same key). Does this mean that when another thread pulls the object out of the map, it will see the modifications?
This exemplifies the confusion. The instance that is re-put into the Map will be retrieved from the Map by another thread. This is the guarantee of the concurrent map. That has nothing to do with visibility of the state of the stored instance itself.

My understanding is that it should work for all gets after the re-put, but this would be a very unsafe method of synchronization.
What about gets that happen before the re-put, while modifications are in progress? They may see only some of the changes, and the object would be in an inconsistent state.
If you can, I'd recommend store immutable objects in the map. Then any get will retrieve a version of the object that was current when it did the get.

Here is a code snippet from java.util.concurrent.ConcurrentHashMap (OpenJDK 7):
public V get(Object key) {
    Segment<K,V> s; // manually integrate access methods to reduce overhead
    HashEntry<K,V>[] tab;
    int h = hash(key.hashCode());
    long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
    if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
        (tab = s.table) != null) {
        for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                 (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
             e != null; e = e.next) {
            K k;
            if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                return e.value;
        }
    }
    return null;
}
UNSAFE.getObjectVolatile() is documented as getter with internal volatile semantics, thus the memory barrier will be crossed when getting the reference.

Yes, put incurs a volatile write, even if the key-value pair already exists in the map.
Using ConcurrentHashMap to publish objects across threads is pretty efficient. Objects should not be modified further once they are in the map. (They don't have to be strictly immutable, i.e. with final fields.)
