Internally does it lock all the rows and mark each key to be deleted? So that if another thread want to access a key that is about to be deleted it will provide the right behavior?
Or do we have to synchronized the clear function
synchronized (this) {
myMap.clear();
}
For example, consider the following
myMap = {1: 1, 2: 1, 3: 1}
//Thread1
myMap.clear()
//Thread2
myMap.compute(1, (k, v) -> {v == null ? 0 : k + 1})
What happens when thread1 execute first and thread2 want to access key1 but key1 is not yet deleted?
Is the result
{}
or
{1: 0}
The behavior of clear() in that context is unspecified1. The javadoc states:
"For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries."
Looking at the source code, clear() is implemented by clearing each of the segments one at a time. Each segment is locked while clearing, but there is no lock on the entire map. This means that another thread may add entries to a segment that has just been cleared .... before the overall clear() call returns.
So, in practice, either of the results / behaviors you propose is possible, depending on the size of maps, the distribution of keys between segments, the version of Java you are using, and ... timing.
Internally does it lock all the rows and mark each key to be deleted?
No. Each segment is locked (one at a time) while entries in the segment are removed. (This is done to avoid memory anomalies which might corrupt the segments' hash chains, etcetera)
Regarding this:
synchronized (this) {
myMap.clear();
}
That will not block other threads from inserting elements while the clear() is in progress. It will just stop two threads (executing the same code) from clearing at the same time.
If you want to guarantee that clear() clears the map, you would need to wrap the map using Collections.synchronizedMap wrapper, and use that consistently. In practice, that defeats the purpose of using ConcurrentHashMap.
Follow-up question:
So potentially there could be infinite loop of clearing right? If another thread keep adding element the size of the map is always > 0, the thread that is trying to clear will keep on running.
Nope. There will be no infinite loop. The clear() method is not looking at the size of the map.
What will actually happen is that the clear() call will return and the map won't necessarily be empty.
1 - On careful rereading, I've realized that the quoted javadoc doesn't directly answer the question. In fact, if you look at the "contract" in the Map.clear() javadoc, it states there that the map will be empty after the call returns. This is implicitly contradicted by the javadoc for the ConcurrentHashMap.clear() javadoc, and explicitly contradicted by what the code actually does.
Collectively considering two points from the official documentation, kind of provides the idea.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove)
For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries.
So for your question you will get the value from the map if the key has not been deleted yet. However, you should not rely on the internal synchronization details of the ConcurrentHashMap and should base your code only on the thread safety guarantees provided by the class.
Related
I'am trying to clarify HashMap vs ConcurrentHashMap regarding type-safety and also performance. I came across a lot of good articles, but still getting troubles figuring it all out.
Let's take the following example using a ConcurrentHashMap, where I will try to add a value for a key not already there and returning it, the new way of doing it would be:
private final Map<K,V> map = new ConcurrentHashMap<>();
return map.putIfAbsent(k, new Object());
let's assume we don't want to use the putIfAbsent method, the above code should look something like this:
private final Map<K,V> map = new ConcurrentHashMap<>();
synchronized (map) {
V value = map.get(key); //Edit adding the value fetch inside synchronized block
if (!nonNull(value)) {
map.put(key, new Object());
}
}
return map.get(key)
Is the problem with this approach the fact that the whole map is locked whereas in first approach the putIfAbsent method only synchronizes on the bucket on which the hash of the key is, and thus leading to less performance ? Would the second approach work fine with just a HashMap ?
Is the problem with this approach the fact that the whole map is locked
There are two problems with this approach.
It's not intrinsic
The fact that you've acquired the lock on the map reference has zero effect whatsoever, except in regards to any other code that (tries) to acquire this lock. Crucially, ConcurrentHashmap itself does not acquire this lock.
So, if, during that second snippet (with synchronized), some other thread does this:
map.putIfAbsent(key, new Object());
Then it may occur that your map.get(key) call returns null, and nevertheless your followup map.put call ends up overwriting. In other words, that both your thread, and that hypothetical thread running putIfAbsent, both decided to write.
Presumably, if that is just fine in your book, that'd be weird. Why use putIfAbsent and check if map.get returns null in the first place?
Had the other thread done this:
synchronized (map) {
map.putIfAbsent(key, new Object());
}
then there'd be no problem; either your get-check-if-null-then-set code will set and the putIfAbsent call is a noop, or vice versa, but they couldn't possibly both 'decide to write'.
Which leads us to;
This is pointless
There are two different ways to achieve concurrency with maps: Intrinsic and extrinsic. There is zero point in doing both, and they do not interact.
If you have structure whereby all access (both read and write) out of a plain old entirely non-multicore capable java.util.HashMap goes through some shared lock (the hashmap instance itself, or any other lock, long as all threads that interact with that particular map instance use the same one), then that works fine and there is therefore no reason or point to using ConcurrentHashMap instead.
The point of ConcurrentHashMap is to streamline concurrent processes without the use of extrinsic locking: To let the map do the locking.
One of the reasons you want this is that the ConcurrentHashMap impl is significantly faster at the jobs it is capable of doing; these jobs are spelled out explicitly: It's the methods that ConcurrentHashMap has.
Atomicity
The central problem of your code snippet is that it lacks atomicity. Check-then-act is fundamentally broken in concurrent models (in your case: Check: Is key 'k' associated with no value or null?, then Act: Set the mapping of key 'k' to value 'v'). This is broken because what if the thing you checked changes in between? What if you have two threads that both 'check-and-act' and then run simultaneously; then they both check first, then both act first, and broken things ensue: One of the two threads will be acting upon a state that isn't equal to the state as it was when you checked, which means your check's broken.
The right model is act-then-check: Act first, and then check the result of the operation. Of course, this requires redefining, and integrating, the code you wrote explicitly in your snippet, into the very definition of your 'act' phase.
In other words, putIfAbsent is not a convenience method! is a fundamental operation! It's the only way (short of extrinsic locking) to convey the notion of: "Perform the action of associating 'v' with 'k', but only if there is no association yet. I'll check the results of this operation next". There is no way to break that down into if (!map.containsKey(key)) map.put(key, v); because check-then-act does not work in concurrent modelling.
Conclusions
Either get rid of concurrenthashmap, or get rid of synchronized. Having code that uses both is probably broken and even if it isn't, it's error prone, confusing, and I can guarantee you there's a much better way to write it (better in that it is more idiomatic, easier to read, more flexible in the face of future change requests, easier to test, and less likely to have hard-to-test-for bugs in it).
If you can state all operations you need to perform 100% in terms of the methods that CHM has, then do that, because CHM is vastly superior. It even has mechanisms for arbitrary operations: For example, unlike basic hashmaps, you can iterate through a CHM even if other threads are also messing with it, whereas with a normal hashmap you need to hold the lock for the entire duration of the operation, which means any other thread trying to do anything to that hashmap, even just 'ask for its size', need to wait. Hence, for most use cases, CHM results in orders of magnitude better performance.
in first approach the putIfAbsent method only synchronizes on the bucket
That is incorrect, ConcurrentHashMap doesn't synchronize on anything, it uses different mechanics to ensure thread safety.
Would the second approach work fine with just a HashMap ?
Yes, except the second approach is flawed. If using synchronization to make a Map thread-safe, then all access of the Map should use synchronization. As such, it would be best to call Collections.synchronizedMap(map). Performance will be worse than using ConcurrentHashMap.
private final Map<Integer, Object> map = Collections.synchronizedMap(new HashMap<>());
let's assume we don't want to use the putIfAbsent method.
Why? Oh, because it wastes a allocation if the key is already in the map, which is why we should be using computeIfAbsent() instead
map.computeIfAbsent(key, k -> new Object());
Are actions in a thread prior to calling ConcurrentMap.remove() guaranteed to happen-before actions subsequent to seeing the removal from another thread?
Documentation says this regarding objects placed into the collection:
Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
Example code:
{
final ConcurrentMap map = new ConcurrentHashMap();
map.put(1, new Object());
final int[] value = { 0 };
new Thread(() -> {
value[0]++;
value[0]++;
value[0]++;
value[0]++;
value[0]++;
map.remove(1); // A
}).start();
new Thread(() -> {
if (map.get(1) == null) { // B
System.out.println(value[0]); // expect 5
}
}).start();
}
Is A in a happens-before relationship with B? Therefore, should the program only, if ever, print 5?
You have found an interesting subtle aspect of these concurrency tools that is easy to overlook.
First, it’s impossible to provide a general guaranty regarding removal and the retrieval of a null reference, as the latter only proves the absence of a mapping but not a previous removal, i.e. the thread could have read the map’s initial state, before the key ever had a mapping, which, of course, can’t establish a happens-before relationship with the actions that happened after the map’s construction.
Also, if there are multiple threads removing the same key, you can’t assume a happens-before relationship, when retrieving null, as you don’t know which removal has been completed. This issue is similar to the scenario when two threads insert the same value, but the latter can be fixed on the application side by only perform insertions of distinguishable values or by following the usual pattern of performing the desired modifications on the value object which is going to be inserted and to query the retrieved object only. For a removal, there is no such fix.
In your special case, there’s a happens-before relationship between the map.put(1, new Object()) action and the start of the second thread, so if the second thread encounters null when querying the key 1, it’s clear that it witnessed the sole removal of your code, still, the specification didn’t bother to provide an explicit guaranty for this special case.
Instead, the specification of Java 8’s ConcurrentHashMap says,
Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.)
clearly ruling out null retrievals.
I think, with the current (Java 8) ConcurrentHashMap implementation, your code can’t break as it is rather conservative in that it performs all access to its internal backing array with volatile semantics. But that is only the current implementation and, as explained above, your code is a special case and likely to become broken with every change towards a real-life application.
No, you have the order wrong.
There is a happens-before edge from the put() to the subsequent get(). That edge is not symmetric, and doesn't work in the other direction. There is no happens-before edge from at get() to another get() or a remove(), or from a put() to another put().
In this case, you put an object in the map. Then you modify another object. That's a no-no. There's no edge from the those writes to the get() in the second thread, so those writes may not be visible to the second thread.
On Intel hardware, I think this will always work. However, it isn't guaranteed by the Java memory model, so you have to be wary if you ever port this code to different hardware.
A does not need to happen before B.
Only the original put happens before both. Thus a null at B means that A happened.
However write back of thread local memory cache and instruction order of ++ and remove are not mentioned. volatile is not used; instead a Map and an array are used to hopefully keep thread data synchrone. On writing the data back, in-order relation should hold again.
To my understanding A could remove and be written back, then the last ++ happen, and something like 4 being printed at B. I would add volatile to the array. The Map itself will go fine.
I am far from certain, but as I did not see a corresponding answer, I stick my neck out. (To learn myself.)
As ConcurrentHashMap is a thread safe collection, the statement map.remove(1) must have a read barrier and a write barrier if it alters the map. The expression map.get(1) must have a read barrier or one, or both of those operations are not thread safe.
In reality ConcurrentHashMap up to Java 7, uses partitioned locks, so it always has a read/write barrier for nearly every operation.
A ConcurrentSkipListMap doesn't have to use locks, but to perform any thread safe write action, a write barrier is required.
This means your test should always act as expected.
I learned yesterday that I've been incorrectly using collections with concurrency for many, many years.
Whenever I create a collection that needs to be accessed by more than one thread I wrap it in one of the Collections.synchronized* methods. Then, whenever mutating the collection I also wrap it in a synchronized block (I don't know why I was doing this, I must have thought I read it somewhere).
However, after reading the API more closely, it seems you need the synchronized block when iterating the collection. From the API docs (for Map):
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views:
And here's a small example:
List<O> list = Collections.synchronizedList(new ArrayList<O>());
...
synchronized(list) {
for(O o: list) { ... }
}
So, given this, I have two questions:
Why is this even necessary? The only explanation I can think of is they're using a default iterator instead of a managed thread-safe iterator, but they could have created a thread-safe iterator and fixed this mess, right?
More importantly, what is this accomplishing? By putting the iteration in a synchronized block you are preventing multiple threads from iterating at the same time. But another thread could mutate the list while iterating so how does the synchronized block help there? Wouldn't mutating the list somewhere else screw with the iteration whether it's synchronized or not? What am I missing?
Thanks for the help!
Why is this even necessary? The only explanation I can think of is
they're using a default iterator instead of a managed thread-safe
iterator, but they could have created a thread-safe iterator and fixed
this mess, right?
Iterating works with one element at a time. For the Iterator to be thread-safe, they'd need to make a copy of the collection. Failing that, any changes to the underlying Collection would affect how you iterate with unpredictable or undefined results.
More importantly, what is this accomplishing? By putting the iteration
in a synchronized block you are preventing multiple threads from
iterating at the same time. But another thread could mutate the list
while iterating so how does the synchronized block help there?
Wouldn't mutating the list somewhere else screw with the iteration
whether it's synchronized or not? What am I missing?
The methods of the object returned by synchronizedList(List) work by synchronizing on the instance. So no other thread could be adding/removing from the same List while you are inside a synchronized block on the List.
The basic case
All of the methods of the object returned by Collections.synchronizedList() are synchronized to the list object itself. Whenever a method is called from one thread, every other thread calling any method of it is blocked until the first call finishes.
So far so good.
Iterare necesse est
But that doesn't stop another thread from modifying the collection when you're between calls to next() on its Iterator. And if that happens, your code will fail with a ConcurrentModificationException. But if you do the iteration in a synchronized block too, and you synchronize on the same object (i.e. the list), this will stop other threads from calling any mutator methods on the list, they have to wait until your iterating thread releases the monitor for the list object. The key is that the mutator methods are synchronized to the same object as your iterator block, this is what's stopping them.
We're not out of the woods yet...
Note though that while the above guarantees basic integrity, it doesn't guarantee correct behaviour at all times. You might have other parts of your code that make assumptions which don't hold up in a multi-threaded environment:
List<Object> list = Collections.synchronizedList( ... );
...
if (!list.contains( "foo" )) {
// there's nothing stopping another thread from adding "foo" here itself, resulting in two copies existing in the list
list.add( "foo" );
}
...
synchronized( list ) { //this block guarantees that "foo" will only be added once
if (!list.contains( "foo" )) {
list.add( "foo" );
}
}
Thread-safe Iterator?
As for the question about a thread-safe iterator, there is indeed a list implementation with it, it's called CopyOnWriteArrayList. It is incredibly useful but as indicated in the API doc, it is limited to a handful of use cases only, specifically when your list is only modified very rarely but iterated over so frequently (and by so many threads) that synchronizing iterations would cause a serious bottle-neck. If you use it inappropriately, it can vastly degrade the performance of your application, as each and every modification of the list creates an entire new copy.
Synchronizing on the returned list is necessary, because internal operations synchronize on a mutex, and that mutex is this, i.e. the synchronized collection itself.
Here's some relevant code from Collections, constructors for SynchronizedCollection, the root of the synchronized collection hierarchy.
SynchronizedCollection(Collection<E> c) {
if (c==null)
throw new NullPointerException();
this.c = c;
mutex = this;
}
(There is another constructor that takes a mutex, used to initialize synchronized "view" collections from methods such as subList.)
If you synchronize on the synchronized list itself, then that does prevent another thread from mutating the list while you're iterating over it.
The imperative that you synchronize of the synchronized collection itself exists because if you synchronize on anything else, then what you have imagined could happen - another thread mutating the collection while you're iterating over it, because the objects locked are different.
Sotirios Delimanolis answered your second question "What is this accomplishing?" effectively. I wanted to amplify his answer to your first question:
Why is this even necessary? The only explanation I can think of is they're using a default iterator instead of a managed thread-safe iterator, but they could have created a thread-safe iterator and fixed this mess, right?
There are several ways to approach making a "thread-safe" iterator. As is typical with software systems, there are multiple possibilities, and they offer different tradeoffs in terms of performance (liveness) and consistency. Off the top of my head I see three possibilities.
1. Lockout + Fail-fast
This is what's suggested by the API docs. If you lock the synchronized wrapper object while iterating it (and the rest of the code in the system written correctly, so that mutation method calls also all go through the synchronized wrapper object), the iteration is guaranteed to see a consistent view of the contents of the collection. Each element will be traversed exactly once. The downside, of course, is that other threads are prevented from modifying or even reading the collection while it's being iterated.
A variation of this would use a reader-writer lock to allow reads but not writes during iteration. However, the iteration itself can mutate the collection, so this would spoil consistency for readers. You'd have to write your own wrapper to do this.
The fail-fast comes into play if the lock isn't taken around the iteration and somebody else modifies the collection, or if the lock is taken and somebody violates the locking policy. In this case if the iteration detects that the collection has been mutated out from under it, it throws ConcurrentModificationException.
2. Copy-on-write
This is the strategy employed by CopyOnWriteArrayList among others. An iterator on such a collection does not require locking, it will always show consistent results during iterator, and it will never throw ConcurrentModificationException. However, writes will always copy the entire array, which can be expensive. Perhaps more importantly, the notion of consistency is altered. The contents of the collection might have changed while you were iterating it -- more precisely, while you were iterating a snapshot of its state some time in the past -- so any decisions you might make now are potentially out of date.
3. Weakly Consistent
This strategy is employed by ConcurrentLinkedDeque and similar collections. The specification contains the definition of weakly consistent. This approach also doesn't require any locking, and iteration will never throw ConcurrentModificationException. But the consistency properties are extremely weak. For example, you might attempt to copy the contents of a ConcurrentLinkedDeque by iterating over it and adding each element encountered to a newly created List. But other threads might be modifying the deque while you're iterating it. In particular, if a thread removes an element "behind" where you've already iterated, and then adds an element "ahead" of where you're iterating, the iteration will probably observe both the removed element and the added element. The copy will thus have a "snapshot" that never actually existed at any point in time. Ya gotta admit that's a pretty weak notion of consistency.
The bottom line is that there's no simple notion of making an iterator thread safe that would "fix this mess". There are several different ways -- possibly more than I've explained here -- and they all involve differing tradeoffs. It's unlikely that any one policy will "do the right thing" in all circumstances for all programs.
this is a passage from JavaDoc regarding ConcurrentHashMap. It says retrieval operations generally do not block, so may overlap with update operations. Does this mean the get() method is not thread safe?
"However, even though all operations are thread-safe, retrieval
operations do not entail locking, and there is not any support for
locking the entire table in a way that prevents all access. This class
is fully interoperable with Hashtable in programs that rely on its
thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove). Retrievals
reflect the results of the most recently completed update operations
holding upon their onset."
The get() method is thread-safe, and the other users gave you useful answers regarding this particular issue.
However, although ConcurrentHashMap is a thread-safe drop-in replacement for HashMap, it is important to realize that if you are doing multiple operations you may have to change your code significantly. For example, take this code:
if (!map.containsKey(key))
return map.put(key, value);
else
return map.get(key);
In a multi-thread environment, this is a race condition. You have to use the ConcurrentHashMap.putIfAbsent(K key, V value) and pay attention to the return value, which tells you if the put operation was successful or not. Read the docs for more details.
Answering to a comment that asks for clarification on why this is a race condition.
Imagine there are two threads A, B that are going to put two different values in the map, v1 and v2 respectively, having the same key. The key is initially not present in the map. They interleave in this way:
Thread A calls containsKey and finds out that the key is not present, but is immediately suspended.
Thread B calls containsKey and finds out that the key is not present, and has the time to insert its value v2.
Thread A resumes and inserts v1, "peacefully" overwriting (since put is threadsafe) the value inserted by thread B.
Now thread B "thinks" it has successfully inserted its very own value v2, but the map contains v1. This is really a disaster because thread B may call v2.updateSomething() and will "think" that the consumers of the map (e.g. other threads) have access to that object and will see that maybe important update ("like: this visitor IP address is trying to perform a DOS, refuse all the requests from now on"). Instead, the object will be soon garbage collected and lost.
It is thread-safe. However, the way it is being thread-safe may not be what you expect. There are some "hints" you can see from:
This class is fully interoperable with Hashtable in programs that
rely on its thread safety but not on its synchronization details
To know the whole story in a more complete picture, you need to be aware of the ConcurrentMap interface.
The original Map provides some very basic read/update methods. Even I was able to make a thread-safe implementation of Map; there are lots of cases that people cannot use my Map without considering my synchronization mechanism. This is a typical example:
if (!threadSafeMap.containsKey(key)) {
threadSafeMap.put(key, value);
}
This piece of code is not thread-safe, even though the map itself is. Two threads calling containsKey() at the same time could think there is no such key they both therefore insert into the Map.
In order to fix the problem, we need to do extra synchronization explicitly. Assume the thread-safety of my Map is achieved by synchronized keywords, you will need to do:
synchronized(threadSafeMap) {
if (!threadSafeMap.containsKey(key)) {
threadSafeMap.put(key, value);
}
}
Such extra code needs you to know about the "synchronization details" of the map. In the above example, we need to know that the synchronization is achieved by "synchronized".
ConcurrentMap interface take this one step further. It defines some common "complex" actions that involves multiple access to map. For example, the above example is exposed as putIfAbsent(). With these "complex" actions, users of ConcurrentMap (in most case) don't need to synchronise actions with multiple access to the map. Hence, the implementation of Map can perform more complicated synchronization mechanism for better performance. ConcurrentHashhMap is a good example. Thread-safety is in fact maintained by keeping separate locks for different partitions of the map. It is thread-safe because concurrent access to the map will not corrupt the internal data structure, or cause any update lost unexpected, etc.
With all the above in mind, the meaning of Javadoc will be clearer:
"Retrieval operations (including get) generally do not block" because ConcurrentHashMap is not using "synchronized" for its thread-safety. The logic of get itself takes care of the thread-safeness; and If you look further in the Javadoc:
The table is internally partitioned to try to permit the indicated number
of concurrent updates without contention
Not only is retrieval non-blocking, even updates can happen concurrently. However, non-blocking/concurrent-updates does not means that it is thread-UNsafe. It simply means that it is using some ways other than simple "synchronized" for thread-safety.
However, as the internal synchronization mechanism is not exposed, if you want to do some complicated actions other than those provided by ConcurrentMap, you may need to consider changing your logic, or consider not using ConcurrentHashMap. For example:
// only remove if both key1 and key2 exists
if (map.containsKey(key1) && map.containsKey(key2)) {
map.remove(key1);
map.remove(key2);
}
ConcurrentHashmap.get() is thread-safe, in the sense that
It will not throw any exception, including ConcurrentModificationException
It will return a result that was true at some (recent) time in past. This means that two back-to-back calls to get can return different results. Of course, this true of any other Map as well.
HashMap is divided into "buckets" based on hashCode. ConcurrentHashMap uses this fact. Its synchronization mechanism is based on blocking buckets rather than on entire Map. This way few threads can simultaneously write to few different buckets (one thread can write to one bucket at a time).
Reading from ConcurrentHashMap almost doesn't use synchronization. Synchronization is used when while fetching value for key, it sees null value. Since ConcurrentHashMap can't store null as values (yes, aside from keys, values also can't be nulls) it suggests that fetching null while reading happened in the middle of initializing map entry (key-value pair) by another thread: when key was assigned, but value not yet, and it still holds default null.
In such case reading thread will need to wait until entry will be written fully.
So results from read() will be based on current state of map. If you read value of key that was in the middle of updating you will likely get old value since writing process hasn't finished yet.
get() in ConcurrentHashMap is thread-safe because It reads the value
which is Volatile. And in cases when value is null of any key, then
get() method waits till it gets the lock and then it reads the updated
value.
When put() method is updating CHM, then it sets the value of that key to null, and then it creates a new entry and updates the CHM. This null value is used by get() method as signal that another thread is updating the CHM with the same key.
It just means that when one thread is updating and one thread is reading there is no guarantee that the one that called the ConcurrentHashMap method first, in time, will have their operation occur first.
Think about an update on the item telling where Bob is. If one thread asks where Bob is at about the same time that another thread updates to say he came 'inside', you can't predict whether the reader thread will get Bob's status as 'inside' or 'outside'. Even if the update thread calls the method first, the reader thread might get the 'outside' status.
The threads will not cause each other problems. The code is ThreadSafe.
One thread won't go into an infinite loop or start generating wierd NullPointerExceptions or get "itside" with half of the old status and half of the new.
I have a synchronized Map (via Collections.synchronizedMap()) that is read and updated by Thread A. Thread B accesses the Map only via Map.keySet() (read-only).
How should I synchronize this? The docs say that keySet() (for a Collections.synchronizedMap) "Needn't be in synchronized block". I can put Thread A's read/write access within a synchronized block, but is that even necessary?
I guess it seems odd to me to even use a synchronized Map, or a synchronized block, if Map.keySet doesn't need to be synchronized (according to the docs link above)...
Update: I missed that iteration of the keySet must be synchronized, even though retrieving the keySet does not require sync. Not particularly exciting to have the keySet without being able to look through it, so end result = synchronization required. Using a ConcurrentHashMap instead.
To make a truly read/write versus read/only locking Map wrapper, you can take a look at the wrapper the Collections uses for synchronizedMap() and replace all of the synchronized statements with a ReentrantReadWriteLock. This is a good bit of work. Instead, you should consider switching to using a ConcurrentHashMap which does all of the right things there.
In terms of keySet(), it doesn't need to be in a synchronized block because it is already being synchronized by the Collections.synchronizedMap(). The Javadocs is just pointing out that if you are iterating through the map, you need to synchronize on it because you are doing multiple operations, but you don't need to synchronize when you are getting the keySet() which is wrapped in a SynchronizedSet class which does its own synchronization.
Lastly, your question seemed to be implying that you don't need to synchronize on something if you are just reading from it. You have to remember that synchronization not only protects against race conditions but also ensures that the data is properly shared by each of the processors. Even if you are accessing a Map as read-only, you still need to synchronize on it if any other thread is updating it.
The docs are telling you how to properly synchronize multi-step operations that need to be atomic, in this case iterating over the map:
Map m = Collections.synchronizedMap(new HashMap());
...
Set s = m.keySet(); // Needn't be in synchronized block
...
synchronized(m) { // Synchronizing on m, not s!
Iterator i = s.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Note how the actual iteration must be in a synchronized block. The docs are just saying that it doesn't matter if obtaining the keySet() is in the synchronized block, because it's a live view of the Map. If the keys in the map change between the reference to the key set being obtained and the beginning of the synchronized block, the key set will reflect those changes.
And by the way, the docs you cite are only for a Map returned by Collections.synchronizedMap. The statement does not necessarily apply to all Maps.
The docs are correct. The map returned from Collections.synchronizedMap() will properly wrap synchronized around all calls sent to the original Map. However, the set impl returned by keySet() does not have the same property, so you must ensure it is read under the same lock.
Without this synchronization, there is no guarantee that Thread B will ever see any update made by Thread A.
You might want to investigate ConcurrentHashMap. It provides useful semantics for exactly this use case. Iterating over a collection view in CHM (like keySet()) gives useful concurrent behavior ("weakly consistent" iterators). You will traverse all keys from the state of the collection at iteration and you may or may not see changes after the iterator was created.