Is iteration via Collections.synchronizedSet(...).forEach() guaranteed to be thread safe? - java

As we know, iterating over a synchronized collection is not thread safe by default, so one cannot simply write:
Set<E> set = Collections.synchronizedSet(new HashSet<>());
// fill with data
for (E e : set) {
    process(e);
}
This is unsafe because another thread may modify the set during the iteration: the loop does not hold an exclusive lock on the set.
This is described in the Javadoc of Collections.synchronizedSet:
public static Set synchronizedSet(Set s)
Returns a synchronized (thread-safe) set backed by the specified set. In order to guarantee serial access, it is critical that all access to the backing set is accomplished through the returned set.
It is imperative that the user manually synchronize on the returned set when iterating over it:
Set s = Collections.synchronizedSet(new HashSet());
...
synchronized (s) {
    Iterator i = s.iterator(); // Must be in the synchronized block
    while (i.hasNext())
        foo(i.next());
}
Failure to follow this advice may result in non-deterministic behavior.
However, this does not apply to Set.forEach, which inherits the default method forEach from Iterable.forEach.
Now I looked into the source code, and here we can see that we have the following structure:
We ask for a Collections.synchronizedSet().
We get one:
public static <T> Set<T> synchronizedSet(Set<T> s) {
    return new SynchronizedSet<>(s);
}
...
static class SynchronizedSet<E>
        extends SynchronizedCollection<E>
        implements Set<E> {
    private static final long serialVersionUID = 487447009682186044L;
    SynchronizedSet(Set<E> s) {
        super(s);
    }
    SynchronizedSet(Set<E> s, Object mutex) {
        super(s, mutex);
    }
    public boolean equals(Object o) {
        if (this == o)
            return true;
        synchronized (mutex) {return c.equals(o);}
    }
    public int hashCode() {
        synchronized (mutex) {return c.hashCode();}
    }
}
It extends SynchronizedCollection, which has the following interesting methods next to the obvious ones:
// Override default methods in Collection
@Override
public void forEach(Consumer<? super E> consumer) {
    synchronized (mutex) {c.forEach(consumer);}
}
@Override
public boolean removeIf(Predicate<? super E> filter) {
    synchronized (mutex) {return c.removeIf(filter);}
}
@Override
public Spliterator<E> spliterator() {
    return c.spliterator(); // Must be manually synched by user!
}
@Override
public Stream<E> stream() {
    return c.stream(); // Must be manually synched by user!
}
@Override
public Stream<E> parallelStream() {
    return c.parallelStream(); // Must be manually synched by user!
}
The mutex used here is the same object that all other operations of Collections.synchronizedSet lock on.
Judging by the implementation, we can say that it is thread safe to use Collections.synchronizedSet(...).forEach(...), but is it also thread safe by specification?
(Confusingly enough, Collections.synchronizedSet(...).stream().forEach(...) is not thread safe by implementation, and the verdict of the specification seems to be unknown as well.)
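The difference between the two calls can be observed with Thread.holdsLock. The following is only a sketch, assuming the OpenJDK implementation quoted above, where the one-argument factory uses the wrapper itself as the mutex. The class and method names are invented for illustration, and the result reflects the implementation, not a documented guarantee.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical probe class; not part of the JDK.
class ForEachLockProbe {

    // True if the wrapper's monitor was held for every element visited by forEach.
    static boolean lockHeldInForEach(Set<String> set) {
        AtomicBoolean held = new AtomicBoolean(true);
        set.forEach(e -> {
            if (!Thread.holdsLock(set)) {
                held.set(false);
            }
        });
        return held.get();
    }

    // Same check, but going through stream(), which the wrapper does not guard.
    static boolean lockHeldInStreamForEach(Set<String> set) {
        AtomicBoolean held = new AtomicBoolean(true);
        set.stream().forEach(e -> {
            if (!Thread.holdsLock(set)) {
                held.set(false);
            }
        });
        return held.get();
    }

    public static void main(String[] args) {
        Set<String> set = Collections.synchronizedSet(new HashSet<>(Arrays.asList("a", "b")));
        System.out.println("forEach holds lock: " + lockHeldInForEach(set));
        System.out.println("stream().forEach holds lock: " + lockHeldInStreamForEach(set));
    }
}
```

With the sources quoted above, the first check should report that the lock is held and the second that it is not, which matches the "Must be manually synched by user!" comments.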

As you wrote, judging by the implementation, forEach() is thread-safe for the collections provided with the JDK (see the caveat at the end), because it must acquire the monitor of mutex before it proceeds.
Is it also thread safe by specification?
My opinion is no, and here is an explanation. The Collections.synchronizedXXX() Javadoc, put briefly, says: "all methods are thread-safe except for those used for iterating over it".
My other, admittedly very subjective, argument is what yshavit wrote: unless you are told or have read otherwise, consider an API or class not thread-safe.
Now, let's take a closer look at the Javadoc. I think it is fair to say that forEach() is used to iterate over the collection, so, following the advice from the Javadoc, we should consider it not thread-safe, even though that is the opposite of what the implementation does.
Anyway, I agree with yshavit's statement that the documentation should be updated, as this is most likely a documentation flaw, not an implementation flaw. But no one except the JDK developers can say for sure; see the concern below.
The last point I'd like to mention: a custom collection can be wrapped with Collections.synchronizedXXX(), and that collection's forEach() implementation can be anything. It might process elements asynchronously, or spawn a thread for each element; it is bounded only by the author's imagination, and the synchronized (mutex) wrapper cannot guarantee thread safety in such cases. That particular issue might be the reason not to declare forEach() thread-safe.

It’s worth having a look at the documentation of Collections.synchronizedCollection rather than Collections.synchronizedSet(), as that documentation has already been cleaned up:
It is imperative that the user manually synchronize on the returned collection when traversing it via Iterator, Spliterator or Stream: …
I think this makes it pretty clear that there is a distinction between iterating via an object other than the synchronized Collection itself and using its forEach method. But even from the old wording you can conclude that there is such a distinction:
It is imperative that the user manually synchronize on the returned set when iterating over it:…
(emphasis by me)
Compare to the documentation for Iterable.forEach:
Performs the given action for each element of the Iterable until all elements have been processed or the action throws an exception.
While it is clear to the developer that there must be an (internal) iteration happening to achieve this, this iteration is an implementation detail. From the given specification’s wording it’s just a (meta-)action for performing an action to each element.
When using that method, the user is not iterating over the elements and hence not responsible for the synchronization mentioned in the Collections.synchronized… documentation.
However, that’s a bit subtle, so it’s good that the documentation of synchronizedCollection lists the cases for manual synchronization explicitly. I think the documentation of the other methods should be adapted as well.
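To make the distinction concrete, here is a small sketch of the three traversal styles discussed above. It assumes a set created with the one-argument Collections.synchronizedSet factory; the class and method names are made up for this example. Only the Iterator and Stream variants take the lock explicitly, as the cleaned-up Javadoc asks.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class for illustration only.
class TraversalSketch {

    // forEach is invoked on the wrapper itself; no client-side lock is taken here.
    static int sumViaForEach(Set<Integer> set) {
        AtomicInteger sum = new AtomicInteger();
        set.forEach(sum::addAndGet);
        return sum.get();
    }

    // Traversal via Iterator: the caller synchronizes on the returned set.
    static int sumViaIterator(Set<Integer> set) {
        int sum = 0;
        synchronized (set) {
            Iterator<Integer> it = set.iterator(); // must be in the synchronized block
            while (it.hasNext()) {
                sum += it.next();
            }
        }
        return sum;
    }

    // Traversal via Stream: the whole pipeline runs inside the synchronized block.
    static int sumViaStream(Set<Integer> set) {
        synchronized (set) {
            return set.stream().mapToInt(Integer::intValue).sum();
        }
    }

    public static void main(String[] args) {
        Set<Integer> set = Collections.synchronizedSet(new HashSet<>(Arrays.asList(1, 2, 3)));
        System.out.println(sumViaForEach(set));
        System.out.println(sumViaIterator(set));
        System.out.println(sumViaStream(set));
    }
}
```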

As @Holger said, the doc clearly says the user must manually synchronize on collections returned by Collections.synchronizedXyz() when traversing them via Iterator:
It is imperative that the user manually synchronize on the returned collection when traversing it via Iterator, Spliterator or Stream:
Collection c = Collections.synchronizedCollection(myCollection);
...
synchronized (c) {
    Iterator i = c.iterator(); // Must be in the synchronized block
    while (i.hasNext())
        foo(i.next());
}
I want to explain a bit more about the code.
Consider the Collections.synchronizedList() method. It returns an instance of the Collections.SynchronizedList class, which extends SynchronizedCollection, which defines iterator() as follows:
public Iterator<E> iterator() {
    return c.iterator(); // Must be manually synched by user!
}
Compare this with other methods of SynchronizedCollection, for example:
public String toString() {
    synchronized (mutex) {return c.toString();}
}
Thus SynchronizedList inherits iterator() from SynchronizedCollection, and iteration through it must be synchronized manually by the user.


Does client-side locking violate encapsulation of synchronization policy?

As mentioned by Java_author,
Client-side locking entails guarding client code that uses some object X with the lock, X uses to guard its own state.
In the code below, that object X is list. The point above implies that using the lock owned by the ListHelper object to synchronize putIfAbsent() would be the wrong lock.
package compositeobjects;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class ListHelper<E> {
    private List<E> list =
        Collections.synchronizedList(new ArrayList<E>());
    public boolean putIfAbsent(E x) {
        synchronized(list){
            boolean absent = !list.contains(x);
            if(absent) {
                list.add(x);
            }
            return absent;
        }
    }
}
But, Java author says,
Client-side locking has a lot in common with class extension—they both couple the behavior of the derived class to the implementation of the base class. Just as extension violates encapsulation of implementation [EJ Item 14], client-side locking violates encapsulation of synchronization policy.
My understanding is that the nested class instance that Collections.synchronizedList() returns also uses the lock owned by the list object.
Why does the use of client-side locking (with list) in ListHelper violate encapsulation of synchronization policy?
You are relying upon the fact that the synchronizedList uses itself as the monitor, which happens to be true at present.
You're even relying upon the fact that synchronizedList uses synchronized to achieve synchronization, which also happens to be true at present (it's a reasonable assumption, but it's not one that is necessary).
There are ways in which the implementation of synchronizedList could be changed such that your code wouldn't work correctly.
For instance, the constructor of SynchronizedList:
SynchronizedList(List<E> list) {
    super(list);
    // ...
}
could be changed to
SynchronizedList(List<E> list) {
    super(list, new Object());
    // ...
}
Now, the mutex field used by the methods in the SynchronizedList implementation is no longer this (effectively), so synchronizing externally on list would no longer work.
With that said, the Javadoc documents that synchronized (list) has the intended effect, so this behavior won't change and what you are doing now is absolutely fine. The class was designed with a leaky abstraction, which you shouldn't imitate if you were designing something similar from scratch, but that leaky abstraction's properties are documented.
Your code basically creates a synchronized set. It only adds an element if it is not already in the list, which is the very definition of a set.
Regardless of how the synchronized list does its own locking, your code must provide its own locking, because it makes two calls to the synchronized list, and the list releases its lock between them. So if two threads try to add the same object, both could pass the contains check and both add it to the list. The compound synchronization makes sure that this cannot happen. It is paramount that all code using the list goes through your utility class; otherwise it can still fail.
As I wrote in a comment, exactly the same behaviour can be achieved with a synchronized set, whose add also checks that the element has not yet been added while holding the lock for the entire operation. With a synchronized set, access and modification without your utility class are OK.
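A minimal sketch of the synchronized-set alternative, assuming insertion order matters (hence LinkedHashSet); the class name SetBackedHelper is hypothetical. Set.add already returns true only when the element was absent, and the wrapper performs that check-and-add under a single lock.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical replacement for ListHelper, backed by a synchronized set.
class SetBackedHelper<E> {
    private final Set<E> items = Collections.synchronizedSet(new LinkedHashSet<>());

    // add() is one synchronized call, so no client-side locking is needed.
    boolean putIfAbsent(E x) {
        return items.add(x);
    }

    int size() {
        return items.size();
    }

    public static void main(String[] args) {
        SetBackedHelper<String> helper = new SetBackedHelper<>();
        System.out.println(helper.putIfAbsent("a")); // first insertion
        System.out.println(helper.putIfAbsent("a")); // duplicate
    }
}
```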
Edit:
If your code needs a list and not a set, and a LinkedHashSet isn't an option, I would create a new synchronized list myself:
public class SynchronizedList<E> implements List<E> {
    private List<E> wrapped = new ArrayList<E>();
    ....
    @Override
    public int size() {
        synchronized(this) {
            return wrapped.size();
        }
    }
    ....
    @Override
    public boolean add(E element) {
        synchronized(this) {
            // the check and the add happen under the same lock
            boolean absent = !wrapped.contains(element);
            if(absent) {
                wrapped.add(element);
            }
            return absent;
        }
    }
}

What happens when we pass a Hashtable to Collections.synchronizedMap()?

Today I was asked a question in an interview.
The interviewer noted that Collections.synchronizedMap() is used to synchronize maps that are not thread safe by default, such as HashMap.
His question was: we can pass any kind of map to this method,
so what is the effect of passing a Hashtable to it, given that Hashtable is already synchronized?
The behavior of the map will be the same, but the performance will be affected, because each method will acquire two synchronization locks instead of one.
For example, consider calling the method size() on the resulting map. The implementation in the Collections.SynchronizedMap class looks like this:
public int size() {
    synchronized(mutex) {return m.size();} // first lock
}
... where m.size() calls the implementation in Hashtable:
public synchronized int size() { // second lock
    return count;
}
The first lock object is the mutex field in SynchronizedMap. The second lock is implicit - the Hashtable instance itself.
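The two locks can be made visible with a small probe. This is only a sketch, assuming the OpenJDK behaviour described above, where the one-argument synchronizedMap factory uses the wrapper itself as the mutex. ProbeTable and DoubleLockDemo are invented names, and the overridden size() exists purely to record which monitors are held.

```java
import java.util.Collections;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical probe: a Hashtable that records which monitors are held inside size().
class ProbeTable<K, V> extends Hashtable<K, V> {
    Object wrapper;            // assigned by the caller to the synchronizedMap wrapper
    boolean wrapperLockHeld;
    boolean tableLockHeld;

    @Override
    public synchronized int size() {
        wrapperLockHeld = wrapper != null && Thread.holdsLock(wrapper);
        tableLockHeld = Thread.holdsLock(this);
        return super.size();
    }
}

class DoubleLockDemo {
    // Wraps a ProbeTable, calls size() through the wrapper, and returns the probe.
    static ProbeTable<String, Integer> probe() {
        ProbeTable<String, Integer> table = new ProbeTable<>();
        Map<String, Integer> wrapped = Collections.synchronizedMap(table);
        table.wrapper = wrapped;
        wrapped.size();
        return table;
    }

    public static void main(String[] args) {
        ProbeTable<String, Integer> table = probe();
        System.out.println("wrapper lock held: " + table.wrapperLockHeld);
        System.out.println("table lock held: " + table.tableLockHeld);
    }
}
```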
You would have two synchronization levels: one at the level of the synchronized map itself, implemented by a mutex object, and one at the level of the wrapped instance:
public boolean isEmpty() {
    // first level synchronization
    synchronized(mutex) {
        // second level synchronization if m is a Hashtable
        return m.isEmpty();
    }
}
The additional synchronization is not needed and can lead to lower performance.
Another effect is that you won't be able to use API from Hashtable like Hashtable#elements since the wrapped collection is now strictly a Map instance.
It will get wrapped into a SynchronizedMap, from java.util.Collections:
public static <K,V> Map<K,V> synchronizedMap(Map<K,V> m) {
    return new SynchronizedMap<>(m);
}
The synchronizedMap() method does not distinguish between the types of Maps passed into it.
"His question is but we can pass any kind of map inside this method."
The answer is yes, because the constructor of SynchronizedMap accepts any Map in its signature.
"So what is the effect when we pass a hashtable inside this method because hashtable is by default synchronized"
The answer is: we are ignoring ConcurrentHashMap, which is most likely the tool to be used instead of a blocking implementation.
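As a rough illustration of that suggestion, the sketch below counts words with ConcurrentHashMap.merge, which performs the read-modify-write atomically per key without a map-wide lock. The class and method names are hypothetical, and whether ConcurrentHashMap fits depends on the actual use case.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical example class; only ConcurrentHashMap itself comes from the JDK.
class ConcurrentMapSketch {

    // Counts occurrences; merge() updates each key atomically.
    static ConcurrentMap<String, Integer> count(String... words) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("a", "b", "a"));
    }
}
```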
If you look at the code of SynchronizedMap, its methods delegate each call to the underlying map, but wrap the call in a synchronized block, something like this:
public int size() {
    synchronized (mutex) {return m.size();}
}
The implementation of size() in the Hashtable class looks like this:
public synchronized int size() {
    return count;
}
So, if you pass a Hashtable to SynchronizedMap, a thread accessing the SynchronizedMap has to take locks at two levels: once for the synchronized block and once for the synchronized method.
If other threads use the Hashtable object directly, they can block threads using the SynchronizedMap even after such a thread has acquired the lock on the SynchronizedMap.

Non-thread-safe Attempt to Implement Put-if-absent?

There is a code snippet in the 4th chapter of Java Concurrency in Practice:
public class ListHelper<E> {
    public List<E> list =
        Collections.synchronizedList(new ArrayList<E>());
    ...
    public synchronized boolean putIfAbsent(E x) {
        boolean absent = !list.contains(x);
        if (absent)
            list.add(x);
        return absent;
    }
}
It says this is not thread safe because it synchronizes on the wrong lock: putIfAbsent is not atomic relative to other operations on the List.
But I think "synchronized" prevents multiple threads from entering putIfAbsent, and if there are other methods that perform other operations on the List, they should be declared synchronized as well. Done this way, should it be thread safe? In what case is it "not atomic"?
putIfAbsent is not atomic relative to other operations on the List. But I think "synchronized" preventing multithreads enter putIfAbsent
This is true, but there is no guarantee that threads are not accessing the list in other ways. The list field is public (which is always a bad idea), which means that other threads can call methods on the list directly. To properly protect the list, you should make it private and add add(...) and other methods to your ListHelper that are also synchronized, so that ListHelper fully controls all access to the list.
// we are synchronizing the list so no reason to use Collections.synchronizedList
private List<E> list = new ArrayList<E>();
...
public synchronized boolean add(E e) {
    return list.add(e);
}
If the list is private and all of the methods are synchronized that access the list then you can remove the Collections.synchronizedList(...) since you are synchronizing it yourself.
if there are other methods that do other operations on the List, key word synchronized should also be as the method atttribute. So following this way, should it be thread safe?
Not sure I fully parse this part of the question. But if you make the list be private and you add other methods to access the list that are all synchronized then you are correct.
Under what case "it is not atomic"?
putIfAbsent(...) is not atomic because there are multiple calls to the synchronized-list. If multiple threads are operating on the list then another thread could have called list.add(...) between the time putIfAbsent(...) called list.contains(x) and then calls list.add(x). The Collections.synchronizedList(...) protects the list against corruption by multiple threads but it cannot protect against race-conditions when there are multiple calls to list methods that could interleave with calls from other threads.
Any unsynchronized method that modifies the list may introduce the absent element after list.contains() returns false, but before the element has been added.
Picture this as two threads:
boolean absent = !list.contains(x); // Returns true
-> list.add(theSameElementAsX); // Another thread
if(absent) // absent is true, but the list has been modified!
list.add(x);
return absent;
Such an interleaving could be caused by a method as simple as the following:
public void add(E e) {
    list.add(e);
}
If the method were synchronized, there would be no problem, since the add method wouldn't be able to run before putIfAbsent() was fully finished.
A proper correction would include making the List private, and making sure that compound operations on it are properly synchronized (i.e. on the class or the list itself).
Thread safety is not composable! Imagine a program built entirely out of "thread safe" classes. Is the program itself "thread safe?" Not necessarily. It depends on what the program does with those classes.
The synchronizedList wrapper makes each individual method of a List "thread safe". What does that mean? It means that none of those wrapped methods can corrupt the internal structure of the list when called in a multi-threaded environment.
That doesn't protect the way in which any given program uses the list. In the example code, the list appears to be used as an implementation of a set: The program doesn't allow the same object to appear in the list more than one time. There's nothing in the synchronizedList wrapper that will enforce that particular guarantee though, because that guarantee has nothing to do with the internal structure of the list. The list can be perfectly valid as a list, but not valid as a set.
That's why the putIfAbsent() method needs additional synchronization.
Collections.synchronizedList() creates a collection that synchronizes every one of its methods on a private mutex. This mutex is the list itself (this) for the one-argument factory used in the example, or is supplied explicitly when the (non-public) two-argument factory is used. That's why we need an external lock to make the consecutive contains() and add() calls atomic.
If the list is accessible directly, not only via ListHelper, this code is broken, because access is then guarded by different locks. To prevent that, make the list private so that it cannot be accessed directly, and wrap all necessary API with synchronization on the same mutex, either one declared in ListHelper or the ListHelper instance (this) itself.
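The corrected shape described in these answers might look like the sketch below: the list is private, and the compound check-then-add holds the list's own lock. GuardedListHelper and sizeAfterRace are invented names; the small race harness is only there to exercise the method from several threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical corrected helper: private list, compound action guarded by the list's lock.
class GuardedListHelper<E> {
    private final List<E> list = Collections.synchronizedList(new ArrayList<>());

    boolean putIfAbsent(E x) {
        synchronized (list) { // same lock the synchronized wrapper uses for contains() and add()
            boolean absent = !list.contains(x);
            if (absent) {
                list.add(x);
            }
            return absent;
        }
    }

    int size() {
        return list.size();
    }

    // Starts several threads that all try to add the same element, then reports the list size.
    static int sizeAfterRace(int threadCount) {
        GuardedListHelper<String> helper = new GuardedListHelper<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> helper.putIfAbsent("x"));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return helper.size();
    }

    public static void main(String[] args) {
        System.out.println("size after race: " + sizeAfterRace(8));
    }
}
```

Because every path to the list goes through the same monitor, the element should end up in the list exactly once regardless of how the threads interleave.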

Does a lock on class, locks class variables too? - java

I have the below class
public class Example{
    public static List<String> list = new ArrayList<String>();
    public static void addElement(String val){
        synchronized(list){
            list.add(val);
        }
    }
    public static synchronized void printElement(){
        Iterator<String> it = list.iterator();
        while(it.hasNext()){
            System.out.println(it.next()); // print element
        }
    }
}
Will the iterator() call in the printElement method throw ConcurrentModificationException? The basic question is: if the lock on the class object is acquired (as done in the printElement method), will it lock the class members/variables too? Please help me with the answer.
Does a lock on class, locks class variables too? - java
Your lock in printElement is on the Class object (Example.class), because the method is static synchronized. And no, it locks only that object, not the class's variables.
Will the iterator() call in the printElement method throw ConcurrentModificationException?
It will if the code in that method modifies the list during the iteration. But if all of the code in that class also synchronizes on the same lock, and you haven't given a reference to that list to anything outside your class, then you know that only the code in that method is running.
You'd probably be better off, though, synchronizing on the list itself. That way, even if you've given out a reference to the list, assuming all code that uses it synchronizes on it, you'll be safe from concurrent mods:
public static void printElement(){
//            ^--- No `synchronized ` here unless you REALLY need it for other reasons
    synchronized (list) {
        Iterator<String> it = list.iterator();
        while(it.hasNext()){
            System.out.println(it.next()); // print element
        }
    }
}
If you are giving out references and want to be really sure, either use a list returned by Collections.synchronizedList or something from the java.util.concurrent package.
No, a synchronized method does not lock the object's variables. A synchronized instance method locks only this, and a static synchronized method locks only the Class object.
Your code is not thread safe, since addElement and printElement lock on different objects. Nothing prevents an insertion from occurring while the list is being iterated if both methods are called concurrently.
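Which monitor each construct actually holds can be checked with Thread.holdsLock. The sketch below mirrors the shape of the Example class; LockScopeProbe and its method names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe mirroring the Example class from the question.
class LockScopeProbe {
    static final List<String> list = new ArrayList<>();

    // A static synchronized method holds the monitor of the Class object...
    static synchronized boolean holdsClassLock() {
        return Thread.holdsLock(LockScopeProbe.class);
    }

    // ...but not the monitor of a static field's object.
    static synchronized boolean holdsListLock() {
        return Thread.holdsLock(list);
    }

    // Only an explicit synchronized (list) block holds the list's monitor.
    static boolean holdsListLockInBlock() {
        synchronized (list) {
            return Thread.holdsLock(list);
        }
    }

    public static void main(String[] args) {
        System.out.println("class lock in static synchronized: " + holdsClassLock());
        System.out.println("list lock in static synchronized: " + holdsListLock());
        System.out.println("list lock in synchronized (list): " + holdsListLockInBlock());
    }
}
```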
Will the iterator() call in the printElement method throw ConcurrentModificationException?
Yes, if addElement and printElement are called by two threads simultaneously. To avoid ConcurrentModificationException, you could use CopyOnWriteArrayList.
if the lock on class object is acquired(as done in printElement method), will it lock the class members/ variables too?
The synchronized method printElement acquires the lock of the Class object, since it is static. Hence it won't allow another static synchronized method or synchronized (Example.class) block in your class to run at the same time, if there is any.
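The difference between a fail-fast iterator and the snapshot iterator of CopyOnWriteArrayList can be shown even in a single thread. This is a small sketch with an invented class name; it modifies the list from inside the loop, which stands in for a modification made by another thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class for illustration only.
class FailFastSketch {

    // Adds to the list while iterating over it and reports whether the iterator failed fast.
    static boolean throwsWhenModifiedDuringIteration(List<Integer> list) {
        try {
            for (Integer value : list) {
                list.add(value + 10);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> plain = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> cow = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println("ArrayList failed fast: " + throwsWhenModifiedDuringIteration(plain));
        System.out.println("CopyOnWriteArrayList failed fast: " + throwsWhenModifiedDuringIteration(cow));
    }
}
```

Note that the CopyOnWriteArrayList iterator works on a snapshot, so it does not see the elements added during the loop; that trade-off is part of its contract.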
You never call the addElement method, so locking has no effect on this code snippet. If you insert or delete an element while iterating over the same collection, you get ConcurrentModificationException.
From Javadoc:
For example, it is not generally permissible for one thread to modify a Collection while another thread is iterating over it. In general, the results of the iteration are undefined under these circumstances. Some Iterator implementations (including those of all the general purpose collection implementations provided by the JRE) may choose to throw this exception if this behavior is detected. Iterators that do this are known as fail-fast iterators, as they fail quickly and cleanly, rather that risking arbitrary, non-deterministic behavior at an undetermined time in the future.
ArrayList's iterator throws ConcurrentModificationException when the list is structurally modified during iteration, whether by another thread or by the iterating code itself.
You would do better to lock on the list object itself. If the list is exposed through a getter, code outside the class can also modify its structure.
synchronized (list) {
    Iterator<String> it = list.iterator();
    while(it.hasNext()){
        System.out.println(it.next()); // print element
    }
}

Determining synchronization scope?

In trying to improve my understanding of concurrency issues, I am looking at the following scenario (Edit: I've changed the example from List to Runtime, which is closer to what I am trying):
public class Example {
    private final Object lock = new Object();
    private final Runtime runtime = Runtime.getRuntime();
    public void add(Object o) {
        synchronized (lock) { runtime.exec(program + " -add "+o); }
    }
    public Object[] getAll() {
        synchronized (lock) { return runtime.exec(program + " -list "); }
    }
    public void remove(Object o) {
        synchronized (lock) { runtime.exec(program + " -remove "+o); }
    }
}
As it stands, each method is thread safe when used standalone. Now, what I'm trying to figure out is how to handle the case where the calling class wishes to call:
for (Object o : example.getAll()) {
    // problems if multiple threads perform this operation concurrently
    example.remove(o);
}
But as noted, there is no guarantee that the state will be consistent between the call to getAll() and the calls to remove(). If multiple threads call this, I'll be in trouble. So my question is - How should I enable the developer to perform the operation in a thread safe manner? Ideally I wish to enforce the thread safety in a way that makes it difficult for the developer to avoid/miss, but at the same time not complicated to achieve. I can think of three options so far:
A: Make the lock 'this', so the synchronization object is accessible to calling code, which can then wrap the code blocks. Drawback: Hard to enforce at compile time:
synchronized (example) {
    for (Object o : example.getAll()) {
        example.remove(o);
    }
}
B: Place the combined code into the Example class - and benefit from being able to optimize the implementation, as in this case. Drawback: Pain to add extensions, and potential mixing unrelated logic:
public class Example {
    ...
    public void removeAll() {
        synchronized (lock) { runtime.exec(program + " -clear"); }
    }
}
C: Provide a Closure class. Drawback: Excess code, potentially too generous of a synchronization block, could in fact make deadlocks easier:
public interface ExampleClosure {
    public void execute(Example example);
}
public class Example {
    ...
    public void execute(ExampleClosure closure) {
        synchronized (this) { closure.execute(this); }
    }
}
example.execute(new ExampleClosure() {
    public void execute(Example example) {
        for (Object o : example.getAll()) {
            example.remove(o);
        }
    }
});
Is there something I'm missing? How should synchronization be scoped to ensure the code is thread safe?
Use a ReentrantReadWriteLock which is exposed via the API. That way, if someone needs to synchronize several API calls, they can acquire a lock outside of the method calls.
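A sketch of what exposing such a lock could look like, using an in-memory list instead of the external program from the question. LockExposingStore and removeEverything are hypothetical names. It relies on ReentrantReadWriteLock being reentrant and on a thread that holds the write lock being allowed to acquire the read lock as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical store that exposes its write lock for compound operations.
class LockExposingStore {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private final List<String> items = new ArrayList<>();

    // Callers may hold this lock across several API calls.
    Lock writeLock() {
        return rw.writeLock();
    }

    void add(String item) {
        rw.writeLock().lock();
        try {
            items.add(item);
        } finally {
            rw.writeLock().unlock();
        }
    }

    void remove(String item) {
        rw.writeLock().lock();
        try {
            items.remove(item);
        } finally {
            rw.writeLock().unlock();
        }
    }

    // Returns a copy, so callers never iterate over the guarded list itself.
    List<String> getAll() {
        rw.readLock().lock();
        try {
            return new ArrayList<>(items);
        } finally {
            rw.readLock().unlock();
        }
    }

    // Client-side compound operation: getAll() plus remove() under one write lock.
    static void removeEverything(LockExposingStore store) {
        Lock lock = store.writeLock();
        lock.lock();
        try {
            for (String item : store.getAll()) {
                store.remove(item);
            }
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockExposingStore store = new LockExposingStore();
        store.add("a");
        store.add("b");
        removeEverything(store);
        System.out.println("remaining: " + store.getAll());
    }
}
```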
In general, this is a classic multithreaded design issue. By synchronizing the data structure rather than synchronizing concepts that use the data structure, it's hard to avoid the fact that you essentially have a reference to the data structure without a lock.
I would recommend that locks not be done so close to the data structure. But it's a popular option.
A potential technique to make this style work is to use an editing tree-walker. Essentially, you expose a function that does a callback on each element.
// pointer to function:
// - takes Object by reference and can be safely altered
// - if returns true, Object will be removed from list
typedef bool (*callback_function)(Object *o);
public void editAll(callback_function func) {
    synchronized (lock) {
        for each element o { if (func(o)) {remove o} }
    }
}
So then your loop becomes:
bool my_function(Object *o) {
    ...
    if (some condition) return true;
    return false;
}
...
editAll(my_function);
...
The company I work for (corensic) has test cases extracted from real bugs to verify that Jinx is finding the concurrency errors properly. This type of low-level data structure locking without higher-level synchronization is a pretty common pattern. The tree editing callback seems to be a popular fix for this race condition.
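In Java, the editing-callback idea from this answer could be sketched with a Predicate in place of the function pointer. EditableStore is an invented class that keeps its elements in memory; the callback runs while the lock is held, so the traversal and the removals form one atomic step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical store with an editing callback; not the asker's actual class.
class EditableStore<T> {
    private final Object lock = new Object();
    private final List<T> items = new ArrayList<>();

    void add(T item) {
        synchronized (lock) {
            items.add(item);
        }
    }

    // Visits every element under the lock and removes those the callback selects.
    void editAll(Predicate<? super T> shouldRemove) {
        synchronized (lock) {
            items.removeIf(shouldRemove);
        }
    }

    // Returns a copy so callers cannot touch the guarded list.
    List<T> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(items);
        }
    }

    public static void main(String[] args) {
        EditableStore<Integer> store = new EditableStore<>();
        for (int n : Arrays.asList(1, 2, 3, 4)) {
            store.add(n);
        }
        store.editAll(n -> n % 2 == 0); // remove even numbers in one locked pass
        System.out.println(store.snapshot());
    }
}
```

As noted for option C in the question, the callback executes inside the lock, so it should stay short and avoid acquiring other locks.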
I think everyone is missing his real problem. Iterating over the new array of Objects and removing them one at a time is still technically unsafe (an ArrayList implementation would not explode, it just wouldn't give the expected results).
Even with CopyOnWriteArrayList, the snapshot you read may be out of date by the time you try to remove.
The two suggestions you offered (A and B) are fine. My general suggestion is B. Making a collection thread-safe is very difficult. A good way to do it is to give the client as little functionality as possible (within reason). So offering the removeAll method and removing the getAll method would suffice.
Now you can at the same time say, 'well, I want to keep the API the way it is and let the client worry about additional thread-safety'. If that's the case, document the thread-safety. Document the fact that a 'lookup and modify' action is neither atomic nor thread-safe.
Today's concurrent list implementations are all thread safe for the single functions that they offer (get, remove and add are all thread safe). Compound functions are not, though, and the best that can be done is to document how to make them thread safe.
I think j.u.c.CopyOnWriteArrayList is a good example of a problem similar to the one you're trying to solve.
The JDK had a similar problem with Lists: there were various ways to synchronize individual methods, but no synchronization across multiple invocations (and that's understandable).
So CopyOnWriteArrayList actually implements the same interface but has a very special contract, and whoever calls it is aware of it.
Similarly with your solution: you should probably implement List (or whatever interface this is) and at the same time define special contracts for existing/new methods. For example, getAll's consistency is not guaranteed, and calls to .remove do not fail if o is null, or isn't inside the list, etc. If users want both combined and safe/consistent options, this class of yours would provide a special method that does exactly that (e.g. safeDeleteAll), leaving other methods as close to the original contract as possible.
So to answer your question: I would pick option B, but would also implement the interface your original object is implementing.
From the Javadoc for List.toArray():
The returned array will be "safe" in that no references to it are maintained by this list. (In other words, this method must allocate a new array even if this list is backed by an array). The caller is thus free to modify the returned array.
Maybe I don't understand what you're trying to accomplish. Do you want the Object[] array to always be in-sync with the current state of the List? In order to achieve that, I think you would have to synchronize on the Example instance itself and hold the lock until your thread is done with its method call AND any Object[] array it is currently using. Otherwise, how will you ever know if the original List has been modified by another thread?
You have to use the appropriate granularity when you choose what to lock. What you're complaining about in your example is too low a level of granularity, where the lock doesn't cover all the methods that have to happen together. You need to make methods that combine all the actions that need to happen together within the same lock.
Locks are reentrant so the high-level method can call low-level synchronized methods without a problem.
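A minimal sketch of that reentrancy, with an invented in-memory class: the high-level removeAll() holds the object's monitor and calls the low-level synchronized methods, which simply re-enter the same monitor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical store; every method synchronizes on the same monitor (this).
class ReentrantStore {
    private final List<String> items = new ArrayList<>();

    synchronized void add(String item) {
        items.add(item);
    }

    synchronized void remove(String item) {
        items.remove(item);
    }

    synchronized List<String> getAll() {
        return new ArrayList<>(items);
    }

    // High-level operation: the nested calls re-acquire the monitor this thread already holds.
    synchronized void removeAll() {
        for (String item : getAll()) {
            remove(item);
        }
    }

    public static void main(String[] args) {
        ReentrantStore store = new ReentrantStore();
        store.add("a");
        store.add("b");
        store.removeAll();
        System.out.println("remaining: " + store.getAll());
    }
}
```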
