Does client-side locking violate encapsulation of synchronization policy? - java

As mentioned by Java_author,
Client-side locking entails guarding client code that uses some object X with the lock X uses to guard its own state.
In the code below, that object X is list. The point above implies that using the lock owned by the ListHelper object to synchronize putIfAbsent() would be using the wrong lock.
package compositeobjects;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class ListHelper<E> {
private List<E> list =
Collections.synchronizedList(new ArrayList<E>());
public boolean putIfAbsent(E x) {
synchronized(list){
boolean absent = !list.contains(x);
if(absent) {
list.add(x);
}
return absent;
}
}
}
But the Java author also says,
Client-side locking has a lot in common with class extension—they both couple the behavior of the derived class to the implementation of the base class. Just as extension violates encapsulation of implementation [EJ Item 14], client-side locking violates encapsulation of synchronization policy.
My understanding is that the nested class instance that Collections.synchronizedList() returns also uses the lock owned by the list object.
Why does the use of client-side locking (on list) in ListHelper violate encapsulation of synchronization policy?

You are relying upon the fact that the synchronizedList uses itself as the monitor, which happens to be true at present.
You're even relying upon the fact that synchronizedList uses synchronized to achieve synchronization, which also happens to be true at present (it's a reasonable assumption, but it's not one that is necessary).
There are ways in which the implementation of synchronizedList could be changed such that your code wouldn't work correctly.
For instance, the constructor of synchronizedList:
SynchronizedList(List<E> list) {
super(list);
// ...
}
could be changed to
SynchronizedList(List<E> list) {
super(list, new Object());
// ...
}
Now the mutex field used by the methods in the SynchronizedList implementation is effectively no longer this, so synchronizing externally on list would no longer work.
That said, the Javadoc documents that synchronized (list) has the intended effect, so this behavior won't change and what you are doing now is absolutely fine. It was designed as a leaky abstraction, and you shouldn't design something similar this way from scratch, but the properties of that leaky abstraction are documented.
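To make the hypothetical change above concrete, here is a minimal sketch. PrivateLockList is a made-up wrapper (not a JDK class) that guards its state with a private mutex, roughly what the modified constructor would lead to. Thread.holdsLock shows that synchronizing on the wrapper from client code does not acquire the lock its methods actually use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper, for illustration only: it guards its state with a
// private mutex instead of using itself as the monitor.
final class PrivateLockList<E> {
    private final Object mutex = new Object();
    private final List<E> delegate = new ArrayList<>();

    boolean add(E e) {
        synchronized (mutex) { return delegate.add(e); }
    }

    boolean contains(Object o) {
        synchronized (mutex) { return delegate.contains(o); }
    }

    // Reports whether the calling thread currently holds the internal mutex.
    boolean callerHoldsInternalLock() {
        return Thread.holdsLock(mutex);
    }
}

final class ClientSideLockingDemo {
    // Returns true only if locking on the wrapper also acquires its internal mutex.
    static boolean clientLockGuardsState(PrivateLockList<String> list) {
        synchronized (list) {
            return list.callerHoldsInternalLock();
        }
    }

    public static void main(String[] args) {
        // Prints "false": the client holds the wrapper's monitor, not the private mutex.
        System.out.println(clientLockGuardsState(new PrivateLockList<>()));
    }
}
```

With such a wrapper, a putIfAbsent() written as synchronized (list) { ... } would compile and run, but contains() and add() from other threads could still interleave with it.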

Your code basically creates a synchronized set. It only adds an element if it is not already in the list, which is the very definition of a set.
Regardless of how the synchronized list does its own locking, your code must provide its own locking, because it makes two calls to the synchronized list and the list releases its lock between them. So if two threads tried to add the same object, both could pass the contains check and both could add it to the list. The compound synchronized block makes sure that this cannot happen. It is paramount that all code that uses the list goes through your utility class; otherwise it will still fail.
As I wrote in a comment, exactly the same behaviour can be achieved with a synchronized set, which also makes sure that the element has not yet been added while locking the entire operation. With a synchronized set, access and modification without going through your utility class are fine.
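As a rough illustration of the synchronized-set alternative, the sketch below wraps a LinkedHashSet (chosen here as an assumption, to keep insertion order) with Collections.synchronizedSet. The class and method names are made up for the example. Set.add already reports whether the element was absent, so a single call plays the role of putIfAbsent.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

final class SynchronizedSetDemo {
    // LinkedHashSet keeps insertion order, so iteration order resembles a list.
    static Set<String> newOrderedSet() {
        return Collections.synchronizedSet(new LinkedHashSet<String>());
    }

    public static void main(String[] args) {
        Set<String> set = newOrderedSet();
        boolean first = set.add("x");   // true: "x" was absent and has been added
        boolean second = set.add("x");  // false: "x" was already present
        System.out.println(first + " " + second + " " + set.size()); // true false 1
    }
}
```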
Edit:
If your code needs a list and not a set, and a LinkedHashSet isn't an option, I would create a new synchronized list myself:
public class SynchronizedList<E> implements List<E> {
private List<E> wrapped = new ArrayList<E>();
....
@Override
public int size() {
synchronized(this) {
return wrapped.size();
}
}
....
@Override
public boolean add(E element) {
synchronized(this) {
// check and insert under one lock, so no duplicate can slip in between
boolean absent = !wrapped.contains(element);
if(absent) {
wrapped.add(element);
}
return absent;
}
}
....
}

Related

Will all java threads see shared resource updates after modification in synchronized method?

Is it necessary to use specialized concurrent versions of Java's Collections data structures (e.g. CopyOnWriteArrayList vs ArrayList) if all access to a pair of data structures is always wrapped with the acquisition and release of a lock (in particular, using a static synchronized method for any modifications to the data structure)? For example:
public static synchronized Item doIt() {
// remove something from data structure 1
// add the removed item to data structure 2
// return removed item
}
I know the synchronized method will enforce only one thread at a time performing the updates, but by the time a thread has exited the method, are the other threads guaranteed to see the updated data structures, or do I still need specialized concurrent data structures for that guarantee?
Edit:
Here's a better example of what I'm trying to do:
private static final List<Item> A;
private static final HashMap<Integer,Item> B;
public static Item doSomething() {
// some stuff ...
Item item = doIt();
// some other stuff ...
return item;
}
private static synchronized Item doIt() {
Item theItem = A.remove( A.size()-1 );
B.put( theItem.getId(), theItem );
return theItem;
}
Yes, if the access is always wrapped in synchronized methods/blocks.
This is because synchronized establishes a happens-before relationship between synchronized methods/blocks (on the same object). Quoting from Synchronized Methods in the Java Tutorial:
Second, when a synchronized method exits, it automatically establishes
a happens-before relationship with any subsequent invocation of a
synchronized method for the same object. This guarantees that changes
to the state of the object are visible to all threads.
However, it is important that you really wrap all access in synchronized blocks. If you, for example, return a reference to the list from a synchronized method like this
public synchronized List<Object> GetList() {
return this.myList;
}
and use the list outside the synchronized method, you will not get that guarantee!
Synchronization is about quite a bit more than just mutual exclusion. Namely, it is about the visibility of all actions which go on within the block (and precede it as well) to any other thread which subsequently acquires the same lock.
Therefore you don't need concurrent data structures when you use locking for all access to the structures.
Finally, just to make sure: you must use locking for all access, including all reads.
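A minimal sketch of that advice, using plain (non-concurrent) collections and instance-level locking rather than the static method from the question. The Transfer class and all of its method names are hypothetical. Every read and write, including the size query, takes the same lock, and no reference to the underlying structures escapes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: every read and write of both structures takes the same lock.
final class Transfer {
    private final List<String> pending = new ArrayList<>();
    private final Map<Integer, String> done = new HashMap<>();
    private int nextId = 0;

    synchronized void submit(String item) {
        pending.add(item);
    }

    // Moves the last pending item into the map; returns null if nothing is pending.
    synchronized String moveLast() {
        if (pending.isEmpty()) {
            return null;
        }
        String item = pending.remove(pending.size() - 1);
        done.put(nextId++, item);
        return item;
    }

    // Reads are synchronized as well, otherwise visibility is not guaranteed.
    synchronized int pendingCount() {
        return pending.size();
    }

    synchronized boolean isDone(String item) {
        return done.containsValue(item);
    }
}
```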

Is iteration via Collections.synchronizedSet(...).forEach() guaranteed to be thread safe?

As we know, iterating over a synchronized collection is not thread safe by default, so one cannot use:
Set<E> set = Collections.synchronizedSet(new HashSet<>());
//fill with data
for (E e : set) {
process(e);
}
This is because data may be added during iteration, as there is no exclusive lock on the set.
This is described in the javadoc of Collections.synchronizedSet:
public static Set synchronizedSet(Set s)
Returns a synchronized (thread-safe) set backed by the specified set. In order to guarantee serial access, it is critical that all access to the backing set is accomplished through the returned set.
It is imperative that the user manually synchronize on the returned set when iterating over it:
Set s = Collections.synchronizedSet(new HashSet());
...
synchronized (s) {
Iterator i = s.iterator(); // Must be in the synchronized block
while (i.hasNext())
foo(i.next());
}
Failure to follow this advice may result in non-deterministic behavior.
However, this does not apply to Set.forEach, which inherits the default method forEach from Iterable.forEach.
Now I looked into the source code, and here we can see that we have the following structure:
We ask for a Collections.synchronizedSet().
We get one:
public static <T> Set<T> synchronizedSet(Set<T> s) {
return new SynchronizedSet<>(s);
}
...
static class SynchronizedSet<E>
extends SynchronizedCollection<E>
implements Set<E> {
private static final long serialVersionUID = 487447009682186044L;
SynchronizedSet(Set<E> s) {
super(s);
}
SynchronizedSet(Set<E> s, Object mutex) {
super(s, mutex);
}
public boolean equals(Object o) {
if (this == o)
return true;
synchronized (mutex) {return c.equals(o);}
}
public int hashCode() {
synchronized (mutex) {return c.hashCode();}
}
}
It extends SynchronizedCollection, which has the following interesting methods next to the obvious ones:
// Override default methods in Collection
@Override
public void forEach(Consumer<? super E> consumer) {
synchronized (mutex) {c.forEach(consumer);}
}
@Override
public boolean removeIf(Predicate<? super E> filter) {
synchronized (mutex) {return c.removeIf(filter);}
}
@Override
public Spliterator<E> spliterator() {
return c.spliterator(); // Must be manually synched by user!
}
@Override
public Stream<E> stream() {
return c.stream(); // Must be manually synched by user!
}
@Override
public Stream<E> parallelStream() {
return c.parallelStream(); // Must be manually synched by user!
}
The mutex used here is the same object that all operations of Collections.synchronizedSet lock on.
Now, judging by the implementation, we can say that it is thread safe to use Collections.synchronizedSet(...).forEach(...), but is it also thread safe by specification?
(Confusingly enough, Collections.synchronizedSet(...).stream().forEach(...) is not thread safe by implementation, and the verdict of the specification seems to be unknown as well.)
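The difference in the implementation quoted above can be observed with Thread.holdsLock. This is only a sketch of current behaviour, not evidence about the specification: it assumes the OpenJDK implementation shown above, where the mutex is the returned set itself. The class and method names are made up for the example.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

final class ForEachLockDemo {
    // True if the set's monitor is held while forEach runs the action.
    // The set must be non-empty, otherwise the action never runs.
    static boolean lockHeldDuringForEach(Set<String> set) {
        AtomicBoolean held = new AtomicBoolean(false);
        set.forEach(e -> held.set(Thread.holdsLock(set)));
        return held.get();
    }

    // True if the set's monitor is held inside a plain for-each loop,
    // which goes through iterator() and takes no lock by itself.
    static boolean lockHeldDuringIteration(Set<String> set) {
        boolean held = false;
        for (String e : set) {
            held = Thread.holdsLock(set);
        }
        return held;
    }

    public static void main(String[] args) {
        Set<String> set = Collections.synchronizedSet(new HashSet<String>());
        set.add("a");
        System.out.println(lockHeldDuringForEach(set));
        System.out.println(lockHeldDuringIteration(set));
    }
}
```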
As you wrote, judging by the implementation, forEach() is thread-safe for the collections provided with the JDK (see the disclaimer below), as it requires the monitor of mutex to be acquired before proceeding.
Is it also thread safe by specification?
My opinion: no, and here is an explanation. The Collections.synchronizedXXX() javadoc, put briefly, says "all methods are thread-safe except for those used for iterating over it".
My other, admittedly very subjective, argument is what yshavit wrote: unless you have been told or have read otherwise, consider an API/class/whatever not thread-safe.
Now, let's take a closer look at the javadocs. I think I can say that forEach() is used to iterate over the collection, so, following the advice from the javadoc, we should consider it not thread-safe, although that is the opposite of reality (the implementation).
Anyway, I agree with yshavit's statement that the documentation should be updated, as this is most likely a documentation flaw, not an implementation flaw. But no one can say for sure except the JDK developers; see the concerns below.
The last point I'd like to mention in this discussion: a custom collection can be wrapped with Collections.synchronizedXXX(), and that collection's implementation of forEach() can be anything. The collection might process elements asynchronously within forEach(), or spawn a thread for each element; it is bounded only by the author's imagination, and the synchronized(mutex) wrapper cannot guarantee thread safety in such cases. That particular issue might be the reason not to declare forEach() thread-safe.
It’s worth having a look at the documentation of Collections.synchronizedCollection rather than Collections.synchronizedSet(), as that documentation has already been cleaned up:
It is imperative that the user manually synchronize on the returned collection when traversing it
via Iterator, Spliterator or Stream: …
I think this makes it pretty clear that there is a distinction between iterating via an object other than the synchronized Collection itself and using its forEach method. But even with the old wording you can draw the conclusion that there is such a distinction:
It is imperative that the user manually synchronize on the returned set when iterating over it:…
(emphasis by me)
Compare to the documentation for Iterable.forEach:
Performs the given action for each element of the Iterable until all elements have been processed or the action throws an exception.
While it is clear to the developer that there must be an (internal) iteration happening to achieve this, this iteration is an implementation detail. From the given specification’s wording it’s just a (meta-)action for performing an action to each element.
When using that method, the user is not iterating over the elements and hence not responsible for the synchronization mentioned in the Collections.synchronized… documentation.
However, that’s a bit subtle, and it’s good that the documentation of synchronizedCollection lists the cases for manual synchronization explicitly. I think the documentation of the other methods should be adapted as well.
As @Holger said, the documentation clearly says the user must manually synchronize on collections returned by Collections.synchronizedXyz() when traversing them via Iterator:
It is imperative that the user manually synchronize on the returned collection when traversing it via Iterator, Spliterator or Stream:
Collection c = Collections.synchronizedCollection(myCollection);
...
synchronized (c) {
Iterator i = c.iterator(); // Must be in the synchronized block
while (i.hasNext())
foo(i.next());
}
I want to explain a bit more about the code.
Consider the Collections.synchronizedList() method. It returns an instance of the Collections.SynchronizedList class, which extends SynchronizedCollection, which defines iterator() as follows:
public Iterator<E> iterator() {
return c.iterator(); // Must be manually synched by user!
}
Compare this with other methods of SynchronizedCollection, for example:
public String toString() {
synchronized (mutex) {return c.toString();}
}
Thus SynchronizedList inherits iterator() from SynchronizedCollection, and traversal through it needs to be synchronized manually by the user.
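The two traversal styles that the documentation asks callers to guard could look roughly like this. It is a sketch with made-up class and method names; the argument is assumed to be a list returned by Collections.synchronizedList. Note that for the stream variant the terminal operation has to complete inside the synchronized block, since that is when the elements are actually traversed.

```java
import java.util.Iterator;
import java.util.List;

final class ManualSyncTraversal {
    // Iterator-based traversal: the whole loop runs while holding the list's monitor.
    static int totalLengthWithIterator(List<String> syncList) {
        int total = 0;
        synchronized (syncList) {
            Iterator<String> it = syncList.iterator(); // must be in the synchronized block
            while (it.hasNext()) {
                total += it.next().length();
            }
        }
        return total;
    }

    // Stream-based traversal: the terminal operation (sum) finishes inside the block.
    static int totalLengthWithStream(List<String> syncList) {
        synchronized (syncList) {
            return syncList.stream().mapToInt(String::length).sum();
        }
    }
}
```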

Non-thread-safe Attempt to Implement Put-if-absent?

There is a code snippet in chapter 4 of Java Concurrency in Practice:
public class ListHelper<E> {
public List<E> list =
Collections.synchronizedList(new ArrayList<E>());
...
public synchronized boolean putIfAbsent(E x) {
boolean absent = !list.contains(x);
if (absent)
list.add(x);
return absent;
}
}
The book says this is not thread safe because it uses different locks: putIfAbsent is not atomic relative to other operations on the List.
But I think "synchronized" prevents multiple threads from entering putIfAbsent; if there are other methods that do other operations on the List, the keyword synchronized should also be added as a method attribute. Following this approach, should it be thread safe? In what case is it "not atomic"?
putIfAbsent is not atomic relative to other operations on the List. But I think "synchronized" preventing multithreads enter putIfAbsent
This is true, but there is no guarantee that threads aren't accessing the list in other ways. The list field is public (which is always a bad idea), which means that other threads can call methods on the list directly. To properly protect the list, you should make it private and add add(...) and other methods to your ListHelper that are also synchronized, so that you fully control all access to the list.
// we are synchronizing the list so no reason to use Collections.synchronizedList
private List<E> list = new ArrayList<E>();
...
public synchronized boolean add(E e) {
return list.add(e);
}
If the list is private and all of the methods that access the list are synchronized, then you can remove the Collections.synchronizedList(...) since you are synchronizing it yourself.
if there are other methods that do other operations on the List, key word synchronized should also be as the method attribute. So following this way, should it be thread safe?
Not sure I fully parse this part of the question. But if you make the list private and you add other methods to access the list that are all synchronized, then you are correct.
Under what case "it is not atomic"?
putIfAbsent(...) is not atomic because it makes multiple calls to the synchronized list. If multiple threads are operating on the list, then another thread could call list.add(...) between the time putIfAbsent(...) calls list.contains(x) and the time it calls list.add(x). Collections.synchronizedList(...) protects the list against corruption by multiple threads, but it cannot protect against race conditions when there are multiple calls to list methods that could interleave with calls from other threads.
Any unsynchronized method that modifies the list may introduce the absent element after list.contains() returns false, but before the element has been added.
Picture this as two threads:
boolean absent = !list.contains(x); // Returns true
-> list.add(theSameElementAsX); // Another thread
if(absent) // absent is true, but the list has been modified!
list.add(x);
return absent;
This could happen with a method as simple as the following:
public void add(E e) {
list.add(e);
}
If the method were synchronized, there would be no problem, since the add method wouldn't be able to run before putIfAbsent() was fully finished.
A proper correction would include making the List private, and making sure that compound operations on it are properly synchronized (i.e. on the class or the list itself).
Thread safety is not composable! Imagine a program built entirely out of "thread safe" classes. Is the program itself "thread safe?" Not necessarily. It depends on what the program does with those classes.
The synchronizedList wrapper makes each individual method of a List "thread safe". What does that mean? It means that none of those wrapped methods can corrupt the internal structure of the list when called in a multi-threaded environment.
That doesn't protect the way in which any given program uses the list. In the example code, the list appears to be used as an implementation of a set: The program doesn't allow the same object to appear in the list more than one time. There's nothing in the synchronizedList wrapper that will enforce that particular guarantee though, because that guarantee has nothing to do with the internal structure of the list. The list can be perfectly valid as a list, but not valid as a set.
That's why the additional synchronization on the putIfAbsent() method is there.
Collections.synchronizedList() creates a collection that synchronizes every one of its methods on a private mutex. For the one-argument factory used in the example, this mutex is the list's own this; with the two-argument factory, it can be provided by the caller. That's why we need an external lock to make the consecutive contains() and add() calls atomic.
If the list is accessible directly, not only via ListHelper, this code is broken, because access will be guarded by different locks. To prevent that, make list private so that it cannot be accessed directly, and wrap all necessary API with synchronization on the same mutex, either one declared in ListHelper or the this of ListHelper itself.
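One possible shape for that fix, as a sketch only: the list is private, a plain ArrayList suffices, and every method that touches it synchronizes on the helper itself. SafeListHelper and the hammer method are hypothetical names; hammer is just a small stress run in which several threads insert the same values.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the list never escapes, and all access goes through one lock (this).
final class SafeListHelper<E> {
    private final List<E> list = new ArrayList<>();

    synchronized boolean putIfAbsent(E x) {
        boolean absent = !list.contains(x);
        if (absent) {
            list.add(x);
        }
        return absent;
    }

    synchronized boolean add(E x) {
        return list.add(x);
    }

    synchronized int size() {
        return list.size();
    }

    // Starts threadCount threads that each try to insert 0..distinctValues-1,
    // waits for them, and returns the final size of the list.
    static int hammer(int threadCount, int distinctValues) {
        SafeListHelper<Integer> helper = new SafeListHelper<>();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            Thread thread = new Thread(() -> {
                for (int v = 0; v < distinctValues; v++) {
                    helper.putIfAbsent(v);
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return helper.size();
    }
}
```

Because contains() and add() run under the same lock as every other access, hammer(8, 100) should always return 100, however the threads interleave.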

Determining synchronization scope?

In trying to improve my understanding of concurrency issues, I am looking at the following scenario (Edit: I've changed the example from List to Runtime, which is closer to what I am trying):
public class Example {
private final Object lock = new Object();
private final Runtime runtime = Runtime.getRuntime();
public void add(Object o) {
synchronized (lock) { runtime.exec(program + " -add "+o); }
}
public Object[] getAll() {
synchronized (lock) { return runtime.exec(program + " -list "); }
}
public void remove(Object o) {
synchronized (lock) { runtime.exec(program + " -remove "+o); }
}
}
As it stands, each method is thread safe when used standalone. Now, what I'm trying to figure out is how to handle the case where the calling class wishes to call:
for (Object o : example.getAll()) {
// problems if multiple threads perform this operation concurrently
example.remove(o);
}
But as noted, there is no guarantee that the state will be consistent between the call to getAll() and the calls to remove(). If multiple threads call this, I'll be in trouble. So my question is - How should I enable the developer to perform the operation in a thread safe manner? Ideally I wish to enforce the thread safety in a way that makes it difficult for the developer to avoid/miss, but at the same time not complicated to achieve. I can think of three options so far:
A: Make the lock 'this', so the synchronization object is accessible to calling code, which can then wrap the code blocks. Drawback: Hard to enforce at compile time:
synchronized (example) {
for (Object o : example.getAll()) {
example.remove(o);
}
}
B: Place the combined code into the Example class, and benefit from being able to optimize the implementation, as in this case. Drawback: Pain to add extensions, and potential mixing of unrelated logic:
public class Example {
...
public void removeAll() {
synchronized (lock) { runtime.exec(program + " -clear"); }
}
}
C: Provide a Closure class. Drawback: Excess code, potentially too generous of a synchronization block, could in fact make deadlocks easier:
public interface ExampleClosure {
public void execute(Example example);
}
public class Example {
...
public void execute(ExampleClosure closure) {
synchronized (this) { closure.execute(this); }
}
}
example.execute(new ExampleClosure() {
public void execute(Example example) {
for (Object o : example.getAll()) {
example.remove(o);
}
}
}
);
Is there something I'm missing? How should synchronization be scoped to ensure the code is thread safe?
Use a ReentrantReadWriteLock which is exposed via the API. That way, if someone needs to synchronize several API calls, they can acquire a lock outside of the method calls.
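A rough sketch of what exposing the lock could look like. LockedStore, its methods, and the drain helper are all hypothetical, and an in-memory list stands in for the external program from the question. Because the lock is reentrant, a caller that already holds the write lock can call the individual methods, which acquire the read or write lock again, without blocking itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical store whose lock is part of its API.
final class LockedStore {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private final List<String> items = new ArrayList<>();

    // Exposed so that callers can guard several calls as one unit.
    ReentrantReadWriteLock lock() {
        return rw;
    }

    void add(String s) {
        rw.writeLock().lock();
        try { items.add(s); } finally { rw.writeLock().unlock(); }
    }

    boolean remove(String s) {
        rw.writeLock().lock();
        try { return items.remove(s); } finally { rw.writeLock().unlock(); }
    }

    List<String> getAll() {
        rw.readLock().lock();
        try { return new ArrayList<>(items); } finally { rw.readLock().unlock(); }
    }

    // Caller-side compound operation: list and remove under a single write lock.
    static int drain(LockedStore store) {
        store.lock().writeLock().lock();
        try {
            int removed = 0;
            for (String s : store.getAll()) {
                if (store.remove(s)) {
                    removed++;
                }
            }
            return removed;
        } finally {
            store.lock().writeLock().unlock();
        }
    }
}
```

As with option A in the question, nothing forces callers to take the lock, so this approach still relies on documentation and discipline.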
In general, this is a classic multithreaded design issue. By synchronizing the data structure rather than synchronizing concepts that use the data structure, it's hard to avoid the fact that you essentially have a reference to the data structure without a lock.
I would recommend that locks not be done so close to the data structure. But it's a popular option.
A potential technique to make this style work is to use an editing tree-walker. Essentially, you expose a function that does a callback on each element.
// pointer to function:
// - takes Object by reference and can be safely altered
// - if returns true, Object will be removed from list
typedef bool (*callback_function)(Object *o);
public void editAll(callback_function func) {
synchronized (lock) {
for each element o { if (func(o)) {remove o} } }
}
So then your loop becomes:
bool my_function(Object *o) {
...
if (some condition) return true;
}
...
editAll(my_function);
...
The company I work for (Corensic) has test cases extracted from real bugs to verify that Jinx is finding the concurrency errors properly. This type of low-level data structure locking without higher-level synchronization is a pretty common pattern. The tree editing callback seems to be a popular fix for this race condition.
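In Java, the callback pseudocode above might be sketched with java.util.function.Predicate. EditableStore and its methods are hypothetical names, and an in-memory list stands in for whatever the real class manages. The callback is applied to every element while the lock is held, and elements for which it returns true are removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical store offering an "edit all under one lock" operation.
final class EditableStore<T> {
    private final Object lock = new Object();
    private final List<T> items = new ArrayList<>();

    void add(T item) {
        synchronized (lock) { items.add(item); }
    }

    // Calls shouldRemove for each element while holding the lock and
    // removes the elements for which it returns true.
    void editAll(Predicate<? super T> shouldRemove) {
        synchronized (lock) { items.removeIf(shouldRemove); }
    }

    // Returns a copy, so callers never iterate over the guarded list itself.
    List<T> snapshot() {
        synchronized (lock) { return new ArrayList<>(items); }
    }
}
```

A call such as store.editAll(n -> n % 2 == 0) then replaces the getAll()/remove() loop. The callback runs while the lock is held, so it should be short and should not call back into code that takes other locks.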
I think everyone is missing the real problem. Iterating over the new array of Objects and trying to remove them one at a time is still technically unsafe (though the ArrayList implementation would not explode, it just wouldn't produce the expected results).
Even with CopyOnWriteArrayList, there is the possibility that the list you read is out of date by the time you try to remove.
The two suggestions you offered are fine (A and B). My general suggestion is B. Making a collection thread-safe is very difficult. A good way to do it is to give the client as little functionality as possible (within reason). So offering the removeAll method and removing the getAll method would suffice.
Now, you can at the same time say, 'Well, I want to keep the API the way it is and let the client worry about additional thread safety.' If that's the case, document the thread safety. Document the fact that a 'lookup and modify' action is both non-atomic and non-thread-safe.
Today's concurrent list implementations are all thread safe for the single operations they offer (get, remove, and add are all thread safe). Compound operations are not, though, and the best that can be done is to document how to make them thread safe.
I think j.u.c.CopyOnWriteArrayList is a good example of a problem similar to the one you're trying to solve.
The JDK had a similar problem with Lists: there were various ways to synchronize on arbitrary methods, but no synchronization across multiple invocations (and that's understandable).
So CopyOnWriteArrayList actually implements the same interface but has a very special contract, and whoever calls it is aware of it.
Similarly with your solution: you should probably implement List (or whatever interface this is) and at the same time define special contracts for existing/new methods. For example, getAll's consistency is not guaranteed, and calls to .remove do not fail if o is null or isn't inside the list, etc. If users want both combined and safe/consistent options, this class of yours would provide a special method that does exactly that (e.g. safeDeleteAll), leaving the other methods as close to the original contract as possible.
So to answer your question: I would pick option B, but would also implement the interface your original object implements.
From the Javadoc for List.toArray():
The returned array will be "safe" in that no references to it are maintained by this list. (In other words, this method must allocate a new array even if this list is backed by an array). The caller is thus free to modify the returned array.
Maybe I don't understand what you're trying to accomplish. Do you want the Object[] array to always be in-sync with the current state of the List? In order to achieve that, I think you would have to synchronize on the Example instance itself and hold the lock until your thread is done with its method call AND any Object[] array it is currently using. Otherwise, how will you ever know if the original List has been modified by another thread?
You have to use the appropriate granularity when you choose what to lock. What you're complaining about in your example is too low a level of granularity, where the lock doesn't cover all the methods that have to happen together. You need to make methods that combine all the actions that need to happen together within the same lock.
Locks are reentrant so the high-level method can call low-level synchronized methods without a problem.
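A small sketch of that reentrancy, with a hypothetical Inventory class and an in-memory list standing in for the real resource. removeAll() holds the object's monitor for the whole compound operation, and the nested calls to getAll() and remove() simply re-enter the monitor the thread already owns.

```java
import java.util.ArrayList;
import java.util.List;

final class Inventory {
    private final List<String> items = new ArrayList<>();

    synchronized void add(String s) {
        items.add(s);
    }

    synchronized List<String> getAll() {
        return new ArrayList<>(items); // copy, so callers can iterate safely
    }

    synchronized boolean remove(String s) {
        return items.remove(s);
    }

    // High-level operation: one lock acquisition covers the whole loop.
    // The nested synchronized calls re-enter the same monitor without blocking.
    synchronized int removeAll() {
        int removed = 0;
        for (String s : getAll()) {
            if (remove(s)) {
                removed++;
            }
        }
        return removed;
    }
}
```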

What object should I lock on when I am passing a Collection<Foo> into a separate class?

Please refer to UML
The Connection class's constructor initializes its foos member via
foos = Collections.synchronizedList( new ArrayList<Foo>(10) );
When Connection#start() is invoked, it creates an instance of Poller (while passing the foos reference into Poller's constructor) & Poller is started (Poller is a Runnable).
Question: The Poller thread will add to & remove objects from the list based on external events. Periodically clients will invoke Connection#snapshot() to retrieve the list. Since the implementation within Poller will perform a check to avoid duplicates during additions, it is not thread safe.
e.g. implementation of Poller#run
if( _foos.indexOf( newFoo ) == -1 )
{
_foos.add( newFoo );
}
What can I synchronize on in Connection as well as Poller in order to be thread safe?
I'd take a look at CopyOnWriteArrayList as a replacement for the ArrayList in the example above. That way you won't need to synchronize on anything since you have a thread safe collection out of the box, so to speak...
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/CopyOnWriteArrayList.html
From the API CopyOnWriteArrayList is...
A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
n.b. This is only a viable solution if the number of traversals outweighs the number of additions/updates to the collection. Is this the case?
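For the duplicate check in particular, CopyOnWriteArrayList offers addIfAbsent, which performs the check and the insertion as one atomic step. The sketch below is only an illustration: PollerSketch and its methods are made-up names, and String stands in for the Foo type from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the Poller/Connection pair; String replaces Foo.
final class PollerSketch {
    private final CopyOnWriteArrayList<String> foos = new CopyOnWriteArrayList<>();

    // Atomic check-and-add; returns true only if the element was actually added.
    boolean onNewFoo(String newFoo) {
        return foos.addIfAbsent(newFoo);
    }

    // Hands callers their own copy of the current contents.
    List<String> snapshot() {
        return new ArrayList<>(foos);
    }
}
```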
There is a clean solution using interfaces and anonymous inner classes.
In the Connection class add the following:
public static interface FooWorker {
void onFoos(List<Foo> list);
}
public synchronized void withFoosSafely(FooWorker worker) {
worker.onFoos(foos);
}
In the Poller class do the following:
public void doWork() {
connection.withFoosSafely(new FooWorker() {
public void onFoos(List<Foo> list) {
/// add, remove and change the list as you see fit
/// everything inside this method is thread-safe
}
});
}
It requires a bit of additional code (no closures yet in Java), but it guarantees thread safety and also makes sure clients don't need to take care of locking, which means fewer potential bugs in the future.
You might return a new List from snapshot():
public List<Foo> snapshot() {
return new ArrayList<Foo>(foos);
}
Given that you're returning a "snapshot", it seems OK to me that the list is guaranteed to be up-to-date only at the moment it gets returned.
If you're expecting clients to add/remove members from foos, then you'd probably need to expose those operations as methods on Connection.
Perhaps I am not getting the point, but it seems Connection#snapshot should be synchronized on this (or on _foos), and so should the code block of Poller that manages Connection._foos.
What am I missing?
