How to make a 'modified' ArrayList thread safe in Java?

I am implementing a distributed mutex, and I need to keep track of all the requests that have been made.
I have a Message class that is Comparable, and a modified ArrayList:
requestList = new ArrayList<Message>() {
    @Override
    public synchronized boolean add(Message msg) {
        boolean ret = super.add(msg);
        Collections.sort(requestList);
        return ret;
    }
};
I suspect that requestList is being modified by two threads, and I'm seeing elements at the top of the list that should not be there. How do I make requestList thread safe?
Would the following work?
requestList = Collections.synchronizedList(new ArrayList<Message>() {
    @Override
    public synchronized boolean add(Message msg) {
        boolean ret = super.add(msg);
        Collections.sort(requestList);
        return ret;
    }
});
Also, how does Collections.synchronizedList work? Does it effectively put 'synchronized' on all the methods of the ArrayList?

would doing as follows work?
No.
Your custom add method synchronizes on a different object from the one that the requestList wrapper uses to synchronize. Hence, there won't be mutual exclusion.
There is a Collections.synchronizedList overload that takes a mutex object as an extra parameter. Unfortunately, it is declared package-private, so you won't be able to use it.
"So, how do I have a list which IS thread safe, and maintains the sorted order on each add?"
There is no simple solution, but a couple of approaches that would work are:
Change your code so that the requestList object is private. Implement all access to the list via wrapper methods ... and declare them as synchronized.
Write your own custom synchronized list wrapper that synchronizes on a mutex that is provided to it. Then instantiate the wrapper with your this object as the mutex; i.e. the object that your add method is synchronizing on.
Assuming that your add method is actually part of a "sorted list" wrapper class, the first option is the best one.
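The first option could be sketched roughly as follows. This is only an illustration under assumptions: the class name SortedRequestList and its methods are hypothetical, and it inserts each element at its sorted position instead of re-sorting the whole list on every add.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper: the class and method names are illustrative, not from the question.
final class SortedRequestList<T extends Comparable<? super T>> {
    // Never exposed, so every access goes through the synchronized methods below.
    private final List<T> items = new ArrayList<>();

    // Insert at the position that keeps the list sorted.
    public synchronized void add(T item) {
        int pos = Collections.binarySearch(items, item);
        if (pos < 0) {
            pos = -(pos + 1); // binarySearch returns -(insertionPoint) - 1 when the key is not found
        }
        items.add(pos, item);
    }

    // Smallest element, or null when the list is empty.
    public synchronized T peekFirst() {
        return items.isEmpty() ? null : items.get(0);
    }

    public synchronized boolean remove(T item) {
        return items.remove(item);
    }

    // Copy taken under the lock, so callers can iterate without further locking.
    public synchronized List<T> snapshot() {
        return new ArrayList<>(items);
    }
}
```

Because add, remove and the read methods all lock the same wrapper instance, no thread can observe the list between an insertion and its placement in sorted order.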
The suggestion of using a PriorityBlockingQueue as an alternative to a List is a good one, though you do lose some of the methods that you get in the List API. (For example, you can't do index-based operations on a Queue ...)
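A minimal sketch of the PriorityBlockingQueue alternative is shown here. The Message class is a stand-in for the one in the question; its timestamp and sender fields are assumptions made for illustration.

```java
import java.util.concurrent.PriorityBlockingQueue;

// Stand-in for the question's Message class; the fields are assumptions.
final class Message implements Comparable<Message> {
    final long timestamp;
    final String sender;

    Message(long timestamp, String sender) {
        this.timestamp = timestamp;
        this.sender = sender;
    }

    @Override
    public int compareTo(Message other) {
        return Long.compare(timestamp, other.timestamp);
    }
}

final class RequestQueueDemo {
    public static void main(String[] args) {
        PriorityBlockingQueue<Message> requests = new PriorityBlockingQueue<>();
        requests.add(new Message(30, "node-c"));
        requests.add(new Message(10, "node-a"));
        requests.add(new Message(20, "node-b"));

        // peek() and poll() always return the smallest element.
        // Iterating the queue does not visit elements in sorted order.
        System.out.println(requests.peek().sender);
        System.out.println(requests.poll().sender);
        System.out.println(requests.poll().sender);
    }
}
```

The queue handles its own locking, so no external synchronization is needed for add, peek or poll.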

Related

Is my local variable a reference to my external collection and does locking lock the original?

I have two classes, one of which contains a list which should always be synchronised but for which I want to use an accessor.
class MyObject {
    private List<Thing> things = Collections.synchronizedList(new ArrayList<>());
    public List<Thing> getThings() { return things; }
}
In another class (an Android activity) I want to safely iterate over the collection
class MyActivity {
    private void someMethod() {
        List<Thing> localThings = myObjectInstance.getThings();
        // we have to manually synchronise when iterating a synchronizedList
        synchronized (localThings) {
            for (Thing thing : localThings) {
                // do something to a thing
            }
        }
    }
}
My question is, what is happening in my block of code above? Is my localThings variable a pointer to the same collection as MyObject::things, or have I created a new collection (in which case there's no need for me to synchronise)? If it's the former, am I keeping it thread safe by doing this (is the original collection locked)?
"My question is, what is happening in my block of code above? is my localThings variable a pointer to the same collection as MyObject::things"
Yes. You return things, where things is a reference. The caller receives a copy of the reference, which necessarily refers to the same object.
"or have I created a new collection (in which case there's no need for me to synchronise)."
No, but you could create and return a new List:
return new ArrayList<Thing>(things);
However, you probably need to synchronize on things to safely initialize the copy that way. If you do not need to be able to change the size of the copy then you could get around the need for external synchronization like this:
return Arrays.asList(things.toArray(new Thing[0]));
"If it's the former am I keeping it threadsafe by doing this (is the original collection locked)?"
Given that the collection referenced by localThings is a synchronized list wrapper provided by Collections.synchronizedList() and that you are iterating over it inside a block synchronized on that object, no other thread will be able to invoke any of that list's methods until control leaves the synchronized block. This level of mutual exclusion is necessary if it would otherwise be possible for a different thread to add elements to or remove elements from the list while the iteration is proceeding, and it is sufficient to protect against that.
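The defensive copy mentioned earlier in this answer could look roughly like this. It is a sketch under assumptions: ThingHolder and snapshotThings are hypothetical names, and String stands in for the question's Thing type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative holder; String stands in for the question's Thing type.
final class ThingHolder {
    private final List<String> things = Collections.synchronizedList(new ArrayList<>());

    public void addThing(String thing) {
        things.add(thing);
    }

    // The ArrayList copy constructor is not specified to read its source atomically,
    // so hold the wrapper's lock while copying.
    public List<String> snapshotThings() {
        synchronized (things) {
            return new ArrayList<>(things);
        }
    }
}
```

A caller can then iterate over the returned copy without any locking, at the cost of not seeing later changes to the original list.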

What happens when we pass hashtable inside Collections.synchronizedMap()

Today I was asked a question in an interview.
The premise was that Collections.synchronizedMap() is used to synchronize maps that are not thread safe by default, such as HashMap.
The interviewer's question was: but we can pass any kind of map into this method.
So what is the effect of passing a Hashtable into this method, given that Hashtable is already synchronized by default?
The behavior of the map will be the same, but the performance will be affected, because each method will acquire two synchronization locks instead of one.
For example, consider calling the method size() on the resulting map. The implementation in the Collections.SynchronizedMap class looks like this:
public int size() {
    synchronized (mutex) { return m.size(); } // first lock
}
... where, m.size() calls the implementation in Hashtable:
public synchronized int size() { // second lock
    return count;
}
The first lock object is the mutex field in SynchronizedMap. The second lock is implicit - the Hashtable instance itself.
You would have two synchronization levels: one at the level of the synchronized map itself, implemented by a mutex object, and one at the level of the wrapped instance:
public boolean isEmpty() {
    // first level synchronization
    synchronized (mutex) {
        // second level synchronization if m is a Hashtable
        return m.isEmpty();
    }
}
The additional synchronization is not needed and can lead to lower performance.
Another effect is that you won't be able to use Hashtable-specific API such as Hashtable#elements, since the wrapper exposes the wrapped collection strictly as a Map.
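A small sketch of that last point, using only java.util classes (the class name WrappedHashtableDemo is made up for the example):

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Map;

final class WrappedHashtableDemo {
    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("a", 1);

        Map<String, Integer> wrapped = Collections.synchronizedMap(table);

        // Same contents, but the wrapper itself is not a Hashtable.
        System.out.println(wrapped.get("a"));
        System.out.println(wrapped instanceof Hashtable);

        // Hashtable-only methods need the original reference, and calling them
        // bypasses the wrapper's lock.
        Enumeration<Integer> values = table.elements();
        System.out.println(values.nextElement());
    }
}
```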
It will get wrapped into a SynchronizedMap, from java.util.Collections:
public static <K,V> Map<K,V> synchronizedMap(Map<K,V> m) {
    return new SynchronizedMap<>(m);
}
The synchronizedMap() method does not distinguish between the types of Maps passed into it.
"His question is but we can pass any kind of map inside this method."
The answer is yes, because the constructor of SynchronizedMap accepts any Map in its signature.
"So what is the effect when we pass a hashtable inside this method because hashtable is by default synchronized"
The answer is: this overlooks ConcurrentHashMap, which is most likely the tool to use instead of a blocking implementation.
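As a rough illustration of why ConcurrentHashMap is usually preferred, its compound operations are atomic without any external lock. The HitCounter class below is a hypothetical example, not something from the question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical example class.
final class HitCounter {
    private final ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();

    // merge() performs the read-modify-write atomically for the given key.
    public int record(String key) {
        return hits.merge(key, 1, Integer::sum);
    }

    // putIfAbsent() returns the previous value, or null if the key was absent.
    public boolean register(String key) {
        return hits.putIfAbsent(key, 0) == null;
    }
}
```

With Hashtable or a synchronizedMap wrapper, the same check-then-update sequences would need an explicit synchronized block around them.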
If you look at the code in SynchronizedMap, the methods delegate each call to the underlying map but wrap it in a synchronized block, something like this:
public int size() {
    synchronized (mutex) { return m.size(); }
}
The implementation of size() looks like this in the Hashtable class:
public synchronized int size() {
    return count;
}
So, if you pass a Hashtable to synchronizedMap(), a thread accessing the wrapper has to take locks at two levels: once for the synchronized block and once for the synchronized method.
If other threads use the Hashtable object directly, they can block threads that go through the wrapper, even after such a thread has acquired the wrapper's lock.
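The blocking effect described above can be observed with a small experiment. This is a sketch only: DoubleLockDemo and readerBlockedByTableLock are made-up names, and the 200 ms wait is an arbitrary choice.

```java
import java.util.Collections;
import java.util.Hashtable;
import java.util.Map;

final class DoubleLockDemo {
    // Returns true if a reader going through the wrapper was still blocked
    // while the calling thread held the Hashtable's own monitor.
    static boolean readerBlockedByTableLock() {
        Hashtable<String, Integer> table = new Hashtable<>();
        Map<String, Integer> wrapped = Collections.synchronizedMap(table);
        Thread reader = new Thread(() -> wrapped.size());
        boolean blocked;
        try {
            synchronized (table) {       // take the second-level lock directly
                reader.start();
                reader.join(200);        // reader gets the wrapper's mutex, then waits for the table
                blocked = reader.isAlive();
            }
            reader.join();               // table lock released, reader can finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println(readerBlockedByTableLock());
    }
}
```

The reader cannot complete size() until the table's monitor is released, even though nothing else is contending for the wrapper's mutex.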

intrinsic vs mutex lock

Looking at the implementation of SynchronizedList in Collections, I noticed that all access to the internal list is synchronized by locking on the wrapper's final member "mutex". Since all access is synchronized on the same object, why don't we omit the extra mutex object and just synchronize on the list itself? Is it only because anybody else could synchronize on that list and we might get deadlocks?
I am asking because I am considering implementing a container class that holds two lists.
The container offers e.g. .addToL1(...) and .addToL2(...). In this case the inner lists are not accessible, so it should be sufficient to synchronize on the lists intrinsically, correct?
The most robust solution is to lock on an object that the caller has no way of accessing (ignoring reflection and Unsafe for the moment). The JDK developers have to consider the worst thing any developer could do with the library, because someone will do that, along with things they couldn't have thought of.
However, sometimes simplicity is the most important driver, especially if you know who will be using the code and they understand its limitations.
In this case it is done specifically because sublists created from the list have to synchronize on the parent object:
public List<E> subList(int fromIndex, int toIndex) {
    synchronized (mutex) {
        return new SynchronizedList<E>(list.subList(fromIndex, toIndex),
                                       mutex);
    }
}
If you create your synchronized list with Collections.synchronizedList(list), the mutex is set to this (that is, the synchronized list object itself).
Collections also has a two-parameter synchronizedList() whose second argument is used as the mutex, but it is package-private, so only JDK code such as subList() can call it.
Since it generally isn't a good idea to synchronize on publicly visible objects, in my own classes I prefer to lock on a separate mutex object and hide it from clients of the code.
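For the two-list container described in the question, a hidden lock object might be used as in the sketch below. The class name and the moveFromL1ToL2 and totalSize methods are assumptions added for illustration; only addToL1 and addToL2 come from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container matching the question's addToL1/addToL2 description.
final class TwoListContainer<E> {
    // Private lock object: no outside code can synchronize on it.
    private final Object lock = new Object();
    private final List<E> l1 = new ArrayList<>();
    private final List<E> l2 = new ArrayList<>();

    public void addToL1(E e) {
        synchronized (lock) { l1.add(e); }
    }

    public void addToL2(E e) {
        synchronized (lock) { l2.add(e); }
    }

    // One shared lock makes operations that span both lists atomic.
    public boolean moveFromL1ToL2(E e) {
        synchronized (lock) {
            return l1.remove(e) && l2.add(e);
        }
    }

    public int totalSize() {
        synchronized (lock) { return l1.size() + l2.size(); }
    }
}
```

Synchronizing on each inner list separately would also work while the lists stay private, but a single lock is what allows an operation such as the move to be atomic across both.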

Non-thread-safe Attempt to Implement Put-if-absent?

There is one code snippet in the 4th chapter in Java Concurrency in Practice
public class ListHelper<E> {
    public List<E> list =
        Collections.synchronizedList(new ArrayList<E>());
    ...
    public synchronized boolean putIfAbsent(E x) {
        boolean absent = !list.contains(x);
        if (absent)
            list.add(x);
        return absent;
    }
}
The book says this is not thread safe because it synchronizes on the wrong lock: putIfAbsent is not atomic relative to other operations on the List.
But I think "synchronized" prevents multiple threads from entering putIfAbsent, and if other methods operate on the List, they should be declared synchronized as well. Done that way, would it be thread safe? In what case is it "not atomic"?
putIfAbsent is not atomic relative to other operations on the List. But I think "synchronized" preventing multithreads enter putIfAbsent
This is true, but there is no guarantee that threads are not accessing the list in other ways. The list field is public (which is always a bad idea), which means that other threads can call methods on the list directly. To properly protect the list, you should make it private and add add(...) and other synchronized methods to your ListHelper so that it fully controls all access to the list.
// we synchronize all access ourselves, so there is no reason to use Collections.synchronizedList
private List<E> list = new ArrayList<E>();
...
public synchronized boolean add(E e) {
    return list.add(e);
}
If the list is private and all of the methods are synchronized that access the list then you can remove the Collections.synchronizedList(...) since you are synchronizing it yourself.
if there are other methods that do other operations on the List, key word synchronized should also be as the method attribute. So following this way, should it be thread safe?
I'm not sure I fully parse this part of the question, but if you make the list private and all the other methods that access it are synchronized, then you are correct.
Under what case "it is not atomic"?
putIfAbsent(...) is not atomic because it makes multiple calls to the synchronized list. If multiple threads are operating on the list, another thread could call list.add(...) between the time putIfAbsent(...) calls list.contains(x) and the time it calls list.add(x). Collections.synchronizedList(...) protects the list against corruption by multiple threads, but it cannot protect against race conditions when a sequence of calls to list methods can interleave with calls from other threads.
Any unsynchronized method that modifies the list may introduce the absent element after list.contains() returns false, but before the element has been added.
Picture this as two threads:
boolean absent = !list.contains(x); // Returns true
-> list.add(theSameElementAsX); // Another thread
if(absent) // absent is true, but the list has been modified!
list.add(x);
return absent;
This could happen through a method as simple as the following:
public void add(E e) {
    list.add(e);
}
If the method were synchronized, there would be no problem, since the add method wouldn't be able to run before putIfAbsent() was fully finished.
A proper correction would include making the List private, and making sure that compound operations on it are properly synchronized (i.e. on the class or the list itself).
Thread safety is not composable! Imagine a program built entirely out of "thread safe" classes. Is the program itself "thread safe?" Not necessarily. It depends on what the program does with those classes.
The synchronizedList wrapper makes each individual method of a List "thread safe". What does that mean? It means that none of those wrapped methods can corrupt the internal structure of the list when called in a multi-threaded environment.
That doesn't protect the way in which any given program uses the list. In the example code, the list appears to be used as an implementation of a set: The program doesn't allow the same object to appear in the list more than one time. There's nothing in the synchronizedList wrapper that will enforce that particular guarantee though, because that guarantee has nothing to do with the internal structure of the list. The list can be perfectly valid as a list, but not valid as a set.
That's why putIfAbsent() needs the additional synchronization.
Collections.synchronizedList() creates a collection that synchronizes every one of its methods on an internal mutex. With the one-argument factory used in the example, that mutex is the synchronized list itself (its this); the two-argument factory, which is package-private in the JDK, takes the mutex as a parameter. Either way, an external lock is needed to make the consecutive contains() and add() calls atomic.
If the list can be reached directly rather than only through ListHelper, this code is broken, because the two access paths are guarded by different locks. To prevent that, make the list private and wrap all the necessary API in methods that synchronize on one mutex, either a lock object declared in ListHelper or the ListHelper instance itself.
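One way to apply these corrections is to keep the synchronized wrapper but lock on the list itself, so that the compound operation uses the same monitor as the wrapper's own methods. The sketch below assumes the one-argument factory; SafeListHelper is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SafeListHelper<E> {
    private final List<E> list = Collections.synchronizedList(new ArrayList<E>());

    // Lock on the list itself: the same monitor the synchronized wrapper uses
    // for contains() and add(), so the check and the insert form one atomic step.
    public boolean putIfAbsent(E x) {
        synchronized (list) {
            boolean absent = !list.contains(x);
            if (absent) {
                list.add(x);
            }
            return absent;
        }
    }

    public boolean add(E x) {
        return list.add(x); // already synchronized on the same monitor by the wrapper
    }

    public int size() {
        return list.size();
    }
}
```

Here a concurrent add() cannot slip in between contains() and add() inside putIfAbsent(), because both paths contend for the list's monitor.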

What object should I lock on when I am passing a Collection<Foo> into a separate class?

Please refer to UML
The Connection class's constructor initializes its foos member via
foos = Collections.synchronizedList( new ArrayList<Foo>(10) );
When Connection#start() is invoked, it creates an instance of Poller (while passing the foos reference into Poller's constructor) & Poller is started (Poller is a Runnable).
Question: The Poller thread will add to & remove objects from the list based on external events. Periodically clients will invoke Connection#snapshot() to retrieve the list. Since the implementation within Poller will perform a check to avoid duplicates during additions, it is not thread safe.
e.g. implementation of Poller#run:
if (_foos.indexOf(newFoo) == -1) {
    _foos.add(newFoo);
}
What can I synchronize on in Connection as well as Poller in order to be thread safe?
I'd take a look at CopyOnWriteArrayList as a replacement for the ArrayList in the example above. That way you won't need to synchronize on anything since you have a thread safe collection out of the box, so to speak...
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/CopyOnWriteArrayList.html
From the API, CopyOnWriteArrayList is...
"A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array."
n.b. This is only a viable solution if the number of traversals outweighs the number of additions/updates to the collection. Is this the case?
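Note that a separate indexOf() check followed by add() would still race on a CopyOnWriteArrayList; the class offers addIfAbsent() for exactly that case. A minimal sketch, with FooRegistry as a hypothetical name and String standing in for Foo:

```java
import java.util.concurrent.CopyOnWriteArrayList;

final class FooRegistry {
    // String stands in for the question's Foo type. The field is declared with the
    // concrete type because addIfAbsent() is not part of the List interface.
    private final CopyOnWriteArrayList<String> foos = new CopyOnWriteArrayList<>();

    // addIfAbsent() does the duplicate check and the insert atomically.
    public boolean register(String foo) {
        return foos.addIfAbsent(foo);
    }

    // Iteration works on a snapshot of the array, so it needs no locking and
    // never throws ConcurrentModificationException.
    public int count() {
        int n = 0;
        for (String ignored : foos) {
            n++;
        }
        return n;
    }
}
```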
There is a clean solution using interfaces and anonymous inner classes.
In the Connection class add the following:
public static interface FooWorker {
    void onFoos(List<Foo> list);
}
public synchronized void withFoosSafely(FooWorker worker) {
    worker.onFoos(foos);
}
In the Poller class do the following:
public void doWork() {
    connection.withFoosSafely(new FooWorker() {
        public void onFoos(List<Foo> list) {
            /// add, remove and change the list as you see fit
            /// everything inside this method is thread-safe
        }
    });
}
It requires a bit of additional code (no closures yet in Java), but provided every access to foos goes through withFoosSafely, it guarantees thread safety and also makes sure clients don't need to take care of locking, which means fewer potential bugs in the future.
You might return a new List from snapshot():
public List<Foo> snapshot() {
    return new ArrayList<Foo>(foos);
}
Given that you're returning a "snapshot", it seems OK to me that the list is guaranteed to be up-to-date only at the moment it gets returned.
If you're expecting clients to add/remove members from foos, then you'd probably need to expose those operations as methods on Connection.
Perhaps I am not getting the point, but it seems that Connection#snapshot should synchronize on this (or on _foos), and so should the code block in Poller that manages Connection._foos.
What am I missing?
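Synchronizing on the shared list in both classes, as this last answer suggests, might look like the sketch below. The classes are simplified stand-ins for the ones in the question: String replaces Foo, and the addIfNew and poller methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the question's Poller; String replaces Foo.
final class Poller {
    private final List<String> foos;

    Poller(List<String> foos) {
        this.foos = foos;
    }

    // Check-then-add under the list's own monitor.
    boolean addIfNew(String newFoo) {
        synchronized (foos) {
            if (foos.indexOf(newFoo) == -1) {
                return foos.add(newFoo);
            }
            return false;
        }
    }
}

// Simplified stand-in for the question's Connection.
final class Connection {
    private final List<String> foos = Collections.synchronizedList(new ArrayList<String>(10));
    private final Poller poller = new Poller(foos);

    Poller poller() {
        return poller;
    }

    // Copy under the same monitor, so a snapshot never observes a half-finished update.
    List<String> snapshot() {
        synchronized (foos) {
            return new ArrayList<>(foos);
        }
    }
}
```

Both classes lock the synchronized list object itself, which is the monitor the one-argument Collections.synchronizedList() wrapper uses internally, so single calls and compound operations exclude each other.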
