Java synchronized ( ArrayList ) or Vector?


I know that ArrayList is not synchronized, but I can access it in a synchronized way like this:
synchronized (arraylist)
{
    arraylist.add("new item");
}
Or I can use a Vector instead, although every blog I have read says it should be avoided.
Please let me know your thoughts.

There is a concurrent List implementation: CopyOnWriteArrayList that supports concurrency and is better than the options described above.
Still, I would recommend using another collection like a concurrent Queue through BlockingQueue and implemented by LinkedBlockingQueue. I would ask you to provide more info on your problem to get more accurate help.
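As a rough illustration of the queue suggestion, here is a minimal sketch of one thread handing items to another through a LinkedBlockingQueue. The class name, the method name and the sentinel value are made up for this example; whether a queue fits at all depends on the actual problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; none of these names come from the question.
final class QueueHandOffSketch {
    // Sentinel that tells the consumer there is nothing more to read (an assumption of this sketch).
    private static final String DONE = "__done__";

    // A producer thread puts every item on the queue; the calling thread takes them off in order.
    static List<String> transfer(List<String> items) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                for (String item : items) {
                    queue.put(item);
                }
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> received = new ArrayList<>();
        try {
            // take() blocks until an element is available, so no explicit locking is needed here.
            for (String s = queue.take(); !s.equals(DONE); s = queue.take()) {
                received.add(s);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }
}
```

The queue does the synchronization internally, so neither side needs a synchronized block.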

Yes, you can access a List and make it synchronized by using blocks as you have described. Note that you must also synchronize when reading from the list in order to be totally safe.
An alternative (and IMO better) approach is to use one of the following:
Alternative 1: Collections.synchronizedList:
List<SomeType> sList = Collections.synchronizedList(arrayList);
sList.add(...); // synchronized, no synchronized-block needed
The returned list will be synchronized for updates but iterations must still be in a synchronized block:
// Iterating...
synchronized (sList) {
    for (SomeType s : sList) {
        // do stuff
    }
}
Alternative 2: CopyOnWriteArrayList:
A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
The second alternative is obviously more memory-consuming if you perform a lot of writes.
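To make the copy-on-write behaviour concrete, here is a small sketch (the class and method names are invented for illustration). It adds to a CopyOnWriteArrayList while iterating over it; the iterator keeps reading the array that existed when the loop started, so it neither sees the new elements nor throws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class.
final class CopyOnWriteSketch {
    // Iterates the list while adding to it and returns the elements the iterator actually saw.
    static List<String> iterateWhileAdding(CopyOnWriteArrayList<String> list) {
        List<String> seen = new ArrayList<>();
        for (String s : list) {        // the iterator is bound to the array as of its creation
            list.add(s + "-copy");     // every add replaces the internal array with a fresh copy
            seen.add(s);
        }
        return seen;
    }
}
```

Starting from a list containing "a" and "b", the loop body runs twice, and afterwards the list holds four elements. That is also why many writes get expensive: each add copied the whole array.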

Related

Understanding collections concurrency and Collections.synchronized*

I learned yesterday that I've been incorrectly using collections with concurrency for many, many years.
Whenever I create a collection that needs to be accessed by more than one thread I wrap it in one of the Collections.synchronized* methods. Then, whenever mutating the collection I also wrap it in a synchronized block (I don't know why I was doing this, I must have thought I read it somewhere).
However, after reading the API more closely, it seems you need the synchronized block when iterating the collection. From the API docs (for Map):
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views:
And here's a small example:
List<O> list = Collections.synchronizedList(new ArrayList<O>());
...
synchronized (list) {
    for (O o : list) { ... }
}
So, given this, I have two questions:
Why is this even necessary? The only explanation I can think of is they're using a default iterator instead of a managed thread-safe iterator, but they could have created a thread-safe iterator and fixed this mess, right?
More importantly, what is this accomplishing? By putting the iteration in a synchronized block you are preventing multiple threads from iterating at the same time. But another thread could mutate the list while iterating so how does the synchronized block help there? Wouldn't mutating the list somewhere else screw with the iteration whether it's synchronized or not? What am I missing?
Thanks for the help!

Why is this even necessary? The only explanation I can think of is
they're using a default iterator instead of a managed thread-safe
iterator, but they could have created a thread-safe iterator and fixed
this mess, right?
Iterating works with one element at a time. For the Iterator to be thread-safe, they'd need to make a copy of the collection. Failing that, any changes to the underlying Collection would affect how you iterate with unpredictable or undefined results.
More importantly, what is this accomplishing? By putting the iteration
in a synchronized block you are preventing multiple threads from
iterating at the same time. But another thread could mutate the list
while iterating so how does the synchronized block help there?
Wouldn't mutating the list somewhere else screw with the iteration
whether it's synchronized or not? What am I missing?
The methods of the object returned by synchronizedList(List) work by synchronizing on the instance. So no other thread could be adding/removing from the same List while you are inside a synchronized block on the List.

The basic case
All of the methods of the object returned by Collections.synchronizedList() are synchronized to the list object itself. Whenever a method is called from one thread, every other thread calling any method of it is blocked until the first call finishes.
So far so good.
Iterare necesse est
But that doesn't stop another thread from modifying the collection between your calls to next() on its Iterator. If that happens, your code will most likely fail with a ConcurrentModificationException. If you do the iteration in a synchronized block too, and you synchronize on the same object (i.e. the list), other threads cannot call any mutator methods on the list: they have to wait until your iterating thread releases the monitor for the list object. The key is that the mutator methods synchronize on the same object as your iteration block; this is what stops them.
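Here is a minimal sketch of that effect, with invented class and method names. A writer thread keeps adding to a synchronized list while the calling thread iterates inside a synchronized block on the same list; inside the block, the size cannot change.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class.
final class LockedIterationSketch {
    // Returns true if the loop counted exactly size() elements while holding the list's
    // monitor, and all writes had arrived once the writer finished.
    static boolean iterationIsStable(int writes) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Thread writer = new Thread(() -> {
            for (int i = 0; i < writes; i++) {
                list.add(i);             // each call locks the wrapper internally
            }
        });
        writer.start();

        boolean stable;
        synchronized (list) {            // same monitor the wrapper's methods use
            int before = list.size();
            int counted = 0;
            for (Integer ignored : list) {
                counted++;
            }
            stable = before == counted && counted == list.size();
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stable && list.size() == writes;
    }
}
```

Without the synchronized block, the same loop could see the list grow underneath it and would typically fail with a ConcurrentModificationException.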
We're not out of the woods yet...
Note though that while the above guarantees basic integrity, it doesn't guarantee correct behaviour at all times. You might have other parts of your code that make assumptions which don't hold up in a multi-threaded environment:
List<Object> list = Collections.synchronizedList( ... );
...
if (!list.contains( "foo" )) {
    // there's nothing stopping another thread from adding "foo" here itself,
    // resulting in two copies existing in the list
    list.add( "foo" );
}
...
synchronized( list ) { // this block guarantees that "foo" will only be added once
    if (!list.contains( "foo" )) {
        list.add( "foo" );
    }
}
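The guarded version can be pulled into a helper so the check and the add always share one lock. This is only a sketch; the class and method names are hypothetical, and it assumes the list passed in is a Collections.synchronizedList wrapper that every other thread also goes through.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class.
final class CheckThenActSketch {
    // Adds value only if it is missing; contains() and add() run under the same lock.
    static <T> boolean addIfAbsent(List<T> syncList, T value) {
        synchronized (syncList) {
            if (syncList.contains(value)) {
                return false;
            }
            return syncList.add(value);
        }
    }

    // Lets several threads race to add "foo" and reports how many copies ended up in the list.
    static int raceForFoo(int threads) {
        List<String> list = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> addIfAbsent(list, "foo"));
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return list.size();
    }
}
```

However many threads take part, only one "foo" is added, because no thread can slip in between the check and the add.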
Thread-safe Iterator?
As for the question about a thread-safe iterator, there is indeed a list implementation with one: CopyOnWriteArrayList. It is incredibly useful, but as indicated in the API doc, it is limited to a handful of use cases only, specifically when your list is modified very rarely but iterated over so frequently (and by so many threads) that synchronizing iterations would cause a serious bottleneck. If you use it inappropriately, it can vastly degrade the performance of your application, as each and every modification of the list creates an entire new copy.

Synchronizing on the returned list is necessary, because internal operations synchronize on a mutex, and that mutex is this, i.e. the synchronized collection itself.
Here's some relevant code from Collections, constructors for SynchronizedCollection, the root of the synchronized collection hierarchy.
SynchronizedCollection(Collection<E> c) {
    if (c==null)
        throw new NullPointerException();
    this.c = c;
    mutex = this;
}
(There is another constructor that takes a mutex, used to initialize synchronized "view" collections from methods such as subList.)
If you synchronize on the synchronized list itself, then that does prevent another thread from mutating the list while you're iterating over it.
The imperative that you synchronize on the synchronized collection itself exists because if you synchronize on anything else, then what you have imagined could happen: another thread could mutate the collection while you're iterating over it, because the objects locked are different.
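A small experiment shows the difference between locking the wrapper and locking something else, here the backing list. This is a sketch with invented names; it relies on timing (the wait values are arbitrary), so treat it as an illustration rather than a proof.

```java
import java.util.List;

// Hypothetical demo class.
final class WrongLockSketch {
    // Holds `lock` while another thread calls wrapper.add(...), and reports whether
    // that call managed to finish within waitMillis.
    static boolean writerFinishesWhileHolding(Object lock, List<String> wrapper, long waitMillis) {
        Thread writer = new Thread(() -> wrapper.add("x"));
        boolean finished;
        synchronized (lock) {
            writer.start();
            try {
                writer.join(waitMillis);   // wait a bounded time while still holding `lock`
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finished = !writer.isAlive();
        }
        try {
            writer.join();                 // lock released, so the writer can always finish now
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finished;
    }
}
```

With backing = new ArrayList<String>() and wrapper = Collections.synchronizedList(backing), passing the wrapper as the lock keeps the writer blocked until the block exits, so the method returns false. Passing backing as the lock does not stop the writer at all, because the wrapper's methods lock the wrapper, not the list it wraps.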

Sotirios Delimanolis answered your second question "What is this accomplishing?" effectively. I wanted to amplify his answer to your first question:
Why is this even necessary? The only explanation I can think of is they're using a default iterator instead of a managed thread-safe iterator, but they could have created a thread-safe iterator and fixed this mess, right?
There are several ways to approach making a "thread-safe" iterator. As is typical with software systems, there are multiple possibilities, and they offer different tradeoffs in terms of performance (liveness) and consistency. Off the top of my head I see three possibilities.
1. Lockout + Fail-fast
This is what's suggested by the API docs. If you lock the synchronized wrapper object while iterating it (and the rest of the code in the system is written correctly, so that mutation method calls also all go through the synchronized wrapper object), the iteration is guaranteed to see a consistent view of the contents of the collection. Each element will be traversed exactly once. The downside, of course, is that other threads are prevented from modifying or even reading the collection while it's being iterated.
A variation of this would use a reader-writer lock to allow reads but not writes during iteration. However, the iteration itself can mutate the collection, so this would spoil consistency for readers. You'd have to write your own wrapper to do this.
The fail-fast comes into play if the lock isn't taken around the iteration and somebody else modifies the collection, or if the lock is taken and somebody violates the locking policy. In this case if the iteration detects that the collection has been mutated out from under it, it throws ConcurrentModificationException.
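The fail-fast part can be seen even in a single thread. The sketch below (names invented) modifies a plain ArrayList in the middle of a for-each loop; the iterator notices on its next step. Keep in mind that fail-fast detection is documented as best-effort, so it is a debugging aid, not something to build program logic on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class.
final class FailFastSketch {
    // Structurally modifies the list during iteration and reports whether the iterator detected it.
    static boolean modificationDetected() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer n : list) {
                if (n == 1) {
                    list.add(99);   // change made behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;            // thrown by the iterator's next() after the add
        }
    }
}
```

In a multi-threaded program the same exception appears when another thread performs the add while you iterate without holding the lock.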
2. Copy-on-write
This is the strategy employed by CopyOnWriteArrayList among others. An iterator on such a collection does not require locking, it will always show consistent results during iteration, and it will never throw ConcurrentModificationException. However, writes will always copy the entire array, which can be expensive. Perhaps more importantly, the notion of consistency is altered. The contents of the collection might have changed while you were iterating it -- more precisely, while you were iterating a snapshot of its state some time in the past -- so any decisions you might make now are potentially out of date.
3. Weakly Consistent
This strategy is employed by ConcurrentLinkedDeque and similar collections. The specification contains the definition of weakly consistent. This approach also doesn't require any locking, and iteration will never throw ConcurrentModificationException. But the consistency properties are extremely weak. For example, you might attempt to copy the contents of a ConcurrentLinkedDeque by iterating over it and adding each element encountered to a newly created List. But other threads might be modifying the deque while you're iterating it. In particular, if a thread removes an element "behind" where you've already iterated, and then adds an element "ahead" of where you're iterating, the iteration will probably observe both the removed element and the added element. The copy will thus have a "snapshot" that never actually existed at any point in time. Ya gotta admit that's a pretty weak notion of consistency.
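The scenario can be imitated in one thread to keep it repeatable. In this sketch (names invented) the loop itself removes an element behind the iterator and appends one ahead of it, standing in for what another thread might do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical demo class.
final class WeaklyConsistentSketch {
    // Copies a deque by iteration while the deque is changed mid-way; returns the copy.
    static List<String> copyWhileMutating() {
        ConcurrentLinkedDeque<String> deque =
                new ConcurrentLinkedDeque<>(Arrays.asList("a", "b", "c"));
        List<String> copy = new ArrayList<>();
        boolean mutated = false;
        for (String s : deque) {
            copy.add(s);
            if (!mutated && s.equals("b")) {
                deque.removeFirst();   // removes "a", which the iterator has already passed
                deque.addLast("d");    // adds "d" ahead of the iterator
                mutated = true;
            }
        }
        return copy;                   // no ConcurrentModificationException is thrown
    }
}
```

The copy always contains "a", "b" and "c". Whether it also contains "d" is not guaranteed by the specification, but if it does, the copy holds "a" and "d" together even though the deque never contained both at the same moment.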
The bottom line is that there's no simple notion of making an iterator thread safe that would "fix this mess". There are several different ways -- possibly more than I've explained here -- and they all involve differing tradeoffs. It's unlikely that any one policy will "do the right thing" in all circumstances for all programs.

Is collection synchronizing (via Collections.synchronizedX) necessary when access methods are synchronized?

Many threads about synchronization in Java recommend calling Collections.synchronized{Collection, List, Map, Set, SortedMap, SortedSet} instead of using a plain Collection, List, etc. to get thread-safe access in multithreaded code.
Let's imagine a situation in which several threads exist and all of them access a collection only via methods that have a synchronized block in their bodies.
Is it then necessary to use:
Collection collection = Collections.synchronizedCollection(new ArrayList<String>());
or is
Collection collection = new ArrayList<String>();
enough?
Could you show me an example where the second approach, used instead of the first, causes evidently incorrect behaviour?

To the contrary, Collections.synchronizedCollection() is generally not sufficient because many operations (like iterating, check then add, etc.) need additional, explicit synchronization.
If every access to the collection is already done through properly synchronized methods, then wrapping the collection again in a synchronized proxy is useless.

No, if your access methods are synchronized there is no need to also use a synchronized collection.
Collection collection = new ArrayList<String>();
will do just fine in that scenario.

If you have already arranged for proper synchronization of your code, you definitely do not need another layer of synchronization on the lower level of granularity.
Just make sure when you say
all of them need to access collection via methods that have synchronized block in their bodies.
that all these blocks use the same lock. It is not enough to just involve some synchronized block.
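Here is a sketch of what "the same lock" can look like in practice: a small class (the name GuardedList and its methods are invented for this example) that keeps a plain ArrayList private and routes every access through one lock object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class, not part of the JDK.
final class GuardedList<T> {
    private final Object lock = new Object();          // the one lock every method uses
    private final List<T> items = new ArrayList<>();   // plain, unsynchronized list

    void add(T item) {
        synchronized (lock) {
            items.add(item);
        }
    }

    int size() {
        synchronized (lock) {
            return items.size();
        }
    }

    // Hands out a copy so callers can iterate without holding the lock.
    List<T> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(items);
        }
    }

    // Several threads add concurrently; returns the final size.
    static int concurrentAdds(int threads, int perThread) {
        GuardedList<Integer> list = new GuardedList<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return list.size();
    }
}
```

Because the list never escapes the class, no caller can reach it without going through the lock. If one method synchronized on a different object, the guarantee would be gone.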

Thread-safe iteration over a collection

We all know when using Collections.synchronizedXXX (e.g. synchronizedSet()) we get a synchronized "view" of the underlying collection.
However, the documentation of these wrapper factory methods states that we have to explicitly synchronize on the collection when iterating over it with an iterator.
Which option do you choose to solve this problem?
I can only see the following approaches:
1. Do it as the documentation states: synchronize on the collection
2. Clone the collection before calling iterator()
3. Use a collection whose iterator is thread-safe (I am only aware of CopyOnWriteArrayList/Set)
And as a bonus question: when using a synchronized view - is the use of foreach/Iterable thread-safe?

You've already answered your bonus question really: no, using an enhanced for loop isn't safe - because it uses an iterator.
As for which is the most appropriate approach, it really depends on your context:
Are writes very infrequent? If so, CopyOnWriteArrayList may be most appropriate.
Is the collection reasonably small, and the iteration quick? (i.e. you're not doing much work in the loop) If so, synchronizing may well be fine - especially if this doesn't happen too often (i.e. you won't have much contention for the collection).
If you're doing a lot of work and don't want to block other threads working at the same time, the hit of cloning the collection may well be acceptable.

Depends on your access model. If you have low concurrency and frequent writes, 1 will have the best performance. If you have high concurrency and infrequent writes, 3 will have the best performance. Option 2 is going to perform badly in almost all cases.
foreach calls iterator(), so exactly the same things apply.

You could use one of the newer collections added in Java 5.0 which support concurrent access while iterating. Another approach is to take a copy using toArray which is thread safe (during the copy).
Collection<String> words = ...
// enhanced for loop over an array.
for(String word: words.toArray(new String[0])) {
}

I might be totally off with your requirements, but if you are not aware of them, check out google-collections with "Favor immutability" in mind.

I suggest dropping Collections.synchronizedXXX and handling all locking uniformly in the client code. The basic collections don't support the sort of compound operations useful in threaded code, and even if you use java.util.concurrent.* the code is more difficult. I suggest keeping as much code as possible thread-agnostic. Keep difficult and error-prone thread-safe (if we are very lucky) code to a minimum.

All three of your options will work. Choosing the right one for your situation will depend on what your situation is.
CopyOnWriteArrayList will work if you want a list implementation and you don't mind the underlying storage being copied every time you write. This performs well as long as the collection is not very big.
ConcurrentHashMap or a "ConcurrentHashSet" (created with Collections.newSetFromMap) will work if you need a Map or Set interface; obviously you don't get random access this way. One great thing about these two is that they work well with large data sets: a mutation touches only a small part of the underlying storage instead of copying all of it.
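For the set case, a minimal sketch of the Collections.newSetFromMap approach might look like this (class and method names are invented). It removes elements while iterating, which would throw on a plain HashSet but is allowed here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class.
final class ConcurrentSetSketch {
    // Builds a concurrent set, then removes every element except "a" during iteration.
    static Set<String> removeWhileIterating() {
        Set<String> set = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        set.addAll(Arrays.asList("a", "b", "c"));
        for (String s : set) {
            if (!s.equals("a")) {
                set.remove(s);   // no ConcurrentModificationException with this iterator
            }
        }
        return set;
    }
}
```

The iterator is weakly consistent, so with other threads writing at the same time it may or may not reflect their changes. On Java 8 and later, ConcurrentHashMap.newKeySet() is a shorter way to get the same kind of set.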

It depends on the result you need. Cloning or copying (toArray(), new ArrayList(..) and the like) obtains a snapshot and does not keep the collection locked afterwards.
Using synchronized(collection) and iterating inside the block ensures that no modification happens until the iteration ends, i.e. it effectively locks the collection.
Side note: toArray() is usually preferred, with some exceptions where it needs to create a temporary ArrayList internally. Also note that anything other than toArray() should be wrapped in synchronized(collection) as well when you use Collections.synchronizedXXX.

This question is rather old (sorry, I am a bit late) but I still want to add my answer.
I would choose your second choice (i.e. clone the collection before calling iterator()) but with a major twist.
Assuming you want to iterate using an iterator, you do not have to copy the Collection before calling .iterator(), which would sort of negate (I am using the term "negate" loosely) the idea of the iterator pattern. Instead you could write a "ThreadSafeIterator".
It would work on the same premise, copying the Collection, but without letting the iterating class know that you did just that. Such an Iterator might look like this:
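import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;

class ThreadSafeIterator<T> implements Iterator<T> {
    private final Queue<T> clients;       // private copy that is actually iterated
    private T currentElement;             // last element returned by next()
    private final Collection<T> source;   // original collection, only touched by remove()

    ThreadSafeIterator(final Collection<T> collection) {
        clients = new LinkedList<>(collection);
        this.source = collection;
    }

    @Override
    public boolean hasNext() {
        // isEmpty() instead of peek() != null, so null elements do not end the iteration early
        return !clients.isEmpty();
    }

    @Override
    public T next() {
        currentElement = clients.poll();
        return currentElement;
    }

    @Override
    public void remove() {
        synchronized (source) {
            source.remove(currentElement);
        }
    }
}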
Taking this a step further, you might use the Semaphore class to ensure thread safety or something. But take the remove method with a grain of salt.
The point is, by using such an Iterator, no one, neither the iterating nor the iterated class (is that a real word?), has to worry about thread safety.

How to clone a synchronized Collection?

Imagine a synchronized Collection:
Set s = Collections.synchronizedSet(new HashSet());
What's the best approach to clone this Collection?
It's preferred that the cloning doesn't need any synchronization on the original Collection, but it is required that iterating over the cloned Collection does not need any synchronization on the original Collection.

Use a copy constructor inside a synchronized block:
Set newSet;
synchronized (s) {
    newSet = new HashSet(s); // preferably use generics
}
If you need the copy to be synchronized as well, then use Collections.synchronizedSet(..) again.
As per Peter's comment - you'll need to do this in a synchronized block on the original set. The documentation of synchronizedSet is explicit about this:
It is imperative that the user manually synchronize on the returned set when iterating over it
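A generic version of that copy could be sketched as follows. The helper name is made up, and the code assumes the argument is a wrapper returned by Collections.synchronizedSet, so that locking it really excludes writers.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class.
final class SynchronizedCopySketch {
    // Copies a synchronized set; the copy constructor iterates the source, hence the lock.
    static <T> Set<T> copyOf(Set<T> source) {
        synchronized (source) {
            return new HashSet<>(source);
        }
    }
}
```

The returned HashSet is independent of the original, so iterating over it needs no synchronization on the original set.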

When using synchronized sets, understand that you incur synchronization overhead on every access to the set. Collections.synchronizedSet() merely wraps your set in a shell that forces every method to be synchronized, which is probably not what you really intended. A ConcurrentSkipListSet will give you better performance in a multithreaded environment where multiple threads write to the set.
A ConcurrentSkipListSet allows you to do the following:
Set newSet = s.clone(); // s must be declared as a ConcurrentSkipListSet; preferably use generics
It's not uncommon to use a clone of a set for snapshot processing. If that's what you are after, you might add a little code to handle the case where an item has already been processed. The overhead of an occasional object appearing in more than one copy is usually less than the constant overhead of using Collections.synchronizedSet().
EDIT: I just noticed that ConcurrentSkipListSet is Cloneable and provides a threadsafe clone() method. I changed my answer because I really believe this is the best option, instead of losing scalability and performance to Collections.synchronizedSet().
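With generics, the clone call could look like the sketch below (class and method names invented). Two things to keep in mind: the elements must be Comparable or the set needs a Comparator, and clone() is a shallow copy. I would also treat the result as a weakly consistent snapshot when other threads are writing at the same moment; that is my assumption, so check the class documentation if you need a stronger guarantee.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical demo class.
final class SkipListCloneSketch {
    // Returns a new set with the same elements; the element objects themselves are shared.
    static ConcurrentSkipListSet<String> snapshot(ConcurrentSkipListSet<String> live) {
        return live.clone();
    }
}
```

Later changes to the live set do not show up in the snapshot, so it can be iterated without any locking.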

You can avoid synchronizing the set by doing the following which avoids exposing an Iterator on the original set.
Set newSet = new HashSet(Arrays.asList(s.toArray()));
EDIT From Collections.SynchronizedCollection
public Object[] toArray() {
synchronized(mutex) {return c.toArray();}
}
As you can see, the lock is held for the entire time the operation is performed. As such a safe copy of the data is taken. It doesn't matter if an Iterator is used internally. The array returned can be used in a thread safe manner as only the local thread has a reference to it.
NOTE: If you want to avoid these issues I suggest you use a Set from the concurrency library added in Java 5.0 in 2004. I also suggest you use generics as this can make your collections more type safe.

Does a synchronized block prevent other thread access to object?

If I do something to a list inside a synchronized block, does it prevent other threads from accessing that list elsewhere?
List<String> myList = new ArrayList<String>();
synchronized {
    myList.add("Hello");
}
Does this prevent other threads from iterating over myList and removing/adding values?
I'm looking to add/remove values from a list, but at the same time protect it from other threads/methods from iterating over it (as the values in the list might be invalidated)

No, it does not.
The synchronized block only prevents other threads from entering the block (more accurately, it prevents other threads from entering all blocks synchronized on the same object instance - in this case blocks synchronized on this).
You need to use the instance you want to protect in the synchronized block:
synchronized (myList) {
    myList.add("Hello");
}
The whole area is quite well explained in the Java tutorial:
http://download.oracle.com/javase/tutorial/essential/concurrency/syncmeth.html

Yes, but only if all other accesses to myList are protected by synchronized blocks on the same object. The code sample you posted is missing an object on which you synchronize (i.e., the object whose mutex lock you acquire). If you synchronize on different objects or fail to synchronize at all in one instance, then other threads may very well access the list concurrently. Therefore, you must ensure that all threads have to enter a synchronized block on the same object (e.g., using synchronized (myList) { ... } consistently) before accessing the list. In fact, there is already a factory method that will wrap each method of your list with synchronized methods for you: Collections.synchronizedList.
Wrapping your list with Collections.synchronizedList makes each of its methods individually synchronized, but that doesn't necessarily mean that your application's invariants are maintained. Individually marking each method of the list as synchronized will ensure that the list's internal state remains consistent, but your application may wish for more, in which case you will need to write some more complex synchronization logic or see if you can take advantage of the Concurrency API (highly recommended).

Here the synchronized block makes sure that only one thread at a time is adding Hello to myList.
To be more specific about the object you synchronize on, you can use
synchronized( myList ) //object name
{
//other code
}

From my limited understanding of concurrency control in Java I would say that it is unlikely that the code above would present the behaviour you are looking for.
The synchronised block would use the lock of whatever object you are calling said code in, which would in no way stop any other code from accessing that list unless said other code was also synchronised using the same lock object.
I have no idea if this would work, or if it's in any way advised, but I think that:
List myList = new ArrayList();
synchronized (myList) {
    myList.add("Hello");
}
would give the behaviour you describe, by synchronizing on the lock object of the list itself.
However, the Java documentation recommends this way to get a synchronized list:
List list = Collections.synchronizedList(new ArrayList(...));
See: http://download.oracle.com/javase/1.4.2/docs/api/java/util/ArrayList.html
