How to avoid ConcurrentModificationException in multi-threaded code - java

Whenever we use java.util Collection classes, we have that if one thread changes a collection while another thread is traversing through it using an iterator, then any call to iterator.hasNext() or iterator.next() will throw ConcurrentModificationException. Even the synchronized collection wrapper classes SynchronizedMap and SynchronizedList are only conditionally thread-safe, which means all individual operations are thread-safe but compound operations where flow of control depends on the results of previous operations may be subject to threading issues. The question is: How to avoid this problem without affecting the performance. Note: I am aware of CopyOnWriteArrayList.

You can use CopyOnWriteArrayList or ConcurrentHashMap etc. as you mentioned above or you can use Atomic* classes which are working with CAS.
If you weren't aware of Atomic* classes they definitely worth a look! You may check out this question.
So to answer your question you have to choose the right tools for the task. Since you do not share the context with us I can just guess. In some situations CAS will perform better in others the concurrent Collections will.
If something isn't clear you can always check out the official Oracle Trails: Lesson: Concurrency

I think you raised an interesting question.
I tried thinking whether ConcurrentHashMap for example, as was suggested by others can help, but I'm not sure as the lock is segment-based.
What I would do in this case , and I do hope I understood your question well, is to lock access to your collection, using a ReaderWriterLock.
The reason I chose this lock is because I do feel this needs locking (as you explained - iteration is composed from several operations) ,
And because in case of reader threads, I do not want them to wait on lock , if no writer thread is working on the collection.
Thanks to #Adam Arold I paid attention that you suggested the "synchronized decorator" - but I feel this decorator is "too strong" for your needs, as it uses a synchronized and will not diffrentiate between cases of N readers and M writers.

This is because the "standard" Java collections are not thread safe as they are not synchronized. When working with multiple threads accessing your collections, you should look at the java.util.concurrent packages.
Without this package, before Java 5, one had to perform a manual synchronization :
synchronized(list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
or using
Collections.synchronizedList(arrayList);
but neither could really offer a complete thread safety feature.
With this package, all access to the collection is made atomically and some classes provide a snapshot of the state of the list when the iterator was constructed (see CopyOnWriteArrayList. The CopyOnWriteArrayList is fast on read, but if you are performing many writes, this might affect performance.
Thus, if CopyOnWriteArrayList is not desired, take a look at ConcurrentLinkedQueue which offers a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator. This one is efficient in all point, unless you have to access elements at specific indexes more often than traversing the entire collection.
Another option would be ConcurrentSkipListSet which provides expected average log(n) time cost for the contains, add, and remove operations and their variants. Insertion, removal, and access operations safely execute concurrently by multiple threads and iterators are weakly consistent as well.
Which concurrent (thread-safe) collections depend of what type of operations you perform the most with. And since they are all part of the Java Collection framework, you can swap them to which ever you need.

If you have an unencapsulated Collection object of a non thread-safe class, it is impossible to prevent misuse of the Collection, and thus the possibility of a ConcurrentModificationException.
Other answers have suggested use of a thread-safe Collection class, such as those provided by java.util.concurrent. You should however consider encapsulating the Collection object: have the object be private, and have your class provide higher level abstractions (such as addPerson and removePerson) that manipulate the Collection on behalf of the callers, and do not have any getter methods that return references to the Collection. It is then fairly easy to enforce invariants on the encapsulated data (such as "every person has a non empty name") and provide thread-safety using synchronized.

Related

Is collection synchronizing (via Collections.synchronizedX) necessary when access methods are synchronized?

There is a lot of topics when synchronization in Java appears. In many of them is recommended to using invokation of Collections.synchronized{Collecation, List, Map, Set, SortedMap, SortedSet} instead of Collection, List, etc. in case of multithreading work to thread-safe access.
Lets imagine situation when some threads exist and all of them need to access collection via methods that have synchronized block in their bodies.
So then, is it necessary to use:
Collection collection = Collections.synchronizedCollection(new ArrayList<T>());
or only
Collection collection = new ArrayList<String>();
need to?
Maybe you can show me an example when second attempt instead of first will cause evidently incorrect behaviour?
To the contrary, Collections.synchronizedCollection() is generally not sufficient because many operations (like iterating, check then add, etc.) need additional, explicit synchronization.
If every access to the collection is already done through properly synchronized methods, then wrapping the collection again into a syncronized proxy is useless.
No, if your access methods are synchronized there is no need to also use a synchronized collection.
Collection collection = new ArrayList<String>();
will do just fine in that scenario.
If you have already arranged for proper synchronization of your code, you definitely do not need another layer of synchronization on the lower level of granularity.
Just make sure when you say
all of them need to access collection via methods that have synchronized block in their bodies.
that all these blocks use the same lock. It is not enough to just involve some synchronized block.

ConcurrentHashMap operations

Following are some lines from the java docs of ConcurrentHashMap
This class obeys the same functional specification as Hashtable, and
includes versions of methods corresponding to each method of
Hashtable. However, even though all operations are thread-safe,
retrieval operations do not entail locking, and there is not any
support for locking the entire table in a way that prevents all
access.
What is the meaning of the statement
though all operations are thread-safe
from above paragraph?
Can anyone explain with any example of put() or get() methods?
The ConcurrentHashMap allows concurrent modification of the Map from several threads without the need to block them. Collections.synchronizedMap(map) creates a blocking Map which will degrade performance, albeit ensure consistency (if used properly).
Use the second option if you need to ensure data consistency, and each thread needs to have an up-to-date view of the map. Use the first if performance is critical, and each thread only inserts data to the map, with reads happening less frequently.
Your question is odd. If you understand what "thread safety" means then you would be able to understand how it applies to get() and put() on your own. If you don't understand thread safety then there is no point to explain it specifically in relation to get() and put(). Are you sure this isn't a homework question?
However, answering your question anyway, the fact that ConcurrentHashMap is thread safe means that if you have several threads executing put()s on the same map at the same time, then: a) no damage will occur to the internal data structures of the map and: b) some other thread doing a get() will see all of the values put in by the other threads. With a non-thread safe Map such as HashMap neither of those are guaranteed.

Is java.util.Hashtable thread safe?

It's been a while since I've used hashtable for anything significant, but I seem to recall the get() and put() methods being synchronized.
The JavaDocs don't reflect this. They simply say that the class Hashtable is synchronized. What can I assume? If several threads access the hashtable at the same time (assuming they are not modifying the same entry), the operations will succeed, right? I guess what I'm asking is "Is java.util.Hashtable thread safe?"
Please Guide me to get out of this issue...
It is threadsafe because the get, put, contains methods etc are synchronized. Furthermore, Several threads will not be able to access the hashtable at the same time, regardless of which entries they are modifying.
edit - amended to include the provisio that synchronization makes the hashtable internally threadsafe in that it is modified atomically; it doesn't guard against race conditions in outside code caused by concurrent access to the hashtable by multiple threads.
For general usage it is thread safe.
But you have to understand that it doesent make your application logic around it thread-safe. For e.g. consider implementing to put a value in a map, if its not there already.
This idiom is called putIfAbsent. Its difficult to implement this in a thread-safe manner using HashTable alone. Similarly for the idiom replace(k,V,V).
Hence for certain idioms like putIfAbsent and and replace(K,V,V), I would recommend using ConcurrentHashMap
Hashtable is deprecated. Forget it. If you want to use synchronized collections, use Collections.syncrhonize*() wrapper for that purpose. But these ones are not recommended. In Java 5, 6 new concurrent algorithms have been implemented. Copy-on-write, CAS, lock-free algorithms.
For Map interface there are two concurrent implementations. ConcurrentHashMap (concurrent hash map) and ConcurrentSkipListMap - concurrent sorted map implementaion.
The first one is optimized for reading, so retrievals do not block even while the table is being updated. Writes are also work much faster comparing with synchronized wrappers cause a ConcurrentHashMap consists of not one but a set of tables, called segments. It can be managed by the last argument in the constructor:
public ConcurrentHashMap(int initialCapacity,
float loadFactor,
int concurrencyLevel);
ConcurrentHashMap is indispensable in highly concurrent contexts, where it performs far better than any available alternative.
No. It is 'threadsafe' only to the extent that its methods are synchronized. However it is not threadsafe in general, and it can't be, because classes that export internal state such as Iterators or Enumerations require the use of the internal state to be synchronized as well. That's why the new Collections classes are not synchronized, as the Java designers recognized that thread-safety is up to the user of the class, not the class itself.
I'm asking is "Is java.util.Hashtable thread safe?".
Yes Hashtable is thread safe, If a thread safe is not needed in your application then go through HashMap, In case, If a thread-safe implementation is desired,then it is recommended to use ConcurrentHashMap in place of Hashtable.
Note, that a lot of the answers state that Hashtable is synchronised. but this will give you a very little. The synchronization is on the accessor / mutator methods will stop two threads adding or removing from the map concurrently, but in the real world you will often need additional synchronisation.
Even iterating over a Hashtable's entries is not thread safe unless you also guard the Map from being modified through additional synchronization.
If you look into Hashtable code, you will see that methods are synchronized such as:
public synchronized V get(Object key)
public synchronized V put(K key, V value)
public synchronized boolean containsKey(Object key)
You can keep pressing on control key (command for mac) and then click on any method name in the eclipse to go to the java source code.
Unlike the new collection implementations, Hashtable is synchronized. *If a thread-safe implementation is not needed, it is recommended to use HashMap* in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.
http://download.oracle.com/javase/7/docs/api/java/util/Hashtable.html
Yes, Hashtable thread safe, so only one thread can access a hashtable at any time
HashMap, on the other side, is not thread safe (and thus 'faster').

Are concurrent classes provided by the JDK required to use their instance's own intrinstic lock for synchronization?

The JDK provides a set of thread-safe classes like ConcurrentHashMap, ConcurrentLinkedQueue and AtomicInteger.
Are these classes required to synchronize on this to implement their thread-safe behavior?
Provided that they do we can implement our own synchronized operations on these objects and mix them with the built-in ones?
In other words is it safe to do:
ConcurrentMap<Integer, Account> accounts
= new ConcurrentHashMap<Integer, Account>();
// Add an account atomically
synchronized(accounts) {
if (!accounts.containsKey(id)) {
Account account = new Account();
accounts.put(id, account);
}
}
And in another thread
// Access the object expecting it to synchronize(this){…} internally
accounts.get(id);
Note that the simple synchronized block above could probably be replaced by putIfAbsent() but I can see other cases where synchronizing on the object could be useful.
Are these classes required to
synchronize on this to implement their
thread-safe behavior.
No and, not only that, the various code inspection tools will warn you if you do try to use the object lock.
In the case of the put method above, note the javadoc:
A hash table supporting full
concurrency of retrievals and
adjustable expected concurrency for
updates. This class obeys the same
functional specification as Hashtable,
and includes versions of methods
corresponding to each method of
Hashtable. However, even though all
operations are thread-safe, retrieval
operations do not entail locking, and
there is not any support for locking
the entire table in a way that
prevents all access. This class is
fully interoperable with Hashtable in
programs that rely on its thread
safety but not on its synchronization
details.
This means that the options are thread safe and there isn't a way to do what you're trying to do above (lock the whole table). Furthermore, for the operations that you use (put and get), neither of them will require such locking.
I particularly like this quote from the javadoc from the values() method:
The view's iterator is a "weakly
consistent" iterator that will never
throw ConcurrentModificationException,
and guarantees to traverse elements as
they existed upon construction of the
iterator, and may (but is not
guaranteed to) reflect any
modifications subsequent to
construction.
So, if you use this method, you'll get a reasonable list: it will have the data as of the request time and might or might not have any later updates. The assurance that you won't have to worry about the ConcurrentModificationExceptions is a huge one: you can write simple code without the synchronized block that you show above and know that things will just work.

Detecting concurrent modifications?

In a multi-threaded application I'm working on, we occasionally see ConcurrentModificationExceptions on our Lists (which are mostly ArrayList, sometimes Vectors). But there are other times when I think concurrent modifications are happening because iterating through the collection appears to be missing items, but no exceptions are thrown. I know that the docs for ConcurrentModificationException says you can't rely on it, but how would I go about making sure I'm not concurrently modifying a List? And is wrapping every access to the collection in a synchronized block the only way to prevent it?
Update: Yes, I know about Collections.synchronizedCollection, but it doesn't guard against somebody modifying the collection while you're iterating through it. I think at least some of my problem is happening when somebody adds something to a collection while I'm iterating through it.
Second Update If somebody wants to combine the mention of the synchronizedCollection and cloning like Jason did with a mention of the java.util.concurrent and the apache collections frameworks like jacekfoo and Javamann did, I can accept an answer.
Depending on your update frequency one of my favorites is the CopyOnWriteArrayList or CopyOnWriteArraySet. They create a new list/set on updates to avoid concurrent modification exception.
Your original question seems to be asking for an iterator that sees live updates to the underlying collection while remaining thread-safe. This is an incredibly expensive problem to solve in the general case, which is why none of the standard collection classes do it.
There are lots of ways of achieving partial solutions to the problem, and in your application, one of those may be sufficient.
Jason gives a specific way to achieve thread safety, and to avoid throwing a ConcurrentModificationException, but only at the expense of liveness.
Javamann mentions two specific classes in the java.util.concurrent package that solve the same problem in a lock-free way, where scalability is critical. These only shipped with Java 5, but there have been various projects that backport the functionality of the package into earlier Java versions, including this one, though they won't have such good performance in earlier JREs.
If you are already using some of the Apache Commons libraries, then as jacekfoo points out, the apache collections framework contains some helpful classes.
You might also consider looking at the Google collections framework.
Check out java.util.concurrent for versions of the standard Collections classes that are engineered to handle concurrency better.
Yes you have to synchronize access to collections objects.
Alternatively, you can use the synchronized wrappers around any existing object. See Collections.synchronizedCollection(). For example:
List<String> safeList = Collections.synchronizedList( originalList );
However all code needs to use the safe version, and even so iterating while another thread modifies will result in problems.
To solve the iteration problem, copy the list first. Example:
for ( String el : safeList.clone() )
{ ... }
For more optimized, thread-safe collections, also look at java.util.concurrent.
Usually you get a ConcurrentModificationException if you're trying to remove an element from a list whilst it's being iterated through.
The easiest way to test this is:
List<Blah> list = new ArrayList<Blah>();
for (Blah blah : list) {
list.remove(blah); // will throw the exception
}
I'm not sure how you'd get around it. You may have to implement your own thread-safe list, or you could create copies of the original list for writing and have a synchronized class that writes to the list.
You could try using defensive copying so that modifications to one List don't affect others.
Wrapping accesses to the collection in a synchronized block is the correct way to do this. Standard programming practice dictates the use of some sort of locking mechanism (semaphore, mutex, etc) when dealing with state that is shared across multiple threads.
Depending on your use case however you can usually make some optimizations to only lock in certain cases. For example, if you have a collection that is frequently read but rarely written, then you can allow concurrent reads but enforce a lock whenever a write is in progress. Concurrent reads only cause conflicts if the collection is in the process of being modified.
ConcurrentModificationException is best-effort because what you're asking is a hard problem. There's no good way to do this reliably without sacrificing performance besides proving that your access patterns do not concurrently modify the list.
Synchronization would likely prevent concurrent modifications, and it may be what you resort to in the end, but it can end up being costly. The best thing to do is probably to sit down and think for a while about your algorithm. If you can't come up with a lock-free solution, then resort to synchronization.
See the implementation. It basically stores an int:
transient volatile int modCount;
and that is incremented when there is a 'structural modification' (like remove). If iterator detects that modCount changed it throws Concurrent modification exception.
Synchronizing (via Collections.synchronizedXXX) won't do good since it does not guarantee iterator safety it only synchronizes writes and reads via put, get, set ...
See java.util.concurennt and apache collections framework (it has some classes that are optimized do work correctly in concurrent environment when there is more reads (that are unsynchronized) than writes - see FastHashMap.
You can also synchronize over iteratins over the list.
List<String> safeList = Collections.synchronizedList( originalList );
public void doSomething() {
synchronized(safeList){
for(String s : safeList){
System.out.println(s);
}
}
}
This will lock the list on synchronization and block all threads that try to access the list while you edit it or iterate over it. The downside is that you create a bottleneck.
This saves some memory over the .clone() method and might be faster depending on what you're doing in the iteration...
Collections.synchronizedList() will render a list nominally thread-safe and java.util.concurrent has more powerful features.
This will get rid of your concurrent modification exception. I won't speak to the efficiency however ;)
List<Blah> list = fillMyList();
List<Blah> temp = new ArrayList<Blah>();
for (Blah blah : list) {
//list.remove(blah); would throw the exception
temp.add(blah);
}
list.removeAll(temp);

Categories