In a multi-threaded application I'm working on, we occasionally see ConcurrentModificationExceptions on our Lists (which are mostly ArrayList, sometimes Vectors). But there are other times when I think concurrent modifications are happening because iterating through the collection appears to be missing items, but no exceptions are thrown. I know that the docs for ConcurrentModificationException says you can't rely on it, but how would I go about making sure I'm not concurrently modifying a List? And is wrapping every access to the collection in a synchronized block the only way to prevent it?
Update: Yes, I know about Collections.synchronizedCollection, but it doesn't guard against somebody modifying the collection while you're iterating through it. I think at least some of my problem is happening when somebody adds something to a collection while I'm iterating through it.
Second Update If somebody wants to combine the mention of the synchronizedCollection and cloning like Jason did with a mention of the java.util.concurrent and the apache collections frameworks like jacekfoo and Javamann did, I can accept an answer.
Depending on your update frequency one of my favorites is the CopyOnWriteArrayList or CopyOnWriteArraySet. They create a new list/set on updates to avoid concurrent modification exception.
Your original question seems to be asking for an iterator that sees live updates to the underlying collection while remaining thread-safe. This is an incredibly expensive problem to solve in the general case, which is why none of the standard collection classes do it.
There are lots of ways of achieving partial solutions to the problem, and in your application, one of those may be sufficient.
Jason gives a specific way to achieve thread safety, and to avoid throwing a ConcurrentModificationException, but only at the expense of liveness.
Javamann mentions two specific classes in the java.util.concurrent package that solve the same problem in a lock-free way, where scalability is critical. These only shipped with Java 5, but there have been various projects that backport the functionality of the package into earlier Java versions, including this one, though they won't have such good performance in earlier JREs.
If you are already using some of the Apache Commons libraries, then as jacekfoo points out, the apache collections framework contains some helpful classes.
You might also consider looking at the Google collections framework.
Check out java.util.concurrent for versions of the standard Collections classes that are engineered to handle concurrency better.
Yes you have to synchronize access to collections objects.
Alternatively, you can use the synchronized wrappers around any existing object. See Collections.synchronizedCollection(). For example:
List<String> safeList = Collections.synchronizedList( originalList );
However all code needs to use the safe version, and even so iterating while another thread modifies will result in problems.
To solve the iteration problem, copy the list first. Example:
for ( String el : safeList.clone() )
{ ... }
For more optimized, thread-safe collections, also look at java.util.concurrent.
Usually you get a ConcurrentModificationException if you're trying to remove an element from a list whilst it's being iterated through.
The easiest way to test this is:
List<Blah> list = new ArrayList<Blah>();
for (Blah blah : list) {
list.remove(blah); // will throw the exception
}
I'm not sure how you'd get around it. You may have to implement your own thread-safe list, or you could create copies of the original list for writing and have a synchronized class that writes to the list.
You could try using defensive copying so that modifications to one List don't affect others.
Wrapping accesses to the collection in a synchronized block is the correct way to do this. Standard programming practice dictates the use of some sort of locking mechanism (semaphore, mutex, etc) when dealing with state that is shared across multiple threads.
Depending on your use case however you can usually make some optimizations to only lock in certain cases. For example, if you have a collection that is frequently read but rarely written, then you can allow concurrent reads but enforce a lock whenever a write is in progress. Concurrent reads only cause conflicts if the collection is in the process of being modified.
ConcurrentModificationException is best-effort because what you're asking is a hard problem. There's no good way to do this reliably without sacrificing performance besides proving that your access patterns do not concurrently modify the list.
Synchronization would likely prevent concurrent modifications, and it may be what you resort to in the end, but it can end up being costly. The best thing to do is probably to sit down and think for a while about your algorithm. If you can't come up with a lock-free solution, then resort to synchronization.
See the implementation. It basically stores an int:
transient volatile int modCount;
and that is incremented when there is a 'structural modification' (like remove). If iterator detects that modCount changed it throws Concurrent modification exception.
Synchronizing (via Collections.synchronizedXXX) won't do good since it does not guarantee iterator safety it only synchronizes writes and reads via put, get, set ...
See java.util.concurennt and apache collections framework (it has some classes that are optimized do work correctly in concurrent environment when there is more reads (that are unsynchronized) than writes - see FastHashMap.
You can also synchronize over iteratins over the list.
List<String> safeList = Collections.synchronizedList( originalList );
public void doSomething() {
synchronized(safeList){
for(String s : safeList){
System.out.println(s);
}
}
}
This will lock the list on synchronization and block all threads that try to access the list while you edit it or iterate over it. The downside is that you create a bottleneck.
This saves some memory over the .clone() method and might be faster depending on what you're doing in the iteration...
Collections.synchronizedList() will render a list nominally thread-safe and java.util.concurrent has more powerful features.
This will get rid of your concurrent modification exception. I won't speak to the efficiency however ;)
List<Blah> list = fillMyList();
List<Blah> temp = new ArrayList<Blah>();
for (Blah blah : list) {
//list.remove(blah); would throw the exception
temp.add(blah);
}
list.removeAll(temp);
Related
I have a Java class that contains an ArrayList of transaction info objects that get queried and modified by different threads on a frequent basis. At a basic level, the structure of the class looks something like this (currently no synchronization is present):
class Statistics
{
private List<TranInfo> tranInfoList = new ArrayList<TranInfo>();
// This method runs frequently - every time a transaction comes in.
void add(TranInfo tranInfo)
{
tranInfoList.add(tranInfo);
}
// This method acts like a cleaner and runs occasionally.
void removeBasedOnSomeCondition()
{
// Some code to determine which items to remove
tranInfoList.removeAll(listOfUnwantedTranInfos);
}
// Methods to query stats on the tran info.
// These methods are called frequently.
Stats getStatsBasedOnSomeCondition()
{
// Iterate over the list of tran info
// objects and return some stats
}
Stats getStatsBasedOnSomeOtherCondition()
{
// Iterate over the list of tran info
// objects and return some stats
}
}
I need to ensure that read/write operations on the list are synchronized correctly, however, performance is very important, so I don't want to end up locking in every method call (especially for concurrent read operations). I've looked at the following solutions:
CopyOnWriteArrayList
I've looked at the use of a CopyOnWriteArrayList to prevent ConcurrentModificationExceptions being thrown when the list is modified while iterating over it; the problem here is the copy required each time the list is modified... it seems too expensive given how often the list will be modified and the potential size of the list.
ReadWriteLock
A ReadWriteLock could be used to synchronize read/write operations while allowing concurrent read operations to take place. While this approach will work, it ends up resulting in a lot of synchronization code in the class (this isn't the end of the world though).
Are there any other clever ways of achieving this kind of synchronization without a big performance penalty, or are one of the above methods the recommended way? Any advice on this would be greatly appreciated.
I'd use Collections.synchronizedList() until you know for sure that it is indeed the crucial performance bottle neck of your application (needless to say I doubt it is ;-)). You can only know for sure through thorough testing. I assume you know about "premature optimization"...
If then you strive to optimize access to that list I'd say ReadWriteLock is a good approach.
Another solution that may make sense (especially under heavy read/write) is ConcurrentLinkedQueue (http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html). It is a pretty scalable implementation under contention, based on CAS operations.
The one change that's required to your code is that ConcurrentLinkedQueue does not implement the List interface, and you need to abide by either the Iterable or the Queue type. The only operation you lose really is random access via index, but I don't see that being an issue in your access pattern.
Whenever we use java.util Collection classes, we have that if one thread changes a collection while another thread is traversing through it using an iterator, then any call to iterator.hasNext() or iterator.next() will throw ConcurrentModificationException. Even the synchronized collection wrapper classes SynchronizedMap and SynchronizedList are only conditionally thread-safe, which means all individual operations are thread-safe but compound operations where flow of control depends on the results of previous operations may be subject to threading issues. The question is: How to avoid this problem without affecting the performance. Note: I am aware of CopyOnWriteArrayList.
You can use CopyOnWriteArrayList or ConcurrentHashMap etc. as you mentioned above or you can use Atomic* classes which are working with CAS.
If you weren't aware of Atomic* classes they definitely worth a look! You may check out this question.
So to answer your question you have to choose the right tools for the task. Since you do not share the context with us I can just guess. In some situations CAS will perform better in others the concurrent Collections will.
If something isn't clear you can always check out the official Oracle Trails: Lesson: Concurrency
I think you raised an interesting question.
I tried thinking whether ConcurrentHashMap for example, as was suggested by others can help, but I'm not sure as the lock is segment-based.
What I would do in this case , and I do hope I understood your question well, is to lock access to your collection, using a ReaderWriterLock.
The reason I chose this lock is because I do feel this needs locking (as you explained - iteration is composed from several operations) ,
And because in case of reader threads, I do not want them to wait on lock , if no writer thread is working on the collection.
Thanks to #Adam Arold I paid attention that you suggested the "synchronized decorator" - but I feel this decorator is "too strong" for your needs, as it uses a synchronized and will not diffrentiate between cases of N readers and M writers.
This is because the "standard" Java collections are not thread safe as they are not synchronized. When working with multiple threads accessing your collections, you should look at the java.util.concurrent packages.
Without this package, before Java 5, one had to perform a manual synchronization :
synchronized(list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
or using
Collections.synchronizedList(arrayList);
but neither could really offer a complete thread safety feature.
With this package, all access to the collection is made atomically and some classes provide a snapshot of the state of the list when the iterator was constructed (see CopyOnWriteArrayList. The CopyOnWriteArrayList is fast on read, but if you are performing many writes, this might affect performance.
Thus, if CopyOnWriteArrayList is not desired, take a look at ConcurrentLinkedQueue which offers a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator. This one is efficient in all point, unless you have to access elements at specific indexes more often than traversing the entire collection.
Another option would be ConcurrentSkipListSet which provides expected average log(n) time cost for the contains, add, and remove operations and their variants. Insertion, removal, and access operations safely execute concurrently by multiple threads and iterators are weakly consistent as well.
Which concurrent (thread-safe) collections depend of what type of operations you perform the most with. And since they are all part of the Java Collection framework, you can swap them to which ever you need.
If you have an unencapsulated Collection object of a non thread-safe class, it is impossible to prevent misuse of the Collection, and thus the possibility of a ConcurrentModificationException.
Other answers have suggested use of a thread-safe Collection class, such as those provided by java.util.concurrent. You should however consider encapsulating the Collection object: have the object be private, and have your class provide higher level abstractions (such as addPerson and removePerson) that manipulate the Collection on behalf of the callers, and do not have any getter methods that return references to the Collection. It is then fairly easy to enforce invariants on the encapsulated data (such as "every person has a non empty name") and provide thread-safety using synchronized.
I know you have to synchronize around anything that would change the structure of a hashmap (put or remove) but it seems to me you also have to synchronize around reads of the hashmap otherwise you might be reading while another thread is changing the structure of the hashmap.
So I sync around gets and puts to my hashmap.
The only machines I have available to me to test with all only have one processor so I never had any real concurrency until the system went to production and started failing. Items were missing out of my hashmap. I assume this is because two threads were writing at the same time, but based on the code below, this should not be possible. When I turned down the number of threads to 1 it started working flawlessly, so it's definitely a threading problem.
Details:
// something for all the threads to sync on
private static Object EMREPORTONE = new Object();
synchronized (EMREPORTONE)
{
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
etc...
}
... and elsewhere....
synchronized (EMREPORTONE)
{
eri.name = (String)reportdatacache.get("name.." + eri.recip_map_id);
eri.subject = (String)reportdatacache.get("subjec" + eri.recip_map_id);
etc...
}
and that's it. I pass around reportdatacache between functions, but that's just the reference to the hashmap.
Another important point is that this is running as a servlet in an appserver (iplanet to be specific, but I know none of you have ever heard of that)
But regardless, EMREPORTONE is global to the webserver process, no two threads should be able to step on each other, yet my hashmap is getting wrecked. Any thoughts?
In servlet container environment static variables depend on classloader. So you may think that you're dealing with same static instance, but in fact it could be completely different one.
Additionally, check if you do not use the map by escaped reference elsewhere and write/remove keys from it.
And yes, use ConcurrentHashMap instead.
Yes, synchronization is not only important when writing, but also when reading. While a write will be performed under mutually exclusion, a reader might access an errenous state of the map.
I cannot recommend you under any circumstances to synchronize the Java Collections manually, there are thread-safe counterparts: Collections.synchronizedMap and ConcurrentHashMap. Use them, they will ensure, that access to them is safe in a multithreaded environment.
Futher hints, it seems that everyone is accesing the datareportcache. Is there only one instance of that object? Why not synchronize then on the cache itself? But forget then when trying to solve your problems, use the sugar from java.util.concurrent.
As I see it there are 3 possibilities here:
You are locking on two different objects. EMREPORTONE is private static however and the code that accesses the reportdatacache is in one file only. Ok, that isn't it then. But I would recommend locking on reportdatacache instead of EMREPORTONE however. Cleaner code.
You are missing some read or write to reportdatacache somewhere. There are other accesses to the map that are not synchronized. Are things never removed from the cache?
This isn't a synchronization problem but rather a race condition issue. The data in the hashmap is fine but you are expecting things to be in the cache but they haven't be stored by the other thread yet. Maybe 2 requests come in for the same eri at the same time and they are both putting values into the cache? Maybe check to see if the old value returned by put(...) is always null? Maybe explaining more about how you know that items are missing from the map would help with this.
As an aside, you are doing this:
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
But it seems like you really should be storing the eri by its id.
reportdatacache.put(recip_map_id, eri);
Then you aren't creating fake keys with the "name.." prefix. Or maybe you should create a NameSubject private static class to store the name and subject in the cache. Cleaner.
Hope something here helps.
We all know when using Collections.synchronizedXXX (e.g. synchronizedSet()) we get a synchronized "view" of the underlying collection.
However, the document of these wrapper generation methods states that we have to explicitly synchronize on the collection when iterating of the collections using an iterator.
Which option do you choose to solve this problem?
I can only see the following approaches:
Do it as the documentation states: synchronize on the collection
Clone the collection before calling iterator()
Use a collection which iterator is thread-safe (I am only aware of CopyOnWriteArrayList/Set)
And as a bonus question: when using a synchronized view - is the use of foreach/Iterable thread-safe?
You've already answered your bonus question really: no, using an enhanced for loop isn't safe - because it uses an iterator.
As for which is the most appropriate approach - it really depends on how your context:
Are writes very infrequent? If so, CopyOnWriteArrayList may be most appropriate.
Is the collection reasonably small, and the iteration quick? (i.e. you're not doing much work in the loop) If so, synchronizing may well be fine - especially if this doesn't happen too often (i.e. you won't have much contention for the collection).
If you're doing a lot of work and don't want to block other threads working at the same time, the hit of cloning the collection may well be acceptable.
Depends on your access model. If you have low concurrency and frequent writes, 1 will have the best performance. If you have high concurrency with and infrequent writes, 3 will have the best performance. Option 2 is going to perform badly in almost all cases.
foreach calls iterator(), so exactly the same things apply.
You could use one of the newer collections added in Java 5.0 which support concurrent access while iterating. Another approach is to take a copy using toArray which is thread safe (during the copy).
Collection<String> words = ...
// enhanced for loop over an array.
for(String word: words.toArray(new String[0])) {
}
I might be totally off with your requirements, but if you are not aware of them, check out google-collections with "Favor immutability" in mind.
I suggest dropping Collections.synchronizedXXX and handle all locking uniformly in the client code. The basic collections don't support the sort of compound operations useful in threaded code, and even if you use java.util.concurrent.* the code is more difficult. I suggest keeping as much code as possible thread-agnostic. Keep difficult and error-prone thread-safe (if we are very lucky) code to a minimum.
All three of your options will work. Choosing the right one for your situation will depend on what your situation is.
CopyOnWriteArrayList will work if you want a list implementation and you don't mind the underlying storage being copied every time you write. This is pretty good for performance as long as you don't have very big collections.
ConcurrentHashMap or "ConcurrentHashSet" (using Collections.newSetFromMap) will work if you need a Map or Set interface, obviously you don't get random access this way. One great! thing about these two is that they will work well with large data sets - when mutated they just copy little bits of the underlying data storage.
It does depend on the result one needs to achieve cloning/copying/toArray(), new ArrayList(..) and the likes obtain a snapshot and does not lock the the collection.
Using synchronized(collection) and iteration through ensure by the end of the iteration would be no modification, i.e. effectively locking it.
side note:(toArray() is usually preferred with some exceptions when internally it needs to create a temporary ArrayList). Also please note, anything but toArray() should be wrapped in synchronized(collection) as well, provided using Collections.synchronizedXXX.
This Question is rather old (sorry, i am a bit late..) but i still want to add my Answer.
I would choose your second choice (i.e. Clone the collection before calling iterator()) but with a major twist.
Asuming, you want to iterate using iterator, you do not have to coppy the Collection before calling .iterator() and sort of negating (i am using the term "negating" loosely) the idea of the iterator pattern, but you could write a "ThreadSafeIterator".
It would work on the same premise, coppying the Collection, but without letting the iterating class know, that you did just that. Such an Iterator might look like this:
class ThreadSafeIterator<T> implements Iterator<T> {
private final Queue<T> clients;
private T currentElement;
private final Collection<T> source;
AsynchronousIterator(final Collection<T> collection) {
clients = new LinkedList<>(collection);
this.source = collection;
}
#Override
public boolean hasNext() {
return clients.peek() != null;
}
#Override
public T next() {
currentElement = clients.poll();
return currentElement;
}
#Override
public void remove() {
synchronized(source) {
source.remove(currentElement);
}
}
}
Taking this a Step furhter, you might use the Semaphore Class to ensure thread-safety or something. But take the remove method with a grain of salt.
The point is, by using such an Iterator, no one, neither the iterating nor the iterated Class (is that a real word) has to worrie about Thread safety.
Imagine a synchronized Collection:
Set s = Collections.synchronizedSet(new HashSet())
What's the best approach to clone this Collection?
It's prefered that the cloning doesn't need any synchronization on the original Collection but required that iterating over the cloned Collection does not need any synchronization on the original Collection.
Use a copy-constructor inside a synchronized block:
synchronized (s) {
Set newSet = new HashSet(s); //preferably use generics
}
If you need the copy to be synchronized as well, then use Collections.synchronizedSet(..) again.
As per Peter's comment - you'll need to do this in a synchronized block on the original set. The documentation of synchronizedSet is explicit about this:
It is imperative that the user manually synchronize on the returned set when iterating over it
When using synchronized sets, do understand that you will incur synchronization overhead accessing every element in the set. The Collections.synchronizedSet() merely wraps your set with a shell that forces every method to be synchronized. Probably not what you really intended. A ConcurrentSkipListSet will give you better performance in a multithreaded environment where multiple threads will be writing to the set.
The ConcurrentSkipListSet will allow you to perform the following:
Set newSet = s.clone();//preferably use generics
It's not uncommon to use a clone of a set for snapshot processing. If that's what you are after, you might add a little code to handle the case where the item is already processed. The overhead involved with the occasional object included in more than one copy set is usually less than the consistent overhead of using Collections.concurrentSet().
EDIT: I just noticed that ConcurrentSkipListSet is Cloneable and provides a threadsafe clone() method. I changed my answer because I really believe this is the best option--instead of losing scalability and performance to Collections.concurrentSet().
You can avoid synchronizing the set by doing the following which avoids exposing an Iterator on the original set.
Set newSet = new HashSet(Arrays.asList(s.toArray()));
EDIT From Collections.SynchronizedCollection
public Object[] toArray() {
synchronized(mutex) {return c.toArray();}
}
As you can see, the lock is held for the entire time the operation is performed. As such a safe copy of the data is taken. It doesn't matter if an Iterator is used internally. The array returned can be used in a thread safe manner as only the local thread has a reference to it.
NOTE: If you want to avoid these issues I suggest you use a Set from the concurrency library added in Java 5.0 in 2004. I also suggest you use generics as this can make your collections more type safe.