Imagine the following scenario:
I have a standard Java ArrayList<String>.
This ArrayList<String> is accessed by multiple threads with no explicit synchronisation, subject to the following constraints:
Multiple reads may occur at the same time (possibly concurrent with the write described below). All reads call the iterator() method on the ArrayList<String> and exclusively use the returned Iterator<E> (iterators are not shared between threads). The only methods called on the Iterator<String> are hasNext() and next() (the remove() method is not called).
One thread may write to the list (possibly concurrent with reads but not concurrent with other writes). Each write only calls the add(String) and remove(Object) methods on the ArrayList<E>.
I know the follow to be true:
The read threads may see outdated data.
The read threads may experience a ConcurrentModificationException.
Apart from the above two problems, can anything else go wrong?
I am looking for specific examples (of the form, if x and y are true, then z will occur, where z is bad). It has to be something that can actually happen in practice. Please provide citations where possible.
(I personally think that other failures are possible but I have been challenged to come up with specific examples of why the above scenario is not suitable for production code.)
I would expect that your readers or your writer could get an ArrayIndexOutOfBounds exception or a null pointer exception, as well, since they can be seeing inconsistent state of the underlying array.
You really are at the mercy of the detailed implementation of the ArrayList class-- which can vary from environment to environment. It is possible that the JVM could implement the class with native code that could cause worse undefined behavior (JVM crash) when reading without synchronization.
I'll add this from the text of the ConcurrentModificationException reference page:
Note that fail-fast behavior cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast operations throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: ConcurrentModificationException should be used only to detect bugs.
In short, you can't depend on nothing bad happening besides stale data and ConcurrentModificationException.
Related
I understand that modifying an ArrayList makes it not thread-safe.
➠ But if the ArrayList is not being modified, perhaps protected by a call to Collections.unmodifiableList, is calling ArrayList::get thread-safe?
For example, can an ArrayList be passed to a Java Stream for parallel-processing of its elements?
But if the ArrayList is not being modified is calling ArrayList::get thread-safe?
No it is not thread-safe.
The problems arise if you do something like the following:
Thread A creates and populates list.
Thread A passes reference to list to thread B (without a happens before relationship)
Thread B calls get on the list.
Unless there is a proper happens before chain between 1 and 3, thread B may see stale values ... occasionally ... on some platforms under certain work loads.
There are ways to address this. For example, if thread A starts thread B after step 1, there will be a happens before. Similarly, there will be happens before if A passes the list reference to B via properly synchronized setter / getter calls or a volatile variable.
But the bottom line is that (just) not changing the list is not sufficient to make it thread-safe.
... perhaps protected by a call to Collections.unmodifiableList
The creation of the Collections.unmodifiableList should provide the happens before relationship ... provided that you access the list via the wrapper not directly via ArrayList::get.
For example, can an ArrayList be passed to a Java Stream for parallel-processing of its elements?
That's a specific situation. The stream mechanisms will provide the happens before relationship. Provided they are used as intended. It is complicated.
This comes from the Spliterator interface javadoc.
"Despite their obvious utility in parallel algorithms, spliterators are not expected to be thread-safe; instead, implementations of parallel algorithms using spliterators should ensure that the spliterator is only used by one thread at a time. This is generally easy to attain via serial thread-confinement, which often is a natural consequence of typical parallel algorithms that work by recursive decomposition. A thread calling trySplit() may hand over the returned Spliterator to another thread, which in turn may traverse or further split that Spliterator. The behaviour of splitting and traversal is undefined if two or more threads operate concurrently on the same spliterator. If the original thread hands a spliterator off to another thread for processing, it is best if that handoff occurs before any elements are consumed with tryAdvance(), as certain guarantees (such as the accuracy of estimateSize() for SIZED spliterators) are only valid before traversal has begun."
In other words, thread safety is a joint responsibility of the Spliterator implementation and Stream implementation.
The simple way to think about this is that "magic happens" ... because if it didn't then parallel streams would be unusable.
But note that the Spliterator is not necessarily using ArrayList::get at all.
Thread safety is only a concern, as you stated when values can change between the threads. If the elements aren't being added or removed, the object remains the same and all threads can easily operate on it. This is the same for most objects in Java.
You may be able to get away with adding to an ArrayList across threads as seen here but I wouldn't bank on it.
No, ArrayList.get() is not inherently thread-safe just because it does not modify the List. You still need something to create a happens-before relationship between each get() and each method invocation that does modify the list.
Suppose, however, that you instantiate and populate the list first, and then perform multiple get()s, never modifying it again, or at least not until after some synchronization point following all the get()s. You do not then need mutual synchronization between the various get()s, and you may be able to obtain cheap synchronization between the get()s and the end of the initialization phase. This is effectively the situation you will have with an otherwise non-shared List that you provide as input to a parallel stream computation.
I have no idea why a ConcurrentModificationException occurs when i iterate over an ArrayList. The ArrayList is methode scoped, so it should not be visible by other threads which execute the same code. At least if i understodd multi threading and variable scopes correctly.
Caused by: java.util.ConcurrentModificationException
at java.util.AbstractList$SimpleListIterator.next(AbstractList.java:64)
at com....StrategyHandler.applyStrategy(StrategyHandler.java:184)
private List<Order> applyStrategy(StorageObjectTree storageObjectTree) {
...
List<OrderHeader> finalList = new ArrayList<Order>();
for (StorageObject storageObject : storageObjectTree.getStorageObjects()) {
List<Order> currentOrders = strategy.process(storageObject);
...
if (currentOrders != null) {
Iterator<Order> iterator = currentOrders.iterator();
while (iterator.hasNext()) {
Order order = (Order) iterator.next(); // line 64
// read some values from order
}
finalList.addAll(currentOrders);
}
}
return finalList;
}
Can anybody give me an hint what could be the source of the problem?
If You have read the Java Doc for ConcurrentModifcationException :
It clearly states the condition in which it occurs:
This exception may be thrown by methods that have detected concurrent
modification of an object when such modification is not permissible.
For example, it is not generally permissible for one thread to modify
a Collection while another thread is iterating over it. In general,
the results of the iteration are undefined under these circumstances.
Some Iterator implementations (including those of all the general
purpose collection implementations provided by the JRE) may choose to
throw this exception if this behavior is detected. Iterators that do
this are known as fail-fast iterators, as they fail quickly and
cleanly, rather that risking arbitrary, non-deterministic behavior at
an undetermined time in the future.
Note that this exception does not always indicate that an object has
been concurrently modified by a different thread. If a single thread
issues a sequence of method invocations that violates the contract of
an object, the object may throw this exception. For example, if a
thread modifies a collection directly while it is iterating over the
collection with a fail-fast iterator, the iterator will throw this
exception.
Note that fail-fast behavior cannot be guaranteed as it is, generally
speaking, impossible to make any hard guarantees in the presence of
unsynchronized concurrent modification. Fail-fast operations throw
ConcurrentModificationException on a best-effort basis. Therefore, it
would be wrong to write a program that depended on this exception for
its correctness: ConcurrentModificationException should be used only
to detect bugs.
In your case as you said, you do not have multiple threads accessing this list. it might still be possible as per second paragraph above if your single thread that is reading from iterator might be trying to write to it as well.
Hope this helps.
This exception occurred when you changing/adding/removing values from your list and in the same time you are iterating it. And if you use many threads at the same time...
Try to surround your if by synchronized(currentOrders) { /* YOUR LAST CODE */ }.
I'm not sure of this but try it.
Depending on the implementation of strategy.process(..) it could be that this implementation has still a reference to the List it passed back as a result. If there are multiple Threads involved in this implementation it might be possible that the List is modified by one of these threads even after it is passed back as a result.
(If you know the "Future" pattern, you could perhaps imagine an implementation where a method immediately returns an empty List and adds the actual results later using another Thread)
You could try to create a new ArrayList "around" the result list and iterate over this copied list.
You might want to read this SO post. Basically switch and use CopyOnWriteArrayList if you can't see where the problem is coming from.
http://docs.oracle.com/javase/7/docs/api/java/util/LinkedList.html
Note that this implementation is not synchronized. If multiple threads access a linked list concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
What can happen if you don't? Can you crash the JVM, cause an exception or just produce inconsistent state?
What if there is one writer but concurrent reads happen unprotected? Can you still crash and mess up the state, or just produce a inconsistent read?
Is this implementation-specific, or does the spec guarantee a certain level of security and/or atomicity?
Using an unsynchronized collection in an multithreaded environment will cause problems like dirty reads (inconsistent data state) and ConcurrentModificationException (mostly when one thread has modified the contents of the collection while another was iterating through it).
Depending on your use case, this may cause your application to crash or deadlock (when one thread is shut down by the JVM due to the mentioned above, uncaught exception). Even worse, it may cause dodgy problems and erroneous results which may be difficult to trace. It will not crash the JVM itself, though.
I'd suggest taking a look at the java.util.concurrent package. You'll find a wide variety of thread-safe, efficient collections. Most of them have weakly consistent iterators, returning elements reflecting the state of the collection at some point at or since the creation of the iterator. This means they do not throw the ConcurrentModificationException, and may proceed concurrently with other operations.
For information regarding Java Memory Model and it's guarantees, please refer to this (well worth reading!).
The nasty thing about thread safety is that errors can occur rarely and be difficult to reproduce. Java largely delegates the handling of Threads to the operating system, so the programmer is not in control of exactly how and when Threads get paused and switched by the OS when a number of tasks are running simultaneously. The frequency of kinds of errors observed can be different depending on whether the CPU is a single core or a dual or quad-core.
It is rare that concurrency errors will "crash the system," the more likely problem is inconsistent state. However, if you are iterating through a collection using an Iterator while another Thread modifies the collection then you will get a ConcurrentModificationException. For example:
Set<String> words; //a field that can be accessed by other threads.
// may throw ConcurrentModificationException
public ArrayList<String> unsafeIteration() {
ArrayList<String> longWords = new ArrayList<>();
for(String word : words) {
if(word.length()>4)
longWords.add(word);
}
return longWords ;
}
The implementation of Iterator attempts to detect concurrent modifications of the collection it is iterating over, but this is just a best-effort attempt to "fail-fast". Having your program fail by throwing an exception is much better than having unpredictable behavior.
The javadocs make this disclaimer:
Note that fail-fast behavior cannot be guaranteed as it is, generally
speaking, impossible to make any hard guarantees in the presence of
unsynchronized concurrent modification. Fail-fast operations throw
ConcurrentModificationException on a best-effort basis. Therefore, it
would be wrong to write a program that depended on this exception for
its correctness: ConcurrentModificationException should be used only
to detect bugs.
If we are simply reading data from a collection using get then we aren't going to see this exception, but we do run the risk of inconsistent state. There may be times when this isn't a problem that requires fixing. If only one thread writes to a field and it is not vital that all threads always see the most recent up-to-date value in that field, I think you should be fine as long as you stay clear of iterators.
You may not crash the application but your different threads will suffer from DIRTY_READ problems.
The Javadoc of the root class of the collections framework, java.util.Collection, writes:
It is up to each collection to determine its own synchronization policy. In the absence of a stronger guarantee by the implementation, undefined behavior may result from the invocation of any method on a collection that is being mutated by another thread; this includes direct invocations, passing the collection to a method that might perform invocations, and using an existing iterator to examine the collection.
"Undefined behavior" implies that the collection may do whatever it pleases, the entire javadoc of the collections framework is null and void. For instance, an element might still be present in the collection after being removed, or not present after being added. For instance, if Thread 1 adds to a HashMap and triggers a resize while Thread 2 inserts something, the insert from Thread 2 might be lost.
However, I would be greatly surprised if lack of synchronization could crash the JVM itself.
Whenever we use java.util Collection classes, we have that if one thread changes a collection while another thread is traversing through it using an iterator, then any call to iterator.hasNext() or iterator.next() will throw ConcurrentModificationException. Even the synchronized collection wrapper classes SynchronizedMap and SynchronizedList are only conditionally thread-safe, which means all individual operations are thread-safe but compound operations where flow of control depends on the results of previous operations may be subject to threading issues. The question is: How to avoid this problem without affecting the performance. Note: I am aware of CopyOnWriteArrayList.
You can use CopyOnWriteArrayList or ConcurrentHashMap etc. as you mentioned above or you can use Atomic* classes which are working with CAS.
If you weren't aware of Atomic* classes they definitely worth a look! You may check out this question.
So to answer your question you have to choose the right tools for the task. Since you do not share the context with us I can just guess. In some situations CAS will perform better in others the concurrent Collections will.
If something isn't clear you can always check out the official Oracle Trails: Lesson: Concurrency
I think you raised an interesting question.
I tried thinking whether ConcurrentHashMap for example, as was suggested by others can help, but I'm not sure as the lock is segment-based.
What I would do in this case , and I do hope I understood your question well, is to lock access to your collection, using a ReaderWriterLock.
The reason I chose this lock is because I do feel this needs locking (as you explained - iteration is composed from several operations) ,
And because in case of reader threads, I do not want them to wait on lock , if no writer thread is working on the collection.
Thanks to #Adam Arold I paid attention that you suggested the "synchronized decorator" - but I feel this decorator is "too strong" for your needs, as it uses a synchronized and will not diffrentiate between cases of N readers and M writers.
This is because the "standard" Java collections are not thread safe as they are not synchronized. When working with multiple threads accessing your collections, you should look at the java.util.concurrent packages.
Without this package, before Java 5, one had to perform a manual synchronization :
synchronized(list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
or using
Collections.synchronizedList(arrayList);
but neither could really offer a complete thread safety feature.
With this package, all access to the collection is made atomically and some classes provide a snapshot of the state of the list when the iterator was constructed (see CopyOnWriteArrayList. The CopyOnWriteArrayList is fast on read, but if you are performing many writes, this might affect performance.
Thus, if CopyOnWriteArrayList is not desired, take a look at ConcurrentLinkedQueue which offers a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator. This one is efficient in all point, unless you have to access elements at specific indexes more often than traversing the entire collection.
Another option would be ConcurrentSkipListSet which provides expected average log(n) time cost for the contains, add, and remove operations and their variants. Insertion, removal, and access operations safely execute concurrently by multiple threads and iterators are weakly consistent as well.
Which concurrent (thread-safe) collections depend of what type of operations you perform the most with. And since they are all part of the Java Collection framework, you can swap them to which ever you need.
If you have an unencapsulated Collection object of a non thread-safe class, it is impossible to prevent misuse of the Collection, and thus the possibility of a ConcurrentModificationException.
Other answers have suggested use of a thread-safe Collection class, such as those provided by java.util.concurrent. You should however consider encapsulating the Collection object: have the object be private, and have your class provide higher level abstractions (such as addPerson and removePerson) that manipulate the Collection on behalf of the callers, and do not have any getter methods that return references to the Collection. It is then fairly easy to enforce invariants on the encapsulated data (such as "every person has a non empty name") and provide thread-safety using synchronized.
In a multi-threaded application I'm working on, we occasionally see ConcurrentModificationExceptions on our Lists (which are mostly ArrayList, sometimes Vectors). But there are other times when I think concurrent modifications are happening because iterating through the collection appears to be missing items, but no exceptions are thrown. I know that the docs for ConcurrentModificationException says you can't rely on it, but how would I go about making sure I'm not concurrently modifying a List? And is wrapping every access to the collection in a synchronized block the only way to prevent it?
Update: Yes, I know about Collections.synchronizedCollection, but it doesn't guard against somebody modifying the collection while you're iterating through it. I think at least some of my problem is happening when somebody adds something to a collection while I'm iterating through it.
Second Update If somebody wants to combine the mention of the synchronizedCollection and cloning like Jason did with a mention of the java.util.concurrent and the apache collections frameworks like jacekfoo and Javamann did, I can accept an answer.
Depending on your update frequency one of my favorites is the CopyOnWriteArrayList or CopyOnWriteArraySet. They create a new list/set on updates to avoid concurrent modification exception.
Your original question seems to be asking for an iterator that sees live updates to the underlying collection while remaining thread-safe. This is an incredibly expensive problem to solve in the general case, which is why none of the standard collection classes do it.
There are lots of ways of achieving partial solutions to the problem, and in your application, one of those may be sufficient.
Jason gives a specific way to achieve thread safety, and to avoid throwing a ConcurrentModificationException, but only at the expense of liveness.
Javamann mentions two specific classes in the java.util.concurrent package that solve the same problem in a lock-free way, where scalability is critical. These only shipped with Java 5, but there have been various projects that backport the functionality of the package into earlier Java versions, including this one, though they won't have such good performance in earlier JREs.
If you are already using some of the Apache Commons libraries, then as jacekfoo points out, the apache collections framework contains some helpful classes.
You might also consider looking at the Google collections framework.
Check out java.util.concurrent for versions of the standard Collections classes that are engineered to handle concurrency better.
Yes you have to synchronize access to collections objects.
Alternatively, you can use the synchronized wrappers around any existing object. See Collections.synchronizedCollection(). For example:
List<String> safeList = Collections.synchronizedList( originalList );
However all code needs to use the safe version, and even so iterating while another thread modifies will result in problems.
To solve the iteration problem, copy the list first. Example:
for ( String el : safeList.clone() )
{ ... }
For more optimized, thread-safe collections, also look at java.util.concurrent.
Usually you get a ConcurrentModificationException if you're trying to remove an element from a list whilst it's being iterated through.
The easiest way to test this is:
List<Blah> list = new ArrayList<Blah>();
for (Blah blah : list) {
list.remove(blah); // will throw the exception
}
I'm not sure how you'd get around it. You may have to implement your own thread-safe list, or you could create copies of the original list for writing and have a synchronized class that writes to the list.
You could try using defensive copying so that modifications to one List don't affect others.
Wrapping accesses to the collection in a synchronized block is the correct way to do this. Standard programming practice dictates the use of some sort of locking mechanism (semaphore, mutex, etc) when dealing with state that is shared across multiple threads.
Depending on your use case however you can usually make some optimizations to only lock in certain cases. For example, if you have a collection that is frequently read but rarely written, then you can allow concurrent reads but enforce a lock whenever a write is in progress. Concurrent reads only cause conflicts if the collection is in the process of being modified.
ConcurrentModificationException is best-effort because what you're asking is a hard problem. There's no good way to do this reliably without sacrificing performance besides proving that your access patterns do not concurrently modify the list.
Synchronization would likely prevent concurrent modifications, and it may be what you resort to in the end, but it can end up being costly. The best thing to do is probably to sit down and think for a while about your algorithm. If you can't come up with a lock-free solution, then resort to synchronization.
See the implementation. It basically stores an int:
transient volatile int modCount;
and that is incremented when there is a 'structural modification' (like remove). If iterator detects that modCount changed it throws Concurrent modification exception.
Synchronizing (via Collections.synchronizedXXX) won't do good since it does not guarantee iterator safety it only synchronizes writes and reads via put, get, set ...
See java.util.concurennt and apache collections framework (it has some classes that are optimized do work correctly in concurrent environment when there is more reads (that are unsynchronized) than writes - see FastHashMap.
You can also synchronize over iteratins over the list.
List<String> safeList = Collections.synchronizedList( originalList );
public void doSomething() {
synchronized(safeList){
for(String s : safeList){
System.out.println(s);
}
}
}
This will lock the list on synchronization and block all threads that try to access the list while you edit it or iterate over it. The downside is that you create a bottleneck.
This saves some memory over the .clone() method and might be faster depending on what you're doing in the iteration...
Collections.synchronizedList() will render a list nominally thread-safe and java.util.concurrent has more powerful features.
This will get rid of your concurrent modification exception. I won't speak to the efficiency however ;)
List<Blah> list = fillMyList();
List<Blah> temp = new ArrayList<Blah>();
for (Blah blah : list) {
//list.remove(blah); would throw the exception
temp.add(blah);
}
list.removeAll(temp);