I am getting an occasional ArrayIndexOutOfBoundsException when using the following code. Any leads? The list always holds around 29-30 elements.
logger.info("devicetripmessageinfo size :{}",deviceMessageInfoList.size());
deviceMessageInfoList.parallelStream().forEach(msg->{
if(msg!=null && msg.getMessageVO()!=null)
{
DeviceTripMessageInfo currentDevTripMsgInfo =
(DeviceTripMessageInfo) msg.getMessageVO();
if(currentDevTripMsgInfo.getValueMap()!=null)
{mapsList.add(currentDevTripMsgInfo.getValueMap());}
}
});
java.lang.ArrayIndexOutOfBoundsException: null
at java.base/jdk.internal.reflect.GeneratedConstructorAccessor26.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:603)
at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:678)
at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:737)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
at com.*.*.*.*.worker.*.process(*.java:96)
at com.*.jms.consumer.JMSWorker.processList(JMSWorker.java:279)
at com.*.jms.consumer.JMSWorker.process(JMSWorker.java:244)
at com.*.jms.consumer.JMSWorker.processMessages(JMSWorker.java:200)
at com.*.jms.consumer.JMSWorker.run(JMSWorker.java:136)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ArrayIndexOutOfBoundsException: null
Summary
The problem is that ArrayList is by design not safe for modification by multiple threads concurrently, but the parallel stream is writing to the list from multiple threads. A good solution is to switch to an idiomatic stream implementation:
List msgList = deviceMessageInfoList.parallelStream() // Declare generic type, e.g. List<Map<String, Object>>
.filter(Objects::nonNull)
.map(m -> (DeviceTripMessageInfo) m.getMessageVO())
.filter(Objects::nonNull)
.map(DeviceTripMessageInfo::getValueMap)
.filter(Objects::nonNull)
.collect(Collectors.toUnmodifiableList());
Issue: concurrent modification
The ArrayList Javadocs explain the concurrent modification issue:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list
Note that the exception you're seeing is not the only incorrect behavior you might encounter. In my own tests of your code against large lists, the resulting list often contained only some of the elements from the source list.
Note that while switching from a parallel stream to a sequential stream would likely fix the issue in practice, it is dependent on the stream implementation, and not guaranteed by the API. Therefore, such an approach is highly inadvisable, as it could break in future versions of the library. Per the forEach Javadocs:
For any given element, the action may be performed at whatever time and in whatever thread the library chooses. If the action accesses shared state, it is responsible for providing the required synchronization.
Issue: not idiomatic
Aside from the correctness issue, another issue with this approach is that it's not particularly idiomatic to use side effects within stream code. The stream documentation explicitly discourages them.
Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.
[...]
Many computations where one might be tempted to use side effects can be more safely and efficiently expressed without side-effects, such as using reduction instead of mutable accumulators.
Of particular note, the documentation goes on to describe the exact scenario posted in this question as an inappropriate use of side-effects in a stream:
As an example of how to transform a stream pipeline that inappropriately uses side-effects to one that does not, the following code searches a stream of strings for those matching a given regular expression, and puts the matches in a list.
ArrayList<String> results = new ArrayList<>();
stream.filter(s -> pattern.matcher(s).matches())
.forEach(s -> results.add(s)); // Unnecessary use of side-effects!
This code unnecessarily uses side-effects. If executed in parallel, the non-thread-safety of ArrayList would cause incorrect results, and adding needed synchronization would cause contention, undermining the benefit of parallelism.
Aside: traditional non-stream solution
As an aside, this points to a solution one might use in traditional non-stream code. I will discuss it briefly, since it's helpful to understand traditional solutions to the issue of concurrent list modification. Traditionally, one might replace the ArrayList with either a wrapped synchronized version using Collections.synchronizedList or an inherently concurrent collection type such as ConcurrentLinkedQueue. Since these approaches are designed for concurrent insertion, they solve the parallel insert issue, though possibly with additional synchronization contention overhead.
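A minimal sketch of those two traditional options, for illustration only. The class and method names are made up, and an IntStream stands in for the question's message list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

final class ConcurrentInsertSketch {

    // Collects 0..n-1 into a synchronized wrapper; every add() takes the list's lock.
    static List<Integer> viaSynchronizedList(int n) {
        List<Integer> target = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().forEach(i -> target.add(i));
        return target;
    }

    // Collects 0..n-1 into a concurrent queue designed for multi-threaded insertion.
    static Queue<Integer> viaConcurrentQueue(int n) {
        Queue<Integer> target = new ConcurrentLinkedQueue<>();
        IntStream.range(0, n).parallel().forEach(i -> target.add(i));
        return target;
    }

    public static void main(String[] args) {
        System.out.println(viaSynchronizedList(10_000).size()); // prints 10000
        System.out.println(viaConcurrentQueue(10_000).size());  // prints 10000
    }
}
```

Both variants keep every element, but the insertion order is still arbitrary because forEach makes no ordering promise on a parallel stream.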
Stream solution
The stream documentation continues on with a replacement for the inappropriate use of side effects:
Furthermore, using side-effects here is completely unnecessary; the forEach() can simply be replaced with a reduction operation that is safer, more efficient, and more amenable to parallelization:
List<String> results =
stream.filter(s -> pattern.matcher(s).matches())
.collect(Collectors.toList()); // No side-effects!
Applying this approach to your code, you get:
List msgList = deviceMessageInfoList.parallelStream() // Declare generic type, e.g. List<Map<String, Object>>
.filter(Objects::nonNull)
.map(m -> (DeviceTripMessageInfo) m.getMessageVO())
.filter(Objects::nonNull)
.map(DeviceTripMessageInfo::getValueMap)
.filter(Objects::nonNull)
.collect(Collectors.toUnmodifiableList());
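For anyone who wants to try this outside the original project, here is a self-contained sketch. The Message and DeviceTripMessageInfo classes below are hypothetical stand-ins (the real ones are not shown in the question), and Collectors.toList() is used so that it also compiles on Java 8:

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

final class CollectValueMapsSketch {

    // Hypothetical stand-in for the question's DeviceTripMessageInfo.
    static final class DeviceTripMessageInfo {
        private final Map<String, Object> valueMap;
        DeviceTripMessageInfo(Map<String, Object> valueMap) { this.valueMap = valueMap; }
        Map<String, Object> getValueMap() { return valueMap; }
    }

    // Hypothetical stand-in for the element type of deviceMessageInfoList.
    static final class Message {
        private final Object messageVO;
        Message(Object messageVO) { this.messageVO = messageVO; }
        Object getMessageVO() { return messageVO; }
    }

    // No shared mutable list: the collector builds the result and merges partial results itself.
    static List<Map<String, Object>> collectValueMaps(List<Message> messages) {
        return messages.parallelStream()
                .filter(Objects::nonNull)
                .map(m -> (DeviceTripMessageInfo) m.getMessageVO())
                .filter(Objects::nonNull)
                .map(DeviceTripMessageInfo::getValueMap)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }
}
```

Null messages, null value objects and null maps are all dropped by the three filters, so only real maps reach the result.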
Even if you change that to a synchronized (or, better said, thread-safe) List, your current approach still gives you no guaranteed order in which the elements are put in. The documentation, btw, is very clear in discouraging such things via forEach. Just look up Side-Effects in the java.util.stream package documentation.
This entire thing can be done in a far better way (and easier to read too):
deviceMessageInfoList
.stream()
.parallel()
.filter(Objects::nonNull)
.map(x -> x.getMessageVO())
.filter(Objects::nonNull)
.map(x -> (DeviceTripMessageInfo) x) // getMessageVO() was already applied above
.map(DeviceTripMessageInfo::getValueMap)
.filter(Objects::nonNull)
.collect(Collectors.toList());
Related
Does filter chaining change the outcome if I use parallelStream() instead of stream()?
I tried with a few thousand records, and the output appeared consistent over a few iterations. But since this involves threads (and I could not find enough relevant material that talks about this combination), I want to make doubly sure that a parallel stream does not impact the output of filter chaining in any way. Example code:
List<Element> list = myList.parallelStream()
.filter(element -> element.getId() > 10)
.filter(element -> element.getName().contains("something"))
.collect(Collectors.toList());
Short answer: No.
The filter operation, as documented, expects a non-interfering and stateless predicate to apply to each element to determine if it should be included as part of the new stream.
A few aspects you should consider here:
With the exception of concurrent collections (which type did you choose for myList in the existing code?):
For most data sources, preventing interference means ensuring that the
data source is not modified at all during the execution of the stream
pipeline.
The state of the data source (myList and its elements) is not mutated within your filter operations.
Note also that attempting to access mutable state from behavioral
parameters presents you with a bad choice with respect to safety and
performance;
Moreover, think about what in your filter operations could be affected by multiple threads. Given the current code, nothing functionally: as long as both operations are executed, you get a consistent result regardless of the thread(s) executing them.
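A quick way to convince yourself is to run the same chained, stateless filters both ways and compare. This is just a sketch with made-up predicates on integers rather than your Element class:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class FilterChainSketch {

    // Two chained stateless filters; the parallel flag only changes how the work is split.
    static List<Integer> run(boolean parallel) {
        IntStream source = IntStream.rangeClosed(1, 10_000);
        if (parallel) {
            source = source.parallel();
        }
        return source.boxed()
                .filter(n -> n > 10)
                .filter(n -> n % 7 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(false).equals(run(true))); // prints true
    }
}
```

Because the source is ordered and Collectors.toList() respects encounter order, even the element order matches, not just the contents.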
Say I have this list of fruits:-
List<String> f = Arrays.asList("Banana", "Apple", "Grape", "Orange", "Kiwi");
I need to prepend a serial number to each fruit and print it. The order of fruit or serial number does not matter. So this is a valid output:-
4. Kiwi
3. Orange
1. Grape
2. Apple
5. Banana
Solution #1
AtomicInteger number = new AtomicInteger(0);
String result = f.parallelStream()
.map(i -> String.format("%d. %s", number.incrementAndGet(), i))
.collect(Collectors.joining("\n"));
Solution #2
String result = IntStream.rangeClosed(1, f.size())
.parallel()
.mapToObj(i -> String.format("%d. %s", i, f.get(i - 1)))
.collect(Collectors.joining("\n"));
Question
Why is solution #1 a bad practice? I have seen in a lot of places that AtomicInteger-based solutions are bad (like in this answer), especially in parallel stream processing (that's the reason I used parallel streams above, to try to run into issues).
I looked at these questions/answers:-
In which cases Stream operations should be stateful?
Is use of AtomicInteger for indexing in Stream a legit way?
Java 8: Preferred way to count iterations of a lambda?
They just mention (unless I missed something) "unexpected results can occur". Like what? Can it happen in this example? If not, can you provide me an example where it can happen?
As for "no guarantees are made as to the order in which the mapper function is applied", well, that's the nature of parallel processing, so I accept it, and also, the order doesn't matter in this particular example.
AtomicInteger is thread safe, so it shouldn't be a problem in parallel processing.
Can someone provide examples in which cases there will be issues while using such a state-based solution?
Well, look at the answer from Stuart Marks here: he is using a stateful predicate.
There are a couple of potential problems, but if you don't care about them or really understand them, you should be fine.
The first is order, exhibited under the current implementation for parallel processing; but if you don't care about order, as in your example, you are OK.
The second is potential speed: an AtomicInteger will be several times slower to increment than a simple int, if you care about this.
Third one is more subtle. Sometimes there is no guarantee that map will be executed, at all, for example since java-9:
someStream.map(i -> /* do something with i and numbers */)
.count();
The point here is that since you are counting, there is no need to do the mapping, so it's skipped. In general, the elements that hit some intermediate operation are not guaranteed to get to the terminal one. Imagine a map.filter.map situation: the first map might "see" more elements than the second one, because some elements might be filtered out. So it's not recommended to rely on this, unless you can reason exactly about what is going on.
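Here is a small sketch you can run to see whether the mapper is invoked on your JDK. The class and variable names are made up; the count is always 3, but how often the mapper runs depends on the Java version, so no number is promised for that:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class SkippedMapSketch {

    // Returns {element count, number of times the mapper actually ran}.
    static long[] countWithSideEffect() {
        List<String> fruits = Arrays.asList("Banana", "Apple", "Grape");
        AtomicInteger mapperCalls = new AtomicInteger();
        long count = fruits.stream()
                .map(s -> { mapperCalls.incrementAndGet(); return s.length(); })
                .count();
        return new long[] { count, mapperCalls.get() };
    }

    public static void main(String[] args) {
        long[] result = countWithSideEffect();
        // The mapper-call figure is implementation-dependent; do not build logic on it.
        System.out.println("count=" + result[0] + ", mapper calls=" + result[1]);
    }
}
```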
In your example, IMO, you are more than safe to do what you do; but if you slightly change your code, it requires additional reasoning to prove its correctness. I would go with solution 2, just because it's a lot easier for me to understand and it does not have the potential problems listed above.
Note also that attempting to access mutable state from behavioral parameters presents you with a bad choice with respect to safety and performance; if you do not synchronize access to that state, you have a data race and therefore your code is broken, but if you do synchronize access to that state, you risk having contention undermine the parallelism you are seeking to benefit from. The best approach is to avoid stateful behavioral parameters to stream operations entirely; there is usually a way to restructure the stream pipeline to avoid statefulness.
Package java.util.stream, Stateless behaviors
From the perspective of thread-safety and correctness, there is nothing wrong with solution 1. Performance (as an advantage of parallel processing) might suffer, though.
Why is solution #1 a bad practice?
I wouldn't say it's a bad practice or something unacceptable. It's simply not recommended for the sake of performance.
They just mention (unless I missed something) "unexpected results can occur". Like what?
"Unexpected results" is a very broad term, and usually refers to improper synchronisation, "What's the hell just happened?"-like behaviour.
Can it happen in this example?
Not in this case. You are unlikely to run into issues.
If not, can you provide me an example where it can happen?
Change the AtomicInteger to an int*, replace number.incrementAndGet() with ++number, and you will have one.
*a boxed int (e.g. wrapper-based, array-based) so you can work with it within a lambda
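A sketch of that experiment, using an array-based int as the mutable holder. The names are illustrative; the racy result varies from run to run, so the only thing that can be said about it is that it will not exceed n:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

final class CounterRaceSketch {

    // Unsynchronized read-modify-write on shared state: increments can be lost.
    static int racyCount(int n) {
        int[] number = {0};
        IntStream.range(0, n).parallel().forEach(i -> ++number[0]);
        return number[0];
    }

    // Same loop with an atomic counter: every increment is applied exactly once.
    static int atomicCount(int n) {
        AtomicInteger number = new AtomicInteger();
        IntStream.range(0, n).parallel().forEach(i -> number.incrementAndGet());
        return number.get();
    }

    public static void main(String[] args) {
        System.out.println("atomic: " + atomicCount(100_000)); // prints atomic: 100000
        System.out.println("racy:   " + racyCount(100_000));   // often less than 100000
    }
}
```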
Case 2 - Per the API notes of the IntStream class, rangeClosed returns a sequential ordered IntStream from startInclusive (inclusive) to endInclusive (inclusive) by an incremental step of 1, much like a for loop. Because the stream is ordered and each index is mapped to its own element, the joined result keeps the correct order even when the stream is run in parallel.
* @param startInclusive the (inclusive) initial value
* @param endInclusive the inclusive upper bound
* @return a sequential {@code IntStream} for the range of {@code int}
* elements
*/
public static IntStream rangeClosed(int startInclusive, int endInclusive) {
Case 1 - The list is processed in parallel, so the serial numbers are not assigned in encounter order. Because the mapping operation is performed in parallel, the results for the same input can vary from run to run due to thread scheduling differences. There are no guarantees that different operations on the "same" element within the same stream pipeline are executed in the same thread, nor about the order in which the mapper function is applied to individual elements of the stream.
Source Java Doc
While reading the documentation about streams, I came across the following sentences:
... attempting to access mutable state from behavioral parameters presents you with a bad choice ... if you do not synchronize access to that state, you have a data race and therefore your code is broken ... [1]
If the behavioral parameters do have side-effects ... [there are no] guarantees that different operations on the "same" element within the same stream pipeline are executed in the same thread. [2]
For any given element, the action may be performed at whatever time and in whatever thread the library chooses. [3]
These sentences don't make a distinction between sequential and parallel streams. So my questions are:
In which thread is the pipeline of a sequential stream executed? Is it always the calling thread or is an implementation free to choose any thread?
In which thread is the action parameter of the forEach terminal operation executed if the stream is sequential?
Do I have to use any synchronization when using sequential streams?
[1+2] https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
[3] https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#forEach-java.util.function.Consumer-
This all boils down to what is guaranteed based on the specification, and the fact that a current implementation may have additional behaviors beyond what is guaranteed.
Java Language Architect Brian Goetz made a relevant point regarding specifications in a related question:
Specifications exist to describe the minimal guarantees a caller can depend on, not to describe what the implementation does.
[...]
When a specification says "does not preserve property X", it does not mean that the property X may never be observed; it means the implementation is not obligated to preserve it. [...] (HashSet doesn't promise that iterating its elements preserves the order they were inserted, but that doesn't mean this can't accidentally happen -- you just can't count on it.)
This all means that even if the current implementation happens to have certain behavioral characteristics, they should not be relied upon nor assumed that they will not change in new versions of the library.
Sequential stream pipeline thread
In which thread is the pipeline of a sequential stream executed? Is it always the calling thread or is an implementation free to choose any thread?
Current stream implementations may or may not use the calling thread, and may use one or multiple threads. As none of this is specified by the API, this behavior should not be relied on.
forEach execution thread
In which thread is the action parameter of the forEach terminal operation executed if the stream is sequential?
While current implementations use the existing thread, this cannot be relied on, as the documentation states that the choice of thread is up to the implementation. In fact, there are no guarantees that the elements aren't processed by different threads for different elements, though that is not something the current stream implementation does either.
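If you want to see what your own JDK does, a sketch like the following records the thread names. Whatever it prints is an observation about that implementation, not something the API promises; the class and method names are made up:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

final class ForEachThreadSketch {

    // Returns the names of all threads that executed the forEach action.
    static Set<String> threadsUsed(boolean parallel) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe on purpose
        IntStream numbers = IntStream.range(0, 10_000);
        if (parallel) {
            numbers = numbers.parallel();
        }
        numbers.forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("caller:     " + Thread.currentThread().getName());
        System.out.println("sequential: " + threadsUsed(false));
        System.out.println("parallel:   " + threadsUsed(true));
    }
}
```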
Per the API:
For any given element, the action may be performed at whatever time and in whatever thread the library chooses.
Note that while the API calls out parallel streams specifically when discussing encounter order, Brian Goetz clarified that this was done to explain the motivation for the behavior, not because any of the behavior is specific to parallel streams:
The intent of calling out the parallel case explicitly here was pedagogical [...]. However, to a reader who is unaware of parallelism, it would be almost impossible to not assume that forEach would preserve encounter order, so this sentence was added to help clarify the motivation.
Synchronization using sequential streams
Do I have to use any synchronization when using sequential streams?
Current implementations will likely work since they use a single thread for the sequential stream's forEach method. However, as it is not guaranteed by the stream specification, it should not be relied on. Therefore, synchronization should be used as though the methods could be called by multiple threads.
That said, the stream documentation specifically recommends against using side-effects that would require synchronization, and suggests using reduction operations instead of mutable accumulators:
Many computations where one might be tempted to use side effects can be more safely and efficiently expressed without side-effects, such as using reduction instead of mutable accumulators. [...] A small number of stream operations, such as forEach() and peek(), can operate only via side-effects; these should be used with care.
As an example of how to transform a stream pipeline that inappropriately uses side-effects to one that does not, the following code searches a stream of strings for those matching a given regular expression, and puts the matches in a list.
ArrayList<String> results = new ArrayList<>();
stream.filter(s -> pattern.matcher(s).matches())
.forEach(s -> results.add(s)); // Unnecessary use of side-effects!
This code unnecessarily uses side-effects. If executed in parallel, the non-thread-safety of ArrayList would cause incorrect results, and adding needed synchronization would cause contention, undermining the benefit of parallelism. Furthermore, using side-effects here is completely unnecessary; the forEach() can simply be replaced with a reduction operation that is safer, more efficient, and more amenable to parallelization:
List<String> results =
stream.filter(s -> pattern.matcher(s).matches())
.collect(Collectors.toList()); // No side-effects!
Stream's terminal operations are blocking operations. If there is no parallel execution, the thread that executes the terminal operation runs all the operations in the pipeline.
Definition 1.1. A pipeline is a sequence of chained methods.
Definition 1.2. Intermediate operations can be located anywhere in the stream except at the end. They return a stream object and do not execute any operation in the pipeline.
Definition 1.3. Terminal operations are located only at the end of the stream. They execute the pipeline. They do not return a stream object, so no other intermediate or terminal operations can be added after them.
From the first solution we can conclude that the calling thread will execute the action method inside the forEach terminal operation on each element in the calling stream.
Java 8 introduced the Spliterator interface. It has the capabilities of Iterator, plus a set of operations that help with splitting a task and performing it in parallel.
When calling forEach from primitive streams in sequential execution, the calling thread will invoke the Spliterator.forEachRemaining method:
@Override
public void forEach(IntConsumer action) {
if (!isParallel()) {
adapt(sourceStageSpliterator()).forEachRemaining(action);
}
else {
super.forEach(action);
}
}
You can read more on Spliterator in my tutorial: Part 6 - Spliterator
As long as you don't mutate any shared state between multiple threads in one of the stream operations (and doing so is forbidden, as explained shortly), you do not need any additional synchronization tool or algorithm when you run parallel streams.
Stream operations like reduce use accumulator and combiner functions for executing parallel streams. The streams library by definition forbids mutation. You should avoid it.
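As a small illustration of the accumulator/combiner pair, here is a sketch that sums word lengths with the three-argument reduce. The class and method names are made up for the example:

```java
import java.util.Arrays;
import java.util.List;

final class ReduceSketch {

    // Each worker folds its own chunk with the accumulator; the combiner merges the partial sums.
    static int totalLength(List<String> words) {
        return words.parallelStream()
                .reduce(0,
                        (partial, word) -> partial + word.length(), // accumulator
                        Integer::sum);                              // combiner
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("Banana", "Apple", "Grape", "Orange", "Kiwi");
        System.out.println(totalLength(fruits)); // prints 26
    }
}
```

No state is shared between threads here, which is why no synchronization is needed.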
There are a lot of definitions in concurrent and parallel programming. I will introduce a set of definitions that will serve us best.
Definition 8.1. Concurrent programming is the ability to solve a task using additional synchronization algorithms.
Definition 8.2. Parallel programming is the ability to solve a task without using additional synchronization algorithms.
You can read more about it in my tutorial: Part 7 - Parallel Streams.
Does the non-interference requirement for using streams of non-concurrent data structure sources mean that we can't change the state of an element of the data structure during the execution of a stream pipeline (in addition to that we can't change the source data structure itself)? (Question 1)
In the section about non-interference in the stream package description, it is said:
"For most data sources, preventing interference means ensuring that the data source is not modified at all during the execution of the stream pipeline."
This passage does not mention modifying the state of elements?
For example, assuming "shapes" is non-thread-safe collection (such as ArrayList), is the code below considered to have an interference? (Question 2)
shapes.stream()
.filter(s -> s.getColor() == BLUE)
.forEach(s -> s.setColor(RED));
This example is taken from a reliable source (to say the least), so it should be correct.
But what if I changed stream() to be parallelStream(), will it still be safe and correct? (Question 3)
On the other hand, "Mastering Lambdas" by Maurice Naftalin, another reliable source, makes it clear that changing the state (value) of elements by the pipeline operation is indeed interference. From the section about non-interference (3.2.3):
"But the rules for streams forbid any modification of stream sources—including, for example, changing the value of an element— by any thread, not only pipeline operations."
If what's said in the book is correct, does it mean we can't use the Stream API to modify state of elements (using forEach), and have to do that using the regular iterator (or for-each, or Iterable.forEach)? (Question 4)
There's a bigger class of functions called "functions with side effects". The JavaDoc statement is correct and complete: here interference means modifying the mutable source. Another case is stateful expressions: expressions which depend on the application state or change this state. You may read the Parallelism tutorial on Oracle site.
In general you can modify the stream elements themselves, and that should not be called "interference". Beware, though, if the same mutable object is produced several times by the stream source (for example, using Collections.nCopies(10, new MyMutableObject()).parallelStream()). While it's ensured that the same stream element is not processed concurrently by several threads, if your stream produces the same object twice, you may well have a race condition when modifying it in the forEach, for example.
So while stateful expressions are sometimes a code smell and should be used with care and avoided if there's a stateless alternative, they are probably OK if they don't interfere with the stream source. When a stateless expression is required (for example, in the Stream.map method), it's specifically mentioned in the API docs. In the forEach documentation only non-interference is required.
So back to your questions:
Question 1: no, we can change the element state, and it's not called interference (though it is called statefulness)
Question 2: no, it has no interference (unless you have repeating objects in your stream source)
Question 3: you can safely use parallelStream() there
Question 4: no, you can use Stream API in this case.
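A sketch of the case from Questions 2 and 3, assuming every list slot holds a distinct object. The Shape and Color types below are hypothetical, since the question does not show them:

```java
import java.util.ArrayList;
import java.util.List;

final class RecolorSketch {

    enum Color { RED, BLUE, GREEN }

    // Hypothetical mutable element type.
    static final class Shape {
        private Color color;
        Shape(Color color) { this.color = color; }
        Color getColor() { return color; }
        void setColor(Color color) { this.color = color; }
    }

    // Builds n distinct shapes (alternating BLUE/GREEN) and recolors the blue ones in parallel.
    static List<Shape> recolor(int n) {
        List<Shape> shapes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            shapes.add(new Shape(i % 2 == 0 ? Color.BLUE : Color.GREEN));
        }
        shapes.parallelStream()
              .filter(s -> s.getColor() == Color.BLUE)
              .forEach(s -> s.setColor(Color.RED)); // mutates elements, never the list itself
        return shapes;
    }

    static long count(List<Shape> shapes, Color color) {
        return shapes.stream().filter(s -> s.getColor() == color).count();
    }
}
```

The list structure is never touched during the pipeline, and each Shape is visited by one thread only, which is what keeps this safe.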
Modifying the state of an object stored in a data structure is different from reassigning an element of a data structure.
When the author writes "changing the value of an element", presumably they mean something like assigning a new object to an index of an existing List.
From your link:
It is best to avoid any side-effects in the lambdas passed to stream methods. While some side-effects, such as debugging statements that print out values are usually safe, accessing mutable state from these lambdas can cause data races or surprising behavior since lambdas may be executed from many threads simultaneously, and may not see elements in their natural encounter order. Non-interference includes not only not interfering with the source, but not interfering with other lambdas; this sort of interference can arise when one lambda modifies mutable state and another lambda reads it.
As long as the non-interference requirement is satisfied, we can execute parallel operations safely and with predictable results even on non-thread-safe sources such as ArrayList.
This pertains specifically to parallelism and is no different than any other concurrent programming. Modifying state can cause issues with visibility amongst threads.
In a multi-threaded application I'm working on, we occasionally see ConcurrentModificationExceptions on our Lists (which are mostly ArrayLists, sometimes Vectors). But there are other times when I think concurrent modifications are happening because iterating through the collection appears to be missing items, but no exceptions are thrown. I know that the docs for ConcurrentModificationException say you can't rely on it, but how would I go about making sure I'm not concurrently modifying a List? And is wrapping every access to the collection in a synchronized block the only way to prevent it?
Update: Yes, I know about Collections.synchronizedCollection, but it doesn't guard against somebody modifying the collection while you're iterating through it. I think at least some of my problem is happening when somebody adds something to a collection while I'm iterating through it.
Second Update: If somebody wants to combine the mention of synchronizedCollection and cloning, like Jason did, with a mention of java.util.concurrent and the Apache collections frameworks, like jacekfoo and Javamann did, I can accept an answer.
Depending on your update frequency one of my favorites is the CopyOnWriteArrayList or CopyOnWriteArraySet. They create a new list/set on updates to avoid concurrent modification exception.
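A minimal sketch of that behaviour (the class and method names are made up): the loop iterates over a snapshot, so adding to the list inside the loop neither throws nor affects what the loop sees.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class CopyOnWriteSketch {

    // Returns how many elements the loop visited while it was also appending to the list.
    static int iterateWhileAdding(List<Integer> list) {
        int visited = 0;
        for (Integer value : list) {
            list.add(value * 10); // with CopyOnWriteArrayList, each add() copies the backing array
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Integer> list = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(iterateWhileAdding(list)); // prints 3
        System.out.println(list);                     // prints [1, 2, 3, 10, 20, 30]
    }
}
```

The copy on every write is the price, so this fits lists that are read far more often than they are updated.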
Your original question seems to be asking for an iterator that sees live updates to the underlying collection while remaining thread-safe. This is an incredibly expensive problem to solve in the general case, which is why none of the standard collection classes do it.
There are lots of ways of achieving partial solutions to the problem, and in your application, one of those may be sufficient.
Jason gives a specific way to achieve thread safety, and to avoid throwing a ConcurrentModificationException, but only at the expense of liveness.
Javamann mentions two specific classes in the java.util.concurrent package that solve the same problem in a lock-free way, where scalability is critical. These only shipped with Java 5, but there have been various projects that backport the functionality of the package into earlier Java versions, including this one, though they won't have such good performance in earlier JREs.
If you are already using some of the Apache Commons libraries, then as jacekfoo points out, the apache collections framework contains some helpful classes.
You might also consider looking at the Google collections framework.
Check out java.util.concurrent for versions of the standard Collections classes that are engineered to handle concurrency better.
Yes, you have to synchronize access to collection objects.
Alternatively, you can use the synchronized wrappers around any existing object. See Collections.synchronizedCollection(). For example:
List<String> safeList = Collections.synchronizedList( originalList );
However, all code needs to use the safe version, and even so, iterating while another thread modifies the list will result in problems.
To solve the iteration problem, copy the list first. Example:
for ( String el : new ArrayList<String>( safeList ) ) // List has no clone(); the copy constructor takes the snapshot
{ ... }
For more optimized, thread-safe collections, also look at java.util.concurrent.
Usually you get a ConcurrentModificationException if you're trying to remove an element from a list whilst it's being iterated through.
The easiest way to test this is:
List<Blah> list = new ArrayList<Blah>(); // populate it first: an empty list never enters the loop
for (Blah blah : list) {
list.remove(blah); // typically throws on the iterator's next step
}
I'm not sure how you'd get around it. You may have to implement your own thread-safe list, or you could create copies of the original list for writing and have a synchronized class that writes to the list.
You could try using defensive copying so that modifications to one List don't affect others.
Wrapping accesses to the collection in a synchronized block is the correct way to do this. Standard programming practice dictates the use of some sort of locking mechanism (semaphore, mutex, etc) when dealing with state that is shared across multiple threads.
Depending on your use case however you can usually make some optimizations to only lock in certain cases. For example, if you have a collection that is frequently read but rarely written, then you can allow concurrent reads but enforce a lock whenever a write is in progress. Concurrent reads only cause conflicts if the collection is in the process of being modified.
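One way to sketch the read-mostly case is with ReentrantReadWriteLock from java.util.concurrent.locks. The wrapper class below is hypothetical and only shows the locking pattern, not a complete List implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ReadMostlyList<T> {
    private final List<T> items = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void add(T item) {
        lock.writeLock().lock();   // exclusive: blocks readers and other writers
        try {
            items.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    List<T> snapshot() {
        lock.readLock().lock();    // shared: many readers may hold it at once
        try {
            return new ArrayList<>(items); // callers iterate over the copy, outside the lock
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Readers never block each other here; they only wait while a write is in progress.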
ConcurrentModificationException is best-effort because what you're asking is a hard problem. There's no good way to do this reliably without sacrificing performance besides proving that your access patterns do not concurrently modify the list.
Synchronization would likely prevent concurrent modifications, and it may be what you resort to in the end, but it can end up being costly. The best thing to do is probably to sit down and think for a while about your algorithm. If you can't come up with a lock-free solution, then resort to synchronization.
See the implementation. It basically stores an int:
transient volatile int modCount;
and that is incremented when there is a 'structural modification' (like remove). If the iterator detects that modCount has changed, it throws a ConcurrentModificationException.
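The check can be seen even in a single thread. A short sketch with illustrative names: removing through the list trips the check, while removing through the iterator keeps its expected count in sync.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

final class FailFastSketch {

    // Structurally modifies the list behind the iterator's back; returns true if that was detected.
    static boolean removeDuringIteration() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String s : list) {
                list.remove(s); // changes modCount; the iterator's next() notices the mismatch
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes through the iterator, which updates the iterator's expected modCount.
    static List<String> removeViaIterator() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            if (it.next().equals("b")) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeDuringIteration()); // prints true
        System.out.println(removeViaIterator());     // prints [a, c]
    }
}
```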
Synchronizing (via Collections.synchronizedXXX) won't help, since it does not guarantee iterator safety; it only synchronizes reads and writes via put, get, set ...
See java.util.concurrent and the Apache collections framework (it has some classes that are optimized to work correctly in a concurrent environment when there are more reads (which are unsynchronized) than writes; see FastHashMap).
You can also synchronize the iteration over the list.
List<String> safeList = Collections.synchronizedList( originalList );
public void doSomething() {
synchronized(safeList){
for(String s : safeList){
System.out.println(s);
}
}
}
This will lock the list on synchronization and block all threads that try to access the list while you edit it or iterate over it. The downside is that you create a bottleneck.
This saves some memory over copying the list first and might be faster depending on what you're doing in the iteration...
Collections.synchronizedList() will render a list nominally thread-safe and java.util.concurrent has more powerful features.
This will get rid of your concurrent modification exception. I won't speak to the efficiency however ;)
List<Blah> list = fillMyList();
List<Blah> temp = new ArrayList<Blah>();
for (Blah blah : list) {
//list.remove(blah); would throw the exception
temp.add(blah);
}
list.removeAll(temp);
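On Java 8 and later, Collection.removeIf can replace the temporary list when the goal is to remove the elements matching some condition. A sketch with a made-up predicate; note that this only avoids the single-threaded exception and does not make the list thread-safe:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RemoveIfSketch {

    // Removes every negative number in place, without an explicit iterator or temp list.
    static List<Integer> dropNegatives(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        list.removeIf(n -> n < 0);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(dropNegatives(Arrays.asList(3, -1, 4, -1, 5))); // prints [3, 4, 5]
    }
}
```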