I understand that modifying an ArrayList makes it not thread-safe.
But if the ArrayList is not being modified, perhaps protected by a call to Collections.unmodifiableList, is calling ArrayList::get thread-safe?
For example, can an ArrayList be passed to a Java Stream for parallel-processing of its elements?
But if the ArrayList is not being modified is calling ArrayList::get thread-safe?
No, it is not thread-safe.
The problems arise if you do something like the following:
1. Thread A creates and populates the list.
2. Thread A passes a reference to the list to thread B (without a happens-before relationship).
3. Thread B calls get on the list.
Unless there is a proper happens-before chain between steps 1 and 3, thread B may see stale values ... occasionally ... on some platforms under certain workloads.
There are ways to address this. For example, if thread A starts thread B after step 1, there will be a happens-before relationship. Similarly, there will be a happens-before relationship if A passes the list reference to B via properly synchronized setter / getter calls or a volatile variable.
But the bottom line is that (just) not changing the list is not sufficient to make it thread-safe.
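The two hand-off options mentioned above can be sketched roughly as follows. This is only an illustration: the class name, the method names, and the `shared` field are hypothetical, and the returned sizes are just a way to observe that the reader saw the populated list.

```java
import java.util.ArrayList;
import java.util.List;

final class SafePublicationSketch {
    // Hypothetical hand-off field; a volatile write followed by a volatile read
    // of the same field creates a happens-before edge.
    private static volatile List<String> shared;

    /** Option 1: populate first, then start the reader (Thread.start gives happens-before). */
    static int readAfterStart() {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = list.size());
        reader.start();  // everything done before start() is visible to the reader
        join(reader);    // join() makes the reader's write to seen[0] visible here
        return seen[0];
    }

    /** Option 2: pass the reference through a volatile field. */
    static int readViaVolatile() {
        shared = null;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            List<String> local;
            while ((local = shared) == null) {
                Thread.yield(); // spin until the writer publishes the list
            }
            seen[0] = local.size();
        });
        reader.start();
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        shared = list;   // volatile write, performed after the list is fully populated
        join(reader);
        return seen[0];
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterStart());
        System.out.println(readViaVolatile());
    }
}
```

In both variants the list is never modified after the hand-off; the sketch says nothing about what happens if it is.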
... perhaps protected by a call to Collections.unmodifiableList
Creating the Collections.unmodifiableList wrapper should provide the happens-before relationship ... provided that you access the list via the wrapper and not directly via ArrayList::get.
For example, can an ArrayList be passed to a Java Stream for parallel-processing of its elements?
That's a specific situation. The stream mechanisms will provide the happens-before relationship, provided they are used as intended. It is complicated.
This comes from the Spliterator interface javadoc.
"Despite their obvious utility in parallel algorithms, spliterators are not expected to be thread-safe; instead, implementations of parallel algorithms using spliterators should ensure that the spliterator is only used by one thread at a time. This is generally easy to attain via serial thread-confinement, which often is a natural consequence of typical parallel algorithms that work by recursive decomposition. A thread calling trySplit() may hand over the returned Spliterator to another thread, which in turn may traverse or further split that Spliterator. The behaviour of splitting and traversal is undefined if two or more threads operate concurrently on the same spliterator. If the original thread hands a spliterator off to another thread for processing, it is best if that handoff occurs before any elements are consumed with tryAdvance(), as certain guarantees (such as the accuracy of estimateSize() for SIZED spliterators) are only valid before traversal has begun."
In other words, thread safety is a joint responsibility of the Spliterator implementation and Stream implementation.
The simple way to think about this is that "magic happens" ... because if it didn't then parallel streams would be unusable.
But note that the Spliterator is not necessarily using ArrayList::get at all.
Thread safety is only a concern, as you stated, when values can change between the threads. If elements aren't being added or removed, the object remains the same and all threads can easily operate on it. This is the same for most objects in Java.
You may be able to get away with adding to an ArrayList across threads as seen here but I wouldn't bank on it.
No, ArrayList.get() is not inherently thread-safe just because it does not modify the List. You still need something to create a happens-before relationship between each get() and each method invocation that does modify the list.
Suppose, however, that you instantiate and populate the list first, and then perform multiple get()s, never modifying it again, or at least not until after some synchronization point following all the get()s. You do not then need mutual synchronization between the various get()s, and you may be able to obtain cheap synchronization between the get()s and the end of the initialization phase. This is effectively the situation you will have with an otherwise non-shared List that you provide as input to a parallel stream computation.
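As a rough illustration of the "populate first, then only read" pattern with a parallel stream, consider the sketch below. The class and method names are made up for the example; the list is local, fully populated before the stream starts, and never modified afterwards.

```java
import java.util.ArrayList;
import java.util.List;

final class ReadOnlyParallelSketch {
    /** Builds a list of 1..n, then reads it in parallel without modifying it. */
    static long sumOfSquares(int n) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            numbers.add(i);
        }
        // Initialization phase is over; from here on the list is only read.
        return numbers.parallelStream()
                      .mapToLong(i -> (long) i * i)
                      .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000));
    }
}
```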
Related
While reading the documentation about streams, I came across the following sentences:
... attempting to access mutable state from behavioral parameters presents you with a bad choice ... if you do not synchronize access to that state, you have a data race and therefore your code is broken ... [1]
If the behavioral parameters do have side-effects ... [there are no] guarantees that different operations on the "same" element within the same stream pipeline are executed in the same thread. [2]
For any given element, the action may be performed at whatever time and in whatever thread the library chooses. [3]
These sentences don't make a distinction between sequential and parallel streams. So my questions are:
In which thread is the pipeline of a sequential stream executed? Is it always the calling thread or is an implementation free to choose any thread?
In which thread is the action parameter of the forEach terminal operation executed if the stream is sequential?
Do I have to use any synchronization when using sequential streams?
[1+2] https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
[3] https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#forEach-java.util.function.Consumer-
This all boils down to what is guaranteed based on the specification, and the fact that a current implementation may have additional behaviors beyond what is guaranteed.
Java Language Architect Brian Goetz made a relevant point regarding specifications in a related question:
Specifications exist to describe the minimal guarantees a caller can depend on, not to describe what the implementation does.
[...]
When a specification says "does not preserve property X", it does not mean that the property X may never be observed; it means the implementation is not obligated to preserve it. [...] (HashSet doesn't promise that iterating its elements preserves the order they were inserted, but that doesn't mean this can't accidentally happen -- you just can't count on it.)
This all means that even if the current implementation happens to have certain behavioral characteristics, they should not be relied upon, nor should it be assumed that they will remain unchanged in new versions of the library.
Sequential stream pipeline thread
In which thread is the pipeline of a sequential stream executed? Is it always the calling thread or is an implementation free to choose any thread?
Current stream implementations may or may not use the calling thread, and may use one or multiple threads. As none of this is specified by the API, this behavior should not be relied on.
forEach execution thread
In which thread is the action parameter of the forEach terminal operation executed if the stream is sequential?
While current implementations use the existing thread, this cannot be relied on, as the documentation states that the choice of thread is up to the implementation. In fact, there are no guarantees that the elements aren't processed by different threads for different elements, though that is not something the current stream implementation does either.
Per the API:
For any given element, the action may be performed at whatever time and in whatever thread the library chooses.
Note that while the API calls out parallel streams specifically when discussing encounter order, Brian Goetz explained that this was done to clarify the motivation for the behavior, not because any of the behavior is specific to parallel streams:
The intent of calling out the parallel case explicitly here was pedagogical [...]. However, to a reader who is unaware of parallelism, it would be almost impossible to not assume that forEach would preserve encounter order, so this sentence was added to help clarify the motivation.
Synchronization using sequential streams
Do I have to use any synchronization when using sequential streams?
Current implementations will likely work since they use a single thread for the sequential stream's forEach method. However, as it is not guaranteed by the stream specification, it should not be relied on. Therefore, synchronization should be used as though the methods could be called by multiple threads.
That said, the stream documentation specifically recommends against using side-effects that would require synchronization, and suggests using reduction operations instead of mutable accumulators:
Many computations where one might be tempted to use side effects can be more safely and efficiently expressed without side-effects, such as using reduction instead of mutable accumulators. [...] A small number of stream operations, such as forEach() and peek(), can operate only via side-effects; these should be used with care.
As an example of how to transform a stream pipeline that inappropriately uses side-effects to one that does not, the following code searches a stream of strings for those matching a given regular expression, and puts the matches in a list.
ArrayList<String> results = new ArrayList<>();
stream.filter(s -> pattern.matcher(s).matches())
      .forEach(s -> results.add(s));  // Unnecessary use of side-effects!
This code unnecessarily uses side-effects. If executed in parallel, the non-thread-safety of ArrayList would cause incorrect results, and adding needed synchronization would cause contention, undermining the benefit of parallelism. Furthermore, using side-effects here is completely unnecessary; the forEach() can simply be replaced with a reduction operation that is safer, more efficient, and more amenable to parallelization:
List<String> results =
    stream.filter(s -> pattern.matcher(s).matches())
          .collect(Collectors.toList());  // No side-effects!
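For anyone who wants to try the side-effect-free version, a self-contained sketch might look like this. The class name, method name, and sample pattern are assumptions for the example, not part of the documentation quoted above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CollectMatchesSketch {
    /** Collects the strings that fully match the pattern; no shared mutable list is touched. */
    static List<String> matches(Stream<String> stream, Pattern pattern) {
        return stream.filter(s -> pattern.matcher(s).matches())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern letterThenDigits = Pattern.compile("[a-z]\\d+");
        // Works the same way whether the stream is sequential or parallel.
        System.out.println(matches(Stream.of("a1", "b", "c22").parallel(), letterThenDigits));
    }
}
```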
A stream's terminal operations are blocking operations. If there is no parallel execution, the thread that executes the terminal operation runs all the operations in the pipeline.
Definition 1.1. Pipeline is a couple of chained methods.
Definition 1.2. Intermediate operations can be located anywhere in the stream except at the end. They return a stream object and do not execute any operation in the pipeline.
Definition 1.3. Terminal operations are located only at the end of the stream. They execute the pipeline. They do not return a stream object, so no other intermediate or terminal operations can be added after them.
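The difference between the two kinds of operations can be observed with a small sketch like the one below. The class name and the counter are only for illustration, and the side effect inside map exists purely to make the laziness visible; it is not something to do in real code.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazyPipelineSketch {
    /** Returns {mapper calls before the terminal operation, mapper calls after it}. */
    static int[] countCalls() {
        AtomicInteger calls = new AtomicInteger();

        // Intermediate operation: only describes the pipeline, nothing runs yet.
        Stream<String> pipeline = Stream.of("a", "b", "c")
                .map(s -> {
                    calls.incrementAndGet(); // demonstration-only side effect
                    return s.toUpperCase();
                });
        int before = calls.get();

        // Terminal operation: this is what actually executes the pipeline.
        List<String> result = pipeline.collect(Collectors.toList());
        int after = calls.get();

        return new int[] {before, after, result.size()};
    }

    public static void main(String[] args) {
        int[] counts = countCalls();
        System.out.println(counts[0] + " calls before, " + counts[1] + " calls after");
    }
}
```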
From the first solution we can conclude that the calling thread will execute the action method inside the forEach terminal operation on each element in the calling stream.
Java 8 introduced the Spliterator interface. It has the capabilities of an Iterator, plus a set of operations that help with splitting a task and performing it in parallel.
When calling forEach from primitive streams in sequential execution, the calling thread will invoke the Spliterator.forEachRemaining method:
@Override
public void forEach(IntConsumer action) {
    if (!isParallel()) {
        // Sequential case: traverse the source spliterator directly
        adapt(sourceStageSpliterator()).forEachRemaining(action);
    }
    else {
        super.forEach(action);
    }
}
You can read more on Spliterator in my tutorial: Part 6 - Spliterator
As long as you don't mutate any state shared between multiple threads in one of the stream operations (and doing so is forbidden, as explained shortly), you do not need any additional synchronization tool or algorithm when you want to run parallel streams.
Stream operations like reduce use accumulator and combiner functions for executing parallel streams. The streams library by definition forbids mutation. You should avoid it.
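As a small sketch of what an accumulator and a combiner look like in practice, the following uses the three-argument Stream.reduce on a parallel stream. The class and method names are hypothetical; both functions are stateless and mutate nothing.

```java
import java.util.Arrays;
import java.util.List;

final class ReduceSketch {
    /** Sums the lengths of all words without any shared mutable state. */
    static int totalLength(List<String> words) {
        return words.parallelStream()
                    .reduce(0,                                  // identity
                            (sum, word) -> sum + word.length(), // accumulator: folds one element in
                            Integer::sum);                      // combiner: merges partial results
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("ab", "cde", "f")));
    }
}
```

Each worker thread folds its own chunk with the accumulator, and the combiner merges the partial sums, so no synchronization is needed in user code.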
There are a lot of definitions in concurrent and parallel programming. I will introduce a set of definitions that will serve us best.
Definition 8.1. Concurrent programming is the ability to solve a task using additional synchronization algorithms.
Definition 8.2. Parallel programming is the ability to solve a task without using additional synchronization algorithms.
You can read more about it in my tutorial: Part 7 - Parallel Streams.
I have a program with 3 threads (excluding the main thread). The first thread moves an object across the window, the second thread checks for object collisions, and the third is supposed to add to the ArrayList of objects periodically. All three of these threads are manipulating the same list of objects (Though the first 2 are not actually changing the list, just the objects inside). However, when the thread meant to add to the list tries to add an object, I receive an error. Is it possible to manipulate an ArrayList from a different thread?
You can prevent the race conditions by placing the code that manipulates the array list inside synchronized(arrayList) { ... } blocks.
There is nothing special about ArrayList which prevents it from being read and written from multiple threads. However, note the warning in the Javadoc:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list:
List list = Collections.synchronizedList(new ArrayList(...));
It is also worth reading through the Synchronization Tutorial.
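A minimal sketch of the Collections.synchronizedList approach from the Javadoc quote, assuming several threads that only add elements (the class name, method name, and thread counts are made up for the example):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SynchronizedListSketch {
    /** Lets several threads add to one list, then counts the elements. */
    static int addFromThreads(int threads, int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i); // each add() is synchronized by the wrapper
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Iteration is a compound operation, so the caller must hold the list's lock.
        int count = 0;
        synchronized (list) {
            for (Integer ignored : list) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(addFromThreads(4, 1000));
    }
}
```

Note that the wrapper only protects the list structure; the objects stored in the list still need their own protection if several threads change them.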
Yes you can handle the array in multiple threads. You can read more in the Java documentation about using the synchronized keyword with objects.
First, if you have a multithreaded application, prefer something like Vector over ArrayList, since ArrayList is not thread-safe.
Also, for handling concurrency,
you can write a synchronized method and perform the operations in it, or use a synchronized block.
The collect operation in Java 8 Stream API is defined as a mutable reduction that can be safely executed in parallel, even if the resulting Collection is not thread safe.
Can we say the same about the Stream.toArray() method?
Is this method a mutable reduction that is thread safe even if the Stream is a parallel stream and the resulting array is not thread safe?
According to https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#toArray-java.util.function.IntFunction- it should be since they create
... any additional arrays that might be required for a partitioned execution or for resizing
And by extension, since Stream.toArray() is nothing but stream.toArray(Object[]::new), it should hold for Stream.toArray() too.
The toArray operation is a kind of Mutable Reduction, though not implemented exactly like the collect operation. Instead, it’s more efficient in some cases. But these are unspecified implementation details. The documentation of toArray itself does not say anything about how it is implemented, so regarding your question, you have to resort to more general statements:
package documentation, “Parallelism”
… All streams operations can execute either in serial or in parallel.
…
Except for operations identified as explicitly nondeterministic, such as findAny(), whether a stream executes sequentially or in parallel should not change the result of the computation.
…
Most stream operations accept parameters that describe user-specified behavior, which are often lambda expressions. To preserve correct behavior, these behavioral parameters must be non-interfering, and in most cases must be stateless.
So regardless of how it’s implemented, toArray is a stream operation that can run in parallel and since it’s not specified to have any restrictions or nondeterministic behavior, it will produce the same (correct) result as in sequential mode. That’s the only thing you have to think about.
But if you use the overloaded method toArray(IntFunction), it’s your responsibility to provide an appropriate function, e.g. SomeType[]::new is always non-interfering and stateless so the form toArray(SomeType[]::new) is also thread safe.
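A short sketch of toArray(SomeType[]::new) on a parallel stream, with String standing in for SomeType; the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class ParallelToArraySketch {
    /** Builds the labels in parallel; the generator String[]::new is stateless. */
    static String[] labels(int n) {
        return IntStream.range(0, n)
                        .parallel()
                        .mapToObj(i -> "item-" + i)
                        .toArray(String[]::new);
    }

    /** Same pipeline, run sequentially, for comparison. */
    static String[] labelsSequential(int n) {
        return IntStream.range(0, n)
                        .mapToObj(i -> "item-" + i)
                        .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(labels(1000), labelsSequential(1000)));
    }
}
```

Because the source is ordered, the parallel result has the same contents in the same order as the sequential one.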
Does the non-interference requirement for using streams of non-concurrent data structure sources mean that we can't change the state of an element of the data structure during the execution of a stream pipeline (in addition to that we can't change the source data structure itself)? (Question 1)
In the section about non-interference in the stream package description, it is said:
"For most data sources, preventing interference means ensuring that the data source is not modified at all during the execution of the stream pipeline."
This passage does not mention modifying the state of elements?
For example, assuming "shapes" is non-thread-safe collection (such as ArrayList), is the code below considered to have an interference? (Question 2)
shapes.stream()
.filter(s -> s.getColor() == BLUE)
.forEach(s -> s.setColor(RED));
This example is taken from a reliable source (to say the least), so it should be correct.
But what if I changed stream() to be parallelStream(), will it still be safe and correct? (Question 3)
On the other hand, "Mastering Lambdas" by Maurice Naftalin, another reliable source, makes it clear that changing the state (value) of elements by a pipeline operation is indeed interference. From the section about non-interference (3.2.3):
"But the rules for streams forbid any modification of stream sources—including, for example, changing the value of an element— by any thread, not only pipeline operations."
If what's said in the book is correct, does it mean we can't use the Stream API to modify state of elements (using forEach), and have to do that using the regular iterator (or for-each, or Iterable.forEach)? (Question 4)
There's a bigger class of functions called "functions with side effects". The JavaDoc statement is correct and complete: here interference means modifying the mutable source. Another case is stateful expressions: expressions which depend on the application state or change this state. You may read the Parallelism tutorial on Oracle site.
In general you can modify the stream elements themselves, and this is not called "interference". Beware, though, if the same mutable object is produced several times by the stream source (for example, using Collections.nCopies(10, new MyMutableObject()).parallelStream()). While it is ensured that the same stream element is not processed concurrently by several threads, if your stream produces the same object twice, you may well have a race condition when modifying it in forEach, for example.
So while stateful expressions are sometimes a code smell and should be used with care and avoided if there's a stateless alternative, they are probably OK if they don't interfere with the stream source. When a stateless expression is required (for example, in the Stream.map method), it is specifically mentioned in the API docs. The forEach documentation requires only non-interference.
So back to your questions:
Question 1: no, we can change the element state, and it's not called interference (though it is called statefulness)
Question 2: no, it has no interference (unless you have repeating objects in your stream source)
Question 3: you can safely use parallelStream() there
Question 4: no, you can use Stream API in this case.
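The recoloring example can be tried with a sketch like the following. Shape, Color, and the way the list is filled are assumptions for the illustration; the important part is that every element is a distinct object, so no two threads ever touch the same Shape.

```java
import java.util.ArrayList;
import java.util.List;

final class RecolorSketch {
    enum Color { RED, BLUE, GREEN }

    /** Hypothetical mutable element type. */
    static final class Shape {
        private Color color;
        Shape(Color color) { this.color = color; }
        Color getColor() { return color; }
        void setColor(Color color) { this.color = color; }
    }

    /** Creates n distinct shapes (alternating BLUE/GREEN) and turns the blue ones red. */
    static List<Shape> recolor(int n) {
        List<Shape> shapes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            shapes.add(new Shape(i % 2 == 0 ? Color.BLUE : Color.GREEN));
        }
        // The list itself is not modified, only the state of its (distinct) elements.
        shapes.parallelStream()
              .filter(s -> s.getColor() == Color.BLUE)
              .forEach(s -> s.setColor(Color.RED));
        return shapes;
    }

    public static void main(String[] args) {
        long red = recolor(1000).stream().filter(s -> s.getColor() == Color.RED).count();
        System.out.println(red);
    }
}
```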
Modifying the state of an object stored in a data structure is different from reassigning an element of a data structure.
When the author writes "changing the value of an element", presumably they mean something like assigning a new object to an index of an existing List.
From your link:
It is best to avoid any side-effects in the lambdas passed to stream methods. While some side-effects, such as debugging statements that print out values are usually safe, accessing mutable state from these lambdas can cause data races or surprising behavior since lambdas may be executed from many threads simultaneously, and may not see elements in their natural encounter order. Non-interference includes not only not interfering with the source, but not interfering with other lambdas; this sort of interference can arise when one lambda modifies mutable state and another lambda reads it.
As long as the non-interference requirement is satisfied, we can execute parallel operations safely and with predictable results even on non-thread-safe sources such as ArrayList.
This pertains specifically to parallelism and is no different than any other concurrent programming. Modifying state can cause issues with visibility amongst threads.
I have a controller class that runs in thread A and composes a local variable list like this
Thread A
list = new ArrayList<Map<String, Order>>();
list.add(...);
list.add(...);
where Order is a java bean with several primitive properties like String, int, long, etc.
Once this list is constructed, its reference is passed to a UI thread (thread B) of Activity and accessed there. The cross-thread communication is done using a Handler class + post() method.
So the question is, can I access the list data from thread B without synchronization at all? Please note, that after constructed in thread A, the list will not be accessed/modified at all. It just exists like a local variable and is passed to thread B afterwards.
It is safe. The synchronization done at the message queue establishes a happens-before relationship. This of course assumes that you don't modify the Maps either afterwards. Also any objects contained in the maps, and so on must not be modified by other threads without proper synchronization.
In short, if neither the list nor any of the data within it is modified by any thread other than B, you don't need any further synchronization.
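Android's Handler cannot be run outside Android, so the sketch below swaps in a plain single-thread ExecutorService to show the same idea: the list is built on one thread and handed to another through a queue that provides the happens-before relationship (per the java.util.concurrent package documentation, actions prior to submitting a task happen-before its execution begins). The class name, the map key, and the Integer stand-in for Order are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ExecutorHandOffSketch {
    /** Builds the list on the calling thread and reads it on a worker thread. */
    static int handOff() {
        // "Thread A": build the list; it is not touched again after submission.
        List<Map<String, Integer>> list = new ArrayList<>();
        Map<String, Integer> entry = new HashMap<>();
        entry.put("order-1", 42); // Integer stands in for the Order bean
        list.add(entry);

        // "Thread B": a single worker thread playing the role of the UI thread.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> list.get(0).get("order-1");
            Future<Integer> result = worker.submit(task); // hand-off point
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(handOff());
    }
}
```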
It is not clear from the context you provide where this happens:
list = new ArrayList<Map<String, Order>>();
list.add(...);
list.add(...);
If it is in a constructor, list is final, the this reference does not leak from the constructor, you are absolutely sure that list won't change (for example, by using the unmodifiableList decorator method), and the references to the Order instances are not accessible from elsewhere, then it may be OK not to use synchronization. Otherwise you have a Sword of Damocles over your head.
I mentioned the Order references because you may not get exceptions if you change them from somewhere else but it may lead to data inconsistency/corruption.
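The conditions listed above (final field, populated in the constructor, no leaked this, unmodifiable view) could be combined in a holder class along these lines. OrderBook is a hypothetical name, and String stands in for the Order bean to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class OrderBook {
    // final field: assigned once in the constructor and never reassigned
    private final List<String> orders;

    OrderBook(List<String> source) {
        // Defensive copy, so later changes to 'source' cannot reach this object,
        // wrapped in an unmodifiable view so callers cannot change it either.
        this.orders = Collections.unmodifiableList(new ArrayList<>(source));
        // 'this' is not passed to anyone from inside the constructor.
    }

    List<String> orders() {
        return orders;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        source.add("order-1");
        OrderBook book = new OrderBook(source);
        source.add("order-2"); // does not affect the book
        System.out.println(book.orders());
    }
}
```

The elements themselves would also need to be immutable, or never modified after construction, for the whole structure to be safe to share.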
If you can guarantee that the list will not be modified, then you don't need synchronization, since all threads will always see the same List.
Yes, no need to synchronize if you're only going to read the data.
Note that even if thread A is going to eventually modify the list while thread B (or any other number of threads) is accessing it, you still do not have to synchronize because there's only one writer at any given time.
Sorry, the above statement is not completely correct. As stated in the JavaDoc:
If multiple threads access an ArrayList instance concurrently, and at
least one of the threads modifies the list structurally, it must be
synchronized externally. (A structural modification is any operation
that adds or deletes one or more elements, or explicitly resizes the
backing array; merely setting the value of an element is not a
structural modification.)
Also note I'm not taking into account element modification but purely list modifications.