I understand that these methods differ the order of execution but in all my test I cannot achieve different order execution.
Example:
System.out.println("forEach Demo");
Stream.of("AAA","BBB","CCC").forEach(s->System.out.println("Output:"+s));
System.out.println("forEachOrdered Demo");
Stream.of("AAA","BBB","CCC").forEachOrdered(s->System.out.println("Output:"+s));
Output:
forEach Demo
Output:AAA
Output:BBB
Output:CCC
forEachOrdered Demo
Output:AAA
Output:BBB
Output:CCC
Please provide examples when 2 methods will produce different outputs.
Stream.of("AAA","BBB","CCC").parallel().forEach(s->System.out.println("Output:"+s));
Stream.of("AAA","BBB","CCC").parallel().forEachOrdered(s->System.out.println("Output:"+s));
The second line will always output
Output:AAA
Output:BBB
Output:CCC
whereas the first one is not guaranted since the order is not kept. forEachOrdered will processes the elements of the stream in the order specified by its source, regardless of whether the stream is sequential or parallel.
Quoting from forEach Javadoc:
The behavior of this operation is explicitly nondeterministic. For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism.
When the forEachOrdered Javadoc states (emphasis mine):
Performs an action for each element of this stream, in the encounter order of the stream if the stream has a defined encounter order.
Although forEach shorter and looks prettier, I'd suggest to use forEachOrdered in every place where order matters to explicitly specify this. For sequential streams the forEach seems to respect the order and even stream API internal code uses forEach (for stream which is known to be sequential) where it's semantically necessary to use forEachOrdered! Nevertheless you may later decide to change your stream to parallel and your code will be broken. Also when you use forEachOrdered the reader of your code sees the message: "the order matters here". Thus it documents your code better.
Note also that for parallel streams the forEach not only executed in non-determenistic order, but you can also have it executed simultaneously in different threads for different elements (which is not possible with forEachOrdered).
Finally both forEach/forEachOrdered are rarely useful. In most of the cases you actually need to produce some result, not just side-effect, thus operations like reduce or collect should be more suitable. Expressing reducing-by-nature operation via forEach is usually considered as a bad style.
forEach() method performs an action for each element of this stream. For parallel stream, this operation does not guarantee to maintain order of the stream.
forEachOrdered() method performs an action for each element of this stream, guaranteeing that each element is processed in encounter order for streams that have a defined encounter order.
take the example below:
String str = "sushil mittal";
System.out.println("****forEach without using parallel****");
str.chars().forEach(s -> System.out.print((char) s));
System.out.println("\n****forEach with using parallel****");
str.chars().parallel().forEach(s -> System.out.print((char) s));
System.out.println("\n****forEachOrdered with using parallel****");
str.chars().parallel().forEachOrdered(s -> System.out.print((char) s));
Output:
****forEach without using parallel****
sushil mittal
****forEach with using parallel****
mihul issltat
****forEachOrdered with using parallel****
sushil mittal
We may lose the benefits of parallelism if we use forEachOrdered() with parallel Streams.
As we know, In parallel programming element will print parallelly if we use forEach() method. so the order will not be fixed. But In the use of forEachOrdered() fixed order in parallel Streams.
Stream.of("AAA","BBB","CCC").forEachOrdered(s->System.out.println("Output:"+s));
Output:AAA
Output:BBB
Output:CCC
Related
Does filter chaining change the outcome if i use parallelStream() instead of stream() ?
I tried with a few thousand records, and the output appeared consistent over a few iterations. But since this involves threads,(and I could not find enough relevant material that talks about this combination) I want to make doubly sure that parallel stream does not impact the output of filter chaining in any way. Example code:
List<Element> list = myList.parallelStream()
.filter(element -> element.getId() > 10)
.filter(element -> element.getName().contains("something"))
.collect(Collectors.toList());
Short answer: No.
The filter operation as documented expects a non-interferening and stateless predicate to apply to each element to determine if it should be included as part of the new stream.
Few aspects that you shall consider for that are -
With an exception to concurrent collections(what do you choose as myList in the existing code to be) -
For most data sources, preventing interference means ensuring that the
data source is not modified at all during the execution of the stream
pipeline.
The state of the data sources (myList and its elements within your filter operations are not mutated)
Note also that attempting to access mutable state from behavioral
parameters presents you with a bad choice with respect to safety and
performance;
Moreover, think around it, what is it in your filter operation that would be impacted by multiple threads. Given the current code, nothing functionally, as long as both the operations are executed, you would get a consistent result regardless of the thread(s) executing them.
While reading the documentation about streams, I came across the following sentences:
... attempting to access mutable state from behavioral parameters presents you with a bad choice ... if you do not synchronize access to that state, you have a data race and therefore your code is broken ... [1]
If the behavioral parameters do have side-effects ... [there are no] guarantees that different operations on the "same" element within the same stream pipeline are executed in the same thread. [2]
For any given element, the action may be performed at whatever time and in whatever thread the library chooses. [3]
These sentences don't make a distinction between sequential and parallel streams. So my questions are:
In which thread is the pipeline of a sequential stream executed? Is it always the calling thread or is an implementation free to choose any thread?
In which thread is the action parameter of the forEach terminal operation executed if the stream is sequential?
Do I have to use any synchronization when using sequential streams?
[1+2] https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
[3] https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#forEach-java.util.function.Consumer-
This all boils down to what is guaranteed based on the specification, and the fact that a current implementation may have additional behaviors beyond what is guaranteed.
Java Language Architect Brian Goetz made a relevant point regarding specifications in a related question:
Specifications exist to describe the minimal guarantees a caller can depend on, not to describe what the implementation does.
[...]
When a specification says "does not preserve property X", it does not mean that the property X may never be observed; it means the implementation is not obligated to preserve it. [...] (HashSet doesn't promise that iterating its elements preserves the order they were inserted, but that doesn't mean this can't accidentally happen -- you just can't count on it.)
This all means that even if the current implementation happens to have certain behavioral characteristics, they should not be relied upon nor assumed that they will not change in new versions of the library.
Sequential stream pipeline thread
In which thread is the pipeline of a sequential stream executed? Is it always the calling thread or is an implementation free to choose any thread?
Current stream implementations may or may not use the calling thread, and may use one or multiple threads. As none of this is specified by the API, this behavior should not be relied on.
forEach execution thread
In which thread is the action parameter of the forEach terminal operation executed if the stream is sequential?
While current implementations use the existing thread, this cannot be relied on, as the documentation states that the choice of thread is up to the implementation. In fact, there are no guarantees that the elements aren't processed by different threads for different elements, though that is not something the current stream implementation does either.
Per the API:
For any given element, the action may be performed at whatever time and in whatever thread the library chooses.
Note that while the API calls out parallel streams specifically when discussing encounter order, that was clarified by Brian Goetz to clarify the motivation of the behavior, and not that any of the behavior is specific to parallel streams:
The intent of calling out the parallel case explicitly here was pedagogical [...]. However, to a reader who is unaware of parallelism, it would be almost impossible to not assume that forEach would preserve encounter order, so this sentence was added to help clarify the motivation.
Synchronization using sequential streams
Do I have to use any synchronization when using sequential streams?
Current implementations will likely work since they use a single thread for the sequential stream's forEach method. However, as it is not guaranteed by the stream specification, it should not be relied on. Therefore, synchronization should be used as though the methods could be called by multiple threads.
That said, the stream documentation specifically recommends against using side-effects that would require synchronization, and suggest using reduction operations instead of mutable accumulators:
Many computations where one might be tempted to use side effects can be more safely and efficiently expressed without side-effects, such as using reduction instead of mutable accumulators. [...] A small number of stream operations, such as forEach() and peek(), can operate only via side-effects; these should be used with care.
As an example of how to transform a stream pipeline that inappropriately uses side-effects to one that does not, the following code searches a stream of strings for those matching a given regular expression, and puts the matches in a list.
ArrayList<String> results = new ArrayList<>();
stream.filter(s -> pattern.matcher(s).matches())
.forEach(s -> results.add(s)); // Unnecessary use of side-effects!
This code unnecessarily uses side-effects. If executed in parallel, the non-thread-safety of ArrayList would cause incorrect results, and adding needed synchronization would cause contention, undermining the benefit of parallelism. Furthermore, using side-effects here is completely unnecessary; the forEach() can simply be replaced with a reduction operation that is safer, more efficient, and more amenable to parallelization:
List<String>results =
stream.filter(s -> pattern.matcher(s).matches())
.collect(Collectors.toList()); // No side-effects!
Stream's terminal operations are blocking operations. In case there is no parallel excution, the thread that executes the terminal operation runs all the operations in the pipeline.
Definition 1.1. Pipeline is a couple of chained methods.
Definition 1.2. Intermediate operations will be located everywhere in the stream except at the end. They return a stream object and does not execute any operation in the pipeline.
Definition 1.3. Terminal operations will be located only at the end of the stream. They execute the pipeline. They does not return stream object so no other Intermidiate operations or terminal operations can be added after them.
From the first solution we can conclude that the calling thread will execute the action method inside the forEach terminal operation on each element in the calling stream.
Java 8 introduces us the Spliterator interface. It has the capabilities of Iterator but also a set of operations to help performing and spliting a task in parallel.
When calling forEach from primitive streams in sequential execution, the calling thread will invoke the Spliterator.forEachRemaining method:
#Override
public void forEach(IntConsumer action) {
if (!isParallel()) {
adapt(sourceStageSpliterator()).forEachRemaining(action);
}
else {
super.forEach(action);
}
}
You can read more on Spliterator in my tutorial: Part 6 - Spliterator
As long as you don't mutate any shared state between multiple threads in one of the stream operations(and it is forbidden - explained soon), you do not need to use any additional synchronization tool or algorithm when you want to run parallel streams.
Stream operations like reduce use accumulator and combiner functions for executing parallel streams. The streams library by definition forbids mutation. You should avoid it.
There are a lot of definitions in concurrent and parallel programming. I will introduce a set of definitions that will serve us best.
Definition 8.1. Concurrent programming is the ability to solve a task using additional synchronization algorithms.
Definition 8.2. Parallel programming is the ability to solve a task without using additional synchronization algorithms.
You can read more about it in my tutorial: Part 7 - Parallel Streams.
The collect operation in Java 8 Stream API is defined as a mutable reduction that can be safely executed in parallel, even if the resulting Collection is not thread safe.
Can we say the same about the Stream.toArray() method?
Is this method a mutable reduction that is thread safe even if the Stream is a parallel stream and the resulting array is not thread safe?
According to https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#toArray-java.util.function.IntFunction- it should be since they create
... any additional arrays that might be required for a partitioned execution or for resizing
And in deduction, since Stream.toArray() is nothing but stream.toArray(Object[]::new) it should hold for Stream.toArray() too.
The toArray operation is a kind of Mutable Reduction, though not implemented exactly like the collect operation. Instead, it’s more efficient in some cases. But these are unspecified implementation details. The documentation of toArray itself does not say anything about how it is implemented, so regarding your question, you have to resort to more general statements:
package documentation, “Parallelism”
… All streams operations can execute either in serial or in parallel.
…
Except for operations identified as explicitly nondeterministic, such as findAny(), whether a stream executes sequentially or in parallel should not change the result of the computation.
…
Most stream operations accept parameters that describe user-specified behavior, which are often lambda expressions. To preserve correct behavior, these behavioral parameters must be non-interfering, and in most cases must be stateless.
So regardless of how it’s implemented, toArray is a stream operation that can run in parallel and since it’s not specified to have any restrictions or nondeterministic behavior, it will produce the same (correct) result as in sequential mode. That’s the only thing you have to think about.
But if you use the overloaded method toArray(IntFunction), it’s your responsibility to provide an appropriate function, e.g. SomeType[]::new is always non-interfering and stateless so the form toArray(SomeType[]::new) is also thread safe.
Premise: I've already read this question and others, but I need some clarifications.
I understand that Stream.forEach method makes the difference (not only) when dealing with parallel streams, and this explains why this
//1
Stream.of("one ","two ","three ","four ","five ","six ")
.parallel()
.forEachOrdered(item -> System.out.print(item));
prints
one two three four five six
But when it comes to intermediate operations, the order is not guaranteed anymore when stream is parallelized. So this code
//2
Stream.of("one ","two ","three ","four ","five ","six ")
.parallel()
.peek(item -> System.out.print(item))
.forEachOrdered(item -> System.out.print(""));
will print something like
four six five one three two
Is it correct to say that forEachOrdered method only affects order of elements in its own execution? Intuitively, I'm thinking of //1 example being exactly the same as
//3
Stream.of("one ","two ","three ","four ","five ","six ")
.parallel()
.peek(item -> System.out.print("")) //or any other intermediate operation
.sequential()
.forEach(item -> System.out.print(item));
Is my intuition wrong? Am I missing something about the whole mechanism?
You are right in that the guarantees made for the action of forEachOrdered only apply to that action and nothing else. But it’s wrong to assume that this is the same as .sequential().forEach(…).
sequential will turn the entire stream pipeline into sequential mode, thus, the action passed to forEach will be executed by the same thread, but also the preceding peek’s action. For most intermediate operations, the exact placement of parallel or sequential is irrelevant and specifying both makes no sense as only the last one will be relevant.
Also, there is still no guaranty made about the ordering when using forEach, even if it hasn’t any consequences in the current implementation. This is discussed in “Does Stream.forEach respect the encounter order of sequential streams?”
The documentation of Stream.forEachOrdered states:
This operation processes the elements one at a time, in encounter order if one exists. Performing the action for one element happens-before performing the action for subsequent elements, but for any given element, the action may be performed in whatever thread the library chooses.
So the action may get invoked by different threads, as perceivable by Thread.currentThread() but not run concurrently.
Further, if the stream has an encounter order, it will get reconstituted at this place. This answer sheds some light one the difference of encounter order and processing order.
The #3 code is not conceptually equivalent to #1, because parallel() or sequential() calls affect the whole stream, not just the subsequent operations. So in #3 case the whole procedure will be performed sequentially. Actually #3 case resembles the early design of the Stream API when you actually could change the parallel/sequential mode. This was considered to be unnecessary complication (and actually added problems, see, for example, this discussion) as usually you only need to change mode to make the terminal operation ordered (but not necessarily sequential). So forEachOrdered() was added and parallel()/sequential() semantics was changed to affect the whole stream (see this changeset).
Basically you're right: in parallel stream there's no order guarantee for intermediate operations. If you need to perform them in particular order, you have to use the sequential stream.
Consider the following example:
IntStream.of(-1, 1)
.parallel()
.flatMap(i->IntStream.range(0,1000).parallel())
.forEach(System.out::println);
Does it matter whether I set the inner flag to parallel? The results look very similar if I leave it away or not.
Also why does the code (ReferencePipeline) sequentialize the mapping?
I am confused by the line:
result.sequential().forEach(downstream);
In the current JDK (jdk1.8.0_25), the answer is no, it doesn't matter you set the inner flag to parallel,
because even you set it, the .flatMap() implementation set's back the stream to sequential here:
result.sequential().forEach(downstream);
("result" is the inner stream and it's sequential() method's doc says: Returns an equivalent stream that is sequential. May return itself, either because the stream was already sequential, or because the underlying stream state was modified to be sequential.)
In most cases there could be no effort to make the inner stream parallel; if outer stream has at least same number of items as number of threads that can run parallel (ForkJoinPool.commonPool().getParallelism() = 3 in my computer).
For anyone like me, who has a dire need to parallelize flatMap and needs some practical solution, not only history and theory.
The simplest solution I came up with is to do flattening by hand, basically by replacing it with map + reduce(Stream::concat).
Already posted an answer with details in another thread:
https://stackoverflow.com/a/66386078/3606820