Parallelism and flatMap in Java 8 Streams

Consider the following example:
IntStream.of(-1, 1)
.parallel()
.flatMap(i->IntStream.range(0,1000).parallel())
.forEach(System.out::println);
Does it matter whether I set the inner flag to parallel? The results look very similar whether or not I include it.
Also, why does the code (ReferencePipeline) sequentialize the mapping?
I am confused by the line:
result.sequential().forEach(downstream);

In the current JDK (jdk1.8.0_25), the answer is no, it doesn't matter whether you set the inner flag to parallel, because even if you set it, the .flatMap() implementation sets the stream back to sequential here:
result.sequential().forEach(downstream);
("result" is the inner stream and it's sequential() method's doc says: Returns an equivalent stream that is sequential. May return itself, either because the stream was already sequential, or because the underlying stream state was modified to be sequential.)
In most cases there would be no benefit in making the inner stream parallel anyway: as long as the outer stream has at least as many items as there are threads that can run in parallel (ForkJoinPool.commonPool().getParallelism() = 3 on my machine), the pool is already fully occupied.
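For reference, you can query that value on your own machine (ForkJoinPool.commonPool().getParallelism() is standard API):

import java.util.concurrent.ForkJoinPool;

System.out.println(ForkJoinPool.commonPool().getParallelism());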

For anyone like me who has a dire need to parallelize flatMap and wants a practical solution, not just history and theory:
The simplest solution I came up with is to do the flattening by hand, basically by replacing flatMap with map + reduce(Stream::concat).
I already posted an answer with details in another thread:
https://stackoverflow.com/a/66386078/3606820
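A minimal sketch of that map + reduce(Stream::concat) flattening, applied to the example from the question (my illustration, not the code from the linked answer). Stream.concat returns a parallel stream if either input is parallel, so the merged stream stays splittable; note that the Stream.concat docs warn that deeply nested concatenation over a large outer stream can cause a StackOverflowError.

import java.util.stream.IntStream;
import java.util.stream.Stream;

IntStream.of(-1, 1)
    .parallel()
    .mapToObj(i -> IntStream.range(0, 1000).parallel().boxed())
    .reduce(Stream::concat)        // flatten by hand instead of flatMap
    .orElseGet(Stream::empty)
    .forEach(System.out::println);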

Related

Why AtomicInteger based Stream solutions are not recommended?

Say I have this list of fruits:-
List<String> f = Arrays.asList("Banana", "Apple", "Grape", "Orange", "Kiwi");
I need to prepend a serial number to each fruit and print it. The order of fruit or serial number does not matter. So this is a valid output:-
4. Kiwi
3. Orange
1. Grape
2. Apple
5. Banana
Solution #1
AtomicInteger number = new AtomicInteger(0);
String result = f.parallelStream()
.map(i -> String.format("%d. %s", number.incrementAndGet(), i))
.collect(Collectors.joining("\n"));
Solution #2
String result = IntStream.rangeClosed(1, f.size())
.parallel()
.mapToObj(i -> String.format("%d. %s", i, f.get(i - 1)))
.collect(Collectors.joining("\n"));
Question
Why is solution #1 a bad practice? I have seen in a lot of places that AtomicInteger-based solutions are bad (like in this answer), especially in parallel stream processing (that's the reason I used parallel streams above: to try to run into issues).
I looked at these questions/answers:-
In which cases Stream operations should be stateful?
Is use of AtomicInteger for indexing in Stream a legit way?
Java 8: Preferred way to count iterations of a lambda?
They just mention (unless I missed something) "unexpected results can occur". Like what? Can it happen in this example? If not, can you provide me an example where it can happen?
As for "no guarantees are made as to the order in which the mapper function is applied", well, that's the nature of parallel processing, so I accept it, and also, the order doesn't matter in this particular example.
AtomicInteger is thread safe, so it shouldn't be a problem in parallel processing.
Can someone provide examples in which cases there will be issues while using such a state-based solution?
Well, look at the answer from Stuart Marks here: he is using a stateful predicate.
There are a couple of potential problems, but if you don't care about them or really understand them, you should be fine.
First is order, exhibited under the current implementation for parallel processing; but if you don't care about order, as in your example, you are OK.
Second is speed: an AtomicInteger will be several times slower to increment than a plain int, if you care about that.
Third one is more subtle. Sometimes there is no guarantee that map will be executed at all; for example, since java-9:
someStream.map(i -> /* do something with i and numbers */)
.count();
The point here is that since you are counting, there is no need to do the mapping, so it's skipped. In general, the elements that reach some intermediate operation are not guaranteed to reach the terminal one. Imagine a map.filter.map situation: the first map might "see" more elements than the second one, because some elements might be filtered out in between. So it's not recommended to rely on this, unless you can reason exactly about what is going on.
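A small demonstration of that count() elision (my sketch; the behavior is described in the API note of Stream.count() since Java 9):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

AtomicInteger calls = new AtomicInteger();
long n = Stream.of("a", "b", "c")
    .map(s -> { calls.incrementAndGet(); return s; })
    .count();
// On Java 9+ the size can be computed from the source without running the
// pipeline, so calls.get() can be 0 here; on Java 8 it is 3.
System.out.println("count = " + n + ", map invocations = " + calls.get());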
In your example, IMO, you are more than safe to do what you do; but if you change your code slightly, it would require additional reasoning to prove its correctness. I would go with solution 2, just because it's a lot easier for me to understand, and it does not have the potential problems listed above.
Note also that attempting to access mutable state from behavioral parameters presents you with a bad choice with respect to safety and performance; if you do not synchronize access to that state, you have a data race and therefore your code is broken, but if you do synchronize access to that state, you risk having contention undermine the parallelism you are seeking to benefit from. The best approach is to avoid stateful behavioral parameters to stream operations entirely; there is usually a way to restructure the stream pipeline to avoid statefulness.
Package java.util.stream, Stateless behaviors
From the perspective of thread-safety and correctness, there is nothing wrong with solution 1. Performance (as an advantage of parallel processing) might suffer, though.
Why is solution #1 a bad practice?
I wouldn't say it's a bad practice or something unacceptable. It's simply not recommended for the sake of performance.
They just mention (unless I missed something) "unexpected results can occur". Like what?
"Unexpected results" is a very broad term, and usually refers to improper synchronisation, "What's the hell just happened?"-like behaviour.
Can it happen in this example?
Not in this case. You are likely not going to run into issues here.
If not, can you provide me an example where it can happen?
Change the AtomicInteger to an int*, replace number.incrementAndGet() with ++number, and you will have one.
*a boxed int (e.g. wrapper-based, array-based) so you can work with it within a lambda
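A sketch of that broken variant (hypothetical code; scaled up to 100,000 elements because with only five the race will rarely show):

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

int[] number = {0}; // a plain int boxed in an array: NOT thread-safe
List<String> result = IntStream.range(0, 100_000)
    .parallel()
    .mapToObj(i -> String.valueOf(++number[0]))
    .collect(Collectors.toList());
// ++number[0] is an unsynchronized read-modify-write; under contention
// updates are lost, so expect duplicate serials and a final value below 100000.
System.out.println("distinct serials: " + result.stream().distinct().count());
System.out.println("final counter:    " + number[0]);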
Case 2 - The API notes of IntStream.rangeClosed say it returns a sequential, ordered IntStream from startInclusive (inclusive) to endInclusive (inclusive) by an incremental step of 1, like a for loop; each index i is deterministically paired with f.get(i - 1), so the numbering comes out correct.
 * @param startInclusive the (inclusive) initial value
 * @param endInclusive the inclusive upper bound
 * @return a sequential {@code IntStream} for the range of {@code int}
 *         elements
 */
public static IntStream rangeClosed(int startInclusive, int endInclusive) {
Case 1 - It is obvious that the list will be processed in parallel, so the pairing of numbers to fruits will not follow the list order. Since the mapping operation is performed in parallel, the results for the same input could vary from run to run due to thread scheduling differences: there is no guarantee that different operations on the "same" element within the same stream pipeline are executed in the same thread, nor is there any guarantee how the mapper function is applied to particular elements within the stream.
Source: Javadoc
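To make the Case 1 variability concrete, a small experiment (my sketch, not from the original answer): run it several times and compare which fruit receives which number.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

List<String> f = Arrays.asList("Banana", "Apple", "Grape", "Orange", "Kiwi");
AtomicInteger number = new AtomicInteger(0);
f.parallelStream()
    .map(i -> String.format("%d. %s", number.incrementAndGet(), i))
    .forEach(System.out::println);
// Each number 1..5 appears exactly once (AtomicInteger is atomic), but the
// fruit-to-number pairing may differ between runs.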

Java Stream: difference between forEach and forEachOrdered

Premise: I've already read this question and others, but I need some clarifications.
I understand that the Stream.forEachOrdered method makes the difference (not only) when dealing with parallel streams, and this explains why this
//1
Stream.of("one ","two ","three ","four ","five ","six ")
.parallel()
.forEachOrdered(item -> System.out.print(item));
prints
one two three four five six
But when it comes to intermediate operations, the order is no longer guaranteed when the stream is parallelized. So this code
//2
Stream.of("one ","two ","three ","four ","five ","six ")
.parallel()
.peek(item -> System.out.print(item))
.forEachOrdered(item -> System.out.print(""));
will print something like
four six five one three two
Is it correct to say that the forEachOrdered method only affects the order of elements in its own execution? Intuitively, I'm thinking of the //1 example as being exactly the same as
//3
Stream.of("one ","two ","three ","four ","five ","six ")
.parallel()
.peek(item -> System.out.print("")) //or any other intermediate operation
.sequential()
.forEach(item -> System.out.print(item));
Is my intuition wrong? Am I missing something about the whole mechanism?
You are right in that the guarantees made for the action of forEachOrdered only apply to that action and nothing else. But it’s wrong to assume that this is the same as .sequential().forEach(…).
sequential will turn the entire stream pipeline into sequential mode; thus, the action passed to forEach will be executed by the same thread, and so will the preceding peek's action. For most intermediate operations, the exact placement of parallel or sequential is irrelevant, and specifying both makes no sense, as only the last one will be relevant.
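A minimal way to see that only the last mode call counts (BaseStream.isParallel() reports the mode of the whole pipeline):

import java.util.stream.Stream;

System.out.println(Stream.of(1, 2, 3).parallel().sequential().isParallel()); // false
System.out.println(Stream.of(1, 2, 3).sequential().parallel().isParallel()); // true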
Also, there is still no guarantee made about the ordering when using forEach, even if this has no visible consequences in the current implementation. This is discussed in "Does Stream.forEach respect the encounter order of sequential streams?"
The documentation of Stream.forEachOrdered states:
This operation processes the elements one at a time, in encounter order if one exists. Performing the action for one element happens-before performing the action for subsequent elements, but for any given element, the action may be performed in whatever thread the library chooses.
So the action may get invoked by different threads, as perceivable by Thread.currentThread() but not run concurrently.
Further, if the stream has an encounter order, it will get reconstituted at this place. This answer sheds some light on the difference between encounter order and processing order.
The #3 code is not conceptually equivalent to #1, because parallel() and sequential() calls affect the whole stream, not just the subsequent operations. So in the #3 case the whole procedure will be performed sequentially. Actually, the #3 case resembles the early design of the Stream API, when you could in fact change the parallel/sequential mode mid-pipeline. This was considered an unnecessary complication (and it actually added problems; see, for example, this discussion), as usually you only need to change the mode to make the terminal operation ordered (but not necessarily sequential). So forEachOrdered() was added, and the parallel()/sequential() semantics were changed to affect the whole stream (see this changeset).
Basically you're right: in a parallel stream there's no order guarantee for intermediate operations. If you need to perform them in a particular order, you have to use a sequential stream.

forEach vs forEachOrdered in Java 8 Stream

I understand that these methods differ in the order of execution, but in all my tests I cannot produce a different execution order.
Example:
System.out.println("forEach Demo");
Stream.of("AAA","BBB","CCC").forEach(s->System.out.println("Output:"+s));
System.out.println("forEachOrdered Demo");
Stream.of("AAA","BBB","CCC").forEachOrdered(s->System.out.println("Output:"+s));
Output:
forEach Demo
Output:AAA
Output:BBB
Output:CCC
forEachOrdered Demo
Output:AAA
Output:BBB
Output:CCC
Please provide examples when 2 methods will produce different outputs.
Stream.of("AAA","BBB","CCC").parallel().forEach(s->System.out.println("Output:"+s));
Stream.of("AAA","BBB","CCC").parallel().forEachOrdered(s->System.out.println("Output:"+s));
The second line will always output
Output:AAA
Output:BBB
Output:CCC
whereas the output of the first one is not guaranteed, since the order is not kept. forEachOrdered will process the elements of the stream in the order specified by its source, regardless of whether the stream is sequential or parallel.
Quoting from forEach Javadoc:
The behavior of this operation is explicitly nondeterministic. For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism.
whereas the forEachOrdered Javadoc states (emphasis mine):
Performs an action for each element of this stream, in the encounter order of the stream if the stream has a defined encounter order.
Although forEach is shorter and looks prettier, I'd suggest using forEachOrdered in every place where order matters, to specify this explicitly. For sequential streams, forEach seems to respect the order, and even the stream API's internal code uses forEach (for a stream known to be sequential) where it would be semantically necessary to use forEachOrdered! Nevertheless, you may later decide to change your stream to parallel, and your code will be broken. Also, when you use forEachOrdered, the reader of your code sees the message: "the order matters here". Thus it documents your code better.
Note also that for parallel streams, forEach is not only executed in a non-deterministic order; it can also run simultaneously in different threads for different elements (which is not possible with forEachOrdered).
Finally, both forEach and forEachOrdered are rarely useful. In most cases you actually need to produce some result, not just a side effect, so operations like reduce or collect are more suitable. Expressing a reducing-by-nature operation via forEach is usually considered bad style.
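For illustration, a small sketch of that preference: the same list built via a forEach side effect versus via collect (the collect version also stays correct if the stream is later made parallel).

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Side-effect style: mutates an external list from inside forEach
List<String> sideEffect = new ArrayList<>();
Stream.of("AAA", "BBB", "CCC").forEach(sideEffect::add);

// Reduction style: the terminal operation itself produces the result
List<String> collected = Stream.of("AAA", "BBB", "CCC")
    .collect(Collectors.toList());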
The forEach() method performs an action for each element of this stream. For a parallel stream, this operation does not guarantee to maintain the order of the stream.
The forEachOrdered() method performs an action for each element of this stream, guaranteeing that each element is processed in encounter order for streams that have a defined encounter order.
take the example below:
String str = "sushil mittal";
System.out.println("****forEach without using parallel****");
str.chars().forEach(s -> System.out.print((char) s));
System.out.println("\n****forEach with using parallel****");
str.chars().parallel().forEach(s -> System.out.print((char) s));
System.out.println("\n****forEachOrdered with using parallel****");
str.chars().parallel().forEachOrdered(s -> System.out.print((char) s));
Output:
****forEach without using parallel****
sushil mittal
****forEach with using parallel****
mihul issltat
****forEachOrdered with using parallel****
sushil mittal
We may lose the benefits of parallelism if we use forEachOrdered() with parallel streams.
As we know, with forEach() the elements of a parallel stream are processed in parallel, so the order is not fixed; forEachOrdered(), in contrast, keeps the order fixed even in parallel streams.
Stream.of("AAA","BBB","CCC").forEachOrdered(s->System.out.println("Output:"+s));
Output:AAA
Output:BBB
Output:CCC

Can Stream#limit return fewer elements than expected?

If the Stream s below has at least n elements, what are the situations where the stream sLimit may have fewer than n elements, if any?
Stream sLimit = s.limit(n);
Reason for the question: in this answer, I read that:
Despite the appearances, using limit(10) doesn't necessarily result in a SIZED stream with exactly 10 elements -- it might have fewer.
You misunderstood the statement. If the Stream has at least n elements and you invoke limit(n) on it, it will have exactly n elements, but the Stream implementation might not be aware of it and hence perform less than optimally.
In contrast, certain Stream sources (Spliterators) know for sure that they have a fixed size, e.g. when creating a Stream for an array or an IntStream via IntStream.range. They can be optimized better than a Stream with a limit(n).
When you create a parallel Stream via Stream.generate(MyClass::new).limit(10), the constructor will still be invoked sequentially and only follow-up operations might run in parallel. In contrast, when using IntStream.range(0, n).mapToObj(i -> new MyClass()), the entire Stream operation, including the constructor calls, can run in parallel.
I think Holger's and Sotirios' answers are accurate, but inasmuch as I'm the guy who made the statement, I guess I should explain myself.
I'm mainly talking about spliterator characteristics, in particular the SIZED characteristic. This is basically "static" information about the stream stages that is known at pipeline setup time, but before the stream actually executes. Indeed, it's used for determining the execution strategy for the stream, so it has to be known before the stream executes.
The limit() operation creates a spliterator that wraps its upstream spliterator, so the limit spliterator needs to determine what characteristics to return. Even if its upstream spliterator is SIZED, it doesn't know the exact size, so it has to turn off the SIZED characteristic.
So if you, the programmer, were to write:
IntStream.range(0, 100).limit(10)
you'd say of course that stream has exactly 10 elements. (And it will.) But the resulting spliterator is still not SIZED. After all, the limit operator doesn't know the difference between the above and this:
IntStream.range(0, 1).limit(10)
at least in terms of spliterator characteristics.
So that's why, even though there are times when it seems like it ought to, the limit operator doesn't return a stream of known size. This in turn affects the splitting strategy, which impacts parallel efficiency.
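A small check of this, under the JDK behavior described above (hasCharacteristics and Spliterator.SIZED are standard API; the exact flags could differ in other versions):

import java.util.Spliterator;
import java.util.stream.IntStream;

Spliterator.OfInt plain = IntStream.range(0, 100).spliterator();
System.out.println(plain.hasCharacteristics(Spliterator.SIZED));   // true

Spliterator.OfInt limited = IntStream.range(0, 100).limit(10).spliterator();
System.out.println(limited.hasCharacteristics(Spliterator.SIZED)); // false: limit turned it off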

Mutable parameters in Java 8 Streams

Looking at this question: How to dynamically do filtering in Java 8?
The issue is to truncate a stream after a filter has been executed. I can't use limit because I don't know how long the list is after the filter. So, could we count the elements after the filter?
So, I thought I could create a class that counts and pass the stream through a map. The code is in this answer.
I created a class that counts but leaves the elements unaltered. I use a Function here, to avoid the lambdas I used in the other answer:
class DoNothingButCount<T> implements Function<T, T> {
    AtomicInteger i;
    public DoNothingButCount() {
        i = new AtomicInteger(0);
    }
    public T apply(T p) {
        i.incrementAndGet();
        return p;
    }
}
So my Stream was finally:
persons.stream()
    .filter(u -> u.size > 12)
    .filter(u -> u.weight > 12)
    .map(counter)
    .sorted((p1, p2) -> p1.age - p2.age)
    .collect(Collectors.toList())
    .stream()
    .limit((int) (counter.i.intValue() * 0.5))
    .sorted((p1, p2) -> p2.length - p1.length)
    .limit((int) (counter.i.intValue() * 0.5 * 0.2))
    .forEach(p -> System.out.println(p));
But my question is about another part of my example.
collect(Collectors.toList()).stream().
If I remove that line, the consequence is that the counter is ZERO when I try to execute limit. I am somehow cheating the "effectively final" requirement by using a mutable object.
I may be wrong, but I understand that the stream is built first, so if we use mutable objects to pass parameters to any of the steps in the stream, their values will be captured when the stream is created.
My question is: if my assumption is right, why is this needed? The stream (if non-parallel) could be passed sequentially through all the steps (filter, map, ...), so this limitation would not be needed.
Short answer
My question is, if my assumption is right, why is this needed? The
stream (if non parallel) could be pass sequentially through all the
steps (filter, map..) so this limitation is not needed.
As you already know, for parallel streams this sounds pretty obvious: this limitation is needed because otherwise the result would be non-deterministic.
Regarding non-parallel streams, it is not possible because of their current design: each item is only visited once. If streams did work as you suggest, they would do each step on the whole collection before going to the next step, which would probably have an impact on performance, I think. I suspect that's why the language designers made that decision.
Why it technically does not work without collect
You already know that, but here is the explanation for other readers.
From the docs:
Streams are lazy; computation on the source data is only performed
when the terminal operation is initiated, and source elements are
consumed only as needed.
Every intermediate operation of Stream, such as filter() or limit(), is actually just some kind of setter that initializes the stream's options.
When you call a terminal operation, such as forEach(), collect() or count(), that's when the computation happens, processing items following the pipeline previously built.
This is why limit()'s argument is evaluated before a single item has gone through the first step of the stream, and why you need to end the stream with a terminal operation and start a new one with the limit() value that you know by then.
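A tiny sketch of this laziness, for other readers: building the pipeline below prints nothing; the filter only runs once the terminal forEach is invoked.

import java.util.stream.Stream;

Stream<String> s = Stream.of("a", "b", "c")
    .filter(x -> { System.out.println("filter: " + x); return true; });
System.out.println("pipeline built, nothing filtered yet");
s.forEach(x -> System.out.println("forEach: " + x)); // filter runs only now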
More detailed answer about why not allow it for parallel streams
Let your stream pipeline be step X > step Y > step Z.
We want parallel treatment of our items. Therefore, if we allow step Y's behavior to depend on the items that have already gone through X, then Y is non-deterministic. This is because, at the moment an item arrives at step Y, the set of items that have already gone through X won't be the same across multiple executions (because of the threading).
More detailed answer about why not allow it for non-parallel streams
A stream, by definition, is used to process the items in a flow. You could think of a non-parallel stream as follows: one single item goes through all the steps, then the next one goes through all the steps, etc. In fact, the doc says it all:
The elements of a stream are only visited once during the life of a
stream. Like an Iterator, a new stream must be generated to revisit
the same elements of the source.
If streams didn't work like this, they wouldn't be any better than just doing each step on the whole collection before going to the next step. That would actually allow mutable parameters in non-parallel streams, but it would probably have a performance impact (because we would iterate multiple times over the collection). Anyway, their current behavior does not allow what you want.
