I've gone through several previous questions like Encounter order preservation in java stream, this answer by Brian Goetz, as well as the javadoc for Stream.reduce(), and the java.util.stream package javadoc, and yet I still can't grasp the following:
Take this piece of code:
public static void main(String... args) {
final String[] alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".split("");
System.out.println("Alphabet: ".concat(Arrays.toString(alphabet)));
System.out.println(new HashSet<>(Arrays.asList(alphabet))
.parallelStream()
.unordered()
.peek(System.out::println)
.reduce("", (a,b) -> a + b, (a,b) -> a + b));
}
Why is the reduction always* preserving the encounter order?
So far, after several dozen runs, output is the same
First of all unordered does not imply an actual shuffling; all it does it sets a flag for the Stream pipeline - that could later be leveraged.
A shuffle of the source elements could potentially be much more expensive then the operations on the stream pipeline themselves, so the implementation might choose not to do this(like in this case).
At the moment (tested and looked at the sources) of jdk-8 and jdk-9 - reduce does not take that into account. Notice that this could very well change in a future build or release.
Also when you say unordered - you actually mean that you don't care about that order and the stream returning the same result is not a violation of that rule.
For example notice this question/answer that explains that findFirst for example (just another terminal operation) changed to take unordered into consideration in java-9 as opposed to java-8.
To help explain this, I am going to reduce the scope of this string to ABCD.
The parallel stream will divide the string into two pieces: AB and CD. When we go to combine these later, the result of the AB side will be the first argument passed into the function, while the result of the CD side will be the second argument passed into the function. This is regardless of which of the two actually finishes first.
The unordered operator will affect some operations on a stream, such as a limit operation, it does not affect a simple reduce.
TLDR: .reduce() is not always preserving order, its result is based on the stream spliterator characteristics.
Spliterator
The encounter order of the stream depends on stream spliterator (None of the answers mentioned that before).
There are different spliterators based on the source stream. You can get the types of spliterators from the source code of those collections.
HashSet -> HashMap#KeySpliterator = Not ordered
ArrayDeque = Ordered
ArrayList = Ordered
TreeSet -> TreeMap#Spliterator = Ordered and sorted
logicbig.com - Ordering
logicbig.com - Stateful vs Stateless
Additionally you can apply .unordered() intermediate stream operation that specifies following operations in the stream should not rely on ordering.
Stream operations (mostly stateful) that are affected by spliterator and usage of .unordered() method are:
.findFirst()
.limit()
.skip()
.distinct()
Those operations will give us different results based on the order property of the stream and its spliterator.
.peek() method does not take ordering into consideration, if stream is executed in parallel it will always print/receive elements in unordered manner.
.reduce()
Now for the terminal .reduce() method. Intermediate operation .unordered() doesn't have any affect on type of spliterator (as #Eugene mentioned). But important notice, it still stays the same as it is in the source spliterator. If source spliterator is ordered, result of the .reduce() will be ordered, if source was unordered result of .reduce() will be unordered.
You are using new HashSet<>(Arrays.asList(alphabet)) to get the instance of the stream. Its spliterator is unordered. It was just a coincidence that you are getting your result ordered because you are using the single alphabet Strings as elements of the stream and unordered result is actually the same. Now if you would mix that with numbers or mix it with lower case and upper case then this doesn't hold true anymore. For example take following inputs, the first one is subset of the example you posted:
HashSet .reduce() - Unordered
"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "a1Ab2Bc3C"
"Apple","Orange","Banana","Mango" -> "AppleMangoOrangeBanana"
TreeSet .reduce() - Ordered, Sorted
"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "123ABCabc"
"Apple","Orange","Banana","Mango" -> "AppleBananaMangoOrange"
ArrayList .reduce() - Ordered
"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "abc123ABC"
"Apple","Orange","Banana","Mango" -> "AppleOrangeBananaMango"
You see that testing .reduce() operation only with an alphabet source stream can lead to false conclusions.
The answer is .reduce() is not always preserving order, its result is based on the stream spliterator characteristics.
If the input size is too small the library automatically serializes the execution of the maps in the stream, but this automation doesn't and can't take in account how heavy is the map operation. Is there a way to force parallelStream() to actually parallelize CPU heavy maps?
There seems to be a fundamental misunderstanding. The linked Q&A discusses that the stream apparently doesn’t work in parallel, due to the OP not seeing the expected speedup. The conclusion is that there is no benefit in parallel processing if the workload is too small, not that there was an automatic fallback to sequential execution.
It’s actually the opposite. If you request parallel, you get parallel, even if it actually reduces the performance. The implementation does not switch to the potentially more efficient sequential execution in such cases.
So if you are confident that the per-element workload is high enough to justify the use of a parallel execution regardless of the small number of elements, you can simply request a parallel execution.
As can easily demonstrated:
Stream.of(1, 2).parallel()
.peek(x -> System.out.println("processing "+x+" in "+Thread.currentThread()))
.forEach(System.out::println);
On Ideone, it prints
processing 2 in Thread[main,5,main]
2
processing 1 in Thread[ForkJoinPool.commonPool-worker-1,5,main]
1
but the order of messages and details may vary. It may even be possible that in some environments, both task may happen to get executed by the same thread, if it can steel the second task before another thread gets started to pick it up. But of course, if the tasks are expensive enough, this won’t happen. The important point is that the overall workload has been split and enqueued to be potentially picked up by other worker threads.
If execution by a single thread happens in your environment for the simple example above, you may insert simulated workload like this:
Stream.of(1, 2).parallel()
.peek(x -> System.out.println("processing "+x+" in "+Thread.currentThread()))
.map(x -> {
LockSupport.parkNanos("simulated workload", TimeUnit.SECONDS.toNanos(3));
return x;
})
.forEach(System.out::println);
Then, you may also see that the overall execution time will be shorter than “number of elements”דprocessing time per element” if the “processing time per element” is high enough.
Update: the misunderstanding might be cause by Brian Goetz’ misleading statement: “In your case, your input set is simply too small to be decomposed”.
It must be emphasized that this is not a general property of the Stream API, but the Map that has been used. A HashMap has a backing array and the entries are distributed within that array depending on their hash code. It might be the case that splitting the array into n ranges doesn’t lead to a balanced split of the contained element, especially, if there are only two. The implementors of the HashMap’s Spliterator considered searching the array for elements to get a perfectly balanced split to be too expensive, not that splitting two elements was not worth it.
Since the HashMap’s default capacity is 16 and the example had only two elements, we can say that the map was oversized. Simply fixing that would also fix the example:
long start = System.nanoTime();
Map<String, Supplier<String>> input = new HashMap<>(2);
input.put("1", () -> {
System.out.println(Thread.currentThread());
LockSupport.parkNanos("simulated workload", TimeUnit.SECONDS.toNanos(2));
return "a";
});
input.put("2", () -> {
System.out.println(Thread.currentThread());
LockSupport.parkNanos("simulated workload", TimeUnit.SECONDS.toNanos(2));
return "b";
});
Map<String, String> results = input.keySet()
.parallelStream().collect(Collectors.toConcurrentMap(
key -> key,
key -> input.get(key).get()));
System.out.println("Time: " + TimeUnit.NANOSECONDS.toMillis(System.nanoTime()- start));
on my machine, it prints
Thread[main,5,main]
Thread[ForkJoinPool.commonPool-worker-1,5,main]
Time: 2058
The conclusion is that the Stream implementation always tries to use parallel execution, if you request it, regardless of the input size. But it depends on the input’s structure how well the workload can be distributed to the worker threads. Things could be even worse, e.g. if you stream lines from a file.
If you think that the benefit of a balanced splitting is worth the cost of a copying step, you could also use new ArrayList<>(input.keySet()).parallelStream() instead of input.keySet().parallelStream(), as the distribution of elements within ArrayList always allows a perflectly balanced split.
stream.parallel().skip(1)
vs
stream.skip(1).parallel()
This is about Java 8 streams.
Are both of these skipping the 1st line/entry?
The example is something like this:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicLong;
public class Test010 {
public static void main(String[] args) {
String message =
"a,b,c\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n";
try(BufferedReader br = new BufferedReader(new StringReader(message))){
AtomicLong cnt = new AtomicLong(1);
br.lines().parallel().skip(1).forEach(
s -> {
System.out.println(cnt.getAndIncrement() + "->" + s);
}
);
}catch (IOException e) {
e.printStackTrace();
}
}
}
Earlier today, I was sometimes getting the header line "a,b,c" in the lambda expression. This was a surprise since I was expecting to have skipped it already. Now I cannot get that example to work i.e. I cannot get the header line in the lambda expression. So I am pretty confused now, maybe something else was influencing that behavior. Of course this is just an example. In the real world the message is being read from a CSV file. The message is the full content of that CSV file.
You actually have two questions in one, the first being whether it makes a difference in writing stream.parallel().skip(1) or stream.skip(1).parallel(), the second being whether either or both will always skip the first element. See also “loaded question”.
The first answer is that it makes no difference, because specifying a .sequential() or .parallel() execution policy affects the entire Stream pipeline, regardless of where you place it in the call chain—of course, unless you specify multiple contradicting policies, in which case the last one wins.
So in either case you are requesting a parallel execution which might affect the outcome of the skip operation, which is subject of the second question.
The answer is not that simple. If the Stream has no defined encounter order in the first place, an arbitrary element might get skipped, which is a consequence of the fact that there is no “first” element, even if there might be an element you encounter first when iterating over the source.
If you have an ordered Stream, skip(1) should skip the first element, but this has been laid down only recently. As discussed in “Stream.skip behavior with unordered terminal operation”, chaining an unordered terminal operation had an effect on the skip operation in earlier implementations and there was some uncertainty of whether this could even be intentional, as visible in “Is this a bug in Files.lines(), or am I misunderstanding something about parallel streams?”, which happens to be close to your code; apparently skipping the first line is a common case.
The final word is that the behavior of earlier JREs is a bug and skip(1) on an ordered stream should skip the first element, even when the stream pipeline is executed in parallel and the terminal operation is unordered. The associated bug report names jdk1.8.0_60 as first fixed version, which I could verify. So if you are using on older implementation, you might experience the Stream skipping different elements when using .parallel() and the unordered .forEach(…) terminal operation. It’s not contradicting if the implementation occasionally skips the expected element, that’s the unpredictability of multi-threading.
So the answer still is that stream.parallel().skip(1) and stream.skip(1).parallel() have the same behavior, even when being used in earlier versions, as both are equally unpredictable when being used with an unordered terminal operation like forEach. They should always skip the first element with ordered Streams and when being used with 1.8.0_60 or newer, they do.
Yes, but skip(n) is slower as n is larger with a parallel stream.
Here's the API note from skip():
While skip() is generally a cheap operation on sequential stream pipelines, it can be quite expensive on ordered parallel pipelines, especially for large values of n, since skip(n) is constrained to skip not just any n elements, but the first n elements in the encounter order. Using an unordered stream source (such as generate(Supplier)) or removing the ordering constraint with BaseStream.unordered() may result in significant speedups of skip() in parallel pipelines, if the semantics of your situation permit. If consistency with encounter order is required, and you are experiencing poor performance or memory utilization with skip() in parallel pipelines, switching to sequential execution with BaseStream.sequential() may improve performance.
So essentially, if you want better performance with skip(), don't use a parellel stream, or use an unordered stream.
As for it seeming to not work with parallel streams, perhaps you're actually seeing that the elements are no longer ordered? For example, an output of this code:
Stream.of("Hello", "How", "Are", "You?")
.parallel()
.skip(1)
.forEach(System.out::println);
Is
Are
You?
How
Ideone Demo
This is perfectly fine because forEach doesn't enforce the encounter order in a parallel stream. If you want it to enforce the encounter order, use a sequential stream (and perhaps use forEachOrdered so that your intent is obvious).
Stream.of("Hello", "How", "Are", "You?")
.skip(1)
.forEachOrdered(System.out::println);
How
Are
You?
I actually tried to answer this question How to skip even lines of a Stream<String> obtained from the Files.lines. So I though this collector wouldn't work well in parallel:
private static Collector<String, ?, List<String>> oddLines() {
int[] counter = {1};
return Collector.of(ArrayList::new,
(l, line) -> {
if (counter[0] % 2 == 1) l.add(line);
counter[0]++;
},
(l1, l2) -> {
l1.addAll(l2);
return l1;
});
}
but it works.
EDIT: It didn't actually work; I got fooled by the fact that my input set was too small to trigger any parallelism; see discussion in comments.
I thought it wouldn't work because of the two following plans of executions comes to my mind.
1. The counter array is shared among all threads.
Thread t1 read the first element of the Stream, so the if condition is satisfied. It adds the first element to its list. Then the execution stops before he has the time to update the array value.
Thread t2, which says started at the 4th element of the stream add it to its list. So we end up with a non-wanted element.
Of course since this collector seems to works, I guess it doesn't work like that. And the updates are not atomic anyway.
2. Each Thread has its own copy of the array
In this case there is no more problems for the update, but nothing prevents me that the thread t2 will not start at the 4th element of the stream. So he doesn't work like that either.
So it seems that it doesn't work like that at all, which brings me to the question... how the collector is used in parallel?
Can someone explain me basically how it works and why my collector works when ran in parallel?
Thank you very much!
Passing a parallel() source stream into your collector is enough to break the logic because your shared state (counter) may be incremented from different tasks. You can verify that, because it is never returning the correct result for any finite stream input:
Stream<String> lines = IntStream.range(1, 20000).mapToObj(i -> i + "");
System.out.println(lines.isParallel());
lines = lines.parallel();
System.out.println(lines.isParallel());
List<String> collected = lines.collect(oddLines());
System.out.println(collected.size());
Note that for infinite streams (e.g. when reading from Files.lines()) you need to generate some significant amount of data in the stream, so it actually forks a task to run some chunks concurrently.
Output for me is:
false
true
12386
Which is clearly wrong.
As #Holger in the comments correctly pointed out, there is a different race that can happen when your collector is specifying CONCURRENT and UNORDERED, in which case they operate on a single shared collection across tasks (ArrayList::new called once per stream), where-as with only parallel() it will run the accumulator on a collection per task and then later combine the result using your defined combiner.
If you'd add the characteristics to the collector, you might run into the following result due to the shared state in a single collection:
false
true
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 73
at java.util.ArrayList.add(ArrayList.java:459)
at de.jungblut.stuff.StreamPallel.lambda$0(StreamPallel.java:18)
at de.jungblut.stuff.StreamPallel$$Lambda$3/1044036744.accept(Unknown Source)
at java.util.stream.ReferencePipeline.lambda$collect$207(ReferencePipeline.java:496)
at java.util.stream.ReferencePipeline$$Lambda$6/2003749087.accept(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:496)
at de.jungblut.stuff.StreamPallel.main(StreamPallel.java:32)12386
Actually it's just a coincidence that this collector work. It doesn't work with custom data source. Consider this example:
List<String> list = IntStream.range(0, 10).parallel().mapToObj(String::valueOf)
.collect(oddLines());
System.out.println(list);
This produces always different result. The real cause is just because when BufferedReader.lines() stream is split by at least java.util.Spliterators.IteratorSpliterator.BATCH_UNIT number of lines which is 1024. If you have substantially bigger number of lines, it may fail even with BufferedReader:
String data = IntStream.range(0, 10000).mapToObj(String::valueOf)
.collect(Collectors.joining("\n"));
List<String> list = new BufferedReader(new StringReader(data)).lines().parallel()
.collect(oddLines());
list.stream().mapToInt(Integer::parseInt).filter(x -> x%2 != 0)
.forEach(System.out::println);
Were collector working normally this should not print anything. But sometimes it prints.